summaryrefslogtreecommitdiff
path: root/drivers/xen/gntdev.c
AgeCommit message (Collapse)Author
2019-05-14xen/gntdev.c: convert to use vm_map_pages()Souptick Joarder
Convert to use vm_map_pages() to map range of kernel memory to user vma. map->count is passed to vm_map_pages() and internal API verify map->count against count ( count = vma_pages(vma)) for page array boundary overrun condition. Link: http://lkml.kernel.org/r/88e56e82d2db98705c2d842e9c9806c00b366d67.1552921225.git.jrdr.linux@gmail.com Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Airlie <airlied@linux.ie> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Cc: Pawel Osciak <pawel@osciak.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Sandy Huang <hjc@rock-chips.com> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thierry Reding <treding@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm/mmu_notifier: convert user range->blockable to helper functionJérôme Glisse
Use the mmu_notifier_range_blockable() helper function instead of directly dereferencing the range->blockable field. This is done to make it easier to change the mmu_notifier range field. This patch is the outcome of the following coccinelle patch: %<------------------------------------------------------------------- @@ identifier I1, FN; @@ FN(..., struct mmu_notifier_range *I1, ...) { <... -I1->blockable +mmu_notifier_range_blockable(I1) ...> } ------------------------------------------------------------------->% spatch --in-place --sp-file blockable.spatch --dir . Link: http://lkml.kernel.org/r/20190326164747.24405-3-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm/gup: change GUP fast to use flags rather than a write 'bool'Ira Weiny
To facilitate additional options to get_user_pages_fast() change the singular write parameter to be gup_flags. This patch does not change any functionality. New functionality will follow in subsequent patches. Some of the get_user_pages_fast() call sites were unchanged because they already passed FOLL_WRITE or 0 for the write parameter. NOTE: It was suggested to change the ordering of the get_user_pages_fast() arguments to ensure that callers were converted. This breaks the current GUP call site convention of having the returned pages be the final parameter. So the suggestion was rejected. Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marshall <hubcap@omnibond.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-18xen/gntdev: Do not destroy context while dma-bufs are in useOleksandr Andrushchenko
If there are exported DMA buffers which are still in use and grant device is closed by either normal user-space close or by a signal this leads to the grant device context to be destroyed, thus making it not possible to correctly destroy those exported buffers when they are returned back to gntdev and makes the module crash: [ 339.617540] [<ffff00000854c0d8>] dmabuf_exp_ops_release+0x40/0xa8 [ 339.617560] [<ffff00000867a6e8>] dma_buf_release+0x60/0x190 [ 339.617577] [<ffff0000082211f0>] __fput+0x88/0x1d0 [ 339.617589] [<ffff000008221394>] ____fput+0xc/0x18 [ 339.617607] [<ffff0000080ed4e4>] task_work_run+0x9c/0xc0 [ 339.617622] [<ffff000008089714>] do_notify_resume+0xfc/0x108 Fix this by referencing gntdev on each DMA buffer export and unreferencing on buffer release. Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-12-28mm/mmu_notifier: use structure for invalidate_range_start/end callbackJérôme Glisse
Patch series "mmu notifier contextual informations", v2. This patchset adds contextual information, why an invalidation is happening, to mmu notifier callback. This is necessary for user of mmu notifier that wish to maintains their own data structure without having to add new fields to struct vm_area_struct (vma). For instance device can have they own page table that mirror the process address space. When a vma is unmap (munmap() syscall) the device driver can free the device page table for the range. Today we do not have any information on why a mmu notifier call back is happening and thus device driver have to assume that it is always an munmap(). This is inefficient at it means that it needs to re-allocate device page table on next page fault and rebuild the whole device driver data structure for the range. Other use case beside munmap() also exist, for instance it is pointless for device driver to invalidate the device page table when the invalidation is for the soft dirtyness tracking. Or device driver can optimize away mprotect() that change the page table permission access for the range. This patchset enables all this optimizations for device drivers. I do not include any of those in this series but another patchset I am posting will leverage this. The patchset is pretty simple from a code point of view. The first two patches consolidate all mmu notifier arguments into a struct so that it is easier to add/change arguments. The last patch adds the contextual information (munmap, protection, soft dirty, clear, ...). This patch (of 3): To avoid having to change many callback definition everytime we want to add a parameter use a structure to group all parameters for the mmu_notifier invalidate_range_start/end callback. No functional changes with this patch. [akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc] Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Jason Gunthorpe <jgg@mellanox.com> [infiniband] Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-14xen/gntdev: fix up blockable calls to mn_invl_range_startMichal Hocko
Patch series "mmu_notifiers follow ups". Tetsuo has noticed some fallouts from 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers"). One of them has been fixed and picked up by AMD/DRM maintainer [1]. XEN issue is fixed by patch 1. I have also clarified expectations about blockable semantic of invalidate_range_end. Finally the last patch removes MMU_INVALIDATE_DOES_NOT_BLOCK which is no longer used nor needed. [1] http://lkml.kernel.org/r/20180824135257.GU29735@dhcp22.suse.cz This patch (of 3): 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") has introduced blockable parameter to all mmu_notifiers and the notifier has to back off when called in !blockable case and it could block down the road. The above commit implemented that for mn_invl_range_start but both in_range checks are done unconditionally regardless of the blockable mode and as such they would fail all the time for regular calls. Fix this by checking blockable parameter as well. Once we are there we can remove the stale TODO. The lock has to be sleepable because we wait for completion down in gnttab_unmap_refs_sync. Link: http://lkml.kernel.org/r/20180827112623.8992-2-mhocko@kernel.org Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-22mm, oom: distinguish blockable mode for mmu notifiersMichal Hocko
There are several blockable mmu notifiers which might sleep in mmu_notifier_invalidate_range_start and that is a problem for the oom_reaper because it needs to guarantee a forward progress so it cannot depend on any sleepable locks. Currently we simply back off and mark an oom victim with blockable mmu notifiers as done after a short sleep. That can result in selecting a new oom victim prematurely because the previous one still hasn't torn its memory down yet. We can do much better though. Even if mmu notifiers use sleepable locks there is no reason to automatically assume those locks are held. Moreover majority of notifiers only care about a portion of the address space and there is absolutely zero reason to fail when we are unmapping an unrelated range. Many notifiers do really block and wait for HW which is harder to handle and we have to bail out though. This patch handles the low hanging fruit. __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks are not allowed to sleep if the flag is set to false. This is achieved by using trylock instead of the sleepable lock for most callbacks and continue as long as we do not block down the call chain. I think we can improve that even further because there is a common pattern to do a range lookup first and then do something about that. The first part can be done without a sleeping lock in most cases AFAICS. The oom_reaper end then simply retries if there is at least one notifier which couldn't make any progress in !blockable mode. A retry loop is already implemented to wait for the mmap_sem and this is basically the same thing. The simplest way for driver developers to test this code path is to wrap userspace code which uses these notifiers into a memcg and set the hard limit to hit the oom. This can be done e.g. after the test faults in all the mmu notifier managed memory and set the hard limit to something really small. Then we are looking for a proper process tear down. [akpm@linux-foundation.org: coding style fixes] [akpm@linux-foundation.org: minor code simplification] Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp Reported-by: David Rientjes <rientjes@google.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-26xen/gntdev: Add initial support for dma-buf UAPIOleksandr Andrushchenko
Add UAPI and IOCTLs for dma-buf grant device driver extension: the extension allows userspace processes and kernel modules to use Xen backed dma-buf implementation. With this extension grant references to the pages of an imported dma-buf can be exported for other domain use and grant references coming from a foreign domain can be converted into a local dma-buf for local export. Implement basic initialization and stubs for Xen DMA buffers' support. Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-07-26xen/gntdev: Make private routines/structures accessibleOleksandr Andrushchenko
This is in preparation for adding support of DMA buffer functionality: make map/unmap related code and structures, used privately by gntdev, ready for dma-buf extension, which will re-use these. Rename corresponding structures as those become non-private to gntdev now. Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-07-26xen/gntdev: Allow mappings for DMA buffersOleksandr Andrushchenko
Allow mappings for DMA backed buffers if grant table module supports such: this extends grant device to not only map buffers made of balloon pages, but also from buffers allocated with dma_alloc_xxx. Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-01-10xen/gntdev: Fix partial gntdev_mmap() cleanupRoss Lagerwall
When cleaning up after a partially successful gntdev_mmap(), unmap the successfully mapped grant pages otherwise Xen will kill the domain if in debug mode (Attempt to implicitly unmap a granted PTE) or Linux will kill the process and emit "BUG: Bad page map in process" if Xen is in release mode. This is only needed when use_ptemod is true because gntdev_put_map() will unmap grant pages itself when use_ptemod is false. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-01-10xen/gntdev: Fix off-by-one error when unmapping with holesRoss Lagerwall
If the requested range has a hole, the calculation of the number of pages to unmap is off by one. Fix it. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-25xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap()Juergen Gross
In case gntdev_mmap() succeeds only partially in mapping grant pages it will leave some vital information uninitialized needed later for cleanup. This will lead to an out of bounds array access when unmapping the already mapped pages. So just initialize the data needed for unmapping the pages a little bit earlier. Cc: <stable@vger.kernel.org> Reported-by: Arthur Borsboom <arthurborsboom@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-08-31xen/gntdev: update to new mmu_notifier semanticJérôme Glisse
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: xen-devel@lists.xenproject.org (moderated for non-subscribers) Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-13drivers, xen: convert grant_map.users from atomic_t to refcount_tElena Reshetova
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar
<linux/sched/mm.h> We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/mm.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. The APIs that are going to be moved first are: mm_alloc() __mmdrop() mmdrop() mmdrop_async_fn() mmdrop_async() mmget_not_zero() mmput() mmput_async() get_task_mm() mm_access() mm_release() Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-28xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancingBoris Ostrovsky
Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to NUMA balancing") set VM_IO flag to prevent grant maps from being subjected to NUMA balancing. It was discovered recently that this flag causes get_user_pages() to always fail with -EFAULT. check_vma_flags __get_user_pages __get_user_pages_locked __get_user_pages_unlocked get_user_pages_fast iov_iter_get_pages dio_refill_pages do_direct_IO do_blockdev_direct_IO do_blockdev_direct_IO ext4_direct_IO_read generic_file_read_iter aio_run_iocb (which can happen if guest's vdisk has direct-io-safe option). To avoid this let's use VM_MIXEDMAP flag instead --- it prevents NUMA balancing just as VM_IO does and has no effect on check_vma_flags(). Cc: stable@vger.kernel.org Reported-by: Olaf Hering <olaf@aepfle.de> Suggested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Tested-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2016-07-06xen: use vma_pages().Muhammad Falak R Wani
Replace explicit computation of vma page count by a call to vma_pages(). Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-05-24xen/gntdev: reduce copy batch size to 16David Vrabel
IOCTL_GNTDEV_GRANT_COPY batches copy operations to reduce the number of hypercalls. The stack is used to avoid a memory allocation in a hot path. However, a batch size of 24 requires more than 1024 bytes of stack which in some configurations causes a compiler warning. xen/gntdev.c: In function ‘gntdev_ioctl_grant_copy’: xen/gntdev.c:949:1: warning: the frame size of 1248 bytes is larger than 1024 bytes [-Wframe-larger-than=] This is a harmless warning as there is still plenty of stack spare, but people keep trying to "fix" it. Reduce the batch size to 16 to reduce stack usage to less than 1024 bytes. This should have minimal impact on performance. Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-01-07xen/gntdev: add ioctl for grant copyDavid Vrabel
Add IOCTL_GNTDEV_GRANT_COPY to allow applications to copy between user space buffers and grant references. This interface is similar to the GNTTABOP_copy hypercall ABI except the local buffers are provided using a virtual address (instead of a GFN and offset). To avoid userspace from having to page align its buffers the driver will use two or more ops if required. If the ioctl returns 0, the application must check the status of each segment with the segments status field. If the ioctl returns a -ve error code (EINVAL or EFAULT), the status of individual ops is undefined. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2015-12-21xen/gntdev: constify mmu_notifier_ops structuresJulia Lawall
This mmu_notifier_ops structure is never modified, so declare it as const, like the other mmu_notifier_ops structures. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-11-26xen/gntdev: Grant maps should not be subject to NUMA balancingBoris Ostrovsky
Doing so will cause the grant to be unmapped and then, during fault handling, the fault to be mistakenly treated as NUMA hint fault. In addition, even if those maps could partcipate in NUMA balancing, it wouldn't provide any benefit since we are unable to determine physical page's node (even if/when VNUMA is implemented). Marking grant maps' VMAs as VM_IO will exclude them from being part of NUMA balancing. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-10mm: mark most vm_operations_struct constKirill A. Shutemov
With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct structs should be constant. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30xen/gntdevt: Fix race condition in gntdev_release()Marek Marczykowski-Górecki
While gntdev_release() is called the MMU notifier is still registered and can traverse priv->maps list even if no pages are mapped (which is the case -- gntdev_release() is called after all). But gntdev_release() will clear that list, so make sure that only one of those things happens at the same time. Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-06-17xen: Include xen/page.h rather than asm/xen/page.hJulien Grall
Using xen/page.h will be necessary later for using common xen page helpers. As xen/page.h already include asm/xen/page.h, always use the later. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: netdev@vger.kernel.org Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-04-27xen/grant: introduce func gnttab_unmap_refs_sync()Bob Liu
There are several place using gnttab async unmap and wait for completion, so move the common code to a function gnttab_unmap_refs_sync(). Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-01-28xen/gntdev: provide find_special_page VMA operationDavid Vrabel
For a PV guest, use the find_special_page op to find the right page. To handle VMAs being split, remember the start of the original VMA so the correct index in the pages array can be calculated. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2015-01-28xen/gntdev: mark userspace PTEs as special on x86 PV guestsDavid Vrabel
In an x86 PV guest, get_user_pages_fast() on a userspace address range containing foreign mappings does not work correctly because the M2P lookup of the MFN from a userspace PTE may return the wrong page. Force get_user_pages_fast() to fail on such addresses by marking the PTEs as special. If Xen has XENFEAT_gnttab_map_avail_bits (available since at least 4.0), we can do so efficiently in the grant map hypercall. Otherwise, it needs to be done afterwards. This is both inefficient and racy (the mapping is visible to the task before we fixup the PTEs), but will be fine for well-behaved applications that do not use the mapping until after the mmap() system call returns. Guests with XENFEAT_auto_translated_physmap (ARM and x86 HVM or PVH) do not need this since get_user_pages() has always worked correctly for them. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2015-01-28xen/gntdev: safely unmap grants in case they are still in useJennifer Herbert
Use gnttab_unmap_refs_async() to wait until the mapped pages are no longer in use before unmapping them. This allows userspace programs to safely use Direct I/O and AIO to a network filesystem which may retain refs to pages in queued skbs after the filesystem I/O has completed. Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-01-28xen/gntdev: convert priv->lock to a mutexDavid Vrabel
Unmapping may require sleeping and we unmap while holding priv->lock, so convert it to a mutex. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2015-01-28xen/grant-table: add helpers for allocating pagesDavid Vrabel
Add gnttab_alloc_pages() and gnttab_free_pages() to allocate/free pages suitable to for granted maps. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2015-01-28xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs()David Vrabel
When unmapping grants, instead of converting the kernel map ops to unmap ops on the fly, pre-populate the set of unmap ops. This allows the grant unmap for the kernel mappings to be trivially batched in the future. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-02-03Revert "xen/grant-table: Avoid m2p_override during mapping"Konrad Rzeszutek Wilk
This reverts commit 08ece5bb2312b4510b161a6ef6682f37f4eac8a1. As it breaks ARM builds and needs more attention on the ARM side. Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-31xen/grant-table: Avoid m2p_override during mappingZoltan Kiss
The grant mapping API does m2p_override unnecessarily: only gntdev needs it, for blkback and future netback patches it just cause a lock contention, as those pages never go to userspace. Therefore this series does the following: - the original functions were renamed to __gnttab_[un]map_refs, with a new parameter m2p_override - based on m2p_override either they follow the original behaviour, or just set the private flag and call set_phys_to_machine - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with m2p_override false - a new function gnttab_[un]map_refs_userspace provides the old behaviour It also removes a stray space from page.h and change ret to 0 if XENFEAT_auto_translated_physmap, as that is the only possible return value there. v2: - move the storing of the old mfn in page->index to gnttab_map_refs - move the function header update to a separate patch v3: - a new approach to retain old behaviour where it needed - squash the patches into one v4: - move out the common bits from m2p* functions, and pass pfn/mfn as parameter - clear page->private before doing anything with the page, so m2p_find_override won't race with this v5: - change return value handling in __gnttab_[un]map_refs - remove a stray space in page.h - add detail why ret = 0 now at some places v6: - don't pass pfn to m2p* functions, just get it locally Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Suggested-by: David Vrabel <david.vrabel@citrix.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06xen/pvh: Piggyback on PVHVM for grant driver (v4)Konrad Rzeszutek Wilk
In PVH the shared grant frame is the PFN and not MFN, hence its mapped via the same code path as HVM. The allocation of the grant frame is done differently - we do not use the early platform-pci driver and have an ioremap area - instead we use balloon memory and stitch all of the non-contingous pages in a virtualized area. That means when we call the hypervisor to replace the GMFN with a XENMAPSPACE_grant_table type, we need to lookup the old PFN for every iteration instead of assuming a flat contingous PFN allocation. Lastly, we only use v1 for grants. This is because PVHVM is not able to use v2 due to no XENMEM_add_to_physmap calls on the error status page (see commit 69e8f430e243d657c2053f097efebc2e2cd559f0 xen/granttable: Disable grant v2 for HVM domains.) Until that is implemented this workaround has to be in place. Also per suggestions by Stefano utilize the PVHVM paths as they share common functionality. v2 of this patch moves most of the PVH code out in the arch/x86/xen/grant-table driver and touches only minimally the generic driver. v3, v4: fixes us some of the code due to earlier patches. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2013-08-20xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mappingStefano Stabellini
GNTTABOP_unmap_grant_ref unmaps a grant and replaces it with a 0 mapping instead of reinstating the original mapping. Doing so separately would be racy. To unmap a grant and reinstate the original mapping atomically we use GNTTABOP_unmap_and_replace. GNTTABOP_unmap_and_replace doesn't work with GNTMAP_contains_pte, so don't use it for kmaps. GNTTABOP_unmap_and_replace zeroes the mapping passed in new_addr so we have to reinstate it, however that is a per-cpu mapping only used for balloon scratch pages, so we can be sure that it's not going to be accessed while the mapping is not valid. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: alex@alex.org.uk CC: dcrisan@flexiant.com [v1: Konrad fixed up the conflicts] Conflicts: arch/x86/xen/p2m.c
2013-06-28xen: Convert printks to pr_<level>Joe Perches
Convert printks to pr_<level> (excludes printk(KERN_DEBUG...) to be more consistent throughout the xen subsystem. Add pr_fmt with KBUILD_MODNAME or "xen:" KBUILD_MODNAME Coalesce formats and add missing word spaces Add missing newlines Align arguments and reflow to 80 columns Remove DRV_NAME from formats as pr_fmt adds the same content This does change some of the prefixes of these messages but it also does make them more consistent. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-15xen/gntdev: remove erronous use of copy_to_userDaniel De Graaf
Since there is now a mapping of granted pages in kernel address space in both PV and HVM, use it for UNMAP_NOTIFY_CLEAR_BYTE instead of accessing memory via copy_to_user and triggering sleep-in-atomic warnings. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-15xen/gntdev: correctly unmap unlinked maps in mmu notifierDaniel De Graaf
If gntdev_ioctl_unmap_grant_ref is called on a range before unmapping it, the entry is removed from priv->maps and the later call to mn_invl_range_start won't find it to do the unmapping. Fix this by creating another list of freeable maps that the mmu notifier can search and use to unmap grants. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-15xen/gntdev: fix unsafe vma accessDaniel De Graaf
In gntdev_ioctl_get_offset_for_vaddr, we need to hold mmap_sem while calling find_vma() to avoid potentially having the result freed out from under us. Similarly, the MMU notifier functions need to synchronize with gntdev_vma_close to avoid map->vma being freed during their iteration. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-30xen/gntdev: don't leak memory from IOCTL_GNTDEV_MAP_GRANT_REFDavid Vrabel
map->kmap_ops allocated in gntdev_alloc_map() wasn't freed by gntdev_put_map(). Add a gntdev_free_map() helper function to free everything allocated by gntdev_alloc_map(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-09mm: kill vma flag VM_RESERVED and mm->reserved_vm counterKonstantin Khlebnikov
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: | effect | alternative flags -+------------------------+--------------------------------------------- 1| account as reserved_vm | VM_IO 2| skip in core dump | VM_IO, VM_DONTDUMP 3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-02Merge tag 'stable/for-linus-3.7-x86-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "Features: - When hotplugging PCI devices in a PV guest we can allocate Xen-SWIOTLB later. - Cleanup Xen SWIOTLB. - Support pages out grants from HVM domains in the backends. - Support wild cards in xen-pciback.hide=(BDF) arguments. - Update grant status updates with upstream hypervisor. - Boot PV guests with more than 128GB. - Cleanup Xen MMU code/add comments. - Obtain XENVERS using a preferred method. - Lay out generic changes to support Xen ARM. - Allow privcmd ioctl for HVM (used to do only PV). - Do v2 of mmap_batch for privcmd ioctls. - If hypervisor saves the LED keyboard light - we will now instruct the kernel about its state. Fixes: - More fixes to Xen PCI backend for various calls/FLR/etc. - With more than 4GB in a 64-bit PV guest disable native SWIOTLB. - Fix up smatch warnings. - Fix up various return values in privmcmd and mm." * tag 'stable/for-linus-3.7-x86-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (48 commits) xen/pciback: Restore the PCI config space after an FLR. xen-pciback: properly clean up after calling pcistub_device_find() xen/vga: add the xen EFI video mode support xen/x86: retrieve keyboard shift status flags from hypervisor. xen/gndev: Xen backend support for paged out grant targets V4. xen-pciback: support wild cards in slot specifications xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer. xen/swiotlb: Remove functions not needed anymore. xen/pcifront: Use Xen-SWIOTLB when initting if required. xen/swiotlb: For early initialization, return zero on success. xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used. xen/swiotlb: Move the error strings to its own function. xen/swiotlb: Move the nr_tbl determination in its own function. xen/arm: compile and run xenbus xen: resynchronise grant table status codes with upstream xen/privcmd: return -EFAULT on error xen/privcmd: Fix mmap batch ioctl error status copy back. xen/privcmd: add PRIVCMD_MMAPBATCH_V2 ioctl xen/mm: return more precise error from xen_remap_domain_range() xen/mmu: If the revector fails, don't attempt to revector anything else. ...
2012-09-12xen/m2p: do not reuse kmap_op->dev_bus_addrStefano Stabellini
If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it and we'll think that the original mfn is actually what Xen has written to it. Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that). CC: stable@kernel.org Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-21xen/apic/xenbus/swiotlb/pcifront/grant/tmem: Make functions or variables static.Konrad Rzeszutek Wilk
There is no need for those functions/variables to be visible. Make them static and also fix the compile warnings of this sort: drivers/xen/<some file>.c: warning: symbol '<blah>' was not declared. Should it be static? Some of them just require including the header file that declares the functions. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-17xen/gntdev: do not set VM_PFNMAPStefano Stabellini
Since we are using the m2p_override we do have struct pages corresponding to the user vma mmap'ed by gntdev. Removing the VM_PFNMAP flag makes get_user_pages work on that vma. An example test case would be using a Xen userspace block backend (QDISK) on a file on NFS using O_DIRECT. CC: stable@kernel.org Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-12-20xen/grant-table: Support mappings required by blkbackDaniel De Graaf
Add support for mappings without GNTMAP_contains_pte. This was not supported because the unmap operation assumed that this flag was being used; adding a parameter to the unmap operation to allow the PTE clearing to be disabled is sufficient to make unmap capable of supporting either mapping type. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> [v1: Fix cleanpatch warnings] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-12-20Merge commit 'v3.2-rc3' into stable/for-linus-3.3Konrad Rzeszutek Wilk
* commit 'v3.2-rc3': (412 commits) Linux 3.2-rc3 virtio-pci: make reset operation safer virtio-mmio: Correct the name of the guest features selector virtio: add HAS_IOMEM dependency to MMIO platform bus driver eCryptfs: Extend array bounds for all filename chars eCryptfs: Flush file in vma close eCryptfs: Prevent file create race condition regulator: TPS65910: Fix VDD1/2 voltage selector count i2c: Make i2cdev_notifier_call static i2c: Delete ANY_I2C_BUS i2c: Fix device name for 10-bit slave address i2c-algo-bit: Generate correct i2c address sequence for 10-bit target drm: integer overflow in drm_mode_dirtyfb_ioctl() Revert "of/irq: of_irq_find_parent: check for parent equal to child" drivers/gpu/vga/vgaarb.c: add missing kfree drm/radeon/kms/atom: unify i2c gpio table handling drm/radeon/kms: fix up gpio i2c mask bits for r4xx for real ttm: Don't return the bo reserved on error path mount_subtree() pointless use-after-free iio: fix a leak due to improper use of anon_inode_getfd() ...
2011-11-21xen/gnt{dev,alloc}: reserve event channels for notifyDaniel De Graaf
When using the unmap notify ioctl, the event channel used for notification needs to be reserved to avoid it being deallocated prior to sending the notification. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-16xen-gntdev: integer overflow in gntdev_alloc_map()Dan Carpenter
The multiplications here can overflow resulting in smaller buffer sizes than expected. "count" comes from a copy_from_user(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>