linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2016-12-02	powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown	Alexey Kardashevskiy
	At the moment the userspace tool is expected to request pinning of the entire guest RAM when VFIO IOMMU SPAPR v2 driver is present. When the userspace process finishes, all the pinned pages need to be put; this is done as a part of the userspace memory context (MM) destruction which happens on the very last mmdrop(). This approach has a problem that a MM of the userspace process may live longer than the userspace process itself as kernel threads use userspace process MMs which was runnning on a CPU where the kernel thread was scheduled to. If this happened, the MM remains referenced until this exact kernel thread wakes up again and releases the very last reference to the MM, on an idle system this can take even hours. This moves preregistered regions tracking from MM to VFIO; insteads of using mm_iommu_table_group_mem_t::used, tce_container::prereg_list is added so each container releases regions which it has pre-registered. This changes the userspace interface to return EBUSY if a memory region is already registered in a container. However it should not have any practical effect as the only userspace tool available now does register memory region once per container anyway. As tce_iommu_register_pages/tce_iommu_unregister_pages are called under container->lock, this does not need additional locking. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-02	vfio/spapr: Reference mm in tce_container	Alexey Kardashevskiy
	In some situations the userspace memory context may live longer than the userspace process itself so if we need to do proper memory context cleanup, we better have tce_container take a reference to mm_struct and use it later when the process is gone (@current or @current->mm is NULL). This references mm and stores the pointer in the container; this is done in a new helper - tce_iommu_mm_set() - when one of the following happens: - a container is enabled (IOMMU v1); - a first attempt to pre-register memory is made (IOMMU v2); - a DMA window is created (IOMMU v2). The @mm stays referenced till the container is destroyed. This replaces current->mm with container->mm everywhere except debug prints. This adds a check that current->mm is the same as the one stored in the container to prevent userspace from making changes to a memory context of other processes. DMA map/unmap ioctls() do not check for @mm as they already check for @enabled which is set after tce_iommu_mm_set() is called. This does not reference a task as multiple threads within the same mm are allowed to ioctl() to vfio and supposedly they will have same limits and capabilities and if they do not, we'll just fail with no harm made. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-02	vfio/spapr: Postpone default window creation	Alexey Kardashevskiy
	We are going to allow the userspace to configure container in one memory context and pass container fd to another so we are postponing memory allocations accounted against the locked memory limit. One of previous patches took care of it_userspace. At the moment we create the default DMA window when the first group is attached to a container; this is done for the userspace which is not DDW-aware but familiar with the SPAPR TCE IOMMU v2 in the part of memory pre-registration - such client expects the default DMA window to exist. This postpones the default DMA window allocation till one of the folliwing happens: 1. first map/unmap request arrives; 2. new window is requested; This adds noop for the case when the userspace requested removal of the default window which has not been created yet. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-02	vfio/spapr: Add a helper to create default DMA window	Alexey Kardashevskiy
	There is already a helper to create a DMA window which does allocate a table and programs it to the IOMMU group. However tce_iommu_take_ownership_ddw() did not use it and did these 2 calls itself to simplify error path. Since we are going to delay the default window creation till the default window is accessed/removed or new window is added, we need a helper to create a default window from all these cases. This adds tce_iommu_create_default_window(). Since it relies on a VFIO container to have at least one IOMMU group (for future use), this changes tce_iommu_attach_group() to add a group to the container first and then call the new helper. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-02	vfio/spapr: Postpone allocation of userspace version of TCE table	Alexey Kardashevskiy
	The iommu_table struct manages a hardware TCE table and a vmalloc'd table with corresponding userspace addresses. Both are allocated when the default DMA window is created and this happens when the very first group is attached to a container. As we are going to allow the userspace to configure container in one memory context and pas container fd to another, we have to postpones such allocations till a container fd is passed to the destination user process so we would account locked memory limit against the actual container user constrainsts. This postpones the it_userspace array allocation till it is used first time for mapping. The unmapping patch already checks if the array is allocated. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-02	powerpc/iommu: Stop using @current in mm_iommu_xxx	Alexey Kardashevskiy
	This changes mm_iommu_xxx helpers to take mm_struct as a parameter instead of getting it from @current which in some situations may not have a valid reference to mm. This changes helpers to receive @mm and moves all references to @current to the caller, including checks for !current and !current->mm; checks in mm_iommu_preregistered() are removed as there is no caller yet. This moves the mm_iommu_adjust_locked_vm() call to the caller as it receives mm_iommu_table_group_mem_t but it needs mm. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-01	vfio: support notifier chain in vfio_group	Jike Song
	Beyond vfio_iommu events, users might also be interested in vfio_group events. For example, if a vfio_group is used along with Qemu/KVM, whenever kvm pointer is set to/cleared from the vfio_group, users could be notified. Currently only VFIO_GROUP_NOTIFY_SET_KVM supported. Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Jike Song <jike.song@intel.com> [aw: remove use of new typedef] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-01	vfio: vfio_register_notifier: classify iommu notifier	Jike Song
	Currently vfio_register_notifier assumes that there is only one notifier chain, which is in vfio_iommu. However, the user might also be interested in events other than vfio_iommu, for example, vfio_group. Refactor vfio_{un}register_notifier implementation to make it feasible. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Jike Song <jike.song@intel.com> [aw: merge with commit 816ca69ea9c7 ("vfio: Fix handling of error returned by 'vfio_group_get_from_dev()'"), remove typedef] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-01	vfio: Fix handling of error returned by 'vfio_group_get_from_dev()'	Christophe JAILLET
	'vfio_group_get_from_dev()' seems to return only NULL on error, not an error pointer. Fixes: 2169037dc322 ("vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops") Fixes: c086de818dd8 ("vfio iommu: Add blocking notifier to notify DMA_UNMAP") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-21	vfio: fix vfio_info_cap_add/shift	Eric Auger
	Capability header next field is an offset relative to the start of the INFO buffer. tmp->next is assigned the proper value but iterations implemented in vfio_info_cap_add and vfio_info_cap_shift use next as an offset between the headers. When coping with multiple capabilities this leads to an Oops. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-18	vfio/pci: Drop unnecessary pcibios_err_to_errno()	Cao jin
	As of commit d97ffe236894 ("PCI: Fix return value from pci_user_{read,write}_config_*()") it's unnecessary to call pcibios_err_to_errno() to fixup the return value from these functions. pcibios_err_to_errno() already does simple passthrough of -errno values, therefore no functional change is expected. [aw: changelog] Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	docs: Add Documentation for Mediated devices	Kirti Wankhede
	Add file Documentation/vfio-mediated-device.txt that include details of mediated device framework. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare()	Kirti Wankhede
	Updated vfio_platform_common.c file to use vfio_set_irqs_validate_and_prepare() Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()	Kirti Wankhede
	Updated vfio_pci.c file to use vfio_set_irqs_validate_and_prepare() Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: Introduce vfio_set_irqs_validate_and_prepare()	Kirti Wankhede
	Vendor driver using mediated device framework would use same mechnism to validate and prepare IRQs. Introducing this function to reduce code replication in multiple drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio_pci: Update vfio_pci to use vfio_info_add_capability()	Kirti Wankhede
	Update msix_sparse_mmap_cap() to use vfio_info_add_capability() Update region type capability to use vfio_info_add_capability() Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: Introduce common function to add capabilities	Kirti Wankhede
	Vendor driver using mediated device framework should use vfio_info_add_capability() to add capabilities. Introduced this function to reduce code duplication in vendor drivers. vfio_info_cap_shift() manipulated a data buffer to add an offset to each element in a chain. This data buffer is documented in a uapi header. Changing vfio_info_cap_shift symbol to be available to all drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio iommu: Add blocking notifier to notify DMA_UNMAP	Kirti Wankhede
	Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers about DMA_UNMAP. Exported two APIs vfio_register_notifier() and vfio_unregister_notifier(). Notifier should be registered, if external user wants to use vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages. Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate mappings. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio iommu type1: Add support for mediated devices	Kirti Wankhede
	VFIO IOMMU drivers are designed for the devices which are IOMMU capable. Mediated device only uses IOMMU APIs, the underlying hardware can be managed by an IOMMU domain. Aim of this change is: - To use most of the code of TYPE1 IOMMU driver for mediated devices - To support direct assigned device and mediated device in single module This change adds pin and unpin support for mediated device to TYPE1 IOMMU backend module. More details: - Domain for external user is tracked separately in vfio_iommu structure. It is allocated when group for first mdev device is attached. - Pages pinned for external domain are tracked in each vfio_dma structure for that iova range. - Page tracking rb-tree in vfio_dma keeps <iova, pfn, ref_count>. Key of rb-tree is iova, but it actually aims to track pfns. - On external pin request for an iova, page is pinned once, if iova is already pinned and tracked, ref_count is incremented. - External unpin request unpins pages only when ref_count is 0. - Pinned pages list is used to find pfn from iova and then unpin it. WARN_ON is added if there are entires in pfn_list while detaching the group and releasing the domain. - Page accounting is updated to account in its address space where the pages are pinned/unpinned, i.e dma->task - Accouting for mdev device is only done if there is no iommu capable domain in the container. When there is a direct device assigned to the container and that domain is iommu capable, all pages are already pinned during DMA_MAP. - Page accouting is updated on hot plug and unplug mdev device and pass through device. Tested by assigning below combinations of devices to a single VM: - GPU pass through only - vGPU device only - One GPU pass through and one vGPU device - Linux VM hot plug and unplug vGPU device while GPU pass through device exist - Linux VM hot plug and unplug GPU pass through device while vGPU device exist Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio iommu type1: Add task structure to vfio_dma	Kirti Wankhede
	Add task structure to vfio_dma structure. Task structure is used for: - During DMA_UNMAP, same task who mapped it or other task who shares same address space is allowed to unmap, otherwise unmap fails. QEMU maps few iova ranges initially, then fork threads and from the child thread calls DMA_UNMAP on previously mapped iova. Since child shares same address space, DMA_UNMAP is successful. - Avoid accessing struct mm while process is exiting by acquiring reference of task's mm during page accounting. - It is also used to get task mlock capability and rlimit for mlock. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio iommu type1: Add find_iommu_group() function	Kirti Wankhede
	Add find_iommu_group() Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio iommu type1: Update argument of vaddr_get_pfn()	Kirti Wankhede
	Update arguments of vaddr_get_pfn() to take struct mm_struct *mm as input argument. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio iommu type1: Update arguments of vfio_lock_acct	Kirti Wankhede
	Added task structure as input argument to vfio_lock_acct() function. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops	Kirti Wankhede
	Added APIs for pining and unpining set of pages. These call back into backend iommu module to actually pin and unpin pages. Added two new callback functions to struct vfio_iommu_driver_ops. Backend IOMMU module that supports pining and unpinning pages for mdev devices should provide these functions. Renamed static functions in vfio_type1_iommu.c to resolve conflicts Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: Common function to increment container_users	Kirti Wankhede
	This change rearrange functions to have common function to increment container_users Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: Rearrange functions to get vfio_group from dev	Kirti Wankhede
	This patch rearranges functions to get vfio_group from device Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: VFIO based driver for Mediated devices	Kirti Wankhede
	vfio_mdev driver registers with mdev core driver. mdev core driver creates mediated device and calls probe routine of vfio_mdev driver for each device. Probe routine of vfio_mdev driver adds mediated device to VFIO core module This driver forms a shim layer that pass through VFIO devices operations to vendor driver for mediated devices. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: Mediated device Core driver	Kirti Wankhede
	Design for Mediated Device Driver: Main purpose of this driver is to provide a common interface for mediated device management that can be used by different drivers of different devices. This module provides a generic interface to create the device, add it to mediated bus, add device to IOMMU group and then add it to vfio group. Below is the high Level block diagram, with Nvidia, Intel and IBM devices as example, since these are the devices which are going to actively use this module as of now. +---------------+ \| \| \| +-----------+ \| mdev_register_driver() +--------------+ \| \| \| +<------------------------+ __init() \| \| \| mdev \| \| \| \| \| \| bus \| +------------------------>+ \|<-> VFIO user \| \| driver \| \| probe()/remove() \| vfio_mdev.ko \| APIs \| \| \| \| \| \| \| +-----------+ \| +--------------+ \| \| \| MDEV CORE \| \| MODULE \| \| mdev.ko \| \| +-----------+ \| mdev_register_device() +--------------+ \| \| \| +<------------------------+ \| \| \| \| \| \| nvidia.ko \|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| \| Physical \| \| \| \| device \| \| mdev_register_device() +--------------+ \| \| interface \| \|<------------------------+ \| \| \| \| \| \| i915.ko \|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| \| \| \| \| \| \| \| mdev_register_device() +--------------+ \| \| \| +<------------------------+ \| \| \| \| \| \| ccw_device.ko\|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| +-----------+ \| +---------------+ Core driver provides two types of registration interfaces: 1. Registration interface for mediated bus driver: /** * struct mdev_driver - Mediated device's driver * @name: driver name * @probe: called when new device created * @remove:called when device removed * @driver:device driver structure * */ struct mdev_driver { const char name; int (probe) (struct device dev); void (remove) (struct device dev); struct device_driver driver; }; Mediated bus driver for mdev device should use this interface to register and unregister with core driver respectively: int mdev_register_driver(struct mdev_driver drv, struct module owner); void mdev_unregister_driver(struct mdev_driver drv); Mediated bus driver is responsible to add/delete mediated devices to/from VFIO group when devices are bound and unbound to the driver. 2. Physical device driver interface This interface provides vendor driver the set APIs to manage physical device related work in its driver. APIs are : dev_attr_groups: attributes of the parent device. * mdev_attr_groups: attributes of the mediated device. * supported_type_groups: attributes to define supported type. This is mandatory field. * create: to allocate basic resources in vendor driver for a mediated device. This is mandatory to be provided by vendor driver. * remove: to free resources in vendor driver when mediated device is destroyed. This is mandatory to be provided by vendor driver. * open: open callback of mediated device * release: release callback of mediated device * read : read emulation callback. * write: write emulation callback. * ioctl: ioctl callback. * mmap: mmap emulation callback. Drivers should use these interfaces to register and unregister device to mdev core driver respectively: extern int mdev_register_device(struct device dev, const struct parent_ops ops); extern void mdev_unregister_device(struct device *dev); There are no locks to serialize above callbacks in mdev driver and vfio_mdev driver. If required, vendor driver can have locks to serialize above APIs in their driver. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-26	vfio/pci: Fix integer overflows, bitmask check	Vlad Tsyrklevich
	The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize user-supplied integers, potentially allowing memory corruption. This patch adds appropriate integer overflow checks, checks the range bounds for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set. VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in vfio_pci_set_irqs_ioctl(). Furthermore, a kzalloc is changed to a kcalloc because the use of a kzalloc with an integer multiplication allowed an integer overflow condition to be reached without this patch. kcalloc checks for overflow and should prevent a similar occurrence. Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-09-29	vfio_pci: use pci_alloc_irq_vectors	Christoph Hellwig
	Simplify the interrupt setup by using the new PCI layer helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-09-26	vfio-pci: Disable INTx after MSI/X teardown	Alex Williamson
	The MSI/X shutdown path can gratuitously enable INTx, which is not something we want to happen if we're dealing with broken INTx device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-09-26	vfio-pci: Virtualize PCIe & AF FLR	Alex Williamson
	We use a BAR restore trick to try to detect when a user has performed a device reset, possibly through FLR or other backdoors, to put things back into a working state. This is important for backdoor resets, but we can actually just virtualize the "front door" resets provided via PCIe and AF FLR. Set these bits as virtualized + writable, allowing the default write to set them in vconfig, then we can simply check the bit, perform an FLR of our own, and clear the bit. We don't actually have the granularity in PCI to specify the type of reset we want to do, but generally devices don't implement both PCIe and AF FLR and we'll favor these over other types of reset, so we should generally lineup. We do test whether the device provides the requested FLR type to stay consistent with hardware capabilities though. This seems to fix several instance of devices getting into bad states with userspace drivers, like dpdk, running inside a VM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Rose <grose@lightfleet.com>
2016-09-13	vfio: platform: mark symbols static where possible	Baoyou Xie
	We get a few warnings when building kernel with W=1: drivers/vfio/platform/vfio_platform_common.c:76:5: warning: no previous prototype for 'vfio_platform_acpi_call_reset' [-Wmissing-prototypes] drivers/vfio/platform/vfio_platform_common.c:98:6: warning: no previous prototype for 'vfio_platform_acpi_has_reset' [-Wmissing-prototypes] drivers/vfio/platform/vfio_platform_common.c:640:5: warning: no previous prototype for 'vfio_platform_of_probe' [-Wmissing-prototypes] drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:59:5: warning: no previous prototype for 'vfio_platform_amdxgbe_reset' [-Wmissing-prototypes] drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c:60:5: warning: no previous prototype for 'vfio_platform_calxedaxgmac_reset' [-Wmissing-prototypes] .... In fact, these functions are only used in the file in which they are declared and don't need a declaration, but can be made static. so this patch marks these functions with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-08-29	vfio/pci: Fix typos in comments	Wei Jiangang
	Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-08-08	vfio/pci: Fix NULL pointer oops in error interrupt setup handling	Alex Williamson
	There are multiple cases in vfio_pci_set_ctx_trigger_single() where we assume we can safely read from our data pointer without actually checking whether the user has passed any data via the count field. VFIO_IRQ_SET_DATA_NONE in particular is entirely broken since we attempt to pull an int32_t file descriptor out before even checking the data type. The other data types assume the data pointer contains one element of their type as well. In part this is good news because we were previously restricted from doing much sanitization of parameters because it was missed in the past and we didn't want to break existing users. Clearly DATA_NONE is completely broken, so it must not have any users and we can fix it up completely. For DATA_BOOL and DATA_EVENTFD, we'll just protect ourselves, returning error when count is zero since we previously would have oopsed. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reported-by: Chris Thompson <the_cartographer@hotmail.com> Cc: stable@vger.kernel.org Reviewed-by: Eric Auger <eric.auger@redhat.com>
2016-07-19	vfio: platform: check reset call return code during release	Sinan Kaya
	Release call is ignoring the return code from reset call and can potentially continue even though reset call failed. If reset_required module parameter is set, this patch is going to validate the return code and will cause stack dump with WARN_ON and warn the user of failure. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-07-19	vfio: platform: check reset call return code during open	Sinan Kaya
	Open call is ignoring the return code from reset call and can potentially continue even though reset call failed. If reset_required module parameter is set, this patch is going to validate the return code and will abort open if reset fails. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-07-19	vfio, platform: make reset driver a requirement by default	Sinan Kaya
	The code was allowing platform devices to be used without a supporting VFIO reset driver. The hardware can be left in some inconsistent state after a guest machine abort. The reset driver will put the hardware back to safe state and disable interrupts before returning the control back to the host machine. Adding a new reset_required kernel module option to platform VFIO drivers. The default value is true for the DT and ACPI based drivers. The reset requirement value for AMBA drivers is set to false and is unchangeable to maintain the existing functionality. New requirements are: 1. A reset function needs to be implemented by the corresponding driver via DT/ACPI. 2. The reset function needs to be discovered via DT/ACPI. The probe of the driver will fail if any of the above conditions are not satisfied. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-07-19	vfio: platform: call _RST method when using ACPI	Sinan Kaya
	The device tree code checks for the presence of a reset driver and calls the of_reset function pointer by looking up the reset driver as a module. ACPI defines _RST method to perform device level reset. After the _RST method is executed, the OS can resume using the device. _RST method is expected to stop DMA transfers and IRQs. This patch introduces two functions as vfio_platform_acpi_has_reset and vfio_platform_acpi_call_reset. The has reset method is used to declare reset capability via the ioctl flag VFIO_DEVICE_FLAGS_RESET. The call reset function is used to execute the _RST ACPI method. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-07-19	vfio: platform: add extra debug info argument to call reset	Sinan Kaya
	Getting ready to bring out extra debug information to the caller so that more verbose information can be printed when an error is observed. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-07-19	vfio: platform: add support for ACPI probe	Sinan Kaya
	The code is using the compatible DT string to associate a reset driver with the actual device itself. The compatible string does not exist on ACPI based systems. HID is the unique identifier for a device driver instead. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-07-19	vfio: platform: determine reset capability	Sinan Kaya
	Creating a new function to determine if this driver supports reset function or not. This is an attempt to abstract device tree calls from the rest of the code. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-07-19	vfio: platform: move reset call to a common function	Sinan Kaya
	The reset call sequence seems to replicate itself multiple times across the file. Grouping them together for maintenance reasons. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-07-19	vfio: platform: rename reset function	Sinan Kaya
	Renaming the reset function to of_reset as it is only used by the device tree based platforms. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-07-14	vfio: fix possible use after free of vfio group	Ilya Lesokhin
	The vfio group should be released after the vfio_group_try_dissolve_container call. The code should not rely on someone else to hold a reference on the group. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-07-08	vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive	Yongji Xie
	Current vfio-pci implementation disallows to mmap sub-page(size < PAGE_SIZE) MMIO BARs because these BARs' mmio page may be shared with other BARs. This will cause some performance issues when we passthrough a PCI device with this kind of BARs. Guest will be not able to handle the mmio accesses to the BARs which leads to mmio emulations in host. However, not all sub-page BARs will share page with other BARs. We should allow to mmap the sub-page MMIO BARs which we can make sure will not share page with other BARs. This patch adds support for this case. And we try to add a dummy resource to reserve the remainder of the page which hot-add device's BAR might be assigned into. But it's not necessary to handle the case when the BAR is not page aligned. Because we can't expect the BAR will be assigned into the same location in a page in guest when we passthrough the BAR. And it's hard to access this BAR in userspace because we have no way to get the BAR's location in a page. Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-06-23	vfio: platform: support No-IOMMU mode	Peng Fan
	The vfio No-IOMMU mode was supported by this 'commit 03a76b60f8ba2797 ("vfio: Include No-IOMMU mode")', but it only support vfio-pci. Using vfio_iommu_group_get/put, but not iommu_group_get/put, the platform devices can be exposed to userspace with CONFIG_VFIO_NOIOMMU and the "enable_unsafe_noiommu_mode" option enabled. From 'commit 03a76b60f8ba2797 ("vfio: Include No-IOMMU mode")', "This should make it very clear that this mode is not safe. Additionally, CAP_SYS_RAWIO privileges are necessary to work with groups and containers using this mode. Groups making use of this support are named /dev/vfio/noiommu-$GROUP and can only make use of the special VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically binding a device without a native IOMMU group to a VFIO bus driver will taint the kernel and should therefore not be considered supported." Signed-off-by: Peng Fan <van.freenix@gmail.com> Cc: Eric Auger <eric.auger@linaro.org> Cc: Baptiste Reynal <b.reynal@virtualopensystems.com> Cc: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-05-31	vfio/pci: Allow VPD short read	Alex Williamson
	The size of the VPD area is not necessarily 4-byte aligned, so a pci_vpd_read() might return less than 4 bytes. Zero our buffer and accept anything other than an error. Intel X710 NICs exercise this. Fixes: 4e1a635552d3 ("vfio/pci: Use kernel VPD access functions") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-05-30	vfio/type1: Fix build warning	Alex Williamson
	This function cannot actually be called with npage = 0, so in practice this doesn't return an uninitialized value. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-05-30	vfio/pci: Fix ordering of eventfd vs virqfd shutdown	Alex Williamson
	Both the INTx and MSI/X disable paths do an eventfd_ctx_put() for the trigger eventfd before calling vfio_virqfd_disable() any potential mask and unmask eventfds. This opens a use-after-free race where an inopportune irqfd can reference the freed signalling eventfd. Reorder to avoid this possibility. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>