summaryrefslogtreecommitdiff
path: root/drivers/virtio/virtio_ring.c
AgeCommit message (Collapse)Author
2019-11-19virtio_ring: fix return code on DMA mapping failsHalil Pasic
Commit 780bc7903a32 ("virtio_ring: Support DMA APIs") makes virtqueue_add() return -EIO when we fail to map our I/O buffers. This is a very realistic scenario for guests with encrypted memory, as swiotlb may run out of space, depending on it's size and the I/O load. The virtio-blk driver interprets -EIO form virtqueue_add() as an IO error, despite the fact that swiotlb full is in absence of bugs a recoverable condition. Let us change the return code to -ENOMEM, and make the block layer recover form these failures when virtio-blk encounters the condition described above. Cc: stable@vger.kernel.org Fixes: 780bc7903a32 ("virtio_ring: Support DMA APIs") Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Michael Mueller <mimu@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-10-28virtio_ring: fix stalls for packed ringsMarvin Liu
When VIRTIO_F_RING_EVENT_IDX is negotiated, virtio devices can use virtqueue_enable_cb_delayed_packed to reduce the number of device interrupts. At the moment, this is the case for virtio-net when the napi_tx module parameter is set to false. In this case, the virtio driver selects an event offset and expects that the device will send a notification when rolling over the event offset in the ring. However, if this roll-over happens before the event suppression structure update, the notification won't be sent. To address this race condition the driver needs to check wether the device rolled over the offset after updating the event suppression structure. With VIRTIO_F_RING_PACKED, the virtio driver did this by reading the flags field of the descriptor at the specified offset. Unfortunately, checking at the event offset isn't reliable: if descriptors are chained (e.g. when INDIRECT is off) not all descriptors are overwritten by the device, so it's possible that the device skipped the specific descriptor driver is checking when writing out used descriptors. If this happens, the driver won't detect the race condition and will incorrectly expect the device to send a notification. For virtio-net, the result will be a TX queue stall, with the transmission getting blocked forever. With the packed ring, it isn't easy to find a location which is guaranteed to change upon the roll-over, except the next device descriptor, as described in the spec: Writes of device and driver descriptors can generally be reordered, but each side (driver and device) are only required to poll (or test) a single location in memory: the next device descriptor after the one they processed previously, in circular order. while this might be sub-optimal, let's do exactly this for now. Cc: stable@vger.kernel.org Cc: Jason Wang <jasowang@redhat.com> Fixes: f51f982682e2a ("virtio_ring: leverage event idx in packed ring") Signed-off-by: Marvin Liu <yong.liu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-09-09virtio_ring: fix unmap of indirect descriptorsMatthias Lange
The function virtqueue_add_split() DMA-maps the scatterlist buffers. In case a mapping error occurs the already mapped buffers must be unmapped. This happens by jumping to the 'unmap_release' label. In case of indirect descriptors the release is wrong and may leak kernel memory. Because the implementation assumes that the head descriptor is already mapped it starts iterating over the descriptor list starting from the head descriptor. However for indirect descriptors the head descriptor is never mapped in case of an error. The fix is to initialize the start index with zero in case of indirect descriptors and use the 'desc' pointer directly for iterating over the descriptor chain. Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 102Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin st fifth floor boston ma 02110 1301 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 50 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091649.499889647@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-12virtio/virtio_ring: do some comment fixesJiang Biao
There are lots of mismatches between comments and codes, this patch do these comment fixes. Signed-off-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-05-12virtio_ring: Fix potential mem leak in virtqueue_add_indirect_packedYueHaibing
'desc' should be freed before leaving from err handing path. Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> stable@vger.kernel.org
2019-04-08virtio: Honour 'may_reduce_num' in vring_create_virtqueueCornelia Huck
vring_create_virtqueue() allows the caller to specify via the may_reduce_num parameter whether the vring code is allowed to allocate a smaller ring than specified. However, the split ring allocation code tries to allocate a smaller ring on allocation failure regardless of what the caller specified. This may cause trouble for e.g. virtio-pci in legacy mode, which does not support ring resizing. (The packed ring code does not resize in any case.) Let's fix this by bailing out immediately in the split ring code if the requested size cannot be allocated and may_reduce_num has not been specified. While at it, fix a typo in the usage instructions. Fixes: 2a2d1382fe9d ("virtio: Add improved queue allocation API") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2019-03-06virtio: Introduce virtio_max_dma_size()Joerg Roedel
This function returns the maximum segment size for a single dma transaction of a virtio device. The possible limit comes from the SWIOTLB implementation in the Linux kernel, that has an upper limit of (currently) 256kb of contiguous memory it can map. Other DMA-API implementations might also have limits. Use the new dma_max_mapping_size() function to determine the maximum mapping size when DMA-API is in use for virtio. Cc: stable@vger.kernel.org Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-05virtio: drop internal struct from UAPIMichael S. Tsirkin
There's no reason to expose struct vring_packed in UAPI - if we do we won't be able to change or drop it, and it's not part of any interface. Let's move it to virtio_ring.c Cc: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-24virtio: support VIRTIO_F_ORDER_PLATFORMTiwei Bie
This patch introduces the support for VIRTIO_F_ORDER_PLATFORM. If this feature is negotiated, the driver must use the barriers suitable for hardware devices. Otherwise, the device and driver are assumed to be implemented in software, that is they can be assumed to run on identical CPUs in an SMP configuration. Thus a weaker form of memory barriers is sufficient to yield better performance. It is recommended that an add-in card based PCI device offers this feature for portability. The device will fail to operate further or will operate in a slower emulation mode if this feature is offered but not accepted. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-26virtio_ring: advertize packed ring layoutTiwei Bie
Advertize the packed ring layout support. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: leverage event idx in packed ringTiwei Bie
Leverage the EVENT_IDX feature in packed ring to suppress events when it's available. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: introduce packed ring supportTiwei Bie
Introduce the packed ring support. Packed ring can only be created by vring_create_virtqueue() and each chunk of packed ring will be allocated individually. Packed ring can not be created on preallocated memory by vring_new_virtqueue() or the likes currently. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: cache whether we will use DMA APITiwei Bie
Cache whether we will use DMA API, instead of doing the check every time. We are going to check whether DMA API is used more often in packed ring. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: extract split ring handling from ring creationTiwei Bie
Introduce a specific function to create the split ring. And also move the DMA allocation and size information to the .split sub-structure. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: allocate desc state for split ring separatelyTiwei Bie
Put the split ring's desc state into the .split sub-structure, and allocate desc state for split ring separately, this makes the code more readable and more consistent with what we will do for packed ring. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: introduce helper for indirect featureTiwei Bie
Introduce a helper to check whether we will use indirect feature. It will be used by packed ring too. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: introduce debug helpersTiwei Bie
Introduce debug helpers for last_add_time update, check and invalid. They will be used by packed ring too. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: put split ring fields in a sub structTiwei Bie
Put the split ring specific fields in a sub-struct named as "split" to avoid misuse after introducing packed ring. There is no functional change. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: put split ring functions togetherTiwei Bie
Put the xxx_split() functions together to make the code more readable and avoid misuse after introducing the packed ring. There is no functional change. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: add _split suffix for split ring functionsTiwei Bie
Add _split suffix for split ring specific functions. This is a preparation for introducing the packed ring support. There is no functional change. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12treewide: kmalloc() -> kmalloc_array()Kees Cook
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-04-05headers: untangle kmemleak.h from mm.hRandy Dunlap
Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious reason. It looks like it's only a convenience, so remove kmemleak.h from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that don't already #include it. Also remove <linux/kmemleak.h> from source files that do not use it. This is tested on i386 allmodconfig and x86_64 allmodconfig. It would be good to run it through the 0day bot for other $ARCHes. I have neither the horsepower nor the storage space for the other $ARCHes. Update: This patch has been extensively build-tested by both the 0day bot & kisskb/ozlabs build farms. Both of them reported 2 build failures for which patches are included here (in v2). [ slab.h is the second most used header file after module.h; kernel.h is right there with slab.h. There could be some minor error in the counting due to some #includes having comments after them and I didn't combine all of those. ] [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr] Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org Link: http://kisskb.ellerman.id.au/kisskb/head/13396/ Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> [2 build failures] Reported-by: Fengguang Wu <fengguang.wu@intel.com> [2 build failures] Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-01virtio_ring: fix num_free handling in error caseTiwei Bie
The vq->vq.num_free hasn't been changed when error happens, so it shouldn't be changed when handling the error. Fixes: 780bc7903a32 ("virtio_ring: Support DMA APIs") Cc: Andy Lutomirski <luto@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-07Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas, megaraid_sas, zfcp and a host of minor updates. The major driver change here is the elimination of the block based cciss driver in favour of the SCSI based hpsa driver (which now drives all the legacy cases cciss used to be required for). Plus a reset handler clean up and the redo of the SAS SMP handler to use bsg lib" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits) scsi: scsi-mq: Always unprepare before requeuing a request scsi: Show .retries and .jiffies_at_alloc in debugfs scsi: Improve requeuing behavior scsi: Call scsi_initialize_rq() for filesystem requests scsi: qla2xxx: Reset the logo flag, after target re-login. scsi: qla2xxx: Fix slow mem alloc behind lock scsi: qla2xxx: Clear fc4f_nvme flag scsi: qla2xxx: add missing includes for qla_isr scsi: qla2xxx: Fix an integer overflow in sysfs code scsi: aacraid: report -ENOMEM to upper layer from aac_convert_sgraw2() scsi: aacraid: get rid of one level of indentation scsi: aacraid: fix indentation errors scsi: storvsc: fix memory leak on ring buffer busy scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough scsi: smartpqi: remove the smp_handler stub scsi: hpsa: remove the smp_handler stub scsi: bsg-lib: pass the release callback through bsg_setup_queue scsi: Rework handling of scsi_device.vpd_pg8[03] scsi: Rework the code for caching Vital Product Data (VPD) scsi: rcu: Introduce rcu_swap_protected() ...
2017-08-24scsi: virtio: Reduce BUG if total_sg > virtqueue size to WARN.Richard W.M. Jones
If using indirect descriptors, you can make the total_sg as large as you want. If not, BUG is too serious because the function later returns -ENOSPC. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-24virtio_ring: allow to store zero as the ctxJason Wang
Allow zero to be store as a ctx, with this we could store e.g zero value which could be meaningful for the case of storing headroom through ctx. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-02virtio: allow extra context per descriptorMichael S. Tsirkin
Allow extra context per descriptor. To avoid slow down for data path, this disables use of indirect descriptors for this vq. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-02virtio: add context flag to find vqsMichael S. Tsirkin
Allows maintaining extra context per vq. For ease of use, passing in NULL is legal and disables the feature for all vqs. Includes fixes by Christian for s390, acked by Cornelia. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-03Revert "vring: Force use of DMA API for ARM-based systems with legacy devices"Michael S. Tsirkin
This reverts commit c7070619f3408d9a0dffbed9149e6f00479cf43b. This has been shown to regress on some ARM systems: by forcing on DMA API usage for ARM systems, we have inadvertently kicked open a hornets' nest in terms of cache-coherency. Namely that unless the virtio device is explicitly described as capable of coherent DMA by firmware, the DMA APIs on ARM and other DT-based platforms will assume it is non-coherent. This turns out to cause a big problem for the likes of QEMU and kvmtool, which generate virtio-mmio devices in their guest DTs but neglect to add the often-overlooked "dma-coherent" property; as a result, we end up with the guest making non-cacheable accesses to the vring, the host doing so cacheably, both talking past each other and things going horribly wrong. We are working on a safer work-around. Fixes: c7070619f340 ("vring: Force use of DMA API for ARM-based systems with legacy devices") Reported-by: Robin Murphy <robin.murphy@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-25vring: Force use of DMA API for ARM-based systems with legacy devicesWill Deacon
Booting Linux on an ARM fastmodel containing an SMMU emulation results in an unexpected I/O page fault from the legacy virtio-blk PCI device: [ 1.211721] arm-smmu-v3 2b400000.smmu: event 0x10 received: [ 1.211800] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010 [ 1.211880] arm-smmu-v3 2b400000.smmu: 0x0000020800000000 [ 1.211959] arm-smmu-v3 2b400000.smmu: 0x00000008fa081002 [ 1.212075] arm-smmu-v3 2b400000.smmu: 0x0000000000000000 [ 1.212155] arm-smmu-v3 2b400000.smmu: event 0x10 received: [ 1.212234] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010 [ 1.212314] arm-smmu-v3 2b400000.smmu: 0x0000020800000000 [ 1.212394] arm-smmu-v3 2b400000.smmu: 0x00000008fa081000 [ 1.212471] arm-smmu-v3 2b400000.smmu: 0x0000000000000000 <system hangs failing to read partition table> This is because the legacy virtio-blk device is behind an SMMU, so we have consequently swizzled its DMA ops and configured the SMMU to translate accesses. This then requires the vring code to use the DMA API to establish translations, otherwise all transactions will result in fatal faults and termination. Given that ARM-based systems only see an SMMU if one is really present (the topology is all described by firmware tables such as device-tree or IORT), then we can safely use the DMA API for all legacy virtio devices. Modern devices can advertise the prescense of an IOMMU using the VIRTIO_F_IOMMU_PLATFORM feature flag. Cc: Andy Lutomirski <luto@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: <stable@vger.kernel.org> Fixes: 876945dbf649 ("arm64: Hook up IOMMU dma_ops") Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2016-12-16virtio_ring: fix description of virtqueue_get_bufFelipe Franciosi
The device (not the driver) populates the used ring and includes the len of how much data was written. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15virtio_ring: fix complaint by sparseGonglei
# make C=2 CF="-D__CHECK_ENDIAN__" ./drivers/virtio/ drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types) drivers/virtio/virtio_ring.c:423:19: expected unsigned int [unsigned] [assigned] i drivers/virtio/virtio_ring.c:423:19: got restricted __virtio16 [usertype] next drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types) drivers/virtio/virtio_ring.c:423:19: expected unsigned int [unsigned] [assigned] i drivers/virtio/virtio_ring.c:423:19: got restricted __virtio16 [usertype] next drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types) drivers/virtio/virtio_ring.c:423:19: expected unsigned int [unsigned] [assigned] i drivers/virtio/virtio_ring.c:423:19: got restricted __virtio16 [usertype] next drivers/virtio/virtio_ring.c:604:39: warning: incorrect type in initializer (different base types) drivers/virtio/virtio_ring.c:604:39: expected unsigned short [unsigned] [usertype] nextflag drivers/virtio/virtio_ring.c:604:39: got restricted __virtio16 drivers/virtio/virtio_ring.c:612:33: warning: restricted __virtio16 degrades to integer Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-10-31virtio_ring: mark vring_dma_dev inlineMichael S. Tsirkin
This inline function is unused on configurations where dma_map/unmap are empty macros. Make the function inline to avoid gcc errors because of an unused static function. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-10-31virtio_ring: Make interrupt suppression spec compliantLadi Prosek
According to the spec, if the VIRTIO_RING_F_EVENT_IDX feature bit is negotiated the driver MUST set flags to 0. Not dirtying the available ring in virtqueue_disable_cb also has a minor positive performance impact, improving L1 dcache load missed by ~0.5% in vring_bench. Writes to the used event field (vring_used_event) are still unconditional. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: <stable@vger.kernel.org> # f277ec4 virtio_ring: shadow available Cc: <stable@vger.kernel.org> Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09virtio: mark vring_dma_dev() staticBaoyou Xie
We get 1 warning when building kernel with W=1: drivers/virtio/virtio_ring.c:170:16: warning: no previous prototype for 'vring_dma_dev' [-Wmissing-prototypes] In fact, this function is only used in the file in which it is declared and don't need a declaration, but can be made static. so this patch marks this function with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-09virtio: fix error handling for debug buildsMichael S. Tsirkin
On error, virtqueue_add calls START_USE but not END_USE. Thankfully that's normally empty anyway, but might not be when debugging. Fix it up. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-09virtio: fix memory leak in virtqueue_add()Wei Yongjun
When using the indirect buffers feature, 'desc' is allocated in virtqueue_add() but isn't freed before leaving on a ring full error, causing a memory leak. For example, it seems rather clear that this can trigger with virtio net if mergeable buffers are not used. Cc: stable@vger.kernel.org Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-01virtio: new feature to detect IOMMU device quirkMichael S. Tsirkin
The interaction between virtio and IOMMUs is messy. On most systems with virtio, physical addresses match bus addresses, and it doesn't particularly matter which one we use to program the device. On some systems, including Xen and any system with a physical device that speaks virtio behind a physical IOMMU, we must program the IOMMU for virtio DMA to work at all. On other systems, including SPARC and PPC64, virtio-pci devices are enumerated as though they are behind an IOMMU, but the virtio host ignores the IOMMU, so we must either pretend that the IOMMU isn't there or somehow map everything as the identity. Add a feature bit to detect that quirk: VIRTIO_F_IOMMU_PLATFORM. Any device with this feature bit set to 0 needs a quirk and has to be passed physical addresses (as opposed to bus addresses) even though the device is behind an IOMMU. Note: it has to be a per-device quirk because for example, there could be a mix of passed-through and virtual virtio devices. As another example, some devices could be implemented by an out of process hypervisor backend (in case of qemu vhost, or vhost-user) and so support for an IOMMU needs to be coded up separately. It would be cleanest to handle this in IOMMU core code, but that needs per-device DMA ops. While we are waiting for that to be implemented, use a work-around in virtio core. Note: a "noiommu" feature is a quirk - add a wrapper to make that clear. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-05-01virtio: Silence uninitialized variable warningDan Carpenter
Smatch complains that we might not initialize "queue". The issue is callers like setup_vq() from virtio_pci_modern.c where "num" could be something like 2 and "vring_align" is 64. In that case, vring_size() is less than PAGE_SIZE. It won't happen in real life, but we're getting the value of "num" from a register so it's not really possible to tell what value it holds with static analysis. Let's just silence the warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02vring: Use the DMA API on XenAndy Lutomirski
Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
2016-03-02virtio: Add improved queue allocation APIAndy Lutomirski
This leaves vring_new_virtqueue alone for compatbility, but it adds two new improved APIs: vring_create_virtqueue: Creates a virtqueue backed by automatically allocated coherent memory. (Some day it this could be extended to support non-coherent memory, too, if there ends up being a platform on which it's worthwhile.) __vring_new_virtqueue: Creates a virtqueue with a manually-specified layout. This should allow mic_virtio to work much more cleanly. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02virtio_ring: Support DMA APIsAndy Lutomirski
virtio_ring currently sends the device (usually a hypervisor) physical addresses of its I/O buffers. This is okay when DMA addresses and physical addresses are the same thing, but this isn't always the case. For example, this never works on Xen guests, and it is likely to fail if a physical "virtio" device ever ends up behind an IOMMU or swiotlb. The immediate use case for me is to enable virtio on Xen guests. For that to work, we need vring to support DMA address translation as well as a corresponding change to virtio_pci or to another driver. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02vring: Introduce vring_use_dma_api()Andy Lutomirski
This is a kludge, but no one has come up with a a better idea yet. We'll introduce DMA API support guarded by vring_use_dma_api(). Eventually we may be able to return true on more and more systems, and hopefully we can get rid of vring_use_dma_api() entirely some day. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-01-12virtio_ring: use virt_store_mbMichael S. Tsirkin
We need a full barrier after writing out event index, using virt_store_mb there seems better than open-coding. As usual, we need a wrapper to account for strong barriers. It's tempting to use this in vhost as well, for that, we'll need a variant of smp_store_mb that works on __user pointers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2015-12-07virtio_ring: shadow available ring flags & indexVenkatesh Srinivas
Improves cacheline transfer flow of available ring header. Virtqueues are implemented as a pair of rings, one producer->consumer avail ring and one consumer->producer used ring; preceding the avail ring in memory are two contiguous u16 fields -- avail->flags and avail->idx. A producer posts work by writing to avail->idx and a consumer reads avail->idx. The flags and idx fields only need to be written by a producer CPU and only read by a consumer CPU; when the producer and consumer are running on different CPUs and the virtio_ring code is structured to only have source writes/sink reads, we can continuously transfer the avail header cacheline between 'M' states between cores. This flow optimizes core -> core bandwidth on certain CPUs. (see: "Software Optimization Guide for AMD Family 15h Processors", Section 11.6; similar language appears in the 10h guide and should apply to CPUs w/ exclusive caches, using LLC as a transfer cache) Unfortunately the existing virtio_ring code issued reads to the avail->idx and read-modify-writes to avail->flags on the producer. This change shadows the flags and index fields in producer memory; the vring code now reads from the shadows and only ever writes to avail->flags and avail->idx, allowing the cacheline to transfer core -> core optimally. In a concurrent version of vring_bench, the time required for 10,000,000 buffer checkout/returns was reduced by ~2% (average across many runs) on an AMD Piledriver (15h) CPU: (w/o shadowing): Performance counter stats for './vring_bench': 5,451,082,016 L1-dcache-loads ... 2.221477739 seconds time elapsed (w/ shadowing): Performance counter stats for './vring_bench': 5,405,701,361 L1-dcache-loads ... 2.168405376 seconds time elapsed The further away (in a NUMA sense) virtio producers and consumers are from each other, the more we expect to benefit. Physical implementations of virtio devices and implementations of virtio where the consumer polls vring avail indexes (vhost) should also benefit. Signed-off-by: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-07virtio: Do not drop __GFP_HIGH in alloc_indirectMichal Hocko
b92b1b89a33c ("virtio: force vring descriptors to be allocated from lowmem") tried to exclude highmem pages for descriptors so it cleared __GFP_HIGHMEM from a given gfp mask. The patch also cleared __GFP_HIGH which doesn't make much sense for this fix because __GFP_HIGH only controls access to memory reserves and it doesn't have any influence on the zone selection. Some of the call paths use GFP_ATOMIC and dropping __GFP_HIGH will reduce their changes for success because the lack of access to memory reserves. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
2015-02-11virtio: Avoid possible kernel panic if DEBUG is enabled.Tetsuo Handa
The virtqueue_add() calls START_USE() upon entry. The virtqueue_kick() is called if vq->num_added == (1 << 16) - 1 before calling END_USE(). The virtqueue_kick_prepare() called via virtqueue_kick() calls START_USE() upon entry, and will call panic() if DEBUG is enabled. Move this virtqueue_kick() call to after END_USE() call. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21virtio_ring: coding style fixMichael S. Tsirkin
Most of our code has struct foo { } Fix one instances where ring is inconsistent. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-12-09virtio: make VIRTIO_F_VERSION_1 a transport bitMichael S. Tsirkin
Activate VIRTIO_F_VERSION_1 automatically unless legacy_only is set. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>