summaryrefslogtreecommitdiff
path: root/drivers/block/xen-blkfront.c
AgeCommit message (Collapse)Author
2017-02-17Merge branch 'for-4.11/next' into for-4.11/linus-mergeJens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31block: fold cmd_type into the REQ_OP_ spaceChristoph Hellwig
Instead of keeping two levels of indirection for requests types, fold it all into the operations. The little caveat here is that previously cmd_type only applied to struct request, while the request and bio op fields were set to plain REQ_OP_READ/WRITE even for passthrough operations. Instead this patch adds new REQ_OP_* for SCSI passthrough and driver private requests, althought it has to add two for each so that we can communicate the data in/out nature of the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-23xen-blkfront: correct maximum segment accountingJan Beulich
Making use of "max_indirect_segments=" has issues: - blkfront_setup_indirect() may end up with zero psegs when PAGE_SIZE is sufficiently much larger than XEN_PAGE_SIZE - the variable driven by the command line option (xen_blkif_max_segments) has a somewhat different purpose, and hence should namely never end up being zero - as long as the specified value is lower than the legacy default, we better don't use indirect segments at all (or we'd in fact lower throughput) Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2017-01-23xen-blkfront: feature flags handling adjustmentsJan Beulich
Don't truncate the "feature-persistent" value read from xenstore: Any non-zero value is supposed to enable the feature, just like is already being done for feature_secdiscard. Just like the other feature_* fields, feature_flush and feature_fua are boolean flags, and hence fit well into a single bit. Keep all bit fields together to limit gaps. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-12-13Merge tag 'for-linus-4.10-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Xen features and fixes for 4.10 These are some fixes, a move of some arm related headers to share them between arm and arm64 and a series introducing a helper to make code more readable. The most notable change is David stepping down as maintainer of the Xen hypervisor interface. This results in me sending you the pull requests for Xen related code from now on" * tag 'for-linus-4.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (29 commits) xen/balloon: Only mark a page as managed when it is released xenbus: fix deadlock on writes to /proc/xen/xenbus xen/scsifront: don't request a slot on the ring until request is ready xen/x86: Increase xen_e820_map to E820_X_MAX possible entries x86: Make E820_X_MAX unconditionally larger than E820MAX xen/pci: Bubble up error and fix description. xen: xenbus: set error code on failure xen: set error code on failures arm/xen: Use alloc_percpu rather than __alloc_percpu arm/arm64: xen: Move shared architecture headers to include/xen/arm xen/events: use xen_vcpu_id mapping for EVTCHNOP_status xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing xen-scsifront: Add a missing call to kfree MAINTAINERS: update XEN HYPERVISOR INTERFACE xenfs: Use proc_create_mount_point() to create /proc/xen xen-platform: use builtin_pci_driver xen-netback: fix error handling output xen: make use of xenbus_read_unsigned() in xenbus xen: make use of xenbus_read_unsigned() in xen-pciback xen: make use of xenbus_read_unsigned() in xen-fbfront ...
2016-11-07xen: make use of xenbus_read_unsigned() in xen-blkfrontJuergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible. This requires to change the type of some reads from int to unsigned, but these cases have been wrong before: negative values are not allowed for the modified cases. Cc: konrad.wilk@oracle.com Cc: roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: David Vrabel <david.vrabel@citrix.com>
2016-11-02blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()Bart Van Assche
Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls are followed by kicking the requeue list. Hence add an argument to these two functions that allows to kick the requeue list. This was proposed by Christoph Hellwig. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-02blk-mq: Avoid that requeueing starts stopped queuesBart Van Assche
Since blk_mq_requeue_work() starts stopped queues and since execution of this function can be scheduled after a queue has been stopped it is not possible to stop queues without using an additional state variable to track whether or not the queue has been stopped. Hence modify blk_mq_requeue_work() such that it does not start stopped queues. My conclusion after a review of the blk_mq_stop_hw_queues() and blk_mq_{delay_,}kick_requeue_list() callers is as follows: * In the dm driver starting and stopping queues should only happen if __dm_suspend() or __dm_resume() is called and not if the requeue list is processed. * In the SCSI core queue stopping and starting should only be performed by the scsi_internal_device_block() and scsi_internal_device_unblock() functions but not by any other function. Although the blk_mq_stop_hw_queue() call in scsi_queue_rq() may help to reduce CPU load if a LLD queue is full, figuring out whether or not a queue should be restarted when requeueing a command would require to introduce additional locking in scsi_mq_requeue_cmd() to avoid a race with scsi_internal_device_block(). Avoid this complexity by removing the blk_mq_stop_hw_queue() call from scsi_queue_rq(). * In the NVMe core only the functions that call blk_mq_start_stopped_hw_queues() explicitly should start stopped queues. * A blk_mq_start_stopped_hwqueues() call must be added in the xen-blkfront driver in its blkif_recover() function. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: James Bottomley <jejb@linux.vnet.ibm.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-15blk-mq: remove ->map_queueChristoph Hellwig
All drivers use the default, so provide an inline version of it. If we ever need other queue mapping we can add an optional method back, although supporting will also require major changes to the queue setup code. This provides better code generation, and better debugability as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-19xen-blkfront: free resources if xlvbd_alloc_gendisk failsBob Liu
Current code forgets to free resources in the failure path of xlvbd_alloc_gendisk(), this patch fix it. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-08-19xen-blkfront: introduce blkif_set_queue_limits()Bob Liu
blk_mq_update_nr_hw_queues() reset all queue limits to default which it's not as xen-blkfront expected, introducing blkif_set_queue_limits() to reset limits with initial correct values. Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-08-19xen-blkfront: fix places not updated after introducing 64KB page granularityBob Liu
Two places didn't get updated when 64KB page granularity was introduced, this patch fix them. Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-07-27Merge tag 'for-linus-4.8-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: "Features and fixes for 4.8-rc0: - ACPI support for guests on ARM platforms. - Generic steal time support for arm and x86. - Support cases where kernel cpu is not Xen VCPU number (e.g., if in-guest kexec is used). - Use the system workqueue instead of a custom workqueue in various places" * tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (47 commits) xen: add static initialization of steal_clock op to xen_time_ops xen/pvhvm: run xen_vcpu_setup() for the boot CPU xen/evtchn: use xen_vcpu_id mapping xen/events: fifo: use xen_vcpu_id mapping xen/events: use xen_vcpu_id mapping in events_base x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op xen: introduce xen_vcpu_id mapping x86/acpi: store ACPI ids from MADT for future usage x86/xen: update cpuid.h from Xen-4.7 xen/evtchn: add IOCTL_EVTCHN_RESTRICT xen-blkback: really don't leak mode property xen-blkback: constify instance of "struct attribute_group" xen-blkfront: prefer xenbus_scanf() over xenbus_gather() xen-blkback: prefer xenbus_scanf() over xenbus_gather() xen: support runqueue steal time on xen arm/xen: add support for vm_assist hypercall xen: update xen headers xen-pciback: drop superfluous variables xen-pciback: short-circuit read path used for merging write values ...
2016-07-26Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "This branch also contains core changes. I've come to the conclusion that from 4.9 and forward, I'll be doing just a single branch. We often have dependencies between core and drivers, and it's hard to always split them up appropriately without pulling core into drivers when that happens. That said, this contains: - separate secure erase type for the core block layer, from Christoph. - set of discard fixes, from Christoph. - bio shrinking fixes from Christoph, as a followup up to the op/flags change in the core branch. - map and append request fixes from Christoph. - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty exciting! - nvme-loop fixes from Arnd. - removal of ->driverfs_dev from Dan, after providing a device_add_disk() helper. - bcache fixes from Bhaktipriya and Yijing. - cdrom subchannel read fix from Vchannaiah. - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier. - set of drbd updates and fixes from Fabian, Lars, and Philipp. - mg_disk error path fix from Bart. - user notification for failed device add for loop, from Minfei. - NVMe in general: + NVMe delay quirk from Guilherme. + SR-IOV support and command retry limits from Keith. + fix for memory-less NUMA node from Masayoshi. + use UINT_MAX for discard sectors, from Minfei. + cancel IO fixes from Ming. + don't allocate unused major, from Neil. + error code fixup from Dan. + use constants for PSDT/FUSE from James. + variable init fix from Jay. + fabrics fixes from Ming, Sagi, and Wei. + various fixes" * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits) nvme/pci: Provide SR-IOV support nvme: initialize variable before logical OR'ing it block: unexport various bio mapping helpers scsi/osd: open code blk_make_request target: stop using blk_make_request block: simplify and export blk_rq_append_bio block: ensure bios return from blk_get_request are properly initialized virtio_blk: use blk_rq_map_kern memstick: don't allow REQ_TYPE_BLOCK_PC requests block: shrink bio size again block: simplify and cleanup bvec pool handling block: get rid of bio_rw and READA block: don't ignore -EOPNOTSUPP blkdev_issue_write_same block: introduce BLKDEV_DISCARD_ZERO to fix zeroout NVMe: don't allocate unused nvme_major nvme: avoid crashes when node 0 is memoryless node. nvme: Limit command retries loop: Make user notify for adding loop device failed nvme-loop: fix nvme-loop Kconfig dependencies nvmet: fix return value check in nvmet_subsys_alloc() ...
2016-07-26Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...
2016-07-22xen-blkfront: prefer xenbus_scanf() over xenbus_gather()Jan Beulich
... for single items being collected: It is more typesafe (as the compiler can check format string and to-be-written-to variable match) and requires one less parameter to be passed. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-06-29xen-blkfront: save uncompleted reqs in blkfront_resume()Bob Liu
Uncompleted reqs used to be 'saved and resubmitted' in blkfront_recover() during migration, but that's too late after multi-queue was introduced. After a migrate to another host (which may not have multiqueue support), the number of rings (block hardware queues) may be changed and the ring and shadow structure will also be reallocated. The blkfront_recover() then can't 'save and resubmit' the real uncompleted reqs because shadow structure have been reallocated. This patch fixes this issue by moving the 'save' logic out of blkfront_recover() to earlier place in blkfront_resume(). The 'resubmit' is not changed and still in blkfront_recover(). Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable@vger.kernel.org
2016-06-27block: convert to device_add_disk()Dan Williams
For block drivers that specify a parent device, convert them to use device_add_disk(). This conversion was done with the following semantic patch: @@ struct gendisk *disk; expression E; @@ - disk->driverfs_dev = E; ... - add_disk(disk); + device_add_disk(E, disk); @@ struct gendisk *disk; expression E1, E2; @@ - disk->driverfs_dev = E1; ... E2 = disk; ... - add_disk(E2); + device_add_disk(E1, E2); ...plus some manual fixups for a few missed conversions. Cc: Jens Axboe <axboe@fb.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-09block: add a separate operation type for secure eraseChristoph Hellwig
Instead of overloading the discard support with the REQ_SECURE flag. Use the opportunity to rename the queue flag as well, and remove the dead checks for this flag in the RAID 1 and RAID 10 drivers that don't claim support for secure erase. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-08xen-blkfront: fix resume issues after a migrationBob Liu
After a migrate to another host (which may not have multiqueue support), the number of rings (block hardware queues) may be changed and the ring info structure will also be reallocated. This patch fixes two related bugs: * call blk_mq_update_nr_hw_queues() to make blk-core know the number of hardware queues have been changed. * Don't store rinfo pointer to hctx->driver_data, because rinfo may be reallocated so use hctx->queue_num to get the rinfo structure instead. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-06-08xen-blkfront: don't call talk_to_blkback when already connected to blkbackBob Liu
Sometimes blkfront may twice receive blkback_changed() notification (XenbusStateConnected) after migration, which will cause talk_to_blkback() to be called twice too and confuse xen-blkback. The flow is as follow: blkfront blkback blkfront_resume() > talk_to_blkback() > Set blkfront to XenbusStateInitialised front changed() > Connect() > Set blkback to XenbusStateConnected blkback_changed() > Skip talk_to_blkback() because frontstate == XenbusStateInitialised > blkfront_connect() > Set blkfront to XenbusStateConnected ----- And here we get another XenbusStateConnected notification leading to: ----- blkback_changed() > because now frontstate != XenbusStateInitialised talk_to_blkback() is also called again > blkfront state changed from XenbusStateConnected to XenbusStateInitialised (Which is not correct!) front_changed(): > Do nothing because blkback already in XenbusStateConnected Now blkback is in XenbusStateConnected but blkfront is still in XenbusStateInitialised - leading to no disks. Poking of the XenbusStateConnected state is allowed (to deal with block disk change) and has to be dealt with. The most likely cause of this bug are custom udev scripts hooking up the disks and then validating the size. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-06-07block: do not use REQ_FLUSH for tracking flush supportMike Christie
The last patch added a REQ_OP_FLUSH for request_fn drivers and the next patch renames REQ_FLUSH to REQ_PREFLUSH which will be used by file systems and make_request_fn drivers so they can send a write/flush combo. This patch drops xen's use of REQ_FLUSH to track if it supports REQ_OP_FLUSH requests, so REQ_FLUSH can be deleted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Juergen Gross <kernel@pfupf.net> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block, drivers: add REQ_OP_FLUSH operationMike Christie
This adds a REQ_OP_FLUSH operation that is sent to request_fn based drivers by the block layer's flush code, instead of sending requests with the request->cmd_flags REQ_FLUSH bit set. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07drivers: use req op accessorMike Christie
The req operation REQ_OP is separated from the rq_flag_bits definition. This converts the block layer drivers to use req_op to get the op from the request struct. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block/fs/drivers: remove rw argument from submit_bioMike Christie
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-12xen-blkfront: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-03-18Merge branch 'for-4.6/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "This is the block driver pull request for this merge window. It sits on top of for-4.6/core, that was just sent out. This contains: - A set of fixes for lightnvm. One from Alan, fixing an overflow, and the rest from the usual suspects, Javier and Matias. - A set of fixes for nbd from Markus and Dan, and a fixup from Arnd for correct usage of the signed 64-bit divider. - A set of bug fixes for the Micron mtip32xx, from Asai. - A fix for the brd discard handling from Bart. - Update the maintainers entry for cciss, since that hardware has transferred ownership. - Three bug fixes for bcache from Eric Wheeler. - Set of fixes for xen-blk{back,front} from Jan and Konrad. - Removal of the cpqarray driver. It has been disabled in Kconfig since 2013, and we were initially scheduled to remove it in 3.15. - Various updates and fixes for NVMe, with the most important being: - Removal of the per-device NVMe thread, replacing that with a watchdog timer instead. From Christoph. - Exposing the namespace WWID through sysfs, from Keith. - Set of cleanups from Ming Lin. - Logging the controller device name instead of the underlying PCI device name, from Sagi. - And a bunch of fixes and optimizations from the usual suspects in this area" * 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits) NVMe: Expose ns wwid through single sysfs entry drivers:block: cpqarray clean up brd: Fix discard request processing cpqarray: remove it from the kernel cciss: update MAINTAINERS NVMe: Remove unused sq_head read in completion path bcache: fix cache_set_flush() NULL pointer dereference on OOM bcache: cleaned up error handling around register_cache() bcache: fix race of writeback thread starting before complete initialization NVMe: Create discard zero quirk white list nbd: use correct div_s64 helper mtip32xx: remove unneeded variable in mtip_cmd_timeout() lightnvm: generalize rrpc ppa calculations lightnvm: remove struct nvm_dev->total_blocks lightnvm: rename ->nr_pages to ->nr_sects lightnvm: update closed list outside of intr context xen/blback: Fit the important information of the thread in 17 characters lightnvm: fold get bb tbl when using dual/quad plane mode lightnvm: fix up nonsensical configure overrun checking xen-blkback: advertise indirect segment support earlier ...
2016-03-03xen-blkfront: rename indirect descriptor parameterJan Beulich
"max" is rather ambiguous and carries pretty little meaning, the more that there are also "max_queues" and "max_ring_page_order". Make this "max_indirect_segments" instead, and at once change the type from int to uint (to match the respective variable's type). Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-29xen/blkfront: realloc ring info in blkif_resumeBob Liu
Need to reallocate ring info in the resume path, because info->rinfo was freed in blkif_free(). And 'multi-queue-max-queues' backend reports may have been changed. Signed-off-by: Bob Liu <bob.liu@oracle.com> Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: Fix crash if backend doesn't follow the right states.Konrad Rzeszutek Wilk
We have split the setting up of all the resources in two steps: 1) talk_to_blkback - which figures out the num_ring_pages (from the default value of zero), sets up shadow and so 2) blkfront_connect - does the real part of filling out the internal structures. The problem is if we bypass the 1) step and go straight to 2) and call blkfront_setup_indirect where we use the macro BLK_RING_SIZE - which returns an negative value (because sz is zero - since num_ring_pages is zero - since it has never been set). We can fix this by making sure that we always have called talk_to_blkback before going to blkfront_connect. Or we could set in blkfront_probe info->nr_ring_pages = 1 to have a default value. But that looks odd - as we haven't actually negotiated any ring size. This patch changes XenbusStateConnected state to detect if we haven't done the initial handshake - and if so continue on as if were in XenbusStateInitWait state. We also roll the error recovery (freeing the structure) into talk_to_blkback error path - which is safe since that function is only called from blkback_changed. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: Handle non-indirect grant with 64KB pagesJulien Grall
The minimal size of request in the block framework is always PAGE_SIZE. It means that when 64KB guest is support, the request will at least be 64KB. Although, if the backend doesn't support indirect descriptor (such as QDISK in QEMU), a ring request is only able to accommodate 11 segments of 4KB (i.e 44KB). The current frontend is assuming that an I/O request will always fit in a ring request. This is not true any more when using 64KB page granularity and will therefore crash during boot. On ARM64, the ABI is completely neutral to the page granularity used by the domU. The guest has the choice between different page granularity supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB). This can't be enforced by the hypervisor and therefore it's possible to run guests using different page granularity. So we can't mandate the block backend to support indirect descriptor when the frontend is using 64KB page granularity and have to fix it properly in the frontend. The solution exposed below is based on modifying directly the frontend guest rather than asking the block framework to support smaller size (i.e < PAGE_SIZE). This is because the change is the block framework are not trivial as everything seems to relying on a struct *page (see [1]). Although, it may be possible that someone succeed to do it in the future and we would therefore be able to use it. Given that a block request may not fit in a single ring request, a second request is introduced for the data that cannot fit in the first one. This means that the second ring request should never be used on Linux if the page size is smaller than 44KB. To achieve the support of the extra ring request, the block queue size is divided by two. Therefore, the ring will always contain enough space to accommodate 2 ring requests. While this will reduce the overall performance, it will make the implementation more contained. The way forward to get better performance is to implement in the backend either indirect descriptor or multiple grants ring. Note that the parameters blk_queue_max_* helpers haven't been updated. The block code will set the mimimum size supported and we may be able to support directly any change in the block framework that lower down the minimal size of a request. [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen-blkfront: Introduce blkif_ring_get_requestJulien Grall
The code to get a request is always the same. Therefore we can factorize it in a single function. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blocks: Return -EXX instead of -1Konrad Rzeszutek Wilk
Lets return sensible values instead of -1. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: correct setting for xen_blkif_max_ring_orderPeng Fan
According to this piece code: " pr_info("Invalid max_ring_order (%d), will use default max: %d.\n", xen_blkif_max_ring_order, XENBUS_MAX_RING_GRANT_ORDER); " if xen_blkif_max_ring_order is bigger that XENBUS_MAX_RING_GRANT_ORDER, need to set xen_blkif_max_ring_order using XENBUS_MAX_RING_GRANT_ORDER, but not 0. Signed-off-by: Peng Fan <van.freenix@gmail.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: make persistent grants pool per-queueBob Liu
Make persistent grants per-queue/ring instead of per-device, so that we can drop the 'dev_lock' and get better scalability. Test was done based on null_blk driver: dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk" domu: v4.2-rc8 16vcpus 10GB [test] rw=read direct=1 ioengine=libaio bs=4k time_based runtime=30 filename=/dev/xvdb numjobs=16 iodepth=64 iodepth_batch=64 iodepth_batch_complete=64 group_reporting Queues: 1 4 8 16 Iops orig(k): 810 1064 780 700 Iops patched(k): 810 1230(~20%) 1024(~20%) 850(~20%) Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: Remove duplicate setting of ->xbdev.Bob Liu
We do the same exact operations a bit earlier in the function. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors.Konrad Rzeszutek Wilk
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: negotiate number of queues/rings to be used with backendBob Liu
The max number of hardware queues for xen/blkfront is set by parameter 'max_queues'(default 4), while it is also capped by the max value that the xen/blkback exposes through XenStore key 'multi-queue-max-queues'. The negotiated number is the smaller one and would be written back to xenstore as "multi-queue-num-queues", blkback needs to read this negotiated number. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: split per device io_lockBob Liu
After patch "xen/blkfront: separate per ring information out of device info", per-ring data is protected by a per-device lock ('io_lock'). This is not a good way and will effect the scalability, so introduce a per-ring lock ('ring_lock'). The old 'io_lock' is renamed to 'dev_lock' which protects the ->grants list and ->persistent_gnts_c which are shared by all rings. Note that in 'blkfront_probe' the 'blkfront_info' is setup via kzalloc so setting ->persistent_gnts_c to zero is not needed. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: pseudo support for multi hardware queues/ringsBob Liu
Preparatory patch for multiple hardware queues (rings). The number of rings is unconditionally set to 1, larger number will be enabled in patch "xen/blkfront: negotiate number of queues/rings to be used with backend" so as to make review easier. Note that blkfront_gather_backend_features does not call blkfront_setup_indirect anymore (as that needs to be done per ring). That means that in blkif_recover/blkif_connect we have to do it in a loop (bounded by nr_rings). Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: separate per ring information out of device infoBob Liu
Split per ring information to a new structure "blkfront_ring_info". A ring is the representation of a hardware queue, every vbd device can associate with one or more rings depending on how many hardware queues/rings to be used. This patch is a preparation for supporting real multi hardware queues/rings. We also add a backpointer to 'struct blkfront_info' (dev_info) which is not needed (we could use containers_of) but further patch ("xen/blkfront: pseudo support for multi hardware queues/rings") will make allocation of 'blkfront_ring_info' dynamic. Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-11-04Merge tag 'for-linus-4.4-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: - Improve balloon driver memory hotplug placement. - Use unpopulated hotplugged memory for foreign pages (if supported/enabled). - Support 64 KiB guest pages on arm64. - CPU hotplug support on arm/arm64. * tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (44 commits) xen: fix the check of e_pfn in xen_find_pfn_range x86/xen: add reschedule point when mapping foreign GFNs xen/arm: don't try to re-register vcpu_info on cpu_hotplug. xen, cpu_hotplug: call device_offline instead of cpu_down xen/arm: Enable cpu_hotplug.c xenbus: Support multiple grants ring with 64KB xen/grant-table: Add an helper to iterate over a specific number of grants xen/xenbus: Rename *RING_PAGE* to *RING_GRANT* xen/arm: correct comment in enlighten.c xen/gntdev: use types from linux/types.h in userspace headers xen/gntalloc: use types from linux/types.h in userspace headers xen/balloon: Use the correct sizeof when declaring frame_list xen/swiotlb: Add support for 64KB page granularity xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlb arm/xen: Add support for 64KB page granularity xen/privcmd: Add support for Linux 64KB page granularity net/xen-netback: Make it running on 64KB page granularity net/xen-netfront: Make it running on 64KB page granularity block/xen-blkback: Make it running on 64KB page granularity block/xen-blkfront: Make it running on 64KB page granularity ...
2015-10-23xen/xenbus: Rename *RING_PAGE* to *RING_GRANT*Julien Grall
Linux may use a different page size than the size of grant. So make clear that the order is actually in number of grant. Signed-off-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23block/xen-blkfront: Make it running on 64KB page granularityJulien Grall
The PV block protocol is using 4KB page granularity. The goal of this patch is to allow a Linux using 64KB page granularity using block device on a non-modified Xen. The block API is using segment which should at least be the size of a Linux page. Therefore, the driver will have to break the page in chunk of 4K before giving the page to the backend. When breaking a 64KB segment in 4KB chunks, it is possible that some chunks are empty. As the PV protocol always require to have data in the chunk, we have to count the number of Xen page which will be in use and avoid sending empty chunks. Note that, a pre-defined number of grants are reserved before preparing the request. This pre-defined number is based on the number and the maximum size of the segments. If each segment contains a very small amount of data, the driver may reserve too many grants (16 grants is reserved per segment with 64KB page granularity). Furthermore, in the case of persistent grants we allocate one Linux page per grant although only the first 4KB of the page will be effectively in use. This could be improved by sharing the page with multiple grants. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23block/xen-blkfront: split get_grant in 2Julien Grall
Prepare the code to support 64KB page granularity. The first implementation will use a full Linux page per indirect and persistent grant. When non-persistent grant is used, each page of a bio request may be split in multiple grant. Furthermore, the field page of the grant structure is only used to copy data from persistent grant or indirect grant. Avoid to set it for other use case as it will have no meaning given the page will be split in multiple grant. Provide 2 functions, to setup indirect grant, the other for bio page. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23block/xen-blkfront: Store a page rather a pfn in the grant structureJulien Grall
All the usage of the field pfn are done using the same idiom: pfn_to_page(grant->pfn) This will return always the same page. Store directly the page in the grant to clean up the code. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23block/xen-blkfront: Split blkif_queue_request in 2Julien Grall
Currently, blkif_queue_request has 2 distinct execution path: - Send a discard request - Send a read/write request The function is also allocating grants to use for generating the request. Although, this is only used for read/write request. Rather than having a function with 2 distinct execution path, separate the function in 2. This will also remove one level of tabulation. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-07Merge branch 'stable/for-jens-4.3' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus Konrad writes: Please git pull an update branch to your 'for-4.3/drivers' branch (which oddly I don't see does not have the previous pull?) git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-4.3 which has two fixes - one where we use the Xen blockfront EFI driver and don't release all the requests, the other if the allocation of resources for a particular state failed - we would go back 'Closing' and assume that an structure would be allocated while in fact it may not be - and crash.
2015-10-07xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing)Cathy Avery
xen-blkfront will crash if the check to talk_to_blkback() in blkback_changed()(XenbusStateInitWait) returns an error. The driver data is freed and info is set to NULL. Later during the close process via talk_to_blkback's call to xenbus_dev_fatal() the null pointer is passed to and dereference in blkfront_closing. CC: stable@vger.kernel.org Signed-off-by: Cathy Avery <cathy.avery@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-10-01blk-mq: fix racy updates of rq->errorsChristoph Hellwig
blk_mq_complete_request may be a no-op if the request has already been completed by others means (e.g. a timeout or cancellation), but currently drivers have to set rq->errors before calling blk_mq_complete_request, which might leave us with the wrong error value. Add an error parameter to blk_mq_complete_request so that we can defer setting rq->errors until we known we won the race to complete the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>