Age | Commit message | Author
2017-01-31virtio_blk: remove struct request backpointer from virtblk_reqChristoph Hellwig
We can simply use blk_mq_rq_from_pdu to get back at the request at I/O completion time. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
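The backpointer is redundant because the driver-private data sits directly behind the request in the same allocation, so the request address can be recovered by pointer arithmetic. The following is a minimal userspace sketch of that layout, not the kernel code: `mock_request`, `mock_pdu`, and the `mock_*` helpers are hypothetical stand-ins for `struct request`, `struct virtblk_req`, and the blk-mq helpers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the block layer's struct request. */
struct mock_request {
	int tag;
};

/* Hypothetical driver-private data (PDU) stored right behind the request. */
struct mock_pdu {
	unsigned char status;
};

/* Allocate a request with cmd_size extra bytes behind it for the PDU. */
static struct mock_request *mock_alloc_request(size_t cmd_size)
{
	return calloc(1, sizeof(struct mock_request) + cmd_size);
}

/* The PDU starts immediately after the request structure. */
static void *mock_rq_to_pdu(struct mock_request *rq)
{
	return rq + 1;
}

/* Step back over the request structure to recover the request pointer. */
static struct mock_request *mock_rq_from_pdu(void *pdu)
{
	return (struct mock_request *)((char *)pdu - sizeof(struct mock_request));
}
```

With this layout a completion handler that is handed only the PDU can still reach the request, so the PDU needs no stored pointer back to it.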
2017-01-31block: make scsi_request and scsi ioctl support optionalChristoph Hellwig
We only need this code to support scsi, ide, cciss and virtio. And at least for virtio it's a deprecated feature to start with. This should shrink the kernel size a bit for embedded devices that only use, say, eMMC. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31skd: implement trivial scsi ioctls directlyChristoph Hellwig
This way there is no need to drag in a dependency on the BLOCK_PC code, which is going to become optional. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-30nvme/scsi: don't rely on BLK_MAX_CDBChristoph Hellwig
The NVMe SCSI emulation doesn't use BLOCK_PC requests, so BLK_MAX_CDB doesn't have a meaning for it. Instead opencode the value of 16 and refactor the code a bit so that related checks are next to each other and we only need to use the value in one place. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-29nvme: fix compilation of scsi componentJens Axboe
Since we moved the cdb parts and define out of the block proper, we need to include scsi/scsi_request.h for the nvme scsi layer. Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: don't assign cmd_flags in __blk_rq_prep_cloneChristoph Hellwig
These days we have the proper flags set since request allocation time. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: split scsi_request out of struct requestChristoph Hellwig
And require all drivers that want to support BLOCK_PC to allocate it as the first thing of their private data. To support this the legacy IDE and BSG code is switched to set cmd_size on their queues to let the block layer allocate the additional space. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
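Requiring the SCSI request to be the first thing in the private data means a pointer to the private data is also a pointer to the SCSI request, so generic code can reach it without knowing the driver's type. A minimal sketch of that convention follows; `mock_scsi_request`, `mock_driver_cmd`, and `mock_scsi_req` are hypothetical names, not the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the passthrough command data. */
struct mock_scsi_request {
	unsigned char cmd[16];
	unsigned short cmd_len;
};

/* Hypothetical driver-private data: the SCSI request must come first. */
struct mock_driver_cmd {
	struct mock_scsi_request sreq;
	int driver_state;
};

/* Enforce the "first member" rule at compile time. */
_Static_assert(offsetof(struct mock_driver_cmd, sreq) == 0,
	       "scsi request must be the first member of the private data");

/* Generic code only sees an opaque PDU pointer and reinterprets it. */
static struct mock_scsi_request *mock_scsi_req(void *pdu)
{
	return pdu;
}
```

The compile-time check is an addition for illustration; it documents the layout rule that the commit message states in prose.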
2017-01-27block/bsg: move queue creation into bsg_setup_queueChristoph Hellwig
Simplify the boilerplate code needed for bsg nodes a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27scsi: allocate scsi_cmnd structures as part of struct requestChristoph Hellwig
Rely on the new block layer functionality to allocate additional driver specific data behind struct request instead of implementing it in SCSI itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27scsi: remove __scsi_alloc_queueChristoph Hellwig
Instead do an internal export of __scsi_init_queue for the transport classes that export BSG nodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27scsi: remove scsi_cmd_dma_poolChristoph Hellwig
There is no need for GFP_DMA allocations of the scsi_cmnd structures themselves, all that might be DMAed to or from is the actual payload, or the sense buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27scsi: respect unchecked_isa_dma for blk-mqChristoph Hellwig
Currently blk-mq always allocates the sense buffer using normal GFP_KERNEL allocation. Refactor the cmd pool code to split the cmd and sense allocation and share the code to allocate the sense buffers as well as the sense buffer slab caches between the legacy and blk-mq path. Note that this switches to lazy allocation of the sense slab caches - the slab caches (not the actual allocations) won't be destroyed until the scsi module is unloaded instead of keeping track of hosts using them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27scsi: remove gfp_flags member in scsi_host_cmd_poolChristoph Hellwig
When using the slab allocator we already decide at cache creation time if an allocation comes from a GFP_DMA pool using the SLAB_CACHE_DMA flag, and there is no point passing the kmalloc-family only GFP_DMA flag to kmem_cache_alloc. Drop all the infrastructure for doing so. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27scsi_dh_hp_sw: switch to scsi_execute_req_flags()Hannes Reinecke
Switch to scsi_execute_req_flags() instead of using the block interface directly. This will set REQ_QUIET and REQ_PREEMPT, but this is okay as we're evaluating the errors anyway and should be able to send the command even if the device is quiesced. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27scsi_dh_emc: switch to scsi_execute_req_flags()Hannes Reinecke
Switch to scsi_execute_req_flags() and scsi_get_vpd_page() instead of open-coding it. Using scsi_execute_req_flags() will set REQ_QUIET and REQ_PREEMPT, but this is okay as we're evaluating the errors anyway and should be able to send the command even if the device is quiesced. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27scsi_dh_rdac: switch to scsi_execute_req_flags()Hannes Reinecke
Switch to scsi_execute_req_flags() and scsi_get_vpd_page() instead of open-coding it. Using scsi_execute_req_flags() will set REQ_QUIET and REQ_PREEMPT, but this is okay as we're evaluating the errors anyway and should be able to send the command even if the device is quiesced. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27dm: always defer request allocation to the owner of the request_queueChristoph Hellwig
DM already calls blk_mq_alloc_request on the request_queue of the underlying device if it is a blk-mq device. But now that we allow drivers to allocate additional data and initialize it ahead of time we need to do the same for all drivers. Doing so and using the new cmd_size infrastructure in the block layer greatly simplifies the dm-rq and mpath code, and should also make arbitrary combinations of SQ and MQ devices with SQ or MQ device mapper tables easily possible as a further step. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27dm: remove incomplete BLOCK_PC supportChristoph Hellwig
DM tries to copy a few fields around for BLOCK_PC requests, but given that no dm-target ever wires up scsi_cmd_ioctl BLOCK_PC can't actually be sent to dm. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: cleanup tracingChristoph Hellwig
A couple of tweaks to the tracing code:
 - trace the request size for all requests
 - trace request sector and nr_sectors only for fs requests, enforced by helpers
 - drop SCSI CDB tracing - we have SCSI tracing for this and are going to move the CDB out of the generic struct request soon.
With this the tracing code stops knowing about BLOCK_PC requests entirely; it's just FS vs passthrough requests now, where the latter includes any driver-private requests. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: allow specifying size for extra command dataChristoph Hellwig
This mirrors the blk-mq capabilities to allocate extra drivers-specific data behind struct request by setting a cmd_size field, as well as having a constructor / destructor for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: simplify blk_init_allocated_queueChristoph Hellwig
Return an errno value instead of the passed in queue so that the callers don't have to keep track of two queues, and move the assignment of the request_fn and lock to the caller as passing them as argument doesn't simplify anything. While we're at it also remove two pointless NULL assignments, given that the request structure is zeroed on allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: fix elevator init checkChristoph Hellwig
We can't initialize the elevator fields for flushes as flushes share space in struct request with the elevator data. But currently we can't communicate that a request is a flush through blk_get_request as we can only pass READ or WRITE, and the low-level code looks at the possible NULL bio to check for a flush. Fix this by allowing to pass any block op and flags, and by checking for the flush flags in __get_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27md: cleanup bio op / flags handling in raid1_write_requestChristoph Hellwig
No need for the local variables, the bio is still live and we can just assign the bits we want directly. Makes me wonder why we can't assign all the bio flags to start with. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27Merge branch 'for-4.11/block' into for-4.11/rq-refactorJens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq: fix debugfs compilation issuesOmar Sandoval
This fixes a couple of problems:
 1. In the !CONFIG_DEBUG_FS case, the stub definitions were bogus.
 2. In the !CONFIG_BLOCK case, blk-mq-debugfs.c shouldn't be compiled at all.
Fix the stub definitions and add a CONFIG_BLK_DEBUG_FS Kconfig option. Fixes: 07e4fead45e6 ("blk-mq: create debugfs directory tree") Signed-off-by: Omar Sandoval <osandov@fb.com> Augment Kconfig description. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: cleanup remaining manual checks for PREFLUSH|FUAJens Axboe
Use op_is_flush() where applicable. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq-sched: add flush insertion into blk_mq_sched_insert_request()Jens Axboe
Instead of letting the caller check this and handle the details of inserting a flush request, put the logic in the scheduler insertion function. This fixes direct flush insertion outside of the usual make_request_fn calls, like from dm via blk_insert_cloned_request(). Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: add a op_is_flush helperChristoph Hellwig
This centralizes the checks for bios that need to go into the flush state machine. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
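A helper of this kind boils down to one mask test on the op flags, which replaces scattered manual checks for PREFLUSH and FUA. The sketch below shows the idea in userspace C; the `MOCK_REQ_*` bit values and the `mock_op_is_flush` name are assumptions made for illustration and do not match the kernel's actual flag encoding.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's real values differ. */
#define MOCK_REQ_FUA      (1u << 0)
#define MOCK_REQ_PREFLUSH (1u << 1)
#define MOCK_REQ_SYNC     (1u << 2)

/* True if the request must go through the flush state machine. */
static bool mock_op_is_flush(unsigned int op)
{
	return (op & (MOCK_REQ_FUA | MOCK_REQ_PREFLUSH)) != 0;
}
```

Callers then write `if (mock_op_is_flush(flags))` instead of repeating the mask at every site.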
2017-01-27blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()Jens Axboe
When we invoke dispatch_requests(), the scheduler empties everything into the passed in list. This isn't always a good thing, since it means that we remove items that we could have potentially merged with. Change the function to dispatch a single request at a time. If we do that, we can back off exactly at the point where the device can't consume more IO, and leave the rest with the scheduler for better merging and future dispatch decision making. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
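The effect of dispatching one request at a time can be sketched with a toy scheduler and device. This is an illustrative model only: `mock_sched`, `mock_device`, and the functions below are hypothetical and are not the blk-mq scheduler interface.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_RQ 8

/* Hypothetical scheduler holding pending request ids in a ring. */
struct mock_sched {
	int pending[MOCK_MAX_RQ];
	size_t head;
	size_t count;
};

/* Hypothetical device that accepts at most `depth` requests in flight. */
struct mock_device {
	size_t in_flight;
	size_t depth;
};

/* Hand out at most one request; false when the scheduler is empty. */
static bool mock_dispatch_request(struct mock_sched *s, int *rq)
{
	if (s->count == 0)
		return false;
	*rq = s->pending[s->head];
	s->head = (s->head + 1) % MOCK_MAX_RQ;
	s->count--;
	return true;
}

/* Pull single requests until the device is full or nothing is pending. */
static size_t mock_run_queue(struct mock_sched *s, struct mock_device *d)
{
	size_t dispatched = 0;
	int rq;

	while (d->in_flight < d->depth && mock_dispatch_request(s, &rq)) {
		d->in_flight++;
		dispatched++;
	}
	return dispatched;
}
```

Because the loop checks device capacity before each pull, requests the device cannot take stay in the scheduler, where they remain candidates for merging.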
2017-01-27blk-mq-sched: fix starvation for multiple hardware queues and shared tagsJens Axboe
If we have both multiple hardware queues and shared tag map between devices, we need to ensure that we propagate the hardware queue restart bit higher up. This is because we can get into a situation where we don't have any IO pending on a hardware queue, yet we fail getting a tag to start new IO. If that happens, it's not enough to mark the hardware queue as needing a restart, we need to bubble that up to the higher level queue as well. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
2017-01-27blk-mq: release driver tag on a requeue eventJens Axboe
We don't want to hold on to this resource when we have a scheduler attached. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
2017-01-27blk-mq: fix potential race in queue restart and driver tag allocationJens Axboe
Once we mark the queue as needing a restart, re-check if we can get a driver tag. This fixes a theoretical issue where the needed IO completes _after_ blk_mq_get_driver_tag() fails, but before we manage to set the restart bit. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
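The fix follows a common pattern: after publishing the "needs restart" flag, retry the operation once, so that a completion that slipped in between the failed attempt and the flag update is not missed. The single-threaded sketch below models that ordering; `mock_hctx`, its fields, and the `late_completions` mechanism that simulates the racing completion are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical hardware queue state. */
struct mock_hctx {
	int free_tags;        /* driver tags currently available */
	int late_completions; /* completions that land after a failed attempt */
	bool restart;         /* "rerun this queue when a tag is freed" */
};

static bool mock_get_driver_tag(struct mock_hctx *h)
{
	if (h->free_tags > 0) {
		h->free_tags--;
		return true;
	}
	/* Simulate a completion arriving just after this failed attempt. */
	h->free_tags += h->late_completions;
	h->late_completions = 0;
	return false;
}

static bool mock_get_tag_or_mark_restart(struct mock_hctx *h)
{
	if (mock_get_driver_tag(h))
		return true;
	h->restart = true;
	/* Re-check: a tag may have been freed before the flag was set. */
	return mock_get_driver_tag(h);
}
```

Without the second attempt, the simulated late completion would free a tag while no one was marked for restart, and the request would wait indefinitely.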
2017-01-27blk-mq: improve scheduler queue sync/async runningJens Axboe
We'll use the same criteria for whether we need to run the queue sync or async when we have a scheduler, as we do without one. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
2017-01-27blk-mq: move hctx and ctx counters from sysfs to debugfsOmar Sandoval
These counters aren't as out-of-place in sysfs as the other stuff, but debugfs is a slightly better home for them. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfsOmar Sandoval
These statistics _might_ be useful to userspace, but it's better not to commit to an ABI for these yet. Also, the dispatched file in sysfs couldn't be cleared, so make it clearable like the others in debugfs. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq: add tags and sched_tags bitmaps to debugfsOmar Sandoval
These can be used to debug issues like tag leaks and stuck requests. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq: move tags and sched_tags info from sysfs to debugfsOmar Sandoval
These are very tied to the blk-mq tag implementation, so exposing them to sysfs isn't a great idea. Move the debugging information to debugfs and add basic entries for the number of tags and the number of reserved tags to sysfs. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq: export software queue pending map to debugfsOmar Sandoval
This is useful for debugging problems where we've gotten stuck with requests in the software queues. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27sbitmap: add helpers for dumping to a seq_fileOmar Sandoval
This is useful debugging information that will be used in the blk-mq debugfs directory. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Changed 'weight' to 'busy'. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq: add extra request information to debugfsOmar Sandoval
The request pointers by themselves aren't super useful. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq: move hctx->dispatch and ctx->rq_list from sysfs to debugfsOmar Sandoval
These lists are only useful for debugging; they definitely don't belong in sysfs. Putting them in debugfs also removes the limitation of a single page of output. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq: add hctx->{state,flags} to debugfsOmar Sandoval
hctx->state could come in handy for bugs where the hardware queue gets stuck in the stopped state, and hctx->flags is just useful to know. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq: create debugfs directory treeOmar Sandoval
In preparation for putting blk-mq debugging information in debugfs, create a directory tree mirroring the one in sysfs:

    # tree -d /sys/kernel/debug/block
    /sys/kernel/debug/block
    |-- nvme0n1
    |   `-- mq
    |       |-- 0
    |       |   `-- cpu0
    |       |-- 1
    |       |   `-- cpu1
    |       |-- 2
    |       |   `-- cpu2
    |       `-- 3
    |           `-- cpu3
    `-- vda
        `-- mq
            `-- 0
                |-- cpu0
                |-- cpu1
                |-- cpu2
                `-- cpu3

Also add the scaffolding for the actual files that will go in here, either under the hardware queue or software queue directories. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-26blk-mq-sched: check for successful allocation before assigning tagJens Axboe
We don't trigger this from the normal IO path, since we always use blocking allocations from there. But Bart saw it testing multipath dm, since that is a heavy user of atomic request allocations in the map and clone path. Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-26blk-mq: don't lose flags passed in to blk_mq_alloc_request()Jens Axboe
If we come in from blk_mq_alloc_request() with NOWAIT set in flags, we must ensure that we don't later overwrite that in blk_mq_sched_get_request(). Initialize alloc_data->flags before passing it in. Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-25blk-mq: only apply active queue tag throttling for driver tagsJens Axboe
If we have a scheduler attached, we have two sets of tags. We don't want to apply our active queue throttling for the scheduler side of tags, that only applies to driver tags since that's the resource we need to dispatch an IO. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-23cfq-iosched: Adjust one function call together with a variable assignmentMarkus Elfring
The script "checkpatch.pl" reported the following: ERROR: do not use assignment in if condition. Fix the affected source code place accordingly. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Jens Axboe <axboe@fb.com>
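The change this checkpatch error asks for is mechanical: move the assignment out of the condition and test the variable on its own line. The sketch below shows the shape of such a rewrite on a made-up lookup; `mock_entry`, `mock_lookup`, and `mock_weight` are hypothetical and unrelated to the cfq code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry used only for this illustration. */
struct mock_entry {
	const char *name;
	int weight;
};

static struct mock_entry mock_table[] = {
	{ "a", 10 },
	{ "b", 20 },
};

static struct mock_entry *mock_lookup(const char *name)
{
	for (size_t i = 0; i < sizeof(mock_table) / sizeof(mock_table[0]); i++)
		if (strcmp(mock_table[i].name, name) == 0)
			return &mock_table[i];
	return NULL;
}

static int mock_weight(const char *name)
{
	struct mock_entry *e;

	/* Flagged form: if (!(e = mock_lookup(name))) return -1; */
	e = mock_lookup(name);
	if (!e)
		return -1;
	return e->weight;
}
```

Behavior is unchanged; only the assignment and the test are separated.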
2017-01-23blk-throttle: Adjust two function calls together with a variable assignmentMarkus Elfring
The script "checkpatch.pl" reported the following: ERROR: do not use assignment in if condition. Fix the affected source code places accordingly. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-23block: Initialize cfqq->ioprio_class in cfq_get_queue()Alexander Potapenko
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of uninitialized memory in cfq_init_cfqq():

    ==================================================================
    BUG: KMSAN: use of unitialized memory
    ...
    Call Trace:
     [< inline >] __dump_stack lib/dump_stack.c:15
     [<ffffffff8202ac97>] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
     [<ffffffff813e9b65>] kmsan_report+0x205/0x360 ??:?
     [<ffffffff813eabbb>] __msan_warning+0x5b/0xb0 ??:?
     [< inline >] cfq_init_cfqq block/cfq-iosched.c:3754
     [<ffffffff8201e110>] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
    ...
    origin:
     [<ffffffff8103ab37>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
     [<ffffffff813e836b>] kmsan_internal_poison_shadow+0xab/0x150 ??:?
     [<ffffffff813e88ab>] kmsan_poison_slab+0xbb/0x120 ??:?
     [< inline >] allocate_slab mm/slub.c:1627
     [<ffffffff813e533f>] new_slab+0x3af/0x4b0 mm/slub.c:1641
     [< inline >] new_slab_objects mm/slub.c:2407
     [<ffffffff813e0ef3>] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
     [< inline >] __slab_alloc mm/slub.c:2606
     [< inline >] slab_alloc_node mm/slub.c:2669
     [<ffffffff813dfb42>] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
     [<ffffffff8201d90d>] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
    ...
    ==================================================================

(the line numbers are relative to 4.8-rc6, but the bug persists upstream)

The uninitialized struct cfq_queue is created by kmem_cache_alloc_node() and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class before it's initialized. Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
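The class of bug is a freshly allocated, non-zeroed object being read by its own init routine. One way to address it, sketched here under the assumption that the fix assigns a default before the init call, is to set the field in the allocating function. All names below (`mock_queue`, `MOCK_CLASS_*`, `mock_init_queue`, `mock_get_queue`) are hypothetical userspace stand-ins, not the cfq code.

```c
#include <stdlib.h>

/* Hypothetical priority classes. */
enum { MOCK_CLASS_NONE = 0, MOCK_CLASS_RT = 1 };

/* Hypothetical queue object. */
struct mock_queue {
	int ioprio_class;
	int is_rt;
};

/* The init routine reads ioprio_class, so it must already be valid. */
static void mock_init_queue(struct mock_queue *q)
{
	q->is_rt = (q->ioprio_class == MOCK_CLASS_RT);
}

static struct mock_queue *mock_get_queue(void)
{
	/* malloc() does not zero memory, like a plain slab allocation. */
	struct mock_queue *q = malloc(sizeof(*q));

	if (!q)
		return NULL;
	/* Give the field a defined value before anything reads it. */
	q->ioprio_class = MOCK_CLASS_NONE;
	mock_init_queue(q);
	return q;
}
```

Zeroing the whole object at allocation time would also work; assigning the single field keeps the change small and makes the dependency explicit.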
2017-01-22Linux 4.10-rc5Linus Torvalds