summaryrefslogtreecommitdiff
path: root/drivers/scsi/scsi_lib.c
AgeCommit message (Collapse)Author
2015-11-06mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIMMel Gorman
__GFP_WAIT was used to signal that the caller was in atomic context and could not sleep. Now it is possible to distinguish between true atomic context and callers that are not willing to sleep. The latter should clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing __GFP_WAIT behaves differently, there is a risk that people will clear the wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly indicate what it does -- setting it allows all reclaim activity, clearing them prevents it. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01blk-mq: fix racy updates of rq->errorsChristoph Hellwig
blk_mq_complete_request may be a no-op if the request has already been completed by others means (e.g. a timeout or cancellation), but currently drivers have to set rq->errors before calling blk_mq_complete_request, which might leave us with the wrong error value. Add an error parameter to blk_mq_complete_request so that we can defer setting rq->errors until we known we won the race to complete the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-28scsi_dh: kill struct scsi_dh_dataChristoph Hellwig
Add a ->handler and a ->handler_data field to struct scsi_device and kill this indirection. Also move struct scsi_device_handler to scsi_dh.h so that changes to it don't require rebuilding every SCSI LLDD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-08-26scsi: Add ALUA state change UA handlingHannes Reinecke
Log the ALUA state change unit attention correctly with the message log and emit an event to allow user-space tools to react to it. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-07-30scsi: retry MODE SENSE on unit attentionHannes Reinecke
The 'sd' driver is calling scsi_mode_sense() to figure out internal details. But scsi_mode_sense() never checks for any pending unit attentions, so we're getting annoying error messages like: MODE SENSE: unimplemented page/subpage: 0x00/0x00 and a possible wrong decision for device cache handling. Reviewed-by: Ewan Milne <emilne@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-07-30scsi: fix memory leak with scsi-mqTony Battersby
Fix a memory leak with scsi-mq triggered by commands with large data transfer length. __sg_alloc_table() sets both table->nents and table->orig_nents to the same value. When the scatterlist is DMA-mapped, table->nents is overwritten with the (possibly smaller) size of the DMA-mapped scatterlist, while table->orig_nents retains the original size of the allocated scatterlist. scsi_free_sgtable() should therefore check orig_nents instead of nents, and all code that initializes sdb->table without calling __sg_alloc_table() should set both nents and orig_nents. Fixes: d285203cf647 ("scsi: add support for a blk-mq based I/O path.") Cc: <stable@vger.kernel.org> # 3.17+ Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-04-08Defer processing of REQ_PREEMPT requests for blocked devicesBart Van Assche
SCSI transport drivers and SCSI LLDs block a SCSI device if the transport layer is not operational. This means that in this state no requests should be processed, even if the REQ_PREEMPT flag has been set. This patch avoids that a rescan shortly after a cable pull sporadically triggers the following kernel oops: BUG: unable to handle kernel paging request at ffffc9001a6bc084 IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib] Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100) Call Trace: [<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp] [<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp] [<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod] [<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod] [<ffffffff81223b37>] __blk_run_queue+0x27/0x30 [<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110 [<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0 [<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod] [<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod] [<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod] [<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod] [<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod] [<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod] [<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod] [<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod] [<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160 [<ffffffff811589de>] vfs_write+0xce/0x140 [<ffffffff81158b53>] sys_write+0x53/0xa0 [<ffffffff81464592>] system_call_fastpath+0x16/0x1b [<00007f611c9d9300>] 0x7f611c9d92ff Reported-by: Max Gurtuvoy <maxg@mellanox.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Cc: <stable@vger.kernel.org> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-02-12Merge branch 'for-3.20/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block IO changes from Jens Axboe: "This contains: - A series from Christoph that cleans up and refactors various parts of the REQ_BLOCK_PC handling. Contributions in that series from Dongsu Park and Kent Overstreet as well. - CFQ: - A bug fix for cfq for realtime IO scheduling from Jeff Moyer. - A stable patch fixing a potential crash in CFQ in OOM situations. From Konstantin Khlebnikov. - blk-mq: - Add support for tag allocation policies, from Shaohua. This is a prep patch enabling libata (and other SCSI parts) to use the blk-mq tagging, instead of rolling their own. - Various little tweaks from Keith and Mike, in preparation for DM blk-mq support. - Minor little fixes or tweaks from me. - A double free error fix from Tony Battersby. - The partition 4k issue fixes from Matthew and Boaz. - Add support for zero+unprovision for blkdev_issue_zeroout() from Martin" * 'for-3.20/core' of git://git.kernel.dk/linux-block: (27 commits) block: remove unused function blk_bio_map_sg block: handle the null_mapped flag correctly in blk_rq_map_user_iov blk-mq: fix double-free in error path block: prevent request-to-request merging with gaps if not allowed blk-mq: make blk_mq_run_queues() static dm: fix multipath regression due to initializing wrong request cfq-iosched: handle failure of cfq group allocation block: Quiesce zeroout wrapper block: rewrite and split __bio_copy_iov() block: merge __bio_map_user_iov into bio_map_user_iov block: merge __bio_map_kern into bio_map_kern block: pass iov_iter to the BLOCK_PC mapping functions block: add a helper to free bio bounce buffer pages block: use blk_rq_map_user_iov to implement blk_rq_map_user block: simplify bio_map_kern block: mark blk-mq devices as stackable block: keep established cmd_flags when cloning into a blk-mq request block: add blk-mq support to blk_insert_cloned_request() block: require blk_rq_prep_clone() be given an initialized clone request blk-mq: add tag allocation policy ...
2015-01-23blk-mq: add tag allocation policyShaohua Li
This is the blk-mq part to support tag allocation policy. The default allocation policy isn't changed (though it's not a strict FIFO). The new policy is round-robin for libata. But it's a try-best implementation. If multiple tasks are competing, the tags returned will be mixed (which is unavoidable even with !mq, as requests from different tasks can be mixed in queue) Cc: Jens Axboe <axboe@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20scsi: Avoid crashing if device uses DIX but adapter does not support itEwan D. Milne
This can happen if a multipathed device uses DIX and another path is added via an adapter that does not support it. Multipath should not allow this path to be added, but we should not depend upon that to avoid crashing. Signed-off-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-01-09scsi: ->queue_rq can't sleepChristoph Hellwig
The blk-mq ->queue_rq method is always called from process context, but might have preemption disabled. This means we still always have to use GFP_ATOMIC for memory allocations, and thus need to revert part of commit 3c356bde1 ("scsi: stop passing a gfp_mask argument down the command setup path"). Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Sasha Levin <sasha.levin@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
2014-12-15scsi: fix random memory corruption with scsi-mq + T10 PITony Battersby
This fixes random memory corruption triggered when all three of the following are true: * scsi-mq enabled * T10 Protection Information (DIF) enabled * SCSI host with sg_tablesize > SCSI_MAX_SG_SEGMENTS (128) The symptoms of this bug are unpredictable memory corruption, BUG()s, oopses, lockups, etc., any of which may appear to be completely unrelated to the root cause. Cc: <stable@vger.kernel.org> # 3.17.x, 3.18.x Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-12-13Merge branch 'for-3.19/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver core update from Jens Axboe: "This is the pull request for the core block IO changes for 3.19. Not a huge round this time, mostly lots of little good fixes: - Fix a bug in sysfs blktrace interface causing a NULL pointer dereference, when enabled/disabled through that API. From Arianna Avanzini. - Various updates/fixes/improvements for blk-mq: - A set of updates from Bart, mostly fixing buts in the tag handling. - Cleanup/code consolidation from Christoph. - Extend queue_rq API to be able to handle batching issues of IO requests. NVMe will utilize this shortly. From me. - A few tag and request handling updates from me. - Cleanup of the preempt handling for running queues from Paolo. - Prevent running of unmapped hardware queues from Ming Lei. - Move the kdump memory limiting check to be in the correct location, from Shaohua. - Initialize all software queues at init time from Takashi. This prevents a kobject warning when CPUs are brought online that weren't online when a queue was registered. - Single writeback fix for I_DIRTY clearing from Tejun. Queued with the core IO changes, since it's just a single fix. - Version X of the __bio_add_page() segment addition retry from Maurizio. Hope the Xth time is the charm. - Documentation fixup for IO scheduler merging from Jan. - Introduce (and use) generic IO stat accounting helpers for non-rq drivers, from Gu Zheng. - Kill off artificial limiting of max sectors in a request from Christoph" * 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits) bio: modify __bio_add_page() to accept pages that don't start a new segment blk-mq: Fix uninitialized kobject at CPU hotplugging blktrace: don't let the sysfs interface remove trace from running list blk-mq: Use all available hardware queues blk-mq: Micro-optimize bt_get() blk-mq: Fix a race between bt_clear_tag() and bt_get() blk-mq: Avoid that __bt_get_word() wraps multiple times blk-mq: Fix a use-after-free blk-mq: prevent unmapped hw queue from being scheduled blk-mq: re-check for available tags after running the hardware queue blk-mq: fix hang in bt_get() blk-mq: move the kdump check to blk_mq_alloc_tag_set blk-mq: cleanup tag free handling blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq <-> cpu map blk: introduce generic io stat accounting help function blk-mq: handle the single queue case in blk_mq_hctx_next_cpu genhd: check for int overflow in disk_expand_part_tbl() blk-mq: add blk_mq_free_hctx_request() blk-mq: export blk_mq_free_request() blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable ...
2014-11-24scsi: move scsi_dispatch_cmd to scsi_lib.cChristoph Hellwig
scsi_lib.c is where the rest of the I/O submission path lives, so move scsi_dispatch_cmd there and mark it static. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-24scsi: stop passing a gfp_mask argument down the command setup pathChristoph Hellwig
There is no reason for ULDs to pass in a flag on how to allocate the S/G lists. While we don't need GFP_ATOMIC for the blk-mq case because we don't hold locks, that decision can be made way down the chain without having to pass a pointless gfp_mask argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-24scsi: remove scsi_next_commandChristoph Hellwig
There's only one caller left, so inline it and reduce the blk-mq vs !blk-mq diff a litte bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12scsi: add new scsi-command flag for tagged commandsChristoph Hellwig
Currently scsi piggy backs on the block layer to define the concept of a tagged command. But we want to be able to have block-level host-wide tags assigned even for untagged commands like the initial INQUIRY, so add a new SCSI-level flag for commands that are tagged at the scsi level, so that even commands without that set can have tags assigned to them. Note that this alredy is the case for the blk-mq code path, and this just lets the old path catch up with it. We also set this flag based upon sdev->simple_tags instead of the block queue flag, so that it is entirely independent of the block layer tagging, and thus always correct even if a driver doesn't use block level tagging yet. Also remove the old blk_rq_tagged; it was only used by SCSI drivers, and removing it forces them to look for the proper replacement. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12scsi: add support for multiple hardware queuesBart Van Assche
Allow a SCSI LLD to declare how many hardware queues it supports by setting Scsi_Host.nr_hw_queues before calling scsi_add_host(). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12scsi: ratelimit I/O error messagesHannes Reinecke
There can be quite a lot of I/O error messages, even on smaller machines. So we need to ratelimit them to not overwhelm logging. Signed-off-by: Hannes Reinecke <hare@suse.de> Tested-by: Robert Elliott <elliott@hp.com> Reviewed-by: Robert Elliott <elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12scsi: simplify scsi_log_(send|completion)Hannes Reinecke
Simplify scsi_log_(send|completion) by externalizing scsi_mlreturn_string() and always print the command address. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Robert Elliott <elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12scsi: use 'bool' as return value for scsi_normalize_sense()Hannes Reinecke
Convert scsi_normalize_sense() and friends to return 'bool' instead of an integer. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Robert Elliott <elliott@hp.com> Reviewed-by: Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12scsi: use sdev as argument for sense code printingHannes Reinecke
We should be using the standard dev_printk() variants for sense code printing. [hch: remove __scsi_print_sense call in xen-scsiback, Acked by Juergen] [hch: folded bracing fix from Dan Carpenter] Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Robert Elliott <elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12scsi: resolve some missing-field-initializers warningsMark Rustad
Resolve some missing-field-initializers warnings by using designated initialization. [hch: W=2 with modern gcc warns about this. Pretty pointless to me, but I'd prefer to keep us warning free] Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-10-29blk-mq: add a 'list' parameter to ->queue_rq()Jens Axboe
Since we have the notion of a 'last' request in a chain, we can use this to have the hardware optimize the issuing of requests. Add a list_head parameter to queue_rq that the driver can use to temporarily store hw commands for issue when 'last' is true. If we are doing a chain of requests, pass in a NULL list for the first request to force issue of that immediately, then batch the remainder for deferred issue until the last request has been sent. Instead of adding yet another argument to the hot ->queue_rq path, encapsulate the passed arguments in a blk_mq_queue_data structure. This is passed as a constant, and has been tested as faster than passing 4 (or even 3) args through ->queue_rq. Update drivers for the new ->queue_rq() prototype. There are no functional changes in this patch for drivers - if they don't use the passed in list, then they will just queue requests individually like before. Signed-off-by: Jens Axboe <axboe@fb.com>
2014-10-28scsi: set REQ_QUEUE for the blk-mq caseChristoph Hellwig
To generate the right SPI tag messages we need to properly set QUEUE_FLAG_QUEUED in the request_queue and mirror it to the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Jens Axboe <axboe@kernel.dk> Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Cc: stable@vger.kernel.org
2014-10-18Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block layer changes from Jens Axboe: "This is the core block IO pull request for 3.18. Apart from the new and improved flush machinery for blk-mq, this is all mostly bug fixes and cleanups. - blk-mq timeout updates and fixes from Christoph. - Removal of REQ_END, also from Christoph. We pass it through the ->queue_rq() hook for blk-mq instead, freeing up one of the request bits. The space was overly tight on 32-bit, so Martin also killed REQ_KERNEL since it's no longer used. - blk integrity updates and fixes from Martin and Gu Zheng. - Update to the flush machinery for blk-mq from Ming Lei. Now we have a per hardware context flush request, which both cleans up the code should scale better for flush intensive workloads on blk-mq. - Improve the error printing, from Rob Elliott. - Backing device improvements and cleanups from Tejun. - Fixup of a misplaced rq_complete() tracepoint from Hannes. - Make blk_get_request() return error pointers, fixing up issues where we NULL deref when a device goes bad or missing. From Joe Lawrence. - Prep work for drastically reducing the memory consumption of dm devices from Junichi Nomura. This allows creating clone bio sets without preallocating a lot of memory. - Fix a blk-mq hang on certain combinations of queue depths and hardware queues from me. - Limit memory consumption for blk-mq devices for crash dump scenarios and drivers that use crazy high depths (certain SCSI shared tag setups). We now just use a single queue and limited depth for that" * 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits) block: Remove REQ_KERNEL blk-mq: allocate cpumask on the home node bio-integrity: remove the needless fail handle of bip_slab creating block: include func name in __get_request prints block: make blk_update_request print prefix match ratelimited prefix blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio block: fix alignment_offset math that assumes io_min is a power-of-2 blk-mq: Make bt_clear_tag() easier to read blk-mq: fix potential hang if rolling wakeup depth is too high block: add bioset_create_nobvec() block: use bio_clone_fast() in blk_rq_prep_clone() block: misplaced rq_complete tracepoint sd: Honor block layer integrity handling flags block: Replace strnicmp with strncasecmp block: Add T10 Protection Information functions block: Don't merge requests if integrity flags differ block: Integrity checksum flag block: Relocate bio integrity flags block: Add a disk flag to block integrity profile block: Add prefix to block integrity profile flags ...
2014-10-07Merge tag 'scsi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This patch set consists of the usual driver updates (megaraid_sas, arcmsr, be2iscsi, lpfc, mpt2sas, mpt3sas, qla2xxx, ufs) plus several assorted fixes and miscellaneous updates (including the pci_msix_enable_range() changes that have been pending for a while)" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (202 commits) scsi: add a CONFIG_SCSI_MQ_DEFAULT option ufs: definitions for phy interface ufs: tune bkops while power managment events ufs: Add support for clock scaling using devfreq framework ufs: Add freq-table-hz property for UFS device ufs: Add support for clock gating ufs: refactor configuring power mode ufs: add UFS power management support ufs: introduce well known logical unit in ufs ufs: manually add well known logical units ufs: Active Power Mode - configuring bActiveICCLevel ufs: improve init sequence ufs: refactor query descriptor API support ufs: add voting support for host controller power ufs: Add clock initialization support ufs: Add regulator enable support ufs: Allow vendor specific initialization scsi: don't add scsi_device if its already visible scsi: fix the type for well known LUs scsi: fix comment in struct Scsi_Host definition ...
2014-09-22scsi: move blk_mq_start_request call earlierChristoph Hellwig
Some ATA drivers need the dma drain size workaround, and thus need to call blk_mq_start_request before the S/G mapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22blk-mq: pass a reserved argument to the timeout handlerChristoph Hellwig
Allow blk-mq to pass an argument to the timeout handler to indicate if we're timing out a reserved or regular command. For many drivers those need to be handled different. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22blk-mq: rename blk_mq_end_io to blk_mq_end_requestChristoph Hellwig
Now that we've changed the driver API on the submission side use the opportunity to fix up the name on the completion side to fit into the general scheme. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22blk-mq: call blk_mq_start_request from ->queue_rqChristoph Hellwig
When we call blk_mq_start_request from the core blk-mq code before calling into ->queue_rq there is a racy window where the timeout handler can hit before we've fully set up the driver specific part of the command. Move the call to blk_mq_start_request into the driver so the driver can start the request only once it is fully set up. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22blk-mq: remove REQ_ENDChristoph Hellwig
Pass an explicit parameter for the last request in a batch to ->queue_rq instead of using a request flag. Besides being a cleaner and non-stateful interface this is also required for the next patch, which fixes the blk-mq I/O submission code to not start a time too early. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-19[SCSI] fix for bidi use after freeDaniel Gryniewicz
When ending a bi-directionional SCSI request, blk_finish_request() cleans up and frees the request, but scsi_release_bidi_buffers() tries to indirect through the request to find it's data buffers. This causes a panic due to a null pointer dereference. Move the call to scsi_release_bidi_buffers() before the call to blk_finish_request(). Signed-off-by: Daniel Gryniewicz <dang@linuxbox.com> Reviewed-by: Webb Scales <webbnh@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-09-15scsi: add use_cmd_list flagKashyap.Desai@avagotech.com
Add a use_cmd_list flag in struct Scsi_Host to request keeping track of all outstanding commands per device. Default behaviour is not to keep track of cmd_list per sdev, as this may introduce lock contention. (overhead is more on multi-node NUMA.), and only enable it on the two drivers that need it. Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-11Merge branch 'for-linus' into for-3.18/coreJens Axboe
A bit of churn on the for-linus side that would be nice to have in the core bits for 3.18, so pull it in to catch us up and make forward progress easier. Signed-off-by: Jens Axboe <axboe@fb.com> Conflicts: block/scsi_ioctl.c
2014-08-29Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "A smaller collection of fixes that have come up since the initial merge window pull request. This contains: - error handling cleanup and support for larger than 16 byte cdbs in sg_io() from Christoph. The latter just matches what bsg and friends support, sg_io() got left out in the merge. - an option for brd to expose partitions in /proc/partitions. They are hidden by default for compat reasons. From Dmitry Monakhov. - a few blk-mq fixes from me - killing a dead/unused flag, fix for merging happening even if turned off, and correction of a few comments. - removal of unnecessary ->owner setting in systemace. From Michal Simek. - two related fixes for a problem with nesting freezing of queues in blk-mq. One from Ming Lei removing an unecessary freeze operation, and another from Tejun fixing the nesting regression introduced in the merge window. - fix for a BUG_ON() at bio_endio time when protection info is attached and the IO has an error. From Sagi Grimberg. - two scsi_ioctl bug fixes for regressions with scsi-mq from Tony Battersby. - a cfq weight update fix and subsequent comment update from Toshiaki Makita" * 'for-linus' of git://git.kernel.dk/linux-block: cfq-iosched: Add comments on update timing of weight cfq-iosched: Fix wrong children_weight calculation block: fix error handling in sg_io fix regression in SCSI_IOCTL_SEND_COMMAND scsi-mq: fix requests that use a separate CDB buffer block: support > 16 byte CDBs for SG_IO block: cleanup error handling in sg_io brd: add ram disk visibility option block: systemace: Remove .owner field for driver blk-mq: blk_mq_freeze_queue() should allow nesting blk-mq: correct a few wrong/bad comments block: Fix BUG_ON when pi errors occur blk-mq: don't allow merges if turned off for the queue blk-mq: get rid of unused BLK_MQ_F_SHOULD_SORT flag blk-mq: fix WARNING "percpu_ref_kill() called more than once!"
2014-08-28block,scsi: fixup blk_get_request dead queue scenariosJoe Lawrence
The blk_get_request function may fail in low-memory conditions or during device removal (even if __GFP_WAIT is set). To distinguish between these errors, modify the blk_get_request call stack to return the appropriate ERR_PTR. Verify that all callers check the return status and consider IS_ERR instead of a simple NULL pointer check. For consistency, make a similar change to the blk_mq_alloc_request leg of blk_get_request. It may fail if the queue is dead, or the caller was unwilling to wait. Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com> Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd] Acked-by: Boaz Harrosh <bharrosh@panasas.com> [for osd] Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-22scsi-mq: fix requests that use a separate CDB bufferTony Battersby
This patch fixes code such as the following with scsi-mq enabled: rq = blk_get_request(...); blk_rq_set_block_pc(rq); rq->cmd = my_cmd_buffer; /* separate CDB buffer */ blk_execute_rq_nowait(...); Code like this appears in e.g. sg_start_req() in drivers/scsi/sg.c (for large CDBs only). Without this patch, scsi_mq_prep_fn() will set rq->cmd back to rq->__cmd, causing the wrong CDB to be sent to the device. Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-19scsi: Fix qemu boot hang problemGuenter Roeck
The latest kernel fails to boot qemu arm images when using scsi for disk access. Boot gets stuck after the following messages. brd: module loaded sym53c8xx 0000:00:0c.0: enabling device (0100 -> 0103) sym0: <895a> rev 0x0 at pci 0000:00:0c.0 irq 93 sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking sym0: SCSI BUS has been reset. scsi host0: sym-2.2.3 Bisect points to commit 71e75c97f97a ("scsi: convert device_busy to atomic_t"). Code inspection shows the following suspicious change in scsi_request_fn. out_delay: - if (sdev->device_busy == 0 && !scsi_device_blocked(sdev)) + if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev)) blk_delay_queue(q, SCSI_QUEUE_DELAY); } 'sdev->device_busy == 0' was replaced with 'atomic_read(&sdev->device_busy)', meaning the logic was reversed. Changing this expression to '!atomic_read(&sdev->device_busy)' fixes the problem. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Jens Axboe <axboe@fb.com> Reviewed-by: Venkatesh Srinivas <venkateshs@google.com> Reviewed-by: Webb Scales <webbnh@hp.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-25scsi: add support for a blk-mq based I/O path.Christoph Hellwig
This patch adds support for an alternate I/O path in the scsi midlayer which uses the blk-mq infrastructure instead of the legacy request code. Use of blk-mq is fully transparent to drivers, although for now a host template field is provided to opt out of blk-mq usage in case any unforseen incompatibilities arise. In general replacing the legacy request code with blk-mq is a simple and mostly mechanical transformation. The biggest exception is the new code that deals with the fact the I/O submissions in blk-mq must happen from process context, which slightly complicates the I/O completion handler. The second biggest differences is that blk-mq is build around the concept of preallocated requests that also include driver specific data, which in SCSI context means the scsi_cmnd structure. This completely avoids dynamic memory allocations for the fast path through I/O submission. Due the preallocated requests the MQ code path exclusively uses the host-wide shared tag allocator instead of a per-LUN one. This only affects drivers actually using the block layer provided tag allocator instead of their own. Unlike the old path blk-mq always provides a tag, although drivers don't have to use it. For now the blk-mq path is disable by defauly and must be enabled using the "use_blk_mq" module parameter. Once the remaining work in the block layer to make blk-mq more suitable for slow devices is complete I hope to make it the default and eventually even remove the old code path. Based on the earlier scsi-mq prototype by Nicholas Bellinger. Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking and various sugestions and code contributions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25scatterlist: allow chaining to preallocated chunksChristoph Hellwig
Blk-mq drivers usually preallocate their S/G list as part of the request, but if we want to support the very large S/G lists currently supported by the SCSI code that would tie up a lot of memory in the preallocated request pool. Add support to the scatterlist code so that it can initialize a S/G list that uses a preallocated first chunks and dynamically allocated additional chunks. That way the scsi-mq code can preallocate a first page worth of S/G entries as part of the request, and dynamically extend the S/G list when needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25scsi: unwind blk_end_request_all and blk_end_request_err callsChristoph Hellwig
Replace the calls to the various blk_end_request variants with opencode equivalents. Blk-mq is using a model that gives the driver control between the bio updates and the actual completion, and making the old code follow that same model allows us to keep the code more similar for both paths. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25scsi: only maintain target_blocked if the driver has a target queue limitChristoph Hellwig
This saves us an atomic operation for each I/O submission and completion for the usual case where the driver doesn't set a per-target can_queue value. Only a few iscsi hardware offload drivers set the per-target can_queue value at the moment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25scsi: fix the {host,target,device}_blocked counter messChristoph Hellwig
Seems like these counters are missing any sort of synchronization for updates, as a over 10 year old comment from me noted. Fix this by using atomic counters, and while we're at it also make sure they are in the same cacheline as the _busy counters and not needlessly stored to in every I/O completion. With the new model the _busy counters can temporarily go negative, so all the readers are updated to check for > 0 values. Longer term every successful I/O completion will reset the counters to zero, so the temporarily negative values will not cause any harm. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25scsi: convert device_busy to atomic_tChristoph Hellwig
Avoid taking the queue_lock to check the per-device queue limit. Instead we do an atomic_inc_return early on to grab our slot in the queue, and if necessary decrement it after finishing all checks. Unlike the host and target busy counters this doesn't allow us to avoid the queue_lock in the request_fn due to the way the interface works, but it'll allow us to prepare for using the blk-mq code, which doesn't use the queue_lock at all, and it at least avoids a queue_lock round trip in scsi_device_unbusy, which is still important given how busy the queue_lock is. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25scsi: convert host_busy to atomic_tChristoph Hellwig
Avoid taking the host-wide host_lock to check the per-host queue limit. Instead we do an atomic_inc_return early on to grab our slot in the queue, and if necessary decrement it after finishing all checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25scsi: convert target_busy to an atomic_tChristoph Hellwig
Avoid taking the host-wide host_lock to check the per-target queue limit. Instead we do an atomic_inc_return early on to grab our slot in the queue, and if necessary decrement it after finishing all checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25scsi: push host_lock down into scsi_{host,target}_queue_readyChristoph Hellwig
Prepare for not taking a host-wide lock in the dispatch path by pushing the lock down into the places that actually need it. Note that this patch is just a preparation step, as it will actually increase lock roundtrips and thus decrease performance on its own. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25scsi: set ->scsi_done before calling scsi_dispatch_cmdChristoph Hellwig
The blk-mq code path will set this to a different function, so make the code simpler by setting it up in a legacy-request specific place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25scsi: centralize command re-queueing in scsi_dispatch_fnChristoph Hellwig
Make sure we only have the logic for requeing commands in one place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>