summaryrefslogtreecommitdiff
path: root/block/mq-deadline.c
AgeCommit message (Collapse)Author
2020-09-03blk-mq, elevator: Count requests per hctx to improve performanceKashyap Desai
High CPU utilization on "native_queued_spin_lock_slowpath" due to lock contention is possible for mq-deadline and bfq IO schedulers when nr_hw_queues is more than one. It is because kblockd work queue can submit IO from all online CPUs (through blk_mq_run_hw_queues()) even though only one hctx has pending commands. The elevator callback .has_work for mq-deadline and bfq scheduler considers pending work if there are any IOs on request queue but it does not account hctx context. Add a per-hctx 'elevator_queued' count to the hctx to avoid triggering the elevator even though there are no requests queued. [jpg: Relocated atomic_dec() in dd_dispatch_request(), update commit message per Kashyap] Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29blk-mq: remove the bio argument to ->prepare_requestChristoph Hellwig
None of the I/O schedulers actually needs it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05block: Introduce elevator featuresDamien Le Moal
Introduce the definition of elevator features through the elevator_features flags in the elevator_type structure. Each flag can represent a feature supported by an elevator. The first feature defined by this patch is support for zoned block device sequential write constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented by the mq-deadline elevator using zone write locking. Other possible features are IO priorities, write hints, latency targets or single-LUN dual-actuator disks (for which the elevator could maintain one LBA ordered list per actuator). The required_elevator_features field is also added to the request_queue structure to allow a device driver to specify elevator feature flags that an elevator must support for the correct operation of the device (e.g. device drivers for zoned block devices can have the ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature). The helper function blk_queue_required_elevator_features() is defined for setting this new field. With these two new fields in place, the elevator functions elevator_match() and elevator_find() are modified to allow a user to set only an elevator with a set of features that satisfies the device required features. Elevators not matching the device requirements are not shown in the device sysfs queue/scheduler file to prevent their use. The "none" elevator can always be selected as before. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-03block: mq-deadline: Fix queue restart handlingDamien Le Moal
Commit 7211aef86f79 ("block: mq-deadline: Fix write completion handling") added a call to blk_mq_sched_mark_restart_hctx() in dd_dispatch_request() to make sure that write request dispatching does not stall when all target zones are locked. This fix left a subtle race when a write completion happens during a dispatch execution on another CPU: CPU 0: Dispatch CPU1: write completion dd_dispatch_request() lock(&dd->lock); ... lock(&dd->zone_lock); dd_finish_request() rq = find request lock(&dd->zone_lock); unlock(&dd->zone_lock); zone write unlock unlock(&dd->zone_lock); ... __blk_mq_free_request check restart flag (not set) -> queue not run ... if (!rq && have writes) blk_mq_sched_mark_restart_hctx() unlock(&dd->lock) Since the dispatch context finishes after the write request completion handling, marking the queue as needing a restart is not seen from __blk_mq_free_request() and blk_mq_sched_restart() not executed leading to the dispatch stall under 100% write workloads. Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from dd_dispatch_request() into dd_finish_request() under the zone lock to ensure full mutual exclusion between write request dispatch selection and zone unlock on write request completion. Fixes: 7211aef86f79 ("block: mq-deadline: Fix write completion handling") Cc: stable@vger.kernel.org Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-15docs: block: convert to ReSTMauro Carvalho Chehab
Rename the block documentation files to ReST, add an index for them and adjust in order to produce a nice html output via the Sphinx build system. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-06-20block: remove the bi_phys_segments field in struct bioChristoph Hellwig
We only need the number of segments in the blk-mq submission path. Remove the field from struct bio, and return it from a variant of blk_queue_split instead of that it can passed as an argument to those functions that need the value. This also means we stop recounting segments except for cloning and partial segments. To keep the number of arguments in this how path down remove pointless struct request_queue arguments from any of the functions that had it and grew a nr_segs argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30block: add SPDX tags to block layer files missing licensing informationChristoph Hellwig
Various block layer files do not have any licensing information at all. Add SPDX tags for the default kernel GPLv2 license to those. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-17block: mq-deadline: Fix write completion handlingDamien Le Moal
For a zoned block device using mq-deadline, if a write request for a zone is received while another write was already dispatched for the same zone, dd_dispatch_request() will return NULL and the newly inserted write request is kept in the scheduler queue waiting for the ongoing zone write to complete. With this behavior, when no other request has been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to __blk_mq_free_request() call of blk_mq_sched_restart() to not run the queue when the already dispatched write request completes. The newly dispatched request stays stuck in the scheduler queue until eventually another request is submitted. This problem does not affect SCSI disk as the SCSI stack handles queue restart on request completion. However, this problem is can be triggered the nullblk driver with zoned mode enabled. Fix this by always requesting a queue restart in dd_dispatch_request() if no request was dispatched while WRITE requests are queued. Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Add missing export of blk_mq_sched_restart() Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07block: get rid of MQ scheduler ops unionJens Axboe
This is a remnant of when we had ops for both SQ and MQ schedulers. Now it's just MQ, so get rid of the union. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07block: remove dead elevator codeJens Axboe
This removes a bunch of core and elevator related code. On the core front, we remove anything related to queue running, draining, initialization, plugging, and congestions. We also kill anything related to request allocation, merging, retrieval, and completion. Remove any checking for single queue IO schedulers, as they no longer exist. This means we can also delete a bunch of code related to request issue, adding, completion, etc - and all the SQ related ops and helpers. Also kill the load_default_modules(), as all that did was provide for a way to load the default single queue elevator. Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-24block drivers/block: Use octal not symbolic permissionsJoe Perches
Convert the S_<FOO> symbolic permissions to their octal equivalents as using octal and not symbolic permissions is preferred by many as more readable. see: https://lkml.org/lkml/2016/8/2/1945 Done with automated conversion via: $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...> Miscellanea: o Wrapped modified multi-line calls to a single line where appropriate o Realign modified multi-line calls to open parenthesis Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-01mq-deadline: Make sure to always unlock zonesDamien Le Moal
In case of a failed write request (all retries failed) and when using libata, the SCSI error handler calls scsi_finish_command(). In the case of blk-mq this means that scsi_mq_done() does not get called, that blk_mq_complete_request() does not get called and also that the mq-deadline .completed_request() method is not called. This results in the target zone of the failed write request being left in a locked state, preventing that any new write requests are issued to the same zone. Fix this by replacing the .completed_request() method with the .finish_request() method as this method is always called whether or not a request completes successfully. Since the .finish_request() method is only called by the blk-mq core if a .prepare_request() method exists, add a dummy .prepare_request() method. Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support") Cc: Hannes Reinecke <hare@suse.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> [ bvanassche: edited patch description ] Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06mq-deadline: make it clear that __dd_dispatch_request() works on all hw queuesJens Axboe
Don't pass in the hardware queue to __dd_dispatch_request(), since it leads the reader to believe that we are returning a request for that specific hardware queue. That's not how mq-deadline works, the state for determining which request to serve next is shared across all hardware queues for a device. Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05mq-deadline: Introduce zone locking supportDamien Le Moal
Introduce zone write locking to avoid write request reordering with zoned block devices. This is achieved using a finer selection of the next request to dispatch: 1) Any non-write request is always allowed to proceed. 2) Any write to a conventional zone is always allowed to proceed. 3) For a write to a sequential zone, the zone lock is first checked. a) If the zone is not locked, the write is allowed to proceed after its target zone is locked. b) If the zone is locked, the write request is skipped and the next request in the dispatch queue tested (back to step 1). For a write request that has locked its target zone, the zone is unlocked either when the request completes with a call to the method deadline_request_completed() or when the request is requeued using dd_insert_request(). Requests targeting a locked zone are always left in the scheduler queue to preserve the lba ordering for write requests. If no write request can be dispatched, allow reads to be dispatched even if the write batch is not done. If the device used is not a zoned block device, or if zoned block device support is disabled, this patch does not modify mq-deadline behavior. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05mq-deadline: Introduce dispatch helpersDamien Le Moal
Avoid directly referencing the next_rq and fifo_list arrays using the helper functions deadline_next_request() and deadline_fifo_request() to facilitate changes in the dispatch request selection in __dd_dispatch_request() for zoned block devices. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-25mq-deadline: add 'deadline' as a name aliasJens Axboe
The scheduler framework now supports looking up the appropriate scheduler with the {name,mq} tupple. We can register mq-deadline with the alias of 'deadline', so that switching to 'deadline' will do the right thing based on the type of driver attached to it. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-29mq-deadline: Enable auto-loading when built as moduleBen Hutchings
The block core requests modules with the "-iosched" name suffix, but mq-deadline does not have that suffix. Add an alias. Fixes: 945ffb60c11d ("mq-deadline: add blk-mq adaptation of the deadline ...") Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-28block, scheduler: convert xxx_var_store to voidweiping zhang
The last parameter "count" never be used in xxx_var_store, convert these functions to void. Signed-off-by: weiping zhang <zhangweiping@didichuxing.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-05-04mq-deadline: add debugfs attributesOmar Sandoval
Expose the fifo lists, cached next requests, batching state, and dispatch list. It'd also be possible to add the sorted lists, but there aren't already seq_file helpers for rbtrees. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-08block: enumify ELEVATOR_*_MERGEChristoph Hellwig
Switch these constants to an enum, and make let the compiler ensure that all callers of blk_try_merge and elv_merge handle all potential values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-03block: free merged request in the callerJens Axboe
If we end up doing a request-to-request merge when we have completed a bio-to-request merge, we free the request from deep down in that path. For blk-mq-sched, the merge path has to hold the appropriate lock, but we don't need it for freeing the request. And in fact holding the lock is problematic, since we are now calling the mq sched put_rq_private() hook with the lock held. Other call paths do not hold this lock. Fix this inconsistency by ensuring that the caller frees a merged request. Then we can do it outside of the lock, making it both more efficient and fixing the blk-mq-sched problem of invoking parts of the scheduler with an unknown lock state. Reported-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-02-02blk-mq-sched: bypass the scheduler for flushes entirelyOmar Sandoval
There's a weird inconsistency that flushes are mostly hidden from the scheduler, but it needs to be aware of them in ->insert_requests(). Instead of having every scheduler call blk_mq_sched_bypass_insert(), let's do it in the common framework. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31block: introduce blk_rq_is_passthroughChristoph Hellwig
This can be used to check for fs vs non-fs requests and basically removes all knowledge of BLOCK_PC specific from the block layer, as well as preparing for removing the cmd_type field in struct request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()Jens Axboe
When we invoke dispatch_requests(), the scheduler empties everything into the passed in list. This isn't always a good thing, since it means that we remove items that we could have potentially merged with. Change the function to dispatch single requests at the time. If we do that, we can backoff exactly at the point where the device can't consume more IO, and leave the rest with the scheduler for better merging and future dispatch decision making. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
2017-01-17mq-deadline: add blk-mq adaptation of the deadline IO schedulerJens Axboe
This is basically identical to deadline-iosched, except it registers as a MQ capable scheduler. This is still a single queue design. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>