From 0034af036554c39eefd14d835a8ec3496ac46712 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Thu, 16 Jul 2015 09:14:26 -0600 Subject: block: make /sys/block//queue/discard_max_bytes writeable Lots of devices support huge discard sizes these days. Depending on how the device handles them internally, huge discards can introduce massive latencies (hundreds of msec) on the device side. We have a sysfs file, discard_max_bytes, that advertises the max hardware supported discard size. Make this writeable, and split the settings into a soft and hard limit. This can be set from 'discard_granularity' and up to the hardware limit. Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw set limit. Reviewed-by: Jeff Moyer Signed-off-by: Jens Axboe --- Documentation/block/queue-sysfs.txt | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt index 3a29f8914df9..e5d914845be6 100644 --- a/Documentation/block/queue-sysfs.txt +++ b/Documentation/block/queue-sysfs.txt @@ -20,7 +20,7 @@ This shows the size of internal allocation of the device in bytes, if reported by the device. A value of '0' means device does not support the discard functionality. -discard_max_bytes (RO) +discard_max_hw_bytes (RO) ---------------------- Devices that support discard functionality may have internal limits on the number of bytes that can be trimmed or unmapped in a single operation. @@ -29,6 +29,14 @@ number of bytes that can be discarded in a single operation. Discard requests issued to the device must not exceed this limit. A discard_max_bytes value of 0 means that the device does not support discard functionality. +discard_max_bytes (RW) +---------------------- +While discard_max_hw_bytes is the hardware limit for the device, this +setting is the software limit. Some devices exhibit large latencies when +large discards are issued, setting this value lower will make Linux issue +smaller discards and potentially help reduce latencies induced by large +discard operations. + discard_zeroes_data (RO) ------------------------ When read, this file will show if the discarded block are zeroed by the -- cgit v1.2.3 From 4246a0b63bd8f56a1469b12eafeb875b1041a451 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 20 Jul 2015 15:29:37 +0200 Subject: block: add a bi_error field to struct bio Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Reviewed-by: NeilBrown Signed-off-by: Jens Axboe --- Documentation/block/biodoc.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt index fd12c0d835fd..5be8a7f4cc7f 100644 --- a/Documentation/block/biodoc.txt +++ b/Documentation/block/biodoc.txt @@ -1109,7 +1109,7 @@ it will loop and handle as many sectors (on a bio-segment granularity) as specified. Now bh->b_end_io is replaced by bio->bi_end_io, but most of the time the -right thing to use is bio_endio(bio, uptodate) instead. +right thing to use is bio_endio(bio) instead. If the driver is dropping the io_request_lock from its request_fn strategy, then it just needs to replace that with q->queue_lock instead. -- cgit v1.2.3 From 2ec3182f9c20a9eef0dacc0512cf2ca2df7be5ad Mon Sep 17 00:00:00 2001 From: Dongsu Park Date: Fri, 19 Dec 2014 14:53:03 +0100 Subject: Documentation: update notes in biovecs about arbitrarily sized bios Update block/biovecs.txt so that it includes a note on what kind of effects arbitrarily sized bios would bring to the block layer. Also fix a trivial typo, bio_iter_iovec. Cc: Christoph Hellwig Cc: Kent Overstreet Cc: Jonathan Corbet Cc: linux-doc@vger.kernel.org Signed-off-by: Dongsu Park Signed-off-by: Ming Lin Signed-off-by: Jens Axboe --- Documentation/block/biovecs.txt | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt index 74a32ad52f53..25689584e6e0 100644 --- a/Documentation/block/biovecs.txt +++ b/Documentation/block/biovecs.txt @@ -24,7 +24,7 @@ particular, presenting the illusion of partially completed biovecs so that normal code doesn't have to deal with bi_bvec_done. * Driver code should no longer refer to biovecs directly; we now have - bio_iovec() and bio_iovec_iter() macros that return literal struct biovecs, + bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs, constructed from the raw biovecs but taking into account bi_bvec_done and bi_size. @@ -109,3 +109,11 @@ Other implications: over all the biovecs in the new bio - which is silly as it's not needed. So, don't use bi_vcnt anymore. + + * The current interface allows the block layer to split bios as needed, so we + could eliminate a lot of complexity particularly in stacked drivers. Code + that creates bios can then create whatever size bios are convenient, and + more importantly stacked drivers don't have to deal with both their own bio + size limitations and the limitations of the underlying devices. Thus + there's no need to define ->merge_bvec_fn() callbacks for individual block + drivers. -- cgit v1.2.3