summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-06-05dm zoned: allocate zone by device indexHannes Reinecke
When allocating a zone, pass in an indicator on which device the zone should be allocated; this increases performance for a multi-device setup because reclaim will now allocate zones on the device for which reclaim is running. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: support arbitrary number of devicesHannes Reinecke
Remove the hard-coded limit of two devices and support an unlimited number of additional zoned devices. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: move random and sequential zones into struct dmz_devHannes Reinecke
Random and sequential zones should be part of the respective device structure to make arbitration between devices possible. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: per-device reclaimHannes Reinecke
Instead of having one reclaim workqueue for the entire set we should be allocating a reclaim workqueue per device; doing so will reduce contention and should boost performance for a multi-device setup. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: add metadata pointer to struct dmz_devHannes Reinecke
Add a metadata pointer within struct dmz_dev and use it as argument for blkdev_report_zones() instead of the metadata itself. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: add device pointer to struct dm_zoneHannes Reinecke
Add a pointer, to the containing device, within struct dm_zone and kill dmz_zone_to_dev(). Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: allocate temporary superblock for tertiary devicesHannes Reinecke
Checking the tertiary superblock just consists of validating UUIDs, crcs, and the generation number; it doesn't have contents which would be required during the actual operation. So allocate a temporary superblock when checking tertiary devices to avoid having to store it together with the 'real' superblocks. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: convert to xarrayHannes Reinecke
The zones array is getting really large, and large arrays tend to wreak havoc with the CPU caches. So convert it to xarray to become more cache friendly. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> # fix leak in dmz_insert Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: add a 'reserved' zone flagHannes Reinecke
Instead of counting the number of reserved zones in dmz_free_zone(), mark the zone as 'reserved' during allocation and simplify dmz_free_zone(). Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: improve logging messages for reclaimHannes Reinecke
Instead of just reporting the errno, add some more verbose debugging message in the reclaim path. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: avoid unnecessary device recalulation for secondary superblockHannes Reinecke
The secondary superblock must reside on the same device as the primary superblock, so there is no need to re-calculate the device. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: add debugging message for reading superblocksHannes Reinecke
Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm ebs: use dm_bufio_forget_buffersMikulas Patocka
Use dm_bufio_forget_buffers instead of a block-by-block loop that calls dm_bufio_forget. dm_bufio_forget_buffers is faster than the loop because it searches for used buffers using rb-tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm bufio: introduce forget_buffer_lockedMikulas Patocka
Introduce a function forget_buffer_locked that forgets a range of buffers. It is more efficient than calling forget_buffer in a loop. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm bufio: clean up rbtree block orderingMikulas Patocka
dm-bufio uses unnatural ordering in the rb-tree - blocks with smaller numbers were put to the right node and blocks with bigger numbers were put to the left node. Reverse that logic so that it's natural. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm integrity: add status line documentationMikulas Patocka
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-04dm bufio: delete unused and inefficient dm_bufio_discard_buffersMikulas Patocka
There is no user for this interface. If in future it is needed it can be reimplemented to walk the rbtree of buffers instead of doing block-by-block lookups. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-22dm zoned: remove leftover hunk for switching to sequential zonesHannes Reinecke
Remove a leftover hunk to switch from random zones to sequential zones when selecting a reclaim zone; the logic has moved into the caller and this hunk is now pointless. Fixes: 34f5affd04c4 ("dm zoned: separate random and cache zones") Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: terminate reclaim on congestionHannes Reinecke
When dmz_get_chunk_mapping() selects a zone which is under reclaim we should terminate the reclaim copy process. Since we're changing the zone itself, reclaim needs to run afterwards again anyway. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: start reclaim with sequential zonesHannes Reinecke
Sequential zones perform better for reclaim, so start off using them and only use random zones as a fallback when cache zones are present. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: reclaim random zones when idleHannes Reinecke
When the system is idle we should be starting reclaiming random zones, too. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: separate random and cache zonesHannes Reinecke
Instead of lumping emulated zones together with random zones we should be handling them as separate 'cache' zones. This improves code readability and allows an easier implementation of different cache policies. Also add additional allocation flags, to separate the type (cache, random, or sequential) from the purpose (eg reclaim). Also switch the allocation policy to not use random zones as buffer zones if cache zones are present. This avoids a performance drop when all cache zones are used. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: return NULL if dmz_get_zone_for_reclaim() fails to find a zoneHannes Reinecke
The only case where dmz_get_zone_for_reclaim() cannot return a zone is if the respective lists are empty. So we should just return a simple NULL value here as we really don't have an error code which would make sense. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: Avoid 64-bit division error in dmz_fixup_devicesNathan Chancellor
When building arm32 allyesconfig: ld.lld: error: undefined symbol: __aeabi_uldivmod >>> referenced by dm-zoned-target.c >>> md/dm-zoned-target.o:(dmz_ctr) in archive drivers/built-in.a dmz_fixup_devices uses DIV_ROUND_UP with variables of type sector_t. As such, it should be using DIV_ROUND_UP_SECTOR_T, which handles this automatically. Fixes: 70978208ec91 ("dm zoned: metadata version 2") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm: use DMDEBUG macros now that they use pr_debug variantsMike Snitzer
Now that DMDEBUG uses pr_debug and DMDEBUG_LIMIT uses pr_debug_ratelimited cleanup DM's 2 direct pr_debug callers to use them to get the benefit of consistent DM_FMT formatting of debugging messages. While doing so, dm-mpath.c:dm_report_EIO() was switched over to using DMDEBUG_LIMIT due to the potential for error handling floods in the IO completion path. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: remove spurious newlines from debugging messagesHannes Reinecke
DMDEBUG will already add a newline to the logging messages, so we shouldn't be adding it to the message itself. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm: use dynamic debug instead of compile-time config optionHannes Reinecke
Switch to use dynamic debug to avoid having recompile the kernel just to enable debugging messages. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm: replace zero-length array with flexible-arrayGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: metadata version 2Hannes Reinecke
Implement handling for metadata version 2. The new metadata adds a label and UUID for the device mapper device, and additional UUID for the underlying block devices. It also allows for an additional regular drive to be used for emulating random access zones. The emulated zones will be placed logically in front of the zones from the zoned block device, causing the superblocks and metadata to be stored on that device. The first zone of the original zoned device will be used to hold another, tertiary copy of the metadata; this copy carries a generation number of 0 and is never updated; it's just used for identification. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: ignore metadata zone in dmz_alloc_zone()Hannes Reinecke
When looking up zones in dmz_alloc_zone() we need to ignore metadata zones so as not to accidentally overwrite metadata. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: Reduce logging output on startupHannes Reinecke
dm-zoned is becoming quite chatty during startup; reduce the noise by moving some information to 'debug' level. Suggested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: add metadata logging functionsHannes Reinecke
Use the metadata label for logging and not the underlying device. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-19dm zoned: use dmz_zone_to_dev() when handling metadata I/OHannes Reinecke
Use accessors to retrieve the device pointer in preparation for adding an additional block device. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-19dm zoned: replace 'target' pointer in the bio contextHannes Reinecke
Replace the 'target' pointer in the bio context with the device pointer as this is what's actually used. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-19dm zoned: remove 'dev' argument from reclaimHannes Reinecke
Use the dmz_zone_to_dev() mapping function to remove the 'dev' argument from reclaim. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: Introduce dmz_dev_is_dying() and dmz_check_dev()Hannes Reinecke
Introduce accessors dmz_dev_is_dying() and dmz_check_dev() to avoid having to reference the devices directly. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: introduce dmz_metadata_label() to format device nameHannes Reinecke
Introduce dmz_metadata_label() to format the device-mapper device name and use it instead of the device name of the underlying device. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: move fields from struct dmz_dev to dmz_metadataHannes Reinecke
Move fields from the device structure into the metadata structure and provide accessor functions. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: store device in struct dmz_sbHannes Reinecke
Store the device together with the superblock so that we don't have to recur to the metadata to find it. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: use array for superblock zonesHannes Reinecke
Instead of storing just the first superblock zone and calculate the secondary relative to that we should be using an array for holding the superblock zones. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: store zone id within the zone structure and kill dmz_id()Hannes Reinecke
Instead of calculating the zone index by the offset within the zone array store the index within the structure itself. With that the helper dmz_id() is pointless and can be replaced with accessing the ->id value directly. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: add 'message' callbackHannes Reinecke
Add callback for 'dmsetup message' to allow the reclaim process to be triggered manually. Eg. dmsetup message /dev/dm-X 0 message will start the reclaim process even if the default threshold of 50 percent of free random zones is not reached. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: add 'status' callbackHannes Reinecke
Add callback to supply information for 'dmsetup status' and 'dmsetup table'. The output for 'dmsetup status' is 0 <size> zoned <nr_zones> zones <nr_unmap_rnd>/<nr_rnd> random <nr_unmap_seq>/<nr_seq> sequential where <nr_unmap_rnd> is the number of unmapped (ie free) random zones, <nr_rnd> the total number of random zones, <nr_unmap_seq> the number of unmapped sequential zones, and <nr_seq> the total number of sequential zones. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm mpath: add Historical Service Time Path SelectorKhazhismel Kumykov
This new selector keeps an exponential moving average of the service time for each path (losely defined as delta between start_io and end_io), and uses this along with the number of inflight requests to estimate future service time for a path. Since we don't have a prober to account for temporally slow paths, re-try "slow" paths every once in a while (num_paths * historical_service_time). To account for fast paths transitioning to slow, if a path has not completed any request within (num_paths * historical_service_time), limit the number of outstanding requests. To account for low volume situations where number of inflight IOs would be zero, the last finish time of each path is factored in. Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Co-developed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm mpath: pass IO start time to path selectorGabriel Krisman Bertazi
The HST path selector needs this information to perform path prediction. For request-based mpath, struct request's io_start_time_ns is used, while for bio-based, use the start_time stored in dm_io. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm writecache: improve performance on DDR persistent memory (Optane)Mikulas Patocka
When testing the dm-writecache target on a real DDR persistent memory (Intel Optane), it turned out that explicit cache flushing using the clflushopt instruction performs better than non-temporal stores for block sizes 1k, 2k and 4k. The dm-writecache target is singlethreaded (all the copying is done while holding the writecache lock), so it benefits from clwb, see: http://lore.kernel.org/r/alpine.LRH.2.02.2004160411460.7833@file01.intranet.prod.int.rdu2.redhat.com Add a new function memcpy_flushcache_optimized() that tests if clflushopt is present - and if it is, we use it instead of memcpy_flushcache. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm writecache: remove superfluous test in persistent_memory_claimMikulas Patocka
Remove superfluous test if dax_dev is NULL - dax_direct_access already does this test. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm persistent data: switch exit_ro_spine to return voidZhiqiang Liu
In commit 4c7da06f5a78 ("dm persistent data: eliminate unnecessary return values"), r value in exit_ro_spine will not change, so exit_ro_spine doesn't need a return value. Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm integrity: remove set but not used variablesYueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: drivers/md/dm-integrity.c: In function 'integrity_metadata': drivers/md/dm-integrity.c:1557:12: warning: variable 'save_metadata_offset' set but not used [-Wunused-but-set-variable] drivers/md/dm-integrity.c:1556:12: warning: variable 'save_metadata_block' set but not used [-Wunused-but-set-variable] They are never used, so remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm ebs: pass discards down to underlying deviceHeinz Mauelshagen
Make use of dm_bufio_issue_discard() to pass discards down to the underlying device. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>