summaryrefslogtreecommitdiff
path: root/drivers/md/persistent-data
AgeCommit message (Collapse)Author
2018-04-03dm bufio: move dm-bufio.h to include/linux/Mikulas Patocka
Move dm-bufio.h to include/linux/ so that external GPL'd DM target modules can use it. It is better to allow the use of dm-bufio than force external modules to implement the equivalent buffered IO mechanism in some new way. The hope is this will encourage the use of dm-bufio; which will then make it easier for a GPL'd external DM target module to be included upstream. A couple dm-bufio EXPORT_SYMBOL exports have also been updated to use EXPORT_SYMBOL_GPL. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm btree: fix serious bug in btree_split_beneath()Joe Thornber
When inserting a new key/value pair into a btree we walk down the spine of btree nodes performing the following 2 operations: i) space for a new entry ii) adjusting the first key entry if the new key is lower than any in the node. If the _root_ node is full, the function btree_split_beneath() allocates 2 new nodes, and redistibutes the root nodes entries between them. The root node is left with 2 entries corresponding to the 2 new nodes. btree_split_beneath() then adjusts the spine to point to one of the two new children. This means the first key is never adjusted if the new key was lower, ie. operation (ii) gets missed out. This can result in the new key being 'lost' for a period; until another low valued key is inserted that will uncover it. This is a serious bug, and quite hard to make trigger in normal use. A reproducing test case ("thin create devices-in-reverse-order") is available as part of the thin-provision-tools project: https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593 Fix the issue by changing btree_split_beneath() so it no longer adjusts the spine. Instead it unlocks both the new nodes, and lets the main loop in btree_insert_raw() relock the appropriate one and make any neccessary adjustments. Cc: stable@vger.kernel.org Reported-by: Monty Pavel <monty_pavel@sina.com> Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-14Merge tag 'for-4.15/dm' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - a few conversions from atomic_t to ref_count_t - a DM core fix for a race during device destruction that could result in a BUG_ON - a stable@ fix for a DM cache race condition that could lead to data corruption when operating in writeback mode (writethrough is default) - various DM cache cleanups and improvements - add DAX support to the DM log-writes target - a fix for the DM zoned target's ability to deal with the last zone of the drive being smaller than all others - a stable@ DM crypt and DM integrity fix for a negative check that was to restrictive (prevented slab debug with XFS ontop of DM crypt from working) - a DM raid target fix for a panic that can occur when forcing a raid to sync * tag 'for-4.15/dm' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (25 commits) dm cache: lift common migration preparation code to alloc_migration() dm cache: remove usused deferred_cells member from struct cache dm cache policy smq: allocate cache blocks in order dm cache policy smq: change max background work from 10240 to 4096 blocks dm cache background tracker: limit amount of background work that may be issued at once dm cache policy smq: take origin idle status into account when queuing writebacks dm cache policy smq: handle races with queuing background_work dm raid: fix panic when attempting to force a raid to sync dm integrity: allow unaligned bv_offset dm crypt: allow unaligned bv_offset dm: small cleanup in dm_get_md() dm: fix race between dm_get_from_kobject() and __dm_destroy() dm: allocate struct mapped_device with kvzalloc dm zoned: ignore last smaller runt zone dm space map metadata: use ARRAY_SIZE dm log writes: add support for DAX dm log writes: add support for inline data buffers dm cache: simplify get_per_bio_data() by removing data_size argument dm cache: remove all obsolete writethrough-specific code dm cache: submit writethrough writes in parallel to origin and cache ...
2017-11-10dm space map metadata: use ARRAY_SIZEJérémy Lefaure
Using the ARRAY_SIZE macro improves the readability of the code. Found with Coccinelle with the following semantic patch: @r depends on (org || report)@ type T; T[] E; position p; @@ ( (sizeof(E)@p /sizeof(*E)) | (sizeof(E)@p /sizeof(E[...])) | (sizeof(E)@p /sizeof(T)) ) Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-15dm space map disk: fix some book keeping in the disk space mapJoe Thornber
When decrementing the reference count for a block, the free count wasn't being updated if the reference count went to zero. Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27dm block manager: remove an unused argument from dm_block_manager_create()Bart Van Assche
The 'cache_size' argument of dm_block_manager_create() has never been used. Remove it along with the definitions of the constants passed as the 'cache_size' argument. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24dm btree: fix for dm_btree_find_lowest_key()Vinothkumar Raja
dm_btree_find_lowest_key() is giving incorrect results. find_key() traverses the btree correctly for finding the highest key, but there is an error in the way it traverses the btree for retrieving the lowest key. dm_btree_find_lowest_key() fetches the first key of the rightmost block of the btree instead of fetching the first key from the leftmost block. Fix this by conditionally passing the correct parameter to value64() based on the @find_highest flag. Cc: stable@vger.kernel.org Signed-off-by: Erez Zadok <ezk@fsl.cs.sunysb.edu> Signed-off-by: Vinothkumar Raja <vinraja@cs.stonybrook.edu> Signed-off-by: Nidhi Panpalia <npanpalia@cs.stonybrook.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-03-02sched/headers: Prepare to move the get_task_struct()/put_task_struct() and ↵Ingo Molnar
related APIs from <linux/sched.h> to <linux/sched/task.h> But first update usage sites with the new header dependency. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-21Merge tag 'dm-4.11-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Fix dm-raid transient device failure processing and other smaller tweaks. - Add journal support to the DM raid target to close the 'write hole' on raid 4/5/6. - Fix dm-cache corruption, due to rounding bug, when cache exceeds 2TB. - Add 'metadata2' feature to dm-cache to separate the dirty bitset out from other cache metadata. This improves speed of shutting down a large cache device (which implies writing out dirty bits). - Fix a memory leak during dm-stats data structure destruction. - Fix a DM multipath round-robin path selector performance regression that was caused by less precise balancing across all paths. - Lastly, introduce a DM core fix for a long-standing DM snapshot deadlock that is rooted in the complexity of the device stack used in conjunction with block core maintaining bios on current->bio_list to manage recursion in generic_make_request(). A more comprehensive fix to block core (and its hook in the cpu scheduler) would be wonderful but this DM-specific fix is pragmatic considering how difficult it has been to make progress on a generic fix. * tag 'dm-4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits) dm: flush queued bios when process blocks to avoid deadlock dm round robin: revert "use percpu 'repeat_count' and 'current_path'" dm stats: fix a leaked s->histogram_boundaries array dm space map metadata: constify dm_space_map structures dm cache metadata: use cursor api in blocks_are_clean_separate_dirty() dm persistent data: add cursor skip functions to the cursor APIs dm cache metadata: use dm_bitset_new() to create the dirty bitset in format 2 dm bitset: add dm_bitset_new() dm cache metadata: name the cache block that couldn't be loaded dm cache metadata: add "metadata2" feature dm cache metadata: use bitset cursor api to load discard bitset dm bitset: introduce cursor api dm btree: use GFP_NOFS in dm_btree_del() dm space map common: memcpy the disk root to ensure it's arch aligned dm block manager: add unlikely() annotations on dm_bufio error paths dm cache: fix corruption seen when using cache > 2TB dm raid: cleanup awkward branching in raid_message() option processing dm raid: use mddev rather than rdev->mddev dm raid: use read_disk_sb() throughout dm raid: add raid4/5/6 journaling support ...
2017-02-16dm space map metadata: constify dm_space_map structuresBhumika Goyal
Declare dm_space_map structures as const as they are only passed as an argument to the function memcpy. This argument is of type const void *, so dm_space_map structures having this property can be declared as const. File size before: text data bss dec hex filename 4889 240 0 5129 1409 dm-space-map-metadata.o File size after: text data bss dec hex filename 5139 0 0 5139 1413 dm-space-map-metadata.o Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16dm persistent data: add cursor skip functions to the cursor APIsJoe Thornber
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16dm bitset: add dm_bitset_new()Joe Thornber
A more efficient way of creating a populated bitset. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16dm bitset: introduce cursor apiJoe Thornber
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16dm btree: use GFP_NOFS in dm_btree_del()Joe Thornber
dm_btree_del() is called from an ioctl so don't recurse into FS. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16dm space map common: memcpy the disk root to ensure it's arch alignedJoe Thornber
The metadata_space_map_root passed to sm_ll_open_metadata() may or may not be arch aligned, use memcpy to ensure it is. This is not a fast path so the extra memcpy doesn't hurt us. Long-term it'd be better to use the kernel's alignment infrastructure to remove the memcpy()s that are littered across persistent-data (btree, array, space-maps, etc). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16dm block manager: add unlikely() annotations on dm_bufio error pathsJoe Thornber
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-01-14sched/core: Remove set_task_state()Davidlohr Bueso
This is a nasty interface and setting the state of a foreign task must not be done. As of the following commit: be628be0956 ("bcache: Make gc wakeup sane, remove set_task_state()") ... everyone in the kernel calls set_task_state() with current, allowing the helper to be removed. However, as the comment indicates, it is still around for those archs where computing current is more expensive than using a pointer, at least in theory. An important arch that is affected is arm64, however this has been addressed now [1] and performance is up to par making no difference with either calls. Of all the callers, if any, it's the locking bits that would care most about this -- ie: we end up passing a tsk pointer to a lot of the lock slowpath, and setting ->state on that. The following numbers are based on two tests: a custom ad-hoc microbenchmark that just measures latencies (for ~65 million calls) between get_task_state() vs get_current_state(). Secondly for a higher overview, an unlink microbenchmark was used, which pounds on a single file with open, close,unlink combos with increasing thread counts (up to 4x ncpus). While the workload is quite unrealistic, it does contend a lot on the inode mutex or now rwsem. [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com == 1. x86-64 == Avg runtime set_task_state(): 601 msecs Avg runtime set_current_state(): 552 msecs vanilla dirty Hmean unlink1-processes-2 36089.26 ( 0.00%) 38977.33 ( 8.00%) Hmean unlink1-processes-5 28555.01 ( 0.00%) 29832.55 ( 4.28%) Hmean unlink1-processes-8 37323.75 ( 0.00%) 44974.57 ( 20.50%) Hmean unlink1-processes-12 43571.88 ( 0.00%) 44283.01 ( 1.63%) Hmean unlink1-processes-21 34431.52 ( 0.00%) 38284.45 ( 11.19%) Hmean unlink1-processes-30 34813.26 ( 0.00%) 37975.17 ( 9.08%) Hmean unlink1-processes-48 37048.90 ( 0.00%) 39862.78 ( 7.59%) Hmean unlink1-processes-79 35630.01 ( 0.00%) 36855.30 ( 3.44%) Hmean unlink1-processes-110 36115.85 ( 0.00%) 39843.91 ( 10.32%) Hmean unlink1-processes-141 32546.96 ( 0.00%) 35418.52 ( 8.82%) Hmean unlink1-processes-172 34674.79 ( 0.00%) 36899.21 ( 6.42%) Hmean unlink1-processes-203 37303.11 ( 0.00%) 36393.04 ( -2.44%) Hmean unlink1-processes-224 35712.13 ( 0.00%) 36685.96 ( 2.73%) == 2. ppc64le == Avg runtime set_task_state(): 938 msecs Avg runtime set_current_state: 940 msecs vanilla dirty Hmean unlink1-processes-2 19269.19 ( 0.00%) 30704.50 ( 59.35%) Hmean unlink1-processes-5 20106.15 ( 0.00%) 21804.15 ( 8.45%) Hmean unlink1-processes-8 17496.97 ( 0.00%) 17243.28 ( -1.45%) Hmean unlink1-processes-12 14224.15 ( 0.00%) 17240.21 ( 21.20%) Hmean unlink1-processes-21 14155.66 ( 0.00%) 15681.23 ( 10.78%) Hmean unlink1-processes-30 14450.70 ( 0.00%) 15995.83 ( 10.69%) Hmean unlink1-processes-48 16945.57 ( 0.00%) 16370.42 ( -3.39%) Hmean unlink1-processes-79 15788.39 ( 0.00%) 14639.27 ( -7.28%) Hmean unlink1-processes-110 14268.48 ( 0.00%) 14377.40 ( 0.76%) Hmean unlink1-processes-141 14023.65 ( 0.00%) 16271.69 ( 16.03%) Hmean unlink1-processes-172 13417.62 ( 0.00%) 16067.55 ( 19.75%) Hmean unlink1-processes-203 15293.08 ( 0.00%) 15440.40 ( 0.96%) Hmean unlink1-processes-234 13719.32 ( 0.00%) 16190.74 ( 18.01%) Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%) Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%) Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%) x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching) and ppc64 (with paca) show similar improvements in the unlink microbenches. The small delta for ppc64 (2ms), does not represent the gains on the unlink runs. In the case of x86, there was a decent amount of variation in the latency runs, but always within a 20 to 50ms increase), ppc was more constant. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: mark.rutland@arm.com Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-08dm space map: always set ev if sm_ll_mutate() succeedsBenjamin Marzinski
If no block was allocated or freed, sm_ll_mutate() wasn't setting *ev, leaving the variable unitialized. sm_ll_insert(), sm_disk_inc_block(), and sm_disk_new_block() all check ev to see if there was an allocation event in sm_ll_mutate(), possibly reading unitialized data. If no allocation event occured, sm_ll_mutate() should set *ev to SM_NONE. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-12-08dm space map metadata: skip useless memcpy in metadata_ll_init_index()Benjamin Marzinski
When metadata_ll_init_index() is called by sm_ll_new_metadata(), ll->mi_le hasn't been initialized yet. So, when metadata_ll_init_index() copies the contents of ll->mi_le into the newly allocated bitmap_root, it is just copying garbage. ll->mi_le will be allocated later in sm_ll_extend() and copied into the bitmap_root, in sm_ll_commit(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-12-08dm space map metadata: fix 'struct sm_metadata' leak on failed createBenjamin Marzinski
In dm_sm_metadata_create() we temporarily change the dm_space_map operations from 'ops' (whose .destroy function deallocates the sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't). If dm_sm_metadata_create() fails in sm_ll_new_metadata() or sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls dm_sm_destroy() with the intention of freeing the sm_metadata, but it doesn't (because the dm_space_map operations is still set to 'bootstrap_ops'). Fix this by setting the dm_space_map operations back to 'ops' if dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2016-12-08dm array: remove a dead assignment in populate_ablock_with_values()Bart Van Assche
A value is assigned to 'nr_entries' but is never used, remove it. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-11-14dm block manager: make block locking optionalJoe Thornber
The block manager's locking is useful for catching cycles that may result from certain btree metadata corruption. But in general it serves as a developer tool to catch bugs in code. Unless you're finding that DM thin provisioning is hanging due to infinite loops within the block manager's access to btree nodes you can safely disable this feature. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> # do/while(0) macro fix Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22dm array: introduce cursor apiJoe Thornber
More efficient way to iterate an array due to prefetching (makes use of the new dm_btree_cursor_* api). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22dm btree: introduce cursor apiJoe Thornber
This uses prefetching to speed up iteration through a btree. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22dm array: add dm_array_new()Joe Thornber
dm_array_new() creates a new, populated array more efficiently than starting with an empty one and resizing. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20dm btree: fix a bug in dm_btree_find_next_single()Joe Thornber
dm_btree_find_next_single() can short-circuit the search for a block with a return of -ENODATA if all entries are higher than the search key passed to lower_bound(). This hasn't been a problem because of the way the btree has been used by DM thinp. But it must be fixed now in preparation for fixing the race in DM thinp's handling of simultaneous block discard vs allocation. Otherwise, once that fix is in place, some of the blocks in a discard would not be unmapped as expected. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-14dm space map metadata: remove unused variable in brb_pop()Mike Snitzer
Remove the unused struct block_op pointer that was inadvertantly introduced, via cut-and-paste of previous brb_op() code, as part of commit 50dd842ad. (Cc'ing stable@ because commit 50dd842ad did) Fixes: 50dd842ad ("dm space map metadata: fix ref counting bug when bootstrapping a new space map") Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-12-10dm btree: factor out need_insert() helperMike Snitzer
Eliminates code duplication within insert(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10dm bufio: store stacktrace in buffers to help find buffer leaksMikulas Patocka
The option DM_DEBUG_BLOCK_STACK_TRACING is moved from persistent-data directory to device mapper directory because it will now be used by persistent-data and bufio. When the option is enabled, each bufio buffer stores the stacktrace of the last dm_bufio_get(), dm_bufio_read() or dm_bufio_new() call that increased the hold count to 1. The buffer's stacktrace is printed if the buffer was not released before the bufio client is destroyed. When DM_DEBUG_BLOCK_STACK_TRACING is enabled, any bufio buffer leaks are considered warnings - i.e. the kernel continues afterwards. If not enabled, buffer leaks are considered BUGs and the kernel with crash. Reasoning on this disposition is: if we only ever warned on buffer leaks users would generally ignore them and the problematic code would never get fixed. Successfully used to find source of bufio leaks fixed with commit fce079f63c3 ("dm btree: fix bufio buffer leaks in dm_btree_del() error path"). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10dm block manager: cleanup code that prints stacktraceMikulas Patocka
There is no need to record stack trace and immediately print it. Just use dump_stack() to print the current stack. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10dm btree: fix bufio buffer leaks in dm_btree_del() error pathJoe Thornber
If dm_btree_del()'s call to push_frame() fails, e.g. due to btree_node_validator finding invalid metadata, the dm_btree_del() error path must unlock all frames (which have active dm-bufio buffers) that were pushed onto the del_stack. Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio buffers have leaked, e.g.: device-mapper: bufio: leaked buffer 3, hold count 1, list 0 Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-12-09dm space map metadata: fix ref counting bug when bootstrapping a new space mapJoe Thornber
When applying block operations (BOPs) do not remove them from the uncommitted BOP ring-buffer until after they've been applied -- in case we recurse. Also, perform BOP_INC operation, in dm_sm_metadata_create() and sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather than using direct calls to sm_ll_inc(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-12-02dm thin metadata: fix bug in dm_thin_remove_range()Joe Thornber
dm_btree_remove_leaves() only unmaps a contiguous region so we need a loop, in __remove_range(), to handle ranges that contain multiple regions. A new btree function, dm_btree_lookup_next(), is introduced which is more efficiently able to skip over regions of the thin device which aren't mapped. __remove_range() uses dm_btree_lookup_next() for each iteration of __remove_range()'s loop. Also, improve description of dm_btree_remove_leaves(). Fixes: 6550f075 ("dm thin metadata: add dm_thin_remove_range()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.1+
2015-12-02dm btree: fix leak of bufio-backed block in btree_split_sibling error pathMike Snitzer
The block allocated at the start of btree_split_sibling() is never released if later insert_at() fails. Fix this by releasing the previously allocated bufio block using unlock_block(). Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-11-04Merge tag 'dm-4.4-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: "Smaller set of DM changes for this merge. I've based these changes on Jens' for-4.4/reservations branch because the associated DM changes required it. - Revert a dm-multipath change that caused a regression for unprivledged users (e.g. kvm guests) that issued ioctls when a multipath device had no available paths. - Include Christoph's refactoring of DM's ioctl handling and add support for passing through persistent reservations with DM multipath. - All other changes are very simple cleanups" * tag 'dm-4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm switch: simplify conditional in alloc_region_table() dm delay: document that offsets are specified in sectors dm delay: capitalize the start of an delay_ctr() error message dm delay: Use DM_MAPIO macros instead of open-coded equivalents dm linear: remove redundant target name from error messages dm persistent data: eliminate unnecessary return values dm: eliminate unused "bioset" process for each bio-based DM device dm: convert ffs to __ffs dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() dm: add support for passing through persistent reservations dm: refactor ioctl handling Revert "dm mpath: fix stalls when handling invalid ioctls" dm: initialize non-blk-mq queue data before queue is used
2015-10-31dm persistent data: eliminate unnecessary return valuesMikulas Patocka
dm_bm_unlock and dm_tm_unlock return an integer value but the returned value is always 0. The calling code sometimes checks the return value and sometimes doesn't. Eliminate these unnecessary return values and also the checks for them. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-10-23dm btree: fix leak of bufio-backed block in btree_split_beneath error pathMike Snitzer
btree_split_beneath()'s error path had an outstanding FIXME that speaks directly to the potential for _not_ cleaning up a previously allocated bufio-backed block. Fix this by releasing the previously allocated bufio block using unlock_block(). Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com> Cc: stable@vger.kernel.org
2015-10-23dm btree remove: fix a bug when rebalancing nodes after removalJoe Thornber
Commit 4c7e309340ff ("dm btree remove: fix bug in redistribute3") wasn't a complete fix for redistribute3(). The redistribute3 function takes 3 btree nodes and shares out the entries evenly between them. If the three nodes in total contained (MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in the center. Fix this issue by being more careful about calculating the target number of entries for the left and right nodes. Unit tested in userspace using this program: https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-08-12dm: remove unlikely() before IS_ERR()viresh kumar
IS_ERR() already contains an 'unlikely' compiler flag so there is no need to do that again from IS_ERR() callers. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-12dm btree remove: remove unused function get_nr_entries()Vivek Goyal
rebalance_children() calls get_nr_entries() and assigns the result to an unused local 'child_entries' variable. Remove get_nr_entries() and cleanup rebalance_children() accordingly. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-12dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling()Vivek Goyal
Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-12dm btree: add ref counting ops for the leaves of top level btreesJoe Thornber
When using nested btrees, the top leaves of the top levels contain block addresses for the root of the next tree down. If we shadow a shared leaf node the leaf values (sub tree roots) should be incremented accordingly. This is only an issue if there is metadata sharing in the top levels. Which only occurs if metadata snapshots are being used (as is possible with dm-thinp). And could result in a block from the thinp metadata snap being reused early, thus corrupting the thinp metadata snap. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-08-07dm btree remove: fix bug in remove_one()Joe Thornber
remove_one() was not incrementing the key for the beginning of the range, so not all entries were being removed. This resulted in discards that were not unmapping all blocks. Fixes: 4ec331c3ea ("dm btree: add dm_btree_remove_leaves()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-06dm btree: silence lockdep lock inversion in dm_btree_del()Joe Thornber
Allocate memory using GFP_NOIO when deleting a btree. dm_btree_del() can be called via an ioctl and we don't want to recurse into the FS or block layer. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-07-05dm btree remove: fix bug in redistribute3Dennis Yang
redistribute3() shares entries out across 3 nodes. Some entries were being moved the wrong way, breaking the ordering. This manifested as a BUG() in dm-btree-remove.c:shift() when entries were removed from the btree. For additional context see: https://www.redhat.com/archives/dm-devel/2015-May/msg00113.html Signed-off-by: Dennis Yang <shinrairis@gmail.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-06-17dm space map metadata: fix occasional leak of a metadata block on resizeJoe Thornber
The metadata space map has a simplified 'bootstrap' mode that is operational when extending the space maps. Whilst in this mode it's possible for some refcount decrement operations to become queued (eg, as a result of shadowing one of the bitmap indexes). These decrements were not being applied when switching out of bootstrap mode. The effect of this bug was the leaking of a 4k metadata block. This is detected by the latest version of thin_check as a non fatal error. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-06-11dm btree: add dm_btree_remove_leaves()Joe Thornber
Removes a range of leaf values from the tree. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29dm thin metadata: remove in-core 'read_only' flagMike Snitzer
Leverage the block manager's read_only flag instead of duplicating it; access with new dm_bm_is_read_only() method. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-21Merge tag 'dm-3.20-changes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull more device mapper changes from Mike Snitzer: - Significant dm-crypt CPU scalability performance improvements thanks to changes that enable effective use of an unbound workqueue across all available CPUs. A large battery of tests were performed to validate these changes, summary of results is available here: https://www.redhat.com/archives/dm-devel/2015-February/msg00106.html - A few additional stable fixes (to DM core, dm-snapshot and dm-mirror) and a small fix to the dm-space-map-disk. * tag 'dm-3.20-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm snapshot: fix a possible invalid memory access on unload dm: fix a race condition in dm_get_md dm crypt: sort writes dm crypt: add 'submit_from_crypt_cpus' option dm crypt: offload writes to thread dm crypt: remove unused io_pool and _crypt_io_pool dm crypt: avoid deadlock in mempools dm crypt: don't allocate pages for a partial request dm crypt: use unbound workqueue for request processing dm io: reject unsupported DISCARD requests with EOPNOTSUPP dm mirror: do not degrade the mirror on discard error dm space map disk: fix sm_disk_count_is_more_than_one()