summaryrefslogtreecommitdiff
path: root/fs/btrfs/dev-replace.c
AgeCommit message (Collapse)Author
2016-09-26btrfs: convert pr_* to btrfs_* where possibleJeff Mahoney
For many printks, we want to know which file system issued the message. This patch converts most pr_* calls to use the btrfs_* versions instead. In some cases, this means adding plumbing to allow call sites access to an fs_info pointer. fs/btrfs/check-integrity.c is left alone for another day. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26btrfs: unsplit printed stringsJeff Mahoney
CodingStyle chapter 2: "[...] never break user-visible strings such as printk messages, because that breaks the ability to grep for them." This patch unsplits user-visible strings. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26btrfs: btrfs_test_opt and friends should take a btrfs_fs_infoJeff Mahoney
btrfs_test_opt and friends only use the root pointer to access the fs_info. Let's pass the fs_info directly in preparation to eliminate similar patterns all over btrfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-25Merge branch 'cleanups-4.7' into for-chris-4.7-20160525David Sterba
2016-05-25btrfs: fix string and comment grammatical issues and typosNicholas D Steeves
Signed-off-by: Nicholas D Steeves <nsteeves@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-17Merge branch 'for-chris-4.7' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.7 Signed-off-by: Chris Mason <clm@fb.com>
2016-05-13Btrfs: don't wait for unrelated IO to finish before relocationFilipe Manana
Before the relocation process of a block group starts, it sets the block group to readonly mode, then flushes all delalloc writes and then finally it waits for all ordered extents to complete. This last step includes waiting for ordered extents destinated at extents allocated in other block groups, making us waste unecessary time. So improve this by waiting only for ordered extents that fall into the block group's range. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
2016-05-04btrfs: cleanup assigning next active device with a checkAnand Jain
Creates helper fucntion as needed by the device delete and replace operations. Also now it checks if the next device being assigned is an active device. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-04btrfs: s_bdev is not null after missing replaceAnand Jain
Yauhen reported in the ML that s_bdev is null at mount, and s_bdev gets updated to some device when missing device is replaced, as because bdev is null for missing device, things gets matched up. Fix this by checking if s_bdev is set. I didn't want to completely remove updating s_bdev because the future multi device support at vfs layer may need it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reported-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: refactor btrfs_dev_replace_start for reuseAnand Jain
A refactor patch, and avoids user input verification in the btrfs_dev_replace_start(), and so this function can be reused. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: use fs_info directlyAnand Jain
Local variable fs_info, contains root->fs_info, use it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: rename btrfs_find_device_by_user_inputDavid Sterba
For clarity how we are going to find the device, let's call it a device specifier, devspec for short. Also rename the arguments that are a leftover from previous function purpose. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: enhance btrfs_find_device_by_user_input() to check device pathAnand Jain
The operation of device replace and device delete follows same steps upto some depth with in btrfs kernel, however they don't share codes. This enhancement will help replace and delete to share codes. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: create helper btrfs_find_device_by_user_input()Anand Jain
The patch renames btrfs_dev_replace_find_srcdev() to btrfs_find_device_by_user_input() and moves it to volumes.c, so that delete device can use it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-04btrfs: Reset IO error counters before start of device replacingYauhen Kharuzhy
If device replace entry was found on disk at mounting and its num_write_errors stats counter has non-NULL value, then replace operation will never be finished and -EIO error will be reported by btrfs_scrub_dev() because this counter is never reset. # mount -o degraded /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/ # btrfs replace status /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/ Started on 25.Mar 07:28:00, canceled on 25.Mar 07:28:01 at 0.0%, 40 write errs, 0 uncorr. read errs # btrfs replace start -B 4 /dev/sdg /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/ ERROR: ioctl(DEV_REPLACE_START) failed on "/media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/": Input/output error, no error Reset num_write_errors and num_uncorrectable_read_errors counters in the dev_replace structure before start of replacing. Signed-off-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-03-14btrfs: Fix misspellings in comments.Adam Buchbinder
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-26Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6David Sterba
2016-02-23Btrfs: fix lockdep deadlock warning due to dev_replaceLiu Bo
Xfstests btrfs/011 complains about a deadlock warning, [ 1226.649039] ========================================================= [ 1226.649039] [ INFO: possible irq lock inversion dependency detected ] [ 1226.649039] 4.1.0+ #270 Not tainted [ 1226.649039] --------------------------------------------------------- [ 1226.652955] kswapd0/46 just changed the state of lock: [ 1226.652955] (&delayed_node->mutex){+.+.-.}, at: [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0 [ 1226.652955] but this lock took another, RECLAIM_FS-unsafe lock in the past: [ 1226.652955] (&fs_info->dev_replace.lock){+.+.+.} and interrupts could create inverse lock ordering between them. [ 1226.652955] other info that might help us debug this: [ 1226.652955] Chain exists of: &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock [ 1226.652955] Possible interrupt unsafe locking scenario: [ 1226.652955] CPU0 CPU1 [ 1226.652955] ---- ---- [ 1226.652955] lock(&fs_info->dev_replace.lock); [ 1226.652955] local_irq_disable(); [ 1226.652955] lock(&delayed_node->mutex); [ 1226.652955] lock(&found->groups_sem); [ 1226.652955] <Interrupt> [ 1226.652955] lock(&delayed_node->mutex); [ 1226.652955] *** DEADLOCK *** Commit 084b6e7c7607 ("btrfs: Fix a lockdep warning when running xfstest.") tried to fix a similar one that has the exactly same warning, but with that, we still run to this. The above lock chain comes from btrfs_commit_transaction ->btrfs_run_delayed_items ... ->__btrfs_update_delayed_inode ... ->__btrfs_cow_block ... ->find_free_extent ->cache_block_group ->load_free_space_cache ->btrfs_readpages ->submit_one_bio ... ->__btrfs_map_block ->btrfs_dev_replace_lock However, with high memory pressure, tasks which hold dev_replace.lock can be interrupted by kswapd and then kswapd is intended to release memory occupied by superblock, inodes and dentries, where we may call evict_inode, and it comes to [ 1226.652955] [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0 [ 1226.652955] [<ffffffff81459e74>] btrfs_remove_delayed_node+0x24/0x30 [ 1226.652955] [<ffffffff8140c5fe>] btrfs_evict_inode+0x34e/0x700 delayed_node->mutex may be acquired in __btrfs_release_delayed_node(), and it leads to a ABBA deadlock. To fix this, we can use "blocking rwlock" used in the case of extent_buffer, but things are simpler here since we only needs read's spinlock to blocking lock. With this, btrfs/011 no more produces warnings in dmesg. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: scrub: use GFP_KERNEL on the submission pathDavid Sterba
Scrub is not on the critical writeback path we don't need to use GFP_NOFS for all allocations. The failures are handled and stats passed back to userspace. Let's use GFP_KERNEL on the paths where everything is ok, ie. setup the global structures and the IO submission paths. Functions that do the repair and fixups still use GFP_NOFS as we might want to skip any other filesystem activity if we encounter an error. This could turn out to be unnecessary, but requires more review compared to the easy cases in this patch. Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-15btrfs: cleanup, stop casting for extent_map->lookup everywhereJeff Mahoney
Overloading extent_map->bdev to struct map_lookup * might have started out as a means to an end, but it's a pattern that's used all over the place now. Let's get rid of the casting and just add a union instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-12Merge branch 'fix/waitqueue-barriers' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.4
2015-10-12Merge branch 'anand/sysfs-updates-v4.3-rc3' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.4 Signed-off-by: Chris Mason <clm@fb.com>
2015-10-10btrfs: remove waitqueue_active check from btrfs_rm_dev_replace_unblockedDavid Sterba
Normally the waitqueue_active would need a barrier, but this is not necessary here because it's not a performance sensitive context and we can call wake_up directly. Suggested-by: Chris Mason <clm@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08btrfs: switch message printers to _in_rcu variantsDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Btrfs: move kobj stuff out of dev_replace lock rangeLiu Bo
To avoid deadlock described in commit 084b6e7c7607 ("btrfs: Fix a lockdep warning when running xfstest."), we should move kobj stuff out of dev_replace lock range. "It is because the btrfs_kobj_{add/rm}_device() will call memory allocation with GFP_KERNEL, which may flush fs page cache to free space, waiting for it self to do the commit, causing the deadlock. To solve the problem, move btrfs_kobj_{add/rm}_device() out of the dev_replace lock range, also involing split the btrfs_rm_dev_replace_srcdev() function into remove and free parts. Now only btrfs_rm_dev_replace_remove_srcdev() is called in dev_replace lock range, and kobj_{add/rm} and btrfs_rm_dev_replace_free_srcdev() are called out of the lock range." Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> [added lockup description] Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-01Btrfs: kernel operation should come after user input has been verifiedAnand Jain
By general rule of thumb there shouldn't be any way that user land could trigger a kernel operation just by sending wrong arguments. Here do commit cleanups after user input has been verified. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename btrfs_kobj_rm_device to btrfs_sysfs_rm_device_linkAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-09-29Btrfs: rename btrfs_kobj_add_device to btrfs_sysfs_add_device_linkAnand Jain
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2015-08-31Btrfs: cleanup: remove unnecessary check before btrfs_free_path is calledTsutomu Itoh
We need not check path before btrfs_free_path() is called because path is checked in btrfs_free_path(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-07-22btrfs: its btrfs_err() instead of btrfs_error()Anand Jain
sorry I indented to use btrfs_err() and I have no idea how btrfs_error() got there. infact I was thinking about these kind of oversights since these two func are too closely named. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-19Btrfs: sysfs: add support to show replacing target in the sysfsAnand Jain
This patch will add support to show the replacing target in sysfs during the process of replacement. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-05-27Btrfs: sysfs btrfs_kobj_rm_device() pass fs_devices instead of fs_infoAnand Jain
since btrfs_kobj_rm_device() does nothing with fs_info Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-05-27Btrfs: sysfs btrfs_kobj_add_device() pass fs_devices instead of fs_infoAnand Jain
btrfs_kobj_add_device() does not need fs_info any more. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-03-03btrfs: cleanup 64bit/32bit divs, compile time constantsDavid Sterba
Switch to div_u64 if the divisor is a numeric constant or sum of sizeof()s. We can remove a few instances of do_div that has the hidden semtantics of changing the 1st argument. Small power-of-two divisors are converted to bitshifts, large values are kept intact for clarity. Signed-off-by: David Sterba <dsterba@suse.cz>
2015-01-21Btrfs: btrfs_rm_dev_replace_blocked(): Use wait_event()Zhao Lei
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-01-21Btrfs: Cleanup btrfs_bio_counter_inc_blocked()Zhao Lei
1: Remove no-need DEFINE_WAIT(wait) 2: Add likely() for BTRFS_FS_STATE_DEV_REPLACING condition 3: Use while loop instead of goto Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-12-02Merge branch 'raid56-scrub-replace' of git://github.com/miaoxie/linux-btrfs ↵Chris Mason
into for-linus
2014-12-03Btrfs, replace: enable dev-replace for raid56Zhao Lei
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2014-12-03Btrfs, raid56: fix use-after-free problem in the final device replace ↵Miao Xie
procedure on raid56 The commit c404e0dc (Btrfs: fix use-after-free in the finishing procedure of the device replace) fixed a use-after-free problem which happened when removing the source device at the end of device replace, but at that time, btrfs didn't support device replace on raid56, so we didn't fix the problem on the raid56 profile. Currently, we implemented device replace for raid56, so we need kick that problem out before we enable that function for raid56. The fix method is very simple, we just increase the bio per-cpu counter before we submit a raid56 io, and decrease the counter when the raid56 io ends. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2014-11-25btrfs: Fix a lockdep warning when running xfstest.Qu Wenruo
The following lockdep warning is triggered during xfstests: [ 1702.980872] ========================================================= [ 1702.981181] [ INFO: possible irq lock inversion dependency detected ] [ 1702.981482] 3.18.0-rc1 #27 Not tainted [ 1702.981781] --------------------------------------------------------- [ 1702.982095] kswapd0/77 just changed the state of lock: [ 1702.982415] (&delayed_node->mutex){+.+.-.}, at: [<ffffffffa03b0b51>] __btrfs_release_delayed_node+0x41/0x1f0 [btrfs] [ 1702.982794] but this lock took another, RECLAIM_FS-unsafe lock in the past: [ 1702.983160] (&fs_info->dev_replace.lock){+.+.+.} and interrupts could create inverse lock ordering between them. [ 1702.984675] other info that might help us debug this: [ 1702.985524] Chain exists of: &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock [ 1702.986799] Possible interrupt unsafe locking scenario: [ 1702.987681] CPU0 CPU1 [ 1702.988137] ---- ---- [ 1702.988598] lock(&fs_info->dev_replace.lock); [ 1702.989069] local_irq_disable(); [ 1702.989534] lock(&delayed_node->mutex); [ 1702.990038] lock(&found->groups_sem); [ 1702.990494] <Interrupt> [ 1702.990938] lock(&delayed_node->mutex); [ 1702.991407] *** DEADLOCK *** It is because the btrfs_kobj_{add/rm}_device() will call memory allocation with GFP_KERNEL, which may flush fs page cache to free space, waiting for it self to do the commit, causing the deadlock. To solve the problem, move btrfs_kobj_{add/rm}_device() out of the dev_replace lock range, also involing split the btrfs_rm_dev_replace_srcdev() function into remove and free parts. Now only btrfs_rm_dev_replace_remove_srcdev() is called in dev_replace lock range, and kobj_{add/rm} and btrfs_rm_dev_replace_free_srcdev() are called out of the lock range. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-11-20Btrfs: return failure if btrfs_dev_replace_finishing() failedEryu Guan
device replace could fail due to another running scrub process or any other errors btrfs_scrub_dev() may hit, but this failure doesn't get returned to userspace. The following steps could reproduce this issue mkfs -t btrfs -f /dev/sdb1 /dev/sdb2 mount /dev/sdb1 /mnt/btrfs while true; do btrfs scrub start -B /mnt/btrfs >/dev/null 2>&1; done & btrfs replace start -Bf /dev/sdb2 /dev/sdb3 /mnt/btrfs # if this replace succeeded, do the following and repeat until # you see this log in dmesg # BTRFS: btrfs_scrub_dev(/dev/sdb2, 2, /dev/sdb3) failed -115 #btrfs replace start -Bf /dev/sdb3 /dev/sdb2 /mnt/btrfs # once you see the error log in dmesg, check return value of # replace echo $? Introduce a new dev replace result BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS to catch -EINPROGRESS explicitly and return other errors directly to userspace. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: make the logic of source device removing more clearMiao Xie
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fix use-after-free problem of the device during device replaceMiao Xie
The problem is: Task0(device scan task) Task1(device replace task) scan_one_device() mutex_lock(&uuid_mutex) device = find_device() mutex_lock(&device_list_mutex) lock_chunk() rm_and_free_source_device unlock_chunk() mutex_unlock(&device_list_mutex) check device Destroying the target device if device replace fails also has the same problem. We fix this problem by locking uuid_mutex during destroying source device or target device, just like the device remove operation. It is a temporary solution, we can fix this problem and make the code more clear by atomic counter in the future. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: Fix misuse of chunk mutexMiao Xie
There were several problems about chunk mutex usage: - Lock chunk mutex when updating metadata. It would cause the nested deadlock because updating metadata might need allocate new chunks that need acquire chunk mutex. We remove chunk mutex at this case, because b-tree lock and other lock mechanism can help us. - ABBA deadlock occured between device_list_mutex and chunk_mutex. When we update device status, we must acquire device_list_mutex at the beginning, and then we might get chunk_mutex during the device status update because we need allocate new chunks for metadata COW. But at most place, we acquire chunk_mutex at first and then acquire device list mutex. We need change the lock order. - Some place we needn't acquire chunk_mutex. For example we needn't get chunk_mutex when we free a empty seed fs_devices structure. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fix unprotected device's variants on 32bits machineMiao Xie
->total_bytes,->disk_total_bytes,->bytes_used is protected by chunk lock when we change them, but sometimes we read them without any lock, and we might get unexpected value. We fix this problem like inode's i_size. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fix wrong device bytes_used in the super blockMiao Xie
device->bytes_used will be changed when allocating a new chunk, and disk_total_size will be changed if resizing is successful. Meanwhile, the on-disk super blocks of the previous transaction might not be updated. Considering the consistency of the metadata in the previous transaction, We should use the size in the previous transaction to check if the super block is beyond the boundary of the device. Though it is not big problem because we don't use it now, but anyway it is better that we make it be consistent with the common metadata, maybe we will use it in the future. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fix wrong disk size when writing super blocksMiao Xie
total_size will be changed when resizing a device, and disk_total_size will be changed if resizing is successful. Meanwhile, the on-disk super blocks of the previous transaction might not be updated. Considering the consistency of the metadata in the previous transaction, We should use the size in the previous transaction to check if the super block is beyond the boundary of the device. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fix unprotected assignment of the target deviceMiao Xie
We didn't protect the assignment of the target device, it might cause the problem that the super block update was skipped because we might find wrong size of the target device during the assignment. Fix it by moving the assignment sentences into the initialization function of the target device. And there is another merit that we can check if the target device is suitable more early. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: cleanup double assignment of device->bytes_used when device replace ↵Miao Xie
finishes Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: Fix a deadlock in btrfs_dev_replace_finishing()Qu Wenruo
btrfs-transacion:5657 [stack snip] btrfs_bio_map() btrfs_bio_counter_inc_blocked() percpu_counter_inc(&fs_info->bio_counter) ###bio_counter > 0(A) __btrfs_bio_map() btrfs_dev_replace_lock() mutex_lock(dev_replace->lock) ###wait mutex(B) btrfs:32612 [stack snip] btrfs_dev_replace_start() btrfs_dev_replace_lock() mutex_lock(dev_replace->lock) ###hold mutex(B) btrfs_dev_replace_finishing() btrfs_rm_dev_replace_blocked() wait until percpu_counter_sum == 0 ###wait on bio_counter(A) This bug can be triggered quite easily by the following test script: http://pastebin.com/MQmb37Cy This patch will fix the ABBA problem by calling btrfs_dev_replace_unlock() before btrfs_rm_dev_replace_blocked(). The consistency of btrfs devices list and their superblocks is protected by device_list_mutex, not btrfs_dev_replace_lock/unlock(). So it is safe the move btrfs_dev_replace_unlock() before btrfs_rm_dev_replace_blocked(). Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <clm@fb.com>