summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorLiu Bo <bo.li.liu@oracle.com>2014-06-09 11:04:49 +0800
committerChris Mason <clm@fb.com>2014-06-09 17:21:10 -0700
commit0b43e04f700bce8eecf3581612e9d3e15bf258bc (patch)
treebcdff8fabe11740744af8d1193988612d7faa504
parent337c6f6830a5ceb650eaf10d38ed8e8f496c8f61 (diff)
Btrfs: fix leaf corruption after __btrfs_drop_extents
Several reports about leaf corruption has been floating on the list, one of them points to __btrfs_drop_extents(), and we find that the leaf becomes corrupted after __btrfs_drop_extents(), it's really a rare case but it does exist. The problem turns out to be btrfs_next_leaf() called in __btrfs_drop_extents(). So in btrfs_next_leaf(), we release the current path to re-search the last key of the leaf for locating next leaf, and we've taken it into account that there might be balance operations between leafs during this 'unlock and re-lock' dance, so we check the path again and advance it if there are now more items available. But things are a bit different if that last key happens to be removed and balance gets a bigger key as the last one, and btrfs_search_slot will return it with ret > 0, IOW, nothing change in this leaf except the new last key, then we think we're okay because there is no more item balanced in, fine, we thinks we can go to the next leaf. However, we should return that bigger key, otherwise we deserve leaf corruption, for example, in endio, skipping that key means that __btrfs_drop_extents() thinks it has dropped all extent matched the required range and finish_ordered_io can safely insert a new extent, but it actually doesn't and ends up a leaf corruption. One may be asking that why our locking on extent io tree doesn't work as expected, ie. it should avoid this kind of race situation. But in __btrfs_drop_extents(), we don't always find extents which are included within our locking range, IOW, extents can start before our searching start, in this case locking on extent io tree doesn't protect us from the race. This takes the special case into account. Reviewed-by: Filipe Manana <fdmanana@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
-rw-r--r--fs/btrfs/ctree.c18
1 files changed, 18 insertions, 0 deletions
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4eada52f3969..aeab453b8e24 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -5718,6 +5718,24 @@ again:
ret = 0;
goto done;
}
+ /*
+ * So the above check misses one case:
+ * - after releasing the path above, someone has removed the item that
+ * used to be at the very end of the block, and balance between leafs
+ * gets another one with bigger key.offset to replace it.
+ *
+ * This one should be returned as well, or we can get leaf corruption
+ * later(esp. in __btrfs_drop_extents()).
+ *
+ * And a bit more explanation about this check,
+ * with ret > 0, the key isn't found, the path points to the slot
+ * where it should be inserted, so the path->slots[0] item must be the
+ * bigger one.
+ */
+ if (nritems > 0 && ret > 0 && path->slots[0] == nritems - 1) {
+ ret = 0;
+ goto done;
+ }
while (level < BTRFS_MAX_LEVEL) {
if (!path->nodes[level]) {