linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2020-03-23	btrfs: kill the subvol_srcu	Josef Bacik
	Now that we have proper root ref counting everywhere we can kill the subvol_srcu. * removal of fs_info::subvol_srcu reduces size of fs_info by 1176 bytes * the refcount_t used for the references checks for accidental 0->1 in cases where the root lifetime would not be properly protected * there's a leak detector for roots to catch unfreed roots at umount time * SRCU served us well over the years but is was not a proper synchronization mechanism for some cases Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23	btrfs: don't take an extra root ref at allocation time	Josef Bacik
	Now that all the users of roots take references for them we can drop the extra root ref we've been taking. Before we had roots at 2 refs for the life of the file system, one for the radix tree, and one simply for existing. Now that we have proper ref accounting in all places that use roots we can drop this extra ref simply for existing as we no longer need it. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23	btrfs: move the root freeing stuff into btrfs_put_root	Josef Bacik
	There are a few different ways to free roots, either you allocated them yourself and you just do free_extent_buffer(root->node); free_extent_buffer(root->commit_node); btrfs_put_root(root); Which is the pattern for log roots. Or for snapshots/subvolumes that are being dropped you simply call btrfs_free_fs_root() which does all the cleanup for you. Unify this all into btrfs_put_root(), so that we don't free up things associated with the root until the last reference is dropped. This makes the root freeing code much more significant. The only caveat is at close_ctree() time we have to free the extent buffers for all of our main roots (extent_root, chunk_root, etc) because we have to drop the btree_inode and we'll run into issues if we hold onto those nodes until ->kill_sb() time. This will be addressed in the future when we kill the btree_inode. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23	btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root	Josef Bacik
	We are now using these for all roots, rename them to btrfs_put_root() and btrfs_grab_root(); Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23	btrfs: add a leak check for roots	Josef Bacik
	Now that we're going to start relying on getting ref counting right for roots, add a list to track allocated roots and print out any roots that aren't freed up at free_fs_info time. Hide this behind CONFIG_BTRFS_DEBUG because this will just be used for developers to verify they aren't breaking things. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23	btrfs: make the init of static elements in fs_info separate	Josef Bacik
	In adding things like eb leak checking and root leak checking there were a lot of weird corner cases that come from the fact that 1) We do not init the fs_info until we get to open_ctree time in the normal case and 2) The test infrastructure half-init's the fs_info for things that it needs. This makes it really annoying to make changes because you have to add init in two different places, have special cases for testing fs_info's that may not have certain things initialized, and cases for fs_info's that didn't make it to open_ctree and thus are not fully set up. Fix this by extracting out the non-allocating init of the fs info into it's own public function and use that to make sure we're all getting consistent views of an allocated fs_info. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23	btrfs: use btrfs_put_fs_root to free roots always	Josef Bacik
	If we are going to track leaked roots we need to free them all the same way, so don't kfree() roots directly, use btrfs_put_fs_root. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31	btrfs: Correctly handle empty trees in find_first_clear_extent_bit	Nikolay Borisov
	Raviu reported that running his regular fs_trim segfaulted with the following backtrace: [ 237.525947] assertion failed: prev, in ../fs/btrfs/extent_io.c:1595 [ 237.525984] ------------[ cut here ]------------ [ 237.525985] kernel BUG at ../fs/btrfs/ctree.h:3117! [ 237.525992] invalid opcode: 0000 [#1] SMP PTI [ 237.525998] CPU: 4 PID: 4423 Comm: fstrim Tainted: G U OE 5.4.14-8-vanilla #1 [ 237.526001] Hardware name: ASUSTeK COMPUTER INC. [ 237.526044] RIP: 0010:assfail.constprop.58+0x18/0x1a [btrfs] [ 237.526079] Call Trace: [ 237.526120] find_first_clear_extent_bit+0x13d/0x150 [btrfs] [ 237.526148] btrfs_trim_fs+0x211/0x3f0 [btrfs] [ 237.526184] btrfs_ioctl_fitrim+0x103/0x170 [btrfs] [ 237.526219] btrfs_ioctl+0x129a/0x2ed0 [btrfs] [ 237.526227] ? filemap_map_pages+0x190/0x3d0 [ 237.526232] ? do_filp_open+0xaf/0x110 [ 237.526238] ? _copy_to_user+0x22/0x30 [ 237.526242] ? cp_new_stat+0x150/0x180 [ 237.526247] ? do_vfs_ioctl+0xa4/0x640 [ 237.526278] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [ 237.526283] do_vfs_ioctl+0xa4/0x640 [ 237.526288] ? __do_sys_newfstat+0x3c/0x60 [ 237.526292] ksys_ioctl+0x70/0x80 [ 237.526297] __x64_sys_ioctl+0x16/0x20 [ 237.526303] do_syscall_64+0x5a/0x1c0 [ 237.526310] entry_SYSCALL_64_after_hwframe+0x49/0xbe That was due to btrfs_fs_device::aloc_tree being empty. Initially I thought this wasn't possible and as a percaution have put the assert in find_first_clear_extent_bit. Turns out this is indeed possible and could happen when a file system with SINGLE data/metadata profile has a 2nd device added. Until balance is run or a new chunk is allocated on this device it will be completely empty. In this case find_first_clear_extent_bit should return the full range [0, -1ULL] and let the caller handle this i.e for trim the end will be capped at the size of actual device. Link: https://lore.kernel.org/linux-btrfs/izW2WNyvy1dEDweBICizKnd2KDwDiDyY2EYQr4YCwk7pkuIpthx-JRn65MPBde00ND6V0_Lh8mW0kZwzDiLDv25pUYWxkskWNJnVP0kgdMA=@protonmail.com/ Fixes: 45bfcfc168f8 ("btrfs: Implement find_first_clear_extent_bit") CC: stable@vger.kernel.org # 5.2+ Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31	Btrfs: fix race between adding and putting tree mod seq elements and nodes	Filipe Manana
	There is a race between adding and removing elements to the tree mod log list and rbtree that can lead to use-after-free problems. Consider the following example that explains how/why the problems happens: 1) Task A has mod log element with sequence number 200. It currently is the only element in the mod log list; 2) Task A calls btrfs_put_tree_mod_seq() because it no longer needs to access the tree mod log. When it enters the function, it initializes 'min_seq' to (u64)-1. Then it acquires the lock 'tree_mod_seq_lock' before checking if there are other elements in the mod seq list. Since the list it empty, 'min_seq' remains set to (u64)-1. Then it unlocks the lock 'tree_mod_seq_lock'; 3) Before task A acquires the lock 'tree_mod_log_lock', task B adds itself to the mod seq list through btrfs_get_tree_mod_seq() and gets a sequence number of 201; 4) Some other task, name it task C, modifies a btree and because there elements in the mod seq list, it adds a tree mod elem to the tree mod log rbtree. That node added to the mod log rbtree is assigned a sequence number of 202; 5) Task B, which is doing fiemap and resolving indirect back references, calls btrfs get_old_root(), with 'time_seq' == 201, which in turn calls tree_mod_log_search() - the search returns the mod log node from the rbtree with sequence number 202, created by task C; 6) Task A now acquires the lock 'tree_mod_log_lock', starts iterating the mod log rbtree and finds the node with sequence number 202. Since 202 is less than the previously computed 'min_seq', (u64)-1, it removes the node and frees it; 7) Task B still has a pointer to the node with sequence number 202, and it dereferences the pointer itself and through the call to __tree_mod_log_rewind(), resulting in a use-after-free problem. This issue can be triggered sporadically with the test case generic/561 from fstests, and it happens more frequently with a higher number of duperemove processes. When it happens to me, it either freezes the VM or it produces a trace like the following before crashing: [ 1245.321140] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI [ 1245.321200] CPU: 1 PID: 26997 Comm: pool Not tainted 5.5.0-rc6-btrfs-next-52 #1 [ 1245.321235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 [ 1245.321287] RIP: 0010:rb_next+0x16/0x50 [ 1245.321307] Code: .... [ 1245.321372] RSP: 0018:ffffa151c4d039b0 EFLAGS: 00010202 [ 1245.321388] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8ae221363c80 RCX: 6b6b6b6b6b6b6b6b [ 1245.321409] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ae221363c80 [ 1245.321439] RBP: ffff8ae20fcc4688 R08: 0000000000000002 R09: 0000000000000000 [ 1245.321475] R10: ffff8ae20b120910 R11: 00000000243f8bb1 R12: 0000000000000038 [ 1245.321506] R13: ffff8ae221363c80 R14: 000000000000075f R15: ffff8ae223f762b8 [ 1245.321539] FS: 00007fdee1ec7700(0000) GS:ffff8ae236c80000(0000) knlGS:0000000000000000 [ 1245.321591] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1245.321614] CR2: 00007fded4030c48 CR3: 000000021da16003 CR4: 00000000003606e0 [ 1245.321642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1245.321668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1245.321706] Call Trace: [ 1245.321798] __tree_mod_log_rewind+0xbf/0x280 [btrfs] [ 1245.321841] btrfs_search_old_slot+0x105/0xd00 [btrfs] [ 1245.321877] resolve_indirect_refs+0x1eb/0xc60 [btrfs] [ 1245.321912] find_parent_nodes+0x3dc/0x11b0 [btrfs] [ 1245.321947] btrfs_check_shared+0x115/0x1c0 [btrfs] [ 1245.321980] ? extent_fiemap+0x59d/0x6d0 [btrfs] [ 1245.322029] extent_fiemap+0x59d/0x6d0 [btrfs] [ 1245.322066] do_vfs_ioctl+0x45a/0x750 [ 1245.322081] ksys_ioctl+0x70/0x80 [ 1245.322092] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 1245.322113] __x64_sys_ioctl+0x16/0x20 [ 1245.322126] do_syscall_64+0x5c/0x280 [ 1245.322139] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1245.322155] RIP: 0033:0x7fdee3942dd7 [ 1245.322177] Code: .... [ 1245.322258] RSP: 002b:00007fdee1ec6c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 1245.322294] RAX: ffffffffffffffda RBX: 00007fded40210d8 RCX: 00007fdee3942dd7 [ 1245.322314] RDX: 00007fded40210d8 RSI: 00000000c020660b RDI: 0000000000000004 [ 1245.322337] RBP: 0000562aa89e7510 R08: 0000000000000000 R09: 00007fdee1ec6d44 [ 1245.322369] R10: 0000000000000073 R11: 0000000000000246 R12: 00007fdee1ec6d48 [ 1245.322390] R13: 00007fdee1ec6d40 R14: 00007fded40210d0 R15: 00007fdee1ec6d50 [ 1245.322423] Modules linked in: .... [ 1245.323443] ---[ end trace 01de1e9ec5dff3cd ]--- Fix this by ensuring that btrfs_put_tree_mod_seq() computes the minimum sequence number and iterates the rbtree while holding the lock 'tree_mod_log_lock' in write mode. Also get rid of the 'tree_mod_seq_lock' lock, since it is now redundant. Fixes: bd989ba359f2ac ("Btrfs: add tree modification log functions") Fixes: 097b8a7c9e48e2 ("Btrfs: join tree mod log code with the code holding back delayed refs") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-23	btrfs: Add self-tests for btrfs_rmap_block	Nikolay Borisov
	Add RAID1 and single testcases to verify that data stripes are excluded from super block locations and that the address mapping is valid. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-23	btrfs: selftests: Add support for dummy devices	Nikolay Borisov
	Add basic infrastructure to create and link dummy btrfs_devices. This will be used in the pending btrfs_rmap_block test which deals with the block groups. Calling btrfs_alloc_dummy_device will link the newly created device to the passed fs_info and the test framework will free them once the test is finished. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20	btrfs: drop create parameter to btrfs_get_extent()	Omar Sandoval
	We only pass this as 1 from __extent_writepage_io(). The parameter basically means "pretend I didn't pass in a page". This is silly since we can simply not pass in the page. Get rid of the parameter from btrfs_get_extent(), and since it's used as a get_extent_t callback, remove it from get_extent_t and btree_get_extent(), neither of which need it. While we're here, let's document btrfs_get_extent(). Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-13	btrfs: return error pointer from alloc_test_extent_buffer	Dan Carpenter
	Callers of alloc_test_extent_buffer have not correctly interpreted the return value as error pointer, as alloc_test_extent_buffer should behave as alloc_extent_buffer. The self-tests were unaffected but btrfs_find_create_tree_block could call both functions and that would cause problems up in the call chain. Fixes: faa2dbf004e8 ("Btrfs: add sanity tests for new qgroup accounting code") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18	btrfs: rename btrfs_block_group_cache	David Sterba
	The type name is misleading, a single entry is named 'cache' while this normally means a collection of objects. Rename that everywhere. Also the identifier was quite long, making function prototypes harder to format. Suggested-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18	btrfs: add dedicated members for start and length of a block group	David Sterba
	The on-disk format of block group item makes use of the key that stores the offset and length. This is further used in the code, although this makes thing harder to understand. The key is also packed so the offset/length is not properly aligned as u64. Add start (key.objectid) and length (key.offset) members to block group and remove the embedded key. When the item is searched or written, a local variable for key is used. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-24	Btrfs: fix selftests failure due to uninitialized i_mode in test inodes	Filipe Manana
	Some of the self tests create a test inode, setup some extents and then do calls to btrfs_get_extent() to test that the corresponding extent maps exist and are correct. However btrfs_get_extent(), since the 5.2 merge window, now errors out when it finds a regular or prealloc extent for an inode that does not correspond to a regular file (its ->i_mode is not S_IFREG). This causes the self tests to fail sometimes, specially when KASAN, slub_debug and page poisoning are enabled: $ modprobe btrfs modprobe: ERROR: could not insert 'btrfs': Invalid argument $ dmesg [ 9414.691648] Btrfs loaded, crc32c=crc32c-intel, debug=on, assert=on, integrity-checker=on, ref-verify=on [ 9414.692655] BTRFS: selftest: sectorsize: 4096 nodesize: 4096 [ 9414.692658] BTRFS: selftest: running btrfs free space cache tests [ 9414.692918] BTRFS: selftest: running extent only tests [ 9414.693061] BTRFS: selftest: running bitmap only tests [ 9414.693366] BTRFS: selftest: running bitmap and extent tests [ 9414.696455] BTRFS: selftest: running space stealing from bitmap to extent tests [ 9414.697131] BTRFS: selftest: running extent buffer operation tests [ 9414.697133] BTRFS: selftest: running btrfs_split_item tests [ 9414.697564] BTRFS: selftest: running extent I/O tests [ 9414.697583] BTRFS: selftest: running find delalloc tests [ 9415.081125] BTRFS: selftest: running find_first_clear_extent_bit test [ 9415.081278] BTRFS: selftest: running extent buffer bitmap tests [ 9415.124192] BTRFS: selftest: running inode tests [ 9415.124195] BTRFS: selftest: running btrfs_get_extent tests [ 9415.127909] BTRFS: selftest: running hole first btrfs_get_extent test [ 9415.128343] BTRFS critical (device (efault)): regular/prealloc extent found for non-regular inode 256 [ 9415.131428] BTRFS: selftest: fs/btrfs/tests/inode-tests.c:904 expected a real extent, got 0 This happens because the test inodes are created without ever initializing the i_mode field of the inode, and neither VFS's new_inode() nor the btrfs callback btrfs_alloc_inode() initialize the i_mode. Initialization of the i_mode is done through the various callbacks used by the VFS to create new inodes (regular files, directories, symlinks, tmpfiles, etc), which all call btrfs_new_inode() which in turn calls inode_init_owner(), which sets the inode's i_mode. Since the tests only uses new_inode() to create the test inodes, the i_mode was never initialized. This always happens on a VM I used with kasan, slub_debug and many other debug facilities enabled. It also happened to someone who reported this on bugzilla (on a 5.3-rc). Fix this by setting i_mode to S_IFREG at btrfs_new_test_inode(). Fixes: 6bf9e4bd6a2778 ("btrfs: inode: Verify inode mode to avoid NULL pointer dereference") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204397 Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09	btrfs: stop clearing EXTENT_DIRTY in inode I/O tree	Omar Sandoval
	Since commit fee187d9d9dd ("Btrfs: do not set EXTENT_DIRTY along with EXTENT_DELALLOC"), we never set EXTENT_DIRTY in inode->io_tree, so we can simplify and stop trying to clear it. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09	Btrfs: make test_find_first_clear_extent_bit fail on incorrect results	Filipe Manana
	If any call to find_first_clear_extent_bit() returns an unexpected result, the test should fail and not just print an error message, otherwise it makes detection of regressions much harder to notice. Fixes: 1eaebb341d2b41 ("btrfs: Don't trim returned range based on input value in find_first_clear_extent_bit") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09	Btrfs: fix memory leaks in the test test_find_first_clear_extent_bit	Filipe Manana
	The test creates an extent io tree and sets several ranges with the CHUNK_ALLOCATED and CHUNK_TRIMMED bits, resulting in the allocation of several extent state structures. However the test never clears those ranges, resulting in memory leaks of the extent state structures. This is detected when CONFIG_BTRFS_DEBUG is set once we remove the btrfs module (rmmod btrfs): [57399.787918] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1 [57399.790155] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1 [57399.791941] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1 [57399.793753] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1 [57399.795188] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1 [57399.796453] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1 [57399.797765] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1 [57399.799049] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1 [57399.800142] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1 [57399.801126] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1 [57399.802106] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1 [57399.803119] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1 [57399.804153] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1 [57399.805196] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1 [57399.806191] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1 The start and end offsets reported correspond exactly to the ranges used by the test. So fix that by clearing all the ranges when the test finishes. Fixes: 1eaebb341d2b41 ("btrfs: Don't trim returned range based on input value in find_first_clear_extent_bit") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09	btrfs: move basic block_group definitions to their own header	Josef Bacik
	This is prep work for moving all of the block group cache code into its own file. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor comment updates ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09	btrfs: Remove leftover of in-band dedupe	Nikolay Borisov
	It's unlikely in-band dedupe is going to land so just remove any leftovers - dedupe.h header as well as the 'dedupe' parameter to btrfs_set_extent_delalloc. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-19	Merge branch 'work.mount0' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-07-04	btrfs: Evaluate io_tree in find_lock_delalloc_range()	Goldwyn Rodrigues
	Simplification. No point passing the tree variable when it can be evaluated from inode. The tests now use the io_tree from btrfs_inode as opposed to creating one. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01	btrfs: tests: add locks around add_extent_mapping	David Sterba
	There are no concerns about locking during the selftests so the locks are not necessary, but following patches will add lockdep assertions to add_extent_mapping so this is needed in tests too. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01	btrfs: Don't trim returned range based on input value in ↵	Nikolay Borisov
	find_first_clear_extent_bit Currently find_first_clear_extent_bit always returns a range whose starting value is >= passed 'start'. This implicit trimming behavior is somewhat subtle and an implementation detail. Instead, this patch modifies the function such that now it always returns the range which contains passed 'start' and has the given bits unset. This range could either be due to presence of existing records which contains 'start' but have the bits unset or because there are no records that contain the given starting offset. This patch also adds test cases which cover find_first_clear_extent_bit since they were missing up until now. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-25	vfs: Convert btrfs_test to use the new mount API	David Howells
	Convert the btrfs_test filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David Sterba <dsterba@suse.com> cc: Chris Mason <clm@fb.com> cc: Josef Bacik <josef@toxicpanda.com> cc: linux-btrfs@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-25	mount_pseudo(): drop 'name' argument, switch to d_make_root()	Al Viro
	Once upon a time we used to set ->d_name of e.g. pipefs root so that d_path() on pipes would work. These days it's completely pointless - dentries of pipes are not even connected to pipefs root. However, mount_pseudo() had set the root dentry name (passed as the second argument) and callers kept inventing names to pass to it. Including those that didn't have any non-root dentries to start with... All of that had been pointless for about 8 years now; it's time to get rid of that cargo-culting... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-07	Merge tag 'for-5.2-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "This time the majority of changes are cleanups, though there's still a number of changes of user interest. User visible changes: - better read time and write checks to catch errors early and before writing data to disk (to catch potential memory corruption on data that get checksummed) - qgroups + metadata relocation: last speed up patch int the series to address the slowness, there should be no overhead comparing balance with and without qgroups - FIEMAP ioctl does not start a transaction unnecessarily, this can result in a speed up and less blocking due to IO - LOGICAL_INO (v1, v2) does not start transaction unnecessarily, this can speed up the mentioned ioctl and scrub as well - fsync on files with many (but not too many) hardlinks is faster, finer decision if the links should be fsynced individually or completely - send tries harder to find ranges to clone - trim/discard will skip unallocated chunks that haven't been touched since the last mount Fixes: - send flushes delayed allocation before start, otherwise it could miss some changes in case of a very recent rw->ro switch of a subvolume - fix fallocate with qgroups that could lead to space accounting underflow, reported as a warning - trim/discard ioctl honours the requested range - starting send and dedupe on a subvolume at the same time will let only one of them succeed, this is to prevent changes that send could miss due to dedupe; both operations are restartable Core changes: - more tree-checker validations, errors reported by fuzzing tools: - device item - inode item - block group profiles - tracepoints for extent buffer locking - async cow preallocates memory to avoid errors happening too deep in the call chain - metadata reservations for delalloc reworked to better adapt in many-writers/low-space scenarios - improved space flushing logic for intense DIO vs buffered workloads - lots of cleanups - removed unused struct members - redundant argument removal - properties and xattrs - extent buffer locking - selftests - use common file type conversions - many-argument functions reduction" * tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (227 commits) btrfs: Use kvmalloc for allocating compressed path context btrfs: Factor out common extent locking code in submit_compressed_extents btrfs: Set io_tree only once in submit_compressed_extents btrfs: Replace clear_extent_bit with unlock_extent btrfs: Make compress_file_range take only struct async_chunk btrfs: Remove fs_info from struct async_chunk btrfs: Rename async_cow to async_chunk btrfs: Preallocate chunks in cow_file_range_async btrfs: reserve delalloc metadata differently btrfs: track DIO bytes in flight btrfs: merge calls of btrfs_setxattr and btrfs_setxattr_trans in btrfs_set_prop btrfs: delete unused function btrfs_set_prop_trans btrfs: start transaction in xattr_handler_set_prop btrfs: drop local copy of inode i_mode btrfs: drop old_fsflags in btrfs_ioctl_setflags btrfs: modify local copy of btrfs_inode flags btrfs: drop useless inode i_flags copy and restore btrfs: start transaction in btrfs_ioctl_setflags() btrfs: export btrfs_set_prop btrfs: refactor btrfs_set_props to validate externally ...
2019-04-29	btrfs: get fs_info from block group in search_free_space_info	David Sterba
	We can read fs_info from the block group cache structure and can drop it from the parameters. Though the transaction is also availabe, it's not guaranteed to be non-NULL. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: inode: Verify inode mode to avoid NULL pointer dereference	Qu Wenruo
	[BUG] When accessing a file on a crafted image, btrfs can crash in block layer: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 PGD 136501067 P4D 136501067 PUD 124519067 PMD 0 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc8-default #252 RIP: 0010:end_bio_extent_readpage+0x144/0x700 Call Trace: <IRQ> blk_update_request+0x8f/0x350 blk_mq_end_request+0x1a/0x120 blk_done_softirq+0x99/0xc0 __do_softirq+0xc7/0x467 irq_exit+0xd1/0xe0 call_function_single_interrupt+0xf/0x20 </IRQ> RIP: 0010:default_idle+0x1e/0x170 [CAUSE] The crafted image has a tricky corruption, the INODE_ITEM has a different type against its parent dir: item 20 key (268 INODE_ITEM 0) itemoff 2808 itemsize 160 generation 13 transid 13 size 1048576 nbytes 1048576 block group 0 mode 121644 links 1 uid 0 gid 0 rdev 0 sequence 9 flags 0x0(none) This mode number 0120000 means it's a symlink. But the dir item think it's still a regular file: item 8 key (264 DIR_INDEX 5) itemoff 3707 itemsize 32 location key (268 INODE_ITEM 0) type FILE transid 13 data_len 0 name_len 2 name: f4 item 40 key (264 DIR_ITEM 51821248) itemoff 1573 itemsize 32 location key (268 INODE_ITEM 0) type FILE transid 13 data_len 0 name_len 2 name: f4 For symlink, we don't set BTRFS_I(inode)->io_tree.ops and leave it empty, as symlink is only designed to have inlined extent, all handled by tree block read. Thus no need to trigger btrfs_submit_bio_hook() for inline file extent. However end_bio_extent_readpage() expects tree->ops populated, as it's reading regular data extent. This causes NULL pointer dereference. [FIX] This patch fixes the problem in two ways: - Verify inode mode against its dir item when looking up inode So in btrfs_lookup_dentry() if we find inode mode mismatch with dir item, we error out so that corrupted inode will not be accessed. - Verify inode mode when getting extent mapping Only regular file should have regular or preallocated extent. If we found regular/preallocated file extent for symlink or the rest, we error out before submitting the read bio. With this fix that crafted image can be rejected gracefully: BTRFS critical (device loop0): inode mode mismatch with dir: inode mode=0121644 btrfs type=7 dir type=1 Reported-by: Yoon Jungyeon <jungyeon@gatech.edu> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202763 Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: qgroup: remove obsolete fs_info members	David Sterba
	The commit fcebe4562dec ("Btrfs: rework qgroup accounting") reworked qgroups and added some new structures. Another rework of qgroup mechanics e69bcee37692 ("btrfs: qgroup: Cleanup the old ref_node-oriented mechanism.") stopped using them and left uncleaned. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: unify messages when tests start	David Sterba
	- make the messages more visually consistent and use same format "running ... test", any error or other warning can be easily spotted - move some message to the test entry function - add message to the inode tests Example output: [ 8.187391] Btrfs loaded, crc32c=crc32c-generic, assert=on, integrity-checker=on, ref-verify=on [ 8.189476] BTRFS: selftest: sectorsize: 4096 nodesize: 4096 [ 8.190761] BTRFS: selftest: running btrfs free space cache tests [ 8.192245] BTRFS: selftest: running extent only tests [ 8.193573] BTRFS: selftest: running bitmap only tests [ 8.194876] BTRFS: selftest: running bitmap and extent tests [ 8.196166] BTRFS: selftest: running space stealing from bitmap to extent tests [ 8.198026] BTRFS: selftest: running extent buffer operation tests [ 8.199328] BTRFS: selftest: running btrfs_split_item tests [ 8.200653] BTRFS: selftest: running extent I/O tests [ 8.201808] BTRFS: selftest: running find delalloc tests [ 8.320733] BTRFS: selftest: running extent buffer bitmap tests [ 8.340795] BTRFS: selftest: running inode tests [ 8.341766] BTRFS: selftest: running btrfs_get_extent tests [ 8.342981] BTRFS: selftest: running hole first btrfs_get_extent test [ 8.344342] BTRFS: selftest: running outstanding_extents tests [ 8.345575] BTRFS: selftest: running qgroup tests [ 8.346537] BTRFS: selftest: running qgroup add/remove tests [ 8.347725] BTRFS: selftest: running qgroup multiple refs test [ 8.354982] BTRFS: selftest: running free space tree tests [ 8.372175] BTRFS: selftest: sectorsize: 4096 nodesize: 8192 [ 8.373539] BTRFS: selftest: running btrfs free space cache tests [ 8.374989] BTRFS: selftest: running extent only tests [ 8.376236] BTRFS: selftest: running bitmap only tests [ 8.377483] BTRFS: selftest: running bitmap and extent tests [ 8.378854] BTRFS: selftest: running space stealing from bitmap to extent tests ... Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: drop messages when some tests finish	David Sterba
	The messages like 'extent I/O tests finished' are redundant, if the test fails it's quite obvious in the log and hang is also noticeable. No other then extent_io and free space tree tests print that so make it consistent. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: fix comments about tested extent map ranges	David Sterba
	Comments about ranges did not match the code, the correct calculation is to use start and start+len as the interval boundaries. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: use SZ_ constants everywhere	David Sterba
	There are a few unconverted constants that are not powers of two and haven't been converted. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: use standard error message after extent map allocation failure	David Sterba
	Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: return error from all extent map test cases	David Sterba
	The way the extent map tests handle errors does not conform to the rest of the suite, where the first failure is reported and then it stops. Do the same now that we have the errors returned from all the functions. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: return errors from extent map test case 4	David Sterba
	Replace asserts with error messages and return errors. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: return errors from extent map test case 3	David Sterba
	Replace asserts with error messages and return errors. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: return errors from extent map test case 2	David Sterba
	Replace asserts with error messages and return errors. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: return errors from extent map test case 1	David Sterba
	Replace asserts with error messages and return errors. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: return errors from extent map tests	David Sterba
	The individual testcases for extent maps do not return an error on allocation failures. This is not a big problem as the allocation don't fail in general but there are functional tests handled with ASSERTS. This makes tests dependent on them and it's not reliable. This patch adds the allocation failure handling and allows for the conversion of the asserts to proper error handling and reporting. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: properly initialize fs_info of extent buffer	David Sterba
	The fs_info is supposed to be valid, even though it's not used right now and the test does not crash. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: use standard error message after block group allocation failure	David Sterba
	Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: use standard error message after inode allocation failure	David Sterba
	Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: use standard error message after path allocation failure	David Sterba
	Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: use standard error message after extent buffer allocation failure	David Sterba
	Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: use standard error message after root allocation failure	David Sterba
	Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: use standard error message after fs_info allocation failure	David Sterba
	Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29	btrfs: tests: add table of most common errors	David Sterba
	Allocation of main objects like fs_info or extent buffers is in each test so let's simplify and unify the error messages to a table and add a convenience helper. Signed-off-by: David Sterba <dsterba@suse.com>