Age | Commit message | Author
2018-05-28 | btrfs: trace: Allow trace_qgroup_update_counters() to record old rfer/excl value | Qu Wenruo
Originally, trace_qgroup_update_counters() only records the qgroup id and its reference count change. That is good enough to debug qgroup accounting changes, but when a rescan race is involved, it's pretty hard to tell which modification belongs to which rescan. So add old_rfer and old_excl to the trace output to help distinguish different rescan instances. (Each rescan instance should reset its qgroup->rfer to 0.) For the trace event parameters, u64 qgroup_id simply becomes struct btrfs_qgroup *qgroup, so the number of parameters is not changed at all. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Unexport btrfs_alloc_delalloc_work | Nikolay Borisov
It's used only in inode.c so it makes no sense to have it exported. Also move the definition of btrfs_delalloc_work to inode.c since it's used only in this file. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Remove delayed_iput member from btrfs_delalloc_work | Nikolay Borisov
When allocating a delalloc work we are always setting the delayed_iput to 0. So remove the delay_iput member of btrfs_delalloc_work, as a result also remove it as a parameter from btrfs_alloc_delalloc_work since it's not used anymore. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Remove delay_iput parameter from __start_delalloc_inodes | Nikolay Borisov
It's always set to 0 so remove it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> [ rename to start_delalloc_inodes ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Remove delayed_iput parameter from btrfs_start_delalloc_inodes | Nikolay Borisov
It's always set to 0, so just remove it and collapse the constant value into the only function we pass it to. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Remove delayed_iput parameter of btrfs_start_delalloc_roots | Nikolay Borisov
This parameter was introduced alongside the function in eb73c1b7cea7 ("Btrfs: introduce per-subvolume delalloc inode list") to avoid deadlocks since this function was used in the transaction commit path. However, commit 8d875f95da43 ("btrfs: disable strict file flushes for renames and truncates") removed that usage, rendering the parameter obsolete. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: do reverse path readahead in btrfs_shrink_device | Gu Jinxiang
In btrfs_shrink_device, before btrfs_search_slot, path->reada is set to READA_FORWARD. But I think READA_BACK is correct, since:
1. key.offset is set to (u64)-1
2. after btrfs_search_slot, btrfs_previous_item is called
So, to read ahead the previous items, READA_BACK is the correct one.
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
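The access pattern described above (search for the largest possible key, then keep stepping to the previous item) can be sketched in user-space C. This is only an illustration of why the walk runs backwards: `demo_item`, `demo_search_le` and `demo_walk_back` are hypothetical names, and a sorted array stands in for the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sorted run of item keys in a tree. */
struct demo_item {
    uint64_t offset;
};

/*
 * Return the index of the last item whose offset is <= target, or -1 if
 * there is none. Searching for UINT64_MAX therefore lands on the last item.
 */
long demo_search_le(const struct demo_item *items, size_t n, uint64_t target)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (items[mid].offset <= target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (long)lo - 1;
}

/*
 * Start at the highest key and step to the previous item until the offset
 * drops below floor. Returns the number of items visited. Every step moves
 * towards lower keys, which is the direction backward readahead helps with.
 */
size_t demo_walk_back(const struct demo_item *items, size_t n, uint64_t floor)
{
    long i = demo_search_le(items, n, UINT64_MAX);
    size_t visited = 0;

    while (i >= 0 && items[i].offset >= floor) {
        visited++;
        i--;
    }
    return visited;
}
```

With items at offsets 0, 100, 200 and 300, `demo_walk_back(items, 4, 150)` visits the items at 300 and 200, in that order, and then stops.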
2018-05-28 | btrfs: trace: Add trace points for unused block groups | Qu Wenruo
This patch adds the following trace events:
1) btrfs_remove_block_group: for the btrfs_remove_block_group() function. Triggered when a block group is really removed.
2) btrfs_add_unused_block_group: triggered when a block group is added to the unused_bgs list.
3) btrfs_skip_unused_block_group: triggered when an unused block group is not deleted.
These trace events are pretty handy for debugging cases related to automatic block group removal.
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: trace: Remove unnecessary fs_info parameter for btrfs__reserve_extent event class | Qu Wenruo
fs_info can be extracted from btrfs_block_group_cache, and every btrfs_block_group_cache is created by btrfs_create_block_group_cache() with fs_info initialized, so there is no need to worry about a NULL pointer dereference. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: remove unused fs_info parameter | Gu Jinxiang
Since the commit c6100a4b4e3d ("Btrfs: replace tree->mapping with tree->private_data"), parameter fs_info in alloc_reloc_control is not used. So remove it. Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: move btrfs_raid_mindev_error values to btrfs_raid_attr table | Anand Jain
Add a new member struct btrfs_raid_attr::mindev_error so that btrfs_raid_array can maintain the error code to return if the minimum number of devices condition is not met while trying to delete a device in the given raid. And so we can drop btrfs_raid_mindev_error. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: move btrfs_raid_group values to btrfs_raid_attr table | Anand Jain
Add a new member struct btrfs_raid_attr::bg_flag so that btrfs_raid_array can maintain the bit map flag of the raid type, and so we can drop btrfs_raid_group. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: move btrfs_raid_type_names values to btrfs_raid_attr table | Anand Jain
Add a new member struct btrfs_raid_attr::raid_name so that btrfs_raid_array can maintain the name of the raid type, and so we can drop btrfs_raid_type_names. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
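The three table moves above follow one pattern: per-profile values that lived in separate parallel arrays become members of one attribute row. A minimal user-space sketch of that pattern follows. The member names raid_name, bg_flag and mindev_error come from the commit messages; everything prefixed `demo_`/`DEMO_`, the `devs_min` member, and all numeric values are made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down profile indexes; not the kernel's enum. */
enum demo_raid_type {
    DEMO_RAID_SINGLE,
    DEMO_RAID_RAID1,
    DEMO_RAID_RAID10,
    DEMO_NR_RAID_TYPES
};

/* One row per profile, so name, flag and error code cannot drift apart. */
struct demo_raid_attr {
    const char *raid_name;       /* was a separate names array */
    unsigned long long bg_flag;  /* was a separate flags array */
    int mindev_error;            /* was a separate error-code array */
    int devs_min;                /* assumed: minimum device count */
};

/* Illustrative values only. */
const struct demo_raid_attr demo_raid_array[DEMO_NR_RAID_TYPES] = {
    [DEMO_RAID_SINGLE] = { .raid_name = "single", .bg_flag = 0,
                           .mindev_error = 0, .devs_min = 1 },
    [DEMO_RAID_RAID1]  = { .raid_name = "raid1", .bg_flag = 1ULL << 4,
                           .mindev_error = -1001, .devs_min = 2 },
    [DEMO_RAID_RAID10] = { .raid_name = "raid10", .bg_flag = 1ULL << 6,
                           .mindev_error = -1002, .devs_min = 4 },
};

/*
 * Return 0 if the profile can still be satisfied with num_devices,
 * otherwise the error code stored in the same table row.
 */
int demo_check_min_devs(enum demo_raid_type type, int num_devices)
{
    const struct demo_raid_attr *attr = &demo_raid_array[type];

    if (num_devices < attr->devs_min)
        return attr->mindev_error;
    return 0;
}
```

A caller that used to index three arrays with the same profile index now does a single lookup, for example `demo_raid_array[DEMO_RAID_RAID1].raid_name`.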
2018-05-28 | btrfs: print-tree: Add eb locking status output for debug build | Qu Wenruo
It's pretty handy if we can get debug output for the locking status of an extent buffer, especially when debugging race conditions. So add the following output for btrfs_print_tree() and btrfs_print_leaf():
- refs
- write_locks (as w:%d)
- read_locks (as r:%d)
- blocking_writers (as bw:%d)
- blocking_readers (as br:%d)
- spinning_writers (as sw:%d)
- spinning_readers (as sr:%d)
- lock_owner
- current->pid
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update comment ] Signed-off-by: David Sterba <dsterba@suse.com>
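To make the listed fields concrete, here is a user-space sketch that formats one status line with the abbreviations given above (w, r, bw, br, sw, sr). The struct, the function name and the wording around the abbreviations are assumptions for illustration; the commit message only specifies the fields and their short forms.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical snapshot of the counters named in the commit message. */
struct demo_eb_lock_state {
    unsigned int refs;
    int write_locks;
    int read_locks;
    int blocking_writers;
    int blocking_readers;
    int spinning_writers;
    int spinning_readers;
    unsigned int lock_owner;
    unsigned int current_pid;
};

/* Format the state into buf; returns what snprintf() returns. */
int demo_format_eb_status(char *buf, size_t len,
                          const struct demo_eb_lock_state *s)
{
    return snprintf(buf, len,
                    "refs %u lock (w:%d r:%d bw:%d br:%d sw:%d sr:%d) "
                    "lock_owner %u current %u",
                    s->refs, s->write_locks, s->read_locks,
                    s->blocking_writers, s->blocking_readers,
                    s->spinning_writers, s->spinning_readers,
                    s->lock_owner, s->current_pid);
}
```

Printing the lock owner next to the current pid makes it easy to see at a glance whether the task doing the printing is the one holding the lock.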
2018-05-28 | btrfs: open code set_balance_control | David Sterba
The helper is quite simple and I'd like to see the locking in the caller. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: use mutex in btrfs_resume_balance_async | David Sterba
While the spinlock does not cause problems, using the mutex is more correct and consistent with others. The global status of balance is eg. checked from btrfs_pause_balance or btrfs_cancel_balance with mutex. Resuming balance happens during mount or ro->rw remount. In the former case, no other user of the balance_ctl exists, in the latter, balance cannot run until the ro/rw transition is finished. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: drop lock parameter from update_ioctl_balance_args and rename | David Sterba
The parameter controls locking of the stats part but we can lock it unconditionally, as this only happens once when balance starts. This is not performance critical. Add the prefix for an exported function. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: move and comment read-only check in btrfs_cancel_balance | David Sterba
Balance cannot be started on a read-only filesystem and will have to finish/exit before eg. going to read-only via remount. In case the filesystem is forcibly set to read-only after an error, balance will finish anyway and if the cancel call is too fast it will just wait for that to happen. The last case is when the balance is paused after mount but it's read-only and cancelling would want to delete the item. The test is moved after the check if balance is running at all, as it looks more logical to report "no balance running" instead of "read-only filesystem". Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: track running balance in a simpler way | David Sterba
Currently fs_info::balance_running is 0 or 1 and does not use the semantics of atomics. The pause and cancel check for 0, that can happen only after __btrfs_balance exits for whatever reason. Parallel calls to balance ioctl may enter btrfs_ioctl_balance multiple times but will block on the balance_mutex that protects the fs_info::flags bit. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: kill btrfs_fs_info::volume_mutex | David Sterba
Mutual exclusion of device add/rm and balance was done by the volume mutex up to version 3.7. The commit 5ac00addc7ac091109 ("Btrfs: disallow mutually exclusive admin operations from user mode") added a bit that essentially tracked the same information. The status bit has an advantage over a mutex in that it can be set without restrictions of function context, so it started to be used in the mount-time resuming of balance or device replace. But we don't really need to track the same information in two ways.
1) After the previous cleanups, the main ioctl handlers for add/del/resize copy the EXCL_OP bit next to the volume mutex, here it's clearly safe.
2) Resuming balance during mount or after rw remount will set only the EXCL_OP bit and the volume_mutex is held in the kernel thread that calls btrfs_balance.
3) Resuming device replace during mount or after rw remount is done after balance and is excluded by the EXCL_OP bit. It does not take the volume_mutex at all and completely relies on the EXCL_OP bit.
4) The resuming of balance and dev-replace cannot happen at the same time as the ioctls cannot be started in parallel. Nevertheless, a crafted image could trigger that and a warning is printed.
5) Balance is normally excluded by EXCL_OP and also uses its own mutex to protect against concurrent access to its status data. There's some trickery to maintain the right lock nesting in case we need to reexamine the status in btrfs_ioctl_balance. The volume_mutex is removed and the unlock/lock sequence is left in place as we might expect other waiters to proceed.
6) Similar to 5, the unlock/lock sequence is kept in btrfs_cancel_balance to allow waiters to continue.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
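The idea of a status bit that is claimed with an atomic test-and-set, rather than a mutex, can be sketched in portable C11. This is not the kernel code: `demo_fs_info`, `DEMO_FS_EXCL_OP` and both helpers are hypothetical, and C11 `atomic_fetch_or`/`atomic_fetch_and` stand in for the kernel's bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_FS_EXCL_OP 0 /* hypothetical bit number */

struct demo_fs_info {
    atomic_ulong flags;
};

/*
 * Try to claim the exclusive-operation bit. Returns true if this caller
 * set it, false if some other operation already owns it. No lock is taken
 * or held, so the owner may be released later from a different context.
 */
bool demo_excl_op_try_start(struct demo_fs_info *fs)
{
    unsigned long mask = 1UL << DEMO_FS_EXCL_OP;
    unsigned long old = atomic_fetch_or(&fs->flags, mask);

    return !(old & mask);
}

/* Release the bit so that the next operation can start. */
void demo_excl_op_finish(struct demo_fs_info *fs)
{
    atomic_fetch_and(&fs->flags, ~(1UL << DEMO_FS_EXCL_OP));
}
```

A second caller gets `false` from `demo_excl_op_try_start()` until the first one calls `demo_excl_op_finish()`. Keeping the set and the clear visible in the same caller makes the pairing easy to audit.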
2018-05-28 | btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start | David Sterba
The volume mutex does not protect against anything in this case, the comment about scrub is right but not related to locking and looks confusing. The comment in btrfs_find_device_missing_or_by_path is wrong and confusing too. The device_list_mutex is not held here to protect device lookup, but in this case device replace cannot run in parallel with device removal (due to exclusive op protection), so we don't need further locking here. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: cleanup helpers that reset balance state | David Sterba
The name of the function __cancel_balance is easily confused with the cancel operation of balance, while the function really resets the state of balance back to zero. The unset_balance_control helper is called only from one place and is simple enough to be inlined. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: add sanity check when resuming balance after mount | David Sterba
Replace a WARN_ON with a proper check and message in case something goes really wrong and resumed balance cannot set up its exclusive status. The check is a user-friendly assertion; I don't expect it to ever happen under normal circumstances. Also document that the paused balance starts here and owns the exclusive op status. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: add proper safety check before resuming dev-replace | David Sterba
The device replace is paused by unmount or read-only remount, and resumed on next mount or write remount. The exclusive status should be checked properly as it's a global invariant and we must not allow 2 operations to run. In this case, the balance can also be paused and resumed under the same conditions. It's always checked first so dev-replace could see the EXCL_OP already taken, BUT, the ioctl would never let both start at the same time. Replace the WARN_ON with a message and return 0, indicating no error as this is purely theoretical and the user will be informed. Resolving that manually should be possible by waiting for the other operation to finish or by cancelling the paused state. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: move clearing of EXCL_OP out of __cancel_balance | David Sterba
Make the clearing visible in the callers so we can pair it with the test_and_set part. Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: move volume_mutex to callers of btrfs_rm_device | David Sterba
Move locking and unlocking next to the BTRFS_FS_EXCL_OP bit manipulation so it's obvious that the two happen at the same time. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static | David Sterba
The function logically belongs there and there's only a single caller, no need to export it. No code changes. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: export and rename free_device | David Sterba
The function will be used outside of volumes.c, the allocation btrfs_alloc_device is also exported. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear | David Sterba
This is a preparatory cleanup that will make clear that the only successful way out of btrfs_init_dev_replace_tgtdev will also set the device_out to a valid pointer. With this guarantee, the callers can be simplified. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller | David Sterba
The function is called once and is fairly small, we can merge it with the caller. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: cleanup btrfs_rm_device() promote fs_devices pointer | Anand Jain
This function uses fs_info::fs_devices a number of times; however, we declare and use the pointer only at the end. Instead, declare it at the beginning of the function and use it throughout. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: cleanup find_device() drop list_head pointer | Anand Jain
find_device() declares a struct list_head *head pointer that is used only once; use the list directly instead. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: rename __btrfs_open_devices to open_fs_devices | Anand Jain
__btrfs_open_devices() is un-exported, drop the __ prefix and rename it to open_fs_devices(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: rename __btrfs_close_devices to close_fs_devices | Anand Jain
__btrfs_close_devices() is un-exported, drop the __ prefix and rename it to close_fs_devices(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: cleanup __btrfs_open_devices() drop head pointer | Anand Jain
__btrfs_open_devices() declares struct list_head *head, however head is used only once, instead use btrfs_fs_devices::devices directly. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: rename struct btrfs_fs_devices::list | Anand Jain
btrfs_fs_devices::list is the list of BTRFS fsids in the kernel. A generic name like 'list' makes searching for it very difficult, so rename it to fs_list. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Drop fs_info parameter from btrfs_merge_delayed_refs | Nikolay Borisov
It's provided by the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Drop fs_info parameter from add_delayed_data_ref | Nikolay Borisov
It's provided by the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Drop add_delayed_ref_head fs_info parameter | Nikolay Borisov
It's provided by the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Remove btrfs_wait_and_free_delalloc_work | Nikolay Borisov
This function is called from only 1 place and is effectively a wrapper over wait_completion/kfree. It doesn't really bring any value having those two calls in a separate function. Just open code it and remove it. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Remove tree argument from extent_writepages | Nikolay Borisov
It can be directly referenced from the passed address_space so do that. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Use list_empty instead of list_empty_careful | Nikolay Borisov
list_empty_careful usually is a signal of something tricky going on. Its usage in btrfs is actually not needed since both lists it's used on are local to a function and cannot be modified concurrently. So switch to plain list_empty. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
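The difference between the two emptiness checks can be sketched with a minimal circular doubly linked list in user-space C. The `demo_` names are hypothetical, and the bodies only mirror the general shape of the kernel helpers: the plain check looks at one pointer, the careful one looks at both.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list head, modelled on the kernel idiom. */
struct demo_list_head {
    struct demo_list_head *next, *prev;
};

void demo_list_init(struct demo_list_head *head)
{
    head->next = head;
    head->prev = head;
}

void demo_list_add_tail(struct demo_list_head *entry,
                        struct demo_list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Plain check: enough when nobody else can touch the list. */
bool demo_list_empty(const struct demo_list_head *head)
{
    return head->next == head;
}

/*
 * Careful check: also compares prev, to catch a list that another
 * context is halfway through modifying.
 */
bool demo_list_empty_careful(const struct demo_list_head *head)
{
    const struct demo_list_head *next = head->next;

    return next == head && next == head->prev;
}
```

For a list that lives on one function's stack and is never shared, both checks always agree, so the plain one says the same thing with less to explain.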
2018-05-28 | btrfs: Remove redundant tree argument from extent_readpages | Nikolay Borisov
This function is called only from btrfs_readpage and is already passed the mapping. Simplify its signature by moving the code obtaining reference to the extent tree in the function. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Remove map argument from try_release_extent_state | Nikolay Borisov
It's not used in the function so just remove it. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Sink extent_tree arguments in try_release_extent_mapping | Nikolay Borisov
This function already gets the page from which the two extent trees are referenced. Simplify its signature by moving the code getting the trees inside the function. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Allow rmdir(2) to delete an empty subvolume | Misono Tomohiro
Change the behavior of rmdir(2) and allow it to delete an empty subvolume by using btrfs_delete_subvolume(), which is used by btrfs_ioctl_snap_destroy(). This is a change in behaviour and has been requested by users. Deleting the subvolume by ioctl requires root permissions, while the rmdir way works with standard tools and syscalls for all users that can access the subvolume. The main use case is to allow 'rm -rf /path/with/subvols' to simply work. We were not able to find any nasty usability surprises; the intention is to do the destructive rm. Without allowing rmdir, this would have to be followed by the ioctl subvolume deletion, which is more of an annoyance. Implementation details: The required lock for @dir and the inode of @dentry is already acquired in the vfs layer. We need some checks before deleting a subvolume. The permission check is done in the vfs layer, the emptiness check is in btrfs_rmdir(), and the additional checks (i.e. the subvolume is not the default subvolume and no send is in progress) are in btrfs_delete_subvolume(). Note that in btrfs_ioctl_snap_destroy(), d_delete() is called after btrfs_delete_subvolume(). For rmdir(2), d_delete() is called in the vfs layer later. Tested-by: Goffredo Baroncelli <kreijack@inwind.it> Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ enhance changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: Factor out the main deletion process from btrfs_ioctl_snap_destroy() | Misono Tomohiro
Factor out the second half of btrfs_ioctl_snap_destroy() as btrfs_delete_subvolume(), which performs some subvolume-specific checks before deletion:
1. send is not in progress
2. the subvolume is not the default subvolume
3. the subvolume does not contain other subvolumes
and then runs the actual deletion process. btrfs_delete_subvolume() requires inode_lock for both @dir and the inode of @dentry. The remaining part of btrfs_ioctl_snap_destroy() is mainly permission checks. Note that the call of d_delete() is not included in btrfs_delete_subvolume(), as this function will also be used by btrfs_rmdir() to delete an empty subvolume and in that case d_delete() is called in the VFS layer. As a result, btrfs_unlink_subvol() and may_destroy_subvol() become static functions. No functional changes.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor comment updates ] Signed-off-by: David Sterba <dsterba@suse.com>
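The three pre-deletion checks listed above can be modelled as a small gate function. This is a user-space sketch under stated assumptions: `demo_subvol`, its fields and `demo_may_delete_subvol` are hypothetical, and the specific errno values are chosen for illustration rather than taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the state the checks look at. */
struct demo_subvol {
    bool send_in_progress;
    bool is_default;
    unsigned int child_subvols;
};

/*
 * Return 0 if deletion may proceed, or a negative errno describing the
 * first check that failed. The error codes are assumptions.
 */
int demo_may_delete_subvol(const struct demo_subvol *sv)
{
    if (sv->send_in_progress)
        return -EPERM;      /* 1. send is not in progress */
    if (sv->is_default)
        return -EPERM;      /* 2. not the default subvolume */
    if (sv->child_subvols > 0)
        return -ENOTEMPTY;  /* 3. contains no other subvolumes */
    return 0;
}
```

Putting the checks behind one function is what lets two entry points, the ioctl and rmdir(2), share them while each keeps its own permission handling and d_delete() timing.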
2018-05-28 | btrfs: Move may_destroy_subvol() from ioctl.c to inode.c | Misono Tomohiro
This is preparation work to refactor btrfs_ioctl_snap_destroy() and to allow rmdir(2) to delete an empty subvolume. Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor update of the function comment ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: remove unused le_test_bit() | Howard McLauchlan
With commit b18253ec57c0 ("btrfs: optimize free space tree bitmap conversion"), there are no more callers to le_test_bit(). This patch removes le_test_bit(). Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 | btrfs: optimize free space tree bitmap conversion | Howard McLauchlan
Presently, convert_free_space_to_extents() does a linear scan of the bitmap. We can speed this up with find_next_{bit,zero_bit}_le(). This patch replaces the linear scan with find_next_{bit,zero_bit}_le(). Testing shows a 20-33% decrease in execution time for convert_free_space_to_extents(). Since we change bitmap to be unsigned long, we have to do some casting for the bitmap cursor. In le_bitmap_set() it makes sense to use u8, as we are doing bit operations. Everywhere else, we're just using it for pointer arithmetic and not directly accessing it, so char seems more appropriate. Suggested-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
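The conversion described above alternates two searches: find the next set bit (start of a free extent), then the next clear bit (its end). A user-space sketch of that loop follows. All `demo_` names are hypothetical, and the whole-byte skip is only a simple stand-in for the word-at-a-time scanning a real find_next_bit implementation does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i lives in byte i / 8 at position i % 8 (little-endian numbering). */
int demo_test_bit(const uint8_t *map, size_t i)
{
    return (map[i / 8] >> (i % 8)) & 1;
}

/*
 * Return the index of the first bit >= start whose value equals want
 * (0 or 1), or nbits if there is none.
 */
size_t demo_find_next(const uint8_t *map, size_t nbits, size_t start, int want)
{
    const uint8_t skip = want ? 0x00 : 0xff; /* byte with no match at all */
    size_t i = start;

    while (i < nbits) {
        /* Skip a whole aligned byte that cannot contain a match. */
        if (i % 8 == 0 && i + 8 <= nbits && map[i / 8] == skip) {
            i += 8;
            continue;
        }
        if (demo_test_bit(map, i) == want)
            return i;
        i++;
    }
    return nbits;
}

struct demo_extent {
    size_t start;
    size_t len;
};

/* Turn runs of set bits into extents; returns the number written to out. */
size_t demo_bitmap_to_extents(const uint8_t *map, size_t nbits,
                              struct demo_extent *out, size_t max_out)
{
    size_t n = 0;
    size_t start = demo_find_next(map, nbits, 0, 1);

    while (start < nbits && n < max_out) {
        size_t end = demo_find_next(map, nbits, start, 0);

        out[n].start = start;
        out[n].len = end - start;
        n++;
        start = demo_find_next(map, nbits, end, 1);
    }
    return n;
}
```

For the 24-bit map `{0x0f, 0x00, 0xf0}` this yields two extents, (start 0, length 4) and (start 20, length 4), and the all-zero middle byte is skipped in one step instead of eight.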