summaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)Author
2019-07-22Merge tag 'for-5.3-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fixes for leaks caused by recently merged patches - one build fix - a fix to prevent mixing of incompatible features * tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: don't leak extent_map in btrfs_get_io_geometry() btrfs: free checksum hash on in close_ctree btrfs: Fix build error while LIBCRC32C is module btrfs: inode: Don't compress if NODATASUM or NODATACOW set
2019-07-19Merge branch 'work.mount0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-07-17btrfs: don't leak extent_map in btrfs_get_io_geometry()Johannes Thumshirn
btrfs_get_io_geometry() calls btrfs_get_chunk_map() to acquire a reference on a extent_map, but on normal operation it does not drop this reference anymore. This leads to excessive kmemleak reports. Always call free_extent_map(), not just in the error case. Fixes: 5f1411265e16 ("btrfs: Introduce btrfs_io_geometry infrastructure") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-17btrfs: free checksum hash on in close_ctreeJohannes Thumshirn
fs_info::csum_hash gets initialized in btrfs_init_csum_hash() which is called by open_ctree(). But it only gets freed if open_ctree() fails, not on normal operation. This leads to a memory leak like the following found by kmemleak: unreferenced object 0xffff888132cb8720 (size 96): comm "mount", pid 450, jiffies 4294912436 (age 17.584s) hex dump (first 32 bytes): 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000000c9643d4>] crypto_create_tfm+0x2d/0xd0 [<00000000ae577f68>] crypto_alloc_tfm+0x4b/0xb0 [<000000002b5cdf30>] open_ctree+0xb84/0x2060 [btrfs] [<0000000043204297>] btrfs_mount_root+0x552/0x640 [btrfs] [<00000000c99b10ea>] legacy_get_tree+0x22/0x40 [<0000000071a6495f>] vfs_get_tree+0x1f/0xc0 [<00000000f180080e>] fc_mount+0x9/0x30 [<000000009e36cebd>] vfs_kern_mount.part.11+0x6a/0x80 [<0000000004594c05>] btrfs_mount+0x174/0x910 [btrfs] [<00000000c99b10ea>] legacy_get_tree+0x22/0x40 [<0000000071a6495f>] vfs_get_tree+0x1f/0xc0 [<00000000b86e92c5>] do_mount+0x6b0/0x940 [<0000000097464494>] ksys_mount+0x7b/0xd0 [<0000000057213c80>] __x64_sys_mount+0x1c/0x20 [<00000000cb689b5e>] do_syscall_64+0x43/0x130 [<000000002194e289>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Free fs_info::csum_hash in close_ctree() to avoid the memory leak. Fixes: 6d97c6e31b55 ("btrfs: add boilerplate code for directly including the crypto framework") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-17btrfs: Fix build error while LIBCRC32C is moduleYueHaibing
If CONFIG_BTRFS_FS is y and CONFIG_LIBCRC32C is m, building fails: fs/btrfs/super.o: In function `btrfs_mount_root': super.c:(.text+0xb7f9): undefined reference to `crc32c_impl' fs/btrfs/super.o: In function `init_btrfs_fs': super.c:(.init.text+0x3465): undefined reference to `crc32c_impl' fs/btrfs/extent-tree.o: In function `hash_extent_data_ref': extent-tree.c:(.text+0xe60): undefined reference to `crc32c' extent-tree.c:(.text+0xe78): undefined reference to `crc32c' extent-tree.c:(.text+0xe8b): undefined reference to `crc32c' fs/btrfs/dir-item.o: In function `btrfs_insert_xattr_item': dir-item.c:(.text+0x291): undefined reference to `crc32c' fs/btrfs/dir-item.o: In function `btrfs_insert_dir_item': dir-item.c:(.text+0x429): undefined reference to `crc32c' Select LIBCRC32C to fix it. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: d5178578bcd4 ("btrfs: directly call into crypto framework for checksumming") Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-17btrfs: inode: Don't compress if NODATASUM or NODATACOW setQu Wenruo
As btrfs(5) specified: Note If nodatacow or nodatasum are enabled, compression is disabled. If NODATASUM or NODATACOW set, we should not compress the extent. Normally NODATACOW is detected properly in run_delalloc_range() so compression won't happen for NODATACOW. However for NODATASUM we don't have any check, and it can cause compressed extent without csum pretty easily, just by: mkfs.btrfs -f $dev mount $dev $mnt -o nodatasum touch $mnt/foobar mount -o remount,datasum,compress $mnt xfs_io -f -c "pwrite 0 128K" $mnt/foobar And in fact, we have a bug report about corrupted compressed extent without proper data checksum so even RAID1 can't recover the corruption. (https://bugzilla.kernel.org/show_bug.cgi?id=199707) Running compression without proper checksum could cause more damage when corruption happens, as compressed data could make the whole extent unreadable, so there is no need to allow compression for NODATACSUM. The fix will refactor the inode compression check into two parts: - inode_can_compress() As the hard requirement, checked at btrfs_run_delalloc_range(), so no compression will happen for NODATASUM inode at all. - inode_need_compress() As the soft requirement, checked at btrfs_run_delalloc_range() and compress_file_range(). Reported-by: James Harvey <jamespharvey20@gmail.com> CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-16Merge tag 'for-5.3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Highlights: - chunks that have been trimmed and unchanged since last mount are tracked and skipped on repeated trims - use hw assissed crc32c on more arches, speedups if native instructions or optimized implementation is available - the RAID56 incompat bit is automatically removed when the last block group of that type is removed Fixes: - fsync fix for reflink on NODATACOW files that could lead to ENOSPC - fix data loss after inode eviction, renaming it, and fsync it - fix fsync not persisting dentry deletions due to inode evictions - update ctime/mtime/iversion after hole punching - fix compression type validation (reported by KASAN) - send won't be allowed to start when relocation is in progress, this can cause spurious errors or produce incorrect send stream Core: - new tracepoints for space update - tree-checker: better check for end of extents for some tree items - preparatory work for more checksum algorithms - run delayed iput at unlink time and don't push the work to cleaner thread where it's not properly throttled - wrap block mapping to structures and helpers, base for further refactoring - split large files, part 1: - space info handling - block group reservations - delayed refs - delayed allocation - other cleanups and refactoring" * tag 'for-5.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (103 commits) btrfs: fix memory leak of path on error return path btrfs: move the subvolume reservation stuff out of extent-tree.c btrfs: migrate the delalloc space stuff to it's own home btrfs: migrate btrfs_trans_release_chunk_metadata btrfs: migrate the delayed refs rsv code btrfs: Evaluate io_tree in find_lock_delalloc_range() btrfs: migrate the global_block_rsv helpers to block-rsv.c btrfs: migrate the block-rsv code to block-rsv.c btrfs: stop using block_rsv_release_bytes everywhere btrfs: cleanup the target logic in __btrfs_block_rsv_release btrfs: export __btrfs_block_rsv_release btrfs: export btrfs_block_rsv_add_bytes btrfs: move btrfs_block_rsv definitions into it's own header btrfs: Simplify update of space_info in __reserve_metadata_bytes() btrfs: unexport can_overcommit btrfs: move reserve_metadata_bytes and supporting code to space-info.c btrfs: move dump_space_info to space-info.c btrfs: export block_rsv_use_bytes btrfs: move btrfs_space_info_add_*_bytes to space-info.c btrfs: move the space info update macro to space-info.h ...
2019-07-15Merge tag 'for-linus-20190715' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block updates from Jens Axboe: "A later pull request with some followup items. I had some vacation coming up to the merge window, so certain things items were delayed a bit. This pull request also contains fixes that came in within the last few days of the merge window, which I didn't want to push right before sending you a pull request. This contains: - NVMe pull request, mostly fixes, but also a few minor items on the feature side that were timing constrained (Christoph et al) - Report zones fixes (Damien) - Removal of dead code (Damien) - Turn on cgroup psi memstall (Josef) - block cgroup MAINTAINERS entry (Konstantin) - Flush init fix (Josef) - blk-throttle low iops timing fix (Konstantin) - nbd resize fixes (Mike) - nbd 0 blocksize crash fix (Xiubo) - block integrity error leak fix (Wenwen) - blk-cgroup writeback and priority inheritance fixes (Tejun)" * tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits) MAINTAINERS: add entry for block io cgroup null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED block: Limit zone array allocation size sd_zbc: Fix report zones buffer allocation block: Kill gfp_t argument of blkdev_report_zones() block: Allow mapping of vmalloc-ed buffers block/bio-integrity: fix a memory leak bug nvme: fix NULL deref for fabrics options nbd: add netlink reconfigure resize support nbd: fix crash when the blksize is zero block: Disable write plugging for zoned block devices block: Fix elevator name declaration block: Remove unused definitions nvme: fix regression upon hot device removal and insertion blk-throttle: fix zero wait time for iops throttled group block: Fix potential overflow in blk_report_zones() blkcg: implement REQ_CGROUP_PUNT blkcg, writeback: Implement wbc_blkcg_css() blkcg, writeback: Add wbc->no_cgroup_owner blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner() ...
2019-07-12Merge tag 'vfs-fix-ioctl-checking-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull common SETFLAGS/FSSETXATTR parameter checking from Darrick Wong: "Here's a patch series that sets up common parameter checking functions for the FS_IOC_SETFLAGS and FS_IOC_FSSETXATTR ioctl implementations. The goal here is to reduce the amount of behaviorial variance between the filesystems where those ioctls originated (ext2 and XFS, respectively) and everybody else. - Standardize parameter checking for the SETFLAGS and FSSETXATTR ioctls (which were the file attribute setters for ext4 and xfs and have now been hoisted to the vfs) - Only allow the DAX flag to be set on files and directories" * tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: vfs: only allow FSSETXATTR to set DAX flag on files and dirs vfs: teach vfs_ioc_fssetxattr_check to check extent size hints vfs: teach vfs_ioc_fssetxattr_check to check project id info vfs: create a generic checking function for FS_IOC_FSSETXATTR vfs: create a generic checking and prep function for FS_IOC_SETFLAGS
2019-07-12Merge tag 'driver-core-5.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and debugfs updates from Greg KH: "Here is the "big" driver core and debugfs changes for 5.3-rc1 It's a lot of different patches, all across the tree due to some api changes and lots of debugfs cleanups. Other than the debugfs cleanups, in this set of changes we have: - bus iteration function cleanups - scripts/get_abi.pl tool to display and parse Documentation/ABI entries in a simple way - cleanups to Documenatation/ABI/ entries to make them parse easier due to typos and other minor things - default_attrs use for some ktype users - driver model documentation file conversions to .rst - compressed firmware file loading - deferred probe fixes All of these have been in linux-next for a while, with a bunch of merge issues that Stephen has been patient with me for" * tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits) debugfs: make error message a bit more verbose orangefs: fix build warning from debugfs cleanup patch ubifs: fix build warning after debugfs cleanup patch driver: core: Allow subsystems to continue deferring probe drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT arch_topology: Remove error messages on out-of-memory conditions lib: notifier-error-inject: no need to check return value of debugfs_create functions swiotlb: no need to check return value of debugfs_create functions ceph: no need to check return value of debugfs_create functions sunrpc: no need to check return value of debugfs_create functions ubifs: no need to check return value of debugfs_create functions orangefs: no need to check return value of debugfs_create functions nfsd: no need to check return value of debugfs_create functions lib: 842: no need to check return value of debugfs_create functions debugfs: provide pr_fmt() macro debugfs: log errors when something goes wrong drivers: s390/cio: Fix compilation warning about const qualifiers drivers: Add generic helper to match by of_node driver_find_device: Unify the match function with class_find_device() bus_find_device: Unify the match callback with class_find_device ...
2019-07-10Merge tag 'fsnotify_for_v5.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "This contains cleanups of the fsnotify name removal hook and also a patch to disable fanotify permission events for 'proc' filesystem" * tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: get rid of fsnotify_nameremove() fsnotify: move fsnotify_nameremove() hook out of d_delete() configfs: call fsnotify_rmdir() hook debugfs: call fsnotify_{unlink,rmdir}() hooks debugfs: simplify __debugfs_remove_file() devpts: call fsnotify_unlink() hook tracefs: call fsnotify_{unlink,rmdir}() hooks rpc_pipefs: call fsnotify_{unlink,rmdir}() hooks btrfs: call fsnotify_rmdir() hook fsnotify: add empty fsnotify_{unlink,rmdir}() hooks fanotify: Disallow permission events for proc filesystem
2019-07-10blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner()Tejun Heo
wbc_account_io() does a very specific job - try to see which cgroup is actually dirtying an inode and transfer its ownership to the majority dirtier if needed. The name is too generic and confusing. Let's rename it to something more specific. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-05btrfs: fix memory leak of path on error return pathColin Ian King
Currently if the allocation of roots or tmp_ulist fails the error handling does not free up the allocation of path causing a memory leak. Fix this and other similar leaks by moving the call of btrfs_free_path from label out to label out_free_ulist. Kudos to David Sterba for spotting the issue in my original fix and suggesting the correct way to fix the leak and Anand Jain for spotting a double free issue. Addresses-Coverity: ("Resource leak") Fixes: 5911c8fe05c5 ("btrfs: fiemap: preallocate ulists for btrfs_check_shared") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-04btrfs: move the subvolume reservation stuff out of extent-tree.cJosef Bacik
This is just two functions, put it in root-tree.c since it involves root items. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-04btrfs: migrate the delalloc space stuff to it's own homeJosef Bacik
We have code for data and metadata reservations for delalloc. There's quite a bit of code here, and it's used in a lot of places so I've separated it out to it's own file. inode.c and file.c are already pretty large, and this code is complicated enough to live in its own space. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-04btrfs: migrate btrfs_trans_release_chunk_metadataJosef Bacik
Move this into transaction.c with the rest of the transaction related code. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-04btrfs: migrate the delayed refs rsv codeJosef Bacik
These belong with the delayed refs related code, not in extent-tree.c. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-04btrfs: Evaluate io_tree in find_lock_delalloc_range()Goldwyn Rodrigues
Simplification. No point passing the tree variable when it can be evaluated from inode. The tests now use the io_tree from btrfs_inode as opposed to creating one. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: migrate the global_block_rsv helpers to block-rsv.cJosef Bacik
These helpers belong in block-rsv.c Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: migrate the block-rsv code to block-rsv.cJosef Bacik
This moves everything out of extent-tree.c to block-rsv.c. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: stop using block_rsv_release_bytes everywhereJosef Bacik
block_rsv_release_bytes() is the internal to the block_rsv code, and shouldn't be called directly by anything else. Switch all users to the exported helpers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: cleanup the target logic in __btrfs_block_rsv_releaseJosef Bacik
This works for all callers already, but if we wanted to use the helper for the global_block_rsv it would end up trying to refill itself. Fix the logic to be able to be used no matter which block rsv is passed in to this helper. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: export __btrfs_block_rsv_releaseJosef Bacik
The delalloc reserve stuff calls this directly because it cares about the qgroup accounting stuff, so export it so we can move it around. Fix btrfs_block_rsv_release() to just be a static inline since it just calls __btrfs_block_rsv_release() with NULL for the qgroup stuff. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: export btrfs_block_rsv_add_bytesJosef Bacik
This is used in a few places, we need to make sure it's exported so we can move it around. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move btrfs_block_rsv definitions into it's own headerJosef Bacik
Prep work for separating out all of the block_rsv related code into its own file. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: Simplify update of space_info in __reserve_metadata_bytes()Goldwyn Rodrigues
We don't need an if-else-if chain where we can use a simple OR since both conditions are performing the same action. The short-circuit for OR will ensure that if the first condition is true, can_overcommit() is not called. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: unexport can_overcommitJosef Bacik
Now that we've moved all of the users to space-info.c, unexport it and name it back to can_overcommit. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move reserve_metadata_bytes and supporting code to space-info.cJosef Bacik
This moves all of the metadata reservation code into space-info.c. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move dump_space_info to space-info.cJosef Bacik
We'll need this exported so we can use it in all the various was we need to use it. This is prep work to move reserve_metadata_bytes. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: export block_rsv_use_bytesJosef Bacik
We are going to need this to move the metadata reservation stuff to space_info.c. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move btrfs_space_info_add_*_bytes to space-info.cJosef Bacik
Now that we've moved all the pre-requisite stuff, move these two functions. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move the space info update macro to space-info.hJosef Bacik
Also rename it to btrfs_space_info_update_* so it's clear what we're updating. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move and export can_overcommitJosef Bacik
This is the first piece of moving the space reservation code to space-info.c Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move the space_info handling code to space-info.cJosef Bacik
These are the basic init and lookup functions and some helper functions, fairly straightforward before the bad stuff starts. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: export space_info_add_*_bytesJosef Bacik
Prep work for consolidating all of the space_info code into one file. We need to export these so multiple files can use them. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: rename do_chunk_alloc to btrfs_chunk_allocJosef Bacik
Really we just need the enum, but as we break more things up it'll help to have this external to extent-tree.c. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move space_info to space-info.hJosef Bacik
Migrate the struct definition and the one helper that's in ctree.h into space-info.h Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: lift bio_set_dev from bio allocation helpersDavid Sterba
The block device is passed around for the only purpose to set it in new bios. Move the assignment one level up. This is a preparatory patch for further bdev cleanups. Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: use raid_attr for minimum stripe count in btrfs_calc_avail_data_spaceDavid Sterba
Minimum stripe count matches the minimum devices required for a given profile. The open coded assignments match the raid_attr table. What's changed here is the meaning for RAID5/6. Previously their min_stripes would be 1, while newly it's devs_min. This however shold be the same as before because it's not possible to create filesystem on fewer devices than the raid_attr table allows. There's no adjustment regarding the parity stripes (like calc_data_stripes does), because we're interested in overall space that would fit on the devices. Missing devices make no difference for the whole calculation, we have the size stored in the structures. Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: use raid_attr to adjust minimal stripe size in ↵David Sterba
btrfs_calc_avail_data_space Special case for DUP can be replaced by lookup to the attribute table, where the dev_stripes is the right coefficient. Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: drop default value assignments in enumsDavid Sterba
A few more instances whre we don't need to specify the values as long as they are the same that enum assigns automatically. All of the enums are in-memory only and nothing relies on the exact values. Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: use common helpers for extent IO state insertion messagesDavid Sterba
Print the error messages using the helpers that also print the filesystem identification. Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: run delayed iput at unlink timeJosef Bacik
We have been seeing issues in production where a cleaner script will end up unlinking a bunch of files that have pending iputs. This means they will get their final iput's run at btrfs-cleaner time and thus are not throttled, which impacts the workload. Since we are unlinking these files we can just drop the delayed iput at unlink time. We are already holding a reference to the inode so this will not be the final iput and thus is completely safe to do at this point. Doing this means we are more likely to be doing the final iput at unlink time, and thus will get the IO charged to the caller and get throttled appropriately without affecting the main workload. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02Btrfs: add missing inode version, ctime and mtime updates when punching holeFilipe Manana
If the range for which we are punching a hole covers only part of a page, we end up updating the inode item but we skip the update of the inode's iversion, mtime and ctime. Fix that by ensuring we update those properties of the inode. A patch for fstests test case generic/059 that tests this as been sent along with this fix. Fixes: 2aaa66558172b0 ("Btrfs: add hole punching") Fixes: e8c1c76e804b18 ("Btrfs: add missing inode update when punching hole") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02Btrfs: fix fsync not persisting dentry deletions due to inode evictionsFilipe Manana
In order to avoid searches on a log tree when unlinking an inode, we check if the inode being unlinked was logged in the current transaction, as well as the inode of its parent directory. When any of the inodes are logged, we proceed to delete directory items and inode reference items from the log, to ensure that if a subsequent fsync of only the inode being unlinked or only of the parent directory when the other is not fsync'ed as well, does not result in the entry still existing after a power failure. That check however is not reliable when one of the inodes involved (the one being unlinked or its parent directory's inode) is evicted, since the logged_trans field is transient, that is, it is not stored on disk, so it is lost when the inode is evicted and loaded into memory again (which is set to zero on load). As a consequence the checks currently being done by btrfs_del_dir_entries_in_log() and btrfs_del_inode_ref_in_log() always return true if the inode was evicted before, regardless of the inode having been logged or not before (and in the current transaction), this results in the dentry being unlinked still existing after a log replay if after the unlink operation only one of the inodes involved is fsync'ed. Example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/dir $ touch /mnt/dir/foo $ xfs_io -c fsync /mnt/dir/foo # Keep an open file descriptor on our directory while we evict inodes. # We just want to evict the file's inode, the directory's inode must not # be evicted. $ ( cd /mnt/dir; while true; do :; done ) & $ pid=$! # Wait a bit to give time to background process to chdir to our test # directory. $ sleep 0.5 # Trigger eviction of the file's inode. $ echo 2 > /proc/sys/vm/drop_caches # Unlink our file and fsync the parent directory. After a power failure # we don't expect to see the file anymore, since we fsync'ed the parent # directory. $ rm -f $SCRATCH_MNT/dir/foo $ xfs_io -c fsync /mnt/dir <power failure> $ mount /dev/sdb /mnt $ ls /mnt/dir foo $ --> file still there, unlink not persisted despite explicit fsync on dir Fix this by checking if the inode has the full_sync bit set in its runtime flags as well, since that bit is set everytime an inode is loaded from disk, or for other less common cases such as after a shrinking truncate or failure to allocate extent maps for holes, and gets cleared after the first fsync. Also consider the inode as possibly logged only if it was last modified in the current transaction (besides having the full_fsync flag set). Fixes: 3a5f1d458ad161 ("Btrfs: Optimize btree walking while logging inodes") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: Use btrfs_get_io_geometry appropriatelyNikolay Borisov
Presently btrfs_map_block is used not only to do everything necessary to map a bio to the underlying allocation profile but it's also used to identify how much data could be written based on btrfs' stripe logic without actually submitting anything. This is achieved by passing NULL for 'bbio_ret' parameter. This patch refactors all callers that require just the mapping length by switching them to using btrfs_io_geometry instead of calling btrfs_map_block with a special NULL value for 'bbio_ret'. No functional change. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: Introduce btrfs_io_geometry infrastructureNikolay Borisov
Add a structure that holds various parameters for IO calculations and a helper that fills the values. This will help further refactoring and reduction of functions that in some way open-coded the calculations. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: improve messages when updating feature flagsDavid Sterba
Currently the messages printed after setting an incompat feature are cryptis, we can easily make it better as the textual description is passed to the helpers. Old: setting 128 feature flag updated: setting incompat feature flag for RAID56 (0x80) Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: shut up bogus -Wmaybe-uninitialized warningArnd Bergmann
gcc sometimes can't determine whether a variable has been initialized when both the initialization and the use are conditional: fs/btrfs/props.c: In function 'inherit_props': fs/btrfs/props.c:389:4: error: 'num_bytes' may be used uninitialized in this function [-Werror=maybe-uninitialized] btrfs_block_rsv_release(fs_info, trans->block_rsv, This code is fine. Unfortunately, I cannot think of a good way to rephrase it in a way that makes gcc understand this, so I add a bogus initialization the way one should not. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: David Sterba <dsterba@suse.com> [ gcc 8 and 9 don't emit the warning ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02Btrfs: prevent send failures and crashes due to concurrent relocationFilipe Manana
Send always operates on read-only trees and always expected that while it is in progress, nothing changes in those trees. Due to that expectation and the fact that send is a read-only operation, it operates on commit roots and does not hold transaction handles. However relocation can COW nodes and leafs from read-only trees, which can cause unexpected failures and crashes (hitting BUG_ONs). while send using a node/leaf, it gets COWed, the transaction used to COW it is committed, a new transaction starts, the extent previously used for that node/leaf gets allocated, possibly for another tree, and the respective extent buffer' content changes while send is still using it. When this happens send normally fails with EIO being returned to user space and messages like the following are found in dmesg/syslog: [ 3408.699121] BTRFS error (device sdc): parent transid verify failed on 58703872 wanted 250 found 253 [ 3441.523123] BTRFS error (device sdc): did not find backref in send_root. inode=63211, offset=0, disk_byte=5222825984 found extent=5222825984 Other times, less often, we hit a BUG_ON() because an extent buffer that send is using used to be a node, and while send is still using it, it got COWed and got reused as a leaf while send is still using, producing the following trace: [ 3478.466280] ------------[ cut here ]------------ [ 3478.466282] kernel BUG at fs/btrfs/ctree.c:1806! [ 3478.466965] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [ 3478.467635] CPU: 0 PID: 2165 Comm: btrfs Not tainted 5.0.0-btrfs-next-46 #1 [ 3478.468311] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [ 3478.469681] RIP: 0010:read_node_slot+0x122/0x130 [btrfs] (...) [ 3478.471758] RSP: 0018:ffffa437826bfaa0 EFLAGS: 00010246 [ 3478.472457] RAX: ffff961416ed7000 RBX: 000000000000003d RCX: 0000000000000002 [ 3478.473151] RDX: 000000000000003d RSI: ffff96141e387408 RDI: ffff961599b30000 [ 3478.473837] RBP: ffffa437826bfb8e R08: 0000000000000001 R09: ffffa437826bfb8e [ 3478.474515] R10: ffffa437826bfa70 R11: 0000000000000000 R12: ffff9614385c8708 [ 3478.475186] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 3478.475840] FS: 00007f8e0e9cc8c0(0000) GS:ffff9615b6a00000(0000) knlGS:0000000000000000 [ 3478.476489] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3478.477127] CR2: 00007f98b67a056e CR3: 0000000005df6005 CR4: 00000000003606f0 [ 3478.477762] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3478.478385] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3478.479003] Call Trace: [ 3478.479600] ? do_raw_spin_unlock+0x49/0xc0 [ 3478.480202] tree_advance+0x173/0x1d0 [btrfs] [ 3478.480810] btrfs_compare_trees+0x30c/0x690 [btrfs] [ 3478.481388] ? process_extent+0x1280/0x1280 [btrfs] [ 3478.481954] btrfs_ioctl_send+0x1037/0x1270 [btrfs] [ 3478.482510] _btrfs_ioctl_send+0x80/0x110 [btrfs] [ 3478.483062] btrfs_ioctl+0x13fe/0x3120 [btrfs] [ 3478.483581] ? rq_clock_task+0x2e/0x60 [ 3478.484086] ? wake_up_new_task+0x1f3/0x370 [ 3478.484582] ? do_vfs_ioctl+0xa2/0x6f0 [ 3478.485075] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [ 3478.485552] do_vfs_ioctl+0xa2/0x6f0 [ 3478.486016] ? __fget+0x113/0x200 [ 3478.486467] ksys_ioctl+0x70/0x80 [ 3478.486911] __x64_sys_ioctl+0x16/0x20 [ 3478.487337] do_syscall_64+0x60/0x1b0 [ 3478.487751] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 3478.488159] RIP: 0033:0x7f8e0d7d4dd7 (...) [ 3478.489349] RSP: 002b:00007ffcf6fb4908 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 3478.489742] RAX: ffffffffffffffda RBX: 0000000000000105 RCX: 00007f8e0d7d4dd7 [ 3478.490142] RDX: 00007ffcf6fb4990 RSI: 0000000040489426 RDI: 0000000000000005 [ 3478.490548] RBP: 0000000000000005 R08: 00007f8e0d6f3700 R09: 00007f8e0d6f3700 [ 3478.490953] R10: 00007f8e0d6f39d0 R11: 0000000000000202 R12: 0000000000000005 [ 3478.491343] R13: 00005624e0780020 R14: 0000000000000000 R15: 0000000000000001 (...) [ 3478.493352] ---[ end trace d5f537302be4f8c8 ]--- Another possibility, much less likely to happen, is that send will not fail but the contents of the stream it produces may not be correct. To avoid this, do not allow send and relocation (balance) to run in parallel. In the long term the goal is to allow for both to be able to run concurrently without any problems, but that will take a significant effort in development and testing. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>