summaryrefslogtreecommitdiff
path: root/fs/btrfs/super.c
AgeCommit message (Collapse)Author
2019-04-29btrfs: drop unused parameter in mount_subvolAnand Jain
@device_name in mount_subvol() is not used, drop it. Also see: 5bedc48a8f9e ("btrfs: drop unused parameters from mount_subvol"). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25btrfs: add zstd compression level supportDennis Zhou
Zstd compression requires different amounts of memory for each level of compression. The prior patches implemented indirection to allow for each compression type to manage their workspaces independently. This patch uses this indirection to implement compression level support for zstd. To manage the additional memory require, each compression level has its own queue of workspaces. A global LRU is used to help with reclaim. Reclaim is done via a timer which provides a mechanism to decrease memory utilization by keeping only workspaces around that are sized appropriately. Forward progress is guaranteed by a preallocated max workspace hidden from the LRU. When getting a workspace, it uses a bitmap to identify the levels that are populated and scans up. If it finds a workspace that is greater than it, it uses it, but does not update the last_used time and the corresponding place in the LRU. If we hit memory pressure, we sleep on the max level workspace. We continue to rescan in case we can use a smaller workspace, but eventually should be able to obtain the max level workspace or allocate one again should memory pressure subside. The memory requirement for decompression is the same as level 1, and therefore can use any of available workspace. The number of workspaces is bound by an upper limit of the workqueue's limit which currently is 2 (percpu limit). The reclaim timer is used to free inactive/improperly sized workspaces and is set to 307s to avoid colliding with transaction commit (every 30s). Repeating the experiment from v2 [1], the Silesia corpus was copied to a btrfs filesystem 10 times and then read back after dropping the caches. The btrfs filesystem was on an SSD. Level Ratio Compression (MB/s) Decompression (MB/s) Memory (KB) 1 2.658 438.47 910.51 780 2 2.744 364.86 886.55 1004 3 2.801 336.33 828.41 1260 4 2.858 286.71 886.55 1260 5 2.916 212.77 556.84 1388 6 2.363 119.82 990.85 1516 7 3.000 154.06 849.30 1516 8 3.011 159.54 875.03 1772 9 3.025 100.51 940.15 1772 10 3.033 118.97 616.26 1772 11 3.036 94.19 802.11 1772 12 3.037 73.45 931.49 1772 13 3.041 55.17 835.26 2284 14 3.087 44.70 716.78 2547 15 3.126 37.30 878.84 2547 [1] https://lore.kernel.org/linux-btrfs/20181031181108.289340-1-terrelln@fb.com/ Cc: Nick Terrell <terrelln@fb.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25btrfs: change set_level() to bound the level passed inDennis Zhou
Currently, the only user of set_level() is zlib which sets an internal workspace parameter. As level is now plumbed into get_workspace(), this can be handled there rather than separately. This repurposes set_level() to bound the level passed in so it can be used when setting the mounts compression level and as well as verifying the level before getting a workspace. The other benefit is this divides the meaning of compress(0) and get_workspace(0). The former means we want to use the default compression level of the compression type. The latter means we can use any workspace available. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25btrfs: introduce new ioctl to unregister a btrfs deviceAnand Jain
Support for a new command that can be used eg. as a command $ btrfs device scan --forget [dev]' (the final name may change though) to undo the effects of 'btrfs device scan [dev]'. For this purpose this patch proposes to use ioctl #5 as it was empty and is next to the SCAN ioctl. The new ioctl BTRFS_IOC_FORGET_DEV works only on the control device (/dev/btrfs-control) to unregister one or all devices, devices that are not mounted. The argument is struct btrfs_ioctl_vol_args, ::name specifies the device path. To unregister all device, the path is an empty string. Again, the devices are removed only if they aren't part of a mounte filesystem. This new ioctl provides: - release of unwanted btrfs_fs_devices and btrfs_devices structures from memory if the device is not going to be mounted - ability to mount filesystem in degraded mode, when one devices is corrupted like in split brain raid1 - running test cases which would require reloading the kernel module but this is not possible eg. due to mounted filesystem or built-in Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-03Merge tag 'for-5.0-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - regression fix: transaction commit can run away due to delayed ref waiting heuristic, this is not necessary now because of the proper reservation mechanism introduced in 5.0 - regression fix: potential crash due to use-before-check of an ERR_PTR return value - fix for transaction abort during transaction commit that needs to properly clean up pending block groups - fix deadlock during b-tree node/leaf splitting, when this happens on some of the fundamental trees, we must prevent new tree block allocation to re-enter indirectly via the block group flushing path - potential memory leak after errors during mount * tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: On error always free subvol_name in btrfs_mount btrfs: clean up pending block groups when transaction commit aborts btrfs: fix potential oops in device_list_add btrfs: don't end the transaction for delayed refs in throttle Btrfs: fix deadlock when allocating tree block during leaf/node split
2019-01-30btrfs: On error always free subvol_name in btrfs_mountEric W. Biederman
The subvol_name is allocated in btrfs_parse_subvol_options and is consumed and freed in mount_subvol. Add a free to the error paths that don't call mount_subvol so that it is guaranteed that subvol_name is freed when an error happens. Fixes: 312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()") Cc: stable@vger.kernel.org # v4.19+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-05Merge branch 'mount.part1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount API prep from Al Viro: "Mount API prereqs. Mostly that's LSM mount options cleanups. There are several minor fixes in there, but nothing earth-shattering (leaks on failure exits, mostly)" * 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (27 commits) mount_fs: suppress MAC on MS_SUBMOUNT as well as MS_KERNMOUNT smack: rewrite smack_sb_eat_lsm_opts() smack: get rid of match_token() smack: take the guts of smack_parse_opts_str() into a new helper LSM: new method: ->sb_add_mnt_opt() selinux: rewrite selinux_sb_eat_lsm_opts() selinux: regularize Opt_... names a bit selinux: switch away from match_token() selinux: new helper - selinux_add_opt() LSM: bury struct security_mnt_opts smack: switch to private smack_mnt_opts selinux: switch to private struct selinux_mnt_opts LSM: hide struct security_mnt_opts from any generic code selinux: kill selinux_sb_get_mnt_opts() LSM: turn sb_eat_lsm_opts() into a method nfs_remount(): don't leak, don't ignore LSM options quietly btrfs: sanitize security_mnt_opts use selinux; don't open-code a loop in sb_finish_set_opts() LSM: split ->sb_set_mnt_opts() out of ->sb_kern_mount() new helper: security_sb_eat_lsm_opts() ...
2018-12-21LSM: hide struct security_mnt_opts from any generic codeAl Viro
Keep void * instead, allocate on demand (in parse_str_opts, at the moment). Eventually both selinux and smack will be better off with private structures with several strings in those, rather than this "counter and two pointers to dynamically allocated arrays" ugliness. This commit allows to do that at leisure, without disrupting anything outside of given module. Changes: * instead of struct security_mnt_opt use an opaque pointer initialized to NULL. * security_sb_eat_lsm_opts(), security_sb_parse_opts_str() and security_free_mnt_opts() take it as var argument (i.e. as void **); call sites are unchanged. * security_sb_set_mnt_opts() and security_sb_remount() take it by value (i.e. as void *). * new method: ->sb_free_mnt_opts(). Takes void *, does whatever freeing that needs to be done. * ->sb_set_mnt_opts() and ->sb_remount() might get NULL as mnt_opts argument, meaning "empty". Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21btrfs: sanitize security_mnt_opts useAl Viro
1) keeping a copy in btrfs_fs_info is completely pointless - we never use it for anything. Getting rid of that allows for simpler calling conventions for setup_security_options() (caller is responsible for freeing mnt_opts in all cases). 2) on remount we want to use ->sb_remount(), not ->sb_set_mnt_opts(), same as we would if not for FS_BINARY_MOUNTDATA. Behaviours *are* close (in fact, selinux sb_set_mnt_opts() ought to punt to sb_remount() in "already initialized" case), but let's handle that uniformly. And the only reason why the original btrfs changes didn't go for security_sb_remount() in btrfs_remount() case is that it hadn't been exported. Let's export it for a while - it'll be going away soon anyway. Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21new helper: security_sb_eat_lsm_opts()Al Viro
combination of alloc_secdata(), security_sb_copy_data(), security_sb_parse_opt_str() and free_secdata(). Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-17btrfs: Fix typos in comments and stringsAndrea Gelmini
The typos accumulate over time so once in a while time they get fixed in a large patch. Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17btrfs: Remove fsid/metadata_fsid fields from btrfs_infoNikolay Borisov
Currently btrfs_fs_info structure contains a copy of the fsid/metadata_uuid fields. Same values are also contained in the btrfs_fs_devices structure which fs_info has a reference to. Let's reduce duplication by removing the fields from fs_info and always refer to the ones in fs_devices. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-14Btrfs: ensure path name is null terminated at btrfs_control_ioctlFilipe Manana
We were using the path name received from user space without checking that it is null terminated. While btrfs-progs is well behaved and does proper validation and null termination, someone could call the ioctl and pass a non-null terminated patch, leading to buffer overrun problems in the kernel. The ioctl is protected by CAP_SYS_ADMIN. So just set the last byte of the path to a null character, similar to what we do in other ioctls (add/remove/resize device, snapshot creation, etc). CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-06btrfs: avoid link error with CONFIG_NO_AUTO_INLINEArnd Bergmann
Note: this patch fixes a problem in a feature outside of btrfs ("kernel hacking: add a config option to disable compiler auto-inlining") and is applied ahead of time due to cross-subsystem dependencies. On 32-bit ARM with gcc-8, I see a link error with the addition of the CONFIG_NO_AUTO_INLINE option: fs/btrfs/super.o: In function `btrfs_statfs': super.c:(.text+0x67b8): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x67fc): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x6858): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x6920): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x693c): undefined reference to `__aeabi_uldivmod' fs/btrfs/super.o:super.c:(.text+0x6958): more undefined references to `__aeabi_uldivmod' follow So far this is the only file that shows the behavior, so I'd propose to just work around it by marking the functions as 'static inline' that normally get inlined here. The reference to __aeabi_uldivmod comes from a div_u64() which has an optimization for a constant division that uses a straight '/' operator when the result should be known to the compiler. My interpretation is that as we turn off inlining, gcc still expects the result to be constant but fails to use that constant value. Link: https://lkml.kernel.org/r/20181103153941.1881966-1-arnd@arndb.de Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> [ add the note ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: Remove 'objectid' member from struct btrfs_rootMisono Tomohiro
There are two members in struct btrfs_root which indicate root's objectid: objectid and root_key.objectid. They are both set to the same value in __setup_root(): static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info, u64 objectid) { ... root->objectid = objectid; ... root->root_key.objectid = objecitd; ... } and not changed to other value after initialization. grep in btrfs directory shows both are used in many places: $ grep -rI "root->root_key.objectid" | wc -l 133 $ grep -rI "root->objectid" | wc -l 55 (4.17, inc. some noise) It is confusing to have two similar variable names and it seems that there is no rule about which should be used in a certain case. Since ->root_key itself is needed for tree reloc tree, let's remove 'objecitd' member and unify code to use ->root_key.objectid in all places. Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: Use wrapper macro for rcu string to remove duplicate codeMisono Tomohiro
Cleanup patch and no functional changes. Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06Btrfs: fix btrfs_write_inode vs delayed iput deadlockJosef Bacik
We recently ran into the following deadlock involving btrfs_write_inode(): [ +0.005066] __schedule+0x38e/0x8c0 [ +0.007144] schedule+0x36/0x80 [ +0.006447] bit_wait+0x11/0x60 [ +0.006446] __wait_on_bit+0xbe/0x110 [ +0.007487] ? bit_wait_io+0x60/0x60 [ +0.007319] __inode_wait_for_writeback+0x96/0xc0 [ +0.009568] ? autoremove_wake_function+0x40/0x40 [ +0.009565] inode_wait_for_writeback+0x21/0x30 [ +0.009224] evict+0xb0/0x190 [ +0.006099] iput+0x1a8/0x210 [ +0.006103] btrfs_run_delayed_iputs+0x73/0xc0 [ +0.009047] btrfs_commit_transaction+0x799/0x8c0 [ +0.009567] btrfs_write_inode+0x81/0xb0 [ +0.008008] __writeback_single_inode+0x267/0x320 [ +0.009569] writeback_sb_inodes+0x25b/0x4e0 [ +0.008702] wb_writeback+0x102/0x2d0 [ +0.007487] wb_workfn+0xa4/0x310 [ +0.006794] ? wb_workfn+0xa4/0x310 [ +0.007143] process_one_work+0x150/0x410 [ +0.008179] worker_thread+0x6d/0x520 [ +0.007490] kthread+0x12c/0x160 [ +0.006620] ? put_pwq_unlocked+0x80/0x80 [ +0.008185] ? kthread_park+0xa0/0xa0 [ +0.007484] ? do_syscall_64+0x53/0x150 [ +0.007837] ret_from_fork+0x29/0x40 Writeback calls: btrfs_write_inode btrfs_commit_transaction btrfs_run_delayed_iputs If iput() is called on that same inode, evict() will wait for writeback forever. btrfs_write_inode() was originally added way back in 4730a4bc5bf3 ("btrfs_dirty_inode") to support O_SYNC writes. However, ->write_inode() hasn't been used for O_SYNC since 148f948ba877 ("vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode"), so btrfs_write_inode() is actually unnecessary (and leads to a bunch of unnecessary commits). Get rid of it, which also gets rid of the deadlock. CC: stable@vger.kernel.org # 3.2+ Signed-off-by: Josef Bacik <jbacik@fb.com> [Omar: new commit message] Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: refactor block group replication factor calculation to a helperDavid Sterba
There are many places that open code the duplicity factor of the block group profiles, create a common helper. This can be easily extended for more copies. Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: rename btrfs_parse_early_optionsAnand Jain
Rename btrfs_parse_early_options() to btrfs_parse_device_options(). As btrfs_parse_early_options() parses the -o device options and scan the device provided. So this rename specifies its action. Also the function name is in line with btrfs_parse_subvol_options(). No functional changes. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: return device pointer from btrfs_scan_one_deviceGu Jinxiang
Return device pointer (with the IS_ERR semantics) from btrfs_scan_one_device so we don't have to return in through pointer. And since btrfs_fs_devices can be obtained from btrfs_device, return that. Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ fixed conflics after recent changes to btrfs_scan_one_device ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: make fs_devices a local variable in btrfs_parse_early_optionsGu Jinxiang
fs_devices is always passed to btrfs_scan_one_device which overrides it. In the call stack below fs_devices is passed to btrfs_scan_one_device from btrfs_mount_root. In btrfs_mount_root the output fs_devices of this call stack is not used. btrfs_mount_root btrfs_parse_early_options btrfs_scan_one_device So, it is not necessary to pass fs_devices from btrfs_mount_root, using a local variable in btrfs_parse_early_options is enough. Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: Anand Jain <Anand.Jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: fix mount and ioctl device scan ioctl raceDavid Sterba
Technically this extends the critical section covered by uuid_mutex to: - parse early mount options -- here we can call device scan on paths that can be passed as 'device=/dev/...' - scan the device passed to mount - open the devices related to the fs_devices -- this increases fs_devices::opened The race can happen when mount calls one of the scans and there's another one called eg. by mkfs or 'btrfs dev scan': Mount Scan ----- ---- scan_one_device (dev1, fsid1) scan_one_device (dev2, fsid1) add the device free stale devices fsid1 fs_devices::opened == 0 find fsid1:dev1 free fsid1:dev1 if it's the last one, free fs_devices of fsid1 too open_devices (dev1, fsid1) dev1 not found When fixed, the uuid mutex will make sure that mount will increase fs_devices::opened and this will not be touched by the racing scan ioctl. Reported-and-tested-by: syzbot+909a5177749d7990ffa4@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+ceb2606025ec1cc3479c@syzkaller.appspotmail.com Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: reorder initialization before the mount locks uuid_mutexDavid Sterba
In preparation to take a big lock, move resource initialization before the critical section. It's not obvious from the diff, the desired order is: - initialize mount security options - allocate temporary fs_info - allocate superblock buffers Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: lift uuid_mutex to callers of btrfs_parse_early_optionsDavid Sterba
Prepartory work to fix race between mount and device scan. btrfs_parse_early_options calls the device scan from mount and we'll need to let mount completely manage the critical section. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: lift uuid_mutex to callers of btrfs_open_devicesDavid Sterba
Prepartory work to fix race between mount and device scan. The callers will have to manage the critical section, eg. mount wants to scan and then call btrfs_open_devices without the ioctl scan walking in and modifying the fs devices in the meantime. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: lift uuid_mutex to callers of btrfs_scan_one_deviceDavid Sterba
Prepartory work to fix race between mount and device scan. The callers will have to manage the critical section, eg. mount wants to scan and then call btrfs_open_devices without the ioctl scan walking in and modifying the fs devices in the meantime. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: remove unused parameter from btrfs_parse_subvol_optionsGu Jinxiang
Since parameter flags is no more used since commit d7407606564c ("btrfs: split parse_early_options() in two"), remove it. Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: prune unused includesDavid Sterba
Remove includes if none of the interfaces and exports is used in the given source file. Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: Fix a C compliance issueBart Van Assche
The C programming language does not allow to use preprocessor statements inside macro arguments (pr_info() is defined as a macro). Hence rework the pr_info() statement in btrfs_print_mod_info() such that it becomes compliant. This patch allows tools like sparse to analyze the BTRFS source code. Fixes: 62e855771dac ("btrfs: convert printk(KERN_* to use pr_* calls") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: Annotate fall-through when parsing mount optionBart Van Assche
This patch avoids that the compiler complains that a fall-through annotation is missing when building with W=1. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28Btrfs: allow empty subvol= againOmar Sandoval
I got a report that after upgrading to 4.16, someone's filesystems weren't mounting: [ 23.845852] BTRFS info (device loop0): unrecognized mount option 'subvol=' Before 4.16, this mounted the default subvolume. It turns out that this empty "subvol=" is actually an application bug, but it was causing the application to fail, so it's an ABI break if you squint. The generic parsing code we use for mount options (match_token()) doesn't match an empty string as "%s". Previously, setup_root_args() removed the "subvol=" string, but the mount path was cleaned up to not need that. Add a dummy Opt_subvol_empty to fix this. The simple workaround is to use / or . for the value of 'subvol=' . Fixes: 312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()") CC: stable@vger.kernel.org # 4.16+ Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28btrfs: return original error code when failing from option parsingChengguang Xu
It's not good to overwrite -ENOMEM using -EINVAL when failing from mount option parsing, so just return original error code. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: David Sterba <dsterba@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12btrfs: replace GPL boilerplate by SPDX -- sourcesDavid Sterba
Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: use RCU in btrfs_show_devname for device list traversalDavid Sterba
The show_devname callback is used to print device name in /proc/self/mounts, we need to traverse the device list consistently and read the name that's copied to a seq buffer so we don't need further locking. If the first device is being deleted at the same time, the RCU will allow us to read the device name, though it will become stale right after the RCU protection ends. This is unavoidable and the user can expect that the device will disappear from the filesystem's list at some point. The device_list_mutex was pretty heavy as it is used eg. for writing superblock and a few other IO related contexts. This can stall any application that reads the proc file for no reason. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: sort and group mount option definitionsDavid Sterba
Sort mount options by the primary name, followed by the 'no-' counterpart if it exists. Group the deprecated and debugging options. Enum and token defintions are synced. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: Add nossd_spread mount optionHoward McLauchlan
Btrfs has two mount options for SSD optimizations: ssd and ssd_spread. Presently there is an option to disable all SSD optimizations, but there isn't an option to disable just ssd_spread. This patch adds a mount option nossd_spread that disables ssd_spread only. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: unify types for metadata_ratio and data_chunk_allocationsAnand Jain
We have btrfs_fs_info::data_chunk_allocations and btrfs_fs_info::metadata_ratio declared as unsigned which would be unsinged int and kernel style prefers unsigned int over bare unsigned. So this patch changes them to u32. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: add more __cold annotationsDavid Sterba
The __cold functions are placed to a special section, as they're expected to be called rarely. This could help i-cache prefetches or help compiler to decide which branches are more/less likely to be taken without any other annotations needed. Though we can't add more __exit annotations, it's still possible to add __cold (that's also added with __exit). That way the following function categories are tagged: - printf wrappers, error messages - exit helpers Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: verify subvolid mount parameterAnand Jain
We aren't verifying the parameter passed to the subvolid mount option, so we won't report and fail the mount if a junk value is specified for example, -o subvolid=abc. This patch verifies the subvolid option with match_u64. Up to now the memparse function accepts the K/M/G/ suffixes, that are usually meant for size values and do not make sense for a subvolume it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: Remove custom crc32c init codeNikolay Borisov
The custom crc32 init code was introduced in 14a958e678cd ("Btrfs: fix btrfs boot when compiled as built-in") to enable using btrfs as a built-in. However, later as pointed out by 60efa5eb2e88 ("Btrfs: use late_initcall instead of module_init") this wasn't enough and finally btrfs was switched to late_initcall which comes after the generic crc32c implementation is initiliased. The latter commit superseeded the former. Now that we don't have to maintain our own code let's just remove it and switch to using the generic implementation. Despite touching a lot of files the patch is really simple. Here is the gist of the changes: 1. Select LIBCRC32C rather than the low-level modules. 2. s/btrfs_crc32c/crc32c/g 3. replace hash.h with linux/crc32c.h 4. Move the btrfs namehash funcs to ctree.h and change the tree accordingly. I've tested this with btrfs being both a module and a built-in and xfstest doesn't complain. Does seem to fix the longstanding problem of not automatically selectiong the crc32c module when btrfs is used. Possibly there is a workaround in dracut. The modinfo confirms that now all the module dependencies are there: before: depends: zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate after: depends: libcrc32c,zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add more info to changelog from mails ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: add a comment to mark the deprecated mount optionAnand Jain
The options alloc_start and subvolrootid are deprecated, comment them in the tokens list. And leave them as it is. No functional changes. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: manage commit mount option as %uAnand Jain
As the commit mount option is unsigned so manage it as %u for token verifications, instead of %d. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: manage check_int_print_mask mount option as %uAnand Jain
As check_int_print_mask mount option is unsigned so manage it as %u for token verifications, instead of %d. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: manage metadata_ratio mount option as %uAnand Jain
As metadata_ratio mount option is unsinged so manage it as %u for token verifications, instead of %d. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: manage thread_pool mount option as %uAnand Jain
The mount option thread_pool is always unsigned. Manage it that way all around. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01btrfs: use kvzalloc to allocate btrfs_fs_infoJeff Mahoney
The srcu_struct in btrfs_fs_info scales in size with NR_CPUS. On kernels built with NR_CPUS=8192, this can result in kmalloc failures that prevent mounting. There is work in progress to try to resolve this for every user of srcu_struct but using kvzalloc will work around the failures until that is complete. As an example with NR_CPUS=512 on x86_64: the overall size of subvol_srcu is 3460 bytes, fs_info is 6496. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22btrfS: collapse btrfs_handle_error() into __btrfs_handle_fs_error()Anand Jain
There is no other consumer for btrfs_handle_error() other than __btrfs_handle_fs_error(), further this function quite small. Merge it into its parent. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ reformat comment ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22btrfs: remove check for BTRFS_FS_STATE_ERROR which we just setAnand Jain
__btrfs_handle_fs_error() sets BTRFS_FS_STATE_ERROR, and calls btrfs_handle_error() so no need to check if the BTRFS_FS_STATE_ERROR is set in btrfs_handle_error(). And there is no other user of btrfs_handle_error() as well. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22btrfs: factor btrfs_check_rw_degradable() to check given deviceAnand Jain
Update btrfs_check_rw_degradable() to check against the given device if its lost. We can use this function to know if the volume is going to be in degraded mode OR failed state, when the given device fails. Which is needed when we are handling the device failed state. A preparatory patch does not affect the flow as such. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Qu Wenruo <wqu@suse.com> [ enhance comment ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22btrfs: drop unused parameters from mount_subvolDavid Sterba
Recent patches reworking the mount path left some unused parameters. We pass a vfsmount to mount_subvol, the flags and data (ie. mount options) have been already applied and we will not need them. Signed-off-by: David Sterba <dsterba@suse.com>