summaryrefslogtreecommitdiff
path: root/fs/f2fs
AgeCommit message (Collapse)Author
2021-02-21Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups. This pull request contains: - Two series of BFQ improvements (Paolo, Jan, Jia) - Block iov_iter improvements (Pavel) - bsg error path fix (Pan) - blk-mq scheduler improvements (Jan) - -EBUSY discard fix (Jan) - bvec allocation improvements (Ming, Christoph) - bio allocation and init improvements (Christoph) - Store bdev pointer in bio instead of gendisk + partno (Christoph) - Block trace point cleanups (Christoph) - hard read-only vs read-only split (Christoph) - Block based swap cleanups (Christoph) - Zoned write granularity support (Damien) - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)" * tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits) mm: simplify swapdev_block sd_zbc: clear zone resources for non-zoned case block: introduce blk_queue_clear_zone_settings() zonefs: use zone write granularity as block size block: introduce zone_write_granularity limit block: use blk_queue_set_zoned in add_partition() nullb: use blk_queue_set_zoned() to setup zoned devices nvme: cleanup zone information initialization block: document zone_append_max_bytes attribute block: use bi_max_vecs to find the bvec pool md/raid10: remove dead code in reshape_request block: mark the bio as cloned in bio_iov_bvec_set block: set BIO_NO_PAGE_REF in bio_iov_bvec_set block: remove a layer of indentation in bio_iov_iter_get_pages block: turn the nr_iovecs argument to bio_alloc* into an unsigned short block: remove the 1 and 4 vec bvec_slabs entries block: streamline bvec_alloc block: factor out a bvec_alloc_gfp helper block: move struct biovec_slab to bio.c block: reuse BIO_INLINE_VECS for integrity bvecs ...
2021-02-21Merge tag 'fsverity-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fsverity updates from Eric Biggers: "Add an ioctl which allows reading fs-verity metadata from a file. This is useful when a file with fs-verity enabled needs to be served somewhere, and the other end wants to do its own fs-verity compatible verification of the file. See the commit messages for details. This new ioctl has been tested using new xfstests I've written for it" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fs-verity: support reading signature with ioctl fs-verity: support reading descriptor with ioctl fs-verity: support reading Merkle tree with ioctl fs-verity: add FS_IOC_READ_VERITY_METADATA ioctl fs-verity: don't pass whole descriptor to fsverity_verify_signature() fs-verity: factor out fsverity_get_descriptor()
2021-02-12f2fs: give a warning only for readonly partitionJaegeuk Kim
Let's allow mounting readonly partition. We're able to recovery later once we have it as read-write back. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-08f2fs: don't grab superblock freeze for flush/ckpt threadJaegeuk Kim
There are controlled by f2fs_freeze(). This fixes xfstests/generic/068 which is stuck at task:f2fs_ckpt-252:3 state:D stack: 0 pid: 5761 ppid: 2 flags:0x00004000 Call Trace: __schedule+0x44c/0x8a0 schedule+0x4f/0xc0 percpu_rwsem_wait+0xd8/0x140 ? percpu_down_write+0xf0/0xf0 __percpu_down_read+0x56/0x70 issue_checkpoint_thread+0x12c/0x160 [f2fs] ? wait_woken+0x80/0x80 kthread+0x114/0x150 ? __checkpoint_and_complete_reqs+0x110/0x110 [f2fs] ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-07fs-verity: add FS_IOC_READ_VERITY_METADATA ioctlEric Biggers
Add an ioctl FS_IOC_READ_VERITY_METADATA which will allow reading verity metadata from a file that has fs-verity enabled, including: - The Merkle tree - The fsverity_descriptor (not including the signature if present) - The built-in signature, if present This ioctl has similar semantics to pread(). It is passed the type of metadata to read (one of the above three), and a buffer, offset, and size. It returns the number of bytes read or an error. Separate patches will add support for each of the above metadata types. This patch just adds the ioctl itself. This ioctl doesn't make any assumption about where the metadata is stored on-disk. It does assume the metadata is in a stable format, but that's basically already the case: - The Merkle tree and fsverity_descriptor are defined by how fs-verity file digests are computed; see the "File digest computation" section of Documentation/filesystems/fsverity.rst. Technically, the way in which the levels of the tree are ordered relative to each other wasn't previously specified, but it's logical to put the root level first. - The built-in signature is the value passed to FS_IOC_ENABLE_VERITY. This ioctl is useful because it allows writing a server program that takes a verity file and serves it to a client program, such that the client can do its own fs-verity compatible verification of the file. This only makes sense if the client doesn't trust the server and if the server needs to provide the storage for the client. More concretely, there is interest in using this ability in Android to export APK files (which are protected by fs-verity) to "protected VMs". This would use Protected KVM (https://lwn.net/Articles/836693), which provides an isolated execution environment without having to trust the traditional "host". A "guest" VM can boot from a signed image and perform specific tasks in a minimum trusted environment using files that have fs-verity enabled on the host, without trusting the host or requiring that the guest has its own trusted storage. Technically, it would be possible to duplicate the metadata and store it in separate files for serving. However, that would be less efficient and would require extra care in userspace to maintain file consistency. In addition to the above, the ability to read the built-in signatures is useful because it allows a system that is using the in-kernel signature verification to migrate to userspace signature verification. Link: https://lore.kernel.org/r/20210115181819.34732-4-ebiggers@kernel.org Reviewed-by: Victor Hsieh <victorhsieh@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-03f2fs: add ckpt_thread_ioprio sysfs nodeDaeho Jeong
Added "ckpt_thread_ioprio" sysfs node to give a way to change checkpoint merge daemon's io priority. Its default value is "be,3", which means "BE" I/O class and I/O priority "3". We can select the class between "rt" and "be", and set the I/O priority within valid range of it. "," delimiter is necessary in between I/O class and priority number. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-03f2fs: introduce checkpoint_merge mount optionDaeho Jeong
We've added a new mount options, "checkpoint_merge" and "nocheckpoint_merge", which creates a kernel daemon and makes it to merge concurrent checkpoint requests as much as possible to eliminate redundant checkpoint issues. Plus, we can eliminate the sluggish issue caused by slow checkpoint operation when the checkpoint is done in a process context in a cgroup having low i/o budget and cpu shares. To make this do better, we set the default i/o priority of the kernel daemon to "3", to give one higher priority than other kernel threads. The below verification result explains this. The basic idea has come from https://opensource.samsung.com. [Verification] Android Pixel Device(ARM64, 7GB RAM, 256GB UFS) Create two I/O cgroups (fg w/ weight 100, bg w/ wight 20) Set "strict_guarantees" to "1" in BFQ tunables In "fg" cgroup, - thread A => trigger 1000 checkpoint operations "for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1; done" - thread B => gererating async. I/O "fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1 --filename=test_img --name=test" In "bg" cgroup, - thread C => trigger repeated checkpoint operations "echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file; fsync test_dir2; done" We've measured thread A's execution time. [ w/o patch ] Elapsed Time: Avg. 68 seconds [ w/ patch ] Elapsed Time: Avg. 48 seconds Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> [Jaegeuk Kim: fix the return value in f2fs_start_ckpt_thread, reported by Dan] Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-02f2fs: relocate inline conversion from mmap() to mkwrite()Chao Yu
If there is page fault only for read case on inline inode, we don't need to convert inline inode, instead, let's do conversion for write case. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-02f2fs: fix a wrong condition in __submit_bioDehe Gu
We should use !F2FS_IO_ALIGNED() to check and submit_io directly. Fixes: 8223ecc456d0 ("f2fs: fix to add missing F2FS_IO_ALIGNED() condition") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Dehe Gu <gudehe@huawei.com> Signed-off-by: Ge Qiu <qiuge@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-01f2fs: remove unnecessary initialization in xattr.cLiu Song
These variables will be explicitly assigned before use, so there is no need to initialize. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-01f2fs: fix to avoid inconsistent quota dataYi Chen
Occasionally, quota data may be corrupted detected by fsck: Info: checkpoint state = 45 : crc compacted_summary unmount [QUOTA WARNING] Usage inconsistent for ID 0:actual (1543036928, 762) != expected (1543032832, 762) [ASSERT] (fsck_chk_quota_files:1986) --> Quota file is missing or invalid quota file content found. [QUOTA WARNING] Usage inconsistent for ID 0:actual (1352478720, 344) != expected (1352474624, 344) [ASSERT] (fsck_chk_quota_files:1986) --> Quota file is missing or invalid quota file content found. [FSCK] Unreachable nat entries [Ok..] [0x0] [FSCK] SIT valid block bitmap checking [Ok..] [FSCK] Hard link checking for regular file [Ok..] [0x0] [FSCK] valid_block_count matching with CP [Ok..] [0xdf299] [FSCK] valid_node_count matcing with CP (de lookup) [Ok..] [0x2b01] [FSCK] valid_node_count matcing with CP (nat lookup) [Ok..] [0x2b01] [FSCK] valid_inode_count matched with CP [Ok..] [0x2665] [FSCK] free segment_count matched with CP [Ok..] [0xcb04] [FSCK] next block offset is free [Ok..] [FSCK] fixing SIT types [FSCK] other corrupted bugs [Fail] The root cause is: If we open file w/ readonly flag, disk quota info won't be initialized for this file, however, following mmap() will force to convert inline inode via f2fs_convert_inline_inode(), which may increase block usage for this inode w/o updating quota data, it causes inconsistent disk quota info. The issue will happen in following stack: open(file, O_RDONLY) mmap(file) - f2fs_convert_inline_inode - f2fs_convert_inline_page - f2fs_reserve_block - f2fs_reserve_new_block - f2fs_reserve_new_blocks - f2fs_i_blocks_write - dquot_claim_block inode->i_blocks increase, but the dqb_curspace keep the size for the dquots is NULL. To fix this issue, let's call dquot_initialize() anyway in both f2fs_truncate() and f2fs_convert_inline_inode() functions to avoid potential inconsistent quota data issue. Fixes: 0abd675e97e6 ("f2fs: support plain user/group quota") Signed-off-by: Daiyue Zhang <zhangdaiyue1@huawei.com> Signed-off-by: Dehe Gu <gudehe@huawei.com> Signed-off-by: Junchao Jiang <jiangjunchao1@huawei.com> Signed-off-by: Ge Qiu <qiuge@huawei.com> Signed-off-by: Yi Chen <chenyi77@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-01f2fs: flush data when enabling checkpoint backJaegeuk Kim
During checkpoint=disable period, f2fs bypasses all the synchronous IOs such as sync and fsync. So, when enabling it back, we must flush all of them in order to keep the data persistent. Otherwise, suddern power-cut right after enabling checkpoint will cause data loss. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Cc: stable@vger.kernel.org Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: deprecate f2fs_trace_ioJaegeuk Kim
This patch deprecates f2fs_trace_io, since f2fs uses page->private more broadly, resulting in more buggy cases. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: Remove readahead collision detectionMatthew Wilcox (Oracle)
With the new ->readahead operation, locked pages are added to the page cache, preventing two threads from racing with each other to read the same chunk of file, so this is dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: remove unused stat_{inc, dec}_atomic_writeJack Qiu
Just clean code, no logical change. Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: introduce sb_status sysfs nodeChao Yu
Introduce /sys/fs/f2fs/<devname>/stat/sb_status to show superblock status in real time as a hexadecimal value. value sb status macro description 0x1 SBI_IS_DIRTY, /* dirty flag for checkpoint */ 0x2 SBI_IS_CLOSE, /* specify unmounting */ 0x4 SBI_NEED_FSCK, /* need fsck.f2fs to fix */ 0x8 SBI_POR_DOING, /* recovery is doing or not */ 0x10 SBI_NEED_SB_WRITE, /* need to recover superblock */ 0x20 SBI_NEED_CP, /* need to checkpoint */ 0x40 SBI_IS_SHUTDOWN, /* shutdown by ioctl */ 0x80 SBI_IS_RECOVERED, /* recovered orphan/data */ 0x100 SBI_CP_DISABLED, /* CP was disabled last mount */ 0x200 SBI_CP_DISABLED_QUICK, /* CP was disabled quickly */ 0x400 SBI_QUOTA_NEED_FLUSH, /* need to flush quota info in CP */ 0x800 SBI_QUOTA_SKIP_FLUSH, /* skip flushing quota in current CP */ 0x1000 SBI_QUOTA_NEED_REPAIR, /* quota file may be corrupted */ 0x2000 SBI_IS_RESIZEFS, /* resizefs is in process */ Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: fix to use per-inode maxbytesChengguang Xu
F2FS inode may have different max size, e.g. compressed file have less blkaddr entries in all its direct-node blocks, result in being with less max filesize. So change to use per-inode maxbytes. Suggested-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: compress: fix potential deadlockChao Yu
generic/269 reports a hangtask issue, the root cause is ABBA deadlock described as below: Thread A Thread B - down_write(&sbi->gc_lock) -- A - f2fs_write_data_pages - lock all pages in cluster -- B - f2fs_write_multi_pages - f2fs_write_raw_pages - f2fs_write_single_data_page - f2fs_balance_fs - down_write(&sbi->gc_lock) -- A - f2fs_gc - do_garbage_collect - ra_data_block - pagecache_get_page -- B To fix this, it needs to avoid calling f2fs_balance_fs() if there is still cluster pages been locked in context of cluster writeback, so instead, let's call f2fs_balance_fs() in the end of f2fs_write_raw_pages() when all cluster pages were unlocked. Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: fix to set/clear I_LINKABLE under i_lockChao Yu
fsstress + fault injection test case reports a warning message as below: WARNING: CPU: 13 PID: 6226 at fs/inode.c:361 inc_nlink+0x32/0x40 Call Trace: f2fs_init_inode_metadata+0x25c/0x4a0 [f2fs] f2fs_add_inline_entry+0x153/0x3b0 [f2fs] f2fs_add_dentry+0x75/0x80 [f2fs] f2fs_do_add_link+0x108/0x160 [f2fs] f2fs_rename2+0x6ab/0x14f0 [f2fs] vfs_rename+0x70c/0x940 do_renameat2+0x4d8/0x4f0 __x64_sys_renameat2+0x4b/0x60 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Following race case can cause this: Thread A Kworker - f2fs_rename - f2fs_create_whiteout - __f2fs_tmpfile - f2fs_i_links_write - f2fs_mark_inode_dirty_sync - mark_inode_dirty_sync - writeback_single_inode - __writeback_single_inode - spin_lock(&inode->i_lock) - inode->i_state |= I_LINKABLE - inode->i_state &= ~dirty - spin_unlock(&inode->i_lock) - f2fs_add_link - f2fs_do_add_link - f2fs_add_dentry - f2fs_add_inline_entry - f2fs_init_inode_metadata - f2fs_i_links_write - inc_nlink - WARN_ON(!(inode->i_state & I_LINKABLE)) Fix to add i_lock to avoid i_state update race condition. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: fix null page reference in redirty_blocksDaeho Jeong
By Colin's static analysis, we found out there is a null page reference under low memory situation in redirty_blocks. I've made the page finding loop stop immediately and return an error not to cause further memory pressure when we run into a failure to find a page under low memory condition. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: 5fdb322ff2c2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Reviewed-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: clean up post-read processingEric Biggers
Rework the post-read processing logic to be much easier to understand. At least one bug is fixed by this: if an I/O error occurred when reading from disk, decryption and verity would be performed on the uninitialized data, causing misleading messages in the kernel log. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: trival cleanup in move_data_block()Chao Yu
Trival cleanups: - relocate set_summary() before its use - relocate "allocate block address" to correct place - remove unneeded f2fs_wait_on_page_writeback() Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: fix out-of-repair __setattr_copy()Chao Yu
__setattr_copy() was copied from setattr_copy() in fs/attr.c, there is two missing patches doesn't cover this inner function, fix it. Commit 7fa294c8991c ("userns: Allow chown and setgid preservation") Commit 23adbe12ef7d ("fs,userns: Change inode_capable to capable_wrt_inode_uidgid") Fixes: fbfa2cc58d53 ("f2fs: add file operations") Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: fix to tag FIEMAP_EXTENT_MERGED in f2fs_fiemap()Chao Yu
f2fs does not natively support extents in metadata, 'extent' in f2fs is used as a virtual concept, so in f2fs_fiemap() interface, it needs to tag FIEMAP_EXTENT_MERGED flag to indicated the extent status is a result of merging. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: introduce a new per-sb directory in sysfsChao Yu
Add a new directory 'stat' in path of /sys/fs/f2fs/<devname>/, later we can add new readonly stat sysfs file into this directory, it will make <devname> directory less mess. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: compress: support compress levelChao Yu
Expand 'compress_algorithm' mount option to accept parameter as format of <algorithm>:<level>, by this way, it gives a way to allow user to do more specified config on lz4 and zstd compression level, then f2fs compression can provide higher compress ratio. In order to set compress level for lz4 algorithm, it needs to set CONFIG_LZ4HC_COMPRESS and CONFIG_F2FS_FS_LZ4HC config to enable lz4hc compress algorithm. CR and performance number on lz4/lz4hc algorithm: dd if=enwik9 of=compressed_file conv=fsync Original blocks: 244382 lz4 lz4hc-9 compressed blocks 170647 163270 compress ratio 69.8% 66.8% speed 16.4207 s, 60.9 MB/s 26.7299 s, 37.4 MB/s compress ratio = after / before Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: compress: deny setting unsupported compress algorithmChao Yu
If kernel doesn't support certain kinds of compress algorithm, deny to set them as compress algorithm of f2fs via 'compress_algorithm=%s' mount option. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: relocate f2fs_precache_extents()Chao Yu
Relocate f2fs_precache_extents() in prior to check_swap_activate(), then extent cache can be enabled before its use. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: enforce the immutable flag on open filesChao Yu
This patch ports commit 02b016ca7f99 ("ext4: enforce the immutable flag on open files") to f2fs. According to the chattr man page, "a file with the 'i' attribute cannot be modified..." Historically, this was only enforced when the file was opened, per the rest of the description, "... and the file can not be opened in write mode". There is general agreement that we should standardize all file systems to prevent modifications even for files that were opened at the time the immutable flag is set. Eventually, a change to enforce this at the VFS layer should be landing in mainline. Cc: stable@kernel.org Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: enhance to update i_mode and acl atomically in f2fs_setattr()Chao Yu
Previously, in f2fs_setattr(), we don't update S_ISUID|S_ISGID|S_ISVTX bits with S_IRWXUGO bits and acl entries atomically, so in error path, chmod() may partially success, this patch enhances to make chmod() flow being atomical. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: fix to set inode->i_mode correctly for posix_acl_update_modeWeichao Guo
We should update the ~S_IRWXUGO part of inode->i_mode in __setattr_copy, because posix_acl_update_mode updates mode based on inode->i_mode, which finally overwrites the ~S_IRWXUGO part of i_acl_mode with old i_mode. Testcase to reproduce this bug: 0. adduser abc 1. mkfs.f2fs /dev/sdd 2. mount -t f2fs /dev/sdd /mnt/f2fs 3. mkdir /mnt/f2fs/test 4. setfacl -m u:abc:r /mnt/f2fs/test 5. chmod +s /mnt/f2fs/test Signed-off-by: Weichao Guo <guoweichao@oppo.com> Signed-off-by: Bin Shu <shubin@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: Replace expression with offsetof()Zheng Yongjun
Use the existing offsetof() macro instead of duplicating code. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: handle unallocated section and zone on pinned/atgcJaegeuk Kim
If we have large section/zone, unallocated segment makes them corrupted. E.g., - Pinned file: -1 119304647 119304647 - ATGC data: -1 119304647 119304647 Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27f2fs: remove FAULT_ALLOC_BIOChristoph Hellwig
Sleeping bio allocations do not fail, which means that injecting an error into sleeping bio allocations is a little silly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27f2fs: use blkdev_issue_flush in __submit_flush_waitChristoph Hellwig
Use the blkdev_issue_flush helper instead of duplicating it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-24block: store a block_device pointer in struct bioChristoph Hellwig
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-17Merge tag 'f2fs-for-5.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've made more work into per-file compression support. For example, F2FS_IOC_GET | SET_COMPRESS_OPTION provides a way to change the algorithm or cluster size per file. F2FS_IOC_COMPRESS | DECOMPRESS_FILE provides a way to compress and decompress the existing normal files manually. There is also a new mount option, compress_mode=fs|user, which can control who compresses the data. Chao also added a checksum feature with a mount option so that we are able to detect any corrupted cluster. In addition, Daniel contributed casefolding with encryption patch, which will be used for Android devices. Summary: Enhancements: - add ioctls and mount option to manage per-file compression feature - support casefolding with encryption - support checksum for compressed cluster - avoid IO starvation by replacing mutex with rwsem - add sysfs, max_io_bytes, to control max bio size Bug fixes: - fix use-after-free issue when compression and fsverity are enabled - fix consistency corruption during fault injection test - fix data offset for lseek - get rid of buffer_head which has 32bits limit in fiemap - fix some bugs in multi-partitions support - fix nat entry count calculation in shrinker - fix some stat information And, we've refactored some logics and fix minor bugs as well" * tag 'f2fs-for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits) f2fs: compress: fix compression chksum f2fs: fix shift-out-of-bounds in sanity_check_raw_super() f2fs: fix race of pending_pages in decompression f2fs: fix to account inline xattr correctly during recovery f2fs: inline: fix wrong inline inode stat f2fs: inline: correct comment in f2fs_recover_inline_data f2fs: don't check PAGE_SIZE again in sanity_check_raw_super() f2fs: convert to F2FS_*_INO macro f2fs: introduce max_io_bytes, a sysfs entry, to limit bio size f2fs: don't allow any writes on readonly mount f2fs: avoid race condition for shrinker count f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE f2fs: add compress_mode mount option f2fs: Remove unnecessary unlikely() f2fs: init dirty_secmap incorrectly f2fs: remove buffer_head which has 32bits limit f2fs: fix wrong block count instead of bytes f2fs: use new conversion functions between blks and bytes f2fs: rename logical_to_blk and blk_to_logical f2fs: fix kbytes written stat for multi-device case ...
2020-12-16Merge tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "Another series of killing more code than what is being added, again thanks to Christoph's relentless cleanups and tech debt tackling. This contains: - blk-iocost improvements (Baolin Wang) - part0 iostat fix (Jeffle Xu) - Disable iopoll for split bios (Jeffle Xu) - block tracepoint cleanups (Christoph Hellwig) - Merging of struct block_device and hd_struct (Christoph Hellwig) - Rework/cleanup of how block device sizes are updated (Christoph Hellwig) - Simplification of gendisk lookup and removal of block device aliasing (Christoph Hellwig) - Block device ioctl cleanups (Christoph Hellwig) - Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig) - Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig) - sbitmap improvements (Pavel Begunkov) - Hybrid polling fix (Pavel Begunkov) - bvec iteration improvements (Pavel Begunkov) - Zone revalidation fixes (Damien Le Moal) - blk-throttle limit fix (Yu Kuai) - Various little fixes" * tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits) blk-mq: fix msec comment from micro to milli seconds blk-mq: update arg in comment of blk_mq_map_queue blk-mq: add helper allocating tagset->tags Revert "block: Fix a lockdep complaint triggered by request queue flushing" nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class blk-mq: add new API of blk_mq_hctx_set_fq_lock_class block: disable iopoll for split bio block: Improve blk_revalidate_disk_zones() checks sbitmap: simplify wrap check sbitmap: replace CAS with atomic and sbitmap: remove swap_lock sbitmap: optimise sbitmap_deferred_clear() blk-mq: skip hybrid polling if iopoll doesn't spin blk-iocost: Factor out the base vrate change into a separate function blk-iocost: Factor out the active iocgs' state check into a separate function blk-iocost: Move the usage ratio calculation to the correct place blk-iocost: Remove unnecessary advance declaration blk-iocost: Fix some typos in comments blktrace: fix up a kerneldoc comment block: remove the request_queue to argument request based tracepoints ...
2020-12-10f2fs: compress: fix compression chksumChao Yu
This patch addresses minor issues in compression chksum. Fixes: b28f047b28c5 ("f2fs: compress: support chksum") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-10f2fs: fix shift-out-of-bounds in sanity_check_raw_super()Chao Yu
syzbot reported a bug which could cause shift-out-of-bounds issue, fix it. Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 sanity_check_raw_super fs/f2fs/super.c:2812 [inline] read_raw_super_block fs/f2fs/super.c:3267 [inline] f2fs_fill_super.cold+0x16c9/0x16f6 fs/f2fs/super.c:3519 mount_bdev+0x34d/0x410 fs/super.c:1366 legacy_get_tree+0x105/0x220 fs/fs_context.c:592 vfs_get_tree+0x89/0x2f0 fs/super.c:1496 do_new_mount fs/namespace.c:2896 [inline] path_mount+0x12ae/0x1e70 fs/namespace.c:3227 do_mount fs/namespace.c:3240 [inline] __do_sys_mount fs/namespace.c:3448 [inline] __se_sys_mount fs/namespace.c:3425 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3425 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: syzbot+ca9a785f8ac472085994@syzkaller.appspotmail.com Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-08f2fs: fix race of pending_pages in decompressionDaeho Jeong
I found out f2fs_free_dic() is invoked in a wrong timing, but f2fs_verify_bio() still needed the dic info and it triggered the below kernel panic. It has been caused by the race condition of pending_pages value between decompression and verity logic, when the same compression cluster had been split in different bios. By split bios, f2fs_verify_bio() ended up with decreasing pending_pages value before it is reset to nr_cpages by f2fs_decompress_pages() and caused the kernel panic. [ 4416.564763] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 ... [ 4416.896016] Workqueue: fsverity_read_queue f2fs_verity_work [ 4416.908515] pc : fsverity_verify_page+0x20/0x78 [ 4416.913721] lr : f2fs_verify_bio+0x11c/0x29c [ 4416.913722] sp : ffffffc019533cd0 [ 4416.913723] x29: ffffffc019533cd0 x28: 0000000000000402 [ 4416.913724] x27: 0000000000000001 x26: 0000000000000100 [ 4416.913726] x25: 0000000000000001 x24: 0000000000000004 [ 4416.913727] x23: 0000000000001000 x22: 0000000000000000 [ 4416.913728] x21: 0000000000000000 x20: ffffffff2076f9c0 [ 4416.913729] x19: ffffffff2076f9c0 x18: ffffff8a32380c30 [ 4416.913731] x17: ffffffc01f966d97 x16: 0000000000000298 [ 4416.913732] x15: 0000000000000000 x14: 0000000000000000 [ 4416.913733] x13: f074faec89ffffff x12: 0000000000000000 [ 4416.913734] x11: 0000000000001000 x10: 0000000000001000 [ 4416.929176] x9 : ffffffff20d1f5c7 x8 : 0000000000000000 [ 4416.929178] x7 : 626d7464ff286b6b x6 : ffffffc019533ade [ 4416.929179] x5 : 000000008049000e x4 : ffffffff2793e9e0 [ 4416.929180] x3 : 000000008049000e x2 : ffffff89ecfa74d0 [ 4416.929181] x1 : 0000000000000c40 x0 : ffffffff2076f9c0 [ 4416.929184] Call trace: [ 4416.929187] fsverity_verify_page+0x20/0x78 [ 4416.929189] f2fs_verify_bio+0x11c/0x29c [ 4416.929192] f2fs_verity_work+0x58/0x84 [ 4417.050667] process_one_work+0x270/0x47c [ 4417.055354] worker_thread+0x27c/0x4d8 [ 4417.059784] kthread+0x13c/0x320 [ 4417.063693] ret_from_fork+0x10/0x18 Chao pointed this can happen by the below race condition. Thread A f2fs_post_read_wq fsverity_wq - f2fs_read_multi_pages() - f2fs_alloc_dic - dic->pending_pages = 2 - submit_bio() - submit_bio() - f2fs_post_read_work() handle first bio - f2fs_decompress_work() - __read_end_io() - f2fs_decompress_pages() - dic->pending_pages-- - enqueue f2fs_verity_work() - f2fs_verity_work() handle first bio - f2fs_verify_bio() - dic->pending_pages-- - f2fs_post_read_work() handle second bio - f2fs_decompress_work() - enqueue f2fs_verity_work() - f2fs_verify_pages() - f2fs_free_dic() - f2fs_verity_work() handle second bio - f2fs_verfy_bio() - use-after-free on dic Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-08f2fs: fix to account inline xattr correctly during recoveryChao Yu
During recovery, we may missed to update inline xattr count correctly, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-08f2fs: inline: fix wrong inline inode statJack Qiu
Miss to stat inline inode in f2fs_recover_inline_data. Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-08f2fs: inline: correct comment in f2fs_recover_inline_dataJack Qiu
In 3rd scene, it should remove data blocks instead of inline_data. Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-08f2fs: don't check PAGE_SIZE again in sanity_check_raw_super()Yangtao Li
Many flash devices read and write a single IO based on a multiple of 4KB, and we support only 4KB page cache size now. Since we already check page size in init_f2fs_fs(), so remove page size check in sanity_check_raw_super(). Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Shaohua Liu <liush@allwinnertech.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-08f2fs: convert to F2FS_*_INO macroYangtao Li
Use F2FS_ROOT_INO, F2FS_NODE_INO and F2FS_META_INO macro for better code readability. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Shaohua Liu <liush@allwinnertech.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-07f2fs: introduce max_io_bytes, a sysfs entry, to limit bio sizeJaegeuk Kim
This patch adds max_io_bytes to limit bio size when f2fs tries to merge consecutive IOs. This can give a testing point to split out bios and check end_io handles those bios correctly. This is used to capture a recent bug on the decompression and fsverity flow. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-07f2fs: don't allow any writes on readonly mountJaegeuk Kim
generic_make_request: Trying to write to read-only block-device dm-5 (partno 0) WARNING: CPU: 7 PID: 546 at block/blk-core.c:2190 generic_make_request_checks+0x664/0x690 pc : generic_make_request_checks+0x664/0x690 lr : generic_make_request_checks+0x664/0x690 Call trace: generic_make_request_checks+0x664/0x690 generic_make_request+0xf0/0x3a4 submit_bio+0x80/0x250 __submit_merged_bio+0x368/0x4e0 __submit_merged_write_cond.llvm.12294350193007536502+0xe0/0x3e8 f2fs_wait_on_page_writeback+0x84/0x128 f2fs_convert_inline_page+0x35c/0x6f8 f2fs_convert_inline_inode+0xe0/0x2e0 f2fs_file_mmap+0x48/0x9c mmap_region+0x41c/0x74c do_mmap+0x40c/0x4fc vm_mmap_pgoff+0xb8/0x114 vm_mmap+0x34/0x48 elf_map+0x68/0x108 load_elf_binary+0x538/0xb70 search_binary_handler+0xac/0x1dc exec_binprm+0x50/0x15c __do_execve_file+0x620/0x740 __arm64_sys_execve+0x54/0x68 el0_svc_common+0x9c/0x168 el0_svc_handler+0x60/0x6c el0_svc+0x8/0xc Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-03f2fs: avoid race condition for shrinker countJaegeuk Kim
Light reported sometimes shinker gets nat_cnt < dirty_nat_cnt resulting in wrong do_shinker work. Let's avoid to return insane overflowed value by adding single tracking value. Reported-by: Light Hsieh <Light.Hsieh@mediatek.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-03f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILEDaeho Jeong
Added two ioctl to decompress/compress explicitly the compression enabled file in "compress_mode=user" mount option. Using these two ioctls, the users can make a control of compression and decompression of their files. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>