summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-05-09Merge tag 'for-linus-5.2-ofs1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "This includes one fix and our "Orangefs through the pagecache" patch series which greatly improves our small IO performance and helps us pass more xfstests than before. Fix: - orangefs: truncate before updating size Pagecache series: - all the rest" * tag 'for-linus-5.2-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (23 commits) orangefs: truncate before updating size orangefs: copy Orangefs-sized blocks into the pagecache if possible. orangefs: pass slot index back to readpage. orangefs: remember count when reading. orangefs: add orangefs_revalidate_mapping orangefs: implement writepages orangefs: write range tracking orangefs: avoid fsync service operation on flush orangefs: skip inode writeout if nothing to write orangefs: move do_readv_writev to direct_IO orangefs: do not return successful read when the client-core disappeared orangefs: implement writepage orangefs: migrate to generic_file_read_iter orangefs: service ops done for writeback are not killable orangefs: remove orangefs_readpages orangefs: reorganize setattr functions to track attribute changes orangefs: let setattr write to cached inode orangefs: set up and use backing_dev_info orangefs: hold i_lock during inode_getattr orangefs: update attributes rather than relying on server ...
2019-05-09fsnotify: fix unlink performance regressionAmir Goldstein
__fsnotify_parent() has an optimization in place to avoid unneeded take_dentry_name_snapshot(). When fsnotify_nameremove() was changed not to call __fsnotify_parent(), we left out the optimization. Kernel test robot reported a 5% performance regression in concurrent unlink() workload. Reported-by: kernel test robot <rong.a.chen@intel.com> Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/ Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/ Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events") CC: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-05-09Btrfs: do not abort transaction at btrfs_update_root() after failure to COW pathFilipe Manana
Currently when we fail to COW a path at btrfs_update_root() we end up always aborting the transaction. However all the current callers of btrfs_update_root() are able to deal with errors returned from it, many do end up aborting the transaction themselves (directly or not, such as the transaction commit path), other BUG_ON() or just gracefully cancel whatever they were doing. When syncing the fsync log, we call btrfs_update_root() through tree-log.c:update_log_root(), and if it returns an -ENOSPC error, the log sync code does not abort the transaction, instead it gracefully handles the error and returns -EAGAIN to the fsync handler, so that it falls back to a transaction commit. Any other error different from -ENOSPC, makes the log sync code abort the transaction. So remove the transaction abort from btrfs_update_log() when we fail to COW a path to update the root item, so that if an -ENOSPC failure happens we avoid aborting the current transaction and have a chance of the fsync succeeding after falling back to a transaction commit. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203413 Fixes: 79787eaab46121 ("btrfs: replace many BUG_ONs with proper error handling") Cc: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-09btrfs: use the existing reserved items for our first prop for inheritanceJosef Bacik
We're now reserving an extra items worth of space for property inheritance. We only have one property at the moment so this covers us, but if we add more in the future this will allow us to not get bitten by the extra space reservation. If we do add more properties in the future we should re-visit how we calculate the space reservation needs by the callers. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ refreshed on top of prop/xattr cleanups ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-09do_move_mount(): fix an unsafe use of is_anon_ns()Al Viro
What triggers it is a race between mount --move and umount -l of the source; we should reject it (the source is parentless *and* not the root of anon namespace at that), but the check for namespace being an anon one is broken in that case - is_anon_ns() needs ns to be non-NULL. Better fixed here than in is_anon_ns(), since the rest of the callers is guaranteed to get a non-NULL argument... Reported-by: syzbot+494c7ddf66acac0ad747@syzkaller.appspotmail.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-08f2fs: fix to avoid potential race on sbi->unusable_block_count access/updateChao Yu
Use sbi.stat_lock to protect sbi->unusable_block_count accesss/udpate, in order to avoid potential race on it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: add tracepoint for f2fs_filemap_fault()Chao Yu
This patch adds tracepoint for f2fs_filemap_fault(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: introduce DATA_GENERIC_ENHANCEChao Yu
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check whether @blkaddr locates in main area or not. That check is weak, since the block address in range of main area can point to the address which is not valid in segment info table, and we can not detect such condition, we may suffer worse corruption as system continues running. So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check which trigger SIT bitmap check rather than only range check. This patch did below changes as wel: - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr(). - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid. - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check. - spread blkaddr check in: * f2fs_get_node_info() * __read_out_blkaddrs() * f2fs_submit_page_read() * ra_data_block() * do_recover_data() This patch can fix bug reported from bugzilla below: https://bugzilla.kernel.org/show_bug.cgi?id=203215 https://bugzilla.kernel.org/show_bug.cgi?id=203223 https://bugzilla.kernel.org/show_bug.cgi?id=203231 https://bugzilla.kernel.org/show_bug.cgi?id=203235 https://bugzilla.kernel.org/show_bug.cgi?id=203241 = Update by Jaegeuk Kim = DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths. But, xfstest/generic/446 compalins some generated kernel messages saying invalid bitmap was detected when reading a block. The reaons is, when we get the block addresses from extent_cache, there is no lock to synchronize it from truncating the blocks in parallel. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to handle error in f2fs_disable_checkpoint()Chao Yu
In f2fs_disable_checkpoint(), it needs to detect and propagate error number returned from f2fs_write_checkpoint(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: remove redundant check in f2fs_file_write_iter()Chengguang Xu
We have already checked flag IOCB_DIRECT in the sanity check of flag IOCB_NOWAIT, so don't have to check it again here. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to be aware of readonly device in write_checkpoint()Chao Yu
As Park Ju Hyung reported: Probably unrelated but a similar issue: Warning appears upon unmounting a corrupted R/O f2fs loop image. Should be a trivial issue to fix as well :) [ 2373.758424] ------------[ cut here ]------------ [ 2373.758428] generic_make_request: Trying to write to read-only block-device loop1 (partno 0) [ 2373.758455] WARNING: CPU: 1 PID: 13950 at block/blk-core.c:2174 generic_make_request_checks+0x590/0x630 [ 2373.758556] CPU: 1 PID: 13950 Comm: umount Tainted: G O 4.19.35-zen+ #1 [ 2373.758558] Hardware name: System manufacturer System Product Name/ROG MAXIMUS X HERO (WI-FI AC), BIOS 1704 09/14/2018 [ 2373.758564] RIP: 0010:generic_make_request_checks+0x590/0x630 [ 2373.758567] Code: 5c 03 00 00 48 8d 74 24 08 48 89 df c6 05 b5 cd 36 01 01 e8 c2 90 01 00 48 89 c6 44 89 ea 48 c7 c7 98 64 59 82 e8 d5 9b a7 ff <0f> 0b 48 8b 7b 08 e9 f2 fa ff ff 41 8b 86 98 02 00 00 49 8b 16 89 [ 2373.758570] RSP: 0018:ffff8882bdb43950 EFLAGS: 00010282 [ 2373.758573] RAX: 0000000000000050 RBX: ffff8887244c6700 RCX: 0000000000000006 [ 2373.758575] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88884ec56340 [ 2373.758577] RBP: ffff888849c426c0 R08: 0000000000000004 R09: 00000000000003ba [ 2373.758579] R10: 0000000000000001 R11: 0000000000000029 R12: 0000000000001000 [ 2373.758581] R13: 0000000000000000 R14: ffff888844a2e800 R15: ffff8882bdb43ac0 [ 2373.758584] FS: 00007fc0d114f8c0(0000) GS:ffff88884ec40000(0000) knlGS:0000000000000000 [ 2373.758586] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2373.758588] CR2: 00007fc0d1ad12c0 CR3: 00000002bdb82003 CR4: 00000000003606e0 [ 2373.758590] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2373.758592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2373.758593] Call Trace: [ 2373.758602] ? generic_make_request+0x46/0x3d0 [ 2373.758608] ? wait_woken+0x80/0x80 [ 2373.758612] ? mempool_alloc+0xb7/0x1a0 [ 2373.758618] ? submit_bio+0x30/0x110 [ 2373.758622] ? bvec_alloc+0x7c/0xd0 [ 2373.758628] ? __submit_merged_bio+0x68/0x390 [ 2373.758633] ? f2fs_submit_page_write+0x1bb/0x7f0 [ 2373.758638] ? f2fs_do_write_meta_page+0x7f/0x160 [ 2373.758642] ? __f2fs_write_meta_page+0x70/0x140 [ 2373.758647] ? f2fs_sync_meta_pages+0x140/0x250 [ 2373.758653] ? f2fs_write_checkpoint+0x5c5/0x17b0 [ 2373.758657] ? f2fs_sync_fs+0x9c/0x110 [ 2373.758664] ? sync_filesystem+0x66/0x80 [ 2373.758667] ? generic_shutdown_super+0x1d/0x100 [ 2373.758670] ? kill_block_super+0x1c/0x40 [ 2373.758674] ? kill_f2fs_super+0x64/0xb0 [ 2373.758678] ? deactivate_locked_super+0x2d/0xb0 [ 2373.758682] ? cleanup_mnt+0x65/0xa0 [ 2373.758688] ? task_work_run+0x7f/0xa0 [ 2373.758693] ? exit_to_usermode_loop+0x9c/0xa0 [ 2373.758698] ? do_syscall_64+0xc7/0xf0 [ 2373.758703] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 2373.758706] ---[ end trace 5d3639907c56271b ]--- [ 2373.758780] print_req_error: I/O error, dev loop1, sector 143048 [ 2373.758800] print_req_error: I/O error, dev loop1, sector 152200 [ 2373.758808] print_req_error: I/O error, dev loop1, sector 8192 [ 2373.758819] print_req_error: I/O error, dev loop1, sector 12272 This patch adds to detect readonly device in write_checkpoint() to avoid trigger write IOs on it. Reported-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to skip recovery on readonly deviceChao Yu
As Park Ju Hyung reported in mailing list: https://sourceforge.net/p/linux-f2fs/mailman/message/36639787/ generic_make_request: Trying to write to read-only block-device loop0 (partno 0) WARNING: CPU: 0 PID: 23437 at block/blk-core.c:2174 generic_make_request_checks+0x594/0x630 generic_make_request+0x46/0x3d0 submit_bio+0x30/0x110 __submit_merged_bio+0x68/0x390 f2fs_submit_page_write+0x1bb/0x7f0 f2fs_do_write_meta_page+0x7f/0x160 __f2fs_write_meta_page+0x70/0x140 f2fs_sync_meta_pages+0x140/0x250 f2fs_write_checkpoint+0x5c5/0x17b0 f2fs_sync_fs+0x9c/0x110 sync_filesystem+0x66/0x80 f2fs_recover_fsync_data+0x790/0xa30 f2fs_fill_super+0xe4e/0x1980 mount_bdev+0x518/0x610 mount_fs+0x34/0x13f vfs_kern_mount.part.11+0x4f/0x120 do_mount+0x2d1/0xe40 __x64_sys_mount+0xbf/0xe0 do_syscall_64+0x4a/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 print_req_error: I/O error, dev loop0, sector 4096 If block device is readonly, we should never trigger write IO from filesystem layer, but previously, orphan and journal recovery didn't consider such condition, result in triggering above warning, fix it. Reported-by: Park Ju Hyung <qkrwngud825@gmail.com> Tested-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to consider multiple device for readonly checkChao Yu
This patch introduce f2fs_hw_is_readonly() to check whether lower device is readonly or not, it adapts multiple device scenario. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: relocate chksum_offset for large_nat_bitmap featureChao Yu
For large_nat_bitmap feature, there is a design flaw: Previous: struct f2fs_checkpoint layout: +--------------------------+ 0x0000 | checkpoint_ver | | ...... | | checksum_offset |------+ | ...... | | | sit_nat_version_bitmap[] |<-----|-------+ | ...... | | | | checksum_value |<-----+ | +--------------------------+ 0x1000 | | | nat_bitmap + sit_bitmap | payload blocks | | | | | +--------------------------|<-------------+ Obviously, if nat_bitmap size + sit_bitmap size is larger than MAX_BITMAP_SIZE_IN_CKPT, nat_bitmap or sit_bitmap may overlap checkpoint checksum's position, once checkpoint() is triggered from kernel, nat or sit bitmap will be damaged by checksum field. In order to fix this, let's relocate checksum_value's position to the head of sit_nat_version_bitmap as below, then nat/sit bitmap and chksum value update will become safe. After: struct f2fs_checkpoint layout: +--------------------------+ 0x0000 | checkpoint_ver | | ...... | | checksum_offset |------+ | ...... | | | sit_nat_version_bitmap[] |<-----+ | ...... |<-------------+ | | | +--------------------------+ 0x1000 | | | nat_bitmap + sit_bitmap | payload blocks | | | | | +--------------------------|<-------------+ Related report and discussion: https://sourceforge.net/p/linux-f2fs/mailman/message/36642346/ Reported-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: allow unfixed f2fs_checkpoint.checksum_offsetChao Yu
Previously, f2fs_checkpoint.checksum_offset points fixed position of f2fs_checkpoint structure: "#define CP_CHKSUM_OFFSET 4092" It is unnecessary, and it breaks the consecutiveness of nat and sit bitmap stored across checkpoint park block and payload blocks. This patch allows f2fs to handle unfixed .checksum_offset. In addition, for the case checksum value is stored in the middle of checkpoint park, calculating checksum value with superposition method like we did for inode_checksum. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: Replace spaces with tabYoungjun Yoo
Modify coding style ERROR: code indent should use tabs where possible Signed-off-by: Youngjun Yoo <youngjun.willow@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: insert space before the open parenthesis '('Youngjun Yoo
Modify coding style ERROR: space required before the open parenthesis '(' Signed-off-by: Youngjun Yoo <youngjun.willow@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: allow address pointer number of dnode aligning to specified sizeChao Yu
This patch expands scalability of dnode layout, it allows address pointer number of dnode aligning to specified size (now, the size is one byte by default), and later the number can align to compress cluster size (1 << n bytes, n=[2,..)), it can avoid cluster acrossing two dnode, making design of compress meta layout simple. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: introduce f2fs_read_single_page() for cleanupChao Yu
This patch introduces f2fs_read_single_page() to wrap core operations of reading one page in f2fs_mpage_readpages(). In addition, if we failed in f2fs_mpage_readpages(), propagate error number to f2fs_read_data_page(), for f2fs_read_data_pages() path, always return success. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: mark is_extension_exist() inlinePark Ju Hyung
The caller set_file_temperature() is marked as inline as well. It doesn't make much sense to leave is_extension_exist() un-inlined. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to set FI_UPDATE_WRITE correctlyChao Yu
This patch fixes to set FI_UPDATE_WRITE only if in-place IO was issued. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to avoid panic in f2fs_inplace_write_data()Chao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203239 - Overview When mounting the attached crafted image and running program, following errors are reported. Additionally, it hangs on sync after running program. The image is intentionally fuzzed from a normal f2fs image for testing. Compile options for F2FS are as follows. CONFIG_F2FS_FS=y CONFIG_F2FS_STAT_FS=y CONFIG_F2FS_FS_XATTR=y CONFIG_F2FS_FS_POSIX_ACL=y CONFIG_F2FS_CHECK_FS=y - Reproduces cc poc_15.c ./run.sh f2fs sync - Kernel messages ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:3162! RIP: 0010:f2fs_inplace_write_data+0x12d/0x160 Call Trace: f2fs_do_write_data_page+0x3c1/0x820 __write_data_page+0x156/0x720 f2fs_write_cache_pages+0x20d/0x460 f2fs_write_data_pages+0x1b4/0x300 do_writepages+0x15/0x60 __filemap_fdatawrite_range+0x7c/0xb0 file_write_and_wait_range+0x2c/0x80 f2fs_do_sync_file+0x102/0x810 do_fsync+0x33/0x60 __x64_sys_fsync+0xb/0x10 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason is f2fs_inplace_write_data() will trigger kernel panic due to data block locates in node type segment. To avoid panic, let's just return error code and set SBI_NEED_FSCK to give a hint to fsck for latter repairing. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to do sanity check on valid block count of segmentChao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203233 - Overview When mounting the attached crafted image and running program, following errors are reported. Additionally, it hangs on sync after running program. The image is intentionally fuzzed from a normal f2fs image for testing. Compile options for F2FS are as follows. CONFIG_F2FS_FS=y CONFIG_F2FS_STAT_FS=y CONFIG_F2FS_FS_XATTR=y CONFIG_F2FS_FS_POSIX_ACL=y CONFIG_F2FS_CHECK_FS=y - Reproduces cc poc_13.c mkdir test mount -t f2fs tmp.img test cp a.out test cd test sudo ./a.out sync - Kernel messages F2FS-fs (sdb): Bitmap was wrongly set, blk:4608 kernel BUG at fs/f2fs/segment.c:2102! RIP: 0010:update_sit_entry+0x394/0x410 Call Trace: f2fs_allocate_data_block+0x16f/0x660 do_write_page+0x62/0x170 f2fs_do_write_node_page+0x33/0xa0 __write_node_page+0x270/0x4e0 f2fs_sync_node_pages+0x5df/0x670 f2fs_write_checkpoint+0x372/0x1400 f2fs_sync_fs+0xa3/0x130 f2fs_do_sync_file+0x1a6/0x810 do_fsync+0x33/0x60 __x64_sys_fsync+0xb/0x10 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 sit.vblocks and sum valid block count in sit.valid_map may be inconsistent, segment w/ zero vblocks will be treated as free segment, while allocating in free segment, we may allocate a free block, if its bitmap is valid previously, it can cause kernel crash due to bitmap verification failure. Anyway, to avoid further serious metadata inconsistence and corruption, it is necessary and worth to detect SIT inconsistence. So let's enable check_block_count() to verify vblocks and valid_map all the time rather than do it only CONFIG_F2FS_CHECK_FS is enabled. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to do sanity check on valid node/block countChao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203229 - Overview When mounting the attached crafted image, following errors are reported. Additionally, it hangs on sync after trying to mount it. The image is intentionally fuzzed from a normal f2fs image for testing. Compile options for F2FS are as follows. CONFIG_F2FS_FS=y CONFIG_F2FS_STAT_FS=y CONFIG_F2FS_FS_XATTR=y CONFIG_F2FS_FS_POSIX_ACL=y CONFIG_F2FS_CHECK_FS=y - Reproduces mkdir test mount -t f2fs tmp.img test sync - Kernel message kernel BUG at fs/f2fs/recovery.c:591! RIP: 0010:recover_data+0x12d8/0x1780 Call Trace: f2fs_recover_fsync_data+0x613/0x710 f2fs_fill_super+0x1043/0x1aa0 mount_bdev+0x16d/0x1a0 mount_fs+0x4a/0x170 vfs_kern_mount+0x5d/0x100 do_mount+0x200/0xcf0 ksys_mount+0x79/0xc0 __x64_sys_mount+0x1c/0x20 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 With corrupted image wihch has out-of-range valid node/block count, during recovery, once we failed due to no free space, it will trigger kernel panic. Adding sanity check on valid node/block count in f2fs_sanity_check_ckpt() to detect such condition, so that potential panic can be avoided. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to avoid panic in do_recover_data()Chao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203227 - Overview When mounting the attached crafted image, following errors are reported. Additionally, it hangs on sync after trying to mount it. The image is intentionally fuzzed from a normal f2fs image for testing. Compile options for F2FS are as follows. CONFIG_F2FS_FS=y CONFIG_F2FS_STAT_FS=y CONFIG_F2FS_FS_XATTR=y CONFIG_F2FS_FS_POSIX_ACL=y CONFIG_F2FS_CHECK_FS=y - Reproduces mkdir test mount -t f2fs tmp.img test sync - Messages kernel BUG at fs/f2fs/recovery.c:549! RIP: 0010:recover_data+0x167a/0x1780 Call Trace: f2fs_recover_fsync_data+0x613/0x710 f2fs_fill_super+0x1043/0x1aa0 mount_bdev+0x16d/0x1a0 mount_fs+0x4a/0x170 vfs_kern_mount+0x5d/0x100 do_mount+0x200/0xcf0 ksys_mount+0x79/0xc0 __x64_sys_mount+0x1c/0x20 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 During recovery, if ofs_of_node is inconsistent in between recovered node page and original checkpointed node page, let's just fail recovery instead of making kernel panic. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to do sanity check on free nidChao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203225 - Overview When mounting the attached crafted image and unmounting it, following errors are reported. Additionally, it hangs on sync after unmounting. The image is intentionally fuzzed from a normal f2fs image for testing. Compile options for F2FS are as follows. CONFIG_F2FS_FS=y CONFIG_F2FS_STAT_FS=y CONFIG_F2FS_FS_XATTR=y CONFIG_F2FS_FS_POSIX_ACL=y CONFIG_F2FS_CHECK_FS=y - Reproduces mkdir test mount -t f2fs tmp.img test touch test/t umount test sync - Messages kernel BUG at fs/f2fs/node.c:3073! RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300 Call Trace: f2fs_put_super+0xf4/0x270 generic_shutdown_super+0x62/0x110 kill_block_super+0x1c/0x50 kill_f2fs_super+0xad/0xd0 deactivate_locked_super+0x35/0x60 cleanup_mnt+0x36/0x70 task_work_run+0x75/0x90 exit_to_usermode_loop+0x93/0xa0 do_syscall_64+0xba/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300 NAT table is corrupted, so reserved meta/node inode ids were added into free list incorrectly, during file creation, since reserved id has cached in inode hash, so it fails the creation and preallocated nid can not be released later, result in kernel panic. To fix this issue, let's do nid boundary check during free nid loading. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to do checksum even if inode page is uptodateChao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203221 - Overview When mounting the attached crafted image and running program, this error is reported. The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on. - Reproduces cc poc_07.c mkdir test mount -t f2fs tmp.img test cp a.out test cd test sudo ./a.out - Messages kernel BUG at fs/f2fs/node.c:1279! RIP: 0010:read_node_page+0xcf/0xf0 Call Trace: __get_node_page+0x6b/0x2f0 f2fs_iget+0x8f/0xdf0 f2fs_lookup+0x136/0x320 __lookup_slow+0x92/0x140 lookup_slow+0x30/0x50 walk_component+0x1c1/0x350 path_lookupat+0x62/0x200 filename_lookup+0xb3/0x1a0 do_fchmodat+0x3e/0xa0 __x64_sys_chmod+0x12/0x20 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 On below paths, we can have opportunity to readahead inode page - gc_node_segment -> f2fs_ra_node_page - gc_data_segment -> f2fs_ra_node_page - f2fs_fill_dentries -> f2fs_ra_node_page Unlike synchronized read, on readahead path, we can set page uptodate before verifying page's checksum, then read_node_page() will trigger kernel panic once it encounters a uptodated page w/ incorrect checksum. So considering readahead scenario, we have to do checksum each time when loading inode page even if it is uptodated. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to avoid panic in f2fs_remove_inode_page()Chao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203219 - Overview When mounting the attached crafted image and running program, I got this error. Additionally, it hangs on sync after running the program. The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on. - Reproduces cc poc_06.c mkdir test mount -t f2fs tmp.img test cp a.out test cd test sudo ./a.out sync - Messages kernel BUG at fs/f2fs/node.c:1183! RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0 Call Trace: f2fs_evict_inode+0x2a3/0x3a0 evict+0xba/0x180 __dentry_kill+0xbe/0x160 dentry_kill+0x46/0x180 dput+0xbb/0x100 do_renameat2+0x3c9/0x550 __x64_sys_rename+0x17/0x20 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason is f2fs_remove_inode_page() will trigger kernel panic due to inconsistent i_blocks value of inode. To avoid panic, let's just print debug message and set SBI_NEED_FSCK to give a hint to fsck for latter repairing of potential image corruption. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix build warning and add unlikely] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to clear dirty inode in error path of f2fs_iget()Chao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203217 - Overview When mounting the attached crafted image and running program, I got this error. Additionally, it hangs on sync after running the program. The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on. - Reproduces cc poc_test_05.c mkdir test mount -t f2fs tmp.img test sudo ./a.out sync - Messages kernel BUG at fs/f2fs/inode.c:707! RIP: 0010:f2fs_evict_inode+0x33f/0x3a0 Call Trace: evict+0xba/0x180 f2fs_iget+0x598/0xdf0 f2fs_lookup+0x136/0x320 __lookup_slow+0x92/0x140 lookup_slow+0x30/0x50 walk_component+0x1c1/0x350 path_lookupat+0x62/0x200 filename_lookup+0xb3/0x1a0 do_readlinkat+0x56/0x110 __x64_sys_readlink+0x16/0x20 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 During inode loading, __recover_inline_status() can recovery inode status and set inode dirty, once we failed in following process, it will fail the check in f2fs_evict_inode, result in trigger BUG_ON(). Let's clear dirty inode in error path of f2fs_iget() to avoid panic. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: remove new blank line of f2fs kernel messageChao Yu
Just removing '\n' in f2fs_msg(, "\n") to avoid redundant new blank line. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix wrong __is_meta_io() macroChao Yu
This patch changes codes as below: - don't use is_read_io() as a condition to judge the meta IO. - use .is_por to replace .is_meta to indicate IO is from recovery explicitly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to avoid panic in dec_valid_node_count()Chao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203213 - Overview When mounting the attached crafted image and running program, I got this error. Additionally, it hangs on sync after running the this script. The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on. - Reproduces mkdir test mount -t f2fs tmp.img test cp a.out test cd test sudo ./a.out sync kernel BUG at fs/f2fs/f2fs.h:2012! RIP: 0010:truncate_node+0x2c9/0x2e0 Call Trace: f2fs_truncate_xattr_node+0xa1/0x130 f2fs_remove_inode_page+0x82/0x2d0 f2fs_evict_inode+0x2a3/0x3a0 evict+0xba/0x180 __dentry_kill+0xbe/0x160 dentry_kill+0x46/0x180 dput+0xbb/0x100 do_renameat2+0x3c9/0x550 __x64_sys_rename+0x17/0x20 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason is dec_valid_node_count() will trigger kernel panic due to inconsistent count in between inode.i_blocks and actual block. To avoid panic, let's just print debug message and set SBI_NEED_FSCK to give a hint to fsck for latter repairing. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix build warning and add unlikely] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to avoid panic in dec_valid_block_count()Chao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203209 - Overview When mounting the attached crafted image and running program, I got this error. Additionally, it hangs on sync after the this script. The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on. - Reproduces cc poc_01.c ./run.sh f2fs sync kernel BUG at fs/f2fs/f2fs.h:1788! RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350 Call Trace: f2fs_truncate_blocks+0x36d/0x3c0 f2fs_truncate+0x88/0x110 f2fs_setattr+0x3e1/0x460 notify_change+0x2da/0x400 do_truncate+0x6d/0xb0 do_sys_ftruncate+0xf1/0x160 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason is dec_valid_block_count() will trigger kernel panic due to inconsistent count in between inode.i_blocks and actual block. To avoid panic, let's just print debug message and set SBI_NEED_FSCK to give a hint to fsck for latter repairing. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix build warning and add unlikely] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to use inline space only if inline_xattr is enableChao Yu
With below mkfs and mount option: MKFS_OPTIONS -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f MOUNT_OPTIONS -- -o noinline_xattr We may miss xattr data with below testcase: - mkdir dir - setfattr -n "user.name" -v 0 dir - for ((i = 0; i < 190; i++)) do touch dir/$i; done - umount - mount - getfattr -n "user.name" dir user.name: No such attribute The root cause is that we persist xattr data into reserved inline xattr space, even if inline_xattr is not enable in inline directory inode, after inline dentry conversion, reserved space no longer exists, so that xattr data missed. Let's use inline xattr space only if inline_xattr flag is set on inode to fix this iusse. Fixes: 6afc662e68b5 ("f2fs: support flexible inline xattr size") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to retrieve inline xattr spaceChao Yu
With below mkfs and mount option, generic/339 of fstest will report that scratch image becomes corrupted. MKFS_OPTIONS -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f /dev/zram1 MOUNT_OPTIONS -- -o acl,user_xattr -o discard,noinline_xattr /dev/zram1 /mnt/scratch_f2fs [ASSERT] (f2fs_check_dirent_position:1315) --> Wrong position of dirent pino:1970, name: (...) level:8, dir_level:0, pgofs:951, correct range:[900, 901] In old kernel, inline data and directory always reserved 200 bytes in inode layout, even if inline_xattr is disabled, then new kernel tries to retrieve that space for non-inline xattr inode, but for inline dentry, its layout size should be fixed, so we just keep that reserved space. But the problem here is that, after inline dentry conversion, inline dentry layout no longer exists, if we still reserve inline xattr space, after dents updates, there will be a hole in inline xattr space, which can break hierarchy hash directory structure. This patch fixes this issue by retrieving inline xattr space after inline dentry conversion. Fixes: 6afc662e68b5 ("f2fs: support flexible inline xattr size") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix error path of recoveryChao Yu
There are some places in where we missed to unlock page or unlock page incorrectly, fix them. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to avoid deadloop in foreground GCChao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203211 - Overview When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating. The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on. - Reproduces mkdir test mount -t f2fs tmp.img test cd test touch t - Messages [ 58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT ... (repeating) During GC, if segment type stored in SSA and SIT is inconsistent, we just skip migrating current segment directly, since we need to know the exact type to decide the migration function we use. So in foreground GC, we will easily run into a infinite loop as we may select the same victim segment which has inconsistent type due to greedy policy. In order to end up this, we choose to shutdown filesystem. For backgrond GC, we need to do that as well, so that we can avoid latter potential infinite looped foreground GC. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08Merge tag 'gfs2-for-5.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Andreas Gruenbacher: "We've got the following patches ready for this merge window: - "gfs2: Fix loop in gfs2_rbm_find (v2)" A rework of a fix we ended up reverting in 5.0 because of an iozone performance regression. - "gfs2: read journal in large chunks" "gfs2: fix race between gfs2_freeze_func and unmount" An improved version of a commit we also ended up reverting in 5.0 because of a regression in xfstest generic/311. It turns out that the journal changes were mostly innocent and that unfreeze didn't wait for the freeze to complete, which caused the filesystem to be unmounted before it was actually idle. - "gfs2: Fix occasional glock use-after-free" "gfs2: Fix iomap write page reclaim deadlock" "gfs2: Fix lru_count going negative" Fixes for various problems reported and partially fixed by Citrix engineers. Thank you very much. - "gfs2: clean_journal improperly set sd_log_flush_head" Another fix from Bob. - .. and a few other minor cleanups" * tag 'gfs2-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: read journal in large chunks gfs2: Fix iomap write page reclaim deadlock gfs2: fix race between gfs2_freeze_func and unmount gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke} gfs2: Rename sd_log_le_{revoke,ordered} gfs2: Remove unnecessary extern declarations gfs2: Remove misleading comments in gfs2_evict_inode gfs2: Replace gl_revokes with a GLF flag gfs2: Fix occasional glock use-after-free gfs2: clean_journal improperly set sd_log_flush_head gfs2: Fix lru_count going negative gfs2: Fix loop in gfs2_rbm_find (v2)
2019-05-08Merge tag '5.2-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "CIFS/SMB3 changes: - three fixes for stable - add fiemap support - improve zero-range support - various RDMA (smb direct fixes) I have an additional set of fixes (for improved handling of sparse files, mode bits, POSIX extensions) that are still being tested that are not included in this pull request but I expect to send in the next week" * tag '5.2-smb3' of git://git.samba.org/sfrench/cifs-2.6: (29 commits) cifs: update module internal version number SMB3: Clean up query symlink when reparse point cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level() Negotiate and save preferred compression algorithms cifs: rename and clarify CIFS_ASYNC_OP and CIFS_NO_RESP cifs: fix credits leak for SMB1 oplock breaks smb3: Add protocol structs for change notify support cifs: fix smb3_zero_range for Azure cifs: zero-range does not require the file is sparse Add new flag on SMB3.1.1 read cifs: add fiemap support SMB3: Add defines for new negotiate contexts cifs: fix bi-directional fsctl passthrough calls cifs: smbd: take an array of reqeusts when sending upper layer data SMB3: Add handling for different FSCTL access flags cifs: Add support for FSCTL passthrough that write data to the server cifs: remove superfluous inode_lock in cifs_{strict_}fsync cifs: Call MID callback before destroying transport cifs: smbd: Retry on memory registration failure cifs: smbd: Indicate to retry on transport sending failure ...
2019-05-08fuse: clean up fuse_alloc_inodezhangliguang
This patch cleans up fuse_alloc_inode function, just simply the code, no logic change. Signed-off-by: zhangliguang <zhangliguang@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-05-08ovl: relax WARN_ON() for overlapping layers use caseAmir Goldstein
This nasty little syzbot repro: https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000 Creates overlay mounts where the same directory is both in upper and lower layers. Simplified example: mkdir foo work mount -t overlay none foo -o"lowerdir=.,upperdir=foo,workdir=work" The repro runs several threads in parallel that attempt to chdir into foo and attempt to symlink/rename/exec/mkdir the file bar. The repro hits a WARN_ON() I placed in ovl_instantiate(), which suggests that an overlay inode already exists in cache and is hashed by the pointer of the real upper dentry that ovl_create_real() has just created. At the point of the WARN_ON(), for overlay dir inode lock is held and upper dir inode lock, so at first, I did not see how this was possible. On a closer look, I see that after ovl_create_real(), because of the overlapping upper and lower layers, a lookup by another thread can find the file foo/bar that was just created in upper layer, at overlay path foo/foo/bar and hash the an overlay inode with the new real dentry as lower dentry. This is possible because the overlay directory foo/foo is not locked and the upper dentry foo/bar is in dcache, so ovl_lookup() can find it without taking upper dir inode shared lock. Overlapping layers is considered a wrong setup which would result in unexpected behavior, but it shouldn't crash the kernel and it shouldn't trigger WARN_ON() either, so relax this WARN_ON() and leave a pr_warn() instead to cover all cases of failure to get an overlay inode. The error returned from failure to insert new inode to cache with inode_insert5() was changed to -EEXIST, to distinguish from the error -ENOMEM returned on failure to get/allocate inode with iget5_locked(). Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com Fixes: 01b39dcc9568 ("ovl: use inode_insert5() to hash a newly...") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-05-08configfs: fix possible use-after-free in configfs_register_groupYueHaibing
In configfs_register_group(), if create_default_group() failed, we forget to unlink the group. It will left a invalid item in the parent list, which may trigger the use-after-free issue seen below: BUG: KASAN: use-after-free in __list_add_valid+0xd4/0xe0 lib/list_debug.c:26 Read of size 8 at addr ffff8881ef61ae20 by task syz-executor.0/5996 CPU: 1 PID: 5996 Comm: syz-executor.0 Tainted: G C 5.0.0+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xa9/0x10e lib/dump_stack.c:113 print_address_description+0x65/0x270 mm/kasan/report.c:187 kasan_report+0x149/0x18d mm/kasan/report.c:317 __list_add_valid+0xd4/0xe0 lib/list_debug.c:26 __list_add include/linux/list.h:60 [inline] list_add_tail include/linux/list.h:93 [inline] link_obj+0xb0/0x190 fs/configfs/dir.c:759 link_group+0x1c/0x130 fs/configfs/dir.c:784 configfs_register_group+0x56/0x1e0 fs/configfs/dir.c:1751 configfs_register_default_group+0x72/0xc0 fs/configfs/dir.c:1834 ? 0xffffffffc1be0000 iio_sw_trigger_init+0x23/0x1000 [industrialio_sw_trigger] do_one_initcall+0xbc/0x47d init/main.c:887 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f494ecbcc58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 00007f494ecbcc70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f494ecbd6bc R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004 Allocated by task 5987: set_track mm/kasan/common.c:87 [inline] __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:497 kmalloc include/linux/slab.h:545 [inline] kzalloc include/linux/slab.h:740 [inline] configfs_register_default_group+0x4c/0xc0 fs/configfs/dir.c:1829 0xffffffffc1bd0023 do_one_initcall+0xbc/0x47d init/main.c:887 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 5987: set_track mm/kasan/common.c:87 [inline] __kasan_slab_free+0x130/0x180 mm/kasan/common.c:459 slab_free_hook mm/slub.c:1429 [inline] slab_free_freelist_hook mm/slub.c:1456 [inline] slab_free mm/slub.c:3003 [inline] kfree+0xe1/0x270 mm/slub.c:3955 configfs_register_default_group+0x9a/0xc0 fs/configfs/dir.c:1836 0xffffffffc1bd0023 do_one_initcall+0xbc/0x47d init/main.c:887 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8881ef61ae00 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 32 bytes inside of 192-byte region [ffff8881ef61ae00, ffff8881ef61aec0) The buggy address belongs to the page: page:ffffea0007bd8680 count:1 mapcount:0 mapping:ffff8881f6c03000 index:0xffff8881ef61a700 flags: 0x2fffc0000000200(slab) raw: 02fffc0000000200 ffffea0007ca4740 0000000500000005 ffff8881f6c03000 raw: ffff8881ef61a700 000000008010000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881ef61ad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8881ef61ad80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc >ffff8881ef61ae00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881ef61ae80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8881ef61af00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 5cf6a51e6062 ("configfs: allow dynamic group creation") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support AES128-CCM ciphers in kTLS, from Vakul Garg. 2) Add fib_sync_mem to control the amount of dirty memory we allow to queue up between synchronize RCU calls, from David Ahern. 3) Make flow classifier more lockless, from Vlad Buslov. 4) Add PHY downshift support to aquantia driver, from Heiner Kallweit. 5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces contention on SLAB spinlocks in heavy RPC workloads. 6) Partial GSO offload support in XFRM, from Boris Pismenny. 7) Add fast link down support to ethtool, from Heiner Kallweit. 8) Use siphash for IP ID generator, from Eric Dumazet. 9) Pull nexthops even further out from ipv4/ipv6 routes and FIB entries, from David Ahern. 10) Move skb->xmit_more into a per-cpu variable, from Florian Westphal. 11) Improve eBPF verifier speed and increase maximum program size, from Alexei Starovoitov. 12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit spinlocks. From Neil Brown. 13) Allow tunneling with GUE encap in ipvs, from Jacky Hu. 14) Improve link partner cap detection in generic PHY code, from Heiner Kallweit. 15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan Maguire. 16) Remove SKB list implementation assumptions in SCTP, your's truly. 17) Various cleanups, optimizations, and simplifications in r8169 driver. From Heiner Kallweit. 18) Add memory accounting on TX and RX path of SCTP, from Xin Long. 19) Switch PHY drivers over to use dynamic featue detection, from Heiner Kallweit. 20) Support flow steering without masking in dpaa2-eth, from Ioana Ciocoi. 21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri Pirko. 22) Increase the strict parsing of current and future netlink attributes, also export such policies to userspace. From Johannes Berg. 23) Allow DSA tag drivers to be modular, from Andrew Lunn. 24) Remove legacy DSA probing support, also from Andrew Lunn. 25) Allow ll_temac driver to be used on non-x86 platforms, from Esben Haabendal. 26) Add a generic tracepoint for TX queue timeouts to ease debugging, from Cong Wang. 27) More indirect call optimizations, from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits) cxgb4: Fix error path in cxgb4_init_module net: phy: improve pause mode reporting in phy_print_status dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings net: macb: Change interrupt and napi enable order in open net: ll_temac: Improve error message on error IRQ net/sched: remove block pointer from common offload structure net: ethernet: support of_get_mac_address new ERR_PTR error net: usb: smsc: fix warning reported by kbuild test robot staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check net: dsa: support of_get_mac_address new ERR_PTR error net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats vrf: sit mtu should not be updated when vrf netdev is the link net: dsa: Fix error cleanup path in dsa_init_module l2tp: Fix possible NULL pointer dereference taprio: add null check on sched_nest to avoid potential null pointer dereference net: mvpp2: cls: fix less than zero check on a u32 variable net_sched: sch_fq: handle non connected flows net_sched: sch_fq: do not assume EDT packets are ordered net: hns3: use devm_kcalloc when allocating desc_cb net: hns3: some cleanup for struct hns3_enet_ring ...
2019-05-07Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds
Pull fscrypt updates from Ted Ts'o: "Clean up fscrypt's dcache revalidation support, and other miscellaneous cleanups" * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: cache decrypted symlink target in ->i_link vfs: use READ_ONCE() to access ->i_link fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext fscrypt: only set dentry_operations on ciphertext dentries fs, fscrypt: clear DCACHE_ENCRYPTED_NAME when unaliasing directory fscrypt: fix race allowing rename() and link() of ciphertext dentries fscrypt: clean up and improve dentry revalidation fscrypt: use READ_ONCE() to access ->i_crypt_info fscrypt: remove WARN_ON_ONCE() when decryption fails fscrypt: drop inode argument from fscrypt_get_ctx()
2019-05-07cifs: update module internal version numberSteve French
To 2.20 Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07SMB3: Clean up query symlink when reparse pointRonnie Sahlberg
Two of the common symlink formats use reparse points (unlike mfsymlinks and also unlike the SMB1 posix extensions). This is the first part of the fixes to allow these reparse points (NFS style and Windows symlinks) to be resolved properly as symlinks by the client. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level()Christoph Probst
Change strcat to strncpy in the "None" case to fix a buffer overflow when cinode->oplock is reset to 0 by another thread accessing the same cinode. It is never valid to append "None" to any other message. Consolidate multiple writes to cinode->oplock to reduce raciness. Signed-off-by: Christoph Probst <kernel@probst.it> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
2019-05-07Negotiate and save preferred compression algorithmsSteve French
New negotiate context (3) allows the server and client to negotiate which compression algorithms to use. Add support for this and save it off in the server structure. Also now displayed in /proc/fs/cifs/DebugData (see below example to Windows 10) where compression algoirthm "LZ77" was negotiated: Servers: Number of credits: 326 Dialect 0x311 COMPRESS_LZ77 signed 1) Name: 192.168.92.17 Uses: 1 Capability: 0x300067 Session Status: 1 TCP status: 1 Instance: 1 See MS-XCA and MS-SMB2 2.2.3.1 for more details. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-05-07cifs: rename and clarify CIFS_ASYNC_OP and CIFS_NO_RESPRonnie Sahlberg
The flags were named confusingly. CIFS_ASYNC_OP now just means that we will not block waiting for credits to become available so we thus rename this to be CIFS_NON_BLOCKING. Change CIFS_NO_RESP to CIFS_NO_RSP_BUF to clarify that we will actually get a response from the server but we will not get/do not want a response buffer. Delete CIFSSMBNotify. This is an SMB1 function that is not used. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-05-07cifs: fix credits leak for SMB1 oplock breaksRonnie Sahlberg
For SMB1 oplock breaks we would grab one credit while sending the PDU but we would never relese the credit back since we will never receive a response to this from the server. Eventuallt this would lead to a hang once all credits are leaked. Fix this by defining a new flag CIFS_NO_SRV_RSP which indicates that there is no server response to this command and thus we need to add any credits back immediately after sending the PDU. CC: Stable <stable@vger.kernel.org> #v5.0+ Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>