Age | Commit message (Collapse) | Author |
|
iput() on sbi->node_inode can update sbi->stat_info
in the below context, if the f2fs_write_checkpoint()
has failed with error.
f2fs_balance_fs_bg+0x1ac/0x1ec
f2fs_write_node_pages+0x4c/0x260
do_writepages+0x80/0xbc
__writeback_single_inode+0xdc/0x4ac
writeback_single_inode+0x9c/0x144
write_inode_now+0xc4/0xec
iput+0x194/0x22c
f2fs_put_super+0x11c/0x1e8
generic_shutdown_super+0x70/0xf4
kill_block_super+0x2c/0x5c
kill_f2fs_super+0x44/0x50
deactivate_locked_super+0x60/0x8c
deactivate_super+0x68/0x74
cleanup_mnt+0x40/0x78
Fix this by moving f2fs_destroy_stats() further below iput() in
both f2fs_put_super() and f2fs_fill_super() paths.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Treat "block_count" from struct f2fs_super_block as 64-bit little endian
value in sanity_check_raw_super() because struct f2fs_super_block
declares "block_count" as "__le64".
This fixes a bug where the superblock validation fails on big endian
devices with the following error:
F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
F2FS-fs (sda1): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
F2FS-fs (sda1): Can't find valid F2FS filesystem in 2th superblock
As result of this the partition cannot be mounted.
With this patch applied the superblock validation works fine and the
partition can be mounted again:
F2FS-fs (sda1): Mounted with checkpoint version = 7c84
My little endian x86-64 hardware was able to mount the partition without
this fix.
To confirm that mounting f2fs filesystems works on big endian machines
again I tested this on a 32-bit MIPS big endian (lantiq) device.
Fixes: 0cfe75c5b01199 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This fixes missing unlock call.
Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
One report says memalloc failure during mount.
(unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14)
(show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0)
(dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160)
(warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0)
(__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120)
(kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688)
(build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc)
(f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188)
(mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20)
(f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c)
(mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134)
(vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4)
(do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc)
(SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48)
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Those strings are immutable.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.
Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.
So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In F2FS_HAS_FEATURE(), we will use F2FS_SB(sb) to get sbi pointer to
access .raw_super field, to avoid unneeded pointer conversion, this
patch changes to F2FS_HAS_FEATURE() accept sbi parameter directly.
Just do cleanup, no logic change.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
During recover, we will try to create new dentries for inodes with
dentry_mark. But if the parent is missing (e.g. killed by fsck),
recover will break. But those recovered dirty pages are not cleanup.
This will hit f2fs_bug_on:
[ 53.519566] F2FS-fs (loop0): Found nat_bits in checkpoint
[ 53.539354] F2FS-fs (loop0): recover_inode: ino = 5, name = file, inline = 3
[ 53.539402] F2FS-fs (loop0): recover_dentry: ino = 5, name = file, dir = 0, err = -2
[ 53.545760] F2FS-fs (loop0): Cannot recover all fsync data errno=-2
[ 53.546105] F2FS-fs (loop0): access invalid blkaddr:4294967295
[ 53.546171] WARNING: CPU: 1 PID: 1798 at fs/f2fs/checkpoint.c:163 f2fs_is_valid_blkaddr+0x26c/0x320
[ 53.546174] Modules linked in:
[ 53.546183] CPU: 1 PID: 1798 Comm: mount Not tainted 4.19.0-rc2+ #1
[ 53.546186] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 53.546191] RIP: 0010:f2fs_is_valid_blkaddr+0x26c/0x320
[ 53.546195] Code: 85 bb 00 00 00 48 89 df 88 44 24 07 e8 ad a8 db ff 48 8b 3b 44 89 e1 48 c7 c2 40 03 72 a9 48 c7 c6 e0 01 72 a9 e8 84 3c ff ff <0f> 0b 0f b6 44 24 07 e9 8a 00 00 00 48 8d bf 38 01 00 00 e8 7c a8
[ 53.546201] RSP: 0018:ffff88006c067768 EFLAGS: 00010282
[ 53.546208] RAX: 0000000000000000 RBX: ffff880068844200 RCX: ffffffffa83e1a33
[ 53.546211] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88006d51e590
[ 53.546215] RBP: 0000000000000005 R08: ffffed000daa3cb3 R09: ffffed000daa3cb3
[ 53.546218] R10: 0000000000000001 R11: ffffed000daa3cb2 R12: 00000000ffffffff
[ 53.546221] R13: ffff88006a1f8000 R14: 0000000000000200 R15: 0000000000000009
[ 53.546226] FS: 00007fb2f3646840(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
[ 53.546229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 53.546234] CR2: 00007f0fd77f0008 CR3: 00000000687e6002 CR4: 00000000000206e0
[ 53.546237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 53.546240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 53.546242] Call Trace:
[ 53.546248] f2fs_submit_page_bio+0x95/0x740
[ 53.546253] read_node_page+0x161/0x1e0
[ 53.546271] ? truncate_node+0x650/0x650
[ 53.546283] ? add_to_page_cache_lru+0x12c/0x170
[ 53.546288] ? pagecache_get_page+0x262/0x2d0
[ 53.546292] __get_node_page+0x200/0x660
[ 53.546302] f2fs_update_inode_page+0x4a/0x160
[ 53.546306] f2fs_write_inode+0x86/0xb0
[ 53.546317] __writeback_single_inode+0x49c/0x620
[ 53.546322] writeback_single_inode+0xe4/0x1e0
[ 53.546326] sync_inode_metadata+0x93/0xd0
[ 53.546330] ? sync_inode+0x10/0x10
[ 53.546342] ? do_raw_spin_unlock+0xed/0x100
[ 53.546347] f2fs_sync_inode_meta+0xe0/0x130
[ 53.546351] f2fs_fill_super+0x287d/0x2d10
[ 53.546367] ? vsnprintf+0x742/0x7a0
[ 53.546372] ? f2fs_commit_super+0x180/0x180
[ 53.546379] ? up_write+0x20/0x40
[ 53.546385] ? set_blocksize+0x5f/0x140
[ 53.546391] ? f2fs_commit_super+0x180/0x180
[ 53.546402] mount_bdev+0x181/0x200
[ 53.546406] mount_fs+0x94/0x180
[ 53.546411] vfs_kern_mount+0x6c/0x1e0
[ 53.546415] do_mount+0xe5e/0x1510
[ 53.546420] ? fs_reclaim_release+0x9/0x30
[ 53.546424] ? copy_mount_string+0x20/0x20
[ 53.546428] ? fs_reclaim_acquire+0xd/0x30
[ 53.546435] ? __might_sleep+0x2c/0xc0
[ 53.546440] ? ___might_sleep+0x53/0x170
[ 53.546453] ? __might_fault+0x4c/0x60
[ 53.546468] ? _copy_from_user+0x95/0xa0
[ 53.546474] ? memdup_user+0x39/0x60
[ 53.546478] ksys_mount+0x88/0xb0
[ 53.546482] __x64_sys_mount+0x5d/0x70
[ 53.546495] do_syscall_64+0x65/0x130
[ 53.546503] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.547639] ---[ end trace b804d1ea2fec893e ]---
So if recover fails, we need to drop all recovered data.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch changes codes as below:
- use f2fs_set_inode_flags() to update i_flags atomically to avoid
potential race.
- synchronize F2FS_I(inode)->i_flags to inode->i_flags in
f2fs_new_inode().
- use f2fs_set_inode_flags() to simply codes in f2fs_quota_{on,off}.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Since we can use the filesystem without quotas till next boot.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Note that, it requires "f2fs: return correct errno in f2fs_gc".
This adds a lightweight non-persistent snapshotting scheme to f2fs.
To use, mount with the option checkpoint=disable, and to return to
normal operation, remount with checkpoint=enable. If the filesystem
is shut down before remounting with checkpoint=enable, it will revert
back to its apparent state when it was first mounted with
checkpoint=disable. This is useful for situations where you wish to be
able to roll back the state of the disk in case of some critical
failure.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Now we support crc32 checksum for superblock.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch fixes losing lazytime when remounting f2fs.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Currently we show mount option "io_bits=%u" as "io_size=%uKB",
it will cause option parsing problem(unrecognized mount option)
in remount.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This helps to control the frequency of submission of discard and
GC requests independently, based on the need.
Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds to support injecting error for write IO, this can simulate
IO error like fail_make_request or dm_flakey does.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Remove the verbose license text from f2fs files and replace them with
SPDX tags. This does not change the license of any of the code.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
CONFIG_F2FS_FAULT_INJECTION
It's a little bit strange when fault_injection related
options fail with -EINVAL which were already disabled
from config, so surround all fault_injection related option
parsing code using CONFIG_F2FS_FAULT_INJECTION. Meanwhile,
slightly change warning message to keep consistency with
option POSIX_ACL and FS_XATTR.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
https://bugzilla.kernel.org/show_bug.cgi?id=200219
Reproduction way:
- mount image
- run poc code
- umount image
F2FS-fs (loop1): Bitmap was wrongly set, blk:15364
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/segment.c:2061!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 17686 Comm: umount Tainted: G W O 4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: update_sit_entry+0x459/0x4e0 [f2fs]
Code: e8 1c b5 fd ff 0f 0b 0f 0b 8b 45 e4 c7 44 24 08 9c 7a 6c f8 c7 44 24 04 bc 4a 6c f8 89 44 24 0c 8b 06 89 04 24 e8 f7 b4 fd ff <0f> 0b 8b 45 e4 0f b6 d2 89 54 24 10 c7 44 24 08 60 7a 6c f8 c7 44
EAX: 00000032 EBX: 000000f8 ECX: 00000002 EDX: 00000001
ESI: d7177000 EDI: f520fe68 EBP: d6477c6c ESP: d6477c34
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010282
CR0: 80050033 CR2: b7fbe000 CR3: 2a99b3c0 CR4: 000406f0
Call Trace:
f2fs_allocate_data_block+0x124/0x580 [f2fs]
do_write_page+0x78/0x150 [f2fs]
f2fs_do_write_node_page+0x25/0xa0 [f2fs]
__write_node_page+0x2bf/0x550 [f2fs]
f2fs_sync_node_pages+0x60e/0x6d0 [f2fs]
? sync_inode_metadata+0x2f/0x40
? f2fs_write_checkpoint+0x28f/0x7d0 [f2fs]
? up_write+0x1e/0x80
f2fs_write_checkpoint+0x2a9/0x7d0 [f2fs]
? mark_held_locks+0x5d/0x80
? _raw_spin_unlock_irq+0x27/0x50
kill_f2fs_super+0x68/0x90 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x81/0xa0
exit_to_usermode_loop+0x59/0xa7
do_fast_syscall_32+0x1f5/0x22c
entry_SYSENTER_32+0x53/0x86
EIP: 0xb7f95c51
Code: c1 1e f7 ff ff 89 e5 8b 55 08 85 d2 8b 81 64 cd ff ff 74 02 89 02 5d c3 8b 0c 24 c3 8b 1c 24 c3 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
EAX: 00000000 EBX: 0871ab90 ECX: bfb2cd00 EDX: 00000000
ESI: 00000000 EDI: 0871ab90 EBP: 0871ab90 ESP: bfb2cd7c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246
Modules linked in: f2fs(O) crc32_generic bnep rfcomm bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev aesni_intel snd_seq_device aes_i586 snd_timer crypto_simd snd cryptd soundcore mac_hid serio_raw video i2c_piix4 parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000 [last unloaded: f2fs]
---[ end trace d423f83982cfcdc5 ]---
The reason is, different log headers using the same segment, once
one log's next block address is used by another log, it will cause
panic as above.
Main area: 24 segs, 24 secs 24 zones
- COLD data: 0, 0, 0
- WARM data: 1, 1, 1
- HOT data: 20, 20, 20
- Dir dnode: 22, 22, 22
- File dnode: 22, 22, 22
- Indir nodes: 21, 21, 21
So this patch adds sanity check to detect such condition to avoid
this issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In fill_super -> init_percpu_info, we should destroy percpu counter
in error path, otherwise memory allcoated for percpu counter will
leak.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It needs to release memory allocated for sbi->write_io in error path,
otherwise, it will cause memory leak.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
generic/417 reported as blow:
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21697 Comm: umount Tainted: G W O 4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
Call Trace:
? _raw_spin_unlock+0x2c/0x50
evict+0xa8/0x170
dispose_list+0x34/0x40
evict_inodes+0x118/0x120
generic_shutdown_super+0x41/0x100
? rcu_read_lock_sched_held+0x97/0xa0
kill_block_super+0x22/0x50
kill_f2fs_super+0x6f/0x80 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x81/0xa0
exit_to_usermode_loop+0x59/0xa7
do_fast_syscall_32+0x1f5/0x22c
entry_SYSENTER_32+0x53/0x86
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
It can simply reproduced with scripts:
Enable quota feature during mkfs.
Testcase1:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
4. godown /mnt/f2fs
5. umount /mnt/f2fs
6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
7. umount /mnt/f2fs
Testcase2:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. touch /mnt/f2fs/file
4. create process[pid = x] do:
a) open /mnt/f2fs/file;
b) unlink /mnt/f2fs/file
5. godown -f /mnt/f2fs
6. kill process[pid = x]
7. umount /mnt/f2fs
8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
9. umount /mnt/f2fs
The reason is: during recovery, i_{c,m}time of inode will be updated, then
the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
global list, so later write_checkpoint will not flush such dirty inode into
node page.
Once umount is called, sync_filesystem() in generic_shutdown_super() will
skip syncng dirty inodes due to sb_rdonly check, leaving dirty inodes
there.
To solve this issue, during umount, add remove SB_RDONLY flag in
sb->s_flags, to make sure sync_filesystem() will not be skipped.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Now, we depend on fsck to ensure quota file data is ok,
so we scan whole partition if checkpoint without umount
flag. It's same for quota off error case, which may make
quota file data inconsistent.
generic/019 reports below error:
__quota_error: 1160 callbacks suppressed
Quota error (device zram1): write_blk: dquota write failed
Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
Quota error (device zram1): write_blk: dquota write failed
Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
Quota error (device zram1): write_blk: dquota write failed
Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
Quota error (device zram1): write_blk: dquota write failed
Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
Quota error (device zram1): write_blk: dquota write failed
Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
VFS: Busy inodes after unmount of zram1. Self-destruct in 5 seconds. Have a nice day...
If we failed in below path due to fail to write dquot block, we will miss
to release quota inode, fix it.
- f2fs_put_super
- f2fs_quota_off_umount
- f2fs_quota_off
- f2fs_quota_sync <-- failed
- dquot_quota_off <-- missed to call
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Don't limit printing log, so that we will not miss any key messages.
This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.
In addition, we use printk_ratelimited to avoid too many log prints.
- error injection
- discard submission failure
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've tuned f2fs to improve general performance by
serializing block allocation and enhancing discard flows like fstrim
which avoids user IO contention. And we've added fsync_mode=nobarrier
which gives an option to user where it skips issuing cache_flush
commands to underlying flash storage. And there are many bug fixes
related to fuzzed images, revoked atomic writes, quota ops, and minor
direct IO.
Enhancements:
- add fsync_mode=nobarrier which bypasses cache_flush command
- enhance the discarding flow which avoids user IOs and issues in
LBA order
- readahead some encrypted blocks during GC
- enable in-memory inode checksum to verify the blocks if
F2FS_CHECK_FS is set
- enhance nat_bits behavior
- set -o discard by default
- set REQ_RAHEAD to bio in ->readpages
Bug fixes:
- fix a corner case to corrupt atomic_writes revoking flow
- revisit i_gc_rwsem to fix race conditions
- fix some dio behaviors captured by xfstests
- correct handling errors given by quota-related failures
- add many sanity check flows to avoid fuzz test failures
- add more error number propagation to their callers
- fix several corner cases to continue fault injection w/ shutdown
loop"
* tag 'f2fs-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (89 commits)
f2fs: readahead encrypted block during GC
f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc
f2fs: fix performance issue observed with multi-thread sequential read
f2fs: fix to skip verifying block address for non-regular inode
f2fs: rework fault injection handling to avoid a warning
f2fs: support fault_type mount option
f2fs: fix to return success when trimming meta area
f2fs: fix use-after-free of dicard command entry
f2fs: support discard submission error injection
f2fs: split discard command in prior to block layer
f2fs: wake up gc thread immediately when gc_urgent is set
f2fs: fix incorrect range->len in f2fs_trim_fs()
f2fs: refresh recent accessed nat entry in lru list
f2fs: fix avoid race between truncate and background GC
f2fs: avoid race between zero_range and background GC
f2fs: fix to do sanity check with block address in main area v2
f2fs: fix to do sanity check with inline flags
f2fs: fix to reset i_gc_failures correctly
f2fs: fix invalid memory access
f2fs: fix to avoid broken of dnode block list
...
|
|
This reverts the commit - "b93f771 - f2fs: remove writepages lock"
to fix the drop in sequential read throughput.
Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
device: UFS
Before -
read throughput: 185 MB/s
total read requests: 85177 (of these ~80000 are 4KB size requests).
total write requests: 2546 (of these ~2208 requests are written in 512KB).
After -
read throughput: 758 MB/s
total read requests: 2417 (of these ~2042 are 512KB reads).
total write requests: 2701 (of these ~2034 requests are written in 512KB).
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Previously, once fault injection is on, by default, all kind of faults
will be injected to f2fs, if we want to trigger single or specified
combined type during the test, we need to configure sysfs entry, it will
be a little inconvenient to integrate sysfs configuring into testsuit,
such as xfstest.
So this patch introduces a new mount option 'fault_type' to assist old
option 'fault_injection', with these two mount options, we can specify
any fault rate/type at mount-time.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds to support discard submission error injection for testing
error handling of __submit_discard_cmd().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
https://bugzilla.kernel.org/show_bug.cgi?id=200221
- Overview
BUG() in clear_inode() when mounting and un-mounting a corrupted f2fs image
- Reproduce
- Kernel message
[ 538.601448] F2FS-fs (loop0): Invalid segment/section count (31, 24 x 1376257)
[ 538.601458] F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock
[ 538.724091] F2FS-fs (loop0): Try to recover 2th superblock, ret: 0
[ 538.724102] F2FS-fs (loop0): Mounted with checkpoint version = 2
[ 540.970834] ------------[ cut here ]------------
[ 540.970838] kernel BUG at fs/inode.c:512!
[ 540.971750] invalid opcode: 0000 [#1] SMP KASAN PTI
[ 540.972755] CPU: 1 PID: 1305 Comm: umount Not tainted 4.18.0-rc1+ #4
[ 540.974034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 540.982913] RIP: 0010:clear_inode+0xc0/0xd0
[ 540.983774] Code: 8d a3 30 01 00 00 4c 89 e7 e8 1c ec f8 ff 48 8b 83 30 01 00 00 49 39 c4 75 1a 48 c7 83 a0 00 00 00 60 00 00 00 5b 41 5c 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 66 66 66 66 90 55
[ 540.987570] RSP: 0018:ffff8801e34a7b70 EFLAGS: 00010002
[ 540.988636] RAX: 0000000000000000 RBX: ffff8801e9b744e8 RCX: ffffffffb840eb3a
[ 540.990063] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8801e9b746b8
[ 540.991499] RBP: ffff8801e34a7b80 R08: ffffed003d36e8ce R09: ffffed003d36e8ce
[ 540.992923] R10: 0000000000000001 R11: ffffed003d36e8cd R12: ffff8801e9b74668
[ 540.994360] R13: ffff8801e9b74760 R14: ffff8801e9b74528 R15: ffff8801e9b74530
[ 540.995786] FS: 00007f4662bdf840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 540.997403] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 540.998571] CR2: 000000000175c568 CR3: 00000001dcfe6000 CR4: 00000000000006e0
[ 541.000015] Call Trace:
[ 541.000554] f2fs_evict_inode+0x253/0x630
[ 541.001381] evict+0x16f/0x290
[ 541.002015] iput+0x280/0x300
[ 541.002654] dentry_unlink_inode+0x165/0x1e0
[ 541.003528] __dentry_kill+0x16a/0x260
[ 541.004300] dentry_kill+0x70/0x250
[ 541.005018] dput+0x154/0x1d0
[ 541.005635] do_one_tree+0x34/0x40
[ 541.006354] shrink_dcache_for_umount+0x3f/0xa0
[ 541.007285] generic_shutdown_super+0x43/0x1c0
[ 541.008192] kill_block_super+0x52/0x80
[ 541.008978] kill_f2fs_super+0x62/0x70
[ 541.009750] deactivate_locked_super+0x6f/0xa0
[ 541.010664] deactivate_super+0x5e/0x80
[ 541.011450] cleanup_mnt+0x61/0xa0
[ 541.012151] __cleanup_mnt+0x12/0x20
[ 541.012893] task_work_run+0xc8/0xf0
[ 541.013635] exit_to_usermode_loop+0x125/0x130
[ 541.014555] do_syscall_64+0x138/0x170
[ 541.015340] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 541.016375] RIP: 0033:0x7f46624bf487
[ 541.017104] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 c9 2b 00 f7 d8 64 89 01 48
[ 541.020923] RSP: 002b:00007fff5e12e9a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 541.022452] RAX: 0000000000000000 RBX: 0000000001753030 RCX: 00007f46624bf487
[ 541.023885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000175a1e0
[ 541.025318] RBP: 000000000175a1e0 R08: 0000000000000000 R09: 0000000000000014
[ 541.026755] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f46629c883c
[ 541.028186] R13: 0000000000000000 R14: 0000000001753210 R15: 00007fff5e12ec30
[ 541.029626] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 541.039445] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 541.040392] RIP: 0010:clear_inode+0xc0/0xd0
[ 541.041240] Code: 8d a3 30 01 00 00 4c 89 e7 e8 1c ec f8 ff 48 8b 83 30 01 00 00 49 39 c4 75 1a 48 c7 83 a0 00 00 00 60 00 00 00 5b 41 5c 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 66 66 66 66 90 55
[ 541.045042] RSP: 0018:ffff8801e34a7b70 EFLAGS: 00010002
[ 541.046099] RAX: 0000000000000000 RBX: ffff8801e9b744e8 RCX: ffffffffb840eb3a
[ 541.047537] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8801e9b746b8
[ 541.048965] RBP: ffff8801e34a7b80 R08: ffffed003d36e8ce R09: ffffed003d36e8ce
[ 541.050402] R10: 0000000000000001 R11: ffffed003d36e8cd R12: ffff8801e9b74668
[ 541.051832] R13: ffff8801e9b74760 R14: ffff8801e9b74528 R15: ffff8801e9b74530
[ 541.053263] FS: 00007f4662bdf840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 541.054891] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 541.056039] CR2: 000000000175c568 CR3: 00000001dcfe6000 CR4: 00000000000006e0
[ 541.058506] ==================================================================
[ 541.059991] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
[ 541.061513] Read of size 8 at addr ffff8801e34a7970 by task umount/1305
[ 541.063302] CPU: 1 PID: 1305 Comm: umount Tainted: G D 4.18.0-rc1+ #4
[ 541.064838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 541.066778] Call Trace:
[ 541.067294] dump_stack+0x7b/0xb5
[ 541.067986] print_address_description+0x70/0x290
[ 541.068941] kasan_report+0x291/0x390
[ 541.069692] ? update_stack_state+0x38c/0x3e0
[ 541.070598] __asan_load8+0x54/0x90
[ 541.071315] update_stack_state+0x38c/0x3e0
[ 541.072172] ? __read_once_size_nocheck.constprop.7+0x20/0x20
[ 541.073340] ? vprintk_func+0x27/0x60
[ 541.074096] ? printk+0xa3/0xd3
[ 541.074762] ? __save_stack_trace+0x5e/0x100
[ 541.075634] unwind_next_frame.part.5+0x18e/0x490
[ 541.076594] ? unwind_dump+0x290/0x290
[ 541.077368] ? __show_regs+0x2c4/0x330
[ 541.078142] __unwind_start+0x106/0x190
[ 541.085422] __save_stack_trace+0x5e/0x100
[ 541.086268] ? __save_stack_trace+0x5e/0x100
[ 541.087161] ? unlink_anon_vmas+0xba/0x2c0
[ 541.087997] save_stack_trace+0x1f/0x30
[ 541.088782] save_stack+0x46/0xd0
[ 541.089475] ? __alloc_pages_slowpath+0x1420/0x1420
[ 541.090477] ? flush_tlb_mm_range+0x15e/0x220
[ 541.091364] ? __dec_node_state+0x24/0xb0
[ 541.092180] ? lock_page_memcg+0x85/0xf0
[ 541.092979] ? unlock_page_memcg+0x16/0x80
[ 541.093812] ? page_remove_rmap+0x198/0x520
[ 541.094674] ? mark_page_accessed+0x133/0x200
[ 541.095559] ? _cond_resched+0x1a/0x50
[ 541.096326] ? unmap_page_range+0xcd4/0xe50
[ 541.097179] ? rb_next+0x58/0x80
[ 541.097845] ? rb_next+0x58/0x80
[ 541.098518] __kasan_slab_free+0x13c/0x1a0
[ 541.099352] ? unlink_anon_vmas+0xba/0x2c0
[ 541.100184] kasan_slab_free+0xe/0x10
[ 541.100934] kmem_cache_free+0x89/0x1e0
[ 541.101724] unlink_anon_vmas+0xba/0x2c0
[ 541.102534] free_pgtables+0x101/0x1b0
[ 541.103299] exit_mmap+0x146/0x2a0
[ 541.103996] ? __ia32_sys_munmap+0x50/0x50
[ 541.104829] ? kasan_check_read+0x11/0x20
[ 541.105649] ? mm_update_next_owner+0x322/0x380
[ 541.106578] mmput+0x8b/0x1d0
[ 541.107191] do_exit+0x43a/0x1390
[ 541.107876] ? mm_update_next_owner+0x380/0x380
[ 541.108791] ? deactivate_super+0x5e/0x80
[ 541.109610] ? cleanup_mnt+0x61/0xa0
[ 541.110351] ? __cleanup_mnt+0x12/0x20
[ 541.111115] ? task_work_run+0xc8/0xf0
[ 541.111879] ? exit_to_usermode_loop+0x125/0x130
[ 541.112817] rewind_stack_do_exit+0x17/0x20
[ 541.113666] RIP: 0033:0x7f46624bf487
[ 541.114404] Code: Bad RIP value.
[ 541.115094] RSP: 002b:00007fff5e12e9a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 541.116605] RAX: 0000000000000000 RBX: 0000000001753030 RCX: 00007f46624bf487
[ 541.118034] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000175a1e0
[ 541.119472] RBP: 000000000175a1e0 R08: 0000000000000000 R09: 0000000000000014
[ 541.120890] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f46629c883c
[ 541.122321] R13: 0000000000000000 R14: 0000000001753210 R15: 00007fff5e12ec30
[ 541.124061] The buggy address belongs to the page:
[ 541.125042] page:ffffea00078d29c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 541.126651] flags: 0x2ffff0000000000()
[ 541.127418] raw: 02ffff0000000000 dead000000000100 dead000000000200 0000000000000000
[ 541.128963] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 541.130516] page dumped because: kasan: bad access detected
[ 541.131954] Memory state around the buggy address:
[ 541.132924] ffff8801e34a7800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
[ 541.134378] ffff8801e34a7880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 541.135814] >ffff8801e34a7900: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
[ 541.137253] ^
[ 541.138637] ffff8801e34a7980: f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 541.140075] ffff8801e34a7a00: 00 00 00 00 00 00 00 00 f3 00 00 00 00 00 00 00
[ 541.141509] ==================================================================
- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/inode.c#L512
BUG_ON(inode->i_data.nrpages);
The root cause is root directory inode is corrupted, it has both
inline_data and inline_dentry flag, and its nlink is zero, so in
->evict(), after dropping all page cache, it grabs page #0 for inline
data truncation, result in panic in later clear_inode() where we will
check inode->i_data.nrpages value.
This patch adds inline flags check in sanity_check_inode, in addition,
do sanity check with root inode's nlink.
Reported-by Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs recovery flow is relying on dnode block link list, it means fsynced
file recovery depends on previous dnode's persistence in the list, so
during fsync() we should wait on all regular inode's dnode writebacked
before issuing flush.
By this way, we can avoid dnode block list being broken by out-of-order
IO submission due to IO scheduler or driver.
Sheng Yong helps to do the test with this patch:
Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8
1 / PERSIST / Index
Base:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 867.82 204.15 41440.03 41370.54 680.8 1025.94 1031.08
2 871.87 205.87 41370.3 40275.2 791.14 1065.84 1101.7
3 866.52 205.69 41795.67 40596.16 694.69 1037.16 1031.48
Avg 868.7366667 205.2366667 41535.33333 40747.3 722.21 1042.98 1054.753333
After:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 798.81 202.5 41143 40613.87 602.71 838.08 913.83
2 805.79 206.47 40297.2 41291.46 604.44 840.75 924.27
3 814.83 206.17 41209.57 40453.62 602.85 834.66 927.91
Avg 806.4766667 205.0466667 40883.25667 40786.31667 603.3333333 837.83 922.0033333
Patched/Original:
0.928332713 0.999074239 0.984300676 1.000957528 0.835398753 0.803303994 0.874141189
It looks like atomic write will suffer performance regression.
I suspect that the criminal is that we forcing to wait all dnode being in
storage cache before we issue PREFLUSH+FUA.
BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
cause the problem: we will lose data of last transaction after SPO, even if
atomic write return no error:
- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
- writeback data: P1, P2, P3;
- writeback node: N1, N2, N3; <--- If N1, N2 is not writebacked, N3 with fsync_mark is
writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
last transaction.
- preflush + fua;
- power-cut
If we don't wait dnode writeback for atomic_write:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 779.91 206.03 41621.5 40333.16 716.9 1038.21 1034.85
2 848.51 204.35 40082.44 39486.17 791.83 1119.96 1083.77
3 772.12 206.27 41335.25 41599.65 723.29 1055.07 971.92
Avg 800.18 205.55 41013.06333 40472.99333 744.0066667 1071.08 1030.18
Patched/Original:
0.92108464 1.001526693 0.987425886 0.993268102 1.030180511 1.026942031 0.976702294
SQLite's performance recovers.
Jaegeuk:
"Practically, I don't see db corruption becase of this. We can excuse to lose
the last transaction."
Finally, we decide to keep original implementation of atomic write interface
sematics that we don't wait all dnode writeback before preflush+fua submission.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
After fuzzing, cp_pack_start_sum could be corrupted, so current log's
summary info should be wrong due to loading incorrect summary block.
Then, if segment's type in current log is exceeded NR_CURSEG_TYPE, it
can lead accessing invalid dirty_i->dirty_segmap bitmap finally.
Add sanity check for cp_pack_start_sum to fix this issue.
https://bugzilla.kernel.org/show_bug.cgi?id=200419
- Reproduce
- Kernel message (f2fs-dev w/ KASAN)
[ 3117.578432] F2FS-fs (loop0): Invalid log blocks per segment (8)
[ 3117.578445] F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock
[ 3117.581364] F2FS-fs (loop0): invalid crc_offset: 30716
[ 3117.583564] WARNING: CPU: 1 PID: 1225 at fs/f2fs/checkpoint.c:90 __get_meta_page+0x448/0x4b0
[ 3117.583570] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy
[ 3117.584014] CPU: 1 PID: 1225 Comm: mount Not tainted 4.17.0+ #1
[ 3117.584017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3117.584022] RIP: 0010:__get_meta_page+0x448/0x4b0
[ 3117.584023] Code: 00 49 8d bc 24 84 00 00 00 e8 74 54 da ff 41 83 8c 24 84 00 00 00 08 4c 89 f6 4c 89 ef e8 c0 d9 95 00 48 89 ef e8 18 e3 00 00 <0f> 0b f0 80 4d 48 04 e9 0f fe ff ff 0f 0b 48 89 c7 48 89 04 24 e8
[ 3117.584072] RSP: 0018:ffff88018eb678c0 EFLAGS: 00010286
[ 3117.584082] RAX: ffff88018f0a6a78 RBX: ffffea0007a46600 RCX: ffffffff9314d1b2
[ 3117.584085] RDX: ffffffff00000001 RSI: 0000000000000000 RDI: ffff88018f0a6a98
[ 3117.584087] RBP: ffff88018ebe9980 R08: 0000000000000002 R09: 0000000000000001
[ 3117.584090] R10: 0000000000000001 R11: ffffed00326e4450 R12: ffff880193722200
[ 3117.584092] R13: ffff88018ebe9afc R14: 0000000000000206 R15: ffff88018eb67900
[ 3117.584096] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 3117.584098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3117.584101] CR2: 00000000016f21b8 CR3: 0000000191c22000 CR4: 00000000000006e0
[ 3117.584112] Call Trace:
[ 3117.584121] ? f2fs_set_meta_page_dirty+0x150/0x150
[ 3117.584127] ? f2fs_build_segment_manager+0xbf9/0x3190
[ 3117.584133] ? f2fs_npages_for_summary_flush+0x75/0x120
[ 3117.584145] f2fs_build_segment_manager+0xda8/0x3190
[ 3117.584151] ? f2fs_get_valid_checkpoint+0x298/0xa00
[ 3117.584156] ? f2fs_flush_sit_entries+0x10e0/0x10e0
[ 3117.584184] ? map_id_range_down+0x17c/0x1b0
[ 3117.584188] ? __put_user_ns+0x30/0x30
[ 3117.584206] ? find_next_bit+0x53/0x90
[ 3117.584237] ? cpumask_next+0x16/0x20
[ 3117.584249] f2fs_fill_super+0x1948/0x2b40
[ 3117.584258] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.584279] ? sget_userns+0x65e/0x690
[ 3117.584296] ? set_blocksize+0x88/0x130
[ 3117.584302] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.584305] mount_bdev+0x1c0/0x200
[ 3117.584310] mount_fs+0x5c/0x190
[ 3117.584320] vfs_kern_mount+0x64/0x190
[ 3117.584330] do_mount+0x2e4/0x1450
[ 3117.584343] ? lockref_put_return+0x130/0x130
[ 3117.584347] ? copy_mount_string+0x20/0x20
[ 3117.584357] ? kasan_unpoison_shadow+0x31/0x40
[ 3117.584362] ? kasan_kmalloc+0xa6/0xd0
[ 3117.584373] ? memcg_kmem_put_cache+0x16/0x90
[ 3117.584377] ? __kmalloc_track_caller+0x196/0x210
[ 3117.584383] ? _copy_from_user+0x61/0x90
[ 3117.584396] ? memdup_user+0x3e/0x60
[ 3117.584401] ksys_mount+0x7e/0xd0
[ 3117.584405] __x64_sys_mount+0x62/0x70
[ 3117.584427] do_syscall_64+0x73/0x160
[ 3117.584440] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3117.584455] RIP: 0033:0x7f5693f14b9a
[ 3117.584456] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 3117.584505] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 3117.584510] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
[ 3117.584512] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
[ 3117.584514] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 3117.584516] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
[ 3117.584519] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003
[ 3117.584523] ---[ end trace a8e0d899985faf31 ]---
[ 3117.685663] F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=2, run fsck to fix.
[ 3117.685673] F2FS-fs (loop0): recover_data: ino = 2 (i_size: recover) recovered = 1, err = 0
[ 3117.685707] ==================================================================
[ 3117.685955] BUG: KASAN: slab-out-of-bounds in __remove_dirty_segment+0xdd/0x1e0
[ 3117.686175] Read of size 8 at addr ffff88018f0a63d0 by task mount/1225
[ 3117.686477] CPU: 0 PID: 1225 Comm: mount Tainted: G W 4.17.0+ #1
[ 3117.686481] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3117.686483] Call Trace:
[ 3117.686494] dump_stack+0x71/0xab
[ 3117.686512] print_address_description+0x6b/0x290
[ 3117.686517] kasan_report+0x28e/0x390
[ 3117.686522] ? __remove_dirty_segment+0xdd/0x1e0
[ 3117.686527] __remove_dirty_segment+0xdd/0x1e0
[ 3117.686532] locate_dirty_segment+0x189/0x190
[ 3117.686538] f2fs_allocate_new_segments+0xa9/0xe0
[ 3117.686543] recover_data+0x703/0x2c20
[ 3117.686547] ? f2fs_recover_fsync_data+0x48f/0xd50
[ 3117.686553] ? ksys_mount+0x7e/0xd0
[ 3117.686564] ? policy_nodemask+0x1a/0x90
[ 3117.686567] ? policy_node+0x56/0x70
[ 3117.686571] ? add_fsync_inode+0xf0/0xf0
[ 3117.686592] ? blk_finish_plug+0x44/0x60
[ 3117.686597] ? f2fs_ra_meta_pages+0x38b/0x5e0
[ 3117.686602] ? find_inode_fast+0xac/0xc0
[ 3117.686606] ? f2fs_is_valid_blkaddr+0x320/0x320
[ 3117.686618] ? __radix_tree_lookup+0x150/0x150
[ 3117.686633] ? dqget+0x670/0x670
[ 3117.686648] ? pagecache_get_page+0x29/0x410
[ 3117.686656] ? kmem_cache_alloc+0x176/0x1e0
[ 3117.686660] ? f2fs_is_valid_blkaddr+0x11d/0x320
[ 3117.686664] f2fs_recover_fsync_data+0xc23/0xd50
[ 3117.686670] ? f2fs_space_for_roll_forward+0x60/0x60
[ 3117.686674] ? rb_insert_color+0x323/0x3d0
[ 3117.686678] ? f2fs_recover_orphan_inodes+0xa5/0x700
[ 3117.686683] ? proc_register+0x153/0x1d0
[ 3117.686686] ? f2fs_remove_orphan_inode+0x10/0x10
[ 3117.686695] ? f2fs_attr_store+0x50/0x50
[ 3117.686700] ? proc_create_single_data+0x52/0x60
[ 3117.686707] f2fs_fill_super+0x1d06/0x2b40
[ 3117.686728] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.686735] ? sget_userns+0x65e/0x690
[ 3117.686740] ? set_blocksize+0x88/0x130
[ 3117.686745] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.686748] mount_bdev+0x1c0/0x200
[ 3117.686753] mount_fs+0x5c/0x190
[ 3117.686758] vfs_kern_mount+0x64/0x190
[ 3117.686762] do_mount+0x2e4/0x1450
[ 3117.686769] ? lockref_put_return+0x130/0x130
[ 3117.686773] ? copy_mount_string+0x20/0x20
[ 3117.686777] ? kasan_unpoison_shadow+0x31/0x40
[ 3117.686780] ? kasan_kmalloc+0xa6/0xd0
[ 3117.686786] ? memcg_kmem_put_cache+0x16/0x90
[ 3117.686790] ? __kmalloc_track_caller+0x196/0x210
[ 3117.686795] ? _copy_from_user+0x61/0x90
[ 3117.686801] ? memdup_user+0x3e/0x60
[ 3117.686804] ksys_mount+0x7e/0xd0
[ 3117.686809] __x64_sys_mount+0x62/0x70
[ 3117.686816] do_syscall_64+0x73/0x160
[ 3117.686824] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3117.686829] RIP: 0033:0x7f5693f14b9a
[ 3117.686830] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 3117.686887] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 3117.686892] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
[ 3117.686894] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
[ 3117.686896] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 3117.686899] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
[ 3117.686901] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003
[ 3117.687005] Allocated by task 1225:
[ 3117.687152] kasan_kmalloc+0xa6/0xd0
[ 3117.687157] kmem_cache_alloc_trace+0xfd/0x200
[ 3117.687161] f2fs_build_segment_manager+0x2d09/0x3190
[ 3117.687165] f2fs_fill_super+0x1948/0x2b40
[ 3117.687168] mount_bdev+0x1c0/0x200
[ 3117.687171] mount_fs+0x5c/0x190
[ 3117.687174] vfs_kern_mount+0x64/0x190
[ 3117.687177] do_mount+0x2e4/0x1450
[ 3117.687180] ksys_mount+0x7e/0xd0
[ 3117.687182] __x64_sys_mount+0x62/0x70
[ 3117.687186] do_syscall_64+0x73/0x160
[ 3117.687190] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3117.687285] Freed by task 19:
[ 3117.687412] __kasan_slab_free+0x137/0x190
[ 3117.687416] kfree+0x8b/0x1b0
[ 3117.687460] ttm_bo_man_put_node+0x61/0x80 [ttm]
[ 3117.687476] ttm_bo_cleanup_refs+0x15f/0x250 [ttm]
[ 3117.687492] ttm_bo_delayed_delete+0x2f0/0x300 [ttm]
[ 3117.687507] ttm_bo_delayed_workqueue+0x17/0x50 [ttm]
[ 3117.687528] process_one_work+0x2f9/0x740
[ 3117.687531] worker_thread+0x78/0x6b0
[ 3117.687541] kthread+0x177/0x1c0
[ 3117.687545] ret_from_fork+0x35/0x40
[ 3117.687638] The buggy address belongs to the object at ffff88018f0a6300
which belongs to the cache kmalloc-192 of size 192
[ 3117.688014] The buggy address is located 16 bytes to the right of
192-byte region [ffff88018f0a6300, ffff88018f0a63c0)
[ 3117.688382] The buggy address belongs to the page:
[ 3117.688554] page:ffffea00063c2980 count:1 mapcount:0 mapping:ffff8801f3403180 index:0x0
[ 3117.688788] flags: 0x17fff8000000100(slab)
[ 3117.688944] raw: 017fff8000000100 ffffea00063c2840 0000000e0000000e ffff8801f3403180
[ 3117.689166] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
[ 3117.689386] page dumped because: kasan: bad access detected
[ 3117.689653] Memory state around the buggy address:
[ 3117.689816] ffff88018f0a6280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3117.690027] ffff88018f0a6300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3117.690239] >ffff88018f0a6380: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 3117.690448] ^
[ 3117.690644] ffff88018f0a6400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3117.690868] ffff88018f0a6480: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 3117.691077] ==================================================================
[ 3117.691290] Disabling lock debugging due to kernel taint
[ 3117.693893] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 3117.694120] PGD 80000001f01bc067 P4D 80000001f01bc067 PUD 1d9638067 PMD 0
[ 3117.694338] Oops: 0002 [#1] SMP KASAN PTI
[ 3117.694490] CPU: 1 PID: 1225 Comm: mount Tainted: G B W 4.17.0+ #1
[ 3117.694703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3117.695073] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0
[ 3117.695246] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7
[ 3117.695793] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292
[ 3117.695969] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000
[ 3117.696182] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297
[ 3117.696391] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb
[ 3117.696604] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019
[ 3117.696813] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0
[ 3117.697032] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 3117.697280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3117.702357] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0
[ 3117.707235] Call Trace:
[ 3117.712077] locate_dirty_segment+0x189/0x190
[ 3117.716891] f2fs_allocate_new_segments+0xa9/0xe0
[ 3117.721617] recover_data+0x703/0x2c20
[ 3117.726316] ? f2fs_recover_fsync_data+0x48f/0xd50
[ 3117.730957] ? ksys_mount+0x7e/0xd0
[ 3117.735573] ? policy_nodemask+0x1a/0x90
[ 3117.740198] ? policy_node+0x56/0x70
[ 3117.744829] ? add_fsync_inode+0xf0/0xf0
[ 3117.749487] ? blk_finish_plug+0x44/0x60
[ 3117.754152] ? f2fs_ra_meta_pages+0x38b/0x5e0
[ 3117.758831] ? find_inode_fast+0xac/0xc0
[ 3117.763448] ? f2fs_is_valid_blkaddr+0x320/0x320
[ 3117.768046] ? __radix_tree_lookup+0x150/0x150
[ 3117.772603] ? dqget+0x670/0x670
[ 3117.777159] ? pagecache_get_page+0x29/0x410
[ 3117.781648] ? kmem_cache_alloc+0x176/0x1e0
[ 3117.786067] ? f2fs_is_valid_blkaddr+0x11d/0x320
[ 3117.790476] f2fs_recover_fsync_data+0xc23/0xd50
[ 3117.794790] ? f2fs_space_for_roll_forward+0x60/0x60
[ 3117.799086] ? rb_insert_color+0x323/0x3d0
[ 3117.803304] ? f2fs_recover_orphan_inodes+0xa5/0x700
[ 3117.807563] ? proc_register+0x153/0x1d0
[ 3117.811766] ? f2fs_remove_orphan_inode+0x10/0x10
[ 3117.815947] ? f2fs_attr_store+0x50/0x50
[ 3117.820087] ? proc_create_single_data+0x52/0x60
[ 3117.824262] f2fs_fill_super+0x1d06/0x2b40
[ 3117.828367] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.832432] ? sget_userns+0x65e/0x690
[ 3117.836500] ? set_blocksize+0x88/0x130
[ 3117.840501] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.844420] mount_bdev+0x1c0/0x200
[ 3117.848275] mount_fs+0x5c/0x190
[ 3117.852053] vfs_kern_mount+0x64/0x190
[ 3117.855810] do_mount+0x2e4/0x1450
[ 3117.859441] ? lockref_put_return+0x130/0x130
[ 3117.862996] ? copy_mount_string+0x20/0x20
[ 3117.866417] ? kasan_unpoison_shadow+0x31/0x40
[ 3117.869719] ? kasan_kmalloc+0xa6/0xd0
[ 3117.872948] ? memcg_kmem_put_cache+0x16/0x90
[ 3117.876121] ? __kmalloc_track_caller+0x196/0x210
[ 3117.879333] ? _copy_from_user+0x61/0x90
[ 3117.882467] ? memdup_user+0x3e/0x60
[ 3117.885604] ksys_mount+0x7e/0xd0
[ 3117.888700] __x64_sys_mount+0x62/0x70
[ 3117.891742] do_syscall_64+0x73/0x160
[ 3117.894692] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3117.897669] RIP: 0033:0x7f5693f14b9a
[ 3117.900563] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 3117.906922] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 3117.910159] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
[ 3117.913469] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
[ 3117.916764] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 3117.920071] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
[ 3117.923393] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003
[ 3117.926680] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy
[ 3117.949979] CR2: 0000000000000000
[ 3117.954283] ---[ end trace a8e0d899985faf32 ]---
[ 3117.958575] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0
[ 3117.962810] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7
[ 3117.971789] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292
[ 3117.976333] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000
[ 3117.980926] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297
[ 3117.985497] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb
[ 3117.990098] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019
[ 3117.994761] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0
[ 3117.999392] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 3118.004096] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3118.008816] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0
- Location
https://elixir.bootlin.com/linux/v4.18-rc3/source/fs/f2fs/segment.c#L775
if (test_and_clear_bit(segno, dirty_i->dirty_segmap[t]))
dirty_i->nr_dirty[t]--;
Here dirty_i->dirty_segmap[t] can be NULL which leads to crash in test_and_clear_bit()
Reported-by Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Like quota_ino feature, we need to reject mounting RDWR with image
which enables project_quota feature when there is no CONFIG_QUOTA
be set in kernel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If quota feature is enabled, quota is on by default. However, if
CONFIG_QUOTA is not built in kernel, dquot entries will not get updated,
which leads to quota inconsistency.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
According to fs/quota/dquot.c, `dq_data_lock' protects mem_dqinfo
structures and modifications of dquot pointers in the inode, and
`dquot->dq_dqb_lock' protects data from dq_dqb.
We should use dquot->dq_dqb_lock in statfs_project instead of
dq_dat_lock.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs is focused on flash based storage, so let's enable real-time
discard by default, if user don't want to enable it, 'nodiscard'
mount option should be used on mount.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch shows the fsync_mode=nobarrier mount option in
f2fs_show_options().
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Non-prefix global name 'fault_name' will pollute global
namespace, fix it.
Refer to:
https://lists.01.org/pipermail/kbuild-all/2018-June/049660.html
To: Jaegeuk Kim <jaegeuk@kernel.org>
To: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch fixs to do sanity check with user_block_count.
- Overview
Divide zero in utilization when mount() a corrupted f2fs image
- Reproduce (4.18 upstream kernel)
- Kernel message
[ 564.099503] F2FS-fs (loop0): invalid crc value
[ 564.101991] divide error: 0000 [#1] SMP KASAN PTI
[ 564.103103] CPU: 1 PID: 1298 Comm: f2fs_discard-7: Not tainted 4.18.0-rc1+ #4
[ 564.104584] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 564.106624] RIP: 0010:issue_discard_thread+0x248/0x5c0
[ 564.107692] Code: ff ff 48 8b bd e8 fe ff ff 41 8b 9d 4c 04 00 00 e8 cd b8 ad ff 41 8b 85 50 04 00 00 31 d2 48 8d 04 80 48 8d 04 80 48 c1 e0 02 <48> f7 f3 83 f8 50 7e 16 41 c7 86 7c ff ff ff 01 00 00 00 41 c7 86
[ 564.111686] RSP: 0018:ffff8801f3117dc0 EFLAGS: 00010206
[ 564.112775] RAX: 0000000000000384 RBX: 0000000000000000 RCX: ffffffffb88c1e03
[ 564.114250] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e3aa4850
[ 564.115706] RBP: ffff8801f3117f00 R08: 1ffffffff751a1d0 R09: fffffbfff751a1d0
[ 564.117177] R10: 0000000000000001 R11: fffffbfff751a1d0 R12: 00000000fffffffc
[ 564.118634] R13: ffff8801e3aa4400 R14: ffff8801f3117ed8 R15: ffff8801e2050000
[ 564.120094] FS: 0000000000000000(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 564.121748] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 564.122923] CR2: 000000000202b078 CR3: 00000001f11ac000 CR4: 00000000000006e0
[ 564.124383] Call Trace:
[ 564.124924] ? __issue_discard_cmd+0x480/0x480
[ 564.125882] ? __sched_text_start+0x8/0x8
[ 564.126756] ? __kthread_parkme+0xcb/0x100
[ 564.127620] ? kthread_blkcg+0x70/0x70
[ 564.128412] kthread+0x180/0x1d0
[ 564.129105] ? __issue_discard_cmd+0x480/0x480
[ 564.130029] ? kthread_associate_blkcg+0x150/0x150
[ 564.131033] ret_from_fork+0x35/0x40
[ 564.131794] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 564.141798] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 564.142773] RIP: 0010:issue_discard_thread+0x248/0x5c0
[ 564.143885] Code: ff ff 48 8b bd e8 fe ff ff 41 8b 9d 4c 04 00 00 e8 cd b8 ad ff 41 8b 85 50 04 00 00 31 d2 48 8d 04 80 48 8d 04 80 48 c1 e0 02 <48> f7 f3 83 f8 50 7e 16 41 c7 86 7c ff ff ff 01 00 00 00 41 c7 86
[ 564.147776] RSP: 0018:ffff8801f3117dc0 EFLAGS: 00010206
[ 564.148856] RAX: 0000000000000384 RBX: 0000000000000000 RCX: ffffffffb88c1e03
[ 564.150424] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e3aa4850
[ 564.151906] RBP: ffff8801f3117f00 R08: 1ffffffff751a1d0 R09: fffffbfff751a1d0
[ 564.153463] R10: 0000000000000001 R11: fffffbfff751a1d0 R12: 00000000fffffffc
[ 564.154915] R13: ffff8801e3aa4400 R14: ffff8801f3117ed8 R15: ffff8801e2050000
[ 564.156405] FS: 0000000000000000(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 564.158070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 564.159279] CR2: 000000000202b078 CR3: 00000001f11ac000 CR4: 00000000000006e0
[ 564.161043] ==================================================================
[ 564.162587] BUG: KASAN: stack-out-of-bounds in from_kuid_munged+0x1d/0x50
[ 564.163994] Read of size 4 at addr ffff8801f3117c84 by task f2fs_discard-7:/1298
[ 564.165852] CPU: 1 PID: 1298 Comm: f2fs_discard-7: Tainted: G D 4.18.0-rc1+ #4
[ 564.167593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 564.169522] Call Trace:
[ 564.170057] dump_stack+0x7b/0xb5
[ 564.170778] print_address_description+0x70/0x290
[ 564.171765] kasan_report+0x291/0x390
[ 564.172540] ? from_kuid_munged+0x1d/0x50
[ 564.173408] __asan_load4+0x78/0x80
[ 564.174148] from_kuid_munged+0x1d/0x50
[ 564.174962] do_notify_parent+0x1f5/0x4f0
[ 564.175808] ? send_sigqueue+0x390/0x390
[ 564.176639] ? css_set_move_task+0x152/0x340
[ 564.184197] do_exit+0x1290/0x1390
[ 564.184950] ? __issue_discard_cmd+0x480/0x480
[ 564.185884] ? mm_update_next_owner+0x380/0x380
[ 564.186829] ? __sched_text_start+0x8/0x8
[ 564.187672] ? __kthread_parkme+0xcb/0x100
[ 564.188528] ? kthread_blkcg+0x70/0x70
[ 564.189333] ? kthread+0x180/0x1d0
[ 564.190052] ? __issue_discard_cmd+0x480/0x480
[ 564.190983] rewind_stack_do_exit+0x17/0x20
[ 564.192190] The buggy address belongs to the page:
[ 564.193213] page:ffffea0007cc45c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 564.194856] flags: 0x2ffff0000000000()
[ 564.195644] raw: 02ffff0000000000 0000000000000000 dead000000000200 0000000000000000
[ 564.197247] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 564.198826] page dumped because: kasan: bad access detected
[ 564.200299] Memory state around the buggy address:
[ 564.201306] ffff8801f3117b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 564.202779] ffff8801f3117c00: 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3
[ 564.204252] >ffff8801f3117c80: f3 f3 f3 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 564.205742] ^
[ 564.206424] ffff8801f3117d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 564.207908] ffff8801f3117d80: f3 f3 f3 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00
[ 564.209389] ==================================================================
[ 564.231795] F2FS-fs (loop0): Mounted with checkpoint version = 2
- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L586
return div_u64((u64)valid_user_blocks(sbi) * 100,
sbi->user_block_count);
Missing checks on sbi->user_block_count.
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds to do sanity check with {sit,nat}_ver_bitmap_bytesize
during mount, in order to avoid accessing across cache boundary with
this abnormal bitmap size.
- Overview
buffer overrun in build_sit_info() when mounting a crafted f2fs image
- Reproduce
- Kernel message
[ 548.580867] F2FS-fs (loop0): Invalid log blocks per segment (8201)
[ 548.580877] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
[ 548.584979] ==================================================================
[ 548.586568] BUG: KASAN: use-after-free in kmemdup+0x36/0x50
[ 548.587715] Read of size 64 at addr ffff8801e9c265ff by task mount/1295
[ 548.589428] CPU: 1 PID: 1295 Comm: mount Not tainted 4.18.0-rc1+ #4
[ 548.589432] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 548.589438] Call Trace:
[ 548.589474] dump_stack+0x7b/0xb5
[ 548.589487] print_address_description+0x70/0x290
[ 548.589492] kasan_report+0x291/0x390
[ 548.589496] ? kmemdup+0x36/0x50
[ 548.589509] check_memory_region+0x139/0x190
[ 548.589514] memcpy+0x23/0x50
[ 548.589518] kmemdup+0x36/0x50
[ 548.589545] f2fs_build_segment_manager+0x8fa/0x3410
[ 548.589551] ? __asan_loadN+0xf/0x20
[ 548.589560] ? f2fs_sanity_check_ckpt+0x1be/0x240
[ 548.589566] ? f2fs_flush_sit_entries+0x10c0/0x10c0
[ 548.589587] ? __put_user_ns+0x40/0x40
[ 548.589604] ? find_next_bit+0x57/0x90
[ 548.589610] f2fs_fill_super+0x194b/0x2b40
[ 548.589617] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.589637] ? set_blocksize+0x90/0x140
[ 548.589651] mount_bdev+0x1c5/0x210
[ 548.589655] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.589667] f2fs_mount+0x15/0x20
[ 548.589672] mount_fs+0x60/0x1a0
[ 548.589683] ? alloc_vfsmnt+0x309/0x360
[ 548.589688] vfs_kern_mount+0x6b/0x1a0
[ 548.589699] do_mount+0x34a/0x18c0
[ 548.589710] ? lockref_put_or_lock+0xcf/0x160
[ 548.589716] ? copy_mount_string+0x20/0x20
[ 548.589728] ? memcg_kmem_put_cache+0x1b/0xa0
[ 548.589734] ? kasan_check_write+0x14/0x20
[ 548.589740] ? _copy_from_user+0x6a/0x90
[ 548.589744] ? memdup_user+0x42/0x60
[ 548.589750] ksys_mount+0x83/0xd0
[ 548.589755] __x64_sys_mount+0x67/0x80
[ 548.589781] do_syscall_64+0x78/0x170
[ 548.589797] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 548.589820] RIP: 0033:0x7f76fc331b9a
[ 548.589821] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 548.589880] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 548.589890] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
[ 548.589892] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
[ 548.589895] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 548.589897] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
[ 548.589900] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003
[ 548.590242] The buggy address belongs to the page:
[ 548.591243] page:ffffea0007a70980 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 548.592886] flags: 0x2ffff0000000000()
[ 548.593665] raw: 02ffff0000000000 dead000000000100 dead000000000200 0000000000000000
[ 548.595258] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 548.603713] page dumped because: kasan: bad access detected
[ 548.605203] Memory state around the buggy address:
[ 548.606198] ffff8801e9c26480: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.607676] ffff8801e9c26500: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.609157] >ffff8801e9c26580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.610629] ^
[ 548.612088] ffff8801e9c26600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.613674] ffff8801e9c26680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.615141] ==================================================================
[ 548.616613] Disabling lock debugging due to kernel taint
[ 548.622871] WARNING: CPU: 1 PID: 1295 at mm/page_alloc.c:4065 __alloc_pages_slowpath+0xe4a/0x1420
[ 548.622878] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 548.623217] CPU: 1 PID: 1295 Comm: mount Tainted: G B 4.18.0-rc1+ #4
[ 548.623219] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 548.623226] RIP: 0010:__alloc_pages_slowpath+0xe4a/0x1420
[ 548.623227] Code: ff ff 01 89 85 c8 fe ff ff e9 91 fc ff ff 41 89 c5 e9 5c fc ff ff 0f 0b 89 f8 25 ff ff f7 ff 89 85 8c fe ff ff e9 d5 f2 ff ff <0f> 0b e9 65 f2 ff ff 65 8b 05 38 81 d2 47 f6 c4 01 74 1c 65 48 8b
[ 548.623281] RSP: 0018:ffff8801f28c7678 EFLAGS: 00010246
[ 548.623284] RAX: 0000000000000000 RBX: 00000000006040c0 RCX: ffffffffb82f73b7
[ 548.623287] RDX: 1ffff1003e518eeb RSI: 000000000000000c RDI: 0000000000000000
[ 548.623290] RBP: ffff8801f28c7880 R08: 0000000000000000 R09: ffffed0047fff2c5
[ 548.623292] R10: 0000000000000001 R11: ffffed0047fff2c4 R12: ffff8801e88de040
[ 548.623295] R13: 00000000006040c0 R14: 000000000000000c R15: ffff8801f28c7938
[ 548.623299] FS: 00007f76fca51840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 548.623302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 548.623304] CR2: 00007f19b9171760 CR3: 00000001ed952000 CR4: 00000000000006e0
[ 548.623317] Call Trace:
[ 548.623325] ? kasan_check_read+0x11/0x20
[ 548.623330] ? __zone_watermark_ok+0x92/0x240
[ 548.623336] ? get_page_from_freelist+0x1c3/0x1d90
[ 548.623347] ? _raw_spin_lock_irqsave+0x2a/0x60
[ 548.623353] ? warn_alloc+0x250/0x250
[ 548.623358] ? save_stack+0x46/0xd0
[ 548.623361] ? kasan_kmalloc+0xad/0xe0
[ 548.623366] ? __isolate_free_page+0x2a0/0x2a0
[ 548.623370] ? mount_fs+0x60/0x1a0
[ 548.623374] ? vfs_kern_mount+0x6b/0x1a0
[ 548.623378] ? do_mount+0x34a/0x18c0
[ 548.623383] ? ksys_mount+0x83/0xd0
[ 548.623387] ? __x64_sys_mount+0x67/0x80
[ 548.623391] ? do_syscall_64+0x78/0x170
[ 548.623396] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 548.623401] __alloc_pages_nodemask+0x3c5/0x400
[ 548.623407] ? __alloc_pages_slowpath+0x1420/0x1420
[ 548.623412] ? __mutex_lock_slowpath+0x20/0x20
[ 548.623417] ? kvmalloc_node+0x31/0x80
[ 548.623424] alloc_pages_current+0x75/0x110
[ 548.623436] kmalloc_order+0x24/0x60
[ 548.623442] kmalloc_order_trace+0x24/0xb0
[ 548.623448] __kmalloc_track_caller+0x207/0x220
[ 548.623455] ? f2fs_build_node_manager+0x399/0xbb0
[ 548.623460] kmemdup+0x20/0x50
[ 548.623465] f2fs_build_node_manager+0x399/0xbb0
[ 548.623470] f2fs_fill_super+0x195e/0x2b40
[ 548.623477] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.623481] ? set_blocksize+0x90/0x140
[ 548.623486] mount_bdev+0x1c5/0x210
[ 548.623489] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.623495] f2fs_mount+0x15/0x20
[ 548.623498] mount_fs+0x60/0x1a0
[ 548.623503] ? alloc_vfsmnt+0x309/0x360
[ 548.623508] vfs_kern_mount+0x6b/0x1a0
[ 548.623513] do_mount+0x34a/0x18c0
[ 548.623518] ? lockref_put_or_lock+0xcf/0x160
[ 548.623523] ? copy_mount_string+0x20/0x20
[ 548.623528] ? memcg_kmem_put_cache+0x1b/0xa0
[ 548.623533] ? kasan_check_write+0x14/0x20
[ 548.623537] ? _copy_from_user+0x6a/0x90
[ 548.623542] ? memdup_user+0x42/0x60
[ 548.623547] ksys_mount+0x83/0xd0
[ 548.623552] __x64_sys_mount+0x67/0x80
[ 548.623557] do_syscall_64+0x78/0x170
[ 548.623562] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 548.623566] RIP: 0033:0x7f76fc331b9a
[ 548.623567] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 548.623632] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 548.623636] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
[ 548.623639] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
[ 548.623641] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 548.623643] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
[ 548.623646] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003
[ 548.623650] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 548.623656] F2FS-fs (loop0): Failed to initialize F2FS node manager
[ 548.627936] F2FS-fs (loop0): Invalid log blocks per segment (8201)
[ 548.627940] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
[ 548.635835] F2FS-fs (loop0): Failed to initialize F2FS node manager
- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.c#L3578
sit_i->sit_bitmap = kmemdup(src_bitmap, bitmap_size, GFP_KERNEL);
Buffer overrun happens when doing memcpy. I suspect there is missing (inconsistent) checks on bitmap_size.
Reported by Wen Xu (wen.xu@gatech.edu) from SSLab, Gatech.
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As Wen Xu reported in below link:
https://bugzilla.kernel.org/show_bug.cgi?id=200183
- Overview
Divide zero in reset_curseg() when mounting a crafted f2fs image
- Reproduce
- Kernel message
[ 588.281510] divide error: 0000 [#1] SMP KASAN PTI
[ 588.282701] CPU: 0 PID: 1293 Comm: mount Not tainted 4.18.0-rc1+ #4
[ 588.284000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 588.286178] RIP: 0010:reset_curseg+0x94/0x1a0
[ 588.298166] RSP: 0018:ffff8801e88d7940 EFLAGS: 00010246
[ 588.299360] RAX: 0000000000000014 RBX: ffff8801e1d46d00 RCX: ffffffffb88bf60b
[ 588.300809] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e1d46d64
[ 588.305272] R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000000
[ 588.306822] FS: 00007fad85008840(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[ 588.308456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 588.309623] CR2: 0000000001705078 CR3: 00000001f30f8000 CR4: 00000000000006f0
[ 588.311085] Call Trace:
[ 588.311637] f2fs_build_segment_manager+0x103f/0x3410
[ 588.316136] ? f2fs_commit_super+0x1b0/0x1b0
[ 588.317031] ? set_blocksize+0x90/0x140
[ 588.319473] f2fs_mount+0x15/0x20
[ 588.320166] mount_fs+0x60/0x1a0
[ 588.320847] ? alloc_vfsmnt+0x309/0x360
[ 588.321647] vfs_kern_mount+0x6b/0x1a0
[ 588.322432] do_mount+0x34a/0x18c0
[ 588.323175] ? strndup_user+0x46/0x70
[ 588.323937] ? copy_mount_string+0x20/0x20
[ 588.324793] ? memcg_kmem_put_cache+0x1b/0xa0
[ 588.325702] ? kasan_check_write+0x14/0x20
[ 588.326562] ? _copy_from_user+0x6a/0x90
[ 588.327375] ? memdup_user+0x42/0x60
[ 588.328118] ksys_mount+0x83/0xd0
[ 588.328808] __x64_sys_mount+0x67/0x80
[ 588.329607] do_syscall_64+0x78/0x170
[ 588.330400] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 588.331461] RIP: 0033:0x7fad848e8b9a
[ 588.336022] RSP: 002b:00007ffd7c5b6be8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 588.337547] RAX: ffffffffffffffda RBX: 00000000016f8030 RCX: 00007fad848e8b9a
[ 588.338999] RDX: 00000000016f8210 RSI: 00000000016f9f30 RDI: 0000000001700ec0
[ 588.340442] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 588.341887] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001700ec0
[ 588.343341] R13: 00000000016f8210 R14: 0000000000000000 R15: 0000000000000003
[ 588.354891] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 588.355862] RIP: 0010:reset_curseg+0x94/0x1a0
[ 588.360742] RSP: 0018:ffff8801e88d7940 EFLAGS: 00010246
[ 588.361812] RAX: 0000000000000014 RBX: ffff8801e1d46d00 RCX: ffffffffb88bf60b
[ 588.363485] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e1d46d64
[ 588.365213] RBP: ffff8801e88d7968 R08: ffffed003c32266f R09: ffffed003c32266f
[ 588.366661] R10: 0000000000000001 R11: ffffed003c32266e R12: ffff8801f0337700
[ 588.368110] R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000000
[ 588.370057] FS: 00007fad85008840(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[ 588.372099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 588.373291] CR2: 0000000001705078 CR3: 00000001f30f8000 CR4: 00000000000006f0
- Location
https://elixir.bootlin.com/linux/latest/source/fs/f2fs/segment.c#L2147
curseg->zone = GET_ZONE_FROM_SEG(sbi, curseg->segno);
If secs_per_zone is corrupted due to fuzzing test, it will cause divide
zero operation when using GET_ZONE_FROM_SEG macro, so we should do more
sanity check with secs_per_zone during mount to avoid this issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In fill_super, if root inode's attribute is incorrect, we need to
call f2fs_destroy_stats to release stats memory.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
readdir_ra is sysfs configuration instead of mount option, so it should
not be initialized in default_options(), otherwise after remount, it can
be reset to be enabled which may not as user wish, so let's move it to
f2fs_tuning_parameters().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Let default_options() initialize s_res{u,g}id with default value like
other options.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Once we shutdown f2fs, we have to flush stale pages in order to unmount
the system. In order to make stable, we need to stop fault injection as well.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Add defines for STAT_READ and STAT_WRITE for indexing the partition
stat entries. This clarifies some fs/ code which has hardcoded 1 for
STAT_WRITE and will make it easier to extend the stats with additional
fields.
tj: Refreshed on top of v4.17.
Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When unmounting f2fs in force mode, we can get it stuck by io_schedule()
by some pending IOs in meta_inode.
io_schedule+0xd/0x30
wait_on_page_bit_common+0xc6/0x130
__filemap_fdatawait_range+0xbd/0x100
filemap_fdatawait_keep_errors+0x15/0x40
sync_inodes_sb+0x1cf/0x240
sync_filesystem+0x52/0x90
generic_shutdown_super+0x1d/0x110
kill_f2fs_super+0x28/0x80 [f2fs]
deactivate_locked_super+0x35/0x60
cleanup_mnt+0x36/0x70
task_work_run+0x79/0xa0
exit_to_usermode_loop+0x62/0x70
do_syscall_64+0xdb/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
0xffffffffffffffff
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|