summaryrefslogtreecommitdiff
path: root/fs/f2fs/gc.c
AgeCommit message (Collapse)Author
2020-04-17f2fs: support read iostatChao Yu
Adds to support accounting read IOs from userspace/kernel. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: don't trigger data flush in foreground operationChao Yu
Data flush can generate heavy IO and cause long latency during flush, so it's not appropriate to trigger it in foreground operation. And also, we may face below potential deadlock during data flush: - f2fs_write_multi_pages - f2fs_write_raw_pages - f2fs_write_single_data_page - f2fs_balance_fs - f2fs_balance_fs_bg - f2fs_sync_dirty_inodes - filemap_fdatawrite -- stuck on flush same cluster Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-22f2fs: fix to update f2fs_super_block fields under sb_lockChao Yu
Fields in struct f2fs_super_block should be updated under coverage of sb_lock, fix to adjust update_sb_metadata() for that rule. Fixes: 04f0b2eaa3b3 ("f2fs: ioctl for removing a range from F2FS") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-22f2fs: Fix mount failure due to SPO after a successful online resize FSSahitya Tummala
Even though online resize is successfully done, a SPO immediately after resize, still causes below error in the next mount. [ 11.294650] F2FS-fs (sda8): Wrong user_block_count: 2233856 [ 11.300272] F2FS-fs (sda8): Failed to get valid F2FS checkpoint This is because after FS metadata is updated in update_fs_metadata() if the SBI_IS_DIRTY is not dirty, then CP will not be done to reflect the new user_block_count. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19f2fs: skip migration only when BG_GC is calledJaegeuk Kim
FG_GC needs to move entire section more quickly. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19f2fs: introduce DEFAULT_IO_TIMEOUTChao Yu
As Geert Uytterhoeven reported: for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50); On some platforms, HZ can be less than 50, then unexpected 0 timeout jiffies will be set in congestion_wait(). This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue. Quoted from Geert Uytterhoeven: "A timeout of HZ means 1 second. HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50. If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20), as that takes care of the special cases, and never returns 0." Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19f2fs: skip GC when section is fullJaegeuk Kim
This fixes skipping GC when segment is full in large section. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19f2fs: add migration count iff migration happensJaegeuk Kim
If first segment is empty and migration_granularity is 1, we can't move this at all. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19f2fs: clean up bggc mount optionChao Yu
There are three status for background gc: on, off and sync, it's a little bit confused to use test_opt(BG_GC) and test_opt(FORCE_FG_GC) combinations to indicate status of background gc. So let's remove F2FS_MOUNT_BG_GC and F2FS_MOUNT_FORCE_FG_GC mount options, and add F2FS_OPTION().bggc_mode with below three status to clean up codes and enhance bggc mode's scalability. enum { BGGC_MODE_ON, /* background gc is on */ BGGC_MODE_OFF, /* background gc is off */ BGGC_MODE_SYNC, /* * background gc is on, migrating blocks * like foreground gc */ }; Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19f2fs: clean up lfs/adaptive mount optionChao Yu
This patch removes F2FS_MOUNT_ADAPTIVE and F2FS_MOUNT_LFS mount options, and add F2FS_OPTION.fs_mode with below two status to indicate filesystem mode. enum { FS_MODE_ADAPTIVE, /* use both lfs/ssr allocation */ FS_MODE_LFS, /* use lfs allocation only */ }; It can enhance code readability and fs mode's scalability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19f2fs: clean up codes with {f2fs_,}data_blkaddr()Chao Yu
- rename datablock_addr() to data_blkaddr(). - wrap data_blkaddr() with f2fs_data_blkaddr() to clean up parameters. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-10f2fs: fix inconsistent commentsChao Yu
Lack of maintenance on comments may mislead developers, fix them. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-23f2fs: Add f2fs stats to sysfsHridya Valsaraju
Currently f2fs stats are only available from /d/f2fs/status. This patch adds some of the f2fs stats to sysfs so that they are accessible even when debugfs is not mounted. The following sysfs nodes are added: -/sys/fs/f2fs/<disk>/free_segments -/sys/fs/f2fs/<disk>/cp_foreground_calls -/sys/fs/f2fs/<disk>/cp_background_calls -/sys/fs/f2fs/<disk>/gc_foreground_calls -/sys/fs/f2fs/<disk>/gc_background_calls -/sys/fs/f2fs/<disk>/moved_blocks_foreground -/sys/fs/f2fs/<disk>/moved_blocks_background -/sys/fs/f2fs/<disk>/avg_vblocks Signed-off-by: Hridya Valsaraju <hridya@google.com> [Jaegeuk Kim: allow STAT_FS without DEBUG_FS] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17f2fs: change to use rwsem for gc_mutexChao Yu
Mutex lock won't serialize callers, in order to avoid starving of unlucky caller, let's use rwsem lock instead. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17f2fs: run fsck when getting bad inode during GCJaegeuk Kim
This is to avoid inifinite GC when trying to disable checkpoint. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-25f2fs: stop GC when the victim becomes fully validJaegeuk Kim
We must stop GC, once the segment becomes fully valid. Otherwise, it can produce another dirty segments by moving valid blocks in the segment partially. Ramon hit no free segment panic sometimes and saw this case happens when validating reliable file pinning feature. Signed-off-by: Ramon Pantin <pantin@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-19f2fs: show f2fs instance in printk_ratelimitedChao Yu
As Eric mentioned, bare printk{,_ratelimited} won't show which filesystem instance these message is coming from, this patch tries to show fs instance with sb->s_id field in all places we missed before. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-10-22f2fs: update multi-dev metadata in resize_fsQiuyang Sun
Multi-device metadata should be updated in resize_fs as well. Also, we check that the new FS size still reaches the last device. Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-09f2fs: Fix indefinite loop in f2fs_gc()Sahitya Tummala
Policy - foreground GC, LFS mode and greedy GC mode. Under this policy, f2fs_gc() loops forever to GC as it doesn't have enough free segements to proceed and thus it keeps calling gc_more for the same victim segment. This can happen if the selected victim segment could not be GC'd due to failed blkaddr validity check i.e. is_alive() returns false for the blocks set in current validity map. Fix this by not resetting the sbi->cur_victim_sec to NULL_SEGNO, when the segment selected could not be GC'd. This helps to select another segment for GC and thus helps to proceed forward with GC. [Note] This can happen due to is_alive as well as atomic_file which skipps GC. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-08-23f2fs: Fix indefinite loop in f2fs_gc()Sahitya Tummala
Policy - Foreground GC, LFS and greedy GC mode. Under this policy, f2fs_gc() loops forever to GC as it doesn't have enough free segements to proceed and thus it keeps calling gc_more for the same victim segment. This can happen if the selected victim segment could not be GC'd due to failed blkaddr validity check i.e. is_alive() returns false for the blocks set in current validity map. Fix this by keeping track of such invalid segments and skip those segments for selection in get_victim_by_default() to avoid endless GC loop under such error scenarios. Currently, add this logic under CONFIG_F2FS_CHECK_FS to be able to root cause the issue in debug version. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix wrong bitmap size] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-26f2fs: fix to read source block before invalidating itJaegeuk Kim
f2fs_allocate_data_block() invalidates old block address and enable new block address. Then, if we try to read old block by f2fs_submit_page_bio(), it will give WARN due to reading invalid blocks. Let's make the order sanely back. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: use generic EFSBADCRC/EFSCORRUPTEDChao Yu
f2fs uses EFAULT as error number to indicate filesystem is corrupted all the time, but generic filesystems use EUCLEAN for such condition, we need to change to follow others. This patch adds two new macros as below to wrap more generic error code macros, and spread them in code. EFSBADCRC EBADMSG /* Bad CRC detected */ EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */ Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chao Yu <yuchao0@huawei.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: introduce f2fs_<level> macros to wrap f2fs_printk()Joe Perches
- Add and use f2fs_<level> macros - Convert f2fs_msg to f2fs_printk - Remove level from f2fs_printk and embed the level in the format - Coalesce formats and align multi-line arguments - Remove unnecessary duplicate extern f2fs_msg f2fs.h Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: ioctl for removing a range from F2FSQiuyang Sun
This ioctl shrinks a given length (aligned to sections) from end of the main area. Any cursegs and valid blocks will be moved out before invalidating the range. This feature can be used for adjusting partition sizes online. History of the patch: Sahitya Tummala: - Add this ioctl for f2fs_compat_ioctl() as well. - Fix debugfs status to reflect the online resize changes. - Fix potential race between online resize path and allocate new data block path or gc path. Others: - Rename some identifiers. - Add some error handling branches. - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range. - Implement this interface as ext4's, and change the parameter from shrunk bytes to new block count of F2FS. - During resizing, force to empty sit_journal and forbid adding new entries to it, in order to avoid invalid segno in journal after resize. - Reduce sbi->user_block_count before resize starts. - Commit the updated superblock first, and then update in-memory metadata only when the former succeeds. - Target block count must align to sections. - Write checkpoint before and after committing the new superblock, w/o CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if resize fails after the new superblock is committed. - In free_segment_range(), reduce granularity of gc_mutex. - Add protection on curseg migration. - Add freeze_bdev() and thaw_bdev() for resize fs. - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation. - Recover super_block and FS metadata when resize fails. - No need to clear CP_FSCK_FLAG in update_ckpt_flags(). - Clean up the sb and fs metadata update functions for resize_fs. Geert Uytterhoeven: - Use div_u64*() for 64-bit divisions Arnd Bergmann: - Not all architectures support get_user() with a 64-bit argument: ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined! Use copy_from_user() here, this will always work. Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-23f2fs: allow ssr block allocation during checkpoint=disable periodJaegeuk Kim
This patch allows to use ssr during checkpoint is disabled. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: introduce DATA_GENERIC_ENHANCEChao Yu
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check whether @blkaddr locates in main area or not. That check is weak, since the block address in range of main area can point to the address which is not valid in segment info table, and we can not detect such condition, we may suffer worse corruption as system continues running. So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check which trigger SIT bitmap check rather than only range check. This patch did below changes as wel: - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr(). - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid. - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check. - spread blkaddr check in: * f2fs_get_node_info() * __read_out_blkaddrs() * f2fs_submit_page_read() * ra_data_block() * do_recover_data() This patch can fix bug reported from bugzilla below: https://bugzilla.kernel.org/show_bug.cgi?id=203215 https://bugzilla.kernel.org/show_bug.cgi?id=203223 https://bugzilla.kernel.org/show_bug.cgi?id=203231 https://bugzilla.kernel.org/show_bug.cgi?id=203235 https://bugzilla.kernel.org/show_bug.cgi?id=203241 = Update by Jaegeuk Kim = DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths. But, xfstest/generic/446 compalins some generated kernel messages saying invalid bitmap was detected when reading a block. The reaons is, when we get the block addresses from extent_cache, there is no lock to synchronize it from truncating the blocks in parallel. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: allow address pointer number of dnode aligning to specified sizeChao Yu
This patch expands scalability of dnode layout, it allows address pointer number of dnode aligning to specified size (now, the size is one byte by default), and later the number can align to compress cluster size (1 << n bytes, n=[2,..)), it can avoid cluster acrossing two dnode, making design of compress meta layout simple. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: fix to avoid deadloop in foreground GCChao Yu
As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203211 - Overview When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating. The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on. - Reproduces mkdir test mount -t f2fs tmp.img test cd test touch t - Messages [ 58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT [ 58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT ... (repeating) During GC, if segment type stored in SSA and SIT is inconsistent, we just skip migrating current segment directly, since we need to know the exact type to decide the migration function we use. So in foreground GC, we will easily run into a infinite loop as we may select the same victim segment which has inconsistent type due to greedy policy. In order to end up this, we choose to shutdown filesystem. For backgrond GC, we need to do that as well, so that we can avoid latter potential infinite looped foreground GC. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-05f2fs: Fix use of number of devicesDamien Le Moal
For a single device mount using a zoned block device, the zone information for the device is stored in the sbi->devs single entry array and sbi->s_ndevs is set to 1. This differs from a single device mount using a regular block device which does not allocate sbi->devs and sets sbi->s_ndevs to 0. However, sbi->s_devs == 0 condition is used throughout the code to differentiate a single device mount from a multi-device mount where sbi->s_ndevs is always larger than 1. This results in problems with single zoned block device volumes as these are treated as multi-device mounts but do not have the start_blk and end_blk information set. One of the problem observed is skipping of zone discard issuing resulting in write commands being issued to full zones or unaligned to a zone write pointer. Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single regular block device mount) and sbi->s_ndevs == 1 (single zoned block device mount) in the same manner. This is done by introducing the helper function f2fs_is_multi_device() and using this helper in place of direct tests of sbi->s_ndevs value, improving code readability. Fixes: 7bb3a371d199 ("f2fs: Fix zoned block device support") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26f2fs: check PageWriteback flag for ordered caseChao Yu
For all ordered cases in f2fs_wait_on_page_writeback(), we need to check PageWriteback status, so let's clean up to relocate the check into f2fs_wait_on_page_writeback(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26f2fs: use kvmalloc, if kmalloc is failedJaegeuk Kim
One report says memalloc failure during mount. (unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14) (show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0) (dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160) (warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0) (__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120) (kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688) (build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc) (f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188) (mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20) (f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c) (mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134) (vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4) (do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc) (SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48) Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-14f2fs: fix to reorder set_page_dirty and wait_on_page_writebackChao Yu
This patch reorders flow from - update page - set_page_dirty - wait_on_page_writeback to - wait_on_page_writeback - update page - set_page_dirty The reason is: - set_page_dirty will increase reference of dirty page, the reference should be cleared before wait_on_page_writeback to keep its consistency. - some devices need stable page during page writebacking, so we should not change page's data. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: adjust trace print in f2fs_get_victim() to cover all pathsSahitya Tummala
Adjust the trace print in f2fs_get_victim() to cover GC done by F2FS_IOC_GARBAGE_COLLECT_RANGE. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix to allow node segment for GC by ioctl pathSahitya Tummala
Allow node type segments also to be GC'd via f2fs ioctl F2FS_IOC_GARBAGE_COLLECT_RANGE. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: support subsectional garbage collectionChao Yu
Section is minimal garbage collection unit of f2fs, in zoned block device, or ancient block mapping flash device, in order to improve GC efficiency, we can align GC unit to lower device erase unit, normally, it consists of multiple of segments. Once background or foreground GC triggers, it brings a large number of IOs, which will impact user IO, and also occupy cpu/memory resource intensively. So, to reduce impact of GC on large size section, this patch supports subsectional GC, in one cycle of GC, it only migrate partial segment{s} in victim section. Currently, by default, we use sbi->segs_per_sec as migration granularity. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: introduce __is_large_section() for cleanupChao Yu
Introduce a wrapper __is_large_section() to clean up codes. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix count of seg_freed to make sec_freed correctYunlong Song
When sbi->segs_per_sec > 1, and if some segno has 0 valid blocks before gc starts, do_garbage_collect will skip counting seg_freed++, and this will cause seg_freed < sbi->segs_per_sec and finally skip sec_freed++. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: avoid GC causing encrypted file corruptedYunlong Song
The encrypted file may be corrupted by GC in following case: Time 1: | segment 1 blkaddr = A | GC -> | segment 2 blkaddr = B | Encrypted block 1 is moved from blkaddr A of segment 1 to blkaddr B of segment 2, Time 2: | segment 1 blkaddr = B | GC -> | segment 3 blkaddr = C | Before page 1 is written back and if segment 2 become a victim, then page 1 is moved from blkaddr B of segment 2 to blkaddr Cof segment 3, during the GC process of Time 2, f2fs should wait for page 1 written back before reading it, or move_data_block will read a garbage block from blkaddr B since page is not written back to blkaddr B yet. Commit 6aa58d8a ("f2fs: readahead encrypted block during GC") introduce ra_data_block to read encrypted block, but it forgets to add f2fs_wait_on_page_writeback to avoid racing between GC and flush. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16f2fs: submit cached bio to avoid endless PageWritebackChao Yu
When migrating encrypted block from background GC thread, we only add them into f2fs inner bio cache, but forget to submit the cached bio, it may cause potential deadlock when we are waiting page writebacked, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16f2fs: checkpoint disablingDaniel Rosenberg
Note that, it requires "f2fs: return correct errno in f2fs_gc". This adds a lightweight non-persistent snapshotting scheme to f2fs. To use, mount with the option checkpoint=disable, and to return to normal operation, remount with checkpoint=enable. If the filesystem is shut down before remounting with checkpoint=enable, it will revert back to its apparent state when it was first mounted with checkpoint=disable. This is useful for situations where you wish to be able to roll back the state of the disk in case of some critical failure. Signed-off-by: Daniel Rosenberg <drosen@google.com> [Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30f2fs: add to account skip count of background GCChao Yu
This patch adds to account skip count of background GC, and show stat info via 'status' debugfs entry. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-28f2fs: return correct errno in f2fs_gcJaegeuk Kim
This fixes overriding error number in f2fs_gc. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-28f2fs: avoid f2fs_bug_on if f2fs_get_meta_page_nofail got EIOJaegeuk Kim
This patch avoids BUG_ON when f2fs_get_meta_page_nofail got EIO during xfstests/generic/475. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-19f2fs: add new idle interval timing for discard and gc pathsSahitya Tummala
This helps to control the frequency of submission of discard and GC requests independently, based on the need. Suggested-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-12f2fs: add SPDX license identifiersChao Yu
Remove the verbose license text from f2fs files and replace them with SPDX tags. This does not change the license of any of the code. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-20f2fs: readahead encrypted block during GCChao Yu
During GC, for each encrypted block, we will read block synchronously into meta page, and then submit it into current cold data log area. So this block read model with 4k granularity can make poor performance, like migrating non-encrypted block, let's readahead encrypted block as well to improve migration performance. To implement this, we choose meta page that its index is old block address of the encrypted block, and readahead ciphertext into this page, later, if readaheaded page is still updated, we will load its data into target meta page, and submit the write IO. Note that for OPU, truncation, deletion, we need to invalid meta page after we invalid old block address, to make sure we won't load invalid data from target meta page during encrypted block migration. for ((i = 0; i < 1000; i++)) do { xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync"; } done for ((i = 0; i < 1000; i+=2)) do { rm /mnt/f2fs/dir/$i; } done ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0); Before: gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc] gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1 gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc] gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc] gc-6549 [001] .... 214682.213902: block_plug: [gc] gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc] gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1 gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc] gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc] gc-6549 [001] .... 214682.226414: block_plug: [gc] gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc] gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1 gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc] gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc] gc-6549 [001] .... 214682.226911: block_plug: [gc] gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc] gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1 After: gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc] gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc] gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc] gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc] gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc] gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc] gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc] gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc] gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc] gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc] gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc] gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc] gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc] gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc] gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc] gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc] gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc] gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc] gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc] gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1 gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc] gc-5678 [003] .... 214327.026046: block_plug: [gc] Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-20f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gcJaegeuk Kim
The f2fs_gc() called by f2fs_balance_fs() requires to be called outside of fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop. If it hits the miximum retrials in GC, let's give a chance to release gc_mutex for a short time in order not to go into live lock in the worst case. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14f2fs: rework fault injection handling to avoid a warningArnd Bergmann
When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an unused label: fs/f2fs/segment.c: In function '__submit_discard_cmd': fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label] This could be fixed by adding another #ifdef around it, but the more reliable way of doing this seems to be to remove the other #ifdefs where that is easily possible. By defining time_to_inject() as a trivial stub, most of the checks for CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer formatting of the code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01f2fs: fix to propagate error from __get_meta_page()Chao Yu
If caller of __get_meta_page() can handle error, let's propagate error from __get_meta_page(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-28f2fs: fix to skip GC if type in SSA and SIT is inconsistentChao Yu
If segment type in SSA and SIT is inconsistent, we will encounter below BUG_ON during GC, to avoid this panic, let's just skip doing GC on such segment. The bug is triggered with image reported in below link: https://bugzilla.kernel.org/show_bug.cgi?id=200223 [ 388.060262] ------------[ cut here ]------------ [ 388.060268] kernel BUG at /home/y00370721/git/devf2fs/gc.c:989! [ 388.061172] invalid opcode: 0000 [#1] SMP [ 388.061773] Modules linked in: f2fs(O) bluetooth ecdh_generic xt_tcpudp iptable_filter ip_tables x_tables lp ttm drm_kms_helper drm intel_rapl sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel fb_sys_fops ppdev aes_x86_64 syscopyarea crypto_simd sysfillrect parport_pc joydev sysimgblt glue_helper parport cryptd i2c_piix4 serio_raw mac_hid btrfs hid_generic usbhid hid raid6_pq psmouse pata_acpi floppy [ 388.064247] CPU: 7 PID: 4151 Comm: f2fs_gc-7:0 Tainted: G O 4.13.0-rc1+ #26 [ 388.065306] Hardware name: Xen HVM domU, BIOS 4.1.2_115-900.260_ 11/06/2015 [ 388.066058] task: ffff880201583b80 task.stack: ffffc90004d7c000 [ 388.069948] RIP: 0010:do_garbage_collect+0xcc8/0xcd0 [f2fs] [ 388.070766] RSP: 0018:ffffc90004d7fc68 EFLAGS: 00010202 [ 388.071783] RAX: ffff8801ed227000 RBX: 0000000000000001 RCX: ffffea0007b489c0 [ 388.072700] RDX: ffff880000000000 RSI: 0000000000000001 RDI: ffffea0007b489c0 [ 388.073607] RBP: ffffc90004d7fd58 R08: 0000000000000003 R09: ffffea0007b489dc [ 388.074619] R10: 0000000000000000 R11: 0052782ab317138d R12: 0000000000000018 [ 388.075625] R13: 0000000000000018 R14: ffff880211ceb000 R15: ffff880211ceb000 [ 388.076687] FS: 0000000000000000(0000) GS:ffff880214fc0000(0000) knlGS:0000000000000000 [ 388.083277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 388.084536] CR2: 0000000000e18c60 CR3: 00000001ecf2e000 CR4: 00000000001406e0 [ 388.085748] Call Trace: [ 388.086690] ? find_next_bit+0xb/0x10 [ 388.088091] f2fs_gc+0x1a8/0x9d0 [f2fs] [ 388.088888] ? lock_timer_base+0x7d/0xa0 [ 388.090213] ? try_to_del_timer_sync+0x44/0x60 [ 388.091698] gc_thread_func+0x342/0x4b0 [f2fs] [ 388.092892] ? wait_woken+0x80/0x80 [ 388.094098] kthread+0x109/0x140 [ 388.095010] ? f2fs_gc+0x9d0/0x9d0 [f2fs] [ 388.096043] ? kthread_park+0x60/0x60 [ 388.097281] ret_from_fork+0x25/0x30 [ 388.098401] Code: ff ff 48 83 e8 01 48 89 44 24 58 e9 27 f8 ff ff 48 83 e8 01 e9 78 fc ff ff 48 8d 78 ff e9 17 fb ff ff 48 83 ef 01 e9 4d f4 ff ff <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 [ 388.100864] RIP: do_garbage_collect+0xcc8/0xcd0 [f2fs] RSP: ffffc90004d7fc68 [ 388.101810] ---[ end trace 81c73d6e6b7da61d ]--- Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>