summaryrefslogtreecommitdiff
path: root/fs/f2fs/node.c
AgeCommit message (Collapse)Author
2015-05-28f2fs crypto: add encryption support in read/write pathsJaegeuk Kim
This patch adds encryption support in read and write paths. Note that, in f2fs, we need to consider cleaning operation. In cleaning procedure, we must avoid encrypting and decrypting written blocks. So, this patch implements move_encrypted_block(). Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28f2fs: do not re-lookup nat cache with same nidChao Yu
In set_node_addr, we try to lookup cached nat entry of inode and then set flag in it. But previously in this function, we have already grabbed nat entry with current node id, if the node id is the same as the one of inode, we do not need to lookup it in cache again. So this patch adds condition judgment for reducing unneeded lookup. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28f2fs: add need_dentry_markJaegeuk Kim
This patch introduces need_dentry_mark() to clean up and avoid redundant node locks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28f2fs: add sbi and page pointer in f2fs_io_infoJaegeuk Kim
This patch adds f2fs_sb_info and page pointers in f2fs_io_info structure. With this change, we can reduce a lot of parameters for IO functions. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-07f2fs: make has_fsynced_inode staticChao Yu
has_fsynced_inode() has no other caller out of node.c, make it static. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10f2fs: fix unlocked nat set cache operationWanpeng Li
nm_i->nat_tree_lock is used to sync both the operations of nat entry cache tree and nat set cache tree, however, it isn't held when flush nat entries during checkpoint which lead to potential race, this patch fix it by holding the lock when gang lookup nat set cache and delete item from nat set cache. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10f2fs: report -ENOENT for unreached data indicesJaegeuk Kim
If inode has inline_data, it should report -ENOENT when accessing out-of-bound region. This is used by f2fs_fiemap which treats -ENOENT with no error. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10f2fs: clear page's up-to-date if block was deallocatedJaegeuk Kim
If page's on-disk block was deallocated, let's remove up-to-date flag to avoid further access with wrong contents. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03f2fs: add core functions for rb-tree extent cacheChao Yu
This patch adds core functions including slab cache init function and init/lookup/update/shrink/destroy function for rb-tree based extent cache. Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about detail design and implementation of extent cache. Todo: * register rb-based extent cache shrink with mm shrink interface. v2: o move set_extent_info and __is_{extent,back,front}_mergeable into f2fs.h. o introduce __{attach,detach}_extent_node for code readability. o add cond_resched() when fail to invoke kmem_cache_alloc/radix_tree_insert. o fix some coding style and typo issues. v3: o fix oops due to using an unassigned pointer. o use list_del to remove extent node in shrink list. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Changman Lee <cm224.lee@samsung.com> [Jaegeuk Kim: add static for some funcitons and declare in f2fs.h] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11f2fs: fix accessing wrong indexed data blocksJaegeuk Kim
This patch fixes the following test. This causes: attempt to access beyond end of device sdb2: rw=16384, want=14413962000, limit=16777216 The reason is: - f2fs_write_begin - f2fs_convert_inline_inode returns -ENOSPC - f2fs_write_failed - truncate_blocks - truncate_partial_data_page - find_data_page - get_dnode_of_data returns wrong data index retrieved from inline_data - f2fs_submit_page_bio(wrong data index) - submit_bio(wrong data index) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11f2fs: check node page contents all the timeJaegeuk Kim
In get_node_page, if the page is up-to-date, we assumed that the page was not reclaimed at all. But, sometimes it was reported that its contents was missing. So, just for sure, let's check its mapping and contents. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11f2fs: merge {invalidate,release}page for meta/node/data pagesChao Yu
This patch merges ->{invalidate,release}page function for meta/node/data pages. After this, duplication of codes could be removed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11f2fs: keep PagePrivate during releasepageJaegeuk Kim
If PagePrivate is removed by releasepage, f2fs loses counting dirty pages. e.g., try_to_release_page will not release page when the page is dirty, but our releasepage removes PagePrivate. [<ffffffff81188d75>] try_to_release_page+0x35/0x50 [<ffffffff811996f9>] invalidate_inode_pages2_range+0x2f9/0x3b0 [<ffffffffa02a7f54>] ? truncate_blocks+0x384/0x4d0 [f2fs] [<ffffffffa02b7583>] ? f2fs_direct_IO+0x283/0x290 [f2fs] [<ffffffffa02b7fb0>] ? get_data_block_fiemap+0x20/0x20 [f2fs] [<ffffffff8118aa53>] generic_file_direct_write+0x163/0x170 [<ffffffff8118ad06>] __generic_file_write_iter+0x2a6/0x350 [<ffffffff8118adef>] generic_file_write_iter+0x3f/0xb0 [<ffffffff81203081>] new_sync_write+0x81/0xb0 [<ffffffff81203837>] vfs_write+0xb7/0x1f0 [<ffffffff81204459>] SyS_write+0x49/0xb0 [<ffffffff817c286d>] system_call_fastpath+0x16/0x1b Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11f2fs: merge flags in struct f2fs_sb_infoChao Yu
Currently, there are several variables with Boolean type as below: struct f2fs_sb_info { ... int s_dirty; bool need_fsck; bool s_closing; ... bool por_doing; ... } For this there are some issues: 1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean type variables by compiler. 2. if we continuously add new flag into f2fs_sb_info, structure will be messed up. So in this patch, we try to: 1. switch s_dirty to Boolean type variable since it has two status 0/1. 2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag. 3. introduce an enum type which can indicate different states of sbi. 4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag to operate flags for sbi. After that, above issues will be fixed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09f2fs: free radix_tree_nodes used by nat_set entriesJaegeuk Kim
In the normal case, the radix_tree_nodes are freed successfully. But, when cp_error was detected, we should destroy them forcefully. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09f2fs: avoid potential unnecessary codesJaegeuk Kim
This patch relocates some operations to avoid unnecessary execution. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09f2fs: activate f2fs_trace_pidJaegeuk Kim
This patch activates f2fs_trace_pid. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09f2fs: use f2fs_io_info to clean up messy parameters during IO pathJaegeuk Kim
This patch cleans up parameters on IO paths. The key idea is to use f2fs_io_info adding a parameter, block address, and then use this structure as parameters. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09f2fs: use ra_meta_pages to simplify readahead code in restore_node_summaryChao Yu
Use more common function ra_meta_pages() with META_POR to readahead node blocks in restore_node_summary() instead of ra_sum_pages(), hence we can simplify the readahead code there, and also we can remove unused function ra_sum_pages(). changes from v2: o use invalidate_mapping_pages as before suggested by Changman Lee. changes from v1: o fix one bug when using truncate_inode_pages_range which is pointed out by Jaegeuk Kim. Reviewed-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09f2fs: merge two uchar variable in struct node_info to reduce memory costChao Yu
This patch moves one member of struct nat_entry: _flag_ to struct node_info, so _version_ in struct node_info and _flag_ which are unsigned char type will merge to one 32-bit space in register/memory. So the size of nat_entry will be reduced from 28 bytes to 24 bytes (for 64-bit machine, reduce its size from 40 bytes to 32 bytes) and then slab memory using by f2fs will be reduced. changes from v2: o update description of memory usage gain for 64-bit machine suggested by Changman Lee. changes from v1: o introduce inline copy_node_info() to copy valid data from node info suggested by Jaegeuk Kim, it can avoid bug. Reviewed-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09f2fs: change atomic and volatile write policiesJaegeuk Kim
This patch adds two new ioctls to release inmemory pages grabbed by atomic writes. o f2fs_ioc_abort_volatile_write - If transaction was failed, all the grabbed pages and data should be written. o f2fs_ioc_release_volatile_write - This is to enhance the performance of PERSIST mode in sqlite. In order to avoid huge memory consumption which causes OOM, this patch changes volatile writes to use normal dirty pages, instead blocked flushing to the disk as long as system does not suffer from memory pressure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09f2fs: remove checking dirty_exceedJaegeuk Kim
We don't need to force to write dirty_exceeded for f2fs_balance_fs_bg. This flag was only meaningful to write bypassing conditions. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08f2fs: do retry operations with cond_reschedJaegeuk Kim
This patch revists retrial paths in f2fs. The basic idea is to use cond_resched instead of retrying from the very early stage. Suggested-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-05f2fs: call radix_tree_preload before radix_tree_insertJaegeuk Kim
This patch tries to fix: BUG: using smp_processor_id() in preemptible [00000000] code: f2fs_gc-254:0/384 (radix_tree_node_alloc+0x14/0x74) from [<c033d8a0>] (radix_tree_insert+0x110/0x200) (radix_tree_insert+0x110/0x200) from [<c02e8264>] (gc_data_segment+0x340/0x52c) (gc_data_segment+0x340/0x52c) from [<c02e8658>] (f2fs_gc+0x208/0x400) (f2fs_gc+0x208/0x400) from [<c02e8a98>] (gc_thread_func+0x248/0x28c) (gc_thread_func+0x248/0x28c) from [<c0139944>] (kthread+0xa0/0xac) (kthread+0xa0/0xac) from [<c0105ef8>] (ret_from_fork+0x14/0x3c) The reason is that f2fs calls radix_tree_insert under enabled preemption. So, before calling it, we need to call radix_tree_preload. Otherwise, we should use _GFP_WAIT for the radix tree, and use mutex or semaphore to cover the radix tree operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-03f2fs: use rw_semaphore for nat entry lockJaegeuk Kim
Previoulsy, we used rwlock for nat_entry lock. But, now we have a lot of complex operations in set_node_addr. (e.g., allocating kernel memories, handling radix_trees, and so on) So, this patches tries to change spinlock to rw_semaphore to give CPUs to other threads. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-03f2fs: fix missing kmem_cache_freeJaegeuk Kim
This patch fixes missing kmem_cache_free when handling errors. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25f2fs: no more dirty_nat_entires when flushingChangman Lee
After flushing dirty nat entries, it has to be no more dirty nat entries. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25f2fs: check dirty_nat_cnt before flushing nat entries in journalChangman Lee
It's meaningless to check dirty_nat_cnt after re-dirtying nat entries in journal. And although there are rooms for dirty nat entires if dirty_nat_cnt is zero, it's also meaningless to check __has_cursum_space. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25f2fs: fix typos for the word "destroy" in jump labelsMarkus Elfring
Two jump labels were adjusted in the implementation of the create_node_manager_caches() function because these identifiers contained typos. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19f2fs: submit bio for node blocks in the reclaim pathJaegeuk Kim
If a node page is request to be written during the reclaiming path, we should submit the bio to avoid pending to recliam it. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19f2fs: introduce struct inode_management to wrap inner fieldsChao Yu
Now in f2fs, we have three inode cache: ORPHAN_INO, APPEND_INO, UPDATE_INO, and we manage fields related to inode cache separately in struct f2fs_sb_info for each inode cache type. This makes codes a bit messy, so that this patch intorduce a new struct inode_management to wrap inner fields as following which make codes more neat. /* for inner inode cache management */ struct inode_management { struct radix_tree_root ino_root; /* ino entry array */ spinlock_t ino_lock; /* for ino entry lock */ struct list_head ino_list; /* inode list head */ unsigned long ino_num; /* number of entries */ }; struct f2fs_sb_info { ... struct inode_management im[MAX_INO_ENTRY]; /* manage inode cache */ ... } Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10f2fs: write node pages if checkpoint is not doingJaegeuk Kim
It needs to write node pages if checkpoint is not doing in order to avoid memory pressure. Reviewed-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-06f2fs: control the memory footprint used by ino entriesJaegeuk Kim
This patch adds to control the memory footprint used by ino entries. This will conduct best effort, not strictly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30f2fs: refactor flush_nat_entries to remove costly reorganizing opsJaegeuk Kim
Previously, f2fs tries to reorganize the dirty nat entries into multiple sets according to its nid ranges. This can improve the flushing nat pages, however, if there are a lot of cached nat entries, it becomes a bottleneck. This patch introduces a new set management flow by removing dirty nat list and adding a series of set operations when the nat entry becomes dirty. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23f2fs: use MAX_BIO_BLOCKS(sbi)Jaegeuk Kim
This patch cleans up a simple macro. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23f2fs: fix conditions to remain recovery information in f2fs_sync_fileJaegeuk Kim
This patch revisited whole the recovery information during the f2fs_sync_file. In this patch, there are three information to make a decision. a) IS_CHECKPOINTED, /* is it checkpointed before? */ b) HAS_FSYNCED_INODE, /* is the inode fsynced before? */ c) HAS_LAST_FSYNC, /* has the latest node fsync mark? */ And, the scenarios for our rule are based on: [Term] F: fsync_mark, D: dentry_mark 1. inode(x) | CP | inode(x) | dnode(F) 2. inode(x) | CP | inode(F) | dnode(F) 3. inode(x) | CP | dnode(F) | inode(x) | inode(F) 4. inode(x) | CP | dnode(F) | inode(F) 5. CP | inode(x) | dnode(F) | inode(DF) 6. CP | inode(DF) | dnode(F) 7. CP | dnode(F) | inode(DF) 8. CP | dnode(F) | inode(x) | inode(DF) For example, #3, the three conditions should be changed as follows. inode(x) | CP | dnode(F) | inode(x) | inode(F) a) x o o o o b) x x x x o c) x o o x o If f2fs_sync_file stops ------^, it should write inode(F) --------------^ So, the need_inode_block_update should return true, since c) get_nat_flag(e, HAS_LAST_FSYNC), is false. For example, #8, CP | alloc | dnode(F) | inode(x) | inode(DF) a) o x x x x b) x x x o c) o o x o If f2fs_sync_file stops -------^, it should write inode(DF) --------------^ Note that, the roll-forward policy should follow this rule, which means, if there are any missing blocks, we doesn't need to recover that inode. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23f2fs: introduce a flag to represent each nat entry informationJaegeuk Kim
This patch introduces a flag in the nat entry structure to merge various information such as checkpointed and fsync_done marks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09f2fs: refactor flush_sit_entries codes for reducing SIT writesChao Yu
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT writes"), we descripte the issue as below: "Although building NAT journal in cursum reduce the read/write work for NAT block, but previous design leave us lower performance when write checkpoint frequently for these cases: 1. if journal in cursum has already full, it's a bit of waste that we flush all nat entries to page for persistence, but not to cache any entries. 2. if journal in cursum is not full, we fill nat entries to journal util journal is full, then flush the left dirty entries to disk without merge journaled entries, so these journaled entries may be flushed to disk at next checkpoint but lost chance to flushed last time." Actually, we have the same problem in using SIT journal area. In this patch, firstly we will update sit journal with dirty entries as many as possible. Secondly if there is no space in sit journal, we will remove all entries in journal and walk through the whole dirty entry bitmap of sit, accounting dirty sit entries located in same SIT block to sit entry set. All entry sets are linked to list sit_entry_set in sm_info, sorted ascending order by count of entries in set. Later we flush entries in set which have fewest entries into journal as many as we can, and then flush dense set with merged entries to disk. In this way we can use sit journal area more effectively, also we will reduce SIT update, result in gaining in performance and saving lifetime of flash device. In my testing environment, it shows this patch can help to reduce SIT block update obviously. virtual machine + hard disk: fsstress -p 20 -n 400 -l 5 sit page num cp count sit pages/cp based 2006.50 1349.75 1.486 patched 1566.25 1463.25 1.070 Our latency of merging op is small when handling a great number of dirty SIT entries in flush_sit_entries: latency(ns) dirty sit count 36038 2151 49168 2123 37174 2232 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09f2fs: need fsck.f2fs when f2fs_bug_on is triggeredJaegeuk Kim
If any f2fs_bug_on is triggered, fsck.f2fs is needed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-03f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SBJaegeuk Kim
This patch adds three inline functions to clean up dirty casting codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-25f2fs: truncate stale block for inline_dataJaegeuk Kim
This verifies to truncate any allocated blocks, offset[0], by inline_data. Not figured out, but for making sure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21f2fs: fix incorrect calculation with total/free inode numChao Yu
Theoretically, our total inodes number is the same as total node number, but there are three node ids are reserved in f2fs, they are 0, 1 (node nid), and 2 (meta nid), and they should never be used by user, so our total/free inode number calculated in ->statfs is wrong. This patch indroduces F2FS_RESERVED_NODE_NUM and then fixes this issue by recalculating total/free inode number with the macro. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21f2fs: remove rewrite_node_pageJaegeuk Kim
I think we need to let the dirty node pages remain in the page cache instead of rewriting them in their places. So, after done with successful recovery, write_checkpoint will flush all of them through the normal write path. Through this, we can avoid potential error cases in terms of block allocation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21f2fs: handle EIO not to break fs consistencyJaegeuk Kim
There are two rules when EIO is occurred. 1. don't write any checkpoint data to preserve the previous checkpoint 2. don't lose the cached dentry/node/meta pages So, at first, this patch adds set_page_dirty in f2fs_write_end_io's failure. Then, writing checkpoint/dentry/node blocks is not allowed. Note that, for the data pages, we can't just throw away by redirtying them. Otherwise, kworker can fall into infinite loop to flush them. (Ref. xfstests/019) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21f2fs: unlock_page when node page is redirtied outJaegeuk Kim
This patch fixes missing unlock_page when a node page is redirtied out. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19f2fs: fix to recover inline_xattr/data and blocksJaegeuk Kim
This patch fixes not to skip xattr recovery and inline xattr/data recovery order. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19f2fs: should clear the inline_xattr flagJaegeuk Kim
During the recovery, we should clear the inline_xattr flag if its xattr node block is recovered. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19f2fs: fix the initial inode page for recoveryJaegeuk Kim
If a new inode page is needed for recover_dentry, we should assing i_inline as zero. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19f2fs: fix typoarter97
Fix typo and some grammatical errors. The words "filesystem" and "readahead" are being used without the space treewide. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-02f2fs: avoid skipping recover_inline_xattr after recover_inline_dataChao Yu
When we recover data of inode in roll-forward procedure, and the inode has both inline data and inline xattr. We may skip recovering inline xattr if we recover inline data form node page first. This patch will fix the problem that we lost inline xattr data in above scenario. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>