summaryrefslogtreecommitdiff
path: root/fs/f2fs/data.c
AgeCommit message (Collapse)Author
2017-09-12Merge tag 'f2fs-for-4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've mostly tuned f2fs to provide better user experience for Android. Especially, we've worked on atomic write feature again with SQLite community in order to support it officially. And we added or modified several facilities to analyze and enhance IO behaviors. Major changes include: - add app/fs io stat - add inode checksum feature - support project/journalled quota - enhance atomic write with new ioctl() which exposes feature set - enhance background gc/discard/fstrim flows with new gc_urgent mode - add F2FS_IOC_FS{GET,SET}XATTR - fix some quota flows" * tag 'f2fs-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits) f2fs: hurry up to issue discard after io interruption f2fs: fix to show correct discard_granularity in sysfs f2fs: detect dirty inode in evict_inode f2fs: clear radix tree dirty tag of pages whose dirty flag is cleared f2fs: speed up gc_urgent mode with SSR f2fs: better to wait for fstrim completion f2fs: avoid race in between read xattr & write xattr f2fs: make get_lock_data_page to handle encrypted inode f2fs: use generic terms used for encrypted block management f2fs: introduce f2fs_encrypted_file for clean-up Revert "f2fs: add a new function get_ssr_cost" f2fs: constify super_operations f2fs: fix to wake up all sleeping flusher f2fs: avoid race in between atomic_read & atomic_inc f2fs: remove unneeded parameter of change_curseg f2fs: update i_flags correctly f2fs: don't check inode's checksum if it was dirtied or writebacked f2fs: don't need to update inode checksum for recovery f2fs: trigger fdatasync for non-atomic_write file f2fs: fix to avoid race in between aio and gc ...
2017-09-08mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPYJérôme Glisse
Introduce a new migration mode that allow to offload the copy to a device DMA engine. This changes the workflow of migration and not all address_space migratepage callback can support this. This is intended to be use by migrate_vma() which itself is use for thing like HMM (see include/linux/hmm.h). No additional per-filesystem migratepage testing is needed. I disables MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i added comment in those to explain why (part of this patch). The commit message is unclear it should say that any callback that wish to support this new mode need to be aware of the difference in the migration flow from other mode. Some of these callbacks do extra locking while copying (aio, zsmalloc, balloon, ...) and for DMA to be effective you want to copy multiple pages in one DMA operations. But in the problematic case you can not easily hold the extra lock accross multiple call to this callback. Usual flow is: For each page { 1 - lock page 2 - call migratepage() callback 3 - (extra locking in some migratepage() callback) 4 - migrate page state (freeze refcount, update page cache, buffer head, ...) 5 - copy page 6 - (unlock any extra lock of migratepage() callback) 7 - return from migratepage() callback 8 - unlock page } The new mode MIGRATE_SYNC_NO_COPY: 1 - lock multiple pages For each page { 2 - call migratepage() callback 3 - abort in all problematic migratepage() callback 4 - migrate page state (freeze refcount, update page cache, buffer head, ...) } // finished all calls to migratepage() callback 5 - DMA copy multiple pages 6 - unlock all the pages To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a new callback migratepages() (for instance) that deals with multiple pages in one transaction. Because the problematic cases are not important for current usage I did not wanted to complexify this patchset even more for no good reason. Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Nellans <dnellans@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sherry Cheung <SCheung@nvidia.com> Cc: Subhash Gutti <sgutti@nvidia.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Bob Liu <liubo95@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-07f2fs: make get_lock_data_page to handle encrypted inodeJaegeuk Kim
This patch refactors get_lock_data_page() to handle encryption case directly. In order to do that, it introduces common f2fs_submit_page_read(). Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-05f2fs: use generic terms used for encrypted block managementJaegeuk Kim
This patch renames functions regarding to buffer management via META_MAPPING used for encrypted blocks especially. We can actually use them in generic way. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-05f2fs: introduce f2fs_encrypted_file for clean-upJaegeuk Kim
This patch replaces (f2fs_encrypted_inode() && S_ISREG()) with f2fs_encrypted_file(), which gives no functional change. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-23block: replace bi_bdev with a gendisk pointer and partitions indexChristoph Hellwig
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-21f2fs: merge equivalent flags F2FS_GET_BLOCK_[READ|DIO]Qiuyang Sun
Currently, the two flags F2FS_GET_BLOCK_[READ|DIO] are totally equivalent and can be used interchangably in all scenarios they are involved in. Neither of the flags is referenced in f2fs_map_blocks(), making them both the default case. To remove the ambiguity, this patch merges both flags into F2FS_GET_BLOCK_DEFAULT, and introduces an enum for all distinct flags. Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-09f2fs: add app/fs io statChao Yu
This patch enables inner app/fs io stats and introduces below virtual fs nodes for exposing stats info: /sys/fs/f2fs/<dev>/iostat_enable /proc/fs/f2fs/<dev>/iostat_info Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix wrong stat assignment] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-31f2fs: enhance on-disk inode structure scalabilityChao Yu
This patch add new flag F2FS_EXTRA_ATTR storing in inode.i_inline to indicate that on-disk structure of current inode is extended. In order to extend, we changed the inode structure a bit: Original one: struct f2fs_inode { ... struct f2fs_extent i_ext; __le32 i_addr[DEF_ADDRS_PER_INODE]; __le32 i_nid[DEF_NIDS_PER_INODE]; } Extended one: struct f2fs_inode { ... struct f2fs_extent i_ext; union { struct { __le16 i_extra_isize; __le16 i_padding; __le32 i_extra_end[0]; }; __le32 i_addr[DEF_ADDRS_PER_INODE]; }; __le32 i_nid[DEF_NIDS_PER_INODE]; } Once F2FS_EXTRA_ATTR is set, we will steal four bytes in the head of i_addr field for storing i_extra_isize and i_padding. with i_extra_isize, we can calculate actual size of reserved space in i_addr, available attribute fields included in total extra attribute fields for current inode can be described as below: +--------------------+ | .i_mode | | ... | | .i_ext | +--------------------+ | .i_extra_isize |-----+ | .i_padding | | | .i_prjid | | | .i_atime_extra | | | .i_ctime_extra | | | .i_mtime_extra |<----+ | .i_inode_cs |<----- store blkaddr/inline from here | .i_xattr_cs | | ... | +--------------------+ | | | block address | | | +--------------------+ | .i_nid | +--------------------+ | node_footer | | (nid, ino, offset) | +--------------------+ Hence, with this patch, we would enhance scalability of f2fs inode for storing more newly added attribute. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-31f2fs: make max inline size changeableChao Yu
This patch tries to make below macros calculating max inline size, inline dentry field size considerring reserving size-changeable space: - MAX_INLINE_DATA - NR_INLINE_DENTRY - INLINE_DENTRY_BITMAP_SIZE - INLINE_RESERVED_SIZE Then, when inline_{data,dentry} options is enabled, it allows us to reserve inline space with different size flexibly for adding newly introduced inode attribute. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-10Merge tag 'for-f2fs-4.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've added new features such as disk quota and statx, and modified internal bio management flow to merge more IOs depending on block types. We've also made internal threads freezeable for Android battery life. In addition to them, there are some patches to avoid lock contention as well as a couple of deadlock conditions. Enhancements: - support usrquota, grpquota, and statx - manage DATA/NODE typed bios separately to serialize more IOs - modify f2fs_lock_op/wio_mutex to avoid lock contention - prevent lock contention in migratepage Bug fixes: - fix missing load of written inode flag - fix worst case victim selection in GC - freezeable GC and discard threads for Android battery life - sanitize f2fs metadata to deal with security hole - clean up sysfs-related code and docs" * tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (59 commits) f2fs: support plain user/group quota f2fs: avoid deadlock caused by lock order of page and lock_op f2fs: use spin_{,un}lock_irq{save,restore} f2fs: relax migratepage for atomic written page f2fs: don't count inode block in in-memory inode.i_blocks Revert "f2fs: fix to clean previous mount option when remount_fs" f2fs: do not set LOST_PINO for renamed dir f2fs: do not set LOST_PINO for newly created dir f2fs: skip ->writepages for {mete,node}_inode during recovery f2fs: introduce __check_sit_bitmap f2fs: stop gc/discard thread in prior during umount f2fs: introduce reserved_blocks in sysfs f2fs: avoid redundant f2fs_flush after remount f2fs: report # of free inodes more precisely f2fs: add ioctl to do gc with target block address f2fs: don't need to check encrypted inode for partial truncation f2fs: measure inode.i_blocks as generic filesystem f2fs: set CP_TRIMMED_FLAG correctly f2fs: require key for truncate(2) of encrypted file f2fs: move sysfs code from super.c to fs/f2fs/sysfs.c ...
2017-07-08f2fs: support plain user/group quotaChao Yu
This patch adds to support plain user/group quota. Change Note by Jaegeuk Kim. - Use f2fs page cache for quota files in order to consider garbage collection. so, quota files are not tolerable for sudden power-cuts, so user needs to do quotacheck. - setattr() calls dquot_transfer which will transfer inode->i_blocks. We can't reclaim that during f2fs_evict_inode(). So, we need to count node blocks as well in order to match i_blocks with dquot's space. Note that, Chao wrote a patch to count inode->i_blocks without inode block. (f2fs: don't count inode block in in-memory inode.i_blocks) - in f2fs_remount, we need to make RW in prior to dquot_resume. - handle fault_injection case during f2fs_quota_off_umount - TODO: Project quota Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-07f2fs: avoid deadlock caused by lock order of page and lock_opJaegeuk Kim
- punch_hole - fill_zero - f2fs_lock_op - get_new_data_page - lock_page - f2fs_write_data_pages - lock_page - do_write_data_page - f2fs_lock_op Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-07f2fs: relax migratepage for atomic written pageJaegeuk Kim
In order to avoid lock contention for atomic written pages, we'd better give EBUSY in f2fs_migrate_page when mode is asynchronous. We expect it will be released soon as transaction commits. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-07f2fs: skip ->writepages for {mete,node}_inode during recoveryChao Yu
Skip ->writepages in prior to ->writepage for {meta,node}_inode during recovery, hence unneeded loop in ->writepages can be avoided. Moreover, check SBI_POR_DOING earlier while writebacking pages. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-04f2fs: dax: fix races between page faults and truncating pagesQiuyang Sun
Currently in F2FS, page faults and operations that truncate the pagecahe or data blocks, are completely unsynchronized. This can result in page fault faulting in a page into a range that we are changing after truncating, and thus we can end up with a page mapped to disk blocks that will be shortly freed. Filesystem corruption will shortly follow. This patch fixes the problem by creating new rw semaphore i_mmap_sem in f2fs_inode_info and grab it for functions removing blocks from extent tree and for read over page faults. The mechanism is similar to that in ext4. Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-06-09block: switch bios to blk_status_tChristoph Hellwig
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-23f2fs: introduce io_list for serialize data/node IOsChao Yu
Serialize data/node IOs by using fifo list instead of mutex lock, it will help to enhance concurrency of f2fs, meanwhile keeping LFS IO semantics. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-23f2fs: avoid f2fs_lock_op for IPU writesJaegeuk Kim
Currently, if we do get_node_of_data before f2fs_lock_op, there may be dead lock as follows, where process A would be in infinite loop, and B will NOT be awaked. Process A(cp): Process B: f2fs_lock_all(sbi) get_dnode_of_data <---- lock dn.node_page flush_nodes f2fs_lock_op So, this patch adds f2fs_trylock_op to avoid f2fs_lock_op done by IPU. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-23f2fs: split bio cacheJaegeuk Kim
Split DATA/NODE type bio cache according to different temperature, so write IOs with the same temperature can be merged in corresponding bio cache as much as possible, otherwise, different temperature write IOs submitting into one bio cache will always cause split of bio. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-23f2fs: remove unnecessary read cases in merged IO flowJaegeuk Kim
Merged IO flow doesn't need to care about read IOs. f2fs_submit_merged_bio -> f2fs_submit_merged_write f2fs_submit_merged_bios -> f2fs_submit_merged_writes f2fs_submit_merged_bio_cond -> f2fs_submit_merged_write_cond Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-03f2fs: Make flush bios explicitely syncJan Kara
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions. Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 Cc: stable@vger.kernel.org # 4.9+ CC: Jaegeuk Kim <jaegeuk@kernel.org> CC: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-03f2fs: release cp and dnode lock before IPUHou Pengyang
We don't need to rewrite the page under cp_rwsem and dnode locks. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: introduce valid_ipu_blkaddr to clean upJaegeuk Kim
This patch introduces valid_ipu_blkaddr to clean up checking block address for inplace-update. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: lookup extent cache first under IPU scenarioHou Pengyang
If a page is cold, NOT atomit written and need_ipu now, there is a high probability that IPU should be adapted. For IPU, we try to check extent tree to get the block index first, instead of reading the dnode page, where may lead to an useless dnode IO, since no need to update the dnode index for IPU. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: reconstruct code to write a data pageHou Pengyang
This patch introduces encrypt_one_page which encrypts one data page before submit_bio, and change the use of need_inplace_update. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: fix out-of free segmentsJaegeuk Kim
This patch also reverts d0db7703ac1 ("f2fs: do SSR in higher priority"). This patch fixes out of free segments caused by many small file creation by 1) mkfs -s 1 2G 2) mount 3) untar - preoduce 60000 small files burstly 4) sync - flush node pages - flush imeta Here, when we do f2fs_balance_fs, we missed # of imeta blocks, resulting in skipping to check has_not_enough_free_secs. Another test is done by 1) mkfs -s 12 2G 2) mount 3) untar - preoduce 60000 small files burstly 4) sync - flush node pages - flush imeta In this case, this patch also fixes wrong block allocation under large section size. Reported-by: William Brana <wbrana@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-19f2fs: introduce async IPU policyHou Pengyang
This patch introduces an ASYNC IPU policy. Under senario of large # of async updating(e.g. log writing in Android), disk would be seriously fragmented, and higher frequent gc would be triggered. This patch uses IPU to rewrite the async update writting, since async is NOT sensitive to io latency. Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
2017-04-19f2fs: unlock cp_rwsem early for IPU writesChao Yu
For IPU writes, there won't be any udpates in dnode page since we will reuse old block address instead of allocating new one, so we don't need to lock cp_rwsem during IPU IO submitting. Signed-off-by: Chao Yu <yuchao0@huawei.com>
2017-04-10f2fs: fix comment on f2fs_flush_merged_bios() after 86531d6bTomohiro Kusumi
Callers are to unlock the page on failure after 86531d6b. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05f2fs: submit bio of in-place-update pagesJaegeuk Kim
This patch tries to split in-place-update bios from sequential bios. Suggested-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONEJaegeuk Kim
If two threads try to flush dirty pages in different inodes respectively, f2fs_write_data_pages() will produce WRITE and WRITE_SYNC one at a time, resulting in a lot of 4KB seperated IOs. So, this patch gives higher priority to WB_SYNC_ALL IOs and gathers write IOs with a big WRITE_SYNC'ed bio. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05f2fs: write small sized IO to hot logJaegeuk Kim
It would better split small and large IOs separately in order to get more consecutive big writes. The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-03-25f2fs: allow write page cache when writting cpYunlei He
This patch allow write data to normal file when writting new checkpoint. We relax three limitations for write_begin path: 1. data allocation 2. node allocation 3. variables in checkpoint Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-03-21f2fs: fix bad prefetchw of NULL pageKinglong Mee
For f2fs_read_data_pages, the f2fs_mpage_readpages gets "page == NULL", so that, the prefetchw(&page->flags) is operated on NULL. Fixes: f1e8866016 ("f2fs: expose f2fs_mpage_readpages") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-03-21f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointerJaegeuk Kim
When I forced to enable atomic operations intentionally, I could hit the below panic, since we didn't clear page->private in f2fs_invalidate_page called by file truncation. The panic occurs due to NULL mapping having page->private. BUG: unable to handle kernel paging request at ffffffffffffffff IP: drop_buffers+0x38/0xe0 PGD 5d00c067 PUD 5d00e067 PMD 0 CPU: 3 PID: 1648 Comm: fsstress Tainted: G D OE 4.10.0+ #5 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 task: ffff9151952863c0 task.stack: ffffaaec40db4000 RIP: 0010:drop_buffers+0x38/0xe0 RSP: 0018:ffffaaec40db74c8 EFLAGS: 00010292 Call Trace: ? page_referenced+0x8b/0x170 try_to_free_buffers+0xc5/0xe0 try_to_release_page+0x49/0x50 shrink_page_list+0x8bc/0x9f0 shrink_inactive_list+0x1dd/0x500 ? shrink_active_list+0x2c0/0x430 shrink_node_memcg+0x5eb/0x7c0 shrink_node+0xe1/0x320 do_try_to_free_pages+0xef/0x2e0 try_to_free_pages+0xe9/0x190 __alloc_pages_slowpath+0x390/0xe70 __alloc_pages_nodemask+0x291/0x2b0 alloc_pages_current+0x95/0x140 __page_cache_alloc+0xc4/0xe0 pagecache_get_page+0xab/0x2a0 grab_cache_page_write_begin+0x20/0x40 get_read_data_page+0x2e6/0x4c0 [f2fs] ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs] ? truncate_data_blocks_range+0x238/0x2b0 [f2fs] get_lock_data_page+0x30/0x190 [f2fs] __exchange_data_block+0xaaf/0xf40 [f2fs] f2fs_fallocate+0x418/0xd00 [f2fs] vfs_fallocate+0x157/0x220 SyS_fallocate+0x48/0x80 Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> [Chao Yu: use INMEM_INVALIDATE for better tracing] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-03-02sched/headers: Prepare to move signal wakeup & sigpending methods from ↵Ingo Molnar
<linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-27f2fs: fix to update F2FS_{CP_}WB_DATA count correctlyChao Yu
We should only account F2FS_{CP_}WB_DATA IOs for write path, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27f2fs: show simple call stack in fault injection messageChao Yu
Previously kernel message can show that in which function we do the injection, but unfortunately, most of the caller are the same, for tracking more information of injection path, it needs to show upper caller's name. This patch supports that ability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27f2fs: no need lock_op in f2fs_write_inline_dataYunlei He
Similar as f2fs_write_inode, f2fs_write_inline_data just mark inode page dirty, so it's no need to write inline data under read lock of cp_rwsem. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27f2fs: avoid m_flags overlay when allocating more data blocksKinglong Mee
When more than one data blocks are allocated, the F2FS_MAP_UNWRITTEN/MAPPED flags will be overlapped by F2FS_MAP_NEW at the later times. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27f2fs: init local extent_info to avoid stale stack info in tpHou Pengyang
To avoid such stale(fops, blk, len) info in f2fs_lookup_extent_tree_end tp dio-23095 [005] ...1 17878.856859: f2fs_lookup_extent_tree_end: dev = (259,30), ino = 856, pgofs = 0, ext_info(fofs: 3441207040, blk: 4294967232, len: 3481143808) Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23f2fs: do not wait for writeback in write_beginJaegeuk Kim
Otherwise we can get livelock like below. [79880.428136] dbench D 0 18405 18404 0x00000000 [79880.428139] Call Trace: [79880.428142] __schedule+0x219/0x6b0 [79880.428144] schedule+0x36/0x80 [79880.428147] schedule_timeout+0x243/0x2e0 [79880.428152] ? update_sd_lb_stats+0x16b/0x5f0 [79880.428155] ? ktime_get+0x3c/0xb0 [79880.428157] io_schedule_timeout+0xa6/0x110 [79880.428161] __lock_page+0xf7/0x130 [79880.428164] ? unlock_page+0x30/0x30 [79880.428167] pagecache_get_page+0x16b/0x250 [79880.428171] grab_cache_page_write_begin+0x20/0x40 [79880.428182] f2fs_write_begin+0xa2/0xdb0 [f2fs] [79880.428192] ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs] [79880.428197] ? kmem_cache_free+0x79/0x200 [79880.428203] ? __mark_inode_dirty+0x17f/0x360 [79880.428206] generic_perform_write+0xbb/0x190 [79880.428213] ? file_update_time+0xa4/0xf0 [79880.428217] __generic_file_write_iter+0x19b/0x1e0 [79880.428226] f2fs_file_write_iter+0x9c/0x180 [f2fs] [79880.428231] __vfs_write+0xc5/0x140 [79880.428235] vfs_write+0xb2/0x1b0 [79880.428238] SyS_write+0x46/0xa0 [79880.428242] entry_SYSCALL_64_fastpath+0x1e/0xad Fixes: cae96a5c8ab6 ("f2fs: check io submission more precisely") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23f2fs: remove preflush for nobarrier caseJaegeuk Kim
This patch removes REQ_PREFLUSH in the nobarrier case. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23f2fs: check last page index in cached bio to decide submissionJaegeuk Kim
If the cached bio has the last page's index, then we need to submit it. Otherwise, we don't need to submit it and can wait for further IO merges. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23f2fs: check io submission more preciselyJaegeuk Kim
This patch check IO submission more precisely than previous rough check. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23f2fs: call internal __write_data_page directlyJaegeuk Kim
This patch introduces __write_data_page to call it by f2fs_write_cache_pages directly.. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22f2fs: fix a dead loop in f2fs_fiemap()Wei Fang
A dead loop can be triggered in f2fs_fiemap() using the test case as below: ... fd = open(); fallocate(fd, 0, 0, 4294967296); ioctl(fd, FS_IOC_FIEMAP, fiemap_buf); ... It's caused by an overflow in __get_data_block(): ... bh->b_size = map.m_len << inode->i_blkbits; ... map.m_len is an unsigned int, and bh->b_size is a size_t which is 64 bits on 64 bits archtecture, type conversion from an unsigned int to a size_t will result in an overflow. In the above-mentioned case, bh->b_size will be zero, and f2fs_fiemap() will call get_data_block() at block 0 again an again. Fix this by adding a force conversion before left shift. Signed-off-by: Wei Fang <fangwei1@huawei.com> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22f2fs: do not preallocate blocks which has wrong bufferJaegeuk Kim
Sheng Yong reports needless preallocation if write(small_buffer, large_size) is called. In that case, f2fs preallocates large_size, but vfs returns early due to small_buffer size. Let's detect it before preallocation phase in f2fs. Reported-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22f2fs: introduce FI_ATOMIC_COMMITChao Yu
This patch introduces a new flag to indicate inode status of doing atomic write committing, so that, we can keep atomic write status for inode during atomic committing, then we can skip GCing pages of atomic write inode, that avoids random GCed datas being mixed with current transaction, so isolation of transaction can be kept. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>