summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2014-04-07mm: per-thread vma cachingDavidlohr Bueso
This patch is a continuation of efforts trying to optimize find_vma(), avoiding potentially expensive rbtree walks to locate a vma upon faults. The original approach (https://lkml.org/lkml/2013/11/1/410), where the largest vma was also cached, ended up being too specific and random, thus further comparison with other approaches were needed. There are two things to consider when dealing with this, the cache hit rate and the latency of find_vma(). Improving the hit-rate does not necessarily translate in finding the vma any faster, as the overhead of any fancy caching schemes can be too high to consider. We currently cache the last used vma for the whole address space, which provides a nice optimization, reducing the total cycles in find_vma() by up to 250%, for workloads with good locality. On the other hand, this simple scheme is pretty much useless for workloads with poor locality. Analyzing ebizzy runs shows that, no matter how many threads are running, the mmap_cache hit rate is less than 2%, and in many situations below 1%. The proposed approach is to replace this scheme with a small per-thread cache, maximizing hit rates at a very low maintenance cost. Invalidations are performed by simply bumping up a 32-bit sequence number. The only expensive operation is in the rare case of a seq number overflow, where all caches that share the same address space are flushed. Upon a miss, the proposed replacement policy is based on the page number that contains the virtual address in question. Concretely, the following results are seen on an 80 core, 8 socket x86-64 box: 1) System bootup: Most programs are single threaded, so the per-thread scheme does improve ~50% hit rate by just adding a few more slots to the cache. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 50.61% | 19.90 | | patched | 73.45% | 13.58 | +----------------+----------+------------------+ 2) Kernel build: This one is already pretty good with the current approach as we're dealing with good locality. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 75.28% | 11.03 | | patched | 88.09% | 9.31 | +----------------+----------+------------------+ 3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 70.66% | 17.14 | | patched | 91.15% | 12.57 | +----------------+----------+------------------+ 4) Ebizzy: There's a fair amount of variation from run to run, but this approach always shows nearly perfect hit rates, while baseline is just about non-existent. The amounts of cycles can fluctuate between anywhere from ~60 to ~116 for the baseline scheme, but this approach reduces it considerably. For instance, with 80 threads: +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 1.06% | 91.54 | | patched | 99.97% | 14.18 | +----------------+----------+------------------+ [akpm@linux-foundation.org: fix nommu build, per Davidlohr] [akpm@linux-foundation.org: document vmacache_valid() logic] [akpm@linux-foundation.org: attempt to untangle header files] [akpm@linux-foundation.org: add vmacache_find() BUG_ON] [hughd@google.com: add vmacache_valid_mm() (from Oleg)] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: adjust and enhance comments] Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07mm: implement ->map_pages for page cacheKirill A. Shutemov
filemap_map_pages() is generic implementation of ->map_pages() for filesystems who uses page cache. It should be safe to use filemap_map_pages() for ->map_pages() if filesystem use filemap_fault() for ->fault(). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Ning Qu <quning@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07exec: kill the unnecessary mm->def_flags setting in load_elf_binary()Alex Thorlton
load_elf_binary() sets current->mm->def_flags = def_flags and def_flags is always zero. Not only this looks strange, this is unnecessary because mm_init() has already set ->def_flags = 0. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07ntfs: logging clean-upFabian Frederick
- Convert spinlock/static array to va_format (inspired by Joe Perches help on previous logging patches). - Convert printk(KERN_ERR to pr_warn in __ntfs_warning. - Convert printk(KERN_ERR to pr_err in __ntfs_error. - Convert printk(KERN_DEBUG to pr_debug in __ntfs_debug. (Note that __ntfs_debug is still guarded by #if DEBUG) - Improve !DEBUG to parse all arguments (Joe Perches). - Sparse pr_foo() conversions in super.c NTFS, NTFS-fs prefixes as well as 'warning' and 'error' were removed : pr_foo() automatically adds module name and error level is already specified. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-06Merge tag 'nfs-for-3.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: - Stable fix for a use after free issue in the NFSv4.1 open code - Fix the SUNRPC bi-directional RPC code to account for TCP segmentation - Optimise usage of readdirplus when confronted with 'ls -l' situations - Soft mount bugfixes - NFS over RDMA bugfixes - NFSv4 close locking fixes - Various NFSv4.x client state management optimisations - Rename/unlink code cleanups" * tag 'nfs-for-3.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) nfs: pass string length to pr_notice message about readdir loops NFSv4: Fix a use-after-free problem in open() SUNRPC: rpc_restart_call/rpc_restart_call_prepare should clear task->tk_status SUNRPC: Don't let rpc_delay() clobber non-timeout errors SUNRPC: Ensure call_connect_status() deals correctly with SOFTCONN tasks SUNRPC: Ensure call_status() deals correctly with SOFTCONN tasks NFSv4: Ensure we respect soft mount timeouts during trunking discovery NFSv4: Schedule recovery if nfs40_walk_client_list() is interrupted NFS: advertise only supported callback netids SUNRPC: remove KERN_INFO from dprintk() call sites SUNRPC: Fix large reads on NFS/RDMA NFS: Clean up: revert increase in READDIR RPC buffer max size SUNRPC: Ensure that call_bind times out correctly SUNRPC: Ensure that call_connect times out correctly nfs: emit a fsnotify_nameremove call in sillyrename codepath nfs: remove synchronous rename code nfs: convert nfs_rename to use async_rename infrastructure nfs: make nfs_async_rename non-static nfs: abstract out code needed to complete a sillyrename NFSv4: Clear the open state flags if the new stateid does not match ...
2014-04-06Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "Nothing major: the stricter permissions checking for sysfs broke a staging driver; fix included. Greg KH said he'd take the patch but hadn't as the merge window opened, so it's included here to avoid breaking build" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: staging: fix up speakup kobject mode Use 'E' instead of 'X' for unsigned module taint flag. VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms. kallsyms: fix percpu vars on x86-64 with relocation. kallsyms: generalize address range checking module: LLVMLinux: Remove unused function warning from __param_check macro Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE module: remove MODULE_GENERIC_TABLE module: allow multiple calls to MODULE_DEVICE_TABLE() per module module: use pr_cont
2014-04-05nfs: pass string length to pr_notice message about readdir loopsJeff Layton
There is no guarantee that the strings in the nfs_cache_array will be NULL-terminated. In the event that we end up hitting a readdir loop, we need to ensure that we pass the warning message the length of the string. Reported-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-04-04Merge tag 'xfs-for-linus-3.15-rc1' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs update from Dave Chinner: "There are a couple of new fallocate features in this request - it was decided that it was easiest to push them through the XFS tree using topic branches and have the ext4 support be based on those branches. Hence you may see some overlap with the ext4 tree merge depending on how they including those topic branches into their tree. Other than that, there is O_TMPFILE support, some cleanups and bug fixes. The main changes in the XFS tree for 3.15-rc1 are: - O_TMPFILE support - allowing AIO+DIO writes beyond EOF - FALLOC_FL_COLLAPSE_RANGE support for fallocate syscall and XFS implementation - FALLOC_FL_ZERO_RANGE support for fallocate syscall and XFS implementation - IO verifier cleanup and rework - stack usage reduction changes - vm_map_ram NOIO context fixes to remove lockdep warings - various bug fixes and cleanups" * tag 'xfs-for-linus-3.15-rc1' of git://oss.sgi.com/xfs/xfs: (34 commits) xfs: fix directory hash ordering bug xfs: extra semi-colon breaks a condition xfs: Add support for FALLOC_FL_ZERO_RANGE fs: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate xfs: inode log reservations are still too small xfs: xfs_check_page_type buffer checks need help xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation xfs: use NOIO contexts for vm_map_ram xfs: don't leak EFSBADCRC to userspace xfs: fix directory inode iolock lockdep false positive xfs: allocate xfs_da_args to reduce stack footprint xfs: always do log forces via the workqueue xfs: modify verifiers to differentiate CRC from other errors xfs: print useful caller information in xfs_error_report xfs: add xfs_verifier_error() xfs: add helper for updating checksums on xfs_bufs xfs: add helper for verifying checksums on xfs_bufs xfs: Use defines for CRC offsets in all cases xfs: skip pointless CRC updates after verifier failures xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate ...
2014-04-04Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Major changes for 3.14 include support for the newly added ZERO_RANGE and COLLAPSE_RANGE fallocate operations, and scalability improvements in the jbd2 layer and in xattr handling when the extended attributes spill over into an external block. Other than that, the usual clean ups and minor bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits) ext4: fix premature freeing of partial clusters split across leaf blocks ext4: remove unneeded test of ret variable ext4: fix comment typo ext4: make ext4_block_zero_page_range static ext4: atomically set inode->i_flags in ext4_set_inode_flags() ext4: optimize Hurd tests when reading/writing inodes ext4: kill i_version support for Hurd-castrated file systems ext4: each filesystem creates and uses its own mb_cache fs/mbcache.c: doucple the locking of local from global data fs/mbcache.c: change block and index hash chain to hlist_bl_node ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate ext4: refactor ext4_fallocate code ext4: Update inode i_size after the preallocation ext4: fix partial cluster handling for bigalloc file systems ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents ext4: only call sync_filesystm() when remounting read-only fs: push sync_filesystem() down to the file system's remount_fs() jbd2: improve error messages for inconsistent journal heads jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() jbd2: minimize region locked by j_list_lock in journal_get_create_access() ...
2014-04-04Merge tag 'please-pull-pstore' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull pstore fixes from Tony Luck: "Series of small bug fixes for pstore" * tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: pstore: Fix memory leak when decompress using big_oops_buf pstore: Fix buffer overflow while write offset equal to buffer size pstore: Correct the max_dump_cnt clearing of ramoops pstore: Fix NULL pointer fault if get NULL prz in ramoops_get_next_prz pstore: skip zero size persistent ram buffer in traverse pstore: clarify clearing of _read_cnt in ramoops_context
2014-04-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse update from Miklos Szeredi: "This series adds cached writeback support to fuse, improving write throughput" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix "uninitialized variable" warning fuse: Turn writeback cache on fuse: Fix O_DIRECT operations vs cached writeback misorder fuse: fuse_flush() should wait on writeback fuse: Implement write_begin/write_end callbacks fuse: restructure fuse_readpage() fuse: Flush files on wb close fuse: Trust kernel i_mtime only fuse: Trust kernel i_size only fuse: Connection bit for enabling writeback fuse: Prepare to handle short reads fuse: Linking file to inode helper
2014-04-04Merge tag 'dlm-3.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set includes a couple trivial cleanups and changes recovery log messages from DEBUG to INFO" * tag 'dlm-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: use INFO for recovery messages fs: Include appropriate header file in dlm/ast.c dlm: silence a harmless use after free warning
2014-04-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs changes from Chris Mason: "This is a pretty long stream of bug fixes and performance fixes. Qu Wenruo has replaced the btrfs async threads with regular kernel workqueues. We'll keep an eye out for performance differences, but it's nice to be using more generic code for this. We still have some corruption fixes and other patches coming in for the merge window, but this batch is tested and ready to go" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (108 commits) Btrfs: fix a crash of clone with inline extents's split btrfs: fix uninit variable warning Btrfs: take into account total references when doing backref lookup Btrfs: part 2, fix incremental send's decision to delay a dir move/rename Btrfs: fix incremental send's decision to delay a dir move/rename Btrfs: remove unnecessary inode generation lookup in send Btrfs: fix race when updating existing ref head btrfs: Add trace for btrfs_workqueue alloc/destroy Btrfs: less fs tree lock contention when using autodefrag Btrfs: return EPERM when deleting a default subvolume Btrfs: add missing kfree in btrfs_destroy_workqueue Btrfs: cache extent states in defrag code path Btrfs: fix deadlock with nested trans handles Btrfs: fix possible empty list access when flushing the delalloc inodes Btrfs: split the global ordered extents mutex Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock Btrfs: reclaim delalloc metadata more aggressively Btrfs: remove unnecessary lock in may_commit_transaction() Btrfs: remove the unnecessary flush when preparing the pages Btrfs: just do dirty page flush for the inode with compression before direct IO ...
2014-04-04Merge tag 'gfs2-merge-window' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw Pull GFS2 updates from Steven Whitehouse: "One of the main highlights this time, is not the patches themselves but instead the widening contributor base. It is good to see that interest is increasing in GFS2, and I'd like to thank all the contributors to this patch set. In addition to the usual set of bug fixes and clean ups, there are patches to improve inode creation performance when xattrs are required and some improvements to the transaction code which is intended to help improve scalability after further changes in due course. Journal extent mapping is also updated to make it more efficient and again, this is a foundation for future work in this area. The maximum number of ACLs has been increased to 300 (for a 4k block size) which means that even with a few additional xattrs from selinux, everything should fit within a single fs block. There is also a patch to bring GFS2's own copy of the writepages code up to the same level as the core VFS. Eventually we may be able to merge some of this code, since it is fairly similar. The other major change this time, is bringing consistency to the printing of messages via fs_<level>, pr_<level> macros" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (29 commits) GFS2: Fix address space from page function GFS2: Fix uninitialized VFS inode in gfs2_create_inode GFS2: Fix return value in slot_get() GFS2: inline function gfs2_set_mode GFS2: Remove extraneous function gfs2_security_init GFS2: Increase the max number of ACLs GFS2: Re-add a call to log_flush_wait when flushing the journal GFS2: Ensure workqueue is scheduled after noexp request GFS2: check NULL return value in gfs2_ok_to_move GFS2: Convert gfs2_lm_withdraw to use fs_err GFS2: Use fs_<level> more often GFS2: Use pr_<level> more consistently GFS2: Move recovery variables to journal structure in memory GFS2: global conversion to pr_foo() GFS2: return -E2BIG if hit the maximum limits of ACLs GFS2: Clean up journal extent mapping GFS2: replace kmalloc - __vmalloc / memset 0 GFS2: Remove extra "if" in gfs2_log_flush() fs: NULL dereference in posix_acl_to_xattr() GFS2: Move log buffer accounting to transaction ...
2014-04-04Merge branch 'locks-3.15' of git://git.samba.org/jlayton/linuxLinus Torvalds
Pull file locking updates from Jeff Layton: "Highlights: - maintainership change for fs/locks.c. Willy's not interested in maintaining it these days, and is OK with Bruce and I taking it. - fix for open vs setlease race that Al ID'ed - cleanup and consolidation of file locking code - eliminate unneeded BUG() call - merge of file-private lock implementation" * 'locks-3.15' of git://git.samba.org/jlayton/linux: locks: make locks_mandatory_area check for file-private locks locks: fix locks_mandatory_locked to respect file-private locks locks: require that flock->l_pid be set to 0 for file-private locks locks: add new fcntl cmd values for handling file private locks locks: skip deadlock detection on FL_FILE_PVT locks locks: pass the cmd value to fcntl_getlk/getlk64 locks: report l_pid as -1 for FL_FILE_PVT locks locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT" locks: rename locks_remove_flock to locks_remove_file locks: consolidate checks for compatible filp->f_mode values in setlk handlers locks: fix posix lock range overflow handling locks: eliminate BUG() call when there's an unexpected lock on file close locks: add __acquires and __releases annotations to locks_start and locks_stop locks: remove "inline" qualifier from fl_link manipulation functions locks: clean up comment typo locks: close potential race between setlease and open MAINTAINERS: update entry for fs/locks.c
2014-04-04Merge branch 'cross-rename' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull renameat2 system call from Miklos Szeredi: "This adds a new syscall, renameat2(), which is the same as renameat() but with a flags argument. The purpose of extending rename is to add cross-rename, a symmetric variant of rename, which exchanges the two files. This allows interesting things, which were not possible before, for example atomically replacing a directory tree with a symlink, etc... This also allows overlayfs and friends to operate on whiteouts atomically. Andy Lutomirski also suggested a "noreplace" flag, which disables the overwriting behavior of rename. These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only implemented for ext4 as an example and for testing" * 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ext4: add cross rename support ext4: rename: split out helper functions ext4: rename: move EMLINK check up ext4: rename: create ext4_renament structure for local vars vfs: add cross-rename vfs: lock_two_nondirectories: allow directory args security: add flags to rename hooks vfs: add RENAME_NOREPLACE flag vfs: add renameat2 syscall vfs: rename: use common code for dir and non-dir vfs: rename: move d_move() up vfs: add d_is_dir()
2014-04-03Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge first patch-bomb from Andrew Morton: - Various misc bits - kmemleak fixes - small befs, codafs, cifs, efs, freexxfs, hfsplus, minixfs, reiserfs things - fanotify - I appear to have become SuperH maintainer - ocfs2 updates - direct-io tweaks - a bit of the MM queue - printk updates - MAINTAINERS maintenance - some backlight things - lib/ updates - checkpatch updates - the rtc queue - nilfs2 updates - Small Documentation/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (237 commits) Documentation/SubmittingPatches: remove references to patch-scripts Documentation/SubmittingPatches: update some dead URLs Documentation/filesystems/ntfs.txt: remove changelog reference Documentation/kmemleak.txt: updates fs/reiserfs/super.c: add __init to init_inodecache fs/reiserfs: move prototype declaration to header file fs/hfsplus/attributes.c: add __init to hfsplus_create_attr_tree_cache() fs/hfsplus/extents.c: fix concurrent acess of alloc_blocks fs/hfsplus/extents.c: remove unused variable in hfsplus_get_block nilfs2: update project's web site in nilfs2.txt nilfs2: update MAINTAINERS file entries fix nilfs2: verify metadata sizes read from disk nilfs2: add FITRIM ioctl support for nilfs2 nilfs2: add nilfs_sufile_trim_fs to trim clean segs nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl nilfs2: add nilfs_sufile_set_suinfo to update segment usage nilfs2: add struct nilfs_suinfo_update and flags nilfs2: update MAINTAINERS file entries fs/coda/inode.c: add __init to init_inodecache() BEFS: logging cleanup ...
2014-04-03fs/reiserfs/super.c: add __init to init_inodecacheFabian Frederick
init_inodecache is only called by __init init_reiserfs_fs. Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs/reiserfs: move prototype declaration to header fileRashika Kheria
Move prototype declaration to header file reiserfs/reiserfs.h from reiserfs/super.c because they are used by more than one file. This eliminates the following warning in reiserfs/bitmap.c: fs/reiserfs/bitmap.c:647:6: warning: no previous prototype for `show_alloc_options' [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs/hfsplus/attributes.c: add __init to hfsplus_create_attr_tree_cache()Fabian Frederick
hfsplus_create_attr_tree_cache is only called by __init init_hfsplus_fs Signed-off-by: Fabian Frederick <fabf@skynet.be> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs/hfsplus/extents.c: fix concurrent acess of alloc_blocksSougata Santra
Concurrent access to alloc_blocks in hfsplus_inode_info() is protected by extents_lock mutex. This patch fixes two instances where alloc_blocks modification was not protected with this lock. This fixes possible allocation bitmap corruption in race conditions while extending and truncating files. [akpm@linux-foundation.org: take extents_lock before taking a copy of ->alloc_blocks] [akpm@linux-foundation.org: remove now-unused label `out'] Signed-off-by: Sougata Santra <sougata@tuxera.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs/hfsplus/extents.c: remove unused variable in hfsplus_get_blockSougata Santra
The variable is defined but not used. Generally it compiles away with -O2 optimization hence it does not show a warning. Signed-off-by: Sougata Santra <sougata@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03nilfs2: verify metadata sizes read from diskRyusuke Konishi
Add code to check sizes of on-disk data of metadata files such as inode size, segment usage size, DAT entry size, and checkpoint size. Although these sizes are read from disk, the current implementation doesn't check them. If these sizes are not sane on disk, it can cause out-of-range access to metadata or memory access overrun on metadata block buffers due to overflow in sundry calculations. Both lower limit and upper limit of metadata sizes are verified to prevent these issues. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03nilfs2: add FITRIM ioctl support for nilfs2Andreas Rohner
Add support for the FITRIM ioctl, which enables user space tools to issue TRIM/DISCARD requests to the underlying device. Every clean segment within the specified range will be discarded. Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03nilfs2: add nilfs_sufile_trim_fs to trim clean segsAndreas Rohner
Add nilfs_sufile_trim_fs(), which takes an fstrim_range structure and calls blkdev_issue_discard for every clean segment in the specified range. The range is truncated to file system block boundaries. Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctlAndreas Rohner
With this ioctl the segment usage entries in the SUFILE can be updated from userspace. This is useful, because it allows the userspace GC to modify and update segment usage entries for specific segments, which enables it to avoid unnecessary write operations. If a segment needs to be cleaned, but there is no or very little reclaimable space in it, the cleaning operation basically degrades to a useless moving operation. In the end the only thing that changes is the location of the data and a timestamp in the segment usage information. With this ioctl the GC can skip the cleaning and update the segment usage entries directly instead. This is basically a shortcut to cleaning the segment. It is still necessary to read the segment summary information, but the writing of the live blocks can be skipped if it's not worth it. [konishi.ryusuke@lab.ntt.co.jp: add description of NILFS_IOCTL_SET_SUINFO ioctl] Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03nilfs2: add nilfs_sufile_set_suinfo to update segment usageAndreas Rohner
Introduce nilfs_sufile_set_suinfo(), which expects an array of nilfs_suinfo_update structures and updates the segment usage information accordingly. This is basically a helper function for the newly introduced NILFS_IOCTL_SET_SUINFO ioctl. [konishi.ryusuke@lab.ntt.co.jp: use put_bh() instead of brelse() because we know bh != NULL] Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs/coda/inode.c: add __init to init_inodecache()Fabian Frederick
init_inodecache is only called by __init init_coda Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03BEFS: logging cleanupFabian Frederick
Summary: - all printk(KERN_foo converted to pr_foo() - add pr_fmt and remove redundant prefixes - convert befs_() to va_format (based on patch by Joe Perches) - remove non standard %Lu - use __func__ for all debugging [akpm@linux-foundation.org: fix printk warnings, reported by Fengguang] Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Joe Perches <joe@perches.com> Cc: Fengguang Wu <fengguang.wu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs/befs/linuxvfs.c: add __init to befs_init_inodecache()Fabian Frederick
init_inodecache is only called by __init init_befs_fs. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03befs: replace kmalloc/memset 0 by kzallocFabian Frederick
Use kzalloc for clean fs_info allocation like other filesystems. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs/minix/inode.c: add __init to init_inodecache()Fabian Frederick
init_inodecache is only called by __init init_minix_fs. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03binfmt_misc: add missing 'break' statementLuis Henriques
A missing 'break' statement in bm_status_write() results in a user program receiving '3' when doing the following: write(fd, "-1", 2); Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs/efs/super.c: add __init to init_inodecache()Fabian Frederick
init_inodecache is only called by __init init_efs_fs. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs, kernel: permit disabling the uselib syscallJosh Triplett
uselib hasn't been used since libc5; glibc does not use it. Support turning it off. When disabled, also omit the load_elf_library implementation from binfmt_elf.c, which only uselib invokes. bloat-o-meter: add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785) function old new delta padzero 39 36 -3 uselib_flags 20 - -20 sys_uselib 168 - -168 SyS_uselib 168 - -168 load_elf_library 426 - -426 The new CONFIG_USELIB defaults to `y'. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03kernel/groups.c: remove return value of set_groupsWang YanQing
After commit 6307f8fee295 ("security: remove dead hook task_setgroups"), set_groups will always return zero, so we could just remove return value of set_groups. This patch reduces code size, and simplfies code to use set_groups, because we don't need to check its return value any more. [akpm@linux-foundation.org: remove obsolete claims from set_groups() comment] Signed-off-by: Wang YanQing <udknight@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03sys_sysfs: Add CONFIG_SYSFS_SYSCALLFabian Frederick
sys_sysfs is an obsolete system call no longer supported by libc. - This patch adds a default CONFIG_SYSFS_SYSCALL=y - Option can be turned off in expert mode. - cond_syscall added to kernel/sys_ni.c [akpm@linux-foundation.org: tweak Kconfig help text] Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03drop_caches: add some documentation and info messageDave Hansen
There is plenty of anecdotal evidence and a load of blog posts suggesting that using "drop_caches" periodically keeps your system running in "tip top shape". Perhaps adding some kernel documentation will increase the amount of accurate data on its use. If we are not shrinking caches effectively, then we have real bugs. Using drop_caches will simply mask the bugs and make them harder to find, but certainly does not fix them, nor is it an appropriate "workaround" to limit the size of the caches. On the contrary, there have been bug reports on issues that turned out to be misguided use of cache dropping. Dropping caches is a very drastic and disruptive operation that is good for debugging and running tests, but if it creates bug reports from production use, kernel developers should be aware of its use. Add a bit more documentation about it, a syslog message to track down abusers, and vmstat drop counters to help analyze problem reports. [akpm@linux-foundation.org: checkpatch fixes] [hannes@cmpxchg.org: add runtime suppression control] Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03mm: remove read_cache_page_async()Sasha Levin
This patch removes read_cache_page_async() which wasn't really needed anywhere and simplifies the code around it a bit. read_cache_page_async() is useful when we want to read a page into the cache without waiting for it to complete. This happens when the appropriate callback 'filler' doesn't complete its read operation and releases the page lock immediately, and instead queues a different completion routine to do that. This never actually happened anywhere in the code. read_cache_page_async() had 3 different callers: - read_cache_page() which is the sync version, it would just wait for the requested read to complete using wait_on_page_read(). - JFFS2 would call it from jffs2_gc_fetch_page(), but the filler function it supplied doesn't do any async reads, and would complete before the filler function returns - making it actually a sync read. - CRAMFS would call it using the read_mapping_page_async() wrapper, with a similar story to JFFS2 - the filler function doesn't do anything that reminds async reads and would always complete before the filler function returns. To sum it up, the code in mm/filemap.c never took advantage of having read_cache_page_async(). While there are filler callbacks that do async reads (such as the block one), we always called it with the read_cache_page(). This patch adds a mandatory wait for read to complete when adding a new page to the cache, and removes read_cache_page_async() and its wrappers. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03mm + fs: store shadow entries in page cacheJohannes Weiner
Reclaim will be leaving shadow entries in the page cache radix tree upon evicting the real page. As those pages are found from the LRU, an iput() can lead to the inode being freed concurrently. At this point, reclaim must no longer install shadow pages because the inode freeing code needs to ensure the page tree is really empty. Add an address_space flag, AS_EXITING, that the inode freeing code sets under the tree lock before doing the final truncate. Reclaim will check for this flag before installing shadow pages. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03mm + fs: prepare for non-page entries in page cache radix treesJohannes Weiner
shmem mappings already contain exceptional entries where swap slot information is remembered. To be able to store eviction information for regular page cache, prepare every site dealing with the radix trees directly to handle entries other than pages. The common lookup functions will filter out non-page entries and return NULL for page cache holes, just as before. But provide a raw version of the API which returns non-page entries as well, and switch shmem over to use it. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03mm: filemap: move radix tree hole searching hereJohannes Weiner
The radix tree hole searching code is only used for page cache, for example the readahead code trying to get a a picture of the area surrounding a fault. It sufficed to rely on the radix tree definition of holes, which is "empty tree slot". But this is about to change, though, as shadow page descriptors will be stored in the page cache after the actual pages get evicted from memory. Move the functions over to mm/filemap.c and make them native page cache operations, where they can later be adapted to handle the new definition of "page cache hole". Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs: cachefiles: use add_to_page_cache_lru()Johannes Weiner
This code used to have its own lru cache pagevec up until a0b8cab3 ("mm: remove lru parameter from __pagevec_lru_add and remove parts of pagevec API"). Now it's just add_to_page_cache() followed by lru_cache_add(), might as well use add_to_page_cache_lru() directly. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03mm, hugetlb: unify region structure handlingJoonsoo Kim
Currently, to track reserved and allocated regions, we use two different ways, depending on the mapping. For MAP_SHARED, we use address_mapping's private_list and, while for MAP_PRIVATE, we use a resv_map. Now, we are preparing to change a coarse grained lock which protect a region structure to fine grained lock, and this difference hinder it. So, before changing it, unify region structure handling, consistently using a resv_map regardless of the kind of mapping. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs/direct-io.c: remove some left over checksDan Carpenter
We know that "ret > 0" is true here. These tests were left over from commit 02afc27faec9 ('direct-io: Handle O_(D)SYNC AIO') and aren't needed any more. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs/direct-io.c: remove redundant comparisonGu Zheng
The return value of bio_get_nr_vecs() cannot be bigger than BIO_MAX_PAGES, so we can remove redundant the comparison between nr_pages and BIO_MAX_PAGES. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03ocfs2: pass "new" parameter to ocfs2_init_xattr_bucketWengang Wang
This patch fixes the following crash: kernel BUG at fs/ocfs2/uptodate.c:530! Modules linked in: ocfs2(F) ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bridge xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd sunrpc 8021q garp stp llc bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iTCO_wdt iTCO_vendor_support dcdbas coretemp freq_table mperf microcode pcspkr serio_raw bnx2 lpc_ich mfd_core i5k_amb i5000_edac edac_core e1000e sg shpchp ext4(F) jbd2(F) mbcache(F) dm_round_robin(F) sr_mod(F) cdrom(F) usb_storage(F) sd_mod(F) crc_t10dif(F) pata_acpi(F) ata_generic(F) ata_piix(F) mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F) radeon(F) ttm(F) drm_kms_helper(F) drm(F) hwmon(F) i2c_algo_bit(F) i2c_core(F) dm_multipath(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) CPU 5 Pid: 21303, comm: xattr-test Tainted: GF W 3.8.13-30.el6uek.x86_64 #2 Dell Inc. PowerEdge 1950/0M788G RIP: ocfs2_set_new_buffer_uptodate+0x51/0x60 [ocfs2] Process xattr-test (pid: 21303, threadinfo ffff880017aca000, task ffff880016a2c480) Call Trace: ocfs2_init_xattr_bucket+0x8a/0x120 [ocfs2] ocfs2_cp_xattr_bucket+0xbb/0x1b0 [ocfs2] ocfs2_extend_xattr_bucket+0x20a/0x2f0 [ocfs2] ocfs2_add_new_xattr_bucket+0x23e/0x4b0 [ocfs2] ocfs2_xattr_set_entry_index_block+0x13c/0x3d0 [ocfs2] ocfs2_xattr_block_set+0xf9/0x220 [ocfs2] __ocfs2_xattr_set_handle+0x118/0x710 [ocfs2] ocfs2_xattr_set+0x691/0x880 [ocfs2] ocfs2_xattr_user_set+0x46/0x50 [ocfs2] generic_setxattr+0x96/0xa0 __vfs_setxattr_noperm+0x7b/0x170 vfs_setxattr+0xbc/0xc0 setxattr+0xde/0x230 sys_fsetxattr+0xc6/0xf0 system_call_fastpath+0x16/0x1b Code: 41 80 0c 24 01 48 89 df e8 7d f0 ff ff 4c 89 e6 48 89 df e8 a2 fe ff ff 48 89 df e8 3a f0 ff ff 48 8b 1c 24 4c 8b 64 24 08 c9 c3 <0f> 0b eb fe 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 66 66 RIP ocfs2_set_new_buffer_uptodate+0x51/0x60 [ocfs2] It hit the BUG_ON() in ocfs2_set_new_buffer_uptodate(): void ocfs2_set_new_buffer_uptodate(struct ocfs2_caching_info *ci, struct buffer_head *bh) { /* This should definitely *not* exist in our cache */ if (ocfs2_buffer_cached(ci, bh)) printk(KERN_ERR "bh->b_blocknr: %lu @ %p\n", bh->b_blocknr, bh); BUG_ON(ocfs2_buffer_cached(ci, bh)); set_buffer_uptodate(bh); ocfs2_metadata_cache_io_lock(ci); ocfs2_set_buffer_uptodate(ci, bh); ocfs2_metadata_cache_io_unlock(ci); } The problem here is: We cached a block, but the buffer_head got reused. When we are to pick up this block again, a new buffer_head created with UPTODATE flag cleared. ocfs2_buffer_uptodate() returned false since no UPTODATE is set on the buffer_head. so we set this block to cache as a NEW block, then it failed at asserting block is not in cache. The fix is to add a new parameter indicating the bucket is a new allocated or not to ocfs2_init_xattr_bucket(). ocfs2_init_xattr_bucket() assert block not cached accordingly. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03ocfs2: avoid system inode ref confusion by adding mutex lockjiangyiwen
The following case may lead to the same system inode ref in confusion. A thread B thread ocfs2_get_system_file_inode ->get_local_system_inode ->_ocfs2_get_system_file_inode because of *arr == NULL, ocfs2_get_system_file_inode ->get_local_system_inode ->_ocfs2_get_system_file_inode gets first ref thru _ocfs2_get_system_file_inode, gets second ref thru igrab and set *arr = inode at the moment, B thread also gets two refs, so lead to one more inode ref. So add mutex lock to avoid multi thread set two inode ref once at the same time. Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03ocfs2: iput inode alloc when failed locallyjiangyiwen
In ocfs2_info_handle_freeinode() and ocfs2_test_inode_bit() func, after calls ocfs2_get_system_file_inode() to get inode ref, if calls ocfs2_info_scan_inode_alloc() or ocfs2_inode_lock() failed, we should iput inode alloc to avoid leaking the inode. Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03ocfs2/o2net: o2net_listen_data_ready should do nothing if socket state is ↵Tariq Saeed
not TCP_LISTEN Orabug: 17330860 When accepting an incomming connection o2net_accept_one clones a child data socket from the parent listening socket. It then proceeds to setup the child with callback o2net_data_ready() and sk_user_data to NULL. If data arrives in this window, o2net_listen_data_ready will be called with some non-deterministic value in sk_user_data (not inherited). We panic when we page fault on sk_user_data -- in parent it is sock_def_readable(). The fix is to recognize that this is a data socket being set up by looking at the socket state and do nothing. Signed-off-by: Tariq Saseed <tariq.x.saeed@oracle.com> Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>