summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-01-11Merge branch 'work.xattr' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs xattr updates from Al Viro: "Andreas' xattr cleanup series. It's a followup to his xattr work that went in last cycle; -0.5KLoC" * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: xattr handlers: Simplify list operation ocfs2: Replace list xattr handler operations nfs: Move call to security_inode_listsecurity into nfs_listxattr xfs: Change how listxattr generates synthetic attributes tmpfs: listxattr should include POSIX ACL xattrs tmpfs: Use xattr handler infrastructure btrfs: Use xattr handler infrastructure vfs: Distinguish between full xattr names and proper prefixes posix acls: Remove duplicate xattr name definitions gfs2: Remove gfs2_xattr_acl_chmod vfs: Remove vfs_xattr_cmp
2016-01-11Merge branch 'work.symlinks' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs RCU symlink updates from Al Viro: "Replacement of ->follow_link/->put_link, allowing to stay in RCU mode even if the symlink is not an embedded one. No changes since the mailbomb on Jan 1" * 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->get_link() to delayed_call, kill ->put_link() kill free_page_put_link() teach nfs_get_link() to work in RCU mode teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode teach shmem_get_link() to work in RCU mode teach page_get_link() to work in RCU mode replace ->follow_link() with new method that could stay in RCU mode don't put symlink bodies in pagecache into highmem namei: page_getlink() and page_follow_link_light() are the same thing ufs: get rid of ->setattr() for symlinks udf: don't duplicate page_symlink_inode_operations logfs: don't duplicate page_symlink_inode_operations switch befs long symlinks to page_symlink_operations
2016-01-08compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS)Jann Horn
This replaces all code in fs/compat_ioctl.c that translated ioctl arguments into a in-kernel structure, then performed do_ioctl under set_fs(KERNEL_DS), with code that allocates data on the user stack and can call the VFS ioctl handler under USER_DS. This is done as a hardening measure because the caller does not know what kind of ioctl handler will be invoked, only that no corresponding compat_ioctl handler exists and what the ioctl command number is. The accidental invocation of an unlocked_ioctl handler that unexpectedly calls copy_to_user could be a severe security issue. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08compat_ioctl: don't pass fd around when not neededAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08compat_ioctl: don't look up the fd twiceJann Horn
In code in fs/compat_ioctl.c that translates ioctl arguments into a in-kernel structure, then performs sys_ioctl, possibly under set_fs(KERNEL_DS), this commit changes the sys_ioctl calls to do_ioctl calls. do_ioctl is a new function that does the same thing as sys_ioctl, but doesn't look up the fd again. This change is made to avoid (potential) security issues because of ioctl handlers that accept one of the ioctl commands I2C_FUNCS, VIDEO_GET_EVENT, MTIOCPOS, MTIOCGET, TIOCGSERIAL, TIOCSSERIAL, RTC_IRQP_READ, RTC_EPOCH_READ. This can happen for multiple reasons: - The ioctl command number could be reused. - The ioctl handler might not check the full ioctl command. This is e.g. true for drm_ioctl. - The ioctl handler is very special, e.g. cuse_file_ioctl The real issue is that set_fs(KERNEL_DS) is used here, but that's fixed in a separate commit "compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS)". This change mitigates potential security issues by preventing a race that permits invocation of unlocked_ioctl handlers under KERNEL_DS through compat code even if a corresponding compat_ioctl handler exists. So far, no way has been identified to use this to damage kernel memory without having CAP_SYS_ADMIN in the init ns (with the capability, doing reads/writes at arbitrary kernel addresses should be easy through CUSE's ioctl handler with FUSE_IOCTL_UNRESTRICTED set). [AV: two missed sys_ioctl() taken care of] Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30switch ->get_link() to delayed_call, kill ->put_link()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-29ocfs2/dlm: clear migration_pending when migration target goes downxuejiufei
We have found a BUG on res->migration_pending when migrating lock resources. The situation is as follows. dlm_mark_lockres_migration res->migration_pending = 1; __dlm_lockres_reserve_ast dlm_lockres_release_ast returns with res->migration_pending remains because other threads reserve asts wait dlm_migration_can_proceed returns 1 >>>>>>> o2hb found that target goes down and remove target from domain_map dlm_migration_can_proceed returns 1 dlm_mark_lockres_migrating returns -ESHOTDOWN with res->migration_pending still remains. When reentering dlm_mark_lockres_migrating(), it will trigger the BUG_ON with res->migration_pending. So clear migration_pending when target is down. Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-29ocfs2: fix flock panic issueJunxiao Bi
Commit 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()") move flock/posix lock indentify code to locks_lock_inode_wait(), but missed to set fl_flags to FL_FLOCK which caused the following kernel panic on 4.4.0_rc5. kernel BUG at fs/locks.c:1895! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2(O) ocfs2_dlmfs(O) ocfs2_stack_o2cb(O) ocfs2_dlm(O) ocfs2_nodemanager(O) ocfs2_stackglue(O) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xen_kbdfront xen_netfront xen_fbfront xen_blkfront CPU: 0 PID: 20268 Comm: flock_unit_test Tainted: G O 4.4.0-rc5-next-20151217 #1 Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014 task: ffff88007b3672c0 ti: ffff880028b58000 task.ti: ffff880028b58000 RIP: locks_lock_inode_wait+0x2e/0x160 Call Trace: ocfs2_do_flock+0x91/0x160 [ocfs2] ocfs2_flock+0x76/0xd0 [ocfs2] SyS_flock+0x10f/0x1a0 entry_SYSCALL_64_fastpath+0x12/0x71 Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 89 f3 48 81 ec 88 00 00 00 8b 46 40 83 e0 03 83 f8 01 0f 84 ad 00 00 00 83 f8 02 74 04 <0f> 0b eb fe 4c 8d ad 60 ff ff ff 4c 8d 7b 58 e8 0e 8e 73 00 4d RIP locks_lock_inode_wait+0x2e/0x160 RSP <ffff880028b5bce8> ---[ end trace dfca74ec9b5b274c ]--- Fixes: 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-29ocfs2: fix BUG when calculate new backup superJoseph Qi
When resizing, it firstly extends the last gd. Once it should backup super in the gd, it calculates new backup super and update the corresponding value. But it currently doesn't consider the situation that the backup super is already done. And in this case, it still sets the bit in gd bitmap and then decrease from bg_free_bits_count, which leads to a corrupted gd and trigger the BUG in ocfs2_block_group_set_bits: BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits); So check whether the backup super is done and then do the updates. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-29kill free_page_put_link()Al Viro
all callers are better off with kfree_put_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-22Merge tag 'nfsd-4.4-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fix from Bruce Fields: "Just one fix for a NFSv4 callback bug introduced in 4.4" * tag 'nfsd-4.4-1' of git://linux-nfs.org/~bfields/linux: nfsd: don't hold ls_mutex across a layout recall
2015-12-18Merge branch 'for-linus-4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "A couple of small fixes" * 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: check prepare_uptodate_page() error code earlier Btrfs: check for empty bitmap list in setup_cluster_bitmaps btrfs: fix misleading warning when space cache failed to load Btrfs: fix transaction handle leak in balance Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list
2015-12-18proc: fix -ESRCH error when writing to /proc/$pid/coredump_filterColin Ian King
Writing to /proc/$pid/coredump_filter always returns -ESRCH because commit 774636e19ed51 ("proc: convert to kstrto*()/kstrto*_from_user()") removed the setting of ret after the get_proc_task call and incorrectly left it as -ESRCH. Instead, return 0 when successful. Example breakage: echo 0 > /proc/self/coredump_filter bash: echo: write error: No such process Fixes: 774636e19ed51 ("proc: convert to kstrto*()/kstrto*_from_user()") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> [4.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-16nfsd: don't hold ls_mutex across a layout recallJeff Layton
We do need to serialize layout stateid morphing operations, but we currently hold the ls_mutex across a layout recall which is pretty ugly. It's also unnecessary -- once we've bumped the seqid and copied it, we don't need to serialize the rest of the CB_LAYOUTRECALL vs. anything else. Just drop the mutex once the copy is done. This was causing a "workqueue leaked lock or atomic" warning and an occasional deadlock. There's more work to be done here but this fixes the immediate regression. Fixes: cc8a55320b5f "nfsd: serialize layout stateid morphing operations" Cc: stable@vger.kernel.org Reported-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-12-15Merge branch 'for-chris-4.4' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.4
2015-12-15Btrfs: check prepare_uptodate_page() error code earlierChris Mason
prepare_pages() may end up calling prepare_uptodate_page() twice if our write only spans a single page. But if the first call returns an error, our page will be unlocked and its not safe to call it again. This bug goes all the way back to 2011, and it's not something commonly hit. While we're here, add a more explicit check for the page being truncated away. The bare lock_page() alone is protected only by good thoughts and i_mutex, which we're sure to regret eventually. Reported-by: Dave Jones <dsj@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-12-15Btrfs: check for empty bitmap list in setup_cluster_bitmapsChris Mason
Dave Jones found a warning from kasan in setup_cluster_bitmaps() ================================================================== BUG: KASAN: stack-out-of-bounds in setup_cluster_bitmap+0xc4/0x5a0 at addr ffff88039bef6828 Read of size 8 by task nfsd/1009 page:ffffea000e6fbd80 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x8000000000000000() page dumped because: kasan: bad access detected CPU: 1 PID: 1009 Comm: nfsd Tainted: G W 4.4.0-rc3-backup-debug+ #1 ffff880065647b50 000000006bb712c2 ffff88039bef6640 ffffffffa680a43e 0000004559c00000 ffff88039bef66c8 ffffffffa62638d1 ffffffffa61121c0 ffff8803a5769de8 0000000000000296 ffff8803a5769df0 0000000000046280 Call Trace: [<ffffffffa680a43e>] dump_stack+0x4b/0x6d [<ffffffffa62638d1>] kasan_report_error+0x501/0x520 [<ffffffffa61121c0>] ? debug_show_all_locks+0x1e0/0x1e0 [<ffffffffa6263948>] kasan_report+0x58/0x60 [<ffffffffa6814b00>] ? rb_last+0x10/0x40 [<ffffffffa66f8af4>] ? setup_cluster_bitmap+0xc4/0x5a0 [<ffffffffa6262ead>] __asan_load8+0x5d/0x70 [<ffffffffa66f8af4>] setup_cluster_bitmap+0xc4/0x5a0 [<ffffffffa66f675a>] ? setup_cluster_no_bitmap+0x6a/0x400 [<ffffffffa66fcd16>] btrfs_find_space_cluster+0x4b6/0x640 [<ffffffffa66fc860>] ? btrfs_alloc_from_cluster+0x4e0/0x4e0 [<ffffffffa66fc36e>] ? btrfs_return_cluster_to_free_space+0x9e/0xb0 [<ffffffffa702dc37>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffa666a1a1>] find_free_extent+0xba1/0x1520 Andrey noticed this was because we were doing list_first_entry on a list that might be empty. Rework the tests a bit so we don't do that. Signed-off-by: Chris Mason <clm@fb.com> Reprorted-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Dave Jones <dsj@fb.com>
2015-12-13xattr handlers: Simplify list operationAndreas Gruenbacher
Change the list operation to only return whether or not an attribute should be listed. Copying the attribute names into the buffer is moved to the callers. Since the result only depends on the dentry and not on the attribute name, we do not pass the attribute name to list operations. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-13ocfs2: Replace list xattr handler operationsAndreas Gruenbacher
The list operations of the ocfs2 xattr handlers were never called anywhere. Remove them and directly check in ocfs2_xattr_list_entry which attributes should be skipped over instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: ocfs2-devel@oss.oracle.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-13nfs: Move call to security_inode_listsecurity into nfs_listxattrAndreas Gruenbacher
Add a nfs_listxattr operation. Move the call to security_inode_listsecurity from list operation of the "security.*" xattr handler to nfs_listxattr. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-13sched/wait: Fix the signal handling fixPeter Zijlstra
Jan Stancek reported that I wrecked things for him by fixing things for Vladimir :/ His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which should not be possible, however my previous patch made this possible by unconditionally checking signal_pending(). We cannot use current->state as was done previously, because the instruction after the store to that variable it can be changed. We must instead pass the initial state along and use that. Fixes: 68985633bccb ("sched/wait: Fix signal handling in bit wait helpers") Reported-by: Jan Stancek <jstancek@redhat.com> Reported-by: Chris Mason <clm@fb.com> Tested-by: Jan Stancek <jstancek@redhat.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Chris Mason <clm@fb.com> Reviewed-by: Paul Turner <pjt@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: tglx@linutronix.de Cc: Oleg Nesterov <oleg@redhat.com> Cc: hpa@zytor.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-13Merge tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfix from Trond Myklebust: "SUNRPC: Fix a NFSv4.1 callback channel regression" * tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Fix callback channel
2015-12-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "17 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MIPS: fix DMA contiguous allocation sh64: fix __NR_fgetxattr ocfs2: fix SGID not inherited issue mm/oom_kill.c: avoid attempting to kill init sharing same memory drivers/base/memory.c: prohibit offlining of memory blocks with missing sections tmpfs: fix shmem_evict_inode() warnings on i_blocks mm/hugetlb.c: fix resv map memory leak for placeholder entries mm: hugetlb: call huge_pte_alloc() only if ptep is null kernel: remove stop_machine() Kconfig dependency mm: kmemleak: mark kmemleak_init prototype as __init mm: fix kerneldoc on mem_cgroup_replace_page osd fs: __r4w_get_page rely on PageUptodate for uptodate MAINTAINERS: make Vladimir co-maintainer of the memory controller mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo memcg: fix memory.high target mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
2015-12-12Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "A set of fixes for the current series. This contains: - A bunch of fixes for lightnvm, should be the last round for this series. From Matias and Wenwei. - A writeback detach inode fix from Ilya, also marked for stable. - A block (though it says SCSI) fix for an OOPS in SCSI runtime power management. - Module init error path fixes for null_blk from Minfei" * 'for-linus' of git://git.kernel.dk/linux-block: null_blk: Fix error path in module initialization lightnvm: do not compile in debugging by default lightnvm: prevent gennvm module unload on use lightnvm: fix media mgr registration lightnvm: replace req queue with nvmdev for lld lightnvm: comments on constants lightnvm: check mm before use lightnvm: refactor spin_unlock in gennvm_get_blk lightnvm: put blks when luns configure failed lightnvm: use flags in rrpc_get_blk block: detach bdev inode from its wb in __blkdev_put() SCSI: Fix NULL pointer dereference in runtime PM
2015-12-12ocfs2: fix SGID not inherited issueJunxiao Bi
Commit 8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an issue, SGID of sub dir was not inherited from its parents dir. It is because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but is overwritten by "mode" which don't have SGID set later. Fixes: 8f1eb48758aa ("ocfs2: fix umask ignored issue") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12osd fs: __r4w_get_page rely on PageUptodate for uptodateHugh Dickins
Commit 42cb14b110a5 ("mm: migrate dirty page without clear_page_dirty_for_io etc") simplified the migration of a PageDirty pagecache page: one stat needs moving from zone to zone and that's about all. It's convenient and safest for it to shift the PageDirty bit from old page to new, just before updating the zone stats: before copying data and marking the new PageUptodate. This is all done while both pages are isolated and locked, just as before; and just as before, there's a moment when the new page is visible in the radix_tree, but not yet PageUptodate. What's new is that it may now be briefly visible as PageDirty before it is PageUptodate. When I scoured the tree to see if this could cause a problem anywhere, the only places I found were in two similar functions __r4w_get_page(): which look up a page with find_get_page() (not using page lock), then claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate. I'm not sure whether that was right before, but now it might be wrong (on rare occasions): only claim the page is uptodate if PageUptodate. Or perhaps the page in question could never be migratable anyway? Signed-off-by: Hugh Dickins <hughd@google.com> Tested-by: Boaz Harrosh <ooo@electrozaur.com> Cc: Benny Halevy <bhalevy@panasas.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Two bugfixes, both bound for -stable" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: break infinite loop in fuse_fill_write_pages() cuse: fix memory leak
2015-12-10btrfs: fix misleading warning when space cache failed to loadHolger Hoffstätte
When an inconsistent space cache is detected during loading we log a warning that users frequently mistake as instruction to invalidate the cache manually, even though this is not required. Fix the message to indicate that the cache will be rebuilt automatically. Signed-off-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Acked-by: Filipe Manana <fdmanana@suse.com>
2015-12-10Btrfs: fix transaction handle leak in balanceFilipe Manana
If we fail to allocate a new data chunk, we were jumping to the error path without release the transaction handle we got before. Fix this by always releasing it before doing the jump. Fixes: 2c9fe8355258 ("btrfs: Fix lost-data-profile caused by balance bg") Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-12-10Btrfs: fix unprotected list move from unused_bgs to deleted_bgs listFilipe Manana
As of my previous change titled "Btrfs: fix scrub preventing unused block groups from being deleted", the following warning at extent-tree.c:btrfs_delete_unused_bgs() can be hit when we mount the a filesysten with "-o discard": 10263 void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) 10264 { (...) 10405 if (trimming) { 10406 WARN_ON(!list_empty(&block_group->bg_list)); 10407 spin_lock(&trans->transaction->deleted_bgs_lock); 10408 list_move(&block_group->bg_list, 10409 &trans->transaction->deleted_bgs); 10410 spin_unlock(&trans->transaction->deleted_bgs_lock); 10411 btrfs_get_block_group(block_group); 10412 } (...) This happens because scrub can now add back the block group to the list of unused block groups (fs_info->unused_bgs). This is dangerous because we are moving the block group from the unused block groups list to the list of deleted block groups without holding the lock that protects the source list (fs_info->unused_bgs_lock). The following diagram illustrates how this happens: CPU 1 CPU 2 cleaner_kthread() btrfs_delete_unused_bgs() sees bg X in list fs_info->unused_bgs deletes bg X from list fs_info->unused_bgs scrub_enumerate_chunks() searches device tree using its commit root finds device extent for block group X gets block group X from the tree fs_info->block_group_cache_tree (via btrfs_lookup_block_group()) sets bg X to RO (again) scrub_chunk(bg X) sets bg X back to RW mode adds bg X to the list fs_info->unused_bgs again, since it's still unused and currently not in that list sets bg X to RO mode btrfs_remove_chunk(bg X) --> discard is enabled and bg X is in the fs_info->unused_bgs list again so the warning is triggered --> we move it from that list into the transaction's delete_bgs list, but we can have another task currently manipulating the first list (fs_info->unused_bgs) Fix this by using the same lock (fs_info->unused_bgs_lock) to protect both the list of unused block groups and the list of deleted block groups. This makes it safe and there's not much worry for more lock contention, as this lock is seldom used and only the cleaner kthread adds elements to the list of deleted block groups. The warning goes away too, as this was previously an impossible case (and would have been better a BUG_ON/ASSERT) but it's not impossible anymore. Reproduced with fstest btrfs/073 (using MOUNT_OPTIONS="-o discard"). Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-12-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A couple of fixes, both -stable fodder (9p one all way back to 2.6.32, dio - to all branches where "Fix negative return from dio read beyond eof" will end up it; it's a fixup to commit marked for -stable)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix the regression from "direct-io: Fix negative return from dio read beyond eof" 9p: ->evict_inode() should kick out ->i_data, not ->i_mapping
2015-12-08teach nfs_get_link() to work in RCU modeAl Viro
based upon the corresponding patch from Neil's March patchset, again with kmap-related horrors removed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU modeAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08teach page_get_link() to work in RCU modeAl Viro
more or less along the lines of Neil's patchset, sans the insanity around kmap(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08replace ->follow_link() with new method that could stay in RCU modeAl Viro
new method: ->get_link(); replacement of ->follow_link(). The differences are: * inode and dentry are passed separately * might be called both in RCU and non-RCU mode; the former is indicated by passing it a NULL dentry. * when called that way it isn't allowed to block and should return ERR_PTR(-ECHILD) if it needs to be called in non-RCU mode. It's a flagday change - the old method is gone, all in-tree instances converted. Conversion isn't hard; said that, so far very few instances do not immediately bail out when called in RCU mode. That'll change in the next commits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08don't put symlink bodies in pagecache into highmemAl Viro
kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlocking the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08fix the regression from "direct-io: Fix negative return from dio read beyond ↵Al Viro
eof" Sure, it's better to bail out of past-the-eof read and return 0 than return a bogus negative value on such. Only we'd better make sure we are bailing out with 0 and not -ENOMEM... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-089p: ->evict_inode() should kick out ->i_data, not ->i_mappingAl Viro
For block devices the pagecache is associated with the inode on bdevfs, not with the aliasing ones on the mountable filesystems. The latter have its own ->i_data empty and ->i_mapping pointing to the (unique per major/minor) bdevfs inode. That guarantees cache coherence between all block device inodes with the same device number. Eviction of an alias inode has no business trying to evict the pages belonging to bdevfs one; moreover, ->i_mapping is only safe to access when the thing is opened. At the time of ->evict_inode() the victim is definitely *not* opened. We are about to kill the address space embedded into struct inode (inode->i_data) and that's what we need to empty of any pages. 9p instance tries to empty inode->i_mapping instead, which is both unsafe and bogus - if we have several device nodes with the same device number in different places, closing one of them should not try to empty the (shared) page cache. Fortunately, other instances in the tree are OK; they are evicting from &inode->i_data instead, as 9p one should. Cc: stable@vger.kernel.org # v2.6.32+, ones prior to 2.6.36 need only half of that Reported-by: "Suzuki K. Poulose" <Suzuki.Poulose@arm.com> Tested-by: "Suzuki K. Poulose" <Suzuki.Poulose@arm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-07SUNRPC: Fix callback channelTrond Myklebust
The NFSv4.1 callback channel is currently broken because the receive message will keep shrinking because the backchannel receive buffer size never gets reset. The easiest solution to this problem is instead of changing the receive buffer, to rather adjust the copied request. Fixes: 38b7631fbe42 ("nfs4: limit callback decoding to received bytes") Cc: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-07Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Ext4 bug fixes for v4.4, including fixes for post-2038 time encodings, some endian conversion problems with ext4 encryption, potential memory leaks after truncate in data=journal mode, and an ocfs2 regression caused by a jbd2 performance improvement" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: fix null committed data return in undo_access ext4: add "static" to ext4_seq_##name##_fops struct ext4: fix an endianness bug in ext4_encrypted_follow_link() ext4: fix an endianness bug in ext4_encrypted_zeroout() jbd2: Fix unreclaimed pages after truncate in data=journal mode ext4: Fix handling of extended tv_sec
2015-12-06xfs: Change how listxattr generates synthetic attributesAndreas Gruenbacher
Instead of adding the synthesized POSIX ACL attribute names after listing all non-synthesized attributes, generate them immediately when listing the non-synthesized attributes. In addition, merge xfs_xattr_put_listent and xfs_xattr_put_listent_sizes to ensure that the list size is computed correctly; the split version was overestimating the list size for non-root users. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: xfs@oss.sgi.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06tmpfs: listxattr should include POSIX ACL xattrsAndreas Gruenbacher
When a file on tmpfs has an ACL or a Default ACL, listxattr should include the corresponding xattr name. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06tmpfs: Use xattr handler infrastructureAndreas Gruenbacher
Use the VFS xattr handler infrastructure and get rid of similar code in the filesystem. For implementing shmem_xattr_handler_set, we need a version of simple_xattr_set which removes the attribute when value is NULL. Use this to implement kernfs_iop_removexattr as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06btrfs: Use xattr handler infrastructureAndreas Gruenbacher
Use the VFS xattr handler infrastructure and get rid of similar code in the filesystem. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06vfs: Distinguish between full xattr names and proper prefixesAndreas Gruenbacher
Add an additional "name" field to struct xattr_handler. When the name is set, the handler matches attributes with exactly that name. When the prefix is set instead, the handler matches attributes with the given prefix and with a non-empty suffix. This patch should avoid bugs like the one fixed in commit c361016a in the future. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06posix acls: Remove duplicate xattr name definitionsAndreas Gruenbacher
Remove POSIX_ACL_XATTR_{ACCESS,DEFAULT} and GFS2_POSIX_ACL_{ACCESS,DEFAULT} and replace them with the definitions in <include/uapi/linux/xattr.h>. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06gfs2: Remove gfs2_xattr_acl_chmodAndreas Gruenbacher
Function gfs2_xattr_acl_chmod is unused since commit e01580bf. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06vfs: Remove vfs_xattr_cmpAndreas Gruenbacher
This function was only briefly used in security/integrity/evm, between commits 66dbc325 and 15647eb3. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06namei: page_getlink() and page_follow_link_light() are the same thingAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06ufs: get rid of ->setattr() for symlinksAl Viro
It was to needed for a couple of months in 2010, until UFS quota support got dropped. Since then it's equivalent to simple_setattr() (i.e. the default) for everything except the regular files. And dropping it there allows to convert all UFS symlinks to {page,simple}_symlink_inode_operations, getting rid of fs/ufs/symlink.c completely. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>