summaryrefslogtreecommitdiff
path: root/net/sunrpc/rpc_pipe.c
AgeCommit message (Collapse)Author
2019-12-04kernel/notifier.c: remove blocking_notifier_chain_cond_register()Xiaoming Ni
blocking_notifier_chain_cond_register() does not consider system_booting state, which is the only difference between this function and blocking_notifier_cain_register(). This can be a bug and is a piece of duplicate code. Delete blocking_notifier_chain_cond_register() Link: http://lkml.kernel.org/r/1568861888-34045-4-git-send-email-nixiaoming@huawei.com Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-05new helper: get_tree_keyed()Al Viro
For vfs_get_keyed_super users. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-19Merge branch 'work.mount0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-06-20rpc_pipefs: call fsnotify_{unlink,rmdir}() hooksAmir Goldstein
This will allow generating fsnotify delete events after the fsnotify_nameremove() hook is removed from d_delete(). Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-05-25vfs: Convert rpc_pipefs to use the new mount APIDavid Howells
Convert the rpc_pipefs filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna.schumaker@netapp.com> cc: "J. Bruce Fields" <bfields@fieldses.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-nfs@vger.kernel.org
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-01rpcpipe: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-02sunrpc: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-06-04Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Misc bits and pieces not fitting into anything more specific" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: delete unnecessary assignment in vfs_listxattr Documentation: filesystems: update filesystem locking documentation vfs: namei: use path_equal() in follow_dotdot() fs.h: fix outdated comment about file flags __inode_security_revalidate() never gets NULL opt_dentry make xattr_getsecurity() static vfat: simplify checks in vfat_lookup() get rid of dead code in d_find_alias() it's SB_BORN, not MS_BORN... msdos_rmdir(): kill BS comment remove rpc_rmdir() fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()
2018-04-16remove rpc_rmdir()Al Viro
no users since 2014... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-15rpc_pipefs: fix double-dput()Al Viro
if we ever hit rpc_gssd_dummy_depopulate() dentry passed to it has refcount equal to 1. __rpc_rmpipe() drops it and dput() done after that hits an already freed dentry. Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-26net: Use octal not symbolic permissionsJoe Perches
Prefer the direct use of octal for permissions. Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace and some typing. Miscellanea: o Whitespace neatening around these conversions. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-11vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds
This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27net: annotate ->poll() instancesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-17sunrpc: remove net pointer from messagesVasily Averin
Publishing of net pointer is not safe, use net->ns.inum as net ID [ 171.391947] RPC: created new rpcb local clients (rpcb_local_clnt: ..., rpcb_local_clnt4: ...) for net f00001e7 [ 171.767188] NFSD: starting 90-second grace period (net f00001e7) Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27fs: Replace CURRENT_TIME with current_time() for inode timestampsDeepa Dinamani
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_time() instead. CURRENT_TIME is also not y2038 safe. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. As part of the effort current_time() will be extended to do range checks. Hence, it is necessary for all file system timestamps to use current_time(). Also, current_time() will be transitioned along with vfs to be y2038 safe. Note that whenever a single call to current_time() is used to change timestamps in different inodes, it is because they share the same time granularity. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Felipe Balbi <balbi@kernel.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-23vfs: Pass data, ns, and ns->userns to mount_nsEric W. Biederman
Today what is normally called data (the mount options) is not passed to fill_super through mount_ns. Pass the mount options and the namespace separately to mount_ns so that filesystems such as proc that have mount options, can use mount_ns. Pass the user namespace to mount_ns so that the standard permission check that verifies the mounter has permissions over the namespace can be performed in mount_ns instead of in each filesystems .mount method. Thus removing the duplication between mqueuefs and proc in terms of permission checks. The extra permission check does not currently affect the rpc_pipefs filesystem and the nfsd filesystem as those filesystems do not currently allow unprivileged mounts. Without unpvileged mounts it is guaranteed that the caller has already passed capable(CAP_SYS_ADMIN) which guarantees extra permission check will pass. Update rpc_pipefs and the nfsd filesystem to ensure that the network namespace reference is always taken in fill_super and always put in kill_sb so that the logic is simpler and so that errors originating inside of fill_super do not cause a network namespace leak. Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-04-04mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22wrappers for ->i_mutex accessAl Viro
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-14kmemcg: account certain kmem allocations to memcgVladimir Davydov
Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15VFS: net/: d_inode() annotationsDavid Howells
socket inodes and sunrpc filesystems - inodes owned by that code Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-07-12rpc_pipe: Drop memory allocation castHimangi Saraogi
Drop cast on the result of kmalloc and similar functions. The semantic patch that makes this change is as follows: // <smpl> @@ type T; @@ - (T *) (\(kmalloc\|kzalloc\|kcalloc\|kmem_cache_alloc\|kmem_cache_zalloc\| kmem_cache_alloc_node\|kmalloc_node\|kzalloc_node\)(...)) // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2013-12-10rpc_pipe: fix cleanup of dummy gssd directory when notification failsJeff Layton
Currently, it could leak dentry references in some cases. Make sure we clean up properly. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-12-06sunrpc: add an "info" file for the dummy gssd pipeJeff Layton
rpc.gssd expects to see an "info" file in each clntXX dir. Since adding the dummy gssd pipe, users that run rpc.gssd see a lot of these messages spamming the logs: rpc.gssd[508]: ERROR: can't open /var/lib/nfs/rpc_pipefs/gssd/clntXX/info: No such file or directory rpc.gssd[508]: ERROR: failed to read service info Add a dummy gssd/clntXX/info file to help silence these messages. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-12-06rpc_pipe: remove the clntXX dir if creating the pipe failsJeff Layton
In the event that we create the gssd/clntXX dir, but the pipe creation subsequently fails, then we should remove the clntXX dir before returning. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-12-06sunrpc: replace sunrpc_net->gssd_running flag with a more reliable checkJeff Layton
Now that we have a more reliable method to tell if gssd is running, we can replace the sn->gssd_running flag with a function that will query to see if it's up and running. There's also no need to attempt an upcall that we know will fail, so just return -EACCES if gssd isn't running. Finally, fix the warn_gss() message not to claim that that the upcall timed out since we don't necesarily perform one now when gssd isn't running, and remove the extraneous newline from the message. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-12-06sunrpc: create a new dummy pipe for gssd to hold openJeff Layton
rpc.gssd will naturally hold open any pipe named */clnt*/gssd that shows up under rpc_pipefs. That behavior gives us a reliable mechanism to tell whether it's actually running or not. Create a new toplevel "gssd" directory in rpc_pipefs when it's mounted. Under that directory create another directory called "clntXX", and then within that a pipe called "gssd". We'll never send an upcall along that pipe, and any downcall written to it will just return -EINVAL. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-15consolidate simple ->d_delete() instancesAl Viro
Rename simple_delete_dentry() to always_delete_dentry() and export it. Export simple_dentry_operations, while we are at it, and get rid of their duplicates Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24sunrpc: switch to %pdAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-01SUNRPC: Add a helper to allow sharing of rpc_pipefs directory objectsTrond Myklebust
Add support for looking up existing objects and creating new ones if there is no match. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-01SUNRPC: Remove the rpc_client->cl_dentryTrond Myklebust
It is now redundant. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-30SUNRPC: Add a framework to clean up management of rpc_pipefs directoriesTrond Myklebust
The current system requires everyone to set up notifiers, manage directory locking, etc. What we really want to do is have the rpc_client create its directory, and then create all the entries. This patch will allow the RPCSEC_GSS and NFS code to register all the objects that they want to have appear in the directory, and then have the sunrpc code call them back to actually create/destroy their pipefs dentries when the rpc_client creates/destroys the parent. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-30SUNRPC: Deprecate rpc_client->cl_protnameTrond Myklebust
It just duplicates the cl_program->name, and is not used in any fast paths where the extra dereference will cause a hit. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-23rpc_pipe: convert back to simple_dir_inode_operationsJeff Layton
Now that Al has fixed simple_lookup to account for the case where sb->s_d_op is set, there's no need to keep our own special lookup op. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs stuff from Al Viro: "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including making simple_lookup() usable for filesystems with non-NULL s_d_op, which allows us to get rid of quite a bit of ugliness" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sunrpc: now we can just set ->s_d_op cgroup: we can use simple_lookup() now efivarfs: we can use simple_lookup() now make simple_lookup() usable for filesystems that set ->s_d_op configfs: don't open-code d_alloc_name() __rpc_lookup_create_exclusive: pass string instead of qstr rpc_create_*_dir: don't bother with qstr llist: llist_add() can use llist_add_batch() llist: fix/simplify llist_add() and llist_add_batch() fput: turn "list_head delayed_fput_list" into llist_head fs/file_table.c:fput(): add comment Safer ABI for O_TMPFILE
2013-07-14sunrpc: now we can just set ->s_d_opAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-14__rpc_lookup_create_exclusive: pass string instead of qstrAl Viro
... and use d_hash_and_lookup() instead of open-coding it, for fsck sake... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-14rpc_create_*_dir: don't bother with qstrAl Viro
just pass the name Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-09rpc_pipe: rpc_dir_inode_operations can be staticFengguang Wu
Hi Jeff, FYI, there are new sparse warnings show up in tree: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git nfs-for-next head: 296afe1f58d55fd56ed85daaafafcfee39f59ece commit: 76fa66657900071016f2bae61de28f059f3f2abf [2/5] rpc_pipe: set dentry operations at d_alloc time >> net/sunrpc/rpc_pipe.c:496:31: sparse: symbol 'rpc_dir_inode_operations' was not declared. Should it be static? Please consider folding the attached diff :-) Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-09rpc_pipe: set dentry operations at d_alloc timeJeff Layton
Currently the way these get set is a little convoluted. If the dentry is allocated via lookup from userland, then it gets set by simple_lookup. If it gets allocated when the kernel is populating the directory, then it gets set via __rpc_lookup_create_exclusive, which has to check whether they might already be set. Between both of these, this ensures that all dentries have their d_op pointer set. Instead of doing that, just have them set at d_alloc time by pointing sb->s_d_op at them. With that change, we no longer want the lookup op to set them, so we must move to using our own lookup routine. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28SUNRPC: fix races on PipeFS UMOUNT notificationsStanislav Kinsbursky
CPU#0 CPU#1 ----------------------------- ----------------------------- rpc_kill_sb sn->pipefs_sb = NULL rpc_release_client (UMOUNT_EVENT) rpc_free_auth rpc_pipefs_event rpc_get_client_for_event !atomic_inc_not_zero(cl_count) <skip the client> atomic_inc(cl_count) rpc_free_client rpc_clnt_remove_pipedir <skip client dir removing> To fix this, this patch does the following: 1) Calls RPC_PIPEFS_UMOUNT notification with sn->pipefs_sb_lock being held. 2) Removes SUNRPC client from the list AFTER pipes destroying. 3) Doesn't hold RPC client on notification: if client in the list, then it can't be destroyed while sn->pipefs_sb_lock in hold by notification caller. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28SUNRPC: fix races on PipeFS MOUNT notificationsStanislav Kinsbursky
Below are races, when RPC client can be created without PiepFS dentries CPU#0 CPU#1 ----------------------------- ----------------------------- rpc_new_client rpc_fill_super rpc_setup_pipedir mutex_lock(&sn->pipefs_sb_lock) rpc_get_sb_net == NULL (no per-net PipeFS superblock) sn->pipefs_sb = sb; notifier_call_chain(MOUNT) (client is not in the list) rpc_register_client (client without pipes dentries) To fix this patch: 1) makes PipeFS mount notification call with pipefs_sb_lock being held. 2) releases pipefs_sb_lock on new SUNRPC client creation only after registration. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-18rpc_pipefs: only set rpc_dentry_ops if d_op isn't already setJeff Layton
We had a report of a reproducible WARNING: [ 1360.039358] ------------[ cut here ]------------ [ 1360.043978] WARNING: at fs/dcache.c:1355 d_set_d_op+0x8d/0xc0() [ 1360.049880] Hardware name: HP Z200 Workstation [ 1360.054308] Modules linked in: nfsv4 nfs dns_resolver fscache nfsd auth_rpcgss nfs_acl lockd sunrpc sg acpi_cpufreq mperf coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec hp_wmi crc32c_intel snd_hwdep e1000e snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd sparse_keymap rfkill soundcore serio_raw ptp iTCO_wdt pps_core pcspkr iTCO_vendor_support mei microcode lpc_ich mfd_core wmi xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif radeon i2c_algo_bit drm_kms_helper ttm ahci libahci drm i2c_core libata dm_mirror dm_region_hash dm_log dm_mod [last unloaded: auth_rpcgss] [ 1360.107406] Pid: 8814, comm: mount.nfs4 Tainted: G I -------------- 3.9.0-0.55.el7.x86_64 #1 [ 1360.116771] Call Trace: [ 1360.119219] [<ffffffff810610c0>] warn_slowpath_common+0x70/0xa0 [ 1360.125208] [<ffffffff810611aa>] warn_slowpath_null+0x1a/0x20 [ 1360.131025] [<ffffffff811af46d>] d_set_d_op+0x8d/0xc0 [ 1360.136159] [<ffffffffa05a7d6f>] __rpc_lookup_create_exclusive+0x4f/0x80 [sunrpc] [ 1360.143710] [<ffffffffa05a8cc6>] rpc_mkpipe_dentry+0x86/0x170 [sunrpc] [ 1360.150311] [<ffffffffa062a7b6>] nfs_idmap_new+0x96/0x130 [nfsv4] [ 1360.156475] [<ffffffffa062e7cd>] nfs4_init_client+0xad/0x2d0 [nfsv4] [ 1360.162902] [<ffffffff812f02df>] ? idr_get_empty_slot+0x16f/0x3c0 [ 1360.169062] [<ffffffff812f0582>] ? idr_mark_full+0x52/0x60 [ 1360.174615] [<ffffffff812f0699>] ? idr_alloc+0x79/0xe0 [ 1360.179826] [<ffffffffa0598081>] ? __rpc_init_priority_wait_queue+0x81/0xc0 [sunrpc] [ 1360.187635] [<ffffffffa05980f3>] ? rpc_init_wait_queue+0x13/0x20 [sunrpc] [ 1360.194493] [<ffffffffa05d05da>] nfs_get_client+0x27a/0x350 [nfs] [ 1360.200666] [<ffffffffa062e438>] nfs4_set_client.isra.8+0x78/0x100 [nfsv4] [ 1360.207624] [<ffffffffa062f2f3>] nfs4_create_server+0xf3/0x3a0 [nfsv4] [ 1360.214222] [<ffffffffa06284be>] nfs4_remote_mount+0x2e/0x60 [nfsv4] [ 1360.220644] [<ffffffff8119ea79>] mount_fs+0x39/0x1b0 [ 1360.225691] [<ffffffff81153880>] ? __alloc_percpu+0x10/0x20 [ 1360.231348] [<ffffffff811b7ccf>] vfs_kern_mount+0x5f/0xf0 [ 1360.236822] [<ffffffffa0628396>] nfs_do_root_mount+0x86/0xc0 [nfsv4] [ 1360.243246] [<ffffffffa06287b4>] nfs4_try_mount+0x44/0xc0 [nfsv4] [ 1360.249410] [<ffffffffa05d1457>] ? get_nfs_version+0x27/0x80 [nfs] [ 1360.255659] [<ffffffffa05db985>] nfs_fs_mount+0x5c5/0xd10 [nfs] [ 1360.261650] [<ffffffffa05dc550>] ? nfs_clone_super+0x140/0x140 [nfs] [ 1360.268074] [<ffffffffa05da8e0>] ? param_set_portnr+0x60/0x60 [nfs] [ 1360.274406] [<ffffffff8119ea79>] mount_fs+0x39/0x1b0 [ 1360.279443] [<ffffffff81153880>] ? __alloc_percpu+0x10/0x20 [ 1360.285088] [<ffffffff811b7ccf>] vfs_kern_mount+0x5f/0xf0 [ 1360.290556] [<ffffffff811b9f5d>] do_mount+0x1fd/0xa00 [ 1360.295677] [<ffffffff81137dee>] ? __get_free_pages+0xe/0x50 [ 1360.301405] [<ffffffff811b9be6>] ? copy_mount_options+0x36/0x170 [ 1360.307479] [<ffffffff811ba7e3>] sys_mount+0x83/0xc0 [ 1360.312515] [<ffffffff8160ad59>] system_call_fastpath+0x16/0x1b [ 1360.318503] ---[ end trace 8fa1f4cbc36094a7 ]--- The problem is that we're ending up in __rpc_lookup_create_exclusive with a negative dentry that already has d_op set. A little debugging has shown that when we hit this, the d_ops are already set to simple_dentry_operations. I believe that what's happening is that during a mount, idmapd is racing in and doing a lookup of /var/lib/nfs/rpc_pipefs/nfs/clnt???/idmap. Before that dentry reference is released, the kernel races in to create that file and finds the new negative dentry, which already has the d_op set. This patch just avoids setting the d_op if it's already set. simple_dentry_operations and rpc_dentry_operations are functionally equivalent so it shouldn't matter which one it's set to. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-05-16SUNRPC: Convert auth_gss pipe detection to work in namespacesTrond Myklebust
This seems to have been overlooked when we did the namespace conversion. If a container is running a legacy version of rpc.gssd then it will be disrupted if the global 'pipe_version' is set by a container running the new version of rpc.gssd. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-05-16SUNRPC: Faster detection if gssd is actually runningTrond Myklebust
Recent changes to the NFS security flavour negotiation mean that we have a stronger dependency on rpc.gssd. If the latter is not running, because the user failed to start it, then we time out and mark the container as not having an instance. We then use that information to time out faster the next time. If, on the other hand, the rpc.gssd successfully binds to an rpc_pipe, then we mark the container as having an rpc.gssd instance. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-12fs: Readd the fs module aliases.Eric W. Biederman
I had assumed that the only use of module aliases for filesystems prior to "fs: Limit sys_mount to only request filesystem modules." was in request_module. It turns out I was wrong. At least mkinitcpio in Arch linux uses these aliases. So readd the preexising aliases, to keep from breaking userspace. Userspace eventually will have to follow and use the same aliases the kernel does. So at some point we may be delete these aliases without problems. However that day is not today. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03fs: Limit sys_mount to only request filesystem modules.Eric W. Biederman
Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-22new helper: file_inode(file)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-11Merge branch 'bugfixes' into nfs-for-nextTrond Myklebust
2012-11-08SUNRPC: Fix validity issues with rpc_pipefs sb->s_fs_infoTrond Myklebust
rpc_kill_sb() must defer calling put_net() until after the notifier has been called, since most (all?) of the notifier callbacks assume that sb->s_fs_info points to a valid net namespace. It also must not call put_net() if the call to rpc_fill_super was unsuccessful. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=48421 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org [>= v3.4]