summaryrefslogtreecommitdiff
path: root/fs/namei.c
AgeCommit message (Collapse)Author
2016-01-22wrappers for ->i_mutex accessAl Viro
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-12Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of stuff. That probably should've been 5 or 6 separate branches, but by the time I'd realized how large and mixed that bag had become it had been too close to -final to play with rebasing. Some fs/namei.c cleanups there, memdup_user_nul() introduction and switching open-coded instances, burying long-dead code, whack-a-mole of various kinds, several new helpers for ->llseek(), assorted cleanups and fixes from various people, etc. One piece probably deserves special mention - Neil's lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets called without ->i_mutex and tries to avoid ever taking it. That, of course, means that it's not useful for any directory modifications, but things like getting inode attributes in nfds readdirplus are fine with that. I really should've asked for moratorium on lookup-related changes this cycle, but since I hadn't done that early enough... I *am* asking for that for the coming cycle, though - I'm going to try and get conversion of i_mutex to rwsem with ->lookup() done under lock taken shared. There will be a patch closer to the end of the window, along the lines of the one Linus had posted last May - mechanical conversion of ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/ inode_is_locked()/inode_lock_nested(). To quote Linus back then: ----- | This is an automated patch using | | sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/' | sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/' | sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/' | sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/' | sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/' | | with a very few manual fixups ----- I'm going to send that once the ->i_mutex-affecting stuff in -next gets mostly merged (or when Linus says he's about to stop taking merges)" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) nfsd: don't hold i_mutex over userspace upcalls fs:affs:Replace time_t with time64_t fs/9p: use fscache mutex rather than spinlock proc: add a reschedule point in proc_readfd_common() logfs: constify logfs_block_ops structures fcntl: allow to set O_DIRECT flag on pipe fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE fs: xattr: Use kvfree() [s390] page_to_phys() always returns a multiple of PAGE_SIZE nbd: use ->compat_ioctl() fs: use block_device name vsprintf helper lib/vsprintf: add %*pg format specifier fs: use gendisk->disk_name where possible poll: plug an unused argument to do_poll amdkfd: don't open-code memdup_user() cdrom: don't open-code memdup_user() rsxx: don't open-code memdup_user() mtip32xx: don't open-code memdup_user() [um] mconsole: don't open-code memdup_user_nul() [um] hostaudio: don't open-code memdup_user() ...
2016-01-09nfsd: don't hold i_mutex over userspace upcallsNeilBrown
We need information about exports when crossing mountpoints during lookup or NFSv4 readdir. If we don't already have that information cached, we may have to ask (and wait for) rpc.mountd. In both cases we currently hold the i_mutex on the parent of the directory we're asking rpc.mountd about. We've seen situations where rpc.mountd performs some operation on that directory that tries to take the i_mutex again, resulting in deadlock. With some care, we may be able to avoid that in rpc.mountd. But it seems better just to avoid holding a mutex while waiting on userspace. It appears that lookup_one_len is pretty much the only operation that needs the i_mutex. So we could just drop the i_mutex elsewhere and do something like mutex_lock() lookup_one_len() mutex_unlock() In many cases though the lookup would have been cached and not required the i_mutex, so it's more efficient to create a lookup_one_len() variant that only takes the i_mutex when necessary. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04don't carry MAY_OPEN in op->acc_modeAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30switch ->get_link() to delayed_call, kill ->put_link()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08teach page_get_link() to work in RCU modeAl Viro
more or less along the lines of Neil's patchset, sans the insanity around kmap(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08replace ->follow_link() with new method that could stay in RCU modeAl Viro
new method: ->get_link(); replacement of ->follow_link(). The differences are: * inode and dentry are passed separately * might be called both in RCU and non-RCU mode; the former is indicated by passing it a NULL dentry. * when called that way it isn't allowed to block and should return ERR_PTR(-ECHILD) if it needs to be called in non-RCU mode. It's a flagday change - the old method is gone, all in-tree instances converted. Conversion isn't hard; said that, so far very few instances do not immediately bail out when called in RCU mode. That'll change in the next commits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08don't put symlink bodies in pagecache into highmemAl Viro
kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlocking the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06restore_nameidata(): no need to clear now->stackAl Viro
microoptimization: in all callers *now is in the frame we are about to leave. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06namei.c: take "jump to root" into a new helperAl Viro
... and use it both in path_init() (for absolute pathnames) and get_link() (for absolute symlinks). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06path_init(): set nd->inode earlier in cwd-relative caseAl Viro
that allows to kill the recheck of nd->seq on the way out in this case, and this check on the way out is left only for absolute pathnames. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06namei.c: fold set_root_rcu() into set_root()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06typo in fs/namei.c commentMike Marshall
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06namei: page_getlink() and page_follow_link_light() are the same thingAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06Don't reset ->total_link_count on nested calls of vfs_path_lookup()Al Viro
we already zero it on outermost set_nameidata(), so initialization in path_init() is pointless and wrong. The same DoS exists on pre-4.2 kernels, but there a slightly different fix will be needed. Cc: stable@vger.kernel.org # v4.2 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...
2015-11-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "Trivial stuff from trivial tree that can be trivially summed up as: - treewide drop of spurious unlikely() before IS_ERR() from Viresh Kumar - cosmetic fixes (that don't really affect basic functionality of the driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek - various comment / printk fixes and updates all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: bcache: Really show state of work pending bit hwmon: applesmc: fix comment typos Kconfig: remove comment about scsi_wait_scan module class_find_device: fix reference to argument "match" debugfs: document that debugfs_remove*() accepts NULL and error values net: Drop unlikely before IS_ERR(_OR_NULL) mm: Drop unlikely before IS_ERR(_OR_NULL) fs: Drop unlikely before IS_ERR(_OR_NULL) drivers: net: Drop unlikely before IS_ERR(_OR_NULL) drivers: misc: Drop unlikely before IS_ERR(_OR_NULL) UBI: Update comments to reflect UBI_METAONLY flag pktcdvd: drop null test before destroy functions
2015-11-06mm, fs: introduce mapping_gfp_constraint()Michal Hocko
There are many places which use mapping_gfp_mask to restrict a more generic gfp mask which would be used for allocations which are not directly related to the page cache but they are performed in the same context. Let's introduce a helper function which makes the restriction explicit and easier to track. This patch doesn't introduce any functional changes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns hardlink capability check fix from Eric Biederman: "This round just contains a single patch. There has been a lot of other work this period but it is not quite ready yet, so I am pushing it until 4.5. The remaining change by Dirk Steinmetz wich fixes both Gentoo and Ubuntu containers allows hardlinks if we have the appropriate capabilities in the user namespace. Security wise it is really a gimme as the user namespace root can already call setuid become that user and create the hardlink" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: namei: permit linking with CAP_FOWNER in userns
2015-10-27namei: permit linking with CAP_FOWNER in usernsDirk Steinmetz
Attempting to hardlink to an unsafe file (e.g. a setuid binary) from within an unprivileged user namespace fails, even if CAP_FOWNER is held within the namespace. This may cause various failures, such as a gentoo installation within a lxc container failing to build and install specific packages. This change permits hardlinking of files owned by mapped uids, if CAP_FOWNER is held for that namespace. Furthermore, it improves consistency by using the existing inode_owner_or_capable(), which is aware of namespaced capabilities as of 23adbe12ef7d3 ("fs,userns: Change inode_capable to capable_wrt_inode_uidgid"). Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com> This is hitting us in Ubuntu during some dpkg upgrades in containers. When upgrading a file dpkg creates a hard link to the old file to back it up before overwriting it. When packages upgrade suid files owned by a non-root user the link isn't permitted, and the package upgrade fails. This patch fixes our problem. Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2015-10-10namei: results of d_is_negative() should be checked after dentry revalidationTrond Myklebust
Leandro Awa writes: "After switching to version 4.1.6, our parallelized and distributed workflows now fail consistently with errors of the form: T34: ./regex.c:39:22: error: config.h: No such file or directory From our 'git bisect' testing, the following commit appears to be the possible cause of the behavior we've been seeing: commit 766c4cbfacd8" Al Viro says: "What happens is that 766c4cbfacd8 got the things subtly wrong. We used to treat d_is_negative() after lookup_fast() as "fall with ENOENT". That was wrong - checking ->d_flags outside of ->d_seq protection is unreliable and failing with hard error on what should've fallen back to non-RCU pathname resolution is a bug. Unfortunately, we'd pulled the test too far up and ran afoul of another kind of staleness. The dentry might have been absolutely stable from the RCU point of view (and we might be on UP, etc), but stale from the remote fs point of view. If ->d_revalidate() returns "it's actually stale", dentry gets thrown away and the original code wouldn't even have looked at its ->d_flags. What we need is to check ->d_flags where 766c4cbfacd8 does (prior to ->d_seq validation) but only use the result in cases where we do not discard this dentry outright" Reported-by: Leandro Awa <lawa@nvidia.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=104911 Fixes: 766c4cbfacd8 ("namei: d_is_negative() should be checked...") Tested-by: Leandro Awa <lawa@nvidia.com> Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-29fs: Drop unlikely before IS_ERR(_OR_NULL)Viresh Kumar
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there is no need to do that again from its callers. Drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Steve French <smfrench@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-09-10namei: fix warning while make xmldocs caused by namei.cMasanari Iida
Fix the following warnings: Warning(.//fs/namei.c:2422): No description found for parameter 'nd' Warning(.//fs/namei.c:2422): Excess function parameter 'nameidata' description in 'path_mountpoint' Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-21vfs: Test for and handle paths that are unreachable from their mnt_rootEric W. Biederman
In rare cases a directory can be renamed out from under a bind mount. In those cases without special handling it becomes possible to walk up the directory tree to the root dentry of the filesystem and down from the root dentry to every other file or directory on the filesystem. Like division by zero .. from an unconnected path can not be given a useful semantic as there is no predicting at which path component the code will realize it is unconnected. We certainly can not match the current behavior as the current behavior is a security hole. Therefore when encounting .. when following an unconnected path return -ENOENT. - Add a function path_connected to verify path->dentry is reachable from path->mnt.mnt_root. AKA to validate that rename did not do something nasty to the bind mount. To avoid races path_connected must be called after following a path component to it's next path component. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-04may_follow_link() should use nd->inodeAl Viro
Now that we can get there in RCU mode, we shouldn't play with nd->path.dentry->d_inode - it's not guaranteed to be stable. Use nd->inode instead. Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-01link_path_walk(): be careful when failing with ENOTDIRAl Viro
In RCU mode we might end up with dentry evicted just we check that it's a directory. In such case we should return ECHILD rather than ENOTDIR, so that pathwalk would be retries in non-RCU mode. Breakage had been introduced in commit b18825a - prior to that we were looking at nd->inode, which had been fetched before verifying that ->d_seq was still valid. That form of check would only be satisfied if at some point the pathname prefix would indeed have resolved to a non-directory. The fix consists of checking ->d_seq after we'd run into a non-directory dentry, and failing with ECHILD in case of mismatch. Note that all branches since 3.12 have that problem... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-29namei: make set_root_rcu() return voidAl Viro
The only caller that cares about its return value can just as easily pick it from nd->root_seq itself. We used to just calculate it and return to caller, but these days we are storing it in nd->root_seq in all cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15turn user_{path_at,path,lpath,path_dir}() into static inlinesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: move saved_nd pointer into struct nameidataAl Viro
these guys are always declared next to each other; might as well put the former (pointer to previous instance) into the latter and simplify the calling conventions for {set,restore}_nameidata() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15inline user_path_create()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15inline user_path_parent()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: trim do_last() argumentsAl Viro
now that struct filename is stashed in nameidata we have no need to pass it in Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: stash dfd and name into nameidataAl Viro
fewer arguments to pass around... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: fold path_cleanup() into terminate_walk()Al Viro
they are always called next to each other; moreover, terminate_walk() is more symmetrical that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: saner calling conventions for filename_parentat()Al Viro
a) make it reject ERR_PTR() for name b) make it putname(name) on all other failure exits c) make it return name on success again, simplifies the callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: saner calling conventions for filename_create()Al Viro
a) make it reject ERR_PTR() for name b) make it putname(name) upon return in all other cases. seriously simplifies the callers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: shift nameidata down into filename_parentat()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: make filename_lookup() reject ERR_PTR() passed as nameAl Viro
makes for much easier life in callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: shift nameidata inside filename_lookup()Al Viro
pass root instead; non-NULL => copy to nd.root and set LOOKUP_ROOT in flags Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: move putname() call into filename_lookup()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: pass the struct path to store the result down into path_lookupat()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: uninline set_root{,_rcu}()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: be careful with mountpoint crossings in follow_dotdot_rcu()Al Viro
Otherwise we are risking a hard error where nonlazy restart would be the right thing to do; it's a very narrow race with mount --move and most of the time it ends up being completely harmless, but it's possible to construct a case when we'll get a bogus hard error instead of falling back to non-lazy walk... For one thing, when crossing _into_ overmount of parent we need to check for mount_lock bumps when we get NULL from __lookup_mnt() as well. For another, and less exotically, we need to make sure that the data fetched in follow_up_rcu() had been consistent. ->mnt_mountpoint is pinned for as long as it is a mountpoint, but we need to check mount_lock after fetching to verify that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: unlazy_walk() doesn't need to mess with current->fs anymoreAl Viro
now that we have ->root_seq, legitimize_path(&nd->root, nd->root_seq) will do just fine... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: handle absolute symlinks without dropping out of RCU modeAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15enable passing fast relative symlinks without dropping out of RCU modeAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15VFS/namei: make the use of touch_atime() in get_link() RCU-safe.NeilBrown
touch_atime is not RCU-safe, and so cannot be called on an RCU walk. However, in situations where RCU-walk makes a difference, the symlink will likely to accessed much more often than it is useful to update the atime. So split out the test of "Does the atime actually need to be updated" into atime_needs_update(), and have get_link() unlazy if it finds that it will need to do that update. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: don't unlazy until get_link()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_linkAl Viro
We are almost done - primitives for leaving RCU mode are aware of nd->stack now, a new primitive for going to non-RCU mode when we have a symlink on hands added. The thing we are heavily relying upon is that *any* unlazy failure will be shortly followed by terminate_walk(), with no access to nameidata in between. So it's enough to leave the things in a state terminate_walk() would cope with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11namei: store seq numbers in nd->stack[]Al Viro
we'll need them for unlazy_walk() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>