summaryrefslogtreecommitdiff
path: root/fs/overlayfs
AgeCommit message (Collapse)Author
2020-07-16ovl: fix regression with re-formatted lower squashfsAmir Goldstein
Commit 9df085f3c9a2 ("ovl: relax requirement for non null uuid of lower fs") relaxed the requirement for non null uuid with single lower layer to allow enabling index and nfs_export features with single lower squashfs. Fabian reported a regression in a setup when overlay re-uses an existing upper layer and re-formats the lower squashfs image. Because squashfs has no uuid, the origin xattr in upper layer are decoded from the new lower layer where they may resolve to a wrong origin file and user may get an ESTALE or EIO error on lookup. To avoid the reported regression while still allowing the new features with single lower squashfs, do not allow decoding origin with lower null uuid unless user opted-in to one of the new features that require following the lower inode of non-dir upper (index, xino, metacopy). Reported-by: Fabian <godi.beat@gmx.net> Link: https://lore.kernel.org/linux-unionfs/32532923.JtPX5UtSzP@fgdesktop/ Fixes: 9df085f3c9a2 ("ovl: relax requirement for non null uuid of lower fs") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16ovl: fix oops in ovl_indexdir_cleanup() with nfs_export=onAmir Goldstein
Mounting with nfs_export=on, xfstests overlay/031 triggers a kernel panic since v5.8-rc1 overlayfs updates. overlayfs: orphan index entry (index/00fb1..., ftype=4000, nlink=2) BUG: kernel NULL pointer dereference, address: 0000000000000030 RIP: 0010:ovl_cleanup_and_whiteout+0x28/0x220 [overlay] Bisect point at commit c21c839b8448 ("ovl: whiteout inode sharing") Minimal reproducer: -------------------------------------------------- rm -rf l u w m mkdir -p l u w m mkdir -p l/testdir touch l/testdir/testfile mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m echo 1 > m/testdir/testfile umount m rm -rf u/testdir mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m umount m -------------------------------------------------- When mount with nfs_export=on, and fail to verify an orphan index, we're cleaning this index from indexdir by calling ovl_cleanup_and_whiteout(). This dereferences ofs->workdir, that was earlier set to NULL. The design was that ovl->workdir will point at ovl->indexdir, but we are assigning ofs->indexdir to ofs->workdir only after ovl_indexdir_cleanup(). There is no reason not to do it sooner, because once we get success from ofs->indexdir = ovl_workdir_create(... there is no turning back. Reported-and-tested-by: Murphy Zhou <jencce.kernel@gmail.com> Fixes: c21c839b8448 ("ovl: whiteout inode sharing") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16ovl: relax WARN_ON() when decoding lower directory file handleAmir Goldstein
Decoding a lower directory file handle to overlay path with cold inode/dentry cache may go as follows: 1. Decode real lower file handle to lower dir path 2. Check if lower dir is indexed (was copied up) 3. If indexed, get the upper dir path from index 4. Lookup upper dir path in overlay 5. If overlay path found, verify that overlay lower is the lower dir from step 1 On failure to verify step 5 above, user will get an ESTALE error and a WARN_ON will be printed. A mismatch in step 5 could be a result of lower directory that was renamed while overlay was offline, after that lower directory has been copied up and indexed. This is a scripted reproducer based on xfstest overlay/052: # Create lower subdir create_dirs create_test_files $lower/lowertestdir/subdir mount_dirs # Copy up lower dir and encode lower subdir file handle touch $SCRATCH_MNT/lowertestdir test_file_handles $SCRATCH_MNT/lowertestdir/subdir -p -o $tmp.fhandle # Rename lower dir offline unmount_dirs mv $lower/lowertestdir $lower/lowertestdir.new/ mount_dirs # Attempt to decode lower subdir file handle test_file_handles $SCRATCH_MNT -p -i $tmp.fhandle Since this WARN_ON() can be triggered by user we need to relax it. Fixes: 4b91c30a5a19 ("ovl: lookup connected ancestor of dir in inode cache") Cc: <stable@vger.kernel.org> # v4.16+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16ovl: remove not used argument in ovl_check_originyoungjun
ovl_check_origin outparam 'ctrp' argument not used by caller. So remove this argument. Signed-off-by: youngjun <her0gyugyu@gmail.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16ovl: change ovl_copy_up_flags staticyoungjun
"ovl_copy_up_flags" is used in copy_up.c. so, change it static. Signed-off-by: youngjun <her0gyugyu@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16ovl: inode reference leak in ovl_is_inuse true case.youngjun
When "ovl_is_inuse" true case, trap inode reference not put. plus adding the comment explaining sequence of ovl_is_inuse after ovl_setup_trap. Fixes: 0be0bfd2de9d ("ovl: fix regression caused by overlapping layers detection") Cc: <stable@vger.kernel.org> # v4.19+ Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: youngjun <her0gyugyu@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-09Merge tag 'ovl-update-5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "Fixes: - Resolve mount option conflicts consistently - Sync before remount R/O - Fix file handle encoding corner cases - Fix metacopy related issues - Fix an unintialized return value - Add missing permission checks for underlying layers Optimizations: - Allow multipe whiteouts to share an inode - Optimize small writes by inheriting SB_NOSEC from upper layer - Do not call ->syncfs() multiple times for sync(2) - Do not cache negative lookups on upper layer - Make private internal mounts longterm" * tag 'ovl-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (27 commits) ovl: remove unnecessary lock check ovl: make oip->index bool ovl: only pass ->ki_flags to ovl_iocb_to_rwf() ovl: make private mounts longterm ovl: get rid of redundant members in struct ovl_fs ovl: add accessor for ofs->upper_mnt ovl: initialize error in ovl_copy_xattr ovl: drop negative dentry in upper layer ovl: check permission to open real file ovl: call secutiry hook in ovl_real_ioctl() ovl: verify permissions in ovl_path_open() ovl: switch to mounter creds in readdir ovl: pass correct flags for opening real directory ovl: fix redirect traversal on metacopy dentries ovl: initialize OVL_UPPERDATA in ovl_lookup() ovl: use only uppermetacopy state in ovl_lookup() ovl: simplify setting of origin for index lookup ovl: fix out of bounds access warning in ovl_check_fb_len() ovl: return required buffer size for file handles ovl: sync dirty data when remounting to ro mode ...
2020-06-08ovl: remove unnecessary lock checkyoungjun
Directory is always locked until "out_unlock" label. So lock check is not needed. Signed-off-by: youngjun <her0gyugyu@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-05Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A lot of bug fixes and cleanups for ext4, including: - Fix performance problems found in dioread_nolock now that it is the default, caused by transaction leaks. - Clean up fiemap handling in ext4 - Clean up and refactor multiple block allocator (mballoc) code - Fix a problem with mballoc with a smaller file systems running out of blocks because they couldn't properly use blocks that had been reserved by inode preallocation. - Fixed a race in ext4_sync_parent() versus rename() - Simplify the error handling in the extent manipulation code - Make sure all metadata I/O errors are felected to ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers. - Avoid passing an error pointer to brelse in ext4_xattr_set() - Fix race which could result to freeing an inode on the dirty last in data=journal mode. - Fix refcount handling if ext4_iget() fails - Fix a crash in generic/019 caused by a corrupted extent node" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits) ext4: avoid unnecessary transaction starts during writeback ext4: don't block for O_DIRECT if IOCB_NOWAIT is set ext4: remove the access_ok() check in ext4_ioctl_get_es_cache fs: remove the access_ok() check in ioctl_fiemap fs: handle FIEMAP_FLAG_SYNC in fiemap_prep fs: move fiemap range validation into the file systems instances iomap: fix the iomap_fiemap prototype fs: move the fiemap definitions out of fs.h fs: mark __generic_block_fiemap static ext4: remove the call to fiemap_check_flags in ext4_fiemap ext4: split _ext4_fiemap ext4: fix fiemap size checks for bitmap files ext4: fix EXT4_MAX_LOGICAL_BLOCK macro add comment for ext4_dir_entry_2 file_type member jbd2: avoid leaking transaction credits when unreserving handle ext4: drop ext4_journal_free_reserved() ext4: mballoc: use lock for checking free blocks while retrying ext4: mballoc: refactor ext4_mb_good_group() ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling ext4: mballoc: refactor ext4_mb_discard_preallocations() ...
2020-06-04ovl: make oip->index boolMiklos Szeredi
ovl_get_inode() uses oip->index as a bool value, not as a pointer. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04ovl: only pass ->ki_flags to ovl_iocb_to_rwf()Miklos Szeredi
Next patch will want to pass a modified set of flags, so... Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04ovl: make private mounts longtermMiklos Szeredi
Overlayfs is using clone_private_mount() to create internal mounts for underlying layers. These are used for operations requiring a path, such as dentry_open(). Since these private mounts are not in any namespace they are treated as short term, "detached" mounts and mntput() involves taking the global mount_lock, which can result in serious cacheline pingpong. Make these private mounts longterm instead, which trade the penalty on mntput() for a slightly longer shutdown time due to an added RCU grace period when putting these mounts. Introduce a new helper kern_unmount_many() that can take care of multiple longterm mounts with a single RCU grace period. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04ovl: get rid of redundant members in struct ovl_fsMiklos Szeredi
ofs->upper_mnt is copied to ->layers[0].mnt and ->layers[0].trap could be used instead of a separate ->upperdir_trap. Split the lowerdir option early to get the number of layers, then allocate the ->layers array, and finally fill the upper and lower layers, as before. Get rid of path_put_init() in ovl_lower_dir(), since the only caller will take care of that. [Colin Ian King] Fix null pointer dereference on null stack pointer on error return found by Coverity. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04ovl: add accessor for ofs->upper_mntMiklos Szeredi
Next patch will remove ofs->upper_mnt, so add an accessor function for this field. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04ovl: initialize error in ovl_copy_xattrYuxuan Shui
In ovl_copy_xattr, if all the xattrs to be copied are overlayfs private xattrs, the copy loop will terminate without assigning anything to the error variable, thus returning an uninitialized value. If ovl_copy_xattr is called from ovl_clear_empty, this uninitialized error value is put into a pointer by ERR_PTR(), causing potential invalid memory accesses down the line. This commit initialize error with 0. This is the correct value because when there's no xattr to copy, because all xattrs are private, ovl_copy_xattr should succeed. This bug is discovered with the help of INIT_STACK_ALL and clang. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1050405 Fixes: 0956254a2d5b ("ovl: don't copy up opaqueness") Cc: stable@vger.kernel.org # v4.8 Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-03fs: handle FIEMAP_FLAG_SYNC in fiemap_prepChristoph Hellwig
By moving FIEMAP_FLAG_SYNC handling to fiemap_prep we ensure it is handled once instead of duplicated, but can still be done under fs locks, like xfs/iomap intended with its duplicate handling. Also make sure the error value of filemap_write_and_wait is propagated to user space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-8-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03fs: move the fiemap definitions out of fs.hChristoph Hellwig
No need to pull the fiemap definitions into almost every file in the kernel build. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ovl: drop negative dentry in upper layerChengguang Xu
Negative dentries of upper layer are useless after construction of overlayfs' own dentry and may keep in the memory long time even after unmount of overlayfs instance. This patch tries to drop unnecessary negative dentry of upper layer to effectively reclaim memory. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-03ovl: check permission to open real fileMiklos Szeredi
Call inode_permission() on real inode before opening regular file on one of the underlying layers. In some cases ovl_permission() already checks access to an underlying file, but it misses the metacopy case, and possibly other ones as well. Removing the redundant permission check from ovl_permission() should be considered later. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-03ovl: call secutiry hook in ovl_real_ioctl()Miklos Szeredi
Verify LSM permissions for underlying file, since vfs_ioctl() doesn't do it. [Stephen Rothwell] export security_file_ioctl Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-02ovl: verify permissions in ovl_path_open()Miklos Szeredi
Check permission before opening a real file. ovl_path_open() is used by readdir and copy-up routines. ovl_permission() theoretically already checked copy up permissions, but it doesn't hurt to re-do these checks during the actual copy-up. For directory reading ovl_permission() only checks access to topmost underlying layer. Readdir on a merged directory accesses layers below the topmost one as well. Permission wasn't checked for these layers. Note: modifying ovl_permission() to perform this check would be far more complex and hence more bug prone. The result is less precise permissions returned in access(2). If this turns out to be an issue, we can revisit this bug. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-02ovl: switch to mounter creds in readdirMiklos Szeredi
In preparation for more permission checking, override credentials for directory operations on the underlying filesystems. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-02ovl: pass correct flags for opening real directoryMiklos Szeredi
The three instances of ovl_path_open() in overlayfs/readdir.c do three different things: - pass f_flags from overlay file - pass O_RDONLY | O_DIRECTORY - pass just O_RDONLY The value of f_flags can be (other than O_RDONLY): O_WRONLY - not possible for a directory O_RDWR - not possible for a directory O_CREAT - masked out by dentry_open() O_EXCL - masked out by dentry_open() O_NOCTTY - masked out by dentry_open() O_TRUNC - masked out by dentry_open() O_APPEND - no effect on directory ops O_NDELAY - no effect on directory ops O_NONBLOCK - no effect on directory ops __O_SYNC - no effect on directory ops O_DSYNC - no effect on directory ops FASYNC - no effect on directory ops O_DIRECT - no effect on directory ops O_LARGEFILE - ? O_DIRECTORY - only affects lookup O_NOFOLLOW - only affects lookup O_NOATIME - overlay sets this unconditionally in ovl_path_open() O_CLOEXEC - only affects fd allocation O_PATH - no effect on directory ops __O_TMPFILE - not possible for a directory Fon non-merge directories we use the underlying filesystem's iterate; in this case honor O_LARGEFILE from the original file to make sure that open doesn't get rejected. For merge directories it's safe to pass O_LARGEFILE unconditionally since userspace will only see the artificial offsets created by overlayfs. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-02ovl: fix redirect traversal on metacopy dentriesVivek Goyal
Amir pointed me to metacopy test cases in unionmount-testsuite and I decided to run "./run --ov=10 --meta" and it failed while running test "rename-mass-5.py". Problem is w.r.t absolute redirect traversal on intermediate metacopy dentry. We do not store intermediate metacopy dentries and also skip current loop/layer and move onto lookup in next layer. But at the end of loop, we have logic to reset "poe" and layer index if currnently looked up dentry has absolute redirect. We skip all that and that means lookup in next layer will fail. Following is simple test case to reproduce this. - mkdir -p lower upper work merged lower/a lower/b - touch lower/a/foo.txt - mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work,metacopy=on none merged # Following will create absolute redirect "/a/foo.txt" on upper/b/bar.txt. - mv merged/a/foo.txt merged/b/bar.txt # unmount overlay and use upper as lower layer (lower2) for next mount. - umount merged - mv upper lower2 - rm -rf work; mkdir -p upper work - mount -t overlay -o lowerdir=lower2:lower,upperdir=upper,workdir=work,metacopy=on none merged # Force a metacopy copy-up - chown bin:bin merged/b/bar.txt # unmount overlay and use upper as lower layer (lower3) for next mount. - umount merged - mv upper lower3 - rm -rf work; mkdir -p upper work - mount -t overlay -o lowerdir=lower3:lower2:lower,upperdir=upper,workdir=work,metacopy=on none merged # ls merged/b/bar.txt ls: cannot access 'bar.txt': Input/output error Intermediate lower layer (lower2) has metacopy dentry b/bar.txt with absolute redirect "/a/foo.txt". We skipped redirect processing at the end of loop which sets poe to roe and sets the appropriate next lower layer index. And that means lookup failed in next layer. Fix this by continuing the loop for any intermediate dentries. We still do not save these at lower stack. With this fix applied unionmount-testsuite, "./run --ov-10 --meta" now passes. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-02ovl: initialize OVL_UPPERDATA in ovl_lookup()Vivek Goyal
Currently ovl_get_inode() initializes OVL_UPPERDATA flag and for that it has to call ovl_check_metacopy_xattr() and check if metacopy xattr is present or not. yangerkun reported sometimes underlying filesystem might return -EIO and in that case error handling path does not cleanup properly leading to various warnings. Run generic/461 with ext4 upper/lower layer sometimes may trigger the bug as below(linux 4.19): [ 551.001349] overlayfs: failed to get metacopy (-5) [ 551.003464] overlayfs: failed to get inode (-5) [ 551.004243] overlayfs: cleanup of 'd44/fd51' failed (-5) [ 551.004941] overlayfs: failed to get origin (-5) [ 551.005199] ------------[ cut here ]------------ [ 551.006697] WARNING: CPU: 3 PID: 24674 at fs/inode.c:1528 iput+0x33b/0x400 ... [ 551.027219] Call Trace: [ 551.027623] ovl_create_object+0x13f/0x170 [ 551.028268] ovl_create+0x27/0x30 [ 551.028799] path_openat+0x1a35/0x1ea0 [ 551.029377] do_filp_open+0xad/0x160 [ 551.029944] ? vfs_writev+0xe9/0x170 [ 551.030499] ? page_counter_try_charge+0x77/0x120 [ 551.031245] ? __alloc_fd+0x160/0x2a0 [ 551.031832] ? do_sys_open+0x189/0x340 [ 551.032417] ? get_unused_fd_flags+0x34/0x40 [ 551.033081] do_sys_open+0x189/0x340 [ 551.033632] __x64_sys_creat+0x24/0x30 [ 551.034219] do_syscall_64+0xd5/0x430 [ 551.034800] entry_SYSCALL_64_after_hwframe+0x44/0xa9 One solution is to improve error handling and call iget_failed() if error is encountered. Amir thinks that this path is little intricate and there is not real need to check and initialize OVL_UPPERDATA in ovl_get_inode(). Instead caller of ovl_get_inode() can initialize this state. And this will avoid double checking of metacopy xattr lookup in ovl_lookup() and ovl_get_inode(). OVL_UPPERDATA is inode flag. So I was little concerned that initializing it outside ovl_get_inode() might have some races. But this is one way transition. That is once a file has been fully copied up, it can't go back to metacopy file again. And that seems to help avoid races. So as of now I can't see any races w.r.t OVL_UPPERDATA being set wrongly. So move settingof OVL_UPPERDATA inside the callers of ovl_get_inode(). ovl_obtain_alias() already does it. So only two callers now left are ovl_lookup() and ovl_instantiate(). Reported-by: yangerkun <yangerkun@huawei.com> Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-02ovl: use only uppermetacopy state in ovl_lookup()Vivek Goyal
Currently we use a variable "metacopy" which signifies that dentry could be either uppermetacopy or lowermetacopy. Amir suggested that we can move code around and use d.metacopy in such a way that we don't need lowermetacopy and just can do away with uppermetacopy. So this patch replaces "metacopy" with "uppermetacopy". It also moves some code little higher to keep reading little simpler. Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-02ovl: simplify setting of origin for index lookupVivek Goyal
overlayfs can keep index of copied up files and directories and it seems to serve two primary puroposes. For regular files, it avoids breaking lower hardlinks over copy up. For directories it seems to be used for various error checks. During ovl_lookup(), we lookup for index using lower dentry in many a cases. That lower dentry is called "origin" and following is a summary of current logic. If there is no upperdentry, always lookup for index using lower dentry. For regular files it helps avoiding breaking hard links over copyup and for directories it seems to be just error checks. If there is an upperdentry, then there are 3 possible cases. - For directories, lower dentry is found using two ways. One is regular path based lookup in lower layers and second is using ORIGIN xattr on upper dentry. First verify that path based lookup lower dentry matches the one pointed by upper ORIGIN xattr. If yes, use this verified origin for index lookup. - For regular files (non-metacopy), there is no path based lookup in lower layers as lookup stops once we find upper dentry. So there is no origin verification. If there is ORIGIN xattr present on upper, use that to lookup index otherwise don't. - For regular metacopy files, again lower dentry is found using path based lookup as well as ORIGIN xattr on upper. Path based lookup is continued in this case to find lower data dentry for metacopy upper. So like directories we only use verified origin. If ORIGIN xattr is not present (Either because lower did not support file handles or because this is hardlink copied up with index=off), then don't use path lookup based lower dentry as origin. This is same as regular non-metacopy file case. Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-02ovl: fix out of bounds access warning in ovl_check_fb_len()Amir Goldstein
syzbot reported out of bounds memory access from open_by_handle_at() with a crafted file handle that looks like this: { .handle_bytes = 2, .handle_type = OVL_FILEID_V1 } handle_bytes gets rounded down to 0 and we end up calling: ovl_check_fh_len(fh, 0) => ovl_check_fb_len(fh + 3, -3) But fh buffer is only 2 bytes long, so accessing struct ovl_fb at fh + 3 is illegal. Fixes: cbe7fba8edfc ("ovl: make sure that real fid is 32bit aligned in memory") Reported-and-tested-by: syzbot+61958888b1c60361a791@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> # v5.5 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-01Merge tag 'docs-5.8' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "A fair amount of stuff this time around, dominated by yet another massive set from Mauro toward the completion of the RST conversion. I *really* hope we are getting close to the end of this. Meanwhile, those patches reach pretty far afield to update document references around the tree; there should be no actual code changes there. There will be, alas, more of the usual trivial merge conflicts. Beyond that we have more translations, improvements to the sphinx scripting, a number of additions to the sysctl documentation, and lots of fixes" * tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits) Documentation: fixes to the maintainer-entry-profile template zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst tracing: Fix events.rst section numbering docs: acpi: fix old http link and improve document format docs: filesystems: add info about efivars content Documentation: LSM: Correct the basic LSM description mailmap: change email for Ricardo Ribalda docs: sysctl/kernel: document unaligned controls Documentation: admin-guide: update bug-hunting.rst docs: sysctl/kernel: document ngroups_max nvdimm: fixes to maintainter-entry-profile Documentation/features: Correct RISC-V kprobes support entry Documentation/features: Refresh the arch support status files Revert "docs: sysctl/kernel: document ngroups_max" docs: move locking-specific documents to locking/ docs: move digsig docs to the security book docs: move the kref doc into the core-api book docs: add IRQ documentation at the core-api book docs: debugging-via-ohci1394.txt: add it to the core-api book docs: fix references for ipmi.rst file ...
2020-05-13ovl: return required buffer size for file handlesLubos Dolezel
Overlayfs doesn't work well with the fanotify mechanism. Fanotify first probes for the required buffer size for the file handle, but overlayfs currently bails out without passing the size back. That results in errors in the kernel log, such as: [527944.485384] overlayfs: failed to encode file handle (/, err=-75, buflen=0, len=29, type=1) [527944.485386] fanotify: failed to encode fid (fsid=ae521e68.a434d95f, type=255, bytes=0, err=-2) Signed-off-by: Lubos Dolezel <lubos@dolezel.info> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: sync dirty data when remounting to ro modeChengguang Xu
sync_filesystem() does not sync dirty data for readonly filesystem during umount, so before changing to readonly filesystem we should sync dirty data for data integrity. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: whiteout inode sharingChengguang Xu
Share inode with different whiteout files for saving inode and speeding up delete operation. If EMLINK is encountered when linking a shared whiteout, create a new one. In case of any other error, disable sharing for this super block. Note: ofs->whiteout is protected by inode lock on workdir. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: inherit SB_NOSEC flag from upperdirJeffle Xu
Since the stacking of regular file operations [1], the overlayfs edition of write_iter() is called when writing regular files. Since then, xattr lookup is needed on every write since file_remove_privs() is called from ovl_write_iter(), which would become the performance bottleneck when writing small chunks of data. In my test case, file_remove_privs() would consume ~15% CPU when running fstime of unixbench (the workload is repeadly writing 1 KB to the same file) [2]. Inherit the SB_NOSEC flag from upperdir. Since then xattr lookup would be done only once on the first write. Unixbench fstime gets a ~20% performance gain with this patch. [1] https://lore.kernel.org/lkml/20180606150905.GC9426@magnolia/T/ [2] https://www.spinics.net/lists/linux-unionfs/msg07153.html Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: skip overlayfs superblocks at global syncKonstantin Khlebnikov
Stacked filesystems like overlayfs has no own writeback, but they have to forward syncfs() requests to backend for keeping data integrity. During global sync() each overlayfs instance calls method ->sync_fs() for backend although it itself is in global list of superblocks too. As a result one syscall sync() could write one superblock several times and send multiple disk barriers. This patch adds flag SB_I_SKIP_SYNC into sb->sb_iflags to avoid that. Reported-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: index dir act as work dirAmir Goldstein
With index=on, let index dir act as the work dir for copy up and cleanups. This will help implementing whiteout inode sharing. We still create the "work" dir on mount regardless of index=on and it is used to test the features supported by upper fs. One reason is that before the feature tests, we do not know if index could be enabled or not. The reason we do not use "index" directory also as workdir with index=off is because the existence of the "index" directory acts as a simple persistent signal that index was enabled on this filesystem and tools may want to use that signal. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: prepare to copy up without workdirAmir Goldstein
With index=on, we copy up lower hardlinks to work dir and move them into index dir. Fix locking to allow work dir and index dir to be the same directory. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: cleanup non-empty directories in ovl_indexdir_cleanup()Amir Goldstein
Teach ovl_indexdir_cleanup() to remove temp directories containing whiteouts to prepare for using index dir instead of work dir for removing merge directories. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: resolve more conflicting mount optionsAmir Goldstein
Similar to the way that a conflict between metacopy=on,redirect_dir=off is resolved, also resolve conflicts between nfs_export=on,index=off and nfs_export=on,metacopy=on. An explicit mount option wins over a default config value. Both explicit mount options result in an error. Without this change the xfstests group overlay/exportfs are skipped if metacopy is enabled by default. Reported-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: potential crash in ovl_fid_to_fh()Dan Carpenter
The "buflen" value comes from the user and there is a potential that it could be zero. In do_handle_to_path() we know that "handle->handle_bytes" is non-zero and we do: handle_dwords = handle->handle_bytes >> 2; So values 1-3 become zero. Then in ovl_fh_to_dentry() we do: int len = fh_len << 2; So now len is in the "0,4-128" range and a multiple of 4. But if "buflen" is zero it will try to copy negative bytes when we do the memcpy in ovl_fid_to_fh(). memcpy(&fh->fb, fid, buflen - OVL_FH_WIRE_OFFSET); And that will lead to a crash. Thanks to Amir Goldstein for his help with this patch. Fixes: cbe7fba8edfc ("ovl: make sure that real fid is 32bit aligned in memory") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Cc: <stable@vger.kernel.org> # v5.5 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-04-30ovl: clear ATTR_OPEN from attr->ia_validVivek Goyal
As of now during open(), we don't pass bunch of flags to underlying filesystem. O_TRUNC is one of these. Normally this is not a problem as VFS calls ->setattr() with zero size and underlying filesystem sets file size to 0. But when overlayfs is running on top of virtiofs, it has an optimization where it does not send setattr request to server if dectects that truncation is part of open(O_TRUNC). It assumes that server already zeroed file size as part of open(O_TRUNC). fuse_do_setattr() { if (attr->ia_valid & ATTR_OPEN) { /* * No need to send request to userspace, since actual * truncation has already been done by OPEN. But still * need to truncate page cache. */ } } IOW, fuse expects O_TRUNC to be passed to it as part of open flags. But currently overlayfs does not pass O_TRUNC to underlying filesystem hence fuse/virtiofs breaks. Setup overlayfs on top of virtiofs and following does not zero the file size of a file is either upper only or has already been copied up. fd = open(foo.txt, O_TRUNC | O_WRONLY); There are two ways to fix this. Either pass O_TRUNC to underlying filesystem or clear ATTR_OPEN from attr->ia_valid so that fuse ends up sending a SETATTR request to server. Miklos is concerned that O_TRUNC might have side affects so it is better to clear ATTR_OPEN for now. Hence this patch clears ATTR_OPEN from attr->ia_valid. I found this problem while running unionmount-testsuite. With this patch, unionmount-testsuite passes with overlayfs on top of virtiofs. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Fixes: bccece1ead36 ("ovl: allow remote upper") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-04-30ovl: clear ATTR_FILE from attr->ia_validVivek Goyal
ovl_setattr() can be passed an attr which has ATTR_FILE set and attr->ia_file is a file pointer to overlay file. This is done in open(O_TRUNC) path. We should either replace with attr->ia_file with underlying file object or clear ATTR_FILE so that underlying filesystem does not end up using overlayfs file object pointer. There are no good use cases yet so for now clear ATTR_FILE. fuse seems to be one user which can use this. But it can work even without this. So it is not mandatory to pass ATTR_FILE to fuse. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Fixes: bccece1ead36 ("ovl: allow remote upper") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-04-20docs: fix broken references to text filesMauro Carvalho Chehab
Several references got broken due to txt to ReST conversion. Several of them can be automatically fixed with: scripts/documentation-file-ref-check --fix Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> # hwtracing/coresight/Kconfig Reviewed-by: Paul E. McKenney <paulmck@kernel.org> # memory-barrier.txt Acked-by: Alex Shi <alex.shi@linux.alibaba.com> # translations/zh_CN Acked-by: Federico Vaga <federico.vaga@vaga.pv.it> # translations/it_IT Acked-by: Marc Zyngier <maz@kernel.org> # kvm/arm64 Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/6f919ddb83a33b5f2a63b6b5f0575737bb2b36aa.1586881715.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-03-27ovl: enable xino automatically in more casesAmir Goldstein
So far, with xino=auto, we only enable xino if we know that all underlying filesystem use 32bit inode numbers. When users configure overlay with xino=auto, they already declare that they are ready to handle 64bit inode number from overlay. It is a very common case, that underlying filesystem uses 64bit ino, but rarely or never uses the high inode number bits (e.g. tmpfs, xfs). Leaving it for the users to declare high ino bits are unused with xino=on is not a recipe for many users to enjoy the benefits of xino. There appears to be very little reason not to enable xino when users declare xino=auto even if we do not know how many bits underlying filesystem uses for inode numbers. In the worst case of xino bits overflow by real inode number, we already fall back to the non-xino behavior - real inode number with unique pseudo dev or to non persistent inode number and overlay st_dev (for directories). The only annoyance from auto enabling xino is that xino bits overflow emits a warning to kmsg. Suppress those warnings unless users explicitly asked for xino=on, suggesting that they expected high ino bits to be unused by underlying filesystem. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-27ovl: avoid possible inode number collisions with xino=onAmir Goldstein
When xino feature is enabled and a real directory inode number overflows the lower xino bits, we cannot map this directory inode number to a unique and persistent inode number and we fall back to the real inode st_ino and overlay st_dev. The real inode st_ino with high bits may collide with a lower inode number on overlay st_dev that was mapped using xino. To avoid possible collision with legitimate xino values, map a non persistent inode number to a dedicated range in the xino address space. The dedicated range is created by adding one more bit to the number of reserved high xino bits. We could have added just one more fsid, but that would have had the undesired effect of changing persistent overlay inode numbers on kernel or require more complex xino mapping code. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-27ovl: use a private non-persistent ino poolAmir Goldstein
There is no reason to deplete the system's global get_next_ino() pool for overlay non-persistent inode numbers and there is no reason at all to allocate non-persistent inode numbers for non-directories. For non-directories, it is much better to leave i_ino the same as real i_ino, to be consistent with st_ino/d_ino. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-27ovl: fix WARN_ON nlink drop to zeroMiklos Szeredi
Changes to underlying layers should not cause WARN_ON(), but this repro does: mkdir w l u mnt sudo mount -t overlay -o workdir=w,lowerdir=l,upperdir=u overlay mnt touch mnt/h ln u/h u/k rm -rf mnt/k rm -rf mnt/h dmesg ------------[ cut here ]------------ WARNING: CPU: 1 PID: 116244 at fs/inode.c:302 drop_nlink+0x28/0x40 After upper hardlinks were added while overlay is mounted, unlinking all overlay hardlinks drops overlay nlink to zero before all upper inodes are unlinked. After unlink/rename prevent i_nlink from going to zero if there are still hashed aliases (i.e. cached hard links to the victim) remaining. Reported-by: Phasip <phasip@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: fix a typo in commentChengguang Xu
Fix a typo in comment. (annonate -> annotate) Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Fixes: cbe7fba8edfc ("ovl: make sure that real fid is 32bit aligned in memory") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: ovl_obtain_alias(): don't call d_instantiate_anon() for oldAl Viro
The situation is the same as for __d_obtain_alias() (which is what that thing is parallel to) - if we find a preexisting alias, we want to grab it, drop the inode and return the alias we'd found. The only thing d_instantiate_anon() does compared to that is spurious security_d_instiate() that has already been done to that dentry with exact same arguments. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: strict upper fs requirements for remote upper fsAmir Goldstein
Overlayfs works sub-optimally with upper fs that has no xattr/d_type/ RENAME_WHITEOUT support. We should basically deprecate support for those filesystems, but so far, we only issue a warning and don't fail the mount for the sake of backward compat. Some features are already being disabled with no xattr support. For newly supported remote upper fs, we do not need to worry about backward compatibility, so we can fail the mount if upper fs is a sub-optimal filesystem. This reduces the in-tree remote filesystems supported as upper to just FUSE, for which the remote upper fs support was added. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>