summaryrefslogtreecommitdiff
path: root/fs/overlayfs/ovl_entry.h
AgeCommit message (Collapse)Author
2021-01-28ovl: implement volatile-specific fsync error behaviourSargun Dhillon
Overlayfs's volatile option allows the user to bypass all forced sync calls to the upperdir filesystem. This comes at the cost of safety. We can never ensure that the user's data is intact, but we can make a best effort to expose whether or not the data is likely to be in a bad state. The best way to handle this in the time being is that if an overlayfs's upperdir experiences an error after a volatile mount occurs, that error will be returned on fsync, fdatasync, sync, and syncfs. This is contradictory to the traditional behaviour of VFS which fails the call once, and only raises an error if a subsequent fsync error has occurred, and been raised by the filesystem. One awkward aspect of the patch is that we have to manually set the superblock's errseq_t after the sync_fs callback as opposed to just returning an error from syncfs. This is because the call chain looks something like this: sys_syncfs -> sync_filesystem -> __sync_filesystem -> /* The return value is ignored here sb->s_op->sync_fs(sb) _sync_blockdev /* Where the VFS fetches the error to raise to userspace */ errseq_check_and_advance Because of this we call errseq_set every time the sync_fs callback occurs. Due to the nature of this seen / unseen dichotomy, if the upperdir is an inconsistent state at the initial mount time, overlayfs will refuse to mount, as overlayfs cannot get a snapshot of the upperdir's errseq that will increment on error until the user calls syncfs. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Fixes: c86243b090bc ("ovl: provide a mount option "volatile"") Cc: stable@vger.kernel.org Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-12-14ovl: user xattrMiklos Szeredi
Optionally allow using "user.overlay." namespace instead of "trusted.overlay." This is necessary for overlayfs to be able to be mounted in an unprivileged namepsace. Make the option explicit, since it makes the filesystem format be incompatible. Disable redirect_dir and metacopy options, because these would allow privilege escalation through direct manipulation of the "user.overlay.redirect" or "user.overlay.metacopy" xattrs. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
2020-11-12ovl: introduce new "uuid=off" option for inodes index featurePavel Tikhomirov
This replaces uuid with null in overlayfs file handles and thus relaxes uuid checks for overlay index feature. It is only possible in case there is only one filesystem for all the work/upper/lower directories and bare file handles from this backing filesystem are unique. In other case when we have multiple filesystems lets just fallback to "uuid=on" which is and equivalent of how it worked before with all uuid checks. This is needed when overlayfs is/was mounted in a container with index enabled (e.g.: to be able to resolve inotify watch file handles on it to paths in CRIU), and this container is copied and started alongside with the original one. This way the "copy" container can't have the same uuid on the superblock and mounting the overlayfs from it later would fail. That is an example of the problem on top of loop+ext4: dd if=/dev/zero of=loopbackfile.img bs=100M count=10 losetup -fP loopbackfile.img losetup -a #/dev/loop0: [64768]:35 (/loop-test/loopbackfile.img) mkfs.ext4 loopbackfile.img mkdir loop-mp mount -o loop /dev/loop0 loop-mp mkdir loop-mp/{lower,upper,work,merged} mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\ upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged umount loop-mp/merged umount loop-mp e2fsck -f /dev/loop0 tune2fs -U random /dev/loop0 mount -o loop /dev/loop0 loop-mp mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\ upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged #mount: /loop-test/loop-mp/merged: #mount(2) system call failed: Stale file handle. If you just change the uuid of the backing filesystem, overlay is not mounting any more. In Virtuozzo we copy container disks (ploops) when create the copy of container and we require fs uuid to be unique for a new container. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-02ovl: provide a mount option "volatile"Vivek Goyal
Container folks are complaining that dnf/yum issues too many sync while installing packages and this slows down the image build. Build requirement is such that they don't care if a node goes down while build was still going on. In that case, they will simply throw away unfinished layer and start new build. So they don't care about syncing intermediate state to the disk and hence don't want to pay the price associated with sync. So they are asking for mount options where they can disable sync on overlay mount point. They primarily seem to have two use cases. - For building images, they will mount overlay with nosync and then sync upper layer after unmounting overlay and reuse upper as lower for next layer. - For running containers, they don't seem to care about syncing upper layer because if node goes down, they will simply throw away upper layer and create a fresh one. So this patch provides a mount option "volatile" which disables all forms of sync. Now it is caller's responsibility to throw away upper if system crashes or shuts down and start fresh. With "volatile", I am seeing roughly 20% speed up in my VM where I am just installing emacs in an image. Installation time drops from 31 seconds to 25 seconds when nosync option is used. This is for the case of building on top of an image where all packages are already cached. That way I take out the network operations latency out of the measurement. Giuseppe is also looking to cut down on number of iops done on the disk. He is complaining that often in cloud their VMs are throttled if they cross the limit. This option can help them where they reduce number of iops (by cutting down on frequent sync and writebacks). Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04ovl: get rid of redundant members in struct ovl_fsMiklos Szeredi
ofs->upper_mnt is copied to ->layers[0].mnt and ->layers[0].trap could be used instead of a separate ->upperdir_trap. Split the lowerdir option early to get the number of layers, then allocate the ->layers array, and finally fill the upper and lower layers, as before. Get rid of path_put_init() in ovl_lower_dir(), since the only caller will take care of that. [Colin Ian King] Fix null pointer dereference on null stack pointer on error return found by Coverity. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04ovl: add accessor for ofs->upper_mntMiklos Szeredi
Next patch will remove ofs->upper_mnt, so add an accessor function for this field. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: whiteout inode sharingChengguang Xu
Share inode with different whiteout files for saving inode and speeding up delete operation. If EMLINK is encountered when linking a shared whiteout, create a new one. In case of any other error, disable sharing for this super block. Note: ofs->whiteout is protected by inode lock on workdir. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-27ovl: use a private non-persistent ino poolAmir Goldstein
There is no reason to deplete the system's global get_next_ino() pool for overlay non-persistent inode numbers and there is no reason at all to allocate non-persistent inode numbers for non-directories. For non-directories, it is much better to leave i_ino the same as real i_ino, to be consistent with st_ino/d_ino. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-24ovl: layer is constMiklos Szeredi
The ovl_layer struct is never modified except at initialization. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-24ovl: fix corner case of conflicting lower layer uuidAmir Goldstein
This fixes ovl_lower_uuid_ok() to correctly detect the corner case: - two filesystems, A and B, both have null uuid - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B In this case, bad_uuid would not have been set for B, because the check only involved the list of lower fs. Hence we'll try to decode a layer 2 origin on layer 1 and fail. We check for conflicting (and null) uuid among all lower layers, including those layers that are on the same fs as the upper layer. Reported-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-24ovl: generalize the lower_fs[] arrayAmir Goldstein
Rename lower_fs[] array to fs[], extend its size by one and use index fsid (instead of fsid-1) to access the fs[] array. Initialize fs[0] with upper fs values. fsid 0 is reserved even with lower only overlay, so fs[0] remains null in this case. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-24ovl: simplify ovl_same_sb() helperAmir Goldstein
No code uses the sb returned from this helper, so make it retrun a boolean and rename it to ovl_same_fs(). The xino mode is irrelevant when all layers are on same fs, so instead of describing samefs with mode OVL_XINO_OFF, use a new xino_mode state, which is 0 in the case of samefs, -1 in the case of xino=off and > 0 with xino enabled. Create a new helper ovl_same_dev(), to use instead of the common check for (ovl_same_fs() || xinobits). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-22ovl: generalize the lower_layers[] arrayAmir Goldstein
Rename lower_layers[] array to layers[], extend its size by one and initialize layers[0] with upper layer values. Lower layers are now addressed with index 1..numlower. layers[0] is reserved even with lower only overlay. [SzM: replace ofs->numlower with ofs->numlayer, the latter's value is incremented by one] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: fix lookup failure on multi lower squashfsAmir Goldstein
In the past, overlayfs required that lower fs have non null uuid in order to support nfs export and decode copy up origin file handles. Commit 9df085f3c9a2 ("ovl: relax requirement for non null uuid of lower fs") relaxed this requirement for nfs export support, as long as uuid (even if null) is unique among all lower fs. However, said commit unintentionally also relaxed the non null uuid requirement for decoding copy up origin file handles, regardless of the unique uuid requirement. Amend this mistake by disabling decoding of copy up origin file handle from lower fs with a conflicting uuid. We still encode copy up origin file handles from those fs, because file handles like those already exist in the wild and because they might provide useful information in the future. There is an unhandled corner case described by Miklos this way: - two filesystems, A and B, both have null uuid - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B In this case bad_uuid won't be set for B, because the check only involves the list of lower fs. Hence we'll try to decode a layer 2 origin on layer 1 and fail. We will deal with this corner case later. Reported-by: Colin Ian King <colin.king@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/lkml/20191106234301.283006-1-colin.king@canonical.com/ Fixes: 9df085f3c9a2 ("ovl: relax requirement for non null uuid ...") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-07-16ovl: fix regression caused by overlapping layers detectionAmir Goldstein
Once upon a time, commit 2cac0c00a6cd ("ovl: get exclusive ownership on upper/work dirs") in v4.13 added some sanity checks on overlayfs layers. This change caused a docker regression. The root cause was mount leaks by docker, which as far as I know, still exist. To mitigate the regression, commit 85fdee1eef1a ("ovl: fix regression caused by exclusive upper/work dir protection") in v4.14 turned the mount errors into warnings for the default index=off configuration. Recently, commit 146d62e5a586 ("ovl: detect overlapping layers") in v5.2, re-introduced exclusive upper/work dir checks regardless of index=off configuration. This changes the status quo and mount leak related bug reports have started to re-surface. Restore the status quo to fix the regressions. To clarify, index=off does NOT relax overlapping layers check for this ovelayfs mount. index=off only relaxes exclusive upper/work dir checks with another overlayfs mount. To cover the part of overlapping layers detection that used the exclusive upper/work dir checks to detect overlap with self upper/work dir, add a trap also on the work base dir. Link: https://github.com/moby/moby/issues/34672 Link: https://lore.kernel.org/linux-fsdevel/20171006121405.GA32700@veci.piliscsaba.szeredi.hu/ Link: https://github.com/containers/libpod/issues/3540 Fixes: 146d62e5a586 ("ovl: detect overlapping layers") Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Colin Walters <walters@verbum.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-29ovl: detect overlapping layersAmir Goldstein
Overlapping overlay layers are not supported and can cause unexpected behavior, but overlayfs does not currently check or warn about these configurations. User is not supposed to specify the same directory for upper and lower dirs or for different lower layers and user is not supposed to specify directories that are descendants of each other for overlay layers, but that is exactly what this zysbot repro did: https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000 Moving layer root directories into other layers while overlayfs is mounted could also result in unexpected behavior. This commit places "traps" in the overlay inode hash table. Those traps are dummy overlay inodes that are hashed by the layers root inodes. On mount, the hash table trap entries are used to verify that overlay layers are not overlapping. While at it, we also verify that overlay layers are not overlapping with directories "in-use" by other overlay instances as upperdir/workdir. On lookup, the trap entries are used to verify that overlay layers root inodes have not been moved into other layers after mount. Some examples: $ ./run --ov --samefs -s ... ( mkdir -p base/upper/0/u base/upper/0/w base/lower lower upper mnt mount -o bind base/lower lower mount -o bind base/upper upper mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w) $ umount mnt $ mount -t overlay none mnt ... -o lowerdir=base,upperdir=upper/0/u,workdir=upper/0/w [ 94.434900] overlayfs: overlapping upperdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=upper/0/u,upperdir=upper/0/u,workdir=upper/0/w [ 151.350132] overlayfs: conflicting lowerdir path mount: none is already mounted or mnt busy $ mount -t overlay none mnt ... -o lowerdir=lower:lower/a,upperdir=upper/0/u,workdir=upper/0/w [ 201.205045] overlayfs: overlapping lowerdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w $ mv base/upper/0/ base/lower/ $ find mnt/0 mnt/0 mnt/0/w find: 'mnt/0/w/work': Too many levels of symbolic links find: 'mnt/0/u': Too many levels of symbolic links Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Store lower data inode in ovl_inodeVivek Goyal
Right now ovl_inode stores inode pointer for lower inode. This helps with quickly getting lower inode given overlay inode (ovl_inode_lower()). Now with metadata only copy-up, we can have metacopy inode in middle layer as well and inode containing data can be different from ->lower. I need to be able to open the real file in ovl_open_realfile() and for that I need to quickly find the lower data inode. Hence store lower data inode also in ovl_inode. Also provide an helper ovl_inode_lowerdata() to access this field. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Provide a mount option metacopy=on/off for metadata copyupVivek Goyal
By default metadata only copy up is disabled. Provide a mount option so that users can choose one way or other. Also provide a kernel config and module option to enable/disable metacopy feature. metacopy feature requires redirect_dir=on when upper is present. Otherwise, it requires redirect_dir=follow atleast. As of now, metacopy does not work with nfs_export=on. So if both metacopy=on and nfs_export=on then nfs_export is disabled. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: add support for "xino" mount and config optionsAmir Goldstein
With mount option "xino=on", mounter declares that there are enough free high bits in underlying fs to hold the layer fsid. If overlayfs does encounter underlying inodes using the high xino bits reserved for layer fsid, a warning will be emitted and the original inode number will be used. The mount option name "xino" goes after a similar meaning mount option of aufs, but in overlayfs case, the mapping is stateless. An example for a use case of "xino=on" is when upper/lower is on an xfs filesystem. xfs uses 64bit inode numbers, but it currently never uses the upper 8bit for inode numbers exposed via stat(2) and that is not likely to change in the future without user opting-in for a new xfs feature. The actual number of unused upper bit is much larger and determined by the xfs filesystem geometry (64 - agno_log - agblklog - inopblog). That means that for all practical purpose, there are enough unused bits in xfs inode numbers for more than OVL_MAX_STACK unique fsid's. Another use case of "xino=on" is when upper/lower is on tmpfs. tmpfs inode numbers are allocated sequentially since boot, so they will practially never use the high inode number bits. For compatibility with applications that expect 32bit inodes, the feature can be disabled with "xino=off". The option "xino=auto" automatically detects underlying filesystem that use 32bit inodes and enables the feature. The Kconfig option OVERLAY_FS_XINO_AUTO and module parameter of the same name, determine if the default mode for overlayfs mount is "xino=auto" or "xino=off". Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: constant st_ino for non-samefs with xinoAmir Goldstein
On 64bit systems, when overlay layers are not all on the same fs, but all inode numbers of underlying fs are not using the high bits, use the high bits to partition the overlay st_ino address space. The high bits hold the fsid (upper fsid is 0). This way overlay inode numbers are unique and all inodes use overlay st_dev. Inode numbers are also persistent for a given layer configuration. Currently, our only indication for available high ino bits is from a filesystem that supports file handles and uses the default encode_fh() operation, which encodes a 32bit inode number. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: allocate anon bdev per unique lower fsAmir Goldstein
Instead of allocating an anonymous bdev per lower layer, allocate one anonymous bdev per every unique lower fs that is different than upper fs. Every unique lower fs is assigned an fsid > 0 and the number of unique lower fs are stored in ofs->numlowerfs. The assigned fsid is stored in the lower layer struct and will be used also for inode number multiplexing. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: store 'has_upper' and 'opaque' as bit flagsAmir Goldstein
We need to make some room in struct ovl_entry to store information about redirected ancestors for NFS export, so cram two booleans as bit flags. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: add support for "nfs_export" configurationAmir Goldstein
Introduce the "nfs_export" config, module and mount options. The NFS export feature depends on the "index" feature and enables two implicit overlayfs features: "index_all" and "verify_lower". The "index_all" feature creates an index on copy up of every file and directory. The "verify_lower" feature uses the full index to detect overlay filesystems inconsistencies on lookup, like redirect from multiple upper dirs to the same lower dir. NFS export can be enabled for non-upper mount with no index. However, because lower layer redirects cannot be verified with the index, enabling NFS export support on an overlay with no upper layer requires turning off redirect follow (e.g. "redirect_dir=nofollow"). The full index may incur some overhead on mount time, especially when verifying that lower directory file handles are not stale. NFS export support, full index and consistency verification will be implemented by following patches. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: store layer index in ovl_layerAmir Goldstein
Store the fs root layer index inside ovl_layer struct, so we can get the root fs layer index from merge dir lower layer instead of find it with ovl_find_layer() helper. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-11ovl: don't follow redirects if redirect_dir=offMiklos Szeredi
Overlayfs is following redirects even when redirects are disabled. If this is unintentional (probably the majority of cases) then this can be a problem. E.g. upper layer comes from untrusted USB drive, and attacker crafts a redirect to enable read access to otherwise unreadable directories. If "redirect_dir=off", then turn off following as well as creation of redirects. If "redirect_dir=follow", then turn on following, but turn off creation of redirects (which is what "redirect_dir=off" does now). This is a backward incompatible change, so make it dependent on a config option. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-17Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: - Report constant st_ino values across copy-up even if underlying layers are on different filesystems, but using different st_dev values for each layer. Ideally we'd report the same st_dev across the overlay, and it's possible to do for filesystems that use only 32bits for st_ino by unifying the inum space. It would be nice if it wasn't a choice of 32 or 64, rather filesystems could report their current maximum (that could change on resize, so it wouldn't be set in stone). - miscellaneus fixes and a cleanup of ovl_fill_super(), that was long overdue. - created a path_put_init() helper that clears out the pointers after putting the ref. I think this could be useful elsewhere, so added it to <linux/path.h> * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (30 commits) ovl: remove unneeded arg from ovl_verify_origin() ovl: Put upperdentry if ovl_check_origin() fails ovl: rename ufs to ofs ovl: clean up getting lower layers ovl: clean up workdir creation ovl: clean up getting upper layer ovl: move ovl_get_workdir() and ovl_get_lower_layers() ovl: reduce the number of arguments for ovl_workdir_create() ovl: change order of setup in ovl_fill_super() ovl: factor out ovl_free_fs() helper ovl: grab reference to workbasedir early ovl: split out ovl_get_indexdir() from ovl_fill_super() ovl: split out ovl_get_lower_layers() from ovl_fill_super() ovl: split out ovl_get_workdir() from ovl_fill_super() ovl: split out ovl_get_upper() from ovl_fill_super() ovl: split out ovl_get_lowerstack() from ovl_fill_super() ovl: split out ovl_get_workpath() from ovl_fill_super() ovl: split out ovl_get_upperpath() from ovl_fill_super() ovl: use path_put_init() in error paths for ovl_fill_super() vfs: add path_put_init() ...
2017-11-09ovl: allocate anonymous devs for lowerdirsChandan Rajendra
Generate unique values of st_dev per lower layer for non-samefs overlay mount. The unique values are obtained by allocating anonymous bdevs for each of the lowerdirs in the overlayfs instance. The anonymous bdev is going to be returned by stat(2) for lowerdir non-dir entries in non-samefs case. [amir: split from ovl_getattr() and re-structure patches] Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-09ovl: re-structure overlay lower layers in-memoryChandan Rajendra
Define new structures to represent overlay instance lower layers and overlay merge dir lower layers to make room for storing more per layer information in-memory. Instead of keeping the fs instance lower layers in an array of struct vfsmount, keep them in an array of new struct ovl_layer, that has a pointer to struct vfsmount. Instead of keeping the dentry lower layers in an array of struct path, keep them in an array of new struct ovl_path, that has a pointer to struct dentry and to struct ovl_layer. Add a small helper to find the fs layer id that correspopnds to a lower struct ovl_path and use it in ovl_lookup(). [amir: split re-structure from anonymous bdev patch] Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-24locking/barriers: Convert users of lockless_dereference() to READ_ONCE()Will Deacon
READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it can be used instead of lockless_dereference() without any change in semantics. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-05ovl: fix regression caused by exclusive upper/work dir protectionAmir Goldstein
Enforcing exclusive ownership on upper/work dirs caused a docker regression: https://github.com/moby/moby/issues/34672. Euan spotted the regression and pointed to the offending commit. Vivek has brought the regression to my attention and provided this reproducer: Terminal 1: mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/ Terminal 2: unshare -m Terminal 1: umount merged mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/ mount: /root/overlay-testing/merged: none already mounted or mount point busy To fix the regression, I replaced the error with an alarming warning. With index feature enabled, mount does fail, but logs a suggestion to override exclusive dir protection by disabling index. Note that index=off mount does take the inuse locks, so a concurrent index=off will issue the warning and a concurrent index=on mount will fail. Documentation was updated to reflect this change. Fixes: 2cac0c00a6cd ("ovl: get exclusive ownership on upper/work dirs") Cc: <stable@vger.kernel.org> # v4.13 Reported-by: Euan Kemp <euank@euank.com> Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: add flag for upper in ovl_entryMiklos Szeredi
For rename, we need to ensure that an upper alias exists for hard links before attempting the operation. Introduce a flag in ovl_entry to track the state of the upper alias. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: introduce the inodes index dir featureAmir Goldstein
Create the index dir on mount. The index dir will contain hardlinks to upper inodes, named after the hex representation of their origin lower inodes. The index dir is going to be used to prevent breaking lower hardlinks on copy up and to implement overlayfs NFS export. Because the feature is not fully backward compat, enabling the feature is opt-in by config/module/mount option. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: get exclusive ownership on upper/work dirsAmir Goldstein
Bad things can happen if several concurrent overlay mounts try to use the same upperdir/workdir path. Try to get the 'inuse' advisory lock on upperdir and workdir. Fail mount if another overlay mount instance or another user holds the 'inuse' lock on these directories. Note that this provides no protection for concurrent overlay mount that use overlapping (i.e. descendant) upper/work dirs. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: move cache and version to ovl_inodeMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: use ovl_inode mutex to synchronize concurrent copy upAmir Goldstein
Use the new ovl_inode mutex to synchonize concurrent copy up instead of the super block copy up workqueue. Moving the synchronization object from the overlay dentry to the overlay inode is needed for synchonizing concurrent copy up of lower hardlinks to the same upper inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: move impure to ovl_inodeMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: move redirect to ovl_inodeMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: move __upperdentry to ovl_inodeMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: use i_private only as a keyMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: allocate an ovl_inode structAmir Goldstein
We need some more space to store overlay inode data in memory, so allocate overlay inodes from a slab of struct ovl_inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-19ovl: mark upper dir with type origin entries "impure"Amir Goldstein
When moving a merge dir or non-dir with copy up origin into a non-merge upper dir (a.k.a pure upper dir), we are marking the target parent dir "impure". ovl_iterate() iterates pure upper dirs directly, because there is no need to filter out whiteouts and merge dir content with lower dir. But for the case of an "impure" upper dir, ovl_iterate() will not be able to iterate the real upper dir directly, because it will need to lookup the origin inode and use it to fill d_ino. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-18ovl: check on mount time if upper fs supports setting xattrAmir Goldstein
xattr are needed by overlayfs for setting opaque dir, redirect dir and copy up origin. Check at mount time by trying to set the overlay.opaque xattr on the workdir and if that fails issue a warning message. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05ovl: check if all layers are on the same fsAmir Goldstein
Some features can only work when all layers are on the same fs. Test this condition during mount time, so features can check them later. Add helper ovl_same_sb() to return the common super block in case all layers are on the same fs. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07ovl: introduce copy up waitqueueAmir Goldstein
The overlay sb 'copyup_wq' and overlay inode 'copying' condition variable are about to replace the upper sb rename_lock, as finer grained synchronization objects for concurrent copy up. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07ovl: check if upperdir fs supports O_TMPFILEAmir Goldstein
This is needed for choosing between concurrent copyup using O_TMPFILE and legacy copyup using workdir+rename. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: redirect on rename-dirMiklos Szeredi
Current code returns EXDEV when a directory would need to be copied up to move. We could copy up the directory tree in this case, but there's another, simpler solution: point to old lower directory from moved upper directory. This is achieved with a "trusted.overlay.redirect" xattr storing the path relative to the root of the overlay. After such attribute has been set, the directory can be moved without further actions required. This is a backward incompatible feature, old kernels won't be able to correctly mount an overlay containing redirected directories. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: lookup redirectsMiklos Szeredi
If a directory has the "trusted.overlay.redirect" xattr, it means that the value of the xattr should be used to find the underlying directory on the next lower layer. The redirect may be relative or absolute. Absolute redirects begin with a slash. A relative redirect means: instead of the current dentry's name use the value of the redirect to find the directory in the next lower layer. Relative redirects must not contain a slash. An absolute redirect means: look up the directory relative to the root of the overlay using the value of the redirect in the next lower layer. Redirects work on lower layers as well. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: check namelenMiklos Szeredi
We already calculate f_namelen in statfs as the maximum of the name lengths provided by the filesystems taking part in the overlay. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: split super.cMiklos Szeredi
fs/overlayfs/super.c is the biggest of the overlayfs source files and it contains various utility functions as well as the rather complicated lookup code. Split these parts out to separate files. Before: 1446 fs/overlayfs/super.c After: 919 fs/overlayfs/super.c 267 fs/overlayfs/namei.c 235 fs/overlayfs/util.c 51 fs/overlayfs/ovl_entry.h Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>