Age | Commit message | Author |
|
Upper inodes can now be metadata only copy ups whose data is still on the
lower inode. So add a new xattr OVL_XATTR_METACOPY to distinguish between
the two cases.
Presence of OVL_XATTR_METACOPY reflects that the file has been copied up
metadata only and data will be copied up later from the lower origin. So
this xattr is set when a metadata copy takes place and cleared when data
copy takes place.
We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
whether ovl inode has data or not (as opposed to metadata only copy up).
If a file is copied up metadata only and the same file is later opened for
WRITE, then data copy up takes place. We copy up data, remove the METACOPY
xattr and then set the UPPERDATA flag in ovl_inode->flags. While all these
operations happen with oi->lock held, the read side of oi->flags can be
lockless. That is, another thread on another cpu can check whether the
UPPERDATA flag is set or not.
So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if
another cpu sees UPPERDATA flag set, then it should be guaranteed that
effects of data copy up and remove xattr operations are also visible.
For example:
CPU1                                  CPU2
ovl_open()                            acquire(oi->lock)
 ovl_open_maybe_copy_up()              ovl_copy_up_data()
  ovl_open_need_copy_up()              vfs_removexattr()
   ovl_already_copied_up()
    ovl_dentry_needs_data_copy_up()    ovl_set_flag(OVL_UPPERDATA)
     ovl_test_flag(OVL_UPPERDATA)     release(oi->lock)
Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if
CPU1 perceives the effects of setting UPPERDATA flag but not the effects of
preceding operations (e.g. an upper file that is not fully copied up), it
will be a problem.
Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
and smp_rmb() on UPPERDATA flag test operation.
Maybe some other lock or barrier is already covering it. But I am not sure
what that is, or whether it is obvious enough that we will not break it in
future. Hence, to be safe, introduce barriers explicitly for the UPPERDATA
flag/bit.
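The ordering requirement above can be sketched in userspace C. This is only
an analogy, not the kernel code: smp_wmb()/smp_rmb() are kernel primitives,
and the sketch substitutes a C11 release store and acquire load, which give a
comparable "publish after the work is done" guarantee. The struct and all
toy_* names are hypothetical stand-ins for ovl_inode and its helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of ovl_inode. */
struct toy_ovl_inode {
	int data_copied;        /* 1 once the data copy up has finished */
	int metacopy_xattr;     /* 1 while the METACOPY xattr is present */
	atomic_bool upperdata;  /* models the OVL_UPPERDATA bit */
};

static void toy_init(struct toy_ovl_inode *oi)
{
	oi->data_copied = 0;
	oi->metacopy_xattr = 1;
	atomic_init(&oi->upperdata, false);
}

/* Writer side; in the patch this runs with oi->lock held. */
static void toy_set_upperdata(struct toy_ovl_inode *oi)
{
	oi->data_copied = 1;     /* models ovl_copy_up_data() */
	oi->metacopy_xattr = 0;  /* models vfs_removexattr() */
	/* Release: the two stores above are ordered before the flag store. */
	atomic_store_explicit(&oi->upperdata, true, memory_order_release);
}

/* Lockless reader side. */
static bool toy_test_upperdata(struct toy_ovl_inode *oi)
{
	/* Acquire: if this returns true, the writer's earlier stores are visible. */
	return atomic_load_explicit(&oi->upperdata, memory_order_acquire);
}

/* A reader relies on the payload only after it has seen the flag. */
static bool toy_reader_sees_complete_copy(struct toy_ovl_inode *oi)
{
	if (!toy_test_upperdata(oi))
		return false;
	assert(oi->data_copied == 1 && oi->metacopy_xattr == 0);
	return true;
}
```

Without the release/acquire pairing, a reader on another cpu could in
principle observe the flag before the stores that precede it, which is the
problem the commit message describes.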
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
There are a couple of places where we need to know if a file is already
copied up (in a lockless manner). Right now it is open coded and there are
only two conditions to check. Soon this patch series will introduce another
condition to check and Amir wants to introduce one more. So introduce a
helper to check this so that the code is easier to read.
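As a rough illustration of the refactoring, the sketch below folds two
conditions into one helper that several call sites share. The commit message
does not name the two conditions; "an upper exists" and "this alias is linked
in upper" are assumptions made for the example, and every toy_* name is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of what the lockless check looks at. */
struct toy_dentry_state {
	bool has_upper;        /* assumed condition 1: an upper dentry exists */
	bool has_upper_alias;  /* assumed condition 2: this alias is in upper */
};

/* Single place that answers "is this file already copied up?". */
static bool toy_already_copied_up(const struct toy_dentry_state *d)
{
	return d->has_upper && d->has_upper_alias;
}

/* Two hypothetical call sites that used to open code the same test. */
static bool toy_open_needs_copy_up(const struct toy_dentry_state *d,
				   bool want_write)
{
	return want_write && !toy_already_copied_up(d);
}

static bool toy_copy_up_can_stop(const struct toy_dentry_state *d)
{
	return toy_already_copied_up(d);
}
```

Adding a third or fourth condition then touches only the helper, not every
caller.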
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
If it makes sense to copy up only metadata during copy up, do it. This is
done for regular files which are not opened for WRITE.
Right now ->metacopy is always set to 0. The last patch in the series will
remove the hard coded statement and enable the metacopy feature.
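The decision described above can be sketched as a small predicate. This is a
userspace approximation under stated assumptions, not the kernel function:
the name toy_need_meta_copy_up() and its parameters are hypothetical, and
"opened for WRITE" is approximated by checking the access mode bits of the
open flags.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Return true if a metadata only copy up is enough.
 * metacopy_enabled models the ->metacopy setting, mode is the file mode,
 * open_flags are open(2) style flags.
 */
static bool toy_need_meta_copy_up(bool metacopy_enabled, mode_t mode,
				  int open_flags)
{
	if (!metacopy_enabled)
		return false;
	/* Only regular files qualify. */
	if (!S_ISREG(mode))
		return false;
	/* O_WRONLY or O_RDWR means the file is opened for WRITE. */
	if ((open_flags & O_ACCMODE) != O_RDONLY)
		return false;
	return true;
}
```

With the flag hard coded to false, as in this patch, the predicate never
selects a metadata only copy up, so behaviour is unchanged until the feature
is enabled.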
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Just a little re-ordering of code. This helps with the next patch where,
after copying up metadata, we skip the data copying step if needed.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
By default metadata only copy up is disabled. Provide a mount option so
that users can choose one way or the other.
Also provide a kernel config and module option to enable/disable the
metacopy feature.
The metacopy feature requires redirect_dir=on when upper is present.
Otherwise, it requires at least redirect_dir=follow.
As of now, metacopy does not work with nfs_export=on. So if both
metacopy=on and nfs_export=on then nfs_export is disabled.
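The option dependencies above can be modelled as follows. This is a sketch
under assumptions: the struct, the enum and the toy_* functions are
hypothetical, redirect_dir is reduced to three ordered values, and the sketch
only reports whether the redirect_dir requirement is met, since the message
does not say what happens when it is not.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified redirect_dir settings, ordered from weakest to strongest. */
enum toy_redirect { TOY_REDIRECT_OFF, TOY_REDIRECT_FOLLOW, TOY_REDIRECT_ON };

struct toy_config {
	bool has_upper;            /* an upper layer is present */
	bool metacopy;             /* metacopy=on */
	bool nfs_export;           /* nfs_export=on */
	enum toy_redirect redirect;
};

/* Does the redirect_dir setting satisfy what metacopy asks for? */
static bool toy_metacopy_redirect_ok(const struct toy_config *c)
{
	if (!c->metacopy)
		return true;
	if (c->has_upper)
		return c->redirect == TOY_REDIRECT_ON;
	/* No upper: redirect_dir=follow at least. */
	return c->redirect >= TOY_REDIRECT_FOLLOW;
}

/* metacopy=on together with nfs_export=on: nfs_export is disabled. */
static void toy_resolve_nfs_export(struct toy_config *c)
{
	if (c->metacopy && c->nfs_export)
		c->nfs_export = false;
}
```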
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Right now two copy up helpers are in inode.c. Amir suggested it might be
better to move these to copy_up.c.
There will be one more related function which will come in a later patch.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
ovl_inode->redirect is an inode property and should be initialized in
ovl_get_inode() only when we are adding a new inode to cache. If inode is
already in cache, it is already initialized and we should not be touching
ovl_inode->redirect field.
As of now this is not a problem as redirects are used only for directories,
which don't share inodes. But soon I want to use redirects for regular
files also and there it can become an issue.
Hence, move ->redirect initialization into ovl_get_inode().
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This was provided for debugging the ro/rw inconsistency. The inconsistency
is now gone so this option is obsolete.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Opening regular files on overlayfs is now handled via ovl_open(). Remove
the now unused "open_flags" argument from d_op->d_real() and the d_real()
helper.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This partially reverts commit c568d68341be7030f5647def68851e469b21ca11.
Overlayfs files will now automatically get the correct locks, no need to
hack overlay support in VFS.
It is a partial revert, because it leaves the locks_inode() calls in place
and defines locks_inode() to file_inode(). We could revert those as well,
but it would be unnecessary code churn and it makes sense to document that
we are getting the inode for locking purposes.
Don't revert MS_NOREMOTELOCK yet since that has been part of the userspace
API for some time (though not in a useful way). Will try to remove
internal flags later when the dust around the new mount API settles.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
|
|
This reverts commit 4d0c5ba2ff79ef9f5188998b29fd28fcb05f3667.
We now get write access on both overlay and underlying layers so this patch
is no longer needed for correct operation.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This reverts commit 495e642939114478a5237a7d91661ba93b76f15a.
No users of the "flags" argument of d_real() remain.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This reverts commit 598e3c8f72f5b77c84d2cb26cfd936ffb3cfdbaa.
Overlayfs no longer relies on the vfs for correct atime handling.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This reverts commit cd91304e7190b4c4802f8e413ab2214b233e0260.
Overlayfs no longer relies on the vfs for correct atime handling.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The underlying real file used by overlayfs still contains the overlay path.
This results in mnt_want_write_file() calls by the filesystem getting
freeze protection on the wrong inode (the overlayfs one instead of the real
one).
Fix by using file_inode(file)->i_sb instead of file->f_path.mnt->mnt_sb.
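To see why the two expressions can name different superblocks, consider the
toy model below. The structs are heavily simplified, hypothetical stand-ins
for the kernel's file, inode, mount and super_block objects; only the pointer
relationships matter.

```c
#include <assert.h>

/* Hypothetical, minimal stand-ins for the kernel objects involved. */
struct toy_sb    { const char *fs_name; };
struct toy_inode { struct toy_sb *i_sb; };
struct toy_mnt   { struct toy_sb *mnt_sb; };
struct toy_file  {
	struct toy_inode *f_inode;   /* what file_inode(file) would return */
	struct toy_mnt   *f_path_mnt; /* models file->f_path.mnt */
};

/* Old lookup: follows the path, so it lands on the overlay superblock. */
static struct toy_sb *toy_sb_from_path(const struct toy_file *f)
{
	return f->f_path_mnt->mnt_sb;
}

/* Fixed lookup: follows the inode, so it lands on the real superblock. */
static struct toy_sb *toy_sb_from_inode(const struct toy_file *f)
{
	return f->f_inode->i_sb;
}
```

For a real file whose path still points into the overlay mount, the two
helpers return different superblocks, and freeze protection is wanted on the
one that owns the inode.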
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
This reverts commit 7c6893e3c9abf6a9676e060a1e35e5caca673d57.
Overlayfs no longer relies on the vfs for checking writability of files.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This reverts commit 954c736f865d6c0c68ae4263a2f3502ee7c447a3.
Overlayfs no longer relies on the vfs for checking writability of files.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Let overlayfs do its thing when opening a file.
This enables stacking and fixes the corner case when a file is opened for
read, modified through a writable open, and data is read from the read-only
file. After this patch the read-only open will not return stale data even
in this case.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Since the sets of arguments are so similar, handle them in a common helper.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Implement stacked fiemap().
Need to split inode operations for regular file (which has fiemap) and
special file (which doesn't have fiemap).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Implement FS_IOC_GETFLAGS and FS_IOC_SETFLAGS.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Implement stacked fallocate.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Implement stacked mmap.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Implement stacked fsync().
Don't sync if lower (noticed by Amir Goldstein).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Implement stacked writes.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Implement stacked reading.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In the common case we can just use the real file cached in
file->private_data. There are two exceptions:
1) File has been copied up since open: in this unlikely corner case just
use a throwaway real file for the operation. If ever this becomes a
performance problem (very unlikely, since overlayfs has been doing mostly
fine without correctly handling this case at all), then we can deal with
that by updating the cached real file.
2) File's f_flags have changed since open: no need to reopen the cached
real file, we can just change the flags there as well.
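The common case and the two exceptions can be laid out as a small decision
function. This is only a model of the logic described above: the structs, the
integer inode identifier and the toy_* names are hypothetical, and the real
code operates on struct file rather than these simplified records.

```c
#include <assert.h>
#include <fcntl.h>

enum toy_action {
	TOY_USE_CACHED,      /* common case: reuse file->private_data */
	TOY_OPEN_THROWAWAY,  /* exception 1: copied up since open */
	TOY_SYNC_FLAGS       /* exception 2: f_flags changed since open */
};

/* Simplified record of the cached real file. */
struct toy_real_file {
	int ino;      /* identifies the real inode it was opened on */
	int f_flags;  /* flags the real file currently carries */
};

/* Simplified overlay file with its cached real file. */
struct toy_ovl_file {
	int f_flags;
	struct toy_real_file cached;
};

/* current_real_ino identifies the real inode backing the file right now. */
static enum toy_action toy_pick_real_file(const struct toy_ovl_file *f,
					  int current_real_ino)
{
	if (f->cached.ino != current_real_ino)
		return TOY_OPEN_THROWAWAY;
	if (f->cached.f_flags != f->f_flags)
		return TOY_SYNC_FLAGS;
	return TOY_USE_CACHED;
}
```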
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Implement file operations on a regular overlay file. The underlying file
is opened separately and cached in ->private_data.
It might be worth making an exception for such files when accounting in
nr_file to conform to userspace expectations. We are only adding a small
overhead (248 bytes for the struct file) since the real inode and dentry are
pinned by overlayfs anyway.
This patch doesn't have any effect, since the vfs will use d_real() to find
the real underlying file to open. The patch at the end of the series will
actually enable this functionality.
AV: make it use open_with_fake_path(), don't mess with override_creds
SzM: still need to mess with override_creds() until no fs uses
current_cred() in their open method.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Copy i_size of the underlying inode to the overlay inode in ovl_copyattr().
This is in preparation for stacking I/O operations on overlay files.
This patch shouldn't have any observable effect.
Remove stale comment from ovl_setattr() [spotted by Vivek Goyal].
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This reverts commit 31c3a7069593b072bd57192b63b62f9a7e994e9a.
Re-add functionality dealing with i_writecount on truncate to overlayfs.
This patch shouldn't have any observable effects, since we just re-assert
the writecount that vfs_truncate() already got for us.
This is in preparation for moving overlay functionality out of the VFS.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
On inode creation copy certain inode flags from the underlying real inode
to the overlay inode.
This is in preparation for moving overlay functionality out of the VFS.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Copy up mtime and ctime to overlay inode after times in real object are
modified. Be careful not to dirty cachelines when not necessary.
This is in preparation for moving overlay functionality out of the VFS.
This patch shouldn't have any observable effect.
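One common way to avoid dirtying cachelines is to compare before storing, so
that an unchanged value causes no write at all. The sketch below shows that
pattern in userspace C; whether the patch does exactly this is an assumption,
and the struct and toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical pair of timestamps kept on an inode. */
struct toy_inode_times {
	struct timespec i_mtime;
	struct timespec i_ctime;
};

static bool toy_ts_equal(const struct timespec *a, const struct timespec *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

/*
 * Copy times from the real inode to the overlay inode, storing only the
 * fields that differ. Returns the number of fields actually written.
 */
static int toy_copy_times(struct toy_inode_times *ovl,
			  const struct toy_inode_times *real)
{
	int stores = 0;

	if (!toy_ts_equal(&ovl->i_mtime, &real->i_mtime)) {
		ovl->i_mtime = real->i_mtime;
		stores++;
	}
	if (!toy_ts_equal(&ovl->i_ctime, &real->i_ctime)) {
		ovl->i_ctime = real->i_ctime;
		stores++;
	}
	return stores;
}
```

When the times already match, the function only reads, so the cacheline
holding the overlay inode stays clean.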
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This is needed by the stacked dedupe implementation in overlayfs.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This is needed by the stacked ioctl implementation in overlayfs.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Stacking file operations in overlay will store an extra open file for each
overlay file opened.
The overhead is just that of "struct file" which is about 256 bytes, because
overlay already pins an extra dentry and inode when the file is open, which
add up to a much larger overhead.
For fear of breaking working setups, don't start accounting the extra file.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The following series for stacking overlay files depends on this mini series.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into overlayfs-next
This gives us the open_with_fake_path() helper that is needed for stacked
open files in overlay and mmap in particular.
|
|
Only upper dir can be impure, but if we are in the middle of
iterating a lower real dir, dir could be copied up and marked
impure. We only want the impure cache if we started iterating
a real upper dir to begin with.
Aditya Kali reported that the following reproducer hits the
WARN_ON(!cache->refcount) in ovl_get_cache():
    docker run --rm drupal:8.5.4-fpm-alpine \
        sh -c 'cd /var/www/html/vendor/symfony && \
            chown -R www-data:www-data . && ls -l .'
Reported-by: Aditya Kali <adityakali@google.com>
Tested-by: Aditya Kali <adityakali@google.com>
Fixes: 4edb83bb1041 ('ovl: constant d_ino for non-merge dirs')
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
open a file by given inode, faking ->f_path. Use with shitloads
of caution - at the very least you'd damn better make sure that
some dentry alias of that inode is pinned down by the path in
question. Again, this is no general-purpose interface and I hope
it will eventually go away. Right now overlayfs wants something
like that, but nothing else should.
Any out-of-tree code with bright idea of using this one *will*
eventually get hurt, with zero notice and great delight on my part.
I refuse to use EXPORT_SYMBOL_GPL(), especially in situations when
it's really EXPORT_SYMBOL_DONT_USE_IT(), but don't take that export
as "you are welcome to use it".
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
These checks are better off in do_dentry_open(); the reason we couldn't
put them there used to be that callers couldn't tell what kind of cleanup
would do_dentry_open() failure call for. Now that we have FMODE_OPENED,
cleanup is the same in all cases - it's simply fput(). So let's fold
that into do_dentry_open(), as Christoph's patch tried to.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Just check FMODE_OPENED in __fput() and be done with that...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
basically, "is that instance set up enough for regular fput(), or
do we want put_filp() for that one".
NOTE: the only alloc_file() caller that could be followed by put_filp()
is in arch/ia64/kernel/perfmon.c, which is (Kconfig-level) broken.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
always equal to ->f_cred
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and have it set the f_flags-derived part of ->f_mode.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and rename get_empty_filp() to alloc_empty_file().
dentry_open() gets creds as argument, but the only thing that sees those is
security_file_open() - file->f_cred still ends up with current_cred(). For
almost all callers it's the same thing, but there are several broken cases.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... so that it could set both ->f_flags and ->f_mode, without callers
having to set ->f_flags manually.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|