Age | Commit message (Collapse) | Author |
|
Don't save callback version and type fields as the version is about the
format of the callback information and the type is relative to the
particular RPC call.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
When applying the status and callback in the response of an operation,
apply them in the same critical section so that there's no race between
checking the callback state and checking status-dependent state (such as
the data version).
Fix this by:
(1) Allocating a joint {status,callback} record (afs_status_cb) before
calling the RPC function for each vnode for which the RPC reply
contains a status or a status plus a callback. A flag is set in the
record to indicate if a callback was actually received.
(2) These records are passed into the RPC functions to be filled in. The
afs_decode_status() and yfs_decode_status() functions are removed and
the cb_lock is no longer taken.
(3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
update the vnode.
(4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
the vnode.
(5) vnodes, expected data-version numbers and callback break counters
(cb_break) no longer need to be passed to the reply delivery
functions.
Note that, for the moment, the file locking functions still need
access to both the call and the vnode at the same time.
(6) afs_vnode_commit_status() is now given the cb_break value and the
expected data_version and the task of applying the status and the
callback to the vnode are now done here.
This is done under a single taking of vnode->cb_lock.
(7) afs_pages_written_back() is now called by afs_store_data() rather than
by the reply delivery function.
afs_pages_written_back() has been moved to before the call point and
is now given the first and last page numbers rather than a pointer to
the call.
(8) The indicator from YFS.RemoveFile2 as to whether the target file
actually got removed (status.abort_code == VNOVNODE) rather than
merely dropping a link is now checked in afs_unlink rather than in
xdr_decode_YFSFetchStatus().
Supplementary fixes:
(*) afs_cache_permit() now gets the caller_access mask from the
afs_status_cb object rather than picking it out of the vnode's status
record. afs_fetch_status() returns caller_access through its argument
list for this purpose also.
(*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
than a read lock and now sets the callback inside the same critical
section.
Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Always ask for the reply time from AF_RXRPC as it's used to calculate the
callback expiry time and lock expiry times, so it's needed by most FS
operations.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
afs_do_lookup() will do an order-1 allocation to allocate status records if
there are more than 39 vnodes to stat.
Fix this by allocating an array of {status,callback} records for each vnode
we want to examine using vmalloc() if larger than a page.
This not only gets rid of the order-1 allocation, but makes it easier to
grow beyond 50 records for YFS servers. It also allows us to move to
{status,callback} tuples for other calls too and makes it easier to lock
across the application of the status and the callback to the vnode.
Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Replace the afs_call::reply[] array with a bunch of typed members so that
the compiler can use type-checking on them. It's also easier for the eye
to see what's going on.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Don't pass the vnode pointer through into the inline bulk status op. We
want to process the status records outside of it anyway.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
When __afs_break_callback() clears the CB_PROMISED flag, it increments
vnode->cb_break to trigger a future refetch of the status and callback -
however it also calls afs_clear_permits(), which also increments
vnode->cb_break.
Fix this by removing the increment from afs_clear_permits().
Whilst we're at it, fix the conditional call to afs_put_permits() as the
function checks to see if the argument is NULL, so the check is redundant.
Fixes: be080a6f43c4 ("afs: Overhaul permit caching");
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
__afs_break_callback() holds vnode->lock around its call of
afs_lock_may_be_available() - which also takes that lock.
Fix this by not taking the lock in __afs_break_callback().
Also, there's no point checking the granted_locks and pending_locks queues;
it's sufficient to check lock_state, so move that check out of
afs_lock_may_be_available() into __afs_break_callback() to replace the
queue checks.
Fixes: e8d6c554126b ("AFS: implement file locking")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Don't invalidate the callback promise on a directory if the
AFS_VNODE_DIR_VALID flag is not set (which indicates that the directory
contents are invalid, due to edit failure, callback break, page reclaim).
The directory will be reloaded next time the directory is accessed, so
clearing the callback flag at this point may race with a reload of the
directory and cancel it's recorded callback promise.
Fixes: f3ddee8dc4e2 ("afs: Fix directory handling")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Fix the calculation of the expiry time of a callback promise, as obtained
from operations like FS.FetchStatus and FS.FetchData.
The time should be based on the timestamp of the first DATA packet in the
reply and the calculation needs to turn the ktime_t timestamp into a
time64_t.
Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Make dynamic root population wait uninterruptibly for proc_cells_lock.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Make certain RPC operations non-interruptible, including:
(*) Set attributes
(*) Store data
We don't want to get interrupted during a flush on close, flush on
unlock, writeback or an inode update, leaving us in a state where we
still need to do the writeback or update.
(*) Extend lock
(*) Release lock
We don't want to get lock extension interrupted as the file locks on
the server are time-limited. Interruption during lock release is less
of an issue since the lock is time-limited, but it's better to
complete the release to avoid a several-minute wait to recover it.
*Setting* the lock isn't a problem if it's interrupted since we can
just return to the user and tell them they were interrupted - at
which point they can elect to retry.
(*) Silly unlink
We want to remove silly unlink files if we can, rather than leaving
them for the salvager to clear up.
Note that whilst these calls are no longer interruptible, they do have
timeouts on them, so if the server stops responding the call will fail with
something like ETIME or ECONNRESET.
Without this, the following:
kAFS: Unexpected error from FS.StoreData -512
appears in dmesg when a pending store data gets interrupted and some
processes may just hang.
Additionally, make the code that checks/updates the server record ignore
failure due to interruption if the main call is uninterruptible and if the
server has an address list. The next op will check it again since the
expiration time on the old list has past.
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Allow kernel services using AF_RXRPC to indicate that a call should be
non-interruptible. This allows kafs to make things like lock-extension and
writeback data storage calls non-interruptible.
If this is set, signals will be ignored for operations on that call where
possible - such as waiting to get a call channel on an rxrpc connection.
It doesn't prevent UDP sendmsg from being interrupted, but that will be
handled by packet retransmission.
rxrpc_kernel_recv_data() isn't affected by this since that never waits,
preferring instead to return -EAGAIN and leave the waiting to the caller.
Userspace initiated calls can't be set to be uninterruptible at this time.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
afs_check/update_server_record() should be setting fc->error rather than
fc->ac.error as they're called from within the cursor iteration function.
afs_fs_cursor::error is where the error code of the attempt to call the
operation on multiple servers is integrated and is the final result,
whereas afs_addr_cursor::error is used to hold the error from individual
iterations of the call loop. (Note there's also an afs_vl_cursor which
also wraps afs_addr_cursor for accessing VL servers rather than file
servers).
Fix this by setting fc->error in the afs_check/update_server_record() so
that any error incurred whilst talking to the VL server correctly
propagates to the final result.
This results in:
kAFS: Unexpected error from FS.StoreData -512
being seen, even though the store-data op is non-interruptible. The error
is actually coming from the server record update getting interrupted.
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
If an older AFS server doesn't support an operation, it may accept the call
and then sit on it forever, happily responding to pings that make kafs
think that the call is still alive.
Fix this by setting the maximum lifespan of Volume Location service calls
in particular and probe calls in general so that they don't run on
endlessly if they're not supported.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Under some circumstances afs_select_fileserver() can return without setting
an error in fc->error. The problem is in the no_more_servers segment where
the accumulated errors from attempts to contact various servers are
integrated into an afs_error-type variable 'e'. The resultant error code
is, however, then abandoned.
Fix this by getting the error out of e.error and putting it in 'error' so
that the next part will store it into fc->error.
Not doing this causes a report like the following:
kAFS: AFS vnode with undefined type 0
kAFS: A=0 m=0 s=0 v=0
kAFS: vnode 20000025:1:1
because the code following the server selection loop then sees what it
thinks is a successful invocation because fc.error is 0. However, it can't
apply the status record because it's all zeros.
The report is followed on the first instance with a trace looking something
like:
dump_stack+0x67/0x8e
afs_inode_init_from_status.isra.2+0x21b/0x487
afs_fetch_status+0x119/0x1df
afs_iget+0x130/0x295
afs_get_tree+0x31d/0x595
vfs_get_tree+0x1f/0xe8
fc_mount+0xe/0x36
afs_d_automount+0x328/0x3c3
follow_managed+0x109/0x20a
lookup_fast+0x3bf/0x3f8
do_last+0xc3/0x6a4
path_openat+0x1af/0x236
do_filp_open+0x51/0xae
? _raw_spin_unlock+0x24/0x2d
? __alloc_fd+0x1a5/0x1b7
do_sys_open+0x13b/0x1e8
do_syscall_64+0x7d/0x1b3
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 4584ae96ae30 ("afs: Fix missing net error handling")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Currently, once configured, AFS cells are looked up in the DNS at regular
intervals - which is a waste of resources if those cells aren't being
used. It also leads to a problem where cells preloaded, but not
configured, before the network is brought up end up effectively statically
configured with no VL servers and are unable to get any.
Fix this by not doing the DNS lookup until the first time a cell is
touched. It is waited for if we don't have any cached records yet,
otherwise the DNS lookup to maintain the record is done in the background.
This has the downside that the first time you touch a cell, you now have to
wait for the upcall to do the required DNS lookups rather than them already
being cached.
Further, the record is not replaced if the old record has at least one
server in it and the new record doesn't have any.
Fixes: 0a5143f2f89c ("afs: Implement VL server rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Allow used DNS resolver keys to be invalidated after use if the caller is
doing its own caching of the results. This reduces the amount of resources
required.
Fix AFS to invalidate DNS results to kill off permanent failure records
that get lodged in the resolver keyring and prevent future lookups from
happening.
Fixes: 0a5143f2f89c ("afs: Implement VL server rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Fix it such that afs_cell records always have a VL server list record
attached, even if it's a dummy one, so that various checks can be removed.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
When afs_update_cell() replaces the cell->vl_servers list, it uses RCU
protocol so that proc is protected, but doesn't take ->vl_servers_lock to
protect afs_start_vl_iteration() (which does actually take a shared lock).
Fix this by making afs_update_cell() take an exclusive lock when replacing
->vl_servers.
Fixes: 0a5143f2f89c ("afs: Implement VL server rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
afs_xattr_get_yfs() tries to free yacl, which may hold an error value (say
if yfs_fs_fetch_opaque_acl() failed and returned an error).
Fix this by allocating yacl up front (since it's a fixed-length struct,
unlike afs_acl) and passing it in to the RPC function. This also allows
the flags to be placed in the object rather than passing them through to
the RPC function.
Fixes: ae46578b963f ("afs: Get YFS ACLs and information through xattrs")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Fix incorrect error handling in afs_xattr_get_acl() where there appears to
be a redundant assignment before return, but in fact the return should be a
goto to the error handling at the end of the function.
Fixes: 260f082bae6d ("afs: Get an AFS3 ACL as an xattr")
Addresses-Coverity: ("Unused Value")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Joe Perches <joe@perches.com>
|
|
Fix afs_release() to go through the cleanup part of the function if
FMODE_WRITE is set rather than exiting through vfs_fsync() (which skips the
cleanup). The cleanup involves discarding the refs on the key used for
file ops and the writeback key record.
Also fix afs_evict_inode() to clean up any left over wb keys attached to
the inode/vnode when it is removed.
Fixes: 5a8132761609 ("afs: Do better accretion of small writes on newly created content")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS updates from David Howells:
"A set of fix and development patches for AFS for 5.2.
Summary:
- Fix the AFS file locking so that sqlite can run on an AFS mount and
also so that firefox and gnome can use a homedir that's mounted
through AFS.
This required emulation of fine-grained locking when the server
will only support whole-file locks and no upgrade/downgrade. Four
modes are provided, settable by mount parameter:
"flock=local" - No reference to the server
"flock=openafs" - Fine-grained locks are local-only, whole-file
locks require sufficient server locks
"flock=strict" - All locks require sufficient server locks
"flock=write" - Always get an exclusive server lock
If the volume is a read-only or backup volume, then flock=local for
that volume.
- Log extra information for a couple of cases where the client mucks
up somehow: AFS vnode with undefined type and dir check failure -
in both cases we seem to end up with unfilled data, but the issues
happen infrequently and are difficult to reproduce at will.
- Implement silly rename for unlink() and rename().
- Set i_blocks so that du can get some information about usage.
- Fix xattr handlers to return the right amount of data and to not
overflow buffers.
- Implement getting/setting raw AFS and YFS ACLs as xattrs"
* tag 'afs-next-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Implement YFS ACL setting
afs: Get YFS ACLs and information through xattrs
afs: implement acl setting
afs: Get an AFS3 ACL as an xattr
afs: Fix getting the afs.fid xattr
afs: Fix the afs.cell and afs.volume xattr handlers
afs: Calculate i_blocks based on file size
afs: Log more information for "kAFS: AFS vnode with undefined type\n"
afs: Provide mount-time configurable byte-range file locking emulation
afs: Add more tracepoints
afs: Implement sillyrename for unlink and rename
afs: Add directory reload tracepoint
afs: Handle lock rpc ops failing on a file that got deleted
afs: Improve dir check failure reports
afs: Add file locking tracepoints
afs: Further fix file locking
afs: Fix AFS file locking to allow fine grained locks
afs: Calculate lock extend timer from set/extend reply reception
afs: Split wait from afs_make_call()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva:
"Mark switch cases where we are expecting to fall through.
This is part of the ongoing efforts to enable -Wimplicit-fallthrough.
Most of them have been baking in linux-next for a whole development
cycle. And with Stephen Rothwell's help, we've had linux-next
nag-emails going out for newly introduced code that triggers
-Wimplicit-fallthrough to avoid gaining more of these cases while we
work to remove the ones that are already present.
We are getting close to completing this work. Currently, there are
only 32 of 2311 of these cases left to be addressed in linux-next. I'm
auditing every case; I take a look into the code and analyze it in
order to determine if I'm dealing with an actual bug or a false
positive, as explained here:
https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/
While working on this, I've found and fixed the several missing
break/return bugs, some of them introduced more than 5 years ago.
Once this work is finished, we'll be able to universally enable
"-Wimplicit-fallthrough" to avoid any of these kinds of bugs from
entering the kernel again"
* tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
memstick: mark expected switch fall-throughs
drm/nouveau/nvkm: mark expected switch fall-throughs
NFC: st21nfca: Fix fall-through warnings
NFC: pn533: mark expected switch fall-throughs
block: Mark expected switch fall-throughs
ASN.1: mark expected switch fall-through
lib/cmdline.c: mark expected switch fall-throughs
lib: zstd: Mark expected switch fall-throughs
scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through
scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs
scsi: ppa: mark expected switch fall-through
scsi: osst: mark expected switch fall-throughs
scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs
scsi: lpfc: lpfc_nvme: Mark expected switch fall-through
scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through
scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs
scsi: lpfc: lpfc_els: Mark expected switch fall-throughs
scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs
scsi: imm: mark expected switch fall-throughs
scsi: csiostor: csio_wr: mark expected switch fall-through
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs inode freeing updates from Al Viro:
"Introduction of separate method for RCU-delayed part of
->destroy_inode() (if any).
Pretty much as posted, except that destroy_inode() stashes
->free_inode into the victim (anon-unioned with ->i_fops) before
scheduling i_callback() and the last two patches (sockfs conversion
and folding struct socket_wq into struct socket) are excluded - that
pair should go through netdev once davem reopens his tree"
* 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits)
orangefs: make use of ->free_inode()
shmem: make use of ->free_inode()
hugetlb: make use of ->free_inode()
overlayfs: make use of ->free_inode()
jfs: switch to ->free_inode()
fuse: switch to ->free_inode()
ext4: make use of ->free_inode()
ecryptfs: make use of ->free_inode()
ceph: use ->free_inode()
btrfs: use ->free_inode()
afs: switch to use of ->free_inode()
dax: make use of ->free_inode()
ntfs: switch to ->free_inode()
securityfs: switch to ->free_inode()
apparmor: switch to ->free_inode()
rpcpipe: switch to ->free_inode()
bpf: switch to ->free_inode()
mqueue: switch to ->free_inode()
ufs: switch to ->free_inode()
coda: switch to ->free_inode()
...
|
|
Implement the setting of YFS ACLs in AFS through the interface of setting
the afs.yfs.acl extended attribute on the file.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
The YFS/AuriStor variant of AFS provides more capable ACLs and provides
per-volume ACLs and per-file ACLs as well as per-directory ACLs. It also
provides some extra information that can be retrieved through four ACLs:
(1) afs.yfs.acl
The YFS file ACL (not the same format as afs.acl).
(2) afs.yfs.vol_acl
The YFS volume ACL.
(3) afs.yfs.acl_inherited
"1" if a file's ACL is inherited from its parent directory, "0"
otherwise.
(4) afs.yfs.acl_num_cleaned
The number of of ACEs removed from the ACL by the server because the
PT entries were removed from the PTS database (ie. the subject is no
longer known).
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Implements the setting of ACLs in AFS by means of setting the
afs.acl extended attribute on the file.
Signed-off-by: Joe Gorse <jhgorse@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Implement an xattr on AFS files called "afs.acl" that retrieves a file's
ACL. It returns the raw AFS3 ACL from the result of calling FS.FetchACL,
leaving any interpretation to userspace.
Note that whilst YFS servers will respond to FS.FetchACL, this will render
a more-advanced YFS ACL down. Use "afs.yfs.acl" instead for that.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
The AFS3 FID is three 32-bit unsigned numbers and is represented as three
up-to-8-hex-digit numbers separated by colons to the afs.fid xattr.
However, with the advent of support for YFS, the FID is now a 64-bit volume
number, a 96-bit vnode/inode number and a 32-bit uniquifier (as before).
Whilst the sprintf in afs_xattr_get_fid() has been partially updated (it
currently ignores the upper 32 bits of the 96-bit vnode number), the size
of the stack-based buffer has not been increased to match, thereby allowing
stack corruption to occur.
Fix this by increasing the buffer size appropriately and conditionally
including the upper part of the vnode number if it is non-zero. The latter
requires the lower part to be zero-padded if the upper part is non-zero.
Fixes: 3b6492df4153 ("afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Fix the ->get handlers for the afs.cell and afs.volume xattrs to pass the
source data size to memcpy() rather than target buffer size.
Overcopying the source data occasionally causes the kernel to oops.
Fixes: d3e3b7eac886 ("afs: Add metadata xattrs")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
While it's not possible to give an accurate number for the blocks
used on the server, populate i_blocks based on the file size so
that 'du' can give a reasonable estimate.
The value is rounded up to 1K granularity, for consistency with
what other AFS clients report, and the servers' 1K usage quota
unit. Note that the value calculated by 'du' at the root of a
volume can still be slightly lower than the quota usage on the
server, as 0-length files are charged 1 quota block, but are
reported as occupying 0 blocks. Again, this is consistent with
other AFS clients.
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Log more information when "kAFS: AFS vnode with undefined type\n" is
displayed due to a vnode record being retrieved from the server that
appears to have a duff file type (usually 0). This prints more information
to try and help pin down the problem.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
debugging printks left in ->destroy_inode() and so's the
update of inode count; we could take the latter to RCU-delayed
part (would take only moving the check on module exit past
rcu_barrier() there), but debugging output ought to either
stay where it is or go into ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Provide byte-range file locking emulation that can be configured at mount
time to one of four modes:
(1) flock=local. Locking is done locally only and no reference is made to
the server.
(2) flock=openafs. Byte-range locking is done locally only; whole-file
locking is done with reference to the server. Whole-file locks cannot
be upgraded unless the client holds an exclusive lock.
(3) flock=strict. Byte-range and whole-file locking both require a
sufficient whole-file lock on the server.
(4) flock=write. As strict, but the client always gets an exclusive
whole-file lock on the server.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add four more tracepoints:
(1) afs_make_fs_call1 - Split from afs_make_fs_call but takes a filename
to log also.
(2) afs_make_fs_call2 - Like the above but takes two filenames to log.
(3) afs_lookup - Log the result of doing a successful lookup, including a
negative result (fid 0:0).
(4) afs_get_tree - Log the set up of a volume for mounting.
It also extends the name buffer on the afs_edit_dir tracepoint to 24 chars
and puts quotes around the filename in the text representation.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Implement sillyrename for AFS unlink and rename, using the NFS variant
implementation as a basis.
Note that the asynchronous file locking extender/releaser has to be
notified with a state change to stop it complaining if there's a race
between that and the actual file deletion.
A tracepoint, afs_silly_rename, is also added to note the silly rename and
the cleanup. The afs_edit_dir tracepoint is given some extra reason
indicators and the afs_flock_ev tracepoint is given a silly-delete file
lock cancellation indicator.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add a tracepoint (afs_reload_dir) to indicate when a directory is being
reloaded.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Holding a file lock on an AFS file does not prevent it from being deleted
on the server, so we need to handle an error resulting from that when we
try setting, extending or releasing a lock.
Fix this by adding a "deleted" lock state and cancelling the lock extension
process for that file and aborting all waiters for the lock.
Fixes: 0fafdc9f888b ("afs: Fix file locking")
Reported-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Improve the content of directory check failure reports from:
kAFS: afs_dir_check_page(6d57): bad magic 1/2 is 0000
to dump more information about the individual blocks in a directory page.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add two tracepoints for monitoring AFS file locking. Firstly, add one that
follows the operational part:
echo 1 >/sys/kernel/debug/tracing/events/afs/afs_flock_op/enable
And add a second that more follows the event-driven part:
echo 1 >/sys/kernel/debug/tracing/events/afs/afs_flock_ev/enable
Individual file_lock structs seen by afs are tagged with debugging IDs that
are displayed in the trace log to make it easier to see what's going on,
especially as setting the first lock always seems to involve copying the
file_lock twice.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Further fix the file locking in the afs filesystem client in a number of
ways, including:
(1) Don't submit the operation to obtain a lock from the server in a work
queue context, but rather do it in the process context of whoever
issued the requesting system call.
(2) The owner of the file_lock struct at the front of the pending_locks
queue now owns right to talk to the server.
(3) Write locks can be instantly granted if they don't overlap with any
other locks *and* we have a write lock on the server.
(4) In the event of an authentication/permission error, all other matching
pending locks requests are also immediately aborted.
(5) Properly use VFS core locks_lock_file_wait() to distribute the server
lock amongst local client locks, including waiting for the lock to
become available.
Test with:
sqlite3 /afs/.../scratch/billings.sqlite <<EOF
CREATE TABLE hosts (
hostname varchar(80),
shorthost varchar(80),
room varchar(30),
building varchar(30),
PRIMARY KEY(shorthost)
);
EOF
With the version of sqlite3 that I have, this should fail consistently with
EAGAIN, whether or not the program is straced (which introduces some delays
between lock syscalls).
Fixes: 0fafdc9f888b ("afs: Fix file locking")
Reported-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Fix AFS file locking to allow fine grained locks as some applications, such
as firefox, won't work if they can't take such locks on certain state files
- thereby preventing the use of kAFS to distribute a home directory.
Note that this cannot be made completely functional as the protocol only
has provision for whole-file locks, so there exists the possibility of a
process deadlocking itself by getting a partial read-lock on a file first
and then trying to get a non-overlapping write-lock - but we got the
server's read lock with the first lock, so we're now stuck.
OpenAFS solves this by just granting any partial-range lock directly
without consulting the server - and hoping there's no remote collision. I
want to implement that in a separate patch and it requires a bit more
thought.
Fixes: 8d6c554126b8 ("AFS: implement file locking")
Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Record the timestamp on the first reply DATA packet received in response to
a set- or extend-lock operation, then use this to calculate the time
remaining till the lock expires rather than using whatever time the
requesting process wakes up and finishes processing the operation as a
base.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Split the call to afs_wait_for_call_to_complete() from afs_make_call() to
make it easier to handle asynchronous calls and to make it easier to
convert a synchronous call to an asynchronous one in future, for instance
when someone tries to interrupt an operation by pressing Ctrl-C.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
The in-kernel afs filesystem client counts the number of server-level
callback invalidation events (CB.InitCallBackState* RPC operations) that it
receives from the server. This is stored in cb_s_break in various
structures, including afs_server and afs_vnode.
If an inode is examined by afs_validate(), say, the afs_server copy is
compared, along with other break counters, to those in afs_vnode, and if
one or more of the counters do not match, it is considered that the
server's callback promise is broken. At points where this happens,
AFS_VNODE_CB_PROMISED is cleared to indicate that the status must be
refetched from the server.
afs_validate() issues an FS.FetchStatus operation to get updated metadata -
and based on the updated data_version may invalidate the pagecache too.
However, the break counters are also used to determine whether to note a
new callback in the vnode (which would set the AFS_VNODE_CB_PROMISED flag)
and whether to cache the permit data included in the YFSFetchStatus record
by the server.
The problem comes when the server sends us a CB.InitCallBackState op. The
first such instance doesn't cause cb_s_break to be incremented, but rather
causes AFS_SERVER_FL_NEW to be cleared - but thereafter, say some hours
after last use and all the volumes have been automatically unmounted and
the server has forgotten about the client[*], this *will* likely cause an
increment.
[*] There are other circumstances too, such as the server restarting or
needing to make space in its callback table.
Note that the server won't send us a CB.InitCallBackState op until we talk
to it again.
So what happens is:
(1) A mount for a new volume is attempted, a inode is created for the root
vnode and vnode->cb_s_break and AFS_VNODE_CB_PROMISED aren't set
immediately, as we don't have a nominated server to talk to yet - and
we may iterate through a few to find one.
(2) Before the operation happens, afs_fetch_status(), say, notes in the
cursor (fc.cb_break) the break counter sum from the vnode, volume and
server counters, but the server->cb_s_break is currently 0.
(3) We send FS.FetchStatus to the server. The server sends us back
CB.InitCallBackState. We increment server->cb_s_break.
(4) Our FS.FetchStatus completes. The reply includes a callback record.
(5) xdr_decode_AFSCallBack()/xdr_decode_YFSCallBack() check to see whether
the callback promise was broken by checking the break counter sum from
step (2) against the current sum.
This fails because of step (3), so we don't set the callback record
and, importantly, don't set AFS_VNODE_CB_PROMISED on the vnode.
This does not preclude the syscall from progressing, and we don't loop here
rechecking the status, but rather assume it's good enough for one round
only and will need to be rechecked next time.
(6) afs_validate() it triggered on the vnode, probably called from
d_revalidate() checking the parent directory.
(7) afs_validate() notes that AFS_VNODE_CB_PROMISED isn't set, so doesn't
update vnode->cb_s_break and assumes the vnode to be invalid.
(8) afs_validate() needs to calls afs_fetch_status(). Go back to step (2)
and repeat, every time the vnode is validated.
This primarily affects volume root dir vnodes. Everything subsequent to
those inherit an already incremented cb_s_break upon mounting.
The issue is that we assume that the callback record and the cached permit
information in a reply from the server can't be trusted after getting a
server break - but this is wrong since the server makes sure things are
done in the right order, holding up our ops if necessary[*].
[*] There is an extremely unlikely scenario where a reply from before the
CB.InitCallBackState could get its delivery deferred till after - at
which point we think we have a promise when we don't. This, however,
requires unlucky mass packet loss to one call.
AFS_SERVER_FL_NEW tries to paper over the cracks for the initial mount from
a server we've never contacted before, but this should be unnecessary.
It's also further insulated from the problem on an initial mount by
querying the server first with FS.GetCapabilities, which triggers the
CB.InitCallBackState.
Fix this by
(1) Remove AFS_SERVER_FL_NEW.
(2) In afs_calc_vnode_cb_break(), don't include cb_s_break in the
calculation.
(3) In afs_cb_is_broken(), don't include cb_s_break in the check.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
__pagevec_release() complains loudly if any page in the vector is still
locked. The pages need to be locked for generic_error_remove_page(), but
that function doesn't actually unlock them.
Unlock the pages afterwards.
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jonathan Billings <jsbillin@umich.edu>
|
|
Differentiate an abort due to an unmarshalling error from an abort due to
other errors, such as ENETUNREACH. It doesn't make sense to set abort code
RXGEN_*_UNMARSHAL in such a case, so use RX_USER_ABORT instead.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
__tracepoint_str cannot be const because the tracepoint_str
section is not read-only. Remove the stray const.
Cc: dhowells@redhat.com
Cc: viro@zeniv.linux.org.uk
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|