summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-02-03NFSv4: Limit the total number of cached delegationsTrond Myklebust
Delegations can be expensive to return, and can cause scalability issues for the server. Let's therefore try to limit the number of inactive delegations we hold. Once the number of delegations is above a certain threshold, start to return them on close. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03NFSv4: Add accounting for the number of active delegations heldTrond Myklebust
In order to better manage our delegation caching, add a counter to track the number of active delegations. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03NFSv4: Try to return the delegation immediately when marked for return on closeTrond Myklebust
Add a routine to return the delegation immediately upon close of the file if it was marked for return-on-close. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returnedTrond Myklebust
If a delegation is marked as needing to be returned when the file is closed, then don't clear that marking until we're ready to return it. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNINGTrond Myklebust
In particular, the pnfs return-on-close code will check for that flag, so ensure we set it appropriately. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03NFS: nfs_find_open_context() should use cred_fscmp()Trond Myklebust
We want to find open contexts that match our filesystem access properties. They don't have to exactly match the cred. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03NFS: nfs_access_get_cached_rcu() should use cred_fscmp()Trond Myklebust
We do not need to have the rcu lookup method fail in the case where the fsuid/fsgid and supplemental groups match. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03NFSv4: pnfs_roc() must use cred_fscmp() to compare credsTrond Myklebust
When comparing two 'struct cred' for equality w.r.t. behaviour under filesystem access, we need to use cred_fscmp(). Fixes: a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03NFS: remove unused macrosAlex Shi
MNT_fhs_status_sz/MNT_fhandle3_sz are never used after they were introduced. So better to remove them. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: linux-nfs@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-24nfs: Return EINVAL rather than ERANGE for mount parse errorsDavid Howells
Return EINVAL rather than ERANGE for mount parse errors as the userspace mount command doesn't necessarily understand what to do with anything other than EINVAL. The old code returned -ERANGE as an intermediate error that then get converted to -EINVAL, whereas the new code returns -ERANGE. This was induced by passing minorversion=1 to a v4 mount where CONFIG_NFS_V4_1 was disabled in the kernel build. Fixes: 68f65ef40e1e ("NFS: Convert mount option parsing to use functionality from fs_parser.h") Reported-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-24NFS: allow deprecation of NFS UDP protocolOlga Kornievskaia
Add a kernel config CONFIG_NFS_DISABLE_UDP_SUPPORT to disallow NFS UDP mounts and enable it by default. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-24NFS: Add softreval behaviour to nfs_lookup_revalidate()Trond Myklebust
If the server is unavaliable, we want to allow the revalidating lookup to time out, and to default to validating the cached dentry if the 'softreval' mount option is set. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFSv3: FIx bug when using chacl and chmod to change aclSu Yanjun
We find a bug when running test under nfsv3  as below. 1) chacl u::r--,g::rwx,o:rw- file1 2) chmod u+w file1 3) chacl -l file1 We expect u::rw-, but it shows u::r--, more likely it returns the cached acl in inode. We dig the code find that the code path is different. chacl->..->__nfs3_proc_setacls->nfs_zap_acl_cache Then nfs_zap_acl_cache clears the NFS_INO_INVALID_ACL in NFS_I(inode)->cache_validity. chmod->..->nfs3_proc_setattr Because NFS_INO_INVALID_ACL has been cleared by chacl path, nfs_zap_acl_cache wont be called. nfs_setattr_update_inode will set NFS_INO_INVALID_ACL so let it before nfs_zap_acl_cache call. Signed-off-by: Su Yanjun <suyanjun218@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFSv4.x recover from pre-mature loss of openstateidOlga Kornievskaia
Ever since the commit 0e0cb35b417f, it's possible to lose an open stateid while retrying a CLOSE due to ERR_OLD_STATEID. Once that happens, operations that require openstateid fail with EAGAIN which is propagated to the application then tests like generic/446 and generic/168 fail with "Resource temporarily unavailable". Instead of returning this error, initiate state recovery when possible to recover the open stateid and then try calling nfs4_select_rw_stateid() again. Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFSv4 fix acl retrieval over krb5i/krb5p mountsOlga Kornievskaia
For the krb5i and krb5p mount, it was problematic to truncate the received ACL to the provided buffer because an integrity check could not be preformed. Instead, provide enough pages to accommodate the largest buffer bounded by the largest RPC receive buffer size. Note: I don't think it's possible for the ACL to be truncated now. Thus NFS4_ACL_TRUNC flag and related code could be possibly removed but since I'm unsure, I'm leaving it. v2: needs +1 page. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Add mount option 'softreval'Trond Myklebust
Add a mount option 'softreval' that allows attribute revalidation 'getattr' calls to time out, and causes them to fall back to using the cached attributes. The use case for this option is for ensuring that we can still (slowly) traverse paths and use cached information even when the server is down. Once the server comes back up again, the getattr calls start succeeding, and the caches will revalidate as usual. The 'softreval' mount option is automatically enabled if you have specified 'softerr'. It can be turned off using the options 'nosoftreval', or 'hard'. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Trust cached access if we've already revalidated the inode onceTrond Myklebust
If we've already revalidated the inode once then don't distrust the access cache unless the NFS_INO_INVALID_ACCESS flag is actually set. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Fix nfs_direct_write_reschedule_io()Trond Myklebust
The 'hdr->good_bytes' is defined as the number of bytes we expect to read or write starting at offset hdr->io_start. In the case of a partial read/write we may end up adjusting hdr->args.offset and hdr->args.count to skip I/O for data that was already read/written, and so we must ensure the calculation takes that into account. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: When resending after a short write, reset the reply count to zeroTrond Myklebust
If we're resending a write due to a short read or write, ensure we reset the reply count to zero. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Improve tracing of permission callsTrond Myklebust
On exit from nfs_do_access(), record the mask representing the requested permissions, as well as the server-supplied set of access rights for this user. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15pNFS/flexfiles: Add tracing for layout errorsTrond Myklebust
Trace layout errors for pNFS/flexfiles on read/write/commit operations. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Clean up generic file commit tracepointTrond Myklebust
Clean up the generic file commit tracepoints to use a 64-bit value for the verifier, and to display the pNFS filehandle, if it exists. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Clean up generic writeback tracepointsTrond Myklebust
Clean up the generic writeback tracepoints so they do pass the full structures as arguments. Also ensure we report the number of bytes actually written. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Clean up generic file read tracepointsTrond Myklebust
Clean up the generic file read tracepoints so they do pass the full structures as arguments. Also ensure we report the number of bytes actually read. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15pNFS/flexfiles: Record resend attempts on I/O failureTrond Myklebust
If the attempt to do pNFS fails, then record what action we take to recover (resend, reset to pnfs or reset to mds). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Fix fix of show_nfs_errorsTrond Myklebust
Casting a negative value to an unsigned long is not the same as converting it to its absolute value. Fixes: 96650e2effa2 ("NFS: Fix show_nfs_errors macros again") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFSv4: Improve read/write/commit tracingTrond Myklebust
Ensure we always return the number of bytes read/written. Also display the pnfs filehandle if it is in use. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS/pnfs: Fix pnfs_generic_prepare_to_resend_writes()Trond Myklebust
Instead of making assumptions about the commit verifier contents, change the commit code to ensure we always check that the verifier was set by the XDR code. Fixes: f54bcf2ecee9 ("pnfs: Prepare for flexfiles by pulling out common code") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Fix up fsync() when the server rebootedTrond Myklebust
Don't clear the NFS_CONTEXT_RESEND_WRITES flag until after calling nfs_commit_inode(). Otherwise, if nfs_commit_inode() returns an error, we end up with dirty pages in the page cache, but no tag to tell us that those pages need resending. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15SUNRPC: Remove broken gss_mech_list_pseudoflavors()Trond Myklebust
Remove gss_mech_list_pseudoflavors() and its callers. This is part of an unused API, and could leak an RCU reference if it were ever called. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Revalidate the file mapping on all fatal writeback errorsTrond Myklebust
If a write or commit failed, and the mapping sees a fatal error, we need to revalidate the contents of that mapping. Fixes: 06c9fdf3b9f1 ("NFS: On fatal writeback errors, we need to call nfs_inode_remove_request()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Revalidate the file size on a fatal write errorTrond Myklebust
If we suffer a fatal error upon writing a file, which causes us to need to revalidate the entire mapping, then we should also revalidate the file size. Fixes: d2ceb7e57086 ("NFS: Don't use page_file_mapping after removing the page") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: DMA map rr_rdma_buf as each rpcrdma_rep is createdChuck Lever
Clean up: This simplifies the logic in rpcrdma_post_recvs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Destroy reps from previous connection instanceChuck Lever
To safely get rid of all rpcrdma_reps from a particular connection instance, xprtrdma has to wait until each of those reps is finished being used. A rep may be backing the rq_rcv_buf of an RPC that has just completed, for example. Since it is safe to invoke rpcrdma_rep_destroy() only in the Receive completion handler, simply mark reps remaining in the rb_all_reps list after the transport is drained. These will then be deleted as rpcrdma_post_recvs pulls them off the rep free list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Destroy rpcrdma_rep when Receive is flushedChuck Lever
This reduces the hardware and memory footprint of an unconnected transport. At some point in the future, transport reconnect will allow resolving the destination IP address through a different device. The current change enables reps for the new connection to be allocated on whichever NUMA node the new device affines to after a reconnect. Note that this does not destroy _all_ the transport's reps... there will be a few that are still part of a running RPC completion. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Allocate and map transport header buffers at connect timeChuck Lever
Currently the underlying RDMA device is chosen at transport set-up time. But it will soon be at connect time instead. The maximum size of a transport header is based on device capabilities. Thus transport header buffers have to be allocated _after_ the underlying device has been chosen (via address and route resolution); ie, in the connect worker. Thus, move the allocation of transport header buffers to the connect worker, after the point at which the underlying RDMA device has been chosen. This also means the RDMA device is available to do a DMA mapping of these buffers at connect time, instead of in the hot I/O path. Make that optimization as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Refactor frwr_is_supportedChuck Lever
Refactor: Perform the "is supported" check in rpcrdma_ep_create() instead of in rpcrdma_ia_open(). frwr_open() is where most of the logic to query device attributes is already located. The current code displays a redundant error message when the device does not support FRWR. As an additional clean-up, this patch removes the extra message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Eliminate per-transport "max pages"Chuck Lever
To support device hotplug and migrating a connection between devices of different capabilities, we have to guarantee that all in-kernel devices can support the same max NFS payload size (1 megabyte). This means that possibly one or two in-tree devices are no longer supported for NFS/RDMA because they cannot support 1MB rsize/wsize. The only one I confirmed was cxgb3, but it has already been removed from the kernel. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Refactor initialization of ep->rep_max_requestsChuck Lever
Clean up: there is no need to keep two copies of the same value. Also, in subsequent patches, rpcrdma_ep_create() will be called in the connect worker rather than at set-up time. Minor fix: Initialize the transport's sendctx to the value based on the capabilities of the underlying device, not the maximum setting. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Make sendctx queue lifetime the same as connection lifetimeChuck Lever
The size of the sendctx queue depends on the value stored in ia->ri_max_send_sges. This value is determined by querying the underlying device. Eventually, rpcrdma_ia_open() and rpcrdma_ep_create() will be called in the connect worker rather than at transport set-up time. The underlying device will not have been chosen device set-up time. The sendctx queue will thus have to be created after the underlying device has been chosen via address and route resolution; in other words, in the connect worker. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Eliminate ri_max_send_sgesChuck Lever
Clean-up. The max_send_sge value also happens to be stored in ep->rep_attr. Let's keep just a single copy. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Add missing null check for failed allocationColin Ian King
Currently the allocation of buf is not being null checked and a null pointer dereference can occur when the memory allocation fails. Fix this by adding a check and returning -ENOMEM. Addresses-Coverity: ("Dereference null return") Fixes: 6d972518b821 ("NFS: Add fs_context support.") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15nfs: NFS_SWAP should depend on SWAPGeert Uytterhoeven
If CONFIG_SWAP=n, it does not make much sense to offer the user the option to enable support for swapping over NFS, as that will still fail at run time: # swapon /swap swapon: /swap: swapon failed: Function not implemented Fix this by adding a dependency on CONFIG_SWAP. Fixes: a564b8f0398636ba ("nfs: enable swap on NFS") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15SUNRPC: constify copied structureJulia Lawall
The empty_iov structure is only copied into another structure, so make it const. The opportunity for this change was found using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15fs/nfs, swapon: check holes in swapfileMurphy Zhou
swapon over NFS does not go through generic_swapfile_activate code path when setting up extents. This makes holes in NFS swapfiles possible which is not expected for swapon. Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15SUNRPC: call_connect_status should handle -EPROTOChuck Lever
The xprtrdma connect logic can return -EPROTO if the underlying device or network path does not support RDMA. This can happen after a device removal/insertion. - When SOFTCONN is set, EPROTO is a permanent error. - When SOFTCONN is not set, EPROTO is treated as a temporary error. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS4: Report callback authentication errorsChuck Lever
This seems to be a somewhat common issue with Kerberos NFSv4.0 set-ups. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Introduce trace events triggered by page writeback errorsChuck Lever
Try to capture the reason for the writeback path tagging an error on a page. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15SUNRPC: Capture signalled RPC tasksChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: move dprintk after nfs_alloc_fattr in nfs3_proc_lookupzhengbin
In nfs3_proc_lookup, if nfs_alloc_fattr fails, will only print "NFS call lookup". This may be confusing, move dprintk after nfs_alloc_fattr. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>