summaryrefslogtreecommitdiff
path: root/fs/nfs/pnfs.h
AgeCommit message (Collapse)Author
2018-01-14pnfs/blocklayout: handle transient devicesBenjamin Coddington
PNFS block/SCSI layouts should gracefully handle cases where block devices are not available when a layout is retrieved, or the block devices are removed while the client holds a layout. While setting up a layout segment, keep a record of an unavailable or un-parsable block device in cache with a flag so that subsequent layouts do not spam the server with GETDEVINFO. We can reuse the current NFS_DEVICEID_UNAVAILABLE handling with one variation: instead of reusing the device, we will discard it and send a fresh GETDEVINFO after the timeout, since the lookup and validation of the device occurs within the GETDEVINFO response handling. A lookup of a layout segment that references an unavailable device will return a segment with the NFS_LSEG_UNAVAILABLE flag set. This will allow the pgio layer to mark the layout with the appropriate fail bit, which forces subsequent IO to the MDS, and prevents spamming the server with LAYOUTGET, LAYOUTRETURN. Finally, when IO to a block device fails, look up the block device(s) referenced by the pgio header, and mark them as unavailable. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERRORBenjamin Coddington
If there's an error doing I/O to block device, and the client resends the I/O to the MDS, the MDS must recall the layout from the client before processing the I/O. Let's preempt that exchange by returning the layout before falling back to the MDS when there's an error. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-11-17pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-closeTrond Myklebust
If our layoutreturn on close operation returns an NFS4ERR_OLD_STATEID, then try to update the stateid and retry. We know that there should be no further LAYOUTGET requests being launched. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17fs, nfs: convert pnfs_layout_hdr.plh_refcount from atomic_t to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable pnfs_layout_hdr.plh_refcount is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17fs, nfs: convert pnfs_layout_segment.pls_refcount from atomic_t to refcount_tElena Reshetova
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17fs, nfs: convert nfs4_pnfs_ds.ds_count from atomic_t to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs4_pnfs_ds.ds_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-15NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg()Trond Myklebust
Now that we no longer hold the inode->i_lock when manipulating the commit lists, it is safe to call pnfs_put_lseg() again. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-24pnfs: Fix the check for requests in range of layout segmentBenjamin Coddington
It's possible and acceptable for NFS to attempt to add requests beyond the range of the current pgio->pg_lseg, a case which should be caught and limited by the pg_test operation. However, the current handling of this case replaces pgio->pg_lseg with a new layout segment (after a WARN) within that pg_test operation. That will cause all the previously added requests to be submitted with this new layout segment, which may not be valid for those requests. Fix this problem by only returning zero for the number of bytes to coalesce from pg_test for this case which allows any previously added requests to complete on the current layout segment. The check for requests starting out of range of the layout segment moves to pg_init, so that the replacement of pgio->pg_lseg will be done when the next request is added. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-25pNFS: Ensure we check layout segment validity in the pg_init() callbackTrond Myklebust
If we have a layout segment cached in pgio->pg_lseg, we should check it for validity before reusing it in a new RPC request. Otherwise, if we recoalesce, we can end up looping forever. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20pNFS: Remove unused layout driver callbacksTrond Myklebust
encode_layoutreturn and encode_layoutcommit are now unused. Let's remove them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-03-17pNFS: return status from nfs4_pnfs_ds_connectWeston Andros Adamson
The nfs4_pnfs_ds_connect path can call rpc_create which can fail or it can wait on another context to reach the same failure. This checks that the rpc_create succeeded and returns the error to the caller. When an error is returned, both the files and flexfiles layouts will return NULL from _prepare_ds(). The flexfiles layout will also return the layout with the error NFS4ERR_NXIO. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-12-03pNFS/flexfiles: Minor refactoring before adding iostats to layoutreturnTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03pNFS: Add a layoutreturn callback to performa layout-private setupTrond Myklebust
Add a callback to allow the flexfiles layout driver to initialise the layout private payload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01NFS: Remove unused authflavour parameter from nfs_get_client()Anna Schumaker
This parameter hasn't been used since f8407299 (Linux 3.11-rc2), so let's remove it from this function and callers. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01pNFS: Enable layoutreturn operation for return-on-closeTrond Myklebust
Amend the pnfs return on close helper functions to enable sending the layoutreturn op in CLOSE/DELEGRETURN. This closes a potential race between CLOSE/DELEGRETURN and parallel OPEN calls to the same file, and allows the client and the server to agree on whether or not there is an outstanding layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01pNFS: Get rid of unnecessary layout parameter in encode_layoutreturn callbackTrond Myklebust
The parameter is already present in the "args" structure. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01NFSv4: Ignore LAYOUTRETURN result if the layout doesn't match or is invalidTrond Myklebust
Fix a potential race with CB_LAYOUTRECALL in which the server recalls the remaining layout segments while our LAYOUTRETURN is still in transit. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01pNFS: Do not free layout segments that are marked for returnTrond Myklebust
We may want to process and transmit layout stat information for the layout segments that are being returned, so we should defer freeing them until after the layoutreturn has completed. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01pNFS: consolidate the different range intersection testsTrond Myklebust
Both pnfs.c and the flexfiles code have their own versions of the range intersection testing, and the "end_offset" helper. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01pNFS: On error, do not send LAYOUTGET until the LAYOUTRETURN has completedTrond Myklebust
If there is an I/O error, we should not call LAYOUTGET until the LAYOUTRETURN that reports the error is complete. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.8+
2016-09-19pnfs: add a new mechanism to select a layout driver according to an ordered listJeff Layton
Currently, the layout driver selection code always chooses the first one from the list. That's not really ideal however, as the server can send the list of layout types in any order that it likes. It's up to the client to select the best one for its needs. This patch adds an ordered list of preferred driver types and has the selection code sort the list of available layout drivers according to it. Any unrecognized layout type is sorted to the end of the list. For now, the order of preference is hardcoded, but it should be possible to make this configurable in the future. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19pnfs: track multiple layout types in fsinfo structureJeff Layton
Current NFSv4.1/pNFS client assumes that MDS supports only one layout type. While it's true for most existing servers, nevertheless, this can be change in the near future. For now, this patch just plumbs in the ability to track a list of layouts in the fsinfo structure. The existing behavior of the client is preserved, by having it just select the first entry in the list. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-24Merge branch 'pnfs'Trond Myklebust
2016-07-24pNFS: Remove redundant pnfs_mark_layout_returned_if_empty()Trond Myklebust
That's already being taken care of in pnfs_layout_remove_lseg(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid()Trond Myklebust
Ensure nfs42_layoutstat_done() layoutget don't open code layout stateid invalidation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: LAYOUTRETURN should only update the stateid if the layout is validTrond Myklebust
If the layout was completely returned, then ignore the returned layout stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-18pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstandingTrond Myklebust
We know that the attributes will need updating if there is still a LAYOUTCOMMIT outstanding. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1Trond Myklebust
Cleanup... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-05-26pnfs: pnfs_update_layout needs to consider if strict iomode checking is onTom Haynes
As flexfiles has FF_FLAGS_NO_READ_IO, there is a need to generically support enforcing that a IOMODE_RW segment will not allow READ I/O. Signed-off-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: rework LAYOUTGET retry handlingJeff Layton
There are several problems in the way a stateid is selected for a LAYOUTGET operation: We pick a stateid to use in the RPC prepare op, but that makes it difficult to serialize LAYOUTGETs that use the open stateid. That serialization is done in pnfs_update_layout, which occurs well before the rpc_prepare operation. Between those two events, the i_lock is dropped and reacquired. pnfs_update_layout can find that the list has lsegs in it and not do any serialization, but then later pnfs_choose_layoutget_stateid ends up choosing the open stateid. This patch changes the client to select the stateid to use in the LAYOUTGET earlier, when we're searching for a usable layout segment. This way we can do it all while holding the i_lock the first time, and ensure that we serialize any LAYOUTGET call that uses a non-layout stateid. This also means a rework of how LAYOUTGET replies are handled, as we must now get the latest stateid if we want to retransmit in response to a retryable error. Most of those errors boil down to the fact that the layout state has changed in some fashion. Thus, what we really want to do is to re-search for a layout when it fails with a retryable error, so that we can avoid reissuing the RPC at all if possible. While the LAYOUTGET RPC is async, the initiating thread always waits for it to complete, so it's effectively synchronous anyway. Currently, when we need to retry a LAYOUTGET because of an error, we drive that retry via the rpc state machine. This means that once the call has been submitted, it runs until it completes. So, we must move the error handling for this RPC out of the rpc_call_done operation and into the caller. In order to handle errors like NFS4ERR_DELAY properly, we must also pass a pointer to the sliding timeout, which is now moved to the stack in pnfs_update_layout. The complicating errors are -NFS4ERR_RECALLCONFLICT and -NFS4ERR_LAYOUTTRYLATER, as those involve a timeout after which we give up and return NULL back to the caller. So, there is some special handling for those errors to ensure that the layers driving the retries can handle that appropriately. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN argsJeff Layton
LAYOUTRETURN is "special" in that servers and clients are expected to work with old stateids. When the client sends a LAYOUTRETURN with an old stateid in it then the server is expected to only tear down layout segments that were present when that seqid was current. Ensure that the client handles its accounting accordingly. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: keep track of the return sequence number in pnfs_layout_hdrJeff Layton
When we want to selectively do a LAYOUTRETURN, we need to specify a stateid that represents most recent layout acquisition that is to be returned. When we mark a layout stateid to be returned, we update the return sequence number in the layout header with that value, if it's newer than the existing one. Then, when we go to do a LAYOUTRETURN on layout header put, we overwrite the seqid in the stateid with the saved one, and then zero it out. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: record sequence in pnfs_layout_segment when it's createdJeff Layton
In later patches, we're going to teach the client to be more selective about how it returns layouts. This means keeping a record of what the stateid's seqid was at the time that the server handed out a layout segment. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09pnfs: set NFS_IOHDR_REDO in pnfs_read_resend_pnfsWeston Andros Adamson
Like other resend paths, mark the (old) hdr as NFS_IOHDR_REDO. This ensures the hdr completion function will not count the (old) hdr as good bytes. Also, vector the error back through the hdr->task.tk_status like other retry calls. This fixes a bug with the FlexFiles layout where libaio was reporting more bytes read than requested. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-01-27NFS: Cleanup - rename NFS_LAYOUT_RETURN_BEFORE_CLOSETrond Myklebust
NFS_LAYOUT_RETURN_BEFORE_CLOSE is being used to signal that a layoutreturn is needed, either due to a layout recall or to a layout error. Rename it to NFS_LAYOUT_RETURN_REQUESTED in order to clarify its purpose. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04Merge branch 'pnfs_generic'Trond Myklebust
* pnfs_generic: NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid() NFSv4.1/pNFS: Fix a race in initiate_file_draining() NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn() NFS: Relax requirements in nfs_flush_incompatible NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid NFS: Allow multiple commit requests in flight per file NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs NFSv4: List stateid information in the callback tracepoints NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1 pNFS: If we have to delay the layout callback, mark the layout for return NFSv4.1/pNFS: Add a helper to mark the layout as returned pNFS: Ensure nfs4_layoutget_prepare returns the correct error
2016-01-04NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range argumentsTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structuresTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layoutTrond Myklebust
Fix a bug whereby if all the layout segments could be immediately freed, the call to pnfs_error_mark_layout_for_return() would never result in a layoutreturn. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-31NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalidTrond Myklebust
If the layout segment is invalid, then we should not be adding more write requests to the commit list. Instead, those writes should be replayed after requesting a new layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28pNFS: If we have to delay the layout callback, mark the layout for returnTrond Myklebust
If the client needs to delay the layout callback, then speed up the recall process by marking the remaining layout segments to be actively returned by the client. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28NFSv4.1/pNFS: Add a helper to mark the layout as returnedTrond Myklebust
This ensures that we don't reuse the stateid if a layout return or implied layout return means that we've returned all layout segments Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28pNFS/flexfiles: Don't prevent flexfiles client from retrying LAYOUTGETTrond Myklebust
Fix a bug in which flexfiles clients are falling back to I/O through the MDS even when the FF_FLAGS_NO_IO_THRU_MDS flag is set. The flexfiles client will always report errors through the LAYOUTRETURN and/or LAYOUTERROR mechanisms, so it should normally be safe for it to retry the LAYOUTGET until it fails or succeeds. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-23NFS41: make close wait for layoutreturnPeng Tao
If we send a layoutreturn asynchronously before close, the close might reach server first and layoutreturn would fail with BADSTATEID because there is nothing keeping the layout stateid alive. Also do not pretend sending layoutreturn if we are not. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4.1/flexfiles: Allow coalescing of new layout segments and existing onesTrond Myklebust
In order to ensure atomicity of updates, we merge the old layout segments into the new ones, and then invalidate the old ones. Also ensure that we order the list of layout segments so that RO segments are preferred over RW. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4.1/pnfs: Allow pNFS device drivers to customise layout segment insertionTrond Myklebust
This is needed in order to allow merging of contiguous layout segments, and also to correct the ordering of layouts for those device drivers that don't necessarily want to place the read-write layouts first. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4.1/pnfs Improve the packing of struct pnfs_layout_hdrTrond Myklebust
Eliminate a couple of holes in the structure, and move the 2 atomics into the same cacheline. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4.2/pnfs: Make the layoutstats timer configurableTrond Myklebust
Allow advanced users to set the layoutstats timer in order to lengthen or shorten the period between layoutstat transmissions to the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFS41: remove NFS_LAYOUT_ROC flagPeng Tao
If we return delegation before closing, we fail to do roc check during close because NFS_LAYOUT_ROC is cleared by delegreturn and it causes layouts to be still hanging around after delegreturn + close, which is a voilation against protocol. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-18NFSv4.1/pnfs: Fix a close/delegreturn hang when return-on-close is setTrond Myklebust
The helper pnfs_roc() has already verified that we have no delegations, and no further open files, hence no outstanding I/O and it has marked all the return-on-close lsegs as being invalid. Furthermore, it sets the NFS_LAYOUT_RETURN bit, thus serialising the close/delegreturn with all future layoutget calls on this inode. The checks in pnfs_roc_drain() for valid layout segments are therefore redundant: those cannot exist until another layoutget completes. The other check for whether or not NFS_LAYOUT_RETURN is set, actually causes a hang, since we already know that we hold that flag. To fix, we therefore strip out all the functionality in pnfs_roc_drain() except the retrieval of the barrier state, and then rename the function accordingly. Reported-by: Christoph Hellwig <hch@infradead.org> Fixes: 5c4a79fb2b1c ("Don't prevent layoutgets when doing return-on-close") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>