Age | Commit message (Collapse) | Author |
|
Pull nfsd updates from Bruce Fields:
"Chuck Lever did a bunch of work on nfsd tracepoints, on RDMA, and on
server xdr decoding (with an eye towards eliminating a data copy in
the RDMA case).
I did some refactoring of the delegation code in preparation for
eliminating some delegation self-conflicts and implementing write
delegations"
* tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linux: (40 commits)
nfsd: fix incorrect umasks
sunrpc: remove incorrect HMAC request initialization
NFSD: Clean up legacy NFS SYMLINK argument XDR decoders
NFSD: Clean up legacy NFS WRITE argument XDR decoders
nfsd: Trace NFSv4 COMPOUND execution
nfsd: Add I/O trace points in the NFSv4 read proc
nfsd: Add I/O trace points in the NFSv4 write path
nfsd: Add "nfsd_" to trace point names
nfsd: Record request byte count, not count of vectors
nfsd: Fix NFSD trace points
svc: Report xprt dequeue latency
sunrpc: Report per-RPC execution stats
sunrpc: Re-purpose trace_svc_process
sunrpc: Save remote presentation address in svc_xprt for trace events
sunrpc: Simplify trace_svc_recv
sunrpc: Simplify do_enqueue tracing
sunrpc: Move trace_svc_xprt_dequeue()
sunrpc: Update show_svc_xprt_flags() to include recently added flags
svc: Simplify ->xpo_secure_port
sunrpc: Remove unneeded pointer dereference
...
|
|
make_checksum_hmac_md5() is allocating an HMAC transform and doing
crypto API calls in the following order:
crypto_ahash_init()
crypto_ahash_setkey()
crypto_ahash_digest()
This is wrong because it makes no sense to init() the request before a
key has been set, given that the initial state depends on the key. And
digest() is short for init() + update() + final(), so in this case
there's no need to explicitly call init() at all.
Before commit 9fa68f620041 ("crypto: hash - prevent using keyed hashes
without setting key") the extra init() had no real effect, at least for
the software HMAC implementation. (There are also hardware drivers that
implement HMAC-MD5, and it's not immediately obvious how gracefully they
handle init() before setkey().) But now the crypto API detects this
incorrect initialization and returns -ENOKEY. This is breaking NFS
mounts in some cases.
Fix it by removing the incorrect call to crypto_ahash_init().
Reported-by: Michael Young <m.a.young@durham.ac.uk>
Fixes: 9fa68f620041 ("crypto: hash - prevent using keyed hashes without setting key")
Fixes: fffdaef2eb4a ("gss_krb5: Add support for rc4-hmac encryption")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations initialize and destroy sunrpc_net_id
refered per-net items. Only used global list is cache_list,
and accesses already serialized.
sunrpc_destroy_cache_detail() check for list_empty() without
cache_list_lock, but when it's called from unregister_pernet_subsys(),
there can't be callers in parallel, so we won't miss list_empty()
in this case.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prefer the direct use of octal for permissions.
Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
and some typing.
Miscellanea:
o Whitespace neatening around these conversions.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix unaligned access in gss_{get,verify}_mic_v2() on sparc64
Signed-off-by: James Ettle <james@ettle.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.
This patch:
- Make groups_sort globally visible.
- Move the call to groups_sort to the modifiers of group_info
- Remove the call to groups_sort from set_groups
Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Make these const as they are only getting passed to the function
cache_create_net having the argument as const.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Pull nfsd updates from Bruce Fields:
"Lots of good bugfixes, including:
- fix a number of races in the NFSv4+ state code
- fix some shutdown crashes in multiple-network-namespace cases
- relax our 4.1 session limits; if you've an artificially low limit
to the number of 4.1 clients that can mount simultaneously, try
upgrading"
* tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linux: (22 commits)
SUNRPC: Improve ordering of transport processing
nfsd: deal with revoked delegations appropriately
svcrdma: Enqueue after setting XPT_CLOSE in completion handlers
nfsd: use nfs->ns.inum as net ID
rpc: remove some BUG()s
svcrdma: Preserve CB send buffer across retransmits
nfds: avoid gettimeofday for nfssvc_boot time
fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t
fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t
fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t
lockd: double unregister of inetaddr notifiers
nfsd4: catch some false session retries
nfsd4: fix cached replies to solo SEQUENCE compounds
sunrcp: make function _svc_create_xprt static
SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status
nfsd: use ARRAY_SIZE
nfsd: give out fewer session slots as limit approaches
nfsd: increase DRC cache limit
nfsd: remove unnecessary nofilehandle checks
nfs_common: convert int to bool
...
|
|
It would be kinder to WARN() and recover in several spots here instead
of BUG()ing.
Also, it looks like the read_u32_from_xdr_buf() call could actually
fail, though it might require a broken (or malicious) client, so convert
that to just an error return.
Reported-by: Weston Andros Adamson <dros@monkey.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull NFS client updates from Anna Schumaker:
"Stable bugfixes:
- Fix -EACCESS on commit to DS handling
- Fix initialization of nfs_page_array->npages
- Only invalidate dentries that are actually invalid
Features:
- Enable NFSoRDMA transparent state migration
- Add support for lookup-by-filehandle
- Add support for nfs re-exporting
Other bugfixes and cleanups:
- Christoph cleaned up the way we declare NFS operations
- Clean up various internal structures
- Various cleanups to commits
- Various improvements to error handling
- Set the dt_type of . and .. entries in NFS v4
- Make slot allocation more reliable
- Fix fscache stat printing
- Fix uninitialized variable warnings
- Fix potential list overrun in nfs_atomic_open()
- Fix a race in NFSoRDMA RPC reply handler
- Fix return size for nfs42_proc_copy()
- Fix against MAC forgery timing attacks"
* tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (68 commits)
NFS: Don't run wake_up_bit() when nobody is waiting...
nfs: add export operations
nfs4: add NFSv4 LOOKUPP handlers
nfs: add a nfs_ilookup helper
nfs: replace d_add with d_splice_alias in atomic_open
sunrpc: use constant time memory comparison for mac
NFSv4.2 fix size storage for nfs42_proc_copy
xprtrdma: Fix documenting comments in frwr_ops.c
xprtrdma: Replace PAGE_MASK with offset_in_page()
xprtrdma: FMR does not need list_del_init()
xprtrdma: Demote "connect" log messages
NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration
NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration
xprtrdma: Don't defer MR recovery if ro_map fails
xprtrdma: Fix FRWR invalidation error recovery
xprtrdma: Fix client lock-up after application signal fires
xprtrdma: Rename rpcrdma_req::rl_free
xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
xprtrdma: Pre-mark remotely invalidated MRs
xprtrdma: On invalidation failure, remove MWs from rl_registered
...
|
|
Pull nfsd updates from Bruce Fields:
"Chuck's RDMA update overhauls the "call receive" side of the
RPC-over-RDMA transport to use the new rdma_rw API.
Christoph cleaned the way nfs operations are declared, removing a
bunch of function-pointer casts and declaring the operation vectors as
const.
Christoph's changes touch both client and server, and both client and
server pulls this time around should be based on the same commits from
Christoph"
* tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linux: (53 commits)
svcrdma: fix an incorrect check on -E2BIG and -EINVAL
nfsd4: factor ctime into change attribute
svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir field
svcrdma: use offset_in_page() macro
svcrdma: Clean up after converting svc_rdma_recvfrom to rdma_rw API
svcrdma: Clean-up svc_rdma_unmap_dma
svcrdma: Remove frmr cache
svcrdma: Remove unused Read completion handlers
svcrdma: Properly compute .len and .buflen for received RPC Calls
svcrdma: Use generic RDMA R/W API in RPC Call path
svcrdma: Add recvfrom helpers to svc_rdma_rw.c
sunrpc: Allocate up to RPCSVC_MAXPAGES per svc_rqst
svcrdma: Don't account for Receive queue "starvation"
svcrdma: Improve Reply chunk sanity checking
svcrdma: Improve Write chunk sanity checking
svcrdma: Improve Read chunk sanity checking
svcrdma: Remove svc_rdma_marshal.c
svcrdma: Avoid Send Queue overflow
svcrdma: Squelch disconnection messages
sunrpc: Disable splice for krb5i
...
|
|
Otherwise, we enable a MAC forgery via timing attack.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
struct rpc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
p_count is the only writeable memeber of struct rpc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.
This patch moves it into out out struct rpc_procinfo, and into a
separate writable array that is pointed to by struct rpc_version and
indexed by p_statidx.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Declare the p_encode callbacks with the proper prototype instead of
casting to kxdreproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Running a multi-threaded 8KB fio test (70/30 mix), three or four out
of twelve of the jobs fail when using krb5i. The failure is an EIO
on a read.
Troubleshooting confirmed the EIO results when the client fails to
verify the MIC of an NFS READ reply. Bruce suggested the problem
could be due to the data payload changing between the time the
reply's MIC was computed on the server and the time the reply was
actually sent.
krb5p gets around this problem by disabling RQ_SPLICE_OK. Use the
same mechanism for krb5i RPCs.
"iozone -i0 -i1 -s128m -y1k -az -I", export is tmpfs, mount is
sec=krb5i,vers=3,proto=rdma. The important numbers are the
read / reread column.
Here's without the RQ_SPLICE_OK patch:
kB reclen write rewrite read reread
131072 1 7546 7929 8396 8267
131072 2 14375 14600 15843 15639
131072 4 19280 19248 21303 21410
131072 8 32350 31772 35199 34883
131072 16 36748 37477 49365 51706
131072 32 55669 56059 57475 57389
131072 64 74599 75190 74903 75550
131072 128 99810 101446 102828 102724
131072 256 122042 122612 124806 125026
131072 512 137614 138004 141412 141267
131072 1024 146601 148774 151356 151409
131072 2048 180684 181727 293140 292840
131072 4096 206907 207658 552964 549029
131072 8192 223982 224360 454493 473469
131072 16384 228927 228390 654734 632607
And here's with it:
kB reclen write rewrite read reread
131072 1 7700 7365 7958 8011
131072 2 13211 13303 14937 14414
131072 4 19001 19265 20544 20657
131072 8 30883 31097 34255 33566
131072 16 36868 34908 51499 49944
131072 32 56428 55535 58710 56952
131072 64 73507 74676 75619 74378
131072 128 100324 101442 103276 102736
131072 256 122517 122995 124639 124150
131072 512 137317 139007 140530 140830
131072 1024 146807 148923 151246 151072
131072 2048 179656 180732 292631 292034
131072 4096 206216 208583 543355 541951
131072 8192 223738 224273 494201 489372
131072 16384 229313 229840 691719 668427
I would say that there is not much difference in this test.
For good measure, here's the same test with sec=krb5p:
kB reclen write rewrite read reread
131072 1 5982 5881 6137 6218
131072 2 10216 10252 10850 10932
131072 4 12236 12575 15375 15526
131072 8 15461 15462 23821 22351
131072 16 25677 25811 27529 27640
131072 32 31903 32354 34063 33857
131072 64 42989 43188 45635 45561
131072 128 52848 53210 56144 56141
131072 256 59123 59214 62691 62933
131072 512 63140 63277 66887 67025
131072 1024 65255 65299 69213 69140
131072 2048 76454 76555 133767 133862
131072 4096 84726 84883 251925 250702
131072 8192 89491 89482 270821 276085
131072 16384 91572 91597 361768 336868
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=307
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
struct rpc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
p_count is the only writeable memeber of struct rpc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.
This patch moves it into out out struct rpc_procinfo, and into a
separate writable array that is pointed to by struct rpc_version and
indexed by p_statidx.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Declare the p_encode callbacks with the proper prototype instead of
casting to kxdreproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Pull nfsd updates from Bruce Fields:
"The nfsd update this round is mainly a lot of miscellaneous cleanups
and bugfixes.
A couple changes could theoretically break working setups on upgrade.
I don't expect complaints in practice, but they seem worth calling out
just in case:
- NFS security labels are now off by default; a new security_label
export flag reenables it per export. But, having them on by default
is a disaster, as it generally only makes sense if all your clients
and servers have similar enough selinux policies. Thanks to Jason
Tibbitts for pointing this out.
- NFSv4/UDP support is off. It was never really supported, and the
spec explicitly forbids it. We only ever left it on out of
laziness; thanks to Jeff Layton for finally fixing that"
* tag 'nfsd-4.11' of git://linux-nfs.org/~bfields/linux: (34 commits)
nfsd: Fix display of the version string
nfsd: fix configuration of supported minor versions
sunrpc: don't register UDP port with rpcbind when version needs congestion control
nfs/nfsd/sunrpc: enforce transport requirements for NFSv4
sunrpc: flag transports as having congestion control
sunrpc: turn bitfield flags in svc_version into bools
nfsd: remove superfluous KERN_INFO
nfsd: special case truncates some more
nfsd: minor nfsd_setattr cleanup
NFSD: Reserve adequate space for LOCKT operation
NFSD: Get response size before operation for all RPCs
nfsd/callback: Drop a useless data copy when comparing sessionid
nfsd/callback: skip the callback tag
nfsd/callback: Cleanup callback cred on shutdown
nfsd/idmap: return nfserr_inval for 0-length names
SUNRPC/Cache: Always treat the invalid cache as unexpired
SUNRPC: Drop all entries from cache_detail when cache_purge()
svcrdma: Poll CQs in "workqueue" mode
svcrdma: Combine list fields in struct svc_rdma_op_ctxt
svcrdma: Remove unused sc_dto_q field
...
|
|
Now that %z is standartised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.
Use BUILD_BUG_ON in a couple ATM drivers.
In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
is in my opinion is quite an achievement. Hopefully this patch inspires
someone else to trim vsprintf.c more.
Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We currently handle a client PROC_DESTROY request by turning it
CACHE_NEGATIVE, setting the expired time to now, and then waiting for
cache_clean to clean it up later. Since we forgot to set the cache's
nextcheck value, that could take up to 30 minutes. Also, though there's
probably no real bug in this case, setting CACHE_NEGATIVE directly like
this probably isn't a great idea in general.
So let's just remove the entry from the cache directly, and move this
bit of cache manipulation to a helper function.
Signed-off-by: Neil Brown <neilb@suse.com>
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Olga Kornievskaia says: "I ran into this oops in the nfsd (below)
(4.10-rc3 kernel). To trigger this I had a client (unsuccessfully) try
to mount the server with krb5 where the server doesn't have the
rpcsec_gss_krb5 module built."
The problem is that rsci.cred is copied from a svc_cred structure that
gss_proxy didn't properly initialize. Fix that.
[120408.542387] general protection fault: 0000 [#1] SMP
...
[120408.565724] CPU: 0 PID: 3601 Comm: nfsd Not tainted 4.10.0-rc3+ #16
[120408.567037] Hardware name: VMware, Inc. VMware Virtual =
Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[120408.569225] task: ffff8800776f95c0 task.stack: ffffc90003d58000
[120408.570483] RIP: 0010:gss_mech_put+0xb/0x20 [auth_rpcgss]
...
[120408.584946] ? rsc_free+0x55/0x90 [auth_rpcgss]
[120408.585901] gss_proxy_save_rsc+0xb2/0x2a0 [auth_rpcgss]
[120408.587017] svcauth_gss_proxy_init+0x3cc/0x520 [auth_rpcgss]
[120408.588257] ? __enqueue_entity+0x6c/0x70
[120408.589101] svcauth_gss_accept+0x391/0xb90 [auth_rpcgss]
[120408.590212] ? try_to_wake_up+0x4a/0x360
[120408.591036] ? wake_up_process+0x15/0x20
[120408.592093] ? svc_xprt_do_enqueue+0x12e/0x2d0 [sunrpc]
[120408.593177] svc_authenticate+0xe1/0x100 [sunrpc]
[120408.594168] svc_process_common+0x203/0x710 [sunrpc]
[120408.595220] svc_process+0x105/0x1c0 [sunrpc]
[120408.596278] nfsd+0xe9/0x160 [nfsd]
[120408.597060] kthread+0x101/0x140
[120408.597734] ? nfsd_destroy+0x60/0x60 [nfsd]
[120408.598626] ? kthread_park+0x90/0x90
[120408.599448] ret_from_fork+0x22/0x30
Fixes: 1d658336b05f "SUNRPC: Add RPC based upcall mechanism for RPCGSS auth"
Cc: stable@vger.kernel.org
Cc: Simo Sorce <simo@redhat.com>
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Tested-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Context expiry times are in units of seconds since boot, not unix time.
The use of get_seconds() here therefore sets the expiry time decades in
the future. This prevents timely freeing of contexts destroyed by
client RPC_GSS_PROC_DESTROY requests. We'd still free them eventually
(when the module is unloaded or the container shut down), but a lot of
contexts could pile up before then.
Cc: stable@vger.kernel.org
Fixes: c5b29f885afe "sunrpc: use seconds since boot in expiry cache"
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull nfsd updates from Bruce Fields:
"The one new feature is support for a new NFSv4.2 mode_umask attribute
that makes ACL inheritance a little more useful in environments that
default to restrictive umasks. Requires client-side support, also on
its way for 4.10.
Other than that, miscellaneous smaller fixes and cleanup, especially
to the server rdma code"
[ The client side of the umask attribute was merged yesterday ]
* tag 'nfsd-4.10' of git://linux-nfs.org/~bfields/linux:
nfsd: add support for the umask attribute
sunrpc: use DEFINE_SPINLOCK()
svcrdma: Further clean-up of svc_rdma_get_inv_rkey()
svcrdma: Break up dprintk format in svc_rdma_accept()
svcrdma: Remove unused variable in rdma_copy_tail()
svcrdma: Remove unused variables in xprt_rdma_bc_allocate()
svcrdma: Remove svc_rdma_op_ctxt::wc_status
svcrdma: Remove DMA map accounting
svcrdma: Remove BH-disabled spin locking in svc_rdma_send()
svcrdma: Renovate sendto chunk list parsing
svcauth_gss: Close connection when dropping an incoming message
svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm
nfsd: constify reply_cache_stats_operations structure
nfsd: update workqueue creation
sunrpc: GFP_KERNEL should be GFP_NOFS in crypto code
nfsd: catch errors in decode_fattr earlier
nfsd: clean up supported attribute handling
nfsd: fix error handling for clients that fail to return the layout
nfsd: more robust allocation failure handling in nfsd_reply_cache_init
|
|
There are two problems with refcounting of auth_gss messages.
First, the reference on the pipe->pipe list (taken by a call
to rpc_queue_upcall()) is not counted. It seems to be
assumed that a message in pipe->pipe will always also be in
pipe->in_downcall, where it is correctly reference counted.
However there is no guaranty of this. I have a report of a
NULL dereferences in rpc_pipe_read() which suggests a msg
that has been freed is still on the pipe->pipe list.
One way I imagine this might happen is:
- message is queued for uid=U and auth->service=S1
- rpc.gssd reads this message and starts processing.
This removes the message from pipe->pipe
- message is queued for uid=U and auth->service=S2
- rpc.gssd replies to the first message. gss_pipe_downcall()
calls __gss_find_upcall(pipe, U, NULL) and it finds the
*second* message, as new messages are placed at the head
of ->in_downcall, and the service type is not checked.
- This second message is removed from ->in_downcall and freed
by gss_release_msg() (even though it is still on pipe->pipe)
- rpc.gssd tries to read another message, and dereferences a pointer
to this message that has just been freed.
I fix this by incrementing the reference count before calling
rpc_queue_upcall(), and decrementing it if that fails, or normally in
gss_pipe_destroy_msg().
It seems strange that the reply doesn't target the message more
precisely, but I don't know all the details. In any case, I think the
reference counting irregularity became a measureable bug when the
extra arg was added to __gss_find_upcall(), hence the Fixes: line
below.
The second problem is that if rpc_queue_upcall() fails, the new
message is not freed. gss_alloc_msg() set the ->count to 1,
gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
then the pointer is discarded so the memory never gets freed.
Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service")
Cc: stable@vger.kernel.org
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
S5.3.3.1 of RFC 2203 requires that an incoming GSS-wrapped message
whose sequence number lies outside the current window is dropped.
The rationale is:
The reason for discarding requests silently is that the server
is unable to determine if the duplicate or out of range request
was due to a sequencing problem in the client, network, or the
operating system, or due to some quirk in routing, or a replay
attack by an intruder. Discarding the request allows the client
to recover after timing out, if indeed the duplication was
unintentional or well intended.
However, clients may rely on the server dropping the connection to
indicate that a retransmit is needed. Without a connection reset, a
client can wait forever without retransmitting, and the workload
just stops dead. I've reproduced this behavior by running xfstests
generic/323 on an NFSv4.0 mount with proto=rdma and sec=krb5i.
To address this issue, have the server close the connection when it
silently discards an incoming message due to a GSS sequence number
problem.
There are a few other places where the server will never reply.
Change those spots in a similar fashion.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Writes may depend on the auth_gss crypto code, so we shouldn't be
allocating with GFP_KERNEL there.
This still leaves some crypto_alloc_* calls which end up doing
GFP_KERNEL allocations in the crypto code. Those could probably done at
crypto import time.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
As of ac4e97abce9b "scatterlist: sg_set_buf() argument must be in linear
mapping", sg_set_buf hits a BUG when make_checksum_v2->xdr_process_buf,
among other callers, passes it memory on the stack.
We only need a scatterlist to pass this to the crypto code, and it seems
like overkill to require kmalloc'd memory just to encrypt a few bytes,
but for now this seems the best fix.
Many of these callers are in the NFS write paths, so we allocate with
GFP_NOFS. It might be possible to do without allocations here entirely,
but that would probably be a bigger project.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Pull NFS client updates from Anna Schumaker:
"Highlights include:
Stable bugfixes:
- sunrpc: fix writ espace race causing stalls
- NFS: Fix inode corruption in nfs_prime_dcache()
- NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
- NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
- NFSv4: Open state recovery must account for file permission changes
- NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
Features:
- Add support for tracking multiple layout types with an ordered list
- Add support for using multiple backchannel threads on the client
- Add support for pNFS file layout session trunking
- Delay xprtrdma use of DMA API (for device driver removal)
- Add support for xprtrdma remote invalidation
- Add support for larger xprtrdma inline thresholds
- Use a scatter/gather list for sending xprtrdma RPC calls
- Add support for the CB_NOTIFY_LOCK callback
- Improve hashing sunrpc auth_creds by using both uid and gid
Bugfixes:
- Fix xprtrdma use of DMA API
- Validate filenames before adding to the dcache
- Fix corruption of xdr->nwords in xdr_copy_to_scratch
- Fix setting buffer length in xdr_set_next_buffer()
- Don't deadlock the state manager on the SEQUENCE status flags
- Various delegation and stateid related fixes
- Retry operations if an interrupted slot receives EREMOTEIO
- Make nfs boot time y2038 safe"
* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
fs: nfs: Make nfs boot time y2038 safe
sunrpc: replace generic auth_cred hash with auth-specific function
sunrpc: add RPCSEC_GSS hash_cred() function
sunrpc: add auth_unix hash_cred() function
sunrpc: add generic_auth hash_cred() function
sunrpc: add hash_cred() function to rpc_authops struct
Retry operation on EREMOTEIO on an interrupted slot
pNFS: Fix atime updates on pNFS clients
sunrpc: queue work on system_power_efficient_wq
NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
NFSv4: If recovery failed for a specific open stateid, then don't retry
NFSv4: Fix retry issues with nfs41_test/free_stateid
NFSv4: Open state recovery must account for file permission changes
NFSv4: Mark the lock and open stateids as invalid after freeing them
NFSv4: Don't test open_stateid unless it is set
NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
NFSv4: Fix a race when updating an open_stateid
NFSv4: Fix a race in nfs_inode_reclaim_delegation()
...
|
|
Current supplementary groups code can massively overallocate memory and
is implemented in a way so that access to individual gid is done via 2D
array.
If number of gids is <= 32, memory allocation is more or less tolerable
(140/148 bytes). But if it is not, code allocates full page (!)
regardless and, what's even more fun, doesn't reuse small 32-entry
array.
2D array means dependent shifts, loads and LEAs without possibility to
optimize them (gid is never known at compile time).
All of the above is unnecessary. Switch to the usual
trailing-zero-len-array scheme. Memory is allocated with
kmalloc/vmalloc() and only as much as needed. Accesses become simpler
(LEA 8(gi,idx,4) or even without displacement).
Maximum number of gids is 65536 which translates to 256KB+8 bytes. I
think kernel can handle such allocation.
On my usual desktop system with whole 9 (nine) aux groups, struct
group_info shrinks from 148 bytes to 44 bytes, yay!
Nice side effects:
- "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing,
- fix little mess in net/ipv4/ping.c
should have been using GROUP_AT macro but this point becomes moot,
- aux group allocation is persistent and should be accounted as such.
Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add a hash_cred() function for RPCSEC_GSS, using only the
uid from the auth_cred.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
rsc_lookup steals the passed-in memory to avoid doing an allocation of
its own, so we can't just pass in a pointer to memory that someone else
is using.
If we really want to avoid allocation there then maybe we should
preallocate somwhere, or reference count these handles.
For now we should revert.
On occasion I see this on my server:
kernel: kernel BUG at /home/cel/src/linux/linux-2.6/mm/slub.c:3851!
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: cts rpcsec_gss_krb5 sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd btrfs xor iTCO_wdt iTCO_vendor_support raid6_pq pcspkr i2c_i801 i2c_smbus lpc_ich mfd_core mei_me sg mei shpchp wmi ioatdma ipmi_si ipmi_msghandler acpi_pad acpi_power_meter rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb mlx4_core ahci libahci libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
kernel: CPU: 7 PID: 145 Comm: kworker/7:2 Not tainted 4.8.0-rc4-00006-g9d06b0b #15
kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
kernel: Workqueue: events do_cache_clean [sunrpc]
kernel: task: ffff8808541d8000 task.stack: ffff880854344000
kernel: RIP: 0010:[<ffffffff811e7075>] [<ffffffff811e7075>] kfree+0x155/0x180
kernel: RSP: 0018:ffff880854347d70 EFLAGS: 00010246
kernel: RAX: ffffea0020fe7660 RBX: ffff88083f9db064 RCX: 146ff0f9d5ec5600
kernel: RDX: 000077ff80000000 RSI: ffff880853f01500 RDI: ffff88083f9db064
kernel: RBP: ffff880854347d88 R08: ffff8808594ee000 R09: ffff88087fdd8780
kernel: R10: 0000000000000000 R11: ffffea0020fe76c0 R12: ffff880853f01500
kernel: R13: ffffffffa013cf76 R14: ffffffffa013cff0 R15: ffffffffa04253a0
kernel: FS: 0000000000000000(0000) GS:ffff88087fdc0000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007fed60b020c3 CR3: 0000000001c06000 CR4: 00000000001406e0
kernel: Stack:
kernel: ffff8808589f2f00 ffff880853f01500 0000000000000001 ffff880854347da0
kernel: ffffffffa013cf76 ffff8808589f2f00 ffff880854347db8 ffffffffa013d006
kernel: ffff8808589f2f20 ffff880854347e00 ffffffffa0406f60 0000000057c7044f
kernel: Call Trace:
kernel: [<ffffffffa013cf76>] rsc_free+0x16/0x90 [auth_rpcgss]
kernel: [<ffffffffa013d006>] rsc_put+0x16/0x30 [auth_rpcgss]
kernel: [<ffffffffa0406f60>] cache_clean+0x2e0/0x300 [sunrpc]
kernel: [<ffffffffa04073ee>] do_cache_clean+0xe/0x70 [sunrpc]
kernel: [<ffffffff8109a70f>] process_one_work+0x1ff/0x3b0
kernel: [<ffffffff8109b15c>] worker_thread+0x2bc/0x4a0
kernel: [<ffffffff8109aea0>] ? rescuer_thread+0x3a0/0x3a0
kernel: [<ffffffff810a0ba4>] kthread+0xe4/0xf0
kernel: [<ffffffff8169c47f>] ret_from_fork+0x1f/0x40
kernel: [<ffffffff810a0ac0>] ? kthread_stop+0x110/0x110
kernel: Code: f7 ff ff eb 3b 65 8b 05 da 30 e2 7e 89 c0 48 0f a3 05 a0 38 b8 00 0f 92 c0 84 c0 0f 85 d1 fe ff ff 0f 1f 44 00 00 e9 f5 fe ff ff <0f> 0b 49 8b 03 31 f6 f6 c4 40 0f 85 62 ff ff ff e9 61 ff ff ff
kernel: RIP [<ffffffff811e7075>] kfree+0x155/0x180
kernel: RSP <ffff880854347d70>
kernel: ---[ end trace 3fdec044969def26 ]---
It seems to be most common after a server reboot where a client has been
using a Kerberos mount, and reconnects to continue its workload.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
- Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user
needs multiple different security services (e.g. krb5i and krb5p).
- Stable patch to fix a regression introduced by the use of
SO_REUSEPORT, and that prevented the use of multiple different NFS
versions to the same server.
- TCP socket reconnection timer fixes.
- Patch from Neil to disable the use of IPv6 temporary addresses"
* tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Cap the transport reconnection timer at 1/2 lease period
NFSv4: Cleanup the setting of the nfs4 lease period
SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout
SUNRPC: Fix reconnection timeouts
NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED
SUNRPC: disable the use of IPv6 temporary addresses.
SUNRPC: allow for upcalls for same uid but different gss service
SUNRPC: Fix up socket autodisconnect
SUNRPC: Handle EADDRNOTAVAIL on connection failures
|
|
It's possible to have simultaneous upcalls for the same UIDs but
different GSS service. In that case, we need to allow for the
upcall to gssd to proceed so that not the same context is used
by two different GSS services. Some servers lock the use of context
to the GSS service.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Pull nfsd updates from Bruce Fields:
"Highlights:
- Trond made a change to the server's tcp logic that allows a fast
client to better take advantage of high bandwidth networks, but may
increase the risk that a single client could starve other clients;
a new sunrpc.svc_rpc_per_connection_limit parameter should help
mitigate this in the (hopefully unlikely) event this becomes a
problem in practice.
- Tom Haynes added a minimal flex-layout pnfs server, which is of no
use in production for now--don't build it unless you're doing
client testing or further server development"
* tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd: remove some dead code in nfsd_create_locked()
nfsd: drop unnecessary MAY_EXEC check from create
nfsd: clean up bad-type check in nfsd_create_locked
nfsd: remove unnecessary positive-dentry check
nfsd: reorganize nfsd_create
nfsd: check d_can_lookup in fh_verify of directories
nfsd: remove redundant zero-length check from create
nfsd: Make creates return EEXIST instead of EACCES
SUNRPC: Detect immediate closure of accepted sockets
SUNRPC: accept() may return sockets that are still in SYN_RECV
nfsd: allow nfsd to advertise multiple layout types
nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
nfsd/blocklayout: Make sure calculate signature/designator length aligned
xfs: abstract block export operations from nfsd layouts
SUNRPC: Remove unused callback xpo_adjust_wspace()
SUNRPC: Change TCP socket space reservation
SUNRPC: Add a server side per-connection limit
SUNRPC: Micro optimisation for svc_data_ready
SUNRPC: Call the default socket callbacks instead of open coding
SUNRPC: lock the socket while detaching it
...
|
|
|
|
A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
not really safe to use the the generic_cred->acred->ac_flags to store
the NO_CRKEY_TIMEOUT flag. A lookup for a unx_cred triggered while the
KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
with the auth_cred to be in a state where they're perpetually doing 4K
NFS_FILE_SYNC writes.
This can be reproduced as follows:
1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
They do not need to be the same export, nor do they even need to be from
the same NFS server. Also, v3 is fine.
$ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
$ sudo mount -o v3,sec=sys server2:/export /mnt/sys
2. As the normal user, before accessing the kerberized mount, kinit with
a short lifetime (but not so short that renewing the ticket would leave
you within the 4-minute window again by the time the original ticket
expires), e.g.
$ kinit -l 10m -r 60m
3. Do some I/O to the kerberized mount and verify that the writes are
wsize, UNSTABLE:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
4. Wait until you're within 4 minutes of key expiry, then do some more
I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
set. Verify that the writes are 4K, FILE_SYNC:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
5. Now do some I/O to the sec=sys mount. This will cause
RPC_CRED_NO_CRKEY_TIMEOUT to be set:
$ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1
6. Writes for that user will now be permanently 4K, FILE_SYNC for that
user, regardless of which mount is being written to, until you reboot
the client. Renewing the kerberos ticket (assuming it hasn't already
expired) will have no effect. Grabbing a new kerberos ticket at this
point will have no effect either.
Move the flag to the auth->au_flags field (which is currently unused)
and rename it slightly to reflect that it's no longer associated with
the auth_cred->ac_flags. Add the rpc_auth to the arg list of
rpcauth_cred_key_to_expire and check the au_flags there too. Finally,
add the inode to the arg list of nfs_ctx_key_to_expire so we can
determine the rpc_auth to pass to rpcauth_cred_key_to_expire.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
GSS-Proxy doesn't produce very much debug logging at all. Printing out
the gss minor status will aid in troubleshooting if the
GSS_Accept_sec_context upcall fails.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Direct data placement is not allowed when using flavors that
guarantee integrity or privacy. When such security flavors are in
effect, don't allow the use of Read and Write chunks for moving
individual data items. All messages larger than the inline threshold
are sent via Long Call or Long Reply.
On my systems (CX-3 Pro on FDR), for small I/O operations, the use
of Long messages adds only around 5 usecs of latency in each
direction.
Note that when integrity or encryption is used, the host CPU touches
every byte in these messages. Even if it could be used, data
movement offload doesn't buy much in this case.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Pull NFS client updates from Anna Schumaker:
"Highlights include:
Features:
- Add support for the NFS v4.2 COPY operation
- Add support for NFS/RDMA over IPv6
Bugfixes and cleanups:
- Avoid race that crashes nfs_init_commit()
- Fix oops in callback path
- Fix LOCK/OPEN race when unlinking an open file
- Choose correct stateids when using delegations in setattr, read and
write
- Don't send empty SETATTR after OPEN_CREATE
- xprtrdma: Prevent server from writing a reply into memory client
has released
- xprtrdma: Support using Read list and Reply chunk in one RPC call"
* tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (61 commits)
pnfs: pnfs_update_layout needs to consider if strict iomode checking is on
nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled
nfs/flexfiles: Helper function to detect FF_FLAGS_NO_READ_IO
nfs: avoid race that crashes nfs_init_commit
NFS: checking for NULL instead of IS_ERR() in nfs_commit_file()
pnfs: make pnfs_layout_process more robust
pnfs: rework LAYOUTGET retry handling
pnfs: lift retry logic from send_layoutget to pnfs_update_layout
pnfs: fix bad error handling in send_layoutget
flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds
flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED
pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args
pnfs: keep track of the return sequence number in pnfs_layout_hdr
pnfs: record sequence in pnfs_layout_segment when it's created
pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set
pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes
pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io
pNFS/flexfile: Fix erroneous fall back to read/write through the MDS
NFS: Reclaim writes via writepage are opportunistic
NFSv4: Use the right stateid for delegations in setattr, read and write
...
|
|
The length of the GSS MIC token need not be a multiple of four bytes.
It is then padded by XDR to a multiple of 4 B, but unwrap_integ_data()
would previously only trim mic.len + 4 B. The remaining up to three
bytes would then trigger a check in nfs4svc_decode_compoundargs(),
leading to a "garbage args" error and mount failure:
nfs4svc_decode_compoundargs: compound not properly padded!
nfsd: failed to decode arguments!
This would prevent older clients using the pre-RFC 4121 MIC format
(37-byte MIC including a 9-byte OID) from mounting exports from v3.9+
servers using krb5i.
The trimming was introduced by commit 4c190e2f913f ("sunrpc: trim off
trailing checksum before returning decrypted or integrity authenticated
buffer").
Fixes: 4c190e2f913f "unrpc: trim off trailing checksum..."
Signed-off-by: Tomáš Trnka <ttrnka@mail.muni.cz>
Cc: stable@vger.kernel.org
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We need to be able to call the generic_cred creator from different
contexts. Add a gfp_t parm to the crcreate operation and to
rpcauth_lookup_credcache. For now, we just push the gfp_t parms up
one level to the *_lookup_cred functions.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|