summaryrefslogtreecommitdiff
path: root/fs/nfsd
AgeCommit message (Collapse)Author
2017-02-17NFSD: Get response size before operation for all RPCsKinglong Mee
NFSD usess PAGE_SIZE as the reply size estimate for RPCs which don't support op_rsize_bop(), A PAGE_SIZE (4096) is larger than many real response sizes, eg, access (op_encode_hdr_size + 2), seek (op_encode_hdr_size + 3). This patch just adds op_rsize_bop() for all RPCs getting response size. An overestimate is generally safe but the tighter estimates are probably better. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-17nfsd/callback: Drop a useless data copy when comparing sessionidKinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-17nfsd/callback: skip the callback tagKinglong Mee
The callback tag is NULL, and hdr->nops is unused too right now, but. But if we were to ever test with a nonzero callback tag, nops will get a bad value. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-17nfsd/callback: Cleanup callback cred on shutdownKinglong Mee
The rpccred gotten from rpc_lookup_machine_cred() should be put when state is shutdown. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-17nfsd/idmap: return nfserr_inval for 0-length namesKinglong Mee
Tigran Mkrtchyan's new pynfs testcases for zero length principals fail: SATT16 st_setattr.testEmptyPrincipal : FAILURE Setting empty owner should return NFS4ERR_INVAL, instead got NFS4ERR_BADOWNER SATT17 st_setattr.testEmptyGroupPrincipal : FAILURE Setting empty owner_group should return NFS4ERR_INVAL, instead got NFS4ERR_BADOWNER This patch checks the principal and returns nfserr_inval directly. It could check after decoding in nfs4xdr.c, but it's simpler to do it in nfsd_map_xxxx. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-17Merge branch 'for-4.11/next' into for-4.11/linus-mergeJens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-09nfsd: Revert "nfsd: special case truncates some more"J. Bruce Fields
This patch incorrectly attempted nested mnt_want_write, and incorrectly disabled nfsd's owner override for truncate. We'll fix those problems and make another attempt soon, for the moment I think the safest is to revert. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-06NFSDv4: use export cache flushtime for changeid on V4ROOT objects.NeilBrown
If you change the set of filesystems that are exported, then the contents of various directories in the NFSv4 pseudo-root is likely to change. However the change-id of those directories is currently tied to the underlying directory, so the client may not see the changes in a timely fashion. This patch changes the change-id number to be derived from the "flush_time" of the export cache. Whenever any changes are made to the set of exported filesystems, this flush_time is updated. The result is that clients see changes to the set of exported filesystems much more quickly, often immediately. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31block: fold cmd_type into the REQ_OP_ spaceChristoph Hellwig
Instead of keeping two levels of indirection for requests types, fold it all into the operations. The little caveat here is that previously cmd_type only applied to struct request, while the request and bio op fields were set to plain REQ_OP_READ/WRITE even for passthrough operations. Instead this patch adds new REQ_OP_* for SCSI passthrough and driver private requests, althought it has to add two for each so that we can communicate the data in/out nature of the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31block: make scsi_request and scsi ioctl support optionalChristoph Hellwig
We only need this code to support scsi, ide, cciss and virtio. And at least for virtio it's a deprecated feature to start with. This should shrink the kernel size for embedded device that only use, say eMMC a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31nfsd: opt in to labeled nfs per exportJ. Bruce Fields
Currently turning on NFSv4.2 results in 4.2 clients suddenly seeing the individual file labels as they're set on the server. This is not what they've previously seen, and not appropriate in may cases. (In particular, if clients have heterogenous security policies then one client's labels may not even make sense to another.) Labeled NFS should be opted in only in those cases when the administrator knows it makes sense. It's helpful to be able to turn 4.2 on by default, and otherwise the protocol upgrade seems free of regressions. So, default labeled NFS to off and provide an export flag to reenable it. Users wanting labeled NFS support on an export will henceforth need to: - make sure 4.2 support is enabled on client and server (as before), and - upgrade the server nfs-utils to a version supporting the new "security_label" export flag. - set that "security_label" flag on the export. This is commit may be seen as a regression to anyone currently depending on security labels. We believe those cases are currently rare. Reported-by: tibbs@math.uh.edu Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31nfsd: constify nfsd_suppatttrsJ. Bruce Fields
To keep me from accidentally writing to this again.... Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31nfsd: initialize sin6_scope_id in nfsd_inet6addr_event()Scott Mayhew
I noticed this was missing when I was testing with link local addresses. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31NFSD: Remove unused value inode in nfsd_vfs_writeKinglong Mee
This is just cleanup, no change in functionality. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31NFSD: cleanup dead codes and values in nfsd_writeKinglong Mee
This is just cleanup, no change in functionality. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31NFSD: pass an integer for stable type to nfsd_vfs_writeKinglong Mee
After fae5096ad217 "nfsd: assume writeable exportabled filesystems have f_sync" we no longer modify this argument. This is just cleanup, no change in functionality. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31NFSD: correctly range-check v4.x minor version when setting versions.NeilBrown
Writing to /proc/fs/nfsd/versions allows individual major versions and NFSv4 minor versions to be enabled or disabled. However NFSv4.0 cannot currently be disabled, thought there is no good reason. Also the minor number is parsed as a 'long' but used as an 'int' so '4294967297' will be incorrectly treated as '1'. This patch removes the test on 'minor == 0' and switches to kstrtouint() to get correct range checking. When reading from /proc/fs/nfsd/versions, 4.0 is current not reported. To allow the disabling for v4.0 to be visible, while maintaining backward compatibility, change code to report "-4.0" if appropriate, but not "+4.0". Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31nfsd: special case truncates some moreChristoph Hellwig
Both the NFS protocols and the Linux VFS use a setattr operation with a bitmap of attributs to set to set various file attributes including the file size and the uid/gid. The Linux syscalls never mixes size updates with unrelated updates like the uid/gid, and some file systems like XFS and GFS2 rely on the fact that truncates might not update random other attributes, and many other file systems handle the case but do not update the different attributes in the same transaction. NFSD on the other hand passes the attributes it gets on the wire more or less directly through to the VFS, leading to updates the file systems don't expect. XFS at least has an assert on the allowed attributes, which caught an unusual NFS client setting the size and group at the same time. To handle this issue properly this switches nfsd to call vfs_truncate for size changes, and then handle all other attributes through notify_change. As a side effect this also means less boilerplace code around the size change as we can now reuse the VFS code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31NFSD: Fix a null reference case in find_or_create_lock_stateid()Kinglong Mee
nfsd assigns the nfs4_free_lock_stateid to .sc_free in init_lock_stateid(). If nfsd doesn't go through init_lock_stateid() and put stateid at end, there is a NULL reference to .sc_free when calling nfs4_put_stid(ns). This patch let the nfs4_stid.sc_free assignment to nfs4_alloc_stid(). Cc: stable@vger.kernel.org Fixes: 356a95ece7aa "nfsd: clean up races in lock stateid searching..." Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-27block: split scsi_request out of struct requestChristoph Hellwig
And require all drivers that want to support BLOCK_PC to allocate it as the first thing of their private data. To support this the legacy IDE and BSG code is switched to set cmd_size on their queues to let the block layer allocate the additional space. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-19Make static usermode helper binaries constantGreg Kroah-Hartman
There are a number of usermode helper binaries that are "hard coded" in the kernel today, so mark them as "const" to make it harder for someone to change where the variables point to. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Sailer <t.sailer@alumni.ethz.ch> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Johan Hovold <johan@kernel.org> Cc: Alex Elder <elder@kernel.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-12nfsd: fix supported attributes for acl & labelsJ. Bruce Fields
Oops--in 916d2d844afd I moved some constants into an array for convenience, but here I'm accidentally writing to that array. The effect is that if you ever encounter a filesystem lacking support for ACLs or security labels, then all queries of supported attributes will report that attribute as unsupported from then on. Fixes: 916d2d844afd "nfsd: clean up supported attribute handling" Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-17Merge uncontroversial parts of branch 'readlink' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull partial readlink cleanups from Miklos Szeredi. This is the uncontroversial part of the readlink cleanup patch-set that simplifies the default readlink handling. Miklos and Al are still discussing the rest of the series. * git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: vfs: make generic_readlink() static vfs: remove ".readlink = generic_readlink" assignments vfs: default to generic_readlink() vfs: replace calling i_op->readlink with vfs_readlink() proc/self: use generic_readlink ecryptfs: use vfs_get_link() bad_inode: add missing i_op initializers
2016-12-16Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This update contains: - try to clone on copy-up - allow renaming a directory - split source into managable chunks - misc cleanups and fixes It does not contain the read-only fd data inconsistency fix, which Al didn't like. I'll leave that to the next year..." * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (36 commits) ovl: fix reStructuredText syntax errors in documentation ovl: fix return value of ovl_fill_super ovl: clean up kstat usage ovl: fold ovl_copy_up_truncate() into ovl_copy_up() ovl: create directories inside merged parent opaque ovl: opaque cleanup ovl: show redirect_dir mount option ovl: allow setting max size of redirect ovl: allow redirect_dir to default to "on" ovl: check for emptiness of redirect dir ovl: redirect on rename-dir ovl: lookup redirects ovl: consolidate lookup for underlying layers ovl: fix nested overlayfs mount ovl: check namelen ovl: split super.c ovl: use d_is_dir() ovl: simplify lookup ovl: check lower existence of rename target ovl: rename: simplify handling of lower/merged directory ...
2016-12-16Merge tag 'nfsd-4.10' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "The one new feature is support for a new NFSv4.2 mode_umask attribute that makes ACL inheritance a little more useful in environments that default to restrictive umasks. Requires client-side support, also on its way for 4.10. Other than that, miscellaneous smaller fixes and cleanup, especially to the server rdma code" [ The client side of the umask attribute was merged yesterday ] * tag 'nfsd-4.10' of git://linux-nfs.org/~bfields/linux: nfsd: add support for the umask attribute sunrpc: use DEFINE_SPINLOCK() svcrdma: Further clean-up of svc_rdma_get_inv_rkey() svcrdma: Break up dprintk format in svc_rdma_accept() svcrdma: Remove unused variable in rdma_copy_tail() svcrdma: Remove unused variables in xprt_rdma_bc_allocate() svcrdma: Remove svc_rdma_op_ctxt::wc_status svcrdma: Remove DMA map accounting svcrdma: Remove BH-disabled spin locking in svc_rdma_send() svcrdma: Renovate sendto chunk list parsing svcauth_gss: Close connection when dropping an incoming message svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm nfsd: constify reply_cache_stats_operations structure nfsd: update workqueue creation sunrpc: GFP_KERNEL should be GFP_NOFS in crypto code nfsd: catch errors in decode_fattr earlier nfsd: clean up supported attribute handling nfsd: fix error handling for clients that fail to return the layout nfsd: more robust allocation failure handling in nfsd_reply_cache_init
2016-12-16vfs: call vfs_clone_file_range() under freeze protectionAmir Goldstein
Move sb_start_write()/sb_end_write() out of the vfs helper and up into the ioctl handler. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-15nfsd: add support for the umask attributeAndreas Gruenbacher
Clients can set the umask attribute when creating files to cause the server to apply it always except when inheriting permissions from the parent directory. That way, the new files will end up with the same permissions as files created locally. See https://tools.ietf.org/html/draft-ietf-nfsv4-umask-02 for more details. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-12-09vfs: replace calling i_op->readlink with vfs_readlink()Miklos Szeredi
Also check d_is_symlink() in callers instead of inode->i_op->readlink because following patches will allow NULL ->readlink for symlinks. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-11-18netns: make struct pernet_operations::id unsigned intAlexey Dobriyan
Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-14nfsd: constify reply_cache_stats_operations structureJulia Lawall
reply_cache_stats_operations, of type struct file_operations, is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-14nfsd: update workqueue creationJ. Bruce Fields
No real change in functionality, but the old interface seems to be deprecated. We don't actually care about ordering necessarily, but we do depend on running at most one work item at a time: nfsd4_process_cb_update() assumes that no other thread is running it, and that no new callbacks are starting while it's running. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01nfsd: catch errors in decode_fattr earlierJ. Bruce Fields
3c8e03166ae2 "NFSv4: do exact check about attribute specified" fixed some handling of unsupported-attribute errors, but it also delayed checking for unwriteable attributes till after we decode them. This could lead to odd behavior in the case a client attemps to set an attribute we don't know about followed by one we try to parse. In that case the parser for the known attribute will attempt to parse the unknown attribute. It should fail in some safe way, but the error might at least be incorrect (probably bad_xdr instead of inval). So, it's better to do that check at the start. As far as I know this doesn't cause any problems with current clients but it might be a minor issue e.g. if we encounter a future client that supports a new attribute that we currently don't. Cc: Yu Zhiguo <yuzg@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01nfsd: clean up supported attribute handlingJ. Bruce Fields
Minor cleanup, no change in behavior. Provide helpers for some common attribute bitmap operations. Drop some comments that just echo the code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01nfsd: fix error handling for clients that fail to return the layoutJeff Layton
Currently, when the client continually returns NFS4ERR_DELAY on a CB_LAYOUTRECALL, we'll give up trying to retransmit after two lease periods, but leave the layout in place. What we really need to do here is fence the client in this case. Have it fall through to that code in that case instead of into the NFS4ERR_NOMATCHING_LAYOUT case. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01nfsd: more robust allocation failure handling in nfsd_reply_cache_initJeff Layton
Currently, we try to allocate the cache as a single, large chunk, which can fail if no big chunks of memory are available. We _do_ try to size it according to the amount of memory in the box, but if the server is started well after boot time, then the allocation can fail due to memory fragmentation. Fall back to doing a vzalloc if the kcalloc fails, and switch the shutdown code to do a kvfree to handle freeing correctly. Reported-by: Olaf Hering <olaf@aepfle.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01nfsd: Fix general protection fault in release_lock_stateid()Chuck Lever
When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example), I get this crash on the server: Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8 Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000 Oct 28 22:04:30 klimt kernel: RIP: 0010:[<ffffffffa05a759f>] [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd] Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0 EFLAGS: 00010246 Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000 Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020 Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000 Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548 Oct 28 22:04:30 klimt kernel: FS: 0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000 Oct 28 22:04:30 klimt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0 Oct 28 22:04:30 klimt kernel: Stack: Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20 Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0 Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870 Oct 28 22:04:30 klimt kernel: Call Trace: Oct 28 22:04:30 klimt kernel: [<ffffffffa05ac80d>] nfsd4_free_stateid+0xfd/0x1b0 [nfsd] Oct 28 22:04:30 klimt kernel: [<ffffffffa059803d>] nfsd4_proc_compound+0x40d/0x690 [nfsd] Oct 28 22:04:30 klimt kernel: [<ffffffffa0583114>] nfsd_dispatch+0xd4/0x1d0 [nfsd] Oct 28 22:04:30 klimt kernel: [<ffffffffa047bbf9>] svc_process_common+0x3d9/0x700 [sunrpc] Oct 28 22:04:30 klimt kernel: [<ffffffffa047ca64>] svc_process+0xf4/0x330 [sunrpc] Oct 28 22:04:30 klimt kernel: [<ffffffffa05827ca>] nfsd+0xfa/0x160 [nfsd] Oct 28 22:04:30 klimt kernel: [<ffffffffa05826d0>] ? nfsd_destroy+0x170/0x170 [nfsd] Oct 28 22:04:30 klimt kernel: [<ffffffff810b367b>] kthread+0x10b/0x120 Oct 28 22:04:30 klimt kernel: [<ffffffff810b3570>] ? kthread_stop+0x280/0x280 Oct 28 22:04:30 klimt kernel: [<ffffffff8174e8ba>] ret_from_fork+0x2a/0x40 Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8 Oct 28 22:04:30 klimt kernel: RIP [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd] Oct 28 22:04:30 klimt kernel: RSP <ffff8808420dbce0> Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]--- Jeff Layton says: > Hm...now that I look though, this is a little suspicious: > > struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner); > > I wonder if it's possible for the openstateid to have already been > destroyed at this point. > > We might be better off doing something like this to get the client pointer: > > stp->st_stid.sc_client; > > ...which should be more direct and less dependent on other stateids > staying valid. With the suggested change, I am no longer able to reproduce the above oops. v2: Fix unhash_lock_stateid() as well Fix-suggested-by: Jeff Layton <jlayton@redhat.com> Fixes: 42691398be08 ('nfsd: Fix race between FREE_STATEID and LOCK') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-24nfsd: move blocked lock handling under a dedicated spinlockJeff Layton
Bruce was hitting some lockdep warnings in testing, showing that we could hit a deadlock with the new CB_NOTIFY_LOCK handling, involving a rather complex situation involving four different spinlocks. The crux of the matter is that we end up taking the nn->client_lock in the lm_notify handler. The simplest fix is to just declare a new per-nfsd_net spinlock to protect the new CB_NOTIFY_LOCK structures. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-13Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Some RDMA work and some good bugfixes, and two new features that could benefit from user testing: - Anna Schumacker contributed a simple NFSv4.2 COPY implementation. COPY is already supported on the client side, so a call to copy_file_range() on a recent client should now result in a server-side copy that doesn't require all the data to make a round trip to the client and back. - Jeff Layton implemented callbacks to notify clients when contended locks become available, which should reduce latency on workloads with contended locks" * tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux: NFSD: Implement the COPY call nfsd: handle EUCLEAN nfsd: only WARN once on unmapped errors exportfs: be careful to only return expected errors. nfsd4: setclientid_confirm with unmatched verifier should fail nfsd: randomize SETCLIENTID reply to help distinguish servers nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant nfsd: add a LRU list for blocked locks nfsd: have nfsd4_lock use blocking locks for v4.1+ locks nfsd: plumb in a CB_NOTIFY_LOCK operation NFSD: fix corruption in notifier registration svcrdma: support Remote Invalidation svcrdma: Server-side support for rpcrdma_connect_private rpcrdma: RDMA/CM private message data structure svcrdma: Skip put_page() when send_reply() fails svcrdma: Tail iovec leaves an orphaned DMA mapping nfsd: fix dprintk in nfsd4_encode_getdeviceinfo nfsd: eliminate cb_minorversion field nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
2016-10-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: ">rename2() work from Miklos + current_time() from Deepa" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Replace current_fs_time() with current_time() fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps fs: Replace CURRENT_TIME with current_time() for inode timestamps fs: proc: Delete inode time initializations in proc_alloc_inode() vfs: Add current_time() api vfs: add note about i_op->rename changes to porting fs: rename "rename2" i_op to "rename" vfs: remove unused i_op->rename fs: make remaining filesystems use .rename2 libfs: support RENAME_NOREPLACE in simple_rename() fs: support RENAME_NOREPLACE for local filesystems ncpfs: fix unused variable warning
2016-10-10Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted misc bits and pieces. There are several single-topic branches left after this (rename2 series from Miklos, current_time series from Deepa Dinamani, xattr series from Andreas, uaccess stuff from from me) and I'd prefer to send those separately" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits) proc: switch auxv to use of __mem_open() hpfs: support FIEMAP cifs: get rid of unused arguments of CIFSSMBWrite() posix_acl: uapi header split posix_acl: xattr representation cleanups fs/aio.c: eliminate redundant loads in put_aio_ring_file fs/internal.h: add const to ns_dentry_operations declaration compat: remove compat_printk() fs/buffer.c: make __getblk_slow() static proc: unsigned file descriptors fs/file: more unsigned file descriptors fs: compat: remove redundant check of nr_segs cachefiles: Fix attempt to read i_blocks after deleting file [ver #2] cifs: don't use memcpy() to copy struct iov_iter get rid of separate multipage fault-in primitives fs: Avoid premature clearing of capabilities fs: Give dentry to inode_change_ok() instead of inode fuse: Propagate dentry down to inode_change_ok() ceph: Propagate dentry down to inode_change_ok() xfs: Propagate dentry down to inode_change_ok() ...
2016-10-08Merge remote-tracking branch 'jk/vfs' into work.miscAl Viro
2016-10-07cred: simpler, 1D supplementary groupsAlexey Dobriyan
Current supplementary groups code can massively overallocate memory and is implemented in a way so that access to individual gid is done via 2D array. If number of gids is <= 32, memory allocation is more or less tolerable (140/148 bytes). But if it is not, code allocates full page (!) regardless and, what's even more fun, doesn't reuse small 32-entry array. 2D array means dependent shifts, loads and LEAs without possibility to optimize them (gid is never known at compile time). All of the above is unnecessary. Switch to the usual trailing-zero-len-array scheme. Memory is allocated with kmalloc/vmalloc() and only as much as needed. Accesses become simpler (LEA 8(gi,idx,4) or even without displacement). Maximum number of gids is 65536 which translates to 256KB+8 bytes. I think kernel can handle such allocation. On my usual desktop system with whole 9 (nine) aux groups, struct group_info shrinks from 148 bytes to 44 bytes, yay! Nice side effects: - "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing, - fix little mess in net/ipv4/ping.c should have been using GROUP_AT macro but this point becomes moot, - aux group allocation is persistent and should be accounted as such. Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Vasily Kulikov <segoon@openwall.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07NFSD: Implement the COPY callAnna Schumaker
I only implemented the sync version of this call, since it's the easiest. I can simply call vfs_copy_range() and have the vfs do the right thing for the filesystem being exported. Signed-off-by: Anna Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-07nfsd: handle EUCLEANJ. Bruce Fields
Eric Sandeen reports that xfs can return this if filesystem corruption prevented completing the operation. Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-07nfsd: only WARN once on unmapped errorsJ. Bruce Fields
No need to spam the logs here. The only drawback is losing information if we ever encounter two different unmapped errors, but in practice we've rarely see even one. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-27fs: Replace current_fs_time() with current_time()Deepa Dinamani
current_fs_time() uses struct super_block* as an argument. As per Linus's suggestion, this is changed to take struct inode* as a parameter instead. This is because the function is primarily meant for vfs inode timestamps. Also the function was renamed as per Arnd's suggestion. Change all calls to current_fs_time() to use the new current_time() function instead. current_fs_time() will be deleted. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-26nfsd4: setclientid_confirm with unmatched verifier should failJ. Bruce Fields
A setclientid_confirm with (clientid, verifier) both matching an existing confirmed record is assumed to be a replay, but if the verifier doesn't match, it shouldn't be. This would be a very rare case, except that clients following https://tools.ietf.org/html/rfc7931#section-5.8 may depend on the failure. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26nfsd: randomize SETCLIENTID reply to help distinguish serversJ. Bruce Fields
NFSv4.1 has built-in trunking support that allows a client to determine whether two connections to two different IP addresses are actually to the same server. NFSv4.0 does not, but RFC 7931 attempts to provide clients a means to do this, basically by performing a SETCLIENTID to one address and confirming it with a SETCLIENTID_CONFIRM to the other. Linux clients since 05f4c350ee02 "NFS: Discover NFSv4 server trunking when mounting" implement a variation on this suggestion. It is possible that other clients do too. This depends on the clientid and verifier not being accepted by an unrelated server. Since both are 64-bit values, that would be very unlikely if they were random numbers. But they aren't: knfsd generates the 64-bit clientid by concatenating the 32-bit boot time (in seconds) and a counter. This makes collisions between clientids generated by the same server extremely unlikely. But collisions are very likely between clientids generated by servers that boot at the same time, and it's quite common for multiple servers to boot at the same time. The verifier is a concatenation of the SETCLIENTID time (in seconds) and a counter, so again collisions between different servers are likely if multiple SETCLIENTIDs are done at the same time, which is a common case. Therefore recent NFSv4.0 clients may decide two different servers are really the same, and mount a filesystem from the wrong server. Fortunately the Linux client, since 55b9df93ddd6 "nfsv4/v4.1: Verify the client owner id during trunking detection", only does this when given the non-default "migration" mount option. The fault is really with RFC 7931, and needs a client fix, but in the meantime we can mitigate the chance of these collisions by randomizing the starting value of the counters used to generate clientids and verifiers. Reported-by: Frank Sorenson <fsorenso@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26nfsd: set the MAY_NOTIFY_LOCK flag in OPEN repliesJeff Layton
If we are using v4.1+, then we can send notification when contended locks become free. Inform the client of that fact. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>