summaryrefslogtreecommitdiff
path: root/fs/nfsd/nfscache.c
AgeCommit message (Collapse)Author
2015-03-20NFSD: Error out when register_shrinker() failKinglong Mee
If register_shrinker() failed, nfsd will cause a NULL pointer access as, [ 9250.875465] nfsd: last server has exited, flushing export cache [ 9251.427270] BUG: unable to handle kernel NULL pointer dereference at (null) [ 9251.427393] IP: [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0 [ 9251.427579] PGD 13e4d067 PUD 13e4c067 PMD 0 [ 9251.427633] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 9251.427706] Modules linked in: ip6t_rpfilter ip6t_REJECT bnep bluetooth xt_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw btrfs xfs microcode ppdev serio_raw pcspkr xor libcrc32c raid6_pq e1000 parport_pc parport i2c_piix4 i2c_core nfsd(OE-) auth_rpcgss nfs_acl lockd sunrpc(E) ata_generic pata_acpi [ 9251.428240] CPU: 0 PID: 1557 Comm: rmmod Tainted: G OE 3.16.0-rc2+ #22 [ 9251.428366] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 9251.428496] task: ffff880000849540 ti: ffff8800136f4000 task.ti: ffff8800136f4000 [ 9251.428593] RIP: 0010:[<ffffffff8136fc29>] [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0 [ 9251.428696] RSP: 0018:ffff8800136f7ea0 EFLAGS: 00010207 [ 9251.428751] RAX: 0000000000000000 RBX: ffffffffa0116d48 RCX: dead000000200200 [ 9251.428814] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0116d48 [ 9251.428876] RBP: ffff8800136f7ea0 R08: ffff8800136f4000 R09: 0000000000000001 [ 9251.428939] R10: 8080808080808080 R11: 0000000000000000 R12: ffffffffa011a5a0 [ 9251.429002] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000018ac090 [ 9251.429064] FS: 00007fb9acef0740(0000) GS:ffff88003fa00000(0000) knlGS:0000000000000000 [ 9251.429164] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9251.429221] CR2: 0000000000000000 CR3: 0000000031a17000 CR4: 00000000001407f0 [ 9251.429306] Stack: [ 9251.429410] ffff8800136f7eb8 ffffffff8136fcdd ffffffffa0116d20 ffff8800136f7ed0 [ 9251.429511] ffffffff8118a0f2 0000000000000000 ffff8800136f7ee0 ffffffffa00eb765 [ 9251.429610] ffff8800136f7ef0 ffffffffa010e93c ffff8800136f7f78 ffffffff81104ac2 [ 9251.429709] Call Trace: [ 9251.429755] [<ffffffff8136fcdd>] list_del+0xd/0x30 [ 9251.429896] [<ffffffff8118a0f2>] unregister_shrinker+0x22/0x40 [ 9251.430037] [<ffffffffa00eb765>] nfsd_reply_cache_shutdown+0x15/0x90 [nfsd] [ 9251.430106] [<ffffffffa010e93c>] exit_nfsd+0x9/0x6cd [nfsd] [ 9251.430192] [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200 [ 9251.430280] [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90 [ 9251.430395] [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b [ 9251.430457] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08 [ 9251.430691] RIP [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0 [ 9251.430755] RSP <ffff8800136f7ea0> [ 9251.430805] CR2: 0000000000000000 [ 9251.431033] ---[ end trace 080f3050d082b4ea ]--- Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to itJeff Layton
In a later patch, we're going to need some atomic bit flags. Since that field will need to be an unsigned long, we mitigate that space consumption by migrating some other bitflags to the new field. Start with the rq_secure flag. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17nfsd: Reorder nfsd_cache_match to check more powerful discriminators firstTrond Myklebust
We would normally expect the xid and the checksum to be the best discriminators. Check them before looking at the procedure number, etc. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17nfsd: split DRC global spinlock into per-bucket locksTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17nfsd: convert num_drc_entries to an atomic_tTrond Myklebust
...so we can remove the spinlocking around it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17nfsd: Remove the cache_hash listTrond Myklebust
Now that the lru list is per-bucket, we don't need a second list for searches. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17nfsd: convert the lru list into a per-bucket thingTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17nfsd: Clean up drc cache in preparation for global spinlock eliminationTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23nfsd: clean up sparse endianness warnings in nfscache.cJeff Layton
We currently hash the XID to determine a hash bucket to use for the reply cache entry, which is fed into hash_32 without byte-swapping it. Add __force to make sparse happy, and add some comments to explain why. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-06nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entryJeff Layton
Currently, the DRC cache pruner will stop scanning the list when it hits an entry that is RC_INPROG. It's possible however for a call to take a *very* long time. In that case, we don't want it to block other entries from being pruned if they are expired or we need to trim the cache to get back under the limit. Fix the DRC cache pruner to just ignore RC_INPROG entries. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-12-11nfsd: don't try to reuse an expired DRC entry off the listJeff Layton
Currently when we are processing a request, we try to scrape an expired or over-limit entry off the list in preference to allocating a new one from the slab. This is unnecessarily complicated. Just use the slab layer. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-12-10nfsd: when reusing an existing repcache entry, unhash it firstJeff Layton
The DRC code will attempt to reuse an existing, expired cache entry in preference to allocating a new one. It'll then search the cache, and if it gets a hit it'll then free the cache entry that it was going to reuse. The cache code doesn't unhash the entry that it's going to reuse however, so it's possible for it end up designating an entry for reuse and then subsequently freeing the same entry after it finds it. This leads it to a later use-after-free situation and usually some list corruption warnings or an oops. Fix this by simply unhashing the entry that we intend to reuse. That will mean that it's not findable via a search and should prevent this situation from occurring. Cc: stable@vger.kernel.org # v3.10+ Reported-by: Christoph Hellwig <hch@infradead.org> Reported-by: g. artim <gartim@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-09-10fs: convert fs shrinkers to new scan/count APIDave Chinner
Convert the filesystem shrinkers to use the new API, and standardise some of the behaviours of the shrinkers at the same time. For example, nr_to_scan means the number of objects to scan, not the number of objects to free. I refactored the CIFS idmap shrinker a little - it really needs to be broken up into a shrinker per tree and keep an item count with the tree root so that we don't need to walk the tree every time the shrinker needs to count the number of objects in the tree (i.e. all the time under memory pressure). [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree] [assorted fixes folded in] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-30nfsd: make symbol nfsd_reply_cache_shrinker staticWei Yongjun
symbol 'nfsd_reply_cache_shrinker' only used within this file. It should be static. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03nfsd: scale up the number of DRC hash buckets with cache sizeJeff Layton
We've now increased the size of the duplicate reply cache by quite a bit, but the number of hash buckets has not changed. So, we've gone from an average hash chain length of 16 in the old code to 4096 when the cache is its largest. Change the code to scale out the number of buckets with the max size of the cache. At the same time, we also need to fix the hash function since the existing one isn't really suitable when there are more than 256 buckets. Move instead to use the stock hash_32 function for this. Testing on a machine that had 2048 buckets showed that this gave a smaller longest:average ratio than the existing hash function: The formula here is longest hash bucket searched divided by average number of entries per bucket at the time that we saw that longest bucket: old hash: 68/(39258/2048) == 3.547404 hash_32: 45/(33773/2048) == 2.728807 Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03nfsd: keep stats on worst hash balancing seen so farJeff Layton
The typical case with the DRC is a cache miss, so if we keep track of the max number of entries that we've ever walked over in a search, then we should have a reasonable estimate of the longest hash chain that we've ever seen. With that, we'll also keep track of the total size of the cache when we see the longest chain. In the case of a tie, we prefer to track the smallest total cache size in order to properly gauge the worst-case ratio of max vs. avg chain length. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03nfsd: add new reply_cache_stats file in nfsdfsJeff Layton
For presenting statistics relating to duplicate reply cache. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03nfsd: track memory utilization by the DRCJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03nfsd: break out comparator into separate functionJeff Layton
Break out the function that compares the rqstp and checksum against a reply cache entry. While we're at it, track the efficacy of the checksum over the NFS data by tracking the cases where we would have incorrectly matched a DRC entry if we had not tracked it or the length. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03nfsd: eliminate one of the DRC cache searchesJeff Layton
The most common case is to do a search of the cache, followed by an insert. In the case where we have to allocate an entry off the slab, then we end up having to redo the search, which is wasteful. Better optimize the code for the common case by eliminating the initial search of the cache and always preallocating an entry. In the case of a cache hit, we'll end up just freeing that entry but that's preferable to an extra search. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-18nfsd: fix startup order in nfsd_reply_cache_initJeff Layton
If we end up doing "goto out_nomem" in this function, we'll call nfsd_reply_cache_shutdown. That will attempt to walk the LRU list and free entries, but that list may not be initialized yet if the server is starting up for the first time. It's also possible for the shrinker to kick in before we've initialized the LRU list. Rearrange the initialization so that the LRU list_head and cache size are initialized before doing any of the allocations that might fail. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-18nfsd: only unhash DRC entries that are in the hashtableJeff Layton
It's not safe to call hlist_del() on a newly initialized hlist_node. That leads to a NULL pointer dereference. Only do that if the entry is hashed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-28Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd changes from J Bruce Fields: "Miscellaneous bugfixes, plus: - An overhaul of the DRC cache by Jeff Layton. The main effect is just to make it larger. This decreases the chances of intermittent errors especially in the UDP case. But we'll need to watch for any reports of performance regressions. - Containerized nfsd: with some limitations, we now support per-container nfs-service, thanks to extensive work from Stanislav Kinsbursky over the last year." Some notes about conflicts, since there were *two* non-data semantic conflicts here: - idr_remove_all() had been added by a memory leak fix, but has since become deprecated since idr_destroy() does it for us now. - xs_local_connect() had been added by this branch to make AF_LOCAL connections be synchronous, but in the meantime Trond had changed the calling convention in order to avoid a RCU dereference. There were a couple of more obvious actual source-level conflicts due to the hlist traversal changes and one just due to code changes next to each other, but those were trivial. * 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits) SUNRPC: make AF_LOCAL connect synchronous nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum svcrpc: fix rpc server shutdown races svcrpc: make svc_age_temp_xprts enqueue under sv_lock lockd: nlmclnt_reclaim(): avoid stack overflow nfsd: enable NFSv4 state in containers nfsd: disable usermode helper client tracker in container nfsd: use proper net while reading "exports" file nfsd: containerize NFSd filesystem nfsd: fix comments on nfsd_cache_lookup SUNRPC: move cache_detail->cache_request callback call to cache_read() SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function SUNRPC: rework cache upcall logic SUNRPC: introduce cache_detail->cache_request callback NFS: simplify and clean cache library NFS: use SUNRPC cache creation and destruction helper for DNS cache nfsd4: free_stid can be static nfsd: keep a checksum of the first 256 bytes of request sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer sunrpc: fix comment in struct xdr_buf definition ...
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-17nfsd: fix compiler warning about ambiguous types in nfsd_cache_csumJeff Layton
kbuild test robot says: tree: git://linux-nfs.org/~bfields/linux.git for-3.9 head: deb4534f4f3be7aea7d9d24c3b0d58f370cbf9ef commit: 01a7decf75930925322c5efc87af0b5e58eb8650 [32/44] nfsd: keep a checksum of the first 256 bytes of request config: i386-randconfig-x088 (attached as .config) All warnings: fs/nfsd/nfscache.c: In function 'nfsd_cache_csum': >> fs/nfsd/nfscache.c:266:9: warning: comparison of distinct pointer types lacks a cast [enabled by default] vim +266 fs/nfsd/nfscache.c 250 __wsum csum; 251 struct xdr_buf *buf = &rqstp->rq_arg; 252 const unsigned char *p = buf->head[0].iov_base; 253 size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len, 254 RC_CSUMLEN); 255 size_t len = min(buf->head[0].iov_len, csum_len); 256 257 /* rq_arg.head first */ 258 csum = csum_partial(p, len, 0); 259 csum_len -= len; 260 261 /* Continue into page array */ 262 idx = buf->page_base / PAGE_SIZE; 263 base = buf->page_base & ~PAGE_MASK; 264 while (csum_len) { 265 p = page_address(buf->pages[idx]) + base; > 266 len = min(PAGE_SIZE - base, csum_len); 267 csum = csum_partial(p, len, csum); 268 csum_len -= len; 269 base = 0; 270 ++idx; 271 } 272 return csum; 273 } 274 Signed-off-by: Jeff Layton <jlayton@redhat.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15nfsd: fix comments on nfsd_cache_lookupJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-08nfsd: keep a checksum of the first 256 bytes of requestJeff Layton
Now that we're allowing more DRC entries, it becomes a lot easier to hit problems with XID collisions. In order to mitigate those, calculate a checksum of up to the first 256 bytes of each request coming in and store that in the cache entry, along with the total length of the request. This initially used crc32, but Chuck Lever and Jim Rees pointed out that crc32 is probably more heavyweight than we really need for generating these checksums, and recommended looking at using the same routines that are used to generate checksums for IP packets. On an x86_64 KVM guest measurements with ftrace showed ~800ns to use csum_partial vs ~1750ns for crc32. The difference probably isn't terribly significant, but for now we may as well use csum_partial. Signed-off-by: Jeff Layton <jlayton@redhat.com> Stones-thrown-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to ↵Jeff Layton
addr.h These routines are used by server and client code, so having them in a separate header would be best. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: register a shrinker for DRC cache entriesJeff Layton
Since we dynamically allocate them now, allow the system to call us up to release them if it gets low on memory. Since these entries aren't replaceable, only free ones that are expired or that are over the cap. The the seeks value is set to '1' however to indicate that freeing the these entries is low-cost. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: add recurring workqueue job to clean the cacheJeff Layton
It's not sufficient to only clean the cache when requests come in. What if we have a flurry of activity and then the server goes idle? Add a workqueue job that will clean the cache every RC_EXPIRE period. Care is taken to only run this when we expect to have entries expiring. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: when updating an entry with RC_NOCACHE, just free itJeff Layton
There's no need to keep entries around that we're declaring RC_NOCACHE. Ditto if there's a problem with the entry. With this change too, there's no need to test for RC_UNUSED in the search function. If the entry's in the hash table then it's either INPROG or DONE. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: remove the cache_disabled flagJeff Layton
With the change to dynamically allocate entries, the cache is never disabled on the fly. Remove this flag. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: dynamically allocate DRC entriesJeff Layton
The existing code keeps a fixed-size cache of 1024 entries. This is much too small for a busy server, and wastes memory on an idle one. This patch changes the code to dynamically allocate and free these cache entries. A cap on the number of entries is retained, but it's much larger than the existing value and now scales with the amount of low memory in the machine. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: track the number of DRC entries in the cacheJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: always move DRC entries to the end of LRU list when updating timestampJeff Layton
...otherwise, we end up with the list ordering wrong. Currently, it's not a problem since we skip RC_INPROG entries, but keeping the ordering strict will be necessary for a later patch that adds a cache cleaner. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: break out hashtable search into separate functionJeff Layton
Later, we'll need more than one call site for this, so break it out into a new function. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: clean up and clarify the cache expiration codeJeff Layton
Add a preprocessor constant for the expiry time of cache entries, and move the test for an expired entry into a function. Note that the current code does not test for RC_INPROG. It just assumes that it won't take more than 2 minutes to fill out an in-progress entry. I'm not sure how valid that assumption is though, so let's just ensure that we never consider an RC_INPROG entry to be expired. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: remove redundant test from nfsd_reply_cache_freeJeff Layton
Entries can only get a c_type of RC_REPLBUFF iff they are RC_DONE. Therefore the test for RC_DONE isn't necessary here. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: add alloc and free functions for DRC entriesJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: create a dedicated slabcache for DRC entriesJeff Layton
Currently we use kmalloc() which wastes a little bit of memory on each allocation since it's a power of 2 allocator. Since we're allocating a 1024 of these now, and may need even more later, let's create a new slabcache for them. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: remove unneeded spinlock in nfsd_cache_updateJeff Layton
The locking rules for cache entries say that locking the cache_lock isn't needed if you're just touching the current entry. Earlier in this function we set rp->c_state to RC_UNUSED without any locking, so I believe it's ok to do the same here. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04nfsd: fix IPv6 address handling in the DRCJeff Layton
Currently, it only stores the first 16 bytes of any address. struct sockaddr_in6 is 28 bytes however, so we're currently ignoring the last 12 bytes of the address. Expand the c_addr field to a sockaddr_in6, and cast it to a sockaddr_in as necessary. Also fix the comparitor to use the existing RPC helpers for this. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-18nfsd: turn on reply cache for NFSv4J. Bruce Fields
It's sort of ridiculous that we've never had a working reply cache for NFSv4. On the other hand, we may still not: our current reply cache is likely not very good, especially in the TCP case (which is the only case that matters for v4). What we really need here is some serious testing. Anyway, here's a start. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2009-12-15nfsd: remove pointless paths in file headersJ. Bruce Fields
The new .h files have paths at the top that are now out of date. While we're here, just remove all of those from fs/nfsd; they never served any purpose. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14nfsd: Move private headers to source directoryBoaz Harrosh
Lots of include/linux/nfsd/* headers are only used by nfsd module. Move them to the source directory Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14nfsd: Source files #include cleanupsBoaz Harrosh
Now that the headers are fixed and carry their own wait, all fs/nfsd/ source files can include a minimal set of headers. and still compile just fine. This patch should improve the compilation speed of the nfsd module. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27knfsd: fix reply cache memory corruptionGreg Banks
Fix a regression in the reply cache introduced when the code was converted to use proper Linux lists. When a new entry needs to be inserted, the case where all the entries are currently being used by threads is not correctly detected. This can result in memory corruption and a crash. In the current code this is an extremely unlikely corner case; it would require the machine to have 1024 nfsd threads and all of them to be busy at the same time. However, upcoming reply cache changes make this more likely; a crash due to this problem was actually observed in field. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27knfsd: reply cache cleanupsGreg Banks
Make REQHASH() an inline function. Rename hash_list to cache_hash. Fix an obsolete comment. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01nfsd: fail module init on reply cache init failureJ. Bruce Fields
If the reply cache initialization fails due to a kmalloc failure, currently we try to soldier on with a reduced (or nonexistant) reply cache. Better to just fail immediately: the failure is then much easier to understand and debug, and it could save us complexity in some later code. (But actually, it doesn't help currently because the cache is also turned off in some odd failure cases; we should probably find a better way to handle those failure cases some day.) Fix some minor style problems while we're at it, and rename nfsd_cache_init() to remove the need for a comment describing it. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>