summaryrefslogtreecommitdiff
path: root/net/sunrpc/svc_xprt.c
AgeCommit message (Collapse)Author
2015-08-10nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operationJeff Layton
For now, all services use svc_xprt_do_enqueue, but once we add workqueue-based service support, we'll need to do something different. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-23sunrpc/lockd: fix references to the BKLJeff Layton
The BKL is completely out of the picture in the lockd and sunrpc code these days. Update the antiquated comments that refer to it. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09sunrpc: only call test_bit once in svc_xprt_receivedJeff Layton
...move the WARN_ON_ONCE inside the following if block since they use the same condition. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09sunrpc: add some tracepoints around enqueue and dequeue of svc_xprtJeff Layton
These were useful when I was tracking down a race condition between svc_xprt_do_enqueue and svc_get_next_xprt. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09sunrpc: convert to lockless lookup of queued server threadsJeff Layton
Testing has shown that the pool->sp_lock can be a bottleneck on a busy server. Every time data is received on a socket, the server must take that lock in order to dequeue a thread from the sp_threads list. Address this problem by eliminating the sp_threads list (which contains threads that are currently idle) and replacing it with a RQ_BUSY flag in svc_rqst. This allows us to walk the sp_all_threads list under the rcu_read_lock and find a suitable thread for the xprt by doing a test_and_set_bit. Note that we do still have a potential atomicity problem however with this approach. We don't want svc_xprt_do_enqueue to set the rqst->rq_xprt pointer unless a test_and_set_bit of RQ_BUSY returned zero (which indicates that the thread was idle). But, by the time we check that, the bit could be flipped by a waking thread. To address this, we acquire a new per-rqst spinlock (rq_lock) and take that before doing the test_and_set_bit. If that returns false, then we can set rq_xprt and drop the spinlock. Then, when the thread wakes up, it must set the bit under the same spinlock and can trust that if it was already set then the rq_xprt is also properly set. With this scheme, the case where we have an idle thread no longer needs to take the highly contended pool->sp_lock at all, and that removes the bottleneck. That still leaves one issue: What of the case where we walk the whole sp_all_threads list and don't find an idle thread? Because the search is lockess, it's possible for the queueing to race with a thread that is going to sleep. To address that, we queue the xprt and then search again. If we find an idle thread at that point, we can't attach the xprt to it directly since that might race with a different thread waking up and finding it. All we can do is wake the idle thread back up and let it attempt to find the now-queued xprt. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Chris Worley <chris.worley@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09sunrpc: fix potential races in pool_stats collectionJeff Layton
In a later patch, we'll be removing some spinlocking around the socket and thread queueing code in order to fix some contention problems. At that point, the stats counters will no longer be protected by the sp_lock. Change the counters to atomic_long_t fields, except for the "sockets_queued" counter which will still be manipulated under a spinlock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Chris Worley <chris.worley@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09sunrpc: have svc_wake_up only deal with pool 0Jeff Layton
The way that svc_wake_up works is a bit inefficient. It walks all of the available pools for a service and either wakes up a task in each one or sets the SP_TASK_PENDING flag in each one. When svc_wake_up is called, there is no need to wake up more than one thread to do this work. In practice, only lockd currently uses this function and it's single threaded anyway. Thus, this just boils down to doing a wake up of a thread in pool 0 or setting a single flag. Eliminate the for loop in this function and change it to just operate on pool 0. Also update the comments that sit above it and get rid of some code that has been commented out for years now. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09sunrpc: convert sp_task_pending flag to use atomic bitopsJeff Layton
In a later patch, we'll want to be able to handle this flag without holding the sp_lock. Change this field to an unsigned long flags field, and declare a new flag in it that can be managed with atomic bitops. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09sunrpc: move rq_dropme flag into rq_flagsJeff Layton
Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09sunrpc: move rq_usedeferral flag to rq_flagsJeff Layton
Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to itJeff Layton
In a later patch, we're going to need some atomic bit flags. Since that field will need to be an unsigned long, we mitigate that space consumption by migrating some other bitflags to the new field. Start with the rq_secure flag. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09Merge tag 'nfs-for-3.19-1' into nfsd for-3.19 branchJ. Bruce Fields
Mainly what I need is 860a0d9e511f "sunrpc: add some tracepoints in svc_rqst handling functions", which subsequent server rpc patches from jlayton depend on. I'm merging this later tag on the assumption that's more likely to be a tested and stable point.
2014-12-01sunrpc: eliminate the XPT_DETACHED flagJeff Layton
All it does is indicate whether a xprt has already been deleted from a list or not, which is unnecessary since we use list_del_init and it's always set and checked under the sv_lock anyway. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-11-24sunrpc: add some tracepoints in svc_rqst handling functionsJeff Layton
...just around svc_send, svc_recv and svc_process for now. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-28SUNRPC: Fix compile on non-x86J. Bruce Fields
current_task appears to be x86-only, oops. Let's just delete this check entirely: Any developer that adds a new user without setting rq_task will get a crash the first time they test it. I also don't think there are normally any important locks held here, and I can't see any other reason why killing a server thread would bring the whole box down. So the effort to fail gracefully here looks like overkill. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 983c684466e0 "SUNRPC: get rid of the request wait queue" Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17SUNRPC: More optimisations of svc_xprt_enqueue()Trond Myklebust
Just move the transport locking out of the spin lock protected area altogether. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17SUNRPC: Fix broken kthread_should_stop test in svc_get_next_xprtTrond Myklebust
We should definitely not be exiting svc_get_next_xprt() with the thread enqueued. Fix this by ensuring that we fall through to the dequeue. Also move the test itself outside the spin lock protected section. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17SUNRPC: get rid of the request wait queueTrond Myklebust
We're always _only_ waking up tasks from within the sp_threads list, so we know that they are enqueued and alive. The rq_wait waitqueue is just a distraction with extra atomic semantics. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17SUNRPC: Do not grab pool->sp_lock unnecessarily in svc_get_next_xprtTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17SUNRPC: Do not override wspace tests in svc_handle_xprtTrond Myklebust
We already determined that there was enough wspace when we called svc_xprt_enqueue. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29SUNRPC: Allow svc_reserve() to notify TCP socket that space has been freedTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29SUNRPC: Reduce contention in svc_xprt_enqueue()Trond Myklebust
Ensure that all calls to svc_xprt_enqueue() except svc_xprt_received() check the value of XPT_BUSY, before attempting to grab spinlocks etc. This is to avoid situations such as the following "perf" trace, which shows heavy contention on the pool spinlock: 54.15% nfsd [kernel.kallsyms] [k] _raw_spin_lock_bh | --- _raw_spin_lock_bh | |--71.43%-- svc_xprt_enqueue | | | |--50.31%-- svc_reserve | | | |--31.35%-- svc_xprt_received | | | |--18.34%-- svc_tcp_data_ready ... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30nfsd4: allow encoding across page boundariesJ. Bruce Fields
After this we can handle for example getattr of very large ACLs. Read, readdir, readlink are still special cases with their own limits. Also we can't handle a new operation starting close to the end of a page. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-22SUNRPC: Fix a module reference leak in svc_handle_xprtTrond Myklebust
If the accept() call fails, we need to put the module reference. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-22NFSD: Ignore client's source port on RDMA transportsChuck Lever
An NFS/RDMA client's source port is meaningless for RDMA transports. The transport layer typically sets the source port value on the connection to a random ephemeral port. Currently, NFS server administrators must specify the "insecure" export option to enable clients to access exports via RDMA. But this means NFS clients can access such an export via IP using an ephemeral port, which may not be desirable. This patch eliminates the need to specify the "insecure" export option to allow NFS/RDMA clients access to an export. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=250 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-02-09net: Mark functions as static in net/sunrpc/svc_xprt.cRashika Kheria
Mark functions as static in net/sunrpc/svc_xprt.c because they are not used outside this file. This eliminates the following warning in net/sunrpc/svc_xprt.c: net/sunrpc/svc_xprt.c:574:5: warning: no previous prototype for ‘svc_alloc_arg’ [-Wmissing-prototypes] net/sunrpc/svc_xprt.c:615:18: warning: no previous prototype for ‘svc_get_next_xprt’ [-Wmissing-prototypes] net/sunrpc/svc_xprt.c:694:6: warning: no previous prototype for ‘svc_add_new_temp_xprt’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-17svcrpc: fix rpc server shutdown racesJ. Bruce Fields
Rewrite server shutdown to remove the assumption that there are no longer any threads running (no longer true, for example, when shutting down the service in one network namespace while it's still running in others). Do that by doing what we'd do in normal circumstances: just CLOSE each socket, then enqueue it. Since there may not be threads to handle the resulting queued xprts, also run a simplified version of the svc_recv() loop run by a server to clean up any closed xprts afterwards. Cc: stable@kernel.org Tested-by: Jason Tibbitts <tibbs@math.uh.edu> Tested-by: Paweł Sikora <pawel.sikora@agmk.net> Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-17svcrpc: make svc_age_temp_xprts enqueue under sv_lockJ. Bruce Fields
svc_age_temp_xprts expires xprts in a two-step process: first it takes the sv_lock and moves the xprts to expire off their server-wide list (sv_tempsocks or sv_permsocks) to a local list. Then it drops the sv_lock and enqueues and puts each one. I see no reason for this: svc_xprt_enqueue() will take sp_lock, but the sv_lock and sp_lock are not otherwise nested anywhere (and documentation at the top of this file claims it's correct to nest these with sp_lock inside.) Cc: stable@kernel.org Tested-by: Jason Tibbitts <tibbs@math.uh.edu> Tested-by: Paweł Sikora <pawel.sikora@agmk.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23sunrpc: Fix lockd sleeping until timeoutAndriy Skulysh
There is a race in enqueueing thread to a pool and waking up a thread. lockd doesn't wake up on reception of lock granted callback if svc_wake_up() is called before lockd's thread is added to a pool. Signed-off-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-04SUNRPC: remove BUG_ON in svc_delete_xprtWeston Andros Adamson
Replace BUG_ON() with WARN_ON_ONCE(). Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04SUNRPC: remove BUG_ONs checking RPCSVC_MAXPAGESWeston Andros Adamson
Replace two bounds checking BUG_ON() calls with WARN_ON_ONCE() and resetting the requested size to RPCSVC_MAXPAGES. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04SUNRPC: remove BUG_ON in svc_xprt_receivedWeston Andros Adamson
Replace BUG_ON() with a WARN_ON_ONCE() and early return. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-21svcrpc: split up svc_handle_xprtJ. Bruce Fields
Move initialization of newly accepted socket into a helper. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21svcrpc: break up svc_recvJ. Bruce Fields
Matter of taste, I suppose, but svc_recv breaks up naturally into: allocate pages and setup arg dequeue (wait for, if necessary) next socket do something with that socket And I find it easier to read when it doesn't go on for pages and pages. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21svcrpc: make svc_xprt_received staticJ. Bruce Fields
Note this isn't used outside svc_xprt.c. May as well move it so we don't need a declaration while we're here. Also remove an outdated comment. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21svcrpc: make xpo_recvfrom return only >=0J. Bruce Fields
The only errors returned from xpo_recvfrom have been -EAGAIN and -EAFNOSUPPORT. The latter was removed by a previous patch. That leaves only -EAGAIN, which is treated just like 0 by the caller (svc_recv). So, just ditch -EAGAIN and return 0 instead. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21svcrpc: share some setup of listening socketsJ. Bruce Fields
There's some duplicate code here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21svcrpc: make svc_create_xprt enqueue on clearing XPT_BUSYJ. Bruce Fields
Whenever we clear XPT_BUSY we should call svc_xprt_enqueue(). Without that we may fail to notice any events (such as new connections) that arrived while XPT_BUSY was set. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21svcrpc: fix xpt_list traversal locking on shutdownJ. Bruce Fields
Server threads are not running at this point, but svc_age_temp_xprts still may be, so we need this locking. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20svcrpc: fix svc_xprt_enqueue/svc_recv busy-loopingJ. Bruce Fields
The rpc server tries to ensure that there will be room to send a reply before it receives a request. It does this by tracking, in xpt_reserved, an upper bound on the total size of the replies that is has already committed to for the socket. Currently it is adding in the estimate for a new reply *before* it checks whether there is space available. If it finds that there is not space, it then subtracts the estimate back out. This may lead the subsequent svc_xprt_enqueue to decide that there is space after all. The results is a svc_recv() that will repeatedly return -EAGAIN, causing server threads to loop without doing any actual work. Cc: stable@vger.kernel.org Reported-by: Michael Tokarev <mjt@tls.msk.ru> Tested-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20svcrpc: sends on closed socket should stop immediatelyJ. Bruce Fields
svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply. However, the XPT_CLOSE won't be acted on immediately. Meanwhile other threads could send further replies before the socket is really shut down. This can manifest as data corruption: for example, if a truncated read reply is followed by another rpc reply, that second reply will look to the client like further read data. Symptoms were data corruption preceded by svc_tcp_sendto logging something like kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket Cc: stable@vger.kernel.org Reported-by: Malahal Naineni <malahal@us.ibm.com> Tested-by: Malahal Naineni <malahal@us.ibm.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-01Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull the rest of the nfsd commits from Bruce Fields: "... and then I cherry-picked the remainder of the patches from the head of my previous branch" This is the rest of the original nfsd branch, rebased without the delegation stuff that I thought really needed to be redone. I don't like rebasing things like this in general, but in this situation this was the lesser of two evils. * 'for-3.5' of git://linux-nfs.org/~bfields/linux: (50 commits) nfsd4: fix, consolidate client_has_state nfsd4: don't remove rebooted client record until confirmation nfsd4: remove some dprintk's and a comment nfsd4: return "real" sequence id in confirmed case nfsd4: fix exchange_id to return confirm flag nfsd4: clarify that renewing expired client is a bug nfsd4: simpler ordering of setclientid_confirm checks nfsd4: setclientid: remove pointless assignment nfsd4: fix error return in non-matching-creds case nfsd4: fix setclientid_confirm same_cred check nfsd4: merge 3 setclientid cases to 2 nfsd4: pull out common code from setclientid cases nfsd4: merge last two setclientid cases nfsd4: setclientid/confirm comment cleanup nfsd4: setclientid remove unnecessary terms from a logical expression nfsd4: move rq_flavor into svc_cred nfsd4: stricter cred comparison for setclientid/exchange_id nfsd4: move principal name into svc_cred nfsd4: allow removing clients not holding state nfsd4: rearrange exchange_id logic to simplify ...
2012-05-31svcrpc: fix a comment typoJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31sunrpc: do array overrun check in svc_recv before allocating pagesJeff Layton
There's little point in waiting until after we allocate all of the pages to see if we're going to overrun the array. In the event that this calculation is really off we could end up scribbling over a bunch of memory and make it tougher to debug. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-15net: Convert net_ratelimit uses to net_<level>_ratelimitedJoe Perches
Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-15SUNRPC: service destruction in network namespace contextStanislav Kinsbursky
v2: Added comment to BUG_ON's in svc_destroy() to make code looks clearer. This patch introduces network namespace filter for service destruction function. Nothing special here - just do exactly the same operations, but only for tranports in passed networks namespace context. BTW, BUG_ON() checks for empty service transports lists were returned into svc_destroy() function. This is because of swithing generic svc_close_all() to networks namespace dependable svc_close_net(). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-15SUNRPC: clear svc transports lists helper introducedStanislav Kinsbursky
This patch moves service transports deletion from service sockets lists to separated function. This is a precursor patch, which would be usefull with service shutdown in network namespace context, introduced later in the series. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-15SUNRPC: clear svc pools lists helper introducedStanislav Kinsbursky
This patch moves removing of service transport from it's pools ready lists to separated function. Also this clear is now done with list_for_each_entry_safe() helper. This is a precursor patch, which would be usefull with service shutdown in network namespace context, introduced later in the series. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31SUNRPC: search for service transports in network namespace contextStanislav Kinsbursky
Service transports are parametrized by network namespace. And thus lookup of transport instance have to take network namespace into account. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: J. Bruce Fields <bfields@redhat.com>
2012-01-14Merge branch 'for-3.3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-3.3' of git://linux-nfs.org/~bfields/linux: (31 commits) nfsd4: nfsd4_create_clid_dir return value is unused NFSD: Change name of extended attribute containing junction svcrpc: don't revert to SVC_POOL_DEFAULT on nfsd shutdown svcrpc: fix double-free on shutdown of nfsd after changing pool mode nfsd4: be forgiving in the absence of the recovery directory nfsd4: fix spurious 4.1 post-reboot failures NFSD: forget_delegations should use list_for_each_entry_safe NFSD: Only reinitilize the recall_lru list under the recall lock nfsd4: initialize special stateid's at compile time NFSd: use network-namespace-aware cache registering routines SUNRPC: create svc_xprt in proper network namespace svcrpc: update outdated BKL comment nfsd41: allow non-reclaim open-by-fh's in 4.1 svcrpc: avoid memory-corruption on pool shutdown svcrpc: destroy server sockets all at once svcrpc: make svc_delete_xprt static nfsd: Fix oops when parsing a 0 length export nfsd4: Use kmemdup rather than duplicating its implementation nfsd4: add a separate (lockowner, inode) lookup nfsd4: fix CONFIG_NFSD_FAULT_INJECTION compile error ...