summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2005-09-23[PATCH] RPC: expose API for serializing access to RPC transportsChuck Lever
The next several patches introduce an API that allows transports to choose whether the RPC client provides congestion control or whether the transport itself provides it. The first method we abstract is the one that serializes access to the RPC transport to prevent the bytes from different requests from mingling together. This method provides proper request serialization and the opportunity to prevent new requests from being started because the transport is congested. The normal situation is for the transport to handle congestion control itself. Although NFS over UDP was first, it has been recognized after years of experience that having the transport provide congestion control is much better than doing it in the RPC client. Thus TCP, and probably every future transport implementation, will use the default method, xprt_lock_write, provided in xprt.c, which does not provide any kind of congestion control. UDP can continue using the xprt.c-provided Van Jacobson congestion avoidance implementation. Test-plan: Use WAN simulation to cause sporadic bursty packet loss. Look for significant regression in performance or client stability. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: add API to set transport-specific timeoutsChuck Lever
Prepare the way to remove the "xprt->nocong" variable by adding a callout to the RPC client transport switch API to handle setting RPC retransmit timeouts. Add a pair of generic helper functions that provide the ability to set a simple fixed timeout, or to set a timeout based on the state of a round- trip estimator. Test-plan: Use WAN simulation to cause sporadic bursty packet loss. Look for significant regression in performance or client stability. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: get rid of xprt->streamChuck Lever
Now we can fix up the last few places that use the "xprt->stream" variable, and get rid of it from the rpc_xprt structure. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: skip over transport-specific heads automaticallyChuck Lever
Add a generic mechanism for skipping over transport-specific headers when constructing an RPC request. This removes another "xprt->stream" dependency. Test-plan: Write-intensive workload on a single mount point (try both UDP and TCP). Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: separate TCP and UDP socket write pathsChuck Lever
Split the RPC client's main socket write path into a TCP version and a UDP version to eliminate another dependency on the "xprt->stream" variable. Compiler optimization removes unneeded code from xs_sendpages, as this function is now called with some constant arguments. We can now cleanly perform transport protocol-specific return code testing and error recovery in each path. Test-plan: Millions of fsx operations. Performance characterization such as "sio" or "iozone". Examine oprofile results for any changes before and after this patch is applied. Version: Thu, 11 Aug 2005 16:08:46 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: separate TCP and UDP transport connection logicChuck Lever
Create separate connection worker functions for managing UDP and TCP transport sockets. This eliminates several dependencies on "xprt->stream". Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with v2, v3, and v4. Version: Thu, 11 Aug 2005 16:08:18 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: separate TCP and UDP write space callbacksChuck Lever
Split the socket write space callback function into a TCP version and UDP version, eliminating one dependence on the "xprt->stream" variable. Keep the common pieces of this path in xprt.c so other transports can use it too. Test-plan: Write-intensive workload on a single mount point. Version: Thu, 11 Aug 2005 16:07:51 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: client-side transport switch cleanupChuck Lever
Clean-up: change some comments to reflect the realities of the new RPC transport switch mechanism. Get rid of unused xprt_receive() prototype. Also, organize function prototypes in xprt.h by usage and scope. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:07:21 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Add helper for waking tasks pending on a transportChuck Lever
Clean-up: remove only reference to xprt->pending from the socket transport implementation. This makes a cleaner interface for other transport implementations as well. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:06:52 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Eliminate socket.h includes in RPC clientChuck Lever
Clean-up: get rid of unnecessary socket.h and in.h includes in the generic parts of the RPC client. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:06:23 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: rename the sockstate fieldChuck Lever
Clean-up: get rid of a name reference to sockets in the generic parts of the RPC client by renaming the sockstate field in the rpc_xprt structure. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:05:53 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Rename xprt_lockChuck Lever
Clean-up: Replace the xprt_lock with something more aptly named. This lock single-threads the XID and request slot reservation process. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:05:26 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Rename sock_lockChuck Lever
Clean-up: replace a name reference to sockets in the generic parts of the RPC client by renaming sock_lock in the rpc_xprt structure. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:05:00 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Reduce stack utilization in xs_sendpagesChuck Lever
Reduce stack utilization of the RPC socket transport's send path. A couple of unlikely()s are added to ensure the compiler places the tail processing at the end of the csect. Test-plan: Millions of fsx operations. Performance characterization such as "sio" or "iozone". Version: Thu, 11 Aug 2005 16:04:30 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: transport switch function namingChuck Lever
Introduce block header comments and a function naming convention to the socket transport implementation. Provide a debug setting for transports that is separate from RPCDBG_XPRT. Eliminate xprt_default_timeout(). Provide block comments for exposed interfaces in xprt.c, and eliminate the useless obvious comments. Convert printk's to dprintk's. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:04:04 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: introduce client-side transport switchChuck Lever
Move the bulk of client-side socket-specific code into a separate source file, net/sunrpc/xprtsock.c. Test-plan: Millions of fsx operations. Performance characterization such as "sio" or "iozone". Destructive testing (unplugging the network temporarily, server reboots). Connectathon with v2, v3, and v4. Version: Thu, 11 Aug 2005 16:03:38 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: extract socket logic common to both client and serverChuck Lever
Clean-up: Move some code that is common to both RPC client- and server-side socket transports into its own source file, net/sunrpc/socklib.c. Test-plan: Compile kernel with CONFIG_NFS enabled. Millions of fsx operations over UDP, client and server. Connectathon over UDP. Version: Thu, 11 Aug 2005 16:03:09 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: portmapper doesn't need a reserved portChuck Lever
The in-kernel portmapper does not require a reserved port for making bind queries. Test-plan: Tens of runs of the Connectathon locking suite with TCP and UDP against several other NFS server implementations using NFSv3, not NFSv4 (which doesn't require rpcbind). Version: Thu, 11 Aug 2005 16:02:43 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] NFS: use a constant value for TCP retransmit timeoutsChuck Lever
Implement a best practice: don't use exponential backoff when computing retransmit timeout values on TCP connections, but simply retransmit at regular intervals. This also fixes a bug introduced when xprt_reset_majortimeo() was added. Test-plan: Enable RPC debugging and watch timeout behavior on a NFS/TCP mount. Version: Thu, 11 Aug 2005 16:02:19 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: proper soft timeout behavior for rpcbindChuck Lever
Implement a best practice: for soft mounts, an rpcbind timeout should cause an RPC request to fail. This also provides an FSM hook for retrying an rpcbind with a different rpcbind protocol version. We'll use this later to try multiple rpcbind protocol versions when binding. To enable this, expose the RPC error code returned during a portmap request to the FSM so it can make some decision about how to report, retry, or fail the request. Test-plan: Hundreds of passes with connectathon NFSv3 locking suite, on the client and server. Version: Thu, 11 Aug 2005 16:01:53 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Report connection errors properly when mounting with "soft"Chuck Lever
Fix up xprt_connect_status: the soft timeout logic was clobbering tk_status, so TCP connect errors were not properly reported on soft mounts. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. Version: Thu, 11 Aug 2005 16:01:28 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-22[SCTP]: Fix SCTP_SHUTDOWN notifications.Sridhar Samudrala
Fix to allow SCTP_SHUTDOWN notifications to be received on 1-1 style SCTP SOCK_STREAM sockets. Add SCTP_SHUTDOWN notification to the receive queue before updating the state of the association. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22[NETFILTER] Fix conntrack event cache deadlock/oopsHarald Welte
This patch fixes a number of bugs. It cannot be reasonably split up in multiple fixes, since all bugs interact with each other and affect the same function: Bug #1: The event cache code cannot be called while a lock is held. Therefore, the call to ip_conntrack_event_cache() within ip_ct_refresh_acct() needs to be moved outside of the locked section. This fixes a number of 2.6.14-rcX oops and deadlock reports. Bug #2: We used to call ct_add_counters() for unconfirmed connections without holding a lock. Since the add operations are not atomic, we could race with another CPU. Bug #3: ip_ct_refresh_acct() lost REFRESH events in some cases where refresh (and the corresponding event) are desired, but no accounting shall be performed. Both, evenst and accounting implicitly depended on the skb parameter bein non-null. We now re-introduce a non-accounting "ip_ct_refresh()" variant to explicitly state the desired behaviour. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22[NETFILTER] Fix sparse endian warnings in pptp helperAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22[NETFILTER] fix DEBUG statement in PPTP helperHarald Welte
As noted by Alexey Dobriyan, the DEBUGP statement prints the wrong callID. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22[BRIDGE]: TSO fix in br_dev_queue_push_xmitVlad Drukker
Signed-off-by: Vlad Drukker <vlad@storewiz.com> Acked-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22[TCP]: Adjust Reno SACK estimate in tcp_fragmentHerbert Xu
Since the introduction of TSO pcount a year ago, it has been possible for tcp_fragment() to cause packets_out to decrease. Prior to that, tcp_retrans_try_collapse() was the only way for that to happen on the retransmission path. When this happens with Reno, it is possible for sasked_out to become invalid because it is only an estimate and not tied to any particular packet on the retransmission queue. Therefore we need to adjust sacked_out as well as left_out in the Reno case. The following patch does exactly that. This bug is pretty difficult to trigger in practice though since you need a SACKless peer with a retransmission that occurs just as the cached MTU value expires. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-21[TCP]: Set default congestion control correctly for incoming connections.Stephen Hemminger
Patch from Joel Sing to fix the default congestion control algorithm for incoming connections. If a new congestion control handler is added (via module), it should become the default for new connections. Instead, the incoming connections use reno. The cause is incorrect initialisation causes the tcp_init_congestion_control() function to return after the initial if test fails. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-21[FIB_TRIE]: message cleanupStephen Hemminger
Cleanup the printk's in fib_trie: * Convert a couple of places in the dump code to BUG_ON * Put log level's on each message The version message really needed the message since it leaks out on the pretty Fedora bootup. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Robert Olsson <Robert.Olsson@data.slu.se>, Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-21[AF_PACKET]: Allow for > 8 byte hardware addresses.Eric W. Biederman
The convention is that longer addresses will simply extend the hardeware address byte arrays at the end of sockaddr_ll and packet_mreq. In making this change a small information leak was also closed. The code only initializes the hardware address bytes that are used, but all of struct sockaddr_ll was copied to userspace. Now we just copy sockaddr_ll to the last byte of the hardware address used. For error checking larger structures than our internal maximums continue to be allowed but an error is signaled if we can not fit the hardware address into our internal structure. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-19Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
2005-09-19[PATCH] raw_sendmsg DoS on 2.6Mark J Cox
Fix unchecked __get_user that could be tricked into generating a memory read on an arbitrary address. The result of the read is not returned directly but you may be able to divine some information about it, or use the read to cause a crash on some architectures by reading hardware state. CAN-2004-2492. Fix from Al Viro, ack from Dave Miller. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-19[TCP]: Handle SACK'd packets properly in tcp_fragment().Herbert Xu
The problem is that we're now calling tcp_fragment() in a context where the packets might be marked as SACKED_ACKED or SACKED_RETRANS. This was not possible before as you never retransmitted packets that are so marked. Because of this, we need to adjust sacked_out and retrans_out in tcp_fragment(). This is exactly what the following patch does. We also need to preserve the SACKED_ACKED/SACKED_RETRANS marking if they exist. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-19[8021Q]: Add endian annotations.Alexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-19[NETFILTER]: Export ip_nat_port_{nfattr_to_range,range_to_nfattr}Harald Welte
Those exports are needed by the PPTP helper following in the next couple of changes. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-19[NETFILTER]: Rename misnamed functionPatrick McHardy
Both __ip_conntrack_expect_find and ip_conntrack_expect_find_get take a reference to the expectation, the difference is that callers of __ip_conntrack_expect_find must hold ip_conntrack_lock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-19[NETFILTER] ip6tables: remove duplicate codeYasuyuki Kozakai
Some IPv6 matches have very similar loops to find IPv6 extension header and we can unify them. This patch introduces ipv6_find_hdr() to do it. I just checked that it can find the target headers in the packet which has dst,hbh,rt,frag,ah,esp headers. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-19[NETFILTER]: Add new PPTP conntrack and NAT helperHarald Welte
This new "version 3" PPTP conntrack/nat helper is finally ready for mainline inclusion. Special thanks to lots of last-minute bugfixing by Patric McHardy. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-19[IPV4]: fib_trie RCU refinementsRobert Olsson
* This patch is from Paul McKenney's RCU reviewing. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-19[IPV4]: fib_trie tnode stats refinementsRobert Olsson
* Prints the route tnode and set the stats level deepth as before. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18[NETFILTER]: Solve Kconfig dependency problemHarald Welte
As suggested by Roman Zippel. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18[IPV6]: Check connect(2) status for IPv6 UDP socket (Re: xfrm_lookup)Mitsuru KANDA
I think we should cache the per-socket route(dst_entry) only when the IPv6 UDP socket is connect(2)'ed. (which is same as IPv4 UDP send behavior) Signed-off-by: Mitsuru KANDA <mk@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18[DCCP]: Introduce CCID getsockopt for the CCIDsArnaldo Carvalho de Melo
Allocation for the optnames is similar to the DCCP options, with a range for rx and tx half connection CCIDs. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18[DCCP]: Don't use necessarily the same CCID for tx and rxArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18[CCID3]: Introduce include/linux/tfrc.hArnaldo Carvalho de Melo
Moving the TFRC sender and receiver variables to separate structs, so that we can copy these structs to userspace thru getsockopt, dccp_diag, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18[DCCP]: Move the ack vector code to net/dccp/ackvec.[ch]Arnaldo Carvalho de Melo
Isolating it, that will be used when we introduce a CCID2 (TCP-Like) implementation. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-17[NETFILTER] move nfnetlink options to right location in kconfig menuHarald Welte
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-17[NETFILTER] Fix Kconfig dependencies for nfnetlink/ctnetlinkHarald Welte
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-16[NETFILTER]: Fix oops in conntrack event cacheHarald Welte
ip_ct_refresh_acct() can be called without a valid "skb" pointer. This used to work, since ct_add_counters() deals with that fact. However, the recently-added event cache doesn't handle this at all. This patch is a quick fix that is supposed to be replaced soon by a cleaner solution during the pending redesign of the event cache. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-16[NETFILTER] CLUSTERIP: use a bitmap to store node responsibility dataKOVACS Krisztian
Instead of maintaining an array containing a list of nodes this instance is responsible for let's use a simple bitmap. This provides the following features: * clusterip_responsible() and the add_node()/delete_node() operations become very simple and don't need locking * the config structure is much smaller In spite of the completely different internal data representation the user-space interface remains almost unchanged; the only difference is that the proc file does not list nodes in the order they were added. (The target info structure remains the same.) Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>