summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-09-24rxrpc: Reinitialise the call ACK and timer state for client reply phaseDavid Howells
Clear the ACK reason, ACK timer and resend timer when entering the client reply phase when the first DATA packet is received. New ACKs will be proposed once the data is queued. The resend timer is no longer relevant and we need to cancel ACKs scheduled to probe for a lost reply. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Include the last reply DATA serial number in the final ACKDavid Howells
In a client call, include the serial number of the last DATA packet of the reply in the final ACK. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Send an immediate ACK if we fill in a holeDavid Howells
Send an immediate ACK if we fill in a hole in the buffer left by an out-of-sequence packet. This may allow the congestion management in the peer to avoid a retransmission if packets got reordered on the wire. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Send an ACK after every few DATA packets we receiveDavid Howells
Send an ACK if we haven't sent one for the last two packets we've received. This keeps the other end apprised of where we've got to - which is important if they're doing slow-start. We do this in recvmsg so that we can dispatch a packet directly without the need to wake up the background thread. This should possibly be made configurable in future. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24Merge tag 'rxrpc-rewrite-20160923' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Bug fixes and tracepoints Here are a bunch of bug fixes: (1) Need to set the timestamp on a Tx packet before queueing it to avoid trouble with the retransmission function. (2) Don't send an ACK at the end of the service reply transmission; it's the responsibility of the client to send an ACK to close the call. The service can resend the last DATA packet or send a PING ACK. (3) Wake sendmsg() on abnormal call termination. (4) Use ktime_add_ms() not ktime_add_ns() to add millisecond offsets. (5) Use before_eq() & co. to compare serial numbers (which may wrap). (6) Start the resend timer on DATA packet transmission. (7) Don't accidentally cancel a retransmission upon receiving a NACK. (8) Fix the call timer setting function to deal with timeouts that are now or past. (9) Don't use a flag to communicate the presence of the last packet in the Tx buffer from sendmsg to the input routines where ACK and DATA reception is handled. The problem is that there's a window between queueing the last packet for transmission and setting the flag in which ACKs or reply DATA packets can arrive, causing apparent state machine violation issues. Instead use the annotation buffer to mark the last packet and pick up and set the flag in the input routines. (10) Don't call the tx_ack tracepoint and don't allocate a serial number if someone else nicked the ACK we were about to transmit. There are also new tracepoints and one altered tracepoint used to track down the above bugs: (11) Call timer tracepoint. (12) Data Tx tracepoint (and adjustments to ACK tracepoint). (13) Injected Rx packet loss tracepoint. (14) Ack proposal tracepoint. (15) Retransmission selection tracepoint. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2016-09-23 Only two patches this time: 1) Fix a comment reference to struct xfrm_replay_state_esn. From Richard Guy Briggs. 2) Convert xfrm_state_lookup to rcu, we don't need the xfrm_state_lock anymore in the input path. From Florian Westphal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24net: Update API for VF vlan protocol 802.1ad supportMoshe Shemesh
Introduce new rtnl UAPI that exposes a list of vlans per VF, giving the ability for user-space application to specify it for the VF, as an option to support 802.1ad. We adjusted IP Link tool to support this option. For future use cases, the new UAPI supports multiple vlans. For now we limit the list size to a single vlan in kernel. Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward compatibility with older versions of IP Link tool. Add a vlan protocol parameter to the ndo_set_vf_vlan callback. We kept 802.1Q as the drivers' default vlan protocol. Suitable ip link tool command examples: Set vf vlan protocol 802.1ad: ip link set eth0 vf 1 vlan 100 proto 802.1ad Set vf to VST (802.1Q) mode: ip link set eth0 vf 1 vlan 100 proto 802.1Q Or by omitting the new parameter ip link set eth0 vf 1 vlan 100 Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23rxrpc: Add a tracepoint to log which packets will be retransmittedDavid Howells
Add a tracepoint to log in rxrpc_resend() which packets will be retransmitted. Note that if a positive ACK comes in whilst we have dropped the lock to retransmit another packet, the actual retransmission may not happen, though some of the effects will (such as altering the congestion management). Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add tracepoint for ACK proposalDavid Howells
Add a tracepoint to log proposed ACKs, including whether the proposal is used to update a pending ACK or is discarded in favour of an easlier, higher priority ACK. Whilst we're at it, get rid of the rxrpc_acks() function and access the name array directly. We do, however, need to validate the ACK reason number given to trace_rxrpc_rx_ack() to make sure we don't overrun the array. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add a tracepoint to log injected Rx packet lossDavid Howells
Add a tracepoint to log received packets that get discarded due to Rx packet loss. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add data Tx tracepoint and adjust Tx ACK tracepointDavid Howells
Add a tracepoint to log transmission of DATA packets (including loss injection). Adjust the ACK transmission tracepoint to include the packet serial number and to line this up with the DATA transmission display. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add a tracepoint for the call timerDavid Howells
Add a tracepoint to log call timer initiation, setting and expiry. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Don't call the tx_ack tracepoint if don't generate an ACKDavid Howells
rxrpc_send_call_packet() is invoking the tx_ack tracepoint before it checks whether there's an ACK to transmit (another thread may jump in and transmit it). Fix this by only invoking the tracepoint if we get a valid ACK to transmit. Further, only allocate a serial number if we're going to actually transmit something. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Pass the last Tx packet marker in the annotation bufferDavid Howells
When the last packet of data to be transmitted on a call is queued, tx_top is set and then the RXRPC_CALL_TX_LAST flag is set. Unfortunately, this leaves a race in the ACK processing side of things because the flag affects the interpretation of tx_top and also allows us to start receiving reply data before we've finished transmitting. To fix this, make the following changes: (1) rxrpc_queue_packet() now sets a marker in the annotation buffer instead of setting the RXRPC_CALL_TX_LAST flag. (2) rxrpc_rotate_tx_window() detects the marker and sets the flag in the same context as the routines that use it. (3) rxrpc_end_tx_phase() is simplified to just shift the call state. The Tx window must have been rotated before calling to discard the last packet. (4) rxrpc_receiving_reply() is added to handle the arrival of the first DATA packet of a reply to a client call (which is an implicit ACK of the Tx phase). (5) The last part of rxrpc_input_ack() is reordered to perform Tx rotation, then soft-ACK application and then to end the phase if we've rotated the last packet. In the event of a terminal ACK, the soft-ACK application will be skipped as nAcks should be 0. (6) rxrpc_input_ackall() now has to rotate as well as ending the phase. In addition: (7) Alter the transmit tracepoint to log the rotation of the last packet. (8) Remove the no-longer relevant queue_reqack tracepoint note. The ACK-REQUESTED packet header flag is now set as needed when we actually transmit the packet and may vary by retransmission. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Fix call timerDavid Howells
Fix the call timer in the following ways: (1) If call->resend_at or call->ack_at are before or equal to the current time, then ignore that timeout. (2) If call->expire_at is before or equal to the current time, then don't set the timer at all (possibly we should queue the call). (3) Don't skip modifying the timer if timer_pending() is true. This indicates that the timer is working, not that it has expired and is running/waiting to run its expiry handler. Also call rxrpc_set_timer() to start the call timer going rather than calling add_timer(). Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Fix accidental cancellation of scheduled resend by ACK parserDavid Howells
When rxrpc_input_soft_acks() is parsing the soft-ACKs from an ACK packet, it updates the Tx packet annotations in the annotation buffer. If a soft-ACK is an ACK, then we overwrite unack'd, nak'd or to-be-retransmitted states and that is fine; but if the soft-ACK is an NACK, we overwrite the to-be-retransmitted with a nak - which isn't. Instead, we need to let any scheduled retransmission stand if the packet was NAK'd. Note that we don't reissue a resend if the annotation is in the to-be-retransmitted state because someone else must've scheduled the resend already. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Need to start the resend timer on initial transmissionDavid Howells
When a DATA packet has its initial transmission, we may need to start or adjust the resend timer. Without this we end up relying on being sent a NACK to initiate the resend. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Use before_eq() and friends to compare serial numbersDavid Howells
before_eq() and friends should be used to compare serial numbers (when not checking for (non)equality) rather than casting to int, subtracting and checking the result. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23bpf: add helper to invalidate hashDaniel Borkmann
Add a small helper that complements 36bbef52c7eb ("bpf: direct packet write and access for helpers for clsact progs") for invalidating the current skb->hash after mangling on headers via direct packet write. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23bpf: use bpf_get_smp_processor_id_proto instead of raw oneDaniel Borkmann
Same motivation as in commit 80b48c445797 ("bpf: don't use raw processor id in generic helper"), but this time for XDP typed programs. Thus, allow for preemption checks when we have DEBUG_PREEMPT enabled, and otherwise use the raw variant. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23bpf: use skb_to_full_sk helper in bpf_skb_under_cgroupDaniel Borkmann
We need to use skb_to_full_sk() helper introduced in commit bd5eb35f16a9 ("xfrm: take care of request sockets") as otherwise we miss tcp synack messages, since ownership is on request socket and therefore it would miss the sk_fullsock() check. Use skb_to_full_sk() as also done similarly in the bpf_get_cgroup_classid() helper via 2309236c13fe ("cls_cgroup: get sk_classid only from full sockets") fix to not let this fall through. Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23net: dsa: add port fast ageingVivien Didelot
Today the DSA drivers are in charge of flushing the MAC addresses associated to a port when its STP state changes from Learning or Forwarding, to Disabled or Blocking or Listening. This makes the drivers more complex and hides the generic switch logic. Introduce a new optional port_fast_age operation to dsa_switch_ops, to move this logic to the DSA layer and keep drivers simple. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23net: dsa: add port STP state helperVivien Didelot
Add a void helper to set the STP state of a port, checking first if the required routine is provided by the driver. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23rxrpc: Should be using ktime_add_ms() not ktime_add_ns()David Howells
ktime_add_ms() should be used to add the resend time (in ms) rather than ktime_add_ns(). Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Make sure sendmsg() is woken on call completionDavid Howells
Make sure that sendmsg() gets woken up if the call it is waiting for completes abnormally. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Don't send an ACK at the end of service call response transmissionDavid Howells
Don't send an IDLE ACK at the end of the transmission of the response to a service call. The service end resends DATA packets until the client sends an ACK that hard-acks all the send data. At that point, the call is complete. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Preset timestamp on Tx sk_buffsDavid Howells
Set the timestamp on sk_buffs holding packets to be transmitted before queueing them because the moment the packet is on the queue it can be seen by the retransmission algorithm - which may see a completely random timestamp. If the retransmission algorithm sees such a timestamp, it may retransmit the packet and, in future, tell the congestion management algorithm that the retransmit timer expired. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23net_sched: sch_fq: account for schedule/timers driftsEric Dumazet
It looks like the following patch can make FQ very precise, even in VM or stressed hosts. It matters at high pacing rates. We take into account the difference between the time that was programmed when last packet was sent, and current time (a drift of tens of usecs is often observed) Add an EWMA of the unthrottle latency to help diagnostics. This latency is the difference between current time and oldest packet in delayed RB-tree. This accounts for the high resolution timer latency, but can be different under stress, as fq_check_throttled() can be opportunistically be called from a dequeue() called after an enqueue() for a different flow. Tested: // Start a 10Gbit flow $ netperf --google-pacing-rate 1250000000 -H lpaa24 -l 10000 -- -K bbr & Before patch : $ sar -n DEV 10 5 | grep eth0 | grep Average Average: eth0 17106.04 756876.84 1102.75 1119049.02 0.00 0.00 0.52 After patch : $ sar -n DEV 10 5 | grep eth0 | grep Average Average: eth0 17867.00 800245.90 1151.77 1183172.12 0.00 0.00 0.52 A new iproute2 tc can output the 'unthrottle latency' : $ tc -s qd sh dev eth0 | grep latency 0 gc, 0 highprio, 32490767 throttled, 2382 ns latency Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23sctp: fix the handling of SACK Gap Ack blocksMarcelo Ricardo Leitner
sctp_acked() is using 32bit arithmetics on 16bits vars, via TSN_lte() macros, which is weird and confusing. Once the offset to ctsn is calculated, all wrapping is already handled and thus to verify the Gap Ack blocks we can just use pure less/big-or-equal than checks. Also, rename gap variable to tsn_offset, so it's more meaningful, as it doesn't point to any gap at all. Even so, I don't think this discrepancy resulted in any practical bug. This patch is a preparation for the next one, which will introduce typecheck() for TSN_lte() macros and would cause a compile error here. Suggested-by: David Laight <David.Laight@ACULAB.COM> Reported-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23net_sched: check NULL on error path in route4_change()WANG Cong
On error path in route4_change(), 'f' could be NULL, so we should check NULL before calling tcf_exts_destroy(). Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()") Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Mostly small bits scattered all over the place, which is usually how things go this late in the -rc series. 1) Proper driver init device resets in bnx2, from Baoquan He. 2) Fix accounting overflow in __tcp_retransmit_skb(), sk_forward_alloc, and ip_idents_reserve, from Eric Dumazet. 3) Fix crash in bna driver ethtool stats handling, from Ivan Vecera. 4) Missing check of skb_linearize() return value in mac80211, from Johannes Berg. 5) Endianness fix in nf_table_trace dumps, from Liping Zhang. 6) SSN comparison fix in SCTP, from Marcelo Ricardo Leitner. 7) Update DSA and b44 MAINTAINERS entries. 8) Make input path of vti6 driver work again, from Nicolas Dichtel. 9) Off-by-one in mlx4, from Sebastian Ott. 10) Fix fallback route lookup handling in ipv6, from Vincent Bernat. 11) Fix stack corruption on probe in qed driver, from Yuval Mintz. 12) PHY init fixes in r8152 from Hayes Wang. 13) Missing SKB free in irda_accept error path, from Phil Turnbull" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits) tcp: properly account Fast Open SYN-ACK retrans tcp: fix under-accounting retransmit SNMP counters MAINTAINERS: Update b44 maintainer. net: get rid of an signed integer overflow in ip_idents_reserve() net/mlx4_core: Fix to clean devlink resources net: can: ifi: Configure transmitter delay vti6: fix input path ipmr, ip6mr: return lastuse relative to now r8152: disable ALDPS and EEE before setting PHY r8152: remove r8153_enable_eee r8152: move PHY settings to hw_phy_cfg r8152: move enabling PHY r8152: move some functions cxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter qed: Fix stack corruption on probe MAINTAINERS: Add an entry for the core network DSA code net: ipv6: fallback to full lookup if table lookup is unsuitable net/mlx5: E-Switch, Handle mode change failures net/mlx5: E-Switch, Fix error flow in the SRIOV e-switch init code net/mlx5: Fix flow counter bulk command out mailbox allocation ...
2016-09-22Merge tag 'rxrpc-rewrite-20160922-v2' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Preparation for slow-start algorithm [ver #2] Here are some patches that prepare for improvements in ACK generation and for the implementation of the slow-start part of the protocol: (1) Stop storing the protocol header in the Tx socket buffers, but rather generate it on the fly. This potentially saves a little space and makes it easier to alter the header just before transmission (the flags may get altered and the serial number has to be changed). (2) Mask off the Tx buffer annotations and add a flag to record which ones have already been resent. (3) Track RTT on a per-peer basis for use in future changes. Tracepoints are added to log this. (4) Send PING ACKs in response to incoming calls to elicit a PING-RESPONSE ACK from which RTT data can be calculated. The response also carries other useful information. (5) Expedite PING-RESPONSE ACK generation from sendmsg. If we're actively using sendmsg, this allows us, under some circumstances, to avoid having to rely on the background work item to run to generate this ACK. This requires ktime_sub_ms() to be added. (6) Set the REQUEST-ACK flag on some DATA packets to elicit ACK-REQUESTED ACKs from which RTT data can be calculated. (7) Limit the use of pings and ACK requests for RTT determination. Changes: (V2) Don't use the C division operator for 64-bit division. One instance should use do_div() and the other should be using nsecs_to_jiffies(). The last two patches got transposed, leading to an undefined symbol in one of them. Reported-by: kbuild test robot <lkp@intel.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22rxrpc: Reduce the number of PING ACKs sentDavid Howells
We don't want to send a PING ACK for every new incoming call as that just adds to the network traffic. Instead, we send a PING ACK to the first three that we receive and then once per second thereafter. This could probably be made adjustable in future. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-22rxrpc: Reduce the number of ACK-Requests sentDavid Howells
Reduce the number of ACK-Requests we set on DATA packets that we're sending to reduce network traffic. We set the flag on odd-numbered DATA packets to start off the RTT cache until we have at least three entries in it and then probe once per second thereafter to keep it topped up. This could be made tunable in future. Note that from this point, the RXRPC_REQUEST_ACK flag is set on DATA packets as we transmit them and not stored statically in the sk_buff. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-22tcp: properly account Fast Open SYN-ACK retransYuchung Cheng
Since the TFO socket is accepted right off SYN-data, the socket owner can call getsockopt(TCP_INFO) to collect ongoing SYN-ACK retransmission or timeout stats (i.e., tcpi_total_retrans, tcpi_retransmits). Currently those stats are only updated upon handshake completes. This patch fixes it. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22tcp: fix under-accounting retransmit SNMP countersYuchung Cheng
This patch fixes these under-accounting SNMP rtx stats LINUX_MIB_TCPFORWARDRETRANS LINUX_MIB_TCPFASTRETRANS LINUX_MIB_TCPSLOWSTARTRETRANS when retransmitting TSO packets Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22rxrpc: Obtain RTT data by requesting ACKs on DATA packetsDavid Howells
In addition to sending a PING ACK to gain RTT data, we can set the RXRPC_REQUEST_ACK flag on a DATA packet and get a REQUESTED-ACK ACK. The ACK packet contains the serial number of the packet it is in response to, so we can look through the Tx buffer for a matching DATA packet. This requires that the data packets be stamped with the time of transmission as a ktime rather than having the resend_at time in jiffies. This further requires the resend code to do the resend determination in ktimes and convert to jiffies to set the timer. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-22rxrpc: Expedite ping response transmissionDavid Howells
Expedite the transmission of a response to a PING ACK by sending it from sendmsg if one is pending. We're most likely to see a PING ACK during the client call Tx phase as the other side may use it to determine a number of parameters, such as the client's receive window size, the RTT and whether the client is doing slow start (similar to RFC5681). If we don't expedite it, it's left to the background processing thread to transmit. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-22rxrpc: Send pings to get RTT dataDavid Howells
Send a PING ACK packet to the peer when we get a new incoming call from a peer we don't have a record for. The PING RESPONSE ACK packet will tell us the following about the peer: (1) its receive window size (2) its MTU sizes (3) its support for jumbo DATA packets (4) if it supports slow start (similar to RFC 5681) (5) an estimate of the RTT This is necessary because the peer won't normally send us an ACK until it gets to the Rx phase and we send it a packet, but we would like to know some of this information before we start sending packets. A pair of tracepoints are added so that RTT determination can be observed. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-22sctp: make use of SCTP_TRUNC4 macroMarcelo Ricardo Leitner
And avoid the usage of '&~3'. This is the last place still not using the macro. Also break the line to make it easier to read. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22sctp: rename WORD_TRUNC/ROUND macrosMarcelo Ricardo Leitner
To something more meaningful these days, specially because this is working on packet headers or lengths and which are not tied to any CPU arch but to the protocol itself. So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_PAD4. Reported-by: David Laight <David.Laight@ACULAB.COM> Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2016-09-21 1) Propagate errors on security context allocation. From Mathias Krause. 2) Fix inbound policy checks for inter address family tunnels. From Thomas Zeitlhofer. 3) Fix an old memory leak on aead algorithm usage. From Ilan Tayari. 4) A recent patch fixed a possible NULL pointer dereference but broke the vti6 input path. Fix from Nicolas Dichtel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22tcp: implement TSQ for retransmitsEric Dumazet
We saw sch_fq drops caused by the per flow limit of 100 packets and TCP when dealing with large cwnd and bursts of retransmits. Even after increasing the limit to 1000, and even after commit 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time"), we can still have these drops. Under certain conditions, TCP can spend a considerable amount of time queuing thousands of skbs in a single tcp_xmit_retransmit_queue() invocation, incurring latency spikes and stalls of other softirq handlers. This patch implements TSQ for retransmits, limiting number of packets and giving more chance for scheduling packets in both ways. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22net: get rid of an signed integer overflow in ip_idents_reserve()Eric Dumazet
Jiri Pirko reported an UBSAN warning happening in ip_idents_reserve() [] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11 [] signed integer overflow: [] -2117905507 + -695755206 cannot be represented in type 'int' Since we do not have uatomic_add_return() yet, use atomic_cmpxchg() so that the arithmetics can be done using unsigned int. Fixes: 04ca6973f7c1 ("ip: make IP identifiers less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22net: skbuff: Coding: Use eth_type_vlan() instead of open coding itShmulik Ladkani
Fix 'skb_vlan_pop' to use eth_type_vlan instead of directly comparing skb->protocol to ETH_P_8021Q or ETH_P_8021AD. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22net: skbuff: Remove errornous length validation in skb_vlan_pop()Shmulik Ladkani
In 93515d53b1 "net: move vlan pop/push functions into common code" skb_vlan_pop was moved from its private location in openvswitch to skbuff common code. In case skb has non hw-accel vlan tag, the original 'pop_vlan()' assured that skb->len is sufficient (if skb->len < VLAN_ETH_HLEN then pop was considered a no-op). This validation was moved as is into the new common 'skb_vlan_pop'. Alas, in its original location (openvswitch), there was a guarantee that 'data' points to the mac_header, therefore the 'skb->len < VLAN_ETH_HLEN' condition made sense. However there's no such guarantee in the generic 'skb_vlan_pop'. For short packets received in rx path going through 'skb_vlan_pop', this causes 'skb_vlan_pop' to fail pop-ing a valid vlan hdr (in the non hw-accel case) or to fail moving next tag into hw-accel tag. Remove the 'skb->len < VLAN_ETH_HLEN' condition entirely: It is superfluous since inner '__skb_vlan_pop' already verifies there are VLAN_ETH_HLEN writable bytes at the mac_header. Note this presents a slight change to skb_vlan_pop() users: In case total length is smaller than VLAN_ETH_HLEN, skb_vlan_pop() now returns an error, as opposed to previous "no-op" behavior. Existing callers (e.g. tc act vlan, ovs) usually drop the packet if 'skb_vlan_pop' fails. Fixes: 93515d53b1 ("net: move vlan pop/push functions into common code") Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: Pravin Shelar <pshelar@ovn.org> Reviewed-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan actionShmulik Ladkani
TCA_VLAN_ACT_MODIFY allows one to change an existing tag. It accepts same attributes as TCA_VLAN_ACT_PUSH (protocol, id, priority). If packet is vlan tagged, then the tag gets overwritten according to user specified attributes. For example, this allows user to replace a tag's vid while preserving its priority bits (as opposed to "action vlan pop pipe action vlan push"). Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22net: skbuff: Export __skb_vlan_popShmulik Ladkani
This exports the functionality of extracting the tag from the payload, without moving next vlan tag into hw accel tag. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22rxrpc: Add per-peer RTT trackerDavid Howells
Add a function to track the average RTT for a peer. Sources of RTT data will be added in subsequent patches. The RTT data will be useful in the future for determining resend timeouts and for handling the slow-start part of the Rx protocol. Also add a pair of tracepoints, one to log transmissions to elicit a response for RTT purposes and one to log responses that contribute RTT data. Signed-off-by: David Howells <dhowells@redhat.com>