Age | Commit message | Author |
|
Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.
This adds the ip_nonlocal_bind sysctl under ipv6.
Testing:
Set up nonlocal binding and receive routing on a host, e.g.:
ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1
Set up routing to 2001:0:0:1::/64 on peer to go to first host
ping6 -I 2001:0:0:1::1 peer-address -- to verify
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric Dumazet says:
====================
inet: timewait cleanups
Another round of patches to make tw handling simpler.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
inet_twsk_deschedule() calls are followed by inet_twsk_put().
The only special case is in inet_twsk_purge(), but there is no point
in deferring the inet_twsk_put() until after re-enabling BH.
Let's rename inet_twsk_deschedule() to inet_twsk_deschedule_put()
and move the inet_twsk_put() inside.
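The effect of the rename can be illustrated with a small user-space model. The names below (`tw_model`, `tw_put`, `tw_deschedule`, `tw_deschedule_put`) are hypothetical stand-ins rather than the kernel's types, and the assumption that the armed timer holds one reference is made only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a timewait socket. */
struct tw_model {
	int refcnt;        /* outstanding references */
	bool timer_armed;  /* assumption: an armed timer holds one reference */
	bool freed;        /* set when the last reference is dropped */
};

/* Drop one reference; mark the object freed when the count hits zero. */
static void tw_put(struct tw_model *tw)
{
	assert(tw->refcnt > 0);
	if (--tw->refcnt == 0)
		tw->freed = true;
}

/* Cancel the timer and drop the reference it held, if it was armed. */
static void tw_deschedule(struct tw_model *tw)
{
	if (tw->timer_armed) {
		tw->timer_armed = false;
		tw_put(tw);
	}
}

/* Combined helper: callers no longer need a separate put afterwards. */
static void tw_deschedule_put(struct tw_model *tw)
{
	tw_deschedule(tw);
	tw_put(tw);  /* the caller's own reference */
}
```

With the put folded in, a caller holding one reference on a socket with an armed timer makes a single call and the object is released.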
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
timewait sockets have a complex refcounting logic.
Once we realize it should be similar to established and
syn_recv sockets, we can use sk_nulls_del_node_init_rcu()
and remove inet_twsk_unhash()
In particular, deferred inet_twsk_put() added in commit
13475a30b66cd ("tcp: connect() race with timewait reuse")
looks unnecessary: when removing a timewait socket from
ehash or bhash, caller must own a reference on the socket
anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Kernel will crash the same if one of the pointers is NULL anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hop was always either 0 or sizeof(struct ipv6hdr).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MV88E6320 and MV88E6321 are largely compatible with MV88E6352,
but are members of a different chip family.
Signed-off-by: Aleksey S. Kazantsev <ioctl@yandex.ru>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Yuchung Cheng says:
====================
tcp: fixes some congestion control corner cases
This patch series fixes corner cases of TCP congestion control.
The first fix avoids continuing slow start when cwnd reaches ssthresh.
The second fixes the incorrect processing order of congestion state and
cwnd update when entering fast recovery or undoing cwnd.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The congestion state and cwnd can be updated in the wrong order.
For example, upon receiving a dubious ACK, we incorrectly raise
the cwnd first (tcp_may_raise_cwnd()/tcp_cong_avoid()) because
the state is still Open, then enter recovery state to reduce cwnd.
For another example, if the ACK indicates spurious timeout or
retransmits, we first revert the cwnd reduction and congestion
state back to Open state. But we don't raise the cwnd even though
the ACK does not indicate any congestion.
To fix this problem we should first call tcp_fastretrans_alert() to
process the dubious ACK and update the congestion state, then call
tcp_may_raise_cwnd() that raises cwnd based on the current state.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the original design slow start is only used to raise cwnd
when cwnd is strictly below ssthresh. It makes little sense
to slow start when cwnd == ssthresh: especially
when hystart has set ssthresh in the initial ramp, or after
recovery when cwnd resets to ssthresh. Not doing so will
also help reduce the buffer bloat slightly.
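The boundary condition can be sketched as a one-line predicate. The struct and function names here (`cc_model`, `in_slow_start`) are hypothetical and only model the two fields the message talks about.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the two fields involved. */
struct cc_model {
	uint32_t cwnd;      /* congestion window, in packets */
	uint32_t ssthresh;  /* slow start threshold, in packets */
};

/* Slow start applies only while cwnd is strictly below ssthresh. */
static bool in_slow_start(const struct cc_model *cc)
{
	return cc->cwnd < cc->ssthresh;
}
```

At cwnd == ssthresh the predicate is false, so the sender moves to congestion avoidance instead of taking one more slow start step.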
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a helper to test the slow start condition in various congestion
control modules and other places. This is to prepare a slight improvement
in policy as to exactly when to slow start.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change makes it so that the call skb_defer_rx_timestamp will first
check for a phydev before going in and manipulating the skb->data and
skb->len values. By doing this we can avoid unnecessary work on network
devices that don't support phydev. As a result we reduce the total
instruction count needed to process this on most devices.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
V1 of this patch contains Eric Dumazet's suggestion to move the per
dst RTAX_QUICKACK check into tcp_in_quickack_mode(). Thanks Eric.
I ran some tests and after setting the "ip route change quickack 1"
knob there were still many delayed ACKs sent. This occurred
because when icsk_ack.quick=0 the !icsk_ack.pingpong value is
subsequently ignored as tcp_in_quickack_mode() checks both these
values. The condition for a quick ack to trigger requires
that both icsk_ack.quick != 0 and icsk_ack.pingpong=0. Currently
only icsk_ack.pingpong is controlled by the knob. But the
icsk_ack.quick value changes dynamically depending on heuristics.
The crux of the matter is that delayed acks still cannot be entirely
disabled even with the RTAX_QUICKACK per dst knob enabled. This
patch ensures that a quick ack is always sent when the RTAX_QUICKACK
per dst knob is turned on.
The "ip route change quickack 1" knob was recently added to enable
quickacks. It was modeled around the TCP_QUICKACK setsockopt() option.
The issue is that even with "ip route change quickack 1" enabled
we still see delayed ACKs under some conditions. It would be nice
to be able to completely disable delayed ACKs.
Here is an example:
# netstat -s|grep dela
3 delayed acks sent
For all routes enable the knob
# ip route change quickack 1
Generate some traffic across a slow link and we still see the delayed
acks.
# netstat -s|grep dela
106 delayed acks sent
1 delayed acks further delayed because of locked socket
The issue is that both the "ip route change quickack 1" knob and
the TCP_QUICKACK option set the icsk_ack.pingpong variable to 0.
However at the business end in the __tcp_ack_snd_check() routine,
tcp_in_quickack_mode() checks that both icsk_ack.quick != 0
and icsk_ack.pingpong=0 in order to trigger a quickack. As
icsk_ack.quick is determined by heuristics it can be 0. When
that occurs the icsk_ack.pingpong value is ignored and a delayed
ACK is sent regardless.
This patch moves the RTAX_QUICKACK per dst check into the
tcp_in_quickack_mode() routine which ensures that a quickack is
always sent when the quickack knob is enabled for that dst.
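The before/after behaviour described above can be sketched with a simplified model. The struct `ack_model` and both function names are hypothetical; the fields mirror icsk_ack.quick, icsk_ack.pingpong and the per-dst RTAX_QUICKACK metric named in the message.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the state named in the message. */
struct ack_model {
	int quick;          /* icsk_ack.quick: heuristic quick-ack budget */
	bool pingpong;      /* icsk_ack.pingpong: interactive session hint */
	bool dst_quickack;  /* RTAX_QUICKACK metric set on the route */
};

/* Before: the route knob only cleared pingpong, so quick == 0 still
 * resulted in a delayed ACK. */
static bool in_quickack_mode_old(const struct ack_model *a)
{
	return a->quick && !a->pingpong;
}

/* After: the per-dst knob is checked here and is sufficient on its own. */
static bool in_quickack_mode_new(const struct ack_model *a)
{
	return a->dst_quickack || (a->quick && !a->pingpong);
}
```

With quick == 0 and the route knob set, the old predicate reports delayed-ACK mode while the new one reports quick-ack mode, which is the case the netstat counters above expose.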
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement ndo_change_mtu: on MTU change, reallocate Rx ring bufs and signal
HW of new port MTU value.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Tested-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use module_pci_driver for drivers whose init and exit functions
only register and unregister, respectively.
A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:
@a@
identifier f, x;
@@
-static f(...) { return pci_register_driver(&x); }
@b depends on a@
identifier e, a.x;
statement S;
@@
-static e(...) {
-pci_unregister_driver(&x);
-DBG_PRINT(INIT_DBG,"S");
- }
@c depends on a && b@
identifier a.f;
declarer name module_init;
@@
-module_init(f);
@d depends on a && b && c@
identifier b.e, a.x;
declarer name module_exit;
declarer name module_pci_driver;
@@
-module_exit(e);
+module_pci_driver(x);
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If we don't have access to the new User GTS (T5+), use the old doorbell
mechanism; otherwise use the new BAR2 mechanism.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently "ALU_END_FROM_BE 32" and "ALU_END_FROM_LE 32" do not test if
the upper bits of the result are zeros (the arm64 JIT had such bugs).
Extend the two tests to catch this.
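What the extended tests check can be sketched in plain C. The helper name `swap32_zero_extend` is hypothetical, and the sketch shows only the case where the conversion really swaps bytes; on a host whose byte order already matches, the conversion would leave the low 32 bits unchanged but must still clear the upper half.

```c
#include <stdint.h>

/* Byte-swap the low 32 bits of a 64-bit register value and zero-extend
 * the result, as a 32-bit endianness conversion is expected to do. */
static uint64_t swap32_zero_extend(uint64_t reg)
{
	uint32_t lo = (uint32_t)reg;  /* the upper 32 bits are discarded */
	uint32_t swapped = (lo >> 24) |
			   ((lo >> 8) & 0x0000ff00u) |
			   ((lo << 8) & 0x00ff0000u) |
			   (lo << 24);

	return (uint64_t)swapped;     /* upper 32 bits of the result are zero */
}
```

Feeding a value with non-zero upper bits, such as 0x0123456789abcdef, is what lets a test catch a JIT that swaps the low word but leaves stale data in the high word.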
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hariprasad Shenai says:
====================
Cleanup, T6 changes and register range update
This patch series adds the following:
Don't use entire L2T table, update register ranges for T6 adapter,
read stats for only available channels for T6 and enable cim_la dump for
T6 adapter also.
This patch series has been created against net-next tree and includes
patches on cxgb4 driver.
We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update the driver to read the stats of only the available channels. T6 and
later have only 2 channels.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver was retrieving the parameters for the bounds of its
slice of the L2T from the firmware and then throwing those away and
using the entire table. This corrects that problem.
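As a rough illustration, sizing the table from the firmware-provided bounds instead of the full hardware table could look like this. The function and parameter names are hypothetical, and treating both bounds as inclusive is an assumption of this sketch.

```c
#include <assert.h>

/* Number of L2T entries in the driver's slice, given the bounds reported
 * by firmware. Assumption: both bounds are inclusive. */
static unsigned int l2t_slice_entries(unsigned int l2t_start,
				      unsigned int l2t_end)
{
	assert(l2t_end >= l2t_start);
	return l2t_end - l2t_start + 1;
}
```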
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use module_pci_driver for drivers whose init and exit functions
only register and unregister, respectively.
A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:
@a@
identifier f, x;
@@
-static f(...) { return pci_register_driver(&x); }
@b depends on a@
identifier e, a.x;
@@
-static e(...) { pci_unregister_driver(&x); }
@c depends on a && b@
identifier a.f;
declarer name module_init;
@@
-module_init(f);
@d depends on a && b && c@
identifier b.e, a.x;
declarer name module_exit;
declarer name module_pci_driver;
@@
-module_exit(e);
+module_pci_driver(x);
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When packet encapsulation is in use, the MTU needs to be reduced for
headroom reservation.
The existing code takes the updated MTU value only from the host side.
But vSwitch extensions, such as Open vSwitch, require the flexibility
to change the MTU to different values from within a guest during the
lifecycle of a vNIC, when the encapsulation protocol is changed. The
patch supports this kind of MTU changes.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add multiqueue capabilities to ifb netdevice.
This removes the last bottleneck for ingress when the mq qdisc can be used
to shard load from multiple RX queues on physical device.
Tested:
# netem based setup, installed at receiver side
ETH=eth0
IFB=ifb10
EST="est 1sec 4sec" # Optional rate estimator
RTT_HALF=2ms
#REORDER=20us
#LOSS="loss 1"
TXQ=8
ip link add ifb10 numtxqueues $TXQ type ifb
ip link set dev $IFB up
tc qdisc add dev $ETH ingress 2>/dev/null
tc filter add dev $ETH parent ffff: \
protocol ip u32 match u32 0 0 flowid 1:1 \
action mirred egress redirect dev $IFB
tc qdisc del dev $IFB root 2>/dev/null
tc qdisc add dev $IFB root handle 1: mq
for i in `seq 1 $TXQ`
do
slot=$( printf %x $(( i )) )
tc qd add dev $IFB parent 1:$slot $EST netem \
limit 100000 delay $RTT_HALF $REORDER $LOSS
done
lpaa24:~# tc -s -d qd sh dev ifb10
qdisc mq 1: root
Sent 316544766 bytes 5265927 pkt (dropped 0, overlimits 0 requeues 0)
backlog 98880b 1648p requeues 0
qdisc netem 8002: parent 1:1 limit 100000 delay 2.0ms
Sent 39601416 bytes 658721 pkt (dropped 0, overlimits 0 requeues 0)
rate 38235Kbit 79657pps backlog 12240b 204p requeues 0
qdisc netem 8003: parent 1:2 limit 100000 delay 2.0ms
Sent 39472866 bytes 657227 pkt (dropped 0, overlimits 0 requeues 0)
rate 38234Kbit 79655pps backlog 10620b 176p requeues 0
qdisc netem 8004: parent 1:3 limit 100000 delay 2.0ms
Sent 39703417 bytes 659699 pkt (dropped 0, overlimits 0 requeues 0)
rate 38320Kbit 79831pps backlog 12780b 213p requeues 0
qdisc netem 8005: parent 1:4 limit 100000 delay 2.0ms
Sent 39565149 bytes 658011 pkt (dropped 0, overlimits 0 requeues 0)
rate 38174Kbit 79530pps backlog 11880b 198p requeues 0
qdisc netem 8006: parent 1:5 limit 100000 delay 2.0ms
Sent 39506078 bytes 657354 pkt (dropped 0, overlimits 0 requeues 0)
rate 38195Kbit 79571pps backlog 12480b 208p requeues 0
qdisc netem 8007: parent 1:6 limit 100000 delay 2.0ms
Sent 39675994 bytes 658849 pkt (dropped 0, overlimits 0 requeues 0)
rate 38323Kbit 79838pps backlog 12600b 210p requeues 0
qdisc netem 8008: parent 1:7 limit 100000 delay 2.0ms
Sent 39532042 bytes 658367 pkt (dropped 0, overlimits 0 requeues 0)
rate 38177Kbit 79536pps backlog 13140b 219p requeues 0
qdisc netem 8009: parent 1:8 limit 100000 delay 2.0ms
Sent 39488164 bytes 657705 pkt (dropped 0, overlimits 0 requeues 0)
rate 38192Kbit 79568pps backlog 13Kb 222p requeues 0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add extra check for total vfs for SRIOV to check if that value is
bigger than total vfs in pci SRIOV capabilities. Fix a check and
print of the number of maximum vfs that hw can handle. Fix a check
and print of the number of maximum vfs per port that driver can handle.
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The trace bpf samples do not compile on s390x because they use x86
specific fields from the "pt_regs" structure.
Fix this and access the fields via new PT_REGS macros.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Enable SG support for Zynq SOC family devices.
Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are two duplicated xenvif_zerocopy_callback() definitions.
Remove one of them.
Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric Dumazet says:
====================
net_sched: act: lockless operation
As mentioned by Alexei last week in Budapest, it is a bit weird
to take a spinlock in order to drop a packet in a tc filter...
Let's add percpu infra for tc actions and use it for gact & mirred.
Before changes, my host with 8 RX queues was handling 5 Mpps with gact,
and more than 11 Mpps after.
Mirred change is not yet visible if ifb+qdisc is used, as ifb is
not yet multi queue enabled, but is a step forward.
====================
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Like act_gact, act_mirred can be lockless in packet processing
1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) use rcu to protect tcfm_dev
4) Remove spinlock usage, as it is no longer needed.
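Steps 1 and 2 can be sketched in user space as follows. All names (`action_model`, `stats_update`, `lastuse_update`, `stats_fold`) and the fixed CPU count are hypothetical; the sketch is single-threaded and only shows the data layout that makes the lock unnecessary, not real per-CPU allocation or RCU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4  /* hypothetical CPU count for this sketch */

struct pcpu_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Hypothetical action: one stats slot per CPU plus a last-use timestamp. */
struct action_model {
	struct pcpu_stats cpu[MODEL_NR_CPUS];
	unsigned long lastuse;
};

/* Fast path: each CPU touches only its own slot, so no lock is needed. */
static void stats_update(struct action_model *a, unsigned int cpu,
			 uint64_t len)
{
	assert(cpu < MODEL_NR_CPUS);
	a->cpu[cpu].packets++;
	a->cpu[cpu].bytes += len;
}

/* Store the timestamp only when the tick changed, so the shared cache
 * line is not dirtied on every packet. Returns true if it wrote. */
static bool lastuse_update(struct action_model *a, unsigned long now)
{
	if (a->lastuse == now)
		return false;
	a->lastuse = now;
	return true;
}

/* Slow path (a stats dump): fold the per-CPU slots into one total. */
static struct pcpu_stats stats_fold(const struct action_model *a)
{
	struct pcpu_stats sum = { 0, 0 };

	for (unsigned int i = 0; i < MODEL_NR_CPUS; i++) {
		sum.packets += a->cpu[i].packets;
		sum.bytes += a->cpu[i].bytes;
	}
	return sum;
}
```

The cost moves from the per-packet path to the rare reader, which has to walk every CPU's slot to produce a total.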
Next step : add multi queue capability to ifb device
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Final step for gact RCU operation :
1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) Remove spinlock acquisition, as it is no longer needed.
Since this is the last contended lock in packet RX when tc gact is used,
this gives impressive gain.
My host with 8 RX queues was handling 5 Mpps before the patch,
and more than 11 Mpps after patch.
Tested:
On receiver :
dev=eth0
tc qdisc del dev $dev ingress 2>/dev/null
tc qdisc add dev $dev ingress
tc filter del dev $dev root pref 10 2>/dev/null
tc filter del dev $dev pref 10 2>/dev/null
tc filter add dev $dev est 1sec 4sec parent ffff: protocol ip prio 1 \
u32 match ip src 7.0.0.0/8 flowid 1:15 action drop
Sender sends a packet flood from the 7/8 network
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Third step for gact RCU operation :
Following patch will get rid of spinlock protection,
so we need to read tcfg_ptype once.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Second step for gact RCU operation :
We want to get rid of the spinlock protecting gact operations.
Stats (packets/bytes) will soon be per cpu.
gact_determ() would not work without a central packet counter,
so let's add it for this mode.
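Why the deterministic mode needs one shared counter can be seen in a small sketch using C11 atomics. The names (`gact_model`, `gact_determ_model`, `ACT_DEFAULT`, `ACT_ALT`) are hypothetical; the assumption is that the alternate action is taken on every pval-th packet, which only works if all CPUs count against the same number.

```c
#include <assert.h>
#include <stdatomic.h>

enum { ACT_DEFAULT = 0, ACT_ALT = 1 };

/* Hypothetical model of the deterministic mode's state. */
struct gact_model {
	atomic_uint packets;  /* central counter shared by all CPUs */
	unsigned int pval;    /* alternate action every pval-th packet, >= 1 */
};

/* Pick the action for one packet. The atomic increment replaces the
 * spinlock-protected counter; per-CPU counters would each see only a
 * fraction of the traffic and break the "every N-th packet" rule. */
static int gact_determ_model(struct gact_model *g)
{
	unsigned int n;

	assert(g->pval >= 1);
	n = atomic_fetch_add(&g->packets, 1) + 1;
	return (n % g->pval) ? ACT_DEFAULT : ACT_ALT;
}
```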
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
First step for gact RCU operation :
Instead of testing if tcfg_pval is zero or not, just make it 1.
No change in behavior, but slightly faster code.
The smp_rmb()/smp_wmb() barriers, while not strictly needed at this
stage are added for upcoming spinlock removal.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reuse existing percpu infrastructure John Fastabend added for qdisc.
This patch adds a new cpustats parameter to tcf_hash_create() and all
actions pass false, meaning this patch should have no effect yet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
qdisc_bstats_update_cpu() and other helpers were added to support
percpu stats for qdisc.
We want to add percpu stats for tc action, so this patch adds common
helpers.
qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update()
qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Mellanox driver has the knowledge if rxhash is a L4 hash,
if it receives a non fragmented TCP or UDP frame and
NETIF_F_RXCSUM is enabled on netdev.
ip_summed value is CHECKSUM_UNNECESSARY in this case.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Ido Shamay <idos@mellanox.com>
Acked-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Yuchung Cheng says:
====================
tcp: reducing lost retransmits in recovery
This patch series reduces lost retransmits in recovery, in particular
when dealing with traffic policers. The main problem is that
slow start in recovery under policing can cause massive loss and
retransmit storms: any excess sending rate turns into drops. The
solution is to avoid doing slow start when lost retransmit is
detected and use packet conservation instead.
On networks with traffic policers the patches have lowered the
TCP loss rates by ~20% from Google servers without latency regressions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PRR slow start is often too aggressive especially when drops are
caused by traffic policers. The policers mainly use token bucket
to enforce the rate so sending (twice) faster than the delivery
rate causes excessive drops.
This patch changes PRR to the conservative reduction bound
(CRB) mode in RFC 6937 by default. CRB follows the packet
conservation rule to send at most the delivery rate by default.
But if many packets are lost and the pipe is empty, CRB may take N
round trips to repair N losses. We conditionally turn on slow start
mode if all these conditions are met, to speed up the recovery:
1) on the second round or later in recovery
2) retransmission sent in the previous round is delivered on this ACK
3) no retransmission is marked lost on this ACK
By using packet conservation by default, this change reduces lost
retransmits significantly on networks that deploy traffic policers,
up to 20% reduction of overall loss rate.
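The decision described above can be sketched as a function that computes how many packets one ACK allows the sender to transmit. This is a simplified model in units of packets, not the patch itself: the struct, the function names and the two boolean parameters are hypothetical, and condition 1 is assumed to be implied by a retransmission having been delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical recovery state, all in packets. */
struct prr_model {
	uint32_t ssthresh;       /* target cwnd at the end of recovery */
	uint32_t prior_cwnd;     /* cwnd when recovery started */
	uint32_t prr_delivered;  /* delivered during recovery, incl. this ACK */
	uint32_t prr_out;        /* sent during recovery so far */
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Packets the sender may transmit in response to this ACK. */
static int prr_sndcnt(const struct prr_model *p, int pipe,
		      int newly_delivered, bool retrans_acked,
		      bool lost_retrans)
{
	int delta = (int)p->ssthresh - pipe;
	int sndcnt;

	assert(p->prior_cwnd > 0);
	if (delta < 0) {
		/* Proportional part: ceil(delivered * ssthresh / prior_cwnd)
		 * minus what was already sent. */
		uint64_t dividend = (uint64_t)p->ssthresh * p->prr_delivered +
				    p->prior_cwnd - 1;
		sndcnt = (int)(dividend / p->prior_cwnd) - (int)p->prr_out;
	} else if (retrans_acked && !lost_retrans) {
		/* Slow start mode: allowed only when a retransmission was
		 * delivered and none was marked lost on this ACK. */
		sndcnt = min_int(delta,
				 max_int((int)p->prr_delivered -
					 (int)p->prr_out,
					 newly_delivered) + 1);
	} else {
		/* Default: packet conservation, send what was delivered. */
		sndcnt = min_int(delta, newly_delivered);
	}
	return max_int(sndcnt, 0);
}
```

In the default branch the sender never transmits more than was just delivered, which is what keeps it at or below the policer's rate.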
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the retransmission in CA_Loss is lost again, we should not
continue to slow start or raise cwnd in congestion avoidance mode.
Instead we should enter fast recovery and use PRR to reduce cwnd,
following the principle in RFC5681:
"... or the loss of a retransmission, should be taken as two
indications of congestion and, therefore, cwnd (and ssthresh) MUST
be lowered twice in this case."
This is especially important to reduce loss when the CA_Loss
state was caused by a traffic policer dropping the entire inflight.
The CA_Loss state has a problem where a loss of L packets causes the
sender to send a burst of L packets. So a policer that's dropping
most packets in a given RTT can cause a huge retransmit storm. By
contrast, PRR includes logic to bound the number of outbound packets
that result from a given ACK. So switching to CA_Recovery on lost
retransmits in CA_Loss avoids this retransmit storm problem when
in CA_Loss.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Part of the change in commit 49aa284fe64c4c1 ("cxgb4: Add support for devlog")
introduced a real bug where the Device Log Sequence Numbers are
no longer being converted from firmware Big-Endian to local CPU-Endian
format.
This patch moves all of the translation into the devlog_show() routine.
The only endianness code now in devlog_open() is the small loop to find the
earliest (lowest Sequence Number) Device Log entry in the circular buffer.
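The remaining loop can be sketched as follows. The entry layout and the names `devlog_entry_model`, `be32_decode` and `devlog_first` are hypothetical; the sketch ignores unused entries and keeps only the part that matters here, namely that sequence numbers must be converted from Big-Endian before they are compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log entry: only the Big-Endian sequence number matters. */
struct devlog_entry_model {
	uint8_t seqno_be[4];
};

/* Decode a Big-Endian 32-bit value, independent of host byte order. */
static uint32_t be32_decode(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Index of the entry with the lowest sequence number, i.e. the oldest
 * entry in the circular buffer. */
static size_t devlog_first(const struct devlog_entry_model *log, size_t n)
{
	size_t first = 0;
	uint32_t lowest;

	assert(n > 0);
	lowest = be32_decode(log[0].seqno_be);
	for (size_t i = 1; i < n; i++) {
		uint32_t seq = be32_decode(log[i].seqno_be);

		if (seq < lowest) {
			lowest = seq;
			first = i;
		}
	}
	return first;
}
```

Comparing the raw bytes as host integers on a Little-Endian machine would order 0x00000100 before 0x000000ff and pick the wrong starting entry.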
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before commit daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it
from ip6_mc_input().") MLD packets were only processed locally. After the
change, a copy of each MLD packet goes through ip6_mr_input, causing
an MRT6MSG_NOCACHE message to be generated to user space.
Make MLD packets be processed only locally.
Fixes: daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().")
Signed-off-by: Hermin Anggawijaya <hermin.anggawijaya@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The module_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The module_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The free_percpu() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vlan ids 0 and 4095 were disallowed by commit:
8adff41c3d25 ("bridge: Don't use VID 0 and 4095 in vlan filtering")
but then the check was removed when vlan ranges were introduced by:
bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")
So reintroduce the vlan range check.
Before patch:
[root@testvm ~]# bridge vlan add vid 0 dev eth0 master
(succeeds)
After Patch:
[root@testvm ~]# bridge vlan add vid 0 dev eth0 master
RTNETLINK answers: Invalid argument
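The reintroduced check amounts to rejecting the two reserved ids. A minimal sketch, with a hypothetical function name and constant, returning the error that produces the "Invalid argument" answer shown above:

```c
#include <errno.h>
#include <stdint.h>

#define VID_RESERVED_MAX 0x0fffu  /* 4095, reserved */

/* Return 0 for a usable vlan id (1..4094), -EINVAL for 0 or 4095+. */
static int vid_check(uint16_t vid)
{
	if (vid == 0 || vid >= VID_RESERVED_MAX)
		return -EINVAL;
	return 0;
}
```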
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:
====================
pull request: bluetooth 2015-07-02
A couple of regressions crept in because of a patch to use proper list
APIs rather than manually reading & writing the next/prev pointers
(commit 835a6a2f8603237a3e6cded5a6765090ecb06ea5). Turns out this was
masking a few bugs: a missing INIT_LIST_HEAD() call and incorrectly
using list_del() rather than list_del_init(). The two patches in this
set fix these, and it'd be nice if they could still make it to 4.2-rc1 to
avoid new bug reports from users.
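The difference between the two removal calls can be shown with a minimal user-space doubly linked list. The type and function names below are hypothetical and only mimic the shape of the kernel list API; in particular, this sketch stores NULL where the kernel stores poison values.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

/* Make a node point at itself: a valid, empty list. */
static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

/* Insert node right after head. */
static void list_add_after(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Unlink only: the node's own pointers are no longer usable. */
static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = NULL;
	n->prev = NULL;
}

/* Unlink and reinitialise: the node is a valid empty list again, so it
 * can be tested for emptiness or unlinked a second time safely. */
static void list_unlink_init(struct list_node *n)
{
	list_unlink(n);
	list_init(n);
}
```

Code that later tests or removes the same node again needs the reinitialising variant, and any node handed to the list helpers must have been initialised first.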
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In enic_poll, we clean the tx and rx queues. When low latency busy socket
polling is happening, enic_poll will only clean the tx queue. After cleaning
tx, it should return the total budget for re-poll.
There is a small window between vnic_intr_unmask() and enic_poll_unlock_napi().
In this window if an irq occurs and napi is scheduled on different cpu, it tries
to acquire enic_poll_lock_napi() and fails. Unlock napi_poll before unmasking
the interrupt.
v2:
Do not change tx work done behaviour. Consider only rx work done for completing
napi.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|