Age | Commit message (Collapse) | Author |
|
The macro 'A' used in internal BPF interpreter:
#define A regs[insn->a_reg]
was easily confused with the name of classic BPF register 'A', since
'A' would mean two different things depending on context.
This patch is trying to clean up the naming and clarify its usage in the
following way:
- A and X are names of two classic BPF registers
- BPF_REG_A denotes internal BPF register R0 used to map classic register A
in internal BPF programs generated from classic
- BPF_REG_X denotes internal BPF register R7 used to map classic register X
in internal BPF programs generated from classic
- internal BPF instruction format:
struct sock_filter_int {
__u8 code; /* opcode */
__u8 dst_reg:4; /* dest register */
__u8 src_reg:4; /* source register */
__s16 off; /* signed offset */
__s32 imm; /* signed immediate constant */
};
- BPF_X/BPF_K is 1 bit used to encode source operand of instruction
In classic:
BPF_X - means use register X as source operand
BPF_K - means use 32-bit immediate as source operand
In internal:
BPF_X - means use 'src_reg' register as source operand
BPF_K - means use 32-bit immediate as source operand
Suggested-by: Chema Gonzalez <chema@google.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Chema Gonzalez <chema@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding bridge support to the batman-adv multicast optimization requires
batman-adv knowing about the existence of bridged-in IGMP/MLD queriers
to be able to reliably serve any multicast listener behind this same
bridge.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With this new, exported function br_multicast_list_adjacent(net_dev) a
list of IPv4/6 addresses is returned. This list contains all multicast
addresses sensed by the bridge multicast snooping feature on all bridge
ports of the bridge interface of net_dev, excluding addresses from the
specified net_device itself.
Adding bridge support to the batman-adv multicast optimization requires
batman-adv knowing about the existence of bridged-in multicast
listeners to be able to reliably serve them with multicast packets.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MLDv1 (RFC2710 section 6), MLDv2 (RFC3810 section 7.6.2), IGMPv2
(RFC2236 section 3) and IGMPv3 (RFC3376 section 6.6.2) specify that the
querier with lowest source address shall become the selected
querier.
So far the bridge stopped its querier as soon as it heard another
querier regardless of its source address. This results in the "wrong"
querier potentially becoming the active querier or a potential,
unnecessary querying delay.
With this patch the bridge memorizes the source address of the currently
selected querier and ignores queries from queriers with a higher source
address than the currently selected one. This slight optimization is
supposed to make it more RFC compliant (but is rather uncritical and
therefore probably not necessary to be queued for stable kernels).
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current naming of these two structs is very random, in that
reversing their naming would not make any semantical difference.
This patch tries to make the naming less confusing by giving them a more
specific, distinguishable naming.
This is also useful for the upcoming patches reintroducing the
"struct bridge_mcast_querier" but for storing information about the
selected querier (no matter if our own or a foreign querier).
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no need to require forcing device down on a Ethernet GRE (gretap)
tunnel to change the MAC address.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcp_fragment can be called from process context (from tso_fragment).
Add a new gfp parameter to allow it to preserve atomic memory if
possible.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
pull request: wireless-next 2014-06-06
Please accept this batch of fixes intended for the 3.16 stream.
For the bluetooth bits, Gustavo says:
"Here some more patches for 3.16. We know that Linus already opened the merge
window, but this is fix only pull request, and most of the patches here are
also tagged for stable."
Along with that, Andrea Merello provides a fix for the broken scanning
in the venerable at76c50x driver...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During key removal, the key object is freed, but not taken out of the
llsec key list properly. Fix that.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
|
|
To avoid the confusion of having two variables, shrink the function to
only use the parameter variable for looping.
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
drivers/net/xen-netback/netback.c
net/core/filter.c
A filter bug fix overlapped some cleanups and a conversion
over to some new insn generation macros.
A xen-netback bug fix overlapped the addition of multi-queue
support.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
BPF classic->internal converter broke SKF_AD_PKTTYPE extension, since
pkt_type_offset() was failing to find skb->pkt_type field which is defined as:
__u8 pkt_type:3,
fclone:2,
ipvs_property:1,
peeked:1,
nf_trace:1;
Fix it by searching for 3 most significant bits and shift them by 5 at run-time
Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/nf_tables fixes for net-next
This patchset contains fixes for recent updates available in your
net-next, they are:
1) Fix double memory allocation for accounting objects that results
in a leak, this slipped through with the new quota extension,
patch from Mathieu Poirier.
2) Fix broken ordering when adding set element transactions.
3) Make sure that objects are released in reverse order in the abort
path, to avoid possible use-after-free when accessing dependencies.
4) Allow to delete several objects (as long as dependencies are
fulfilled) by using one batch. This includes changes in the use
counter semantics of the nf_tables objects.
5) Fix illegal sleeping allocation from rcu callback.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
br_manage_promisc() incorrectly expects br_auto_port() to return only 0
or 1, while it actually returns flags, i.e., a subset of BR_AUTO_MASK.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If an MPLS packet requires segmentation then use mpls_features
to determine if the software implementation should be used.
As no driver advertises MPLS GSO segmentation this will always be
the case.
I had not noticed that this was necessary before as software MPLS GSO
segmentation was already being used in my test environment. I believe that
the reason for that is the skbs in question always had fragments and the
driver I used does not advertise NETIF_F_FRAGLIST (which seems to be the
case for most drivers). Thus software segmentation was activated by
skb_gso_ok().
This introduces the overhead of an extra call to skb_network_protocol()
in the case where where CONFIG_NET_MPLS_GSO is set and
skb->ip_summed == CHECKSUM_NONE.
Thanks to Jesse Gross for prompting me to investigate this.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
|
|
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is available since v3.15-rc5.
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to RFC1035 "[...] the total length of a domain name (i.e.,
label octets and label length octets) is restricted to 255 octets or
less."
Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added VXLAN link configuration for sending UDP checksums, and allowing
TX and RX of UDP6 checksums.
Also, call common iptunnel_handle_offloads and added GSO support for
checksums.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Call gso_make_checksum. This should have the benefit of using a
checksum that may have been previously computed for the packet.
This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that
offload GRE GSO with and without the GRE checksum offloaed.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates
that a device is capable of computing the UDP checksum in the
encapsulating header of a UDP tunnel.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Call common gso_make_checksum when calculating checksum for a
TCP GSO segment.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When creating a GSO packet segment we may need to set more than
one checksum in the packet (for instance a TCP checksum and
UDP checksum for VXLAN encapsulation). To be efficient, we want
to do checksum calculation for any part of the packet at most once.
This patch adds csum_start offset to skb_gso_cb. This tracks the
starting offset for skb->csum which is initially set in skb_segment.
When a protocol needs to compute a transport checksum it calls
gso_make_checksum which computes the checksum value from the start
of transport header to csum_start and then adds in skb->csum to get
the full checksum. skb->csum and csum_start are then updated to reflect
the checksum of the resultant packet starting from the transport header.
This patch also adds a flag to skbuff, encap_hdr_csum, which is set
in *gso_segment fucntions to indicate that a tunnel protocol needs
checksum calculation
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Call common functions to set checksum for UDP tunnel.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added udp_set_csum and udp6_set_csum functions to set UDP checksums
in packets. These are for simple UDP packets such as those that might
be created in UDP tunnels.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit efe4208 ("ipv6: make lookups simpler and faster") introduced a
regression in udp_v6_mcast_next(), resulting in multicast packets not
reaching the destination sockets under certain conditions.
The packet's IPv6 addresses are wrongly compared to the IPv6 addresses
from the function's socket argument, which indicates the starting point
for looping, instead of the loop variable. If the addresses from the
first socket do not match the packet's addresses, no socket in the list
will match.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 30f38d2fdd79f13fc929489f7e6e517b4a4bfe63.
fib_triestat is surrounded by a big lie: while it claims that it's a
seq_file (fib_triestat_seq_open, fib_triestat_seq_show), it isn't:
static const struct file_operations fib_triestat_fops = {
.owner = THIS_MODULE,
.open = fib_triestat_seq_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release_net,
};
Yes, fib_triestat is just a regular file.
A small detail (assuming CONFIG_NET_NS=y) is that while for seq_files
you could do seq_file_net() to get the net ptr, doing so for a regular
file would be wrong and would dereference an invalid pointer.
The fib_triestat lie claimed a victim, and trying to show the file would
be bad for the kernel. This patch just reverts the issue and fixes
fib_triestat, which still needs a rewrite to either be a seq_file or
stop claiming it is.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
include/net/inetpeer.h
net/ipv6/output_core.c
Changes in net were fixing bugs in code removed in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we jump to free_pcpu on failure in alloc_netdev_mqs()
rx and tx queues are not yet allocated, so no need to free them.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is possible that ->newlink() fails before registering
the device, in this case we should just free it, it's
safe to call free_netdev().
Fixes: commit 0e0eee2465df77bcec2 (net: correct error path in rtnl_newlink())
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The xfrm_user module registers its pernet init/exit after xfrm
itself so that its net exit function xfrm_user_net_exit() is
executed before xfrm_net_exit() which calls xfrm_state_fini() to
cleanup the SA's (xfrm states). This opens a window between
zeroing net->xfrm.nlsk pointer and deleting all xfrm_state
instances which may access it (via the timer). If an xfrm state
expires in this window, xfrm_exp_state_notify() will pass null
pointer as socket to nlmsg_multicast().
As the notifications are called inside rcu_read_lock() block, it
is sufficient to retrieve the nlsk socket with rcu_dereference()
and check the it for null.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bwh/net-next
Ben Hutchings says:
====================
Pull request: Fixes for new ethtool RSS commands
This addresses several problems I previously identified with the new
ETHTOOL_{G,S}RSSH commands:
1. Missing validation of reserved parameters
2. Vague documentation
3. Use of unnamed magic number
4. No consolidation with existing driver operations
I don't currently have access to suitable network hardware, but have
tested these changes with a dummy driver that can support various
combinations of operations and sizes, together with (a) Debian's ethtool
3.13 (b) ethtool 3.14 with the submitted patch to use ETHTOOL_{G,S}RSSH
and minor adjustment for fixes 1 and 3.
v2: Update RSS operations in vmxnet3 too
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We should fail rather than silently ignoring use of these extensions.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers
regardless of whether they expose the hash key, unless you try to
set a hash key for a driver that doesn't expose it.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
(This patch was previously posted as RFC at
http://patchwork.ozlabs.org/patch/352677/)
This patch adds NDA_MASTER attribute to neighbour attributes enum for
bridge/master ifindex. And adds NDA_MASTER to bridge fdb notify msgs.
Today bridge fdb notifications dont contain bridge information.
Userspace can derive it from the port information in the fdb
notification. However this is tricky in some scenarious.
Example, bridge port delete notification comes before bridge fdb
delete notifications. And we have seen problems in userspace
when using libnl where, the bridge fdb delete notification handling code
does not understand which bridge this fdb entry is part of because
the bridge and port association has already been deleted.
And these notifications (port membership and fdb) are generated on
separate rtnl groups.
Fixing the order of notifications could possibly solve the problem
for some cases (I can submit a separate patch for that).
This patch chooses to add NDA_MASTER to bridge fdb notify msgs
because it not only solves the problem described above, but also helps
userspace avoid another lookup into link msgs to derive the master index.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
__sk_prepare_filter() was reworked in commit bd4cf0ed3 (net: filter:
rework/optimize internal BPF interpreter's instruction set) so that it should
have uncharged memory once things went wrong. However that work isn't complete.
Error is handled only in __sk_migrate_filter() while memory can still leak in
the error path right after sk_chk_filter().
Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This bug is discovered by an recent F-RTO issue on tcpm list
https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html
The bug is that currently F-RTO does not use DSACK to undo cwnd in
certain cases: upon receiving an ACK after the RTO retransmission in
F-RTO, and the ACK has DSACK indicating the retransmission is spurious,
the sender only calls tcp_try_undo_loss() if some never retransmisted
data is sacked (FLAG_ORIG_DATA_SACKED).
The correct behavior is to unconditionally call tcp_try_undo_loss so
the DSACK information is used properly to undo the cwnd reduction.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make fib_triestat_seq_show consistent with other /proc/net files and
use seq_file_net.
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It was possible to get a setuid root or setcap executable to write to
it's stdout or stderr (which has been set made a netlink socket) and
inadvertently reconfigure the networking stack.
To prevent this we check that both the creator of the socket and
the currentl applications has permission to reconfigure the network
stack.
Unfortunately this breaks Zebra which always uses sendto/sendmsg
and creates it's socket without any privileges.
To keep Zebra working don't bother checking if the creator of the
socket has privilege when a destination address is specified. Instead
rely exclusively on the privileges of the sender of the socket.
Note from Andy: This is exactly Eric's code except for some comment
clarifications and formatting fixes. Neither I nor, I think, anyone
else is thrilled with this approach, but I'm hesitant to wait on a
better fix since 3.15 is almost here.
Note to stable maintainers: This is a mess. An earlier series of
patches in 3.15 fix a rather serious security issue (CVE-2014-0181),
but they did so in a way that breaks Zebra. The offending series
includes:
commit aa4cf9452f469f16cea8c96283b641b4576d4a7b
Author: Eric W. Biederman <ebiederm@xmission.com>
Date: Wed Apr 23 14:28:03 2014 -0700
net: Add variants of capable for use on netlink messages
If a given kernel version is missing that series of fixes, it's
probably worth backporting it and this patch. if that series is
present, then this fix is critical if you care about Zebra.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
is disabled.
Note how GSO/TSO packets do not have monotonically incrementing ID.
06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)
It appears I introduced this bug in linux-3.1.
inet_getid() must return the old value of peer->ip_id_count,
not the new one.
Lets revert this part, and remove the prevention of
a null identification field in IPv6 Fragment Extension Header,
which is dubious and not even done properly.
Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
br_handle_local_finish() is allowing us to insert an FDB entry with
disallowed vlan. For example, when port 1 and 2 are communicating in
vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can
interfere with their communication by spoofed src mac address with
vlan id 10.
Note: Even if it is judged that a frame should not be learned, it should
not be dropped because it is destined for not forwarding layer but higher
layer. See IEEE 802.1Q-2011 8.13.10.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
pull request: wireless-next 2014-06-02
Please pull this remaining batch of updates intended for the 3.16 stream...
For the mac80211 bits, Johannes says:
"The remainder for -next right now is mostly fixes, and a handful of
small new things like some CSA infrastructure, the regdb script mW/dBm
conversion change and sending wiphy notifications."
For the bluetooth bits, Gustavo says:
"Some more patches for 3.16. There is nothing really special here, just a
bunch of clean ups, fixes plus some small improvements. Please pull."
For the nfc bits, Samuel says:
"We have:
- Felica (Type3) tags support for trf7970a
- Type 4b tags support for port100
- st21nfca DTS typo fix
- A few sparse warning fixes"
For the atheros bits, Kalle says:
"Ben added support for setting antenna configurations. Michal improved
warm reset so that we would not need to fall back to cold reset that
often, an issue where ath10k stripped protected flag while in monitor
mode and made module initialisation asynchronous to fix the problems
with firmware loading when the driver is linked to the kernel.
Luca removed unused channel_switch_beacon callbacks both from ath9k and
ath10k. Marek fixed Protected Management Frames (PMF) when using Action
Frames. Also we had other small fixes everywhere in the driver."
Along with that, there are a handful of updates to a variety
of drivers. This includes updates to at76c50x-usb, ath9k, b43,
brcmfmac, mwifiex, rsi, rtlwifi, and wil6210.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ideally, we would need to generate IP ID using a per destination IP
generator.
linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.
1) each inet_peer struct consumes 192 bytes
2) inetpeer cache uses a binary tree of inet_peer structs,
with a nominal size of ~66000 elements under load.
3) lookups in this tree are hitting a lot of cache lines, as tree depth
is about 20.
4) If server deals with many tcp flows, we have a high probability of
not finding the inet_peer, allocating a fresh one, inserting it in
the tree with same initial ip_id_count, (cf secure_ip_id())
5) We garbage collect inet_peer aggressively.
IP ID generation do not have to be 'perfect'
Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.
We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.
ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)
secure_ip_id() and secure_ipv6_id() no longer are needed.
Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change provides a function to be used in order to break the
ndo_set_rx_mode call into a set of address add and remove calls. The code
is based on the implementation of dev_uc_sync/dev_mc_sync. Since they
essentially do the same thing but with only one dev I simply named my
functions __dev_uc_sync/__dev_mc_sync.
I also implemented an unsync version of the functions as well to allow for
cleanup on close.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fix a off by one error while fragmentation. If the frag_cap
value is equal to skb_unprocessed value we need to stop the
fragmentation loop because the last fragment which has a size of
skb_unprocessed fits into the frag capability size.
This issue was introduced by commit d4b2816d67d6e07b2f27037f282d8db03a5829d7
("6lowpan: fix fragmentation").
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fix the 6LoWPAN fragmentation for the case if we have exactly
two fragments. The problem is that the (skb_unprocessed >= frag_cap)
condition is always false on the second fragment after sending the first
fragment. A fragmentation with only one fragment doesn't make any sense.
The solution is that we use a do while loop here, that ensures we sending
always a minimum of two fragments if we need a fragmentation.
This issue was introduced by commit d4b2816d67d6e07b2f27037f282d8db03a5829d7
("6lowpan: fix fragmentation").
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
the local variable ops and n_ops were just read out from family,
and not changed, hence no need to assign back.
Validation functions should operate on const parameters and not
change anything.
Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
|