summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-10-01l2tp: fix l2tp_eth module loadingGuillaume Nault
The l2tp_eth module crashes if its netlink callbacks are run when the pernet data aren't initialised. We should normally register_pernet_device() before the genl callbacks. However, the pernet data only maintain a list of l2tpeth interfaces, and this list is never used. So let's just drop pernet handling instead. Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01Merge branch 'erspan-fixes'David S. Miller
Xin Long says: ==================== ip_gre: a bunch of fixes for erspan This patchset is to fix some issues that could cause 0 or low performance, and even unexpected truncated packets on erspan. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip_gre: erspan device should keep dstXin Long
The patch 'ip_gre: ipgre_tap device should keep dst' fixed the issue ipgre_tap dev mtu couldn't be updated in tx path. The same fix is needed for erspan as well. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip_gre: set tunnel hlen properly in erspan_tunnel_initXin Long
According to __gre_tunnel_init, tunnel->hlen should be set as the headers' length between inner packet and outer iphdr. It would be used especially to calculate a proper mtu when updating mtu in tnl_update_pmtu. Now without setting it, a bigger mtu value than expected would be updated, which hurts performance a lot. This patch is to fix it by setting tunnel->hlen with: tunnel->tun_hlen + tunnel->encap_hlen + sizeof(struct erspanhdr) Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip_gre: check packet length and mtu correctly in erspan_xmitXin Long
As a ARPHRD_ETHER device, skb->len in erspan_xmit is the length of the whole ether packet. So before checking if a packet size exceeds the mtu, skb->len should subtract dev->hard_header_len. Otherwise, all packets with max size according to mtu would be trimmed to be truncated packet. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip_gre: get key from session_id correctly in erspan_rcvXin Long
erspan only uses the first 10 bits of session_id as the key to look up the tunnel. But in erspan_rcv, it missed 'session_id & ID_MASK' when getting the key from session_id. If any other flag is also set in session_id in a packet, it would fail to find the tunnel due to incorrect key in erspan_rcv. This patch is to add 'session_id & ID_MASK' there and also remove the unnecessary variable session_id. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01tipc: use only positive error codes in messagesParthasarathy Bhuvaragan
In commit e3a77561e7d32 ("tipc: split up function tipc_msg_eval()"), we have updated the function tipc_msg_lookup_dest() to set the error codes to negative values at destination lookup failures. Thus when the function sets the error code to -TIPC_ERR_NO_NAME, its inserted into the 4 bit error field of the message header as 0xf instead of TIPC_ERR_NO_NAME (1). The value 0xf is an unknown error code. In this commit, we set only positive error code. Fixes: e3a77561e7d32 ("tipc: split up function tipc_msg_eval()") Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ppp: fix __percpu annotationGuillaume Nault
Move sparse annotation right after pointer type. Fixes sparse warning: drivers/net/ppp/ppp_generic.c:1422:13: warning: incorrect type in initializer (different address spaces) drivers/net/ppp/ppp_generic.c:1422:13: expected void const [noderef] <asn:3>*__vpp_verify drivers/net/ppp/ppp_generic.c:1422:13: got int *<noident> ... Fixes: e5dadc65f9e0 ("ppp: Fix false xmit recursion detect with two ppp devices") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01Merge branch 'udp-fix-early-demux-for-mcast-packets'David S. Miller
Paolo Abeni says: ==================== udp: fix early demux for mcast packets Currently the early demux callbacks do not perform source address validation. This is not an issue for TCP or UDP unicast, where the early demux is only allowed for connected sockets and the source address is validated for the first packet and never change. The UDP protocol currently allows early demux also for unconnected multicast sockets, and we are not currently doing any validation for them, after that the first packet lands on the socket: beyond ignoring the rp_filter - if enabled - any kind of martian sources are also allowed. This series addresses the issue allowing the early demux callback to return an error code, and performing the proper checks for unconnected UDP multicast sockets before leveraging the rx dst cache. Alternatively we could disable the early demux for unconnected mcast sockets, but that would cause relevant performance regression - around 50% - while with this series, with full rp_filter in place, we keep the regression to a more moderate level. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01udp: perform source validation for mcast early demuxPaolo Abeni
The UDP early demux can leverate the rx dst cache even for multicast unconnected sockets. In such scenario the ipv4 source address is validated only on the first packet in the given flow. After that, when we fetch the dst entry from the socket rx cache, we stop enforcing the rp_filter and we even start accepting any kind of martian addresses. Disabling the dst cache for unconnected multicast socket will cause large performace regression, nearly reducing by half the max ingress tput. Instead we factor out a route helper to completely validate an skb source address for multicast packets and we call it from the UDP early demux for mcast packets landing on unconnected sockets, after successful fetching the related cached dst entry. This still gives a measurable, but limited performance regression: rp_filter = 0 rp_filter = 1 edmux disabled: 1182 Kpps 1127 Kpps edmux before: 2238 Kpps 2238 Kpps edmux after: 2037 Kpps 2019 Kpps The above figures are on top of current net tree. Applying the net-next commit 6e617de84e87 ("net: avoid a full fib lookup when rp_filter is disabled.") the delta with rp_filter == 0 will decrease even more. Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01IPv4: early demux can return an error codePaolo Abeni
Currently no error is emitted, but this infrastructure will used by the next patch to allow source address validation for mcast sockets. Since early demux can do a route lookup and an ipv4 route lookup can return an error code this is consistent with the current ipv4 route infrastructure. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx pathXin Long
Now when updating mtu in tx path, it doesn't consider ARPHRD_ETHER tunnel device, like ip6gre_tap tunnel, for which it should also subtract ether header to get the correct mtu. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip6_gre: ip6gre_tap device should keep dstXin Long
The patch 'ip_gre: ipgre_tap device should keep dst' fixed a issue that ipgre_tap mtu couldn't be updated in tx path. The same fix is needed for ip6gre_tap as well. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip_gre: ipgre_tap device should keep dstXin Long
Without keeping dst, the tunnel will not update any mtu/pmtu info, since it does not have a dst on the skb. Reproducer: client(ipgre_tap1 - eth1) <-----> (eth1 - ipgre_tap1)server After reducing eth1's mtu on client, then perforamnce became 0. This patch is to netif_keep_dst in gre_tap_init, as ipgre does. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30netlink: do not proceed if dump's start() errsJason A. Donenfeld
Drivers that use the start method for netlink dumping rely on dumpit not being called if start fails. For example, ila_xlat.c allocates memory and assigns it to cb->args[0] in its start() function. It might fail to do that and return -ENOMEM instead. However, even when returning an error, dumpit will be called, which, in the example above, quickly dereferences the memory in cb->args[0], which will OOPS the kernel. This is but one example of how this goes wrong. Since start() has always been a function with an int return type, it therefore makes sense to use it properly, rather than ignoring it. This patch thus returns early and does not call dumpit() when start() fails. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29Merge tag 'mlx5-fixes-2017-09-28' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2017-09-28 Misc. fixes for mlx5 drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: Set sk_prot_creator when cloning sockets to the right protoChristoph Paasch
sk->sk_prot and sk->sk_prot_creator can differ when the app uses IPV6_ADDRFORM (transforming an IPv6-socket to an IPv4-one). Which is why sk_prot_creator is there to make sure that sk_prot_free() does the kmem_cache_free() on the right kmem_cache slab. Now, if such a socket gets transformed back to a listening socket (using connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through sk_clone_lock() when a new connection comes in. But sk_prot_creator will still point to the IPv6 kmem_cache (as everything got copied in sk_clone_lock()). When freeing, we will thus put this memory back into the IPv6 kmem_cache although it was allocated in the IPv4 cache. I have seen memory corruption happening because of this. With slub-debugging and MEMCG_KMEM enabled this gives the warning "cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP" A C-program to trigger this: void main(void) { int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP); int new_fd, newest_fd, client_fd; struct sockaddr_in6 bind_addr; struct sockaddr_in bind_addr4, client_addr1, client_addr2; struct sockaddr unsp; int val; memset(&bind_addr, 0, sizeof(bind_addr)); bind_addr.sin6_family = AF_INET6; bind_addr.sin6_port = ntohs(42424); memset(&client_addr1, 0, sizeof(client_addr1)); client_addr1.sin_family = AF_INET; client_addr1.sin_port = ntohs(42424); client_addr1.sin_addr.s_addr = inet_addr("127.0.0.1"); memset(&client_addr2, 0, sizeof(client_addr2)); client_addr2.sin_family = AF_INET; client_addr2.sin_port = ntohs(42421); client_addr2.sin_addr.s_addr = inet_addr("127.0.0.1"); memset(&unsp, 0, sizeof(unsp)); unsp.sa_family = AF_UNSPEC; bind(fd, (struct sockaddr *)&bind_addr, sizeof(bind_addr)); listen(fd, 5); client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); connect(client_fd, (struct sockaddr *)&client_addr1, sizeof(client_addr1)); new_fd = accept(fd, NULL, NULL); close(fd); val = AF_INET; setsockopt(new_fd, SOL_IPV6, IPV6_ADDRFORM, &val, sizeof(val)); connect(new_fd, &unsp, sizeof(unsp)); memset(&bind_addr4, 0, sizeof(bind_addr4)); bind_addr4.sin_family = AF_INET; bind_addr4.sin_port = ntohs(42421); bind(new_fd, (struct sockaddr *)&bind_addr4, sizeof(bind_addr4)); listen(new_fd, 5); client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); connect(client_fd, (struct sockaddr *)&client_addr2, sizeof(client_addr2)); newest_fd = accept(new_fd, NULL, NULL); close(new_fd); close(client_fd); close(new_fd); } As far as I can see, this bug has been there since the beginning of the git-days. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: dsa: mv88e6xxx: lock mutex when freeing IRQsVivien Didelot
mv88e6xxx_g2_irq_free locks the registers mutex, but not mv88e6xxx_g1_irq_free, which results in a stack trace from assert_reg_lock when unloading the mv88e6xxx module. Fix this. Fixes: 3460a5770ce9 ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28packet: only test po->has_vnet_hdr once in packet_sndWillem de Bruijn
Packet socket option po->has_vnet_hdr can be updated concurrently with other operations if no ring is attached. Do not test the option twice in packet_snd, as the value may change in between calls. A race on setsockopt disable may cause a packet > mtu to be sent without having GSO options set. Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.") Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28packet: in packet_do_bind, test fanout with bind_lock heldWillem de Bruijn
Once a socket has po->fanout set, it remains a member of the group until it is destroyed. The prot_hook must be constant and identical across sockets in the group. If fanout_add races with packet_do_bind between the test of po->fanout and taking the lock, the bind call may make type or dev inconsistent with that of the fanout group. Hold po->bind_lock when testing po->fanout to avoid this race. I had to introduce artificial delay (local_bh_enable) to actually observe the race. Fixes: dc99f600698d ("packet: Add fanout support.") Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: stmmac: dwmac4: Re-enable MAC Rx before powering downEd Blake
Re-enable the MAC receiver by setting CONFIG_RE before powering down, as instructed in section 6.3.5.1 of [1]. Without this the MAC fails to receive WoL packets and never wakes up. [1] DWC Ethernet QoS Databook 4.10a October 2014 Signed-off-by: Ed Blake <ed.blake@sondrel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: stmmac: dwc-qos: Add suspend / resume supportEd Blake
Add hook to stmmac_pltfr_pm_ops for suspend / resume handling. Signed-off-by: Ed Blake <ed.blake@sondrel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: dsa: Fix network device registration orderFlorian Fainelli
We cannot be registering the network device first, then setting its carrier off and finally connecting it to a PHY, doing that leaves a window during which the carrier is at best inconsistent, and at worse the device is not usable without a down/up sequence since the network device is visible to user space with possibly no PHY device attached. Re-order steps so that they make logical sense. This fixes some devices where the port was not usable after e.g: an unbind then bind of the driver. Fixes: 0071f56e46da ("dsa: Register netdev before phy") Fixes: 91da11f870f0 ("net: Distributed Switch Architecture protocol support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: dsa: mv88e6xxx: Allow dsa and cpu ports in multiple vlansAndrew Lunn
Ports with the same VLAN must all be in the same bridge. However the CPU and DSA ports need to be in multiple VLANs spread over multiple bridges. So exclude them when performing this test. Fixes: b2f81d304cee ("net: dsa: add CPU and DSA ports as VLAN members") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28inetpeer: fix RCU lookup() againEric Dumazet
My prior fix was not complete, as we were dereferencing a pointer three times per node, not twice as I initially thought. Fixes: 4cc5b44b29a9 ("inetpeer: fix RCU lookup()") Fixes: b145425f269a ("inetpeer: remove AVL implementation in favor of RB tree") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28Merge branch 'mvpp2-various-fixes'David S. Miller
Antoine Tenart says: ==================== net: mvpp2: various fixes This series contains 3 fixes for the Marvell PPv2 driver. Since v1: - Removed one patch about dma masks as it would need a better fix. - Added one fix about the MAC Tx clock source selection. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: mvpp2: do not select the internal source clockAntoine Tenart
This patch stops the internal MAC Tx clock from being enabled as the internal clock isn't used. The definition used for the bit controlling this behaviour is renamed as well as it was wrongly named (bit 4 of GMAC_CTRL_2_REG). Fixes: 3919357fb0bb ("net: mvpp2: initialize the GMAC when using a port") Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: mvpp2: fix port list indexingYan Markman
The private port_list array has a list of pointers to mvpp2_port instances. This list is allocated given the number of ports enabled in the device tree, but the pointers are set using the port-id property. If on a single port is enabled, the port_list array will be of size 1, but when registering the port, if its id is not 0 the driver will crash. Other crashes were encountered in various situations. This fixes the issue by using an index not equal to the value of the port-id property. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: Yan Markman <ymarkman@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: mvpp2: fix parsing fragmentation detectionStefan Chulski
Parsing fragmentation detection failed due to wrong configured parser TCAM entry's. Some traffic was marked as fragmented in RX descriptor, even it wasn't IP fragmented. The hardware also failed to calculate checksums which lead to use software checksum and caused performance degradation. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: Stefan Chulski <stefanc@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28tun: bail out from tun_get_user() if the skb is emptyAlexander Potapenko
KMSAN (https://github.com/google/kmsan) reported accessing uninitialized skb->data[0] in the case the skb is empty (i.e. skb->len is 0): ================================================ BUG: KMSAN: use of uninitialized memory in tun_get_user+0x19ba/0x3770 CPU: 0 PID: 3051 Comm: probe Not tainted 4.13.0+ #3140 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: ... __msan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:477 tun_get_user+0x19ba/0x3770 drivers/net/tun.c:1301 tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365 call_write_iter ./include/linux/fs.h:1743 new_sync_write fs/read_write.c:457 __vfs_write+0x6c3/0x7f0 fs/read_write.c:470 vfs_write+0x3e4/0x770 fs/read_write.c:518 SYSC_write+0x12f/0x2b0 fs/read_write.c:565 SyS_write+0x55/0x80 fs/read_write.c:557 do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284 entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245 ... origin: ... kmsan_poison_shadow+0x6e/0xc0 mm/kmsan/kmsan.c:211 slab_alloc_node mm/slub.c:2732 __kmalloc_node_track_caller+0x351/0x370 mm/slub.c:4351 __kmalloc_reserve net/core/skbuff.c:138 __alloc_skb+0x26a/0x810 net/core/skbuff.c:231 alloc_skb ./include/linux/skbuff.h:903 alloc_skb_with_frags+0x1d7/0xc80 net/core/skbuff.c:4756 sock_alloc_send_pskb+0xabf/0xfe0 net/core/sock.c:2037 tun_alloc_skb drivers/net/tun.c:1144 tun_get_user+0x9a8/0x3770 drivers/net/tun.c:1274 tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365 call_write_iter ./include/linux/fs.h:1743 new_sync_write fs/read_write.c:457 __vfs_write+0x6c3/0x7f0 fs/read_write.c:470 vfs_write+0x3e4/0x770 fs/read_write.c:518 SYSC_write+0x12f/0x2b0 fs/read_write.c:565 SyS_write+0x55/0x80 fs/read_write.c:557 do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284 return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245 ================================================ Make sure tun_get_user() doesn't touch skb->data[0] unless there is actual data. C reproducer below: ========================== // autogenerated by syzkaller (http://github.com/google/syzkaller) #define _GNU_SOURCE #include <fcntl.h> #include <linux/if_tun.h> #include <netinet/ip.h> #include <net/if.h> #include <string.h> #include <sys/ioctl.h> int main() { int sock = socket(PF_INET, SOCK_STREAM, IPPROTO_IP); int tun_fd = open("/dev/net/tun", O_RDWR); struct ifreq req; memset(&req, 0, sizeof(struct ifreq)); strcpy((char*)&req.ifr_name, "gre0"); req.ifr_flags = IFF_UP | IFF_MULTICAST; ioctl(tun_fd, TUNSETIFF, &req); ioctl(sock, SIOCSIFFLAGS, "gre0"); write(tun_fd, "hi", 0); return 0; } ========================== Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net/mlx5: Fix wrong indentation in enable SRIOV codeOr Gerlitz
Smatch is screaming: drivers/net/ethernet/mellanox/mlx5/core/sriov.c:112 mlx5_device_enable_sriov() warn: inconsistent indenting fix that. Fixes: 7ecf6d8ff154 ('IB/mlx5: Restore IB guid/policy for virtual functions') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28net/mlx5: Fix static checker warning on steering tracepoints codeMatan Barak
Fix this sparse complaint: drivers/net/ethernet/mellanox/mlx5/core/./diag/fs_tracepoint.h:172:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) Fixes: d9fea79171ee ('net/mlx5: Add tracepoints') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28net/mlx5e: Fix calculated checksum offloads countersGal Pressman
Instead of calculating the offloads counters, count them explicitly. The calculations done for these counters would result in bugs in some cases, for example: When running TCP traffic over a VXLAN tunnel with TSO enabled the following counters would increase: tx_csum_partial: 1,333,284 tx_csum_partial_inner: 29,286 tx4_csum_partial_inner: 384 tx7_csum_partial_inner: 8 tx9_csum_partial_inner: 34 tx10_csum_partial_inner: 26,807 tx11_csum_partial_inner: 287 tx12_csum_partial_inner: 27 tx16_csum_partial_inner: 6 tx25_csum_partial_inner: 1,733 Seems like tx_csum_partial increased out of nowhere. The issue is in the following calculation in mlx5e_update_sw_counters: s->tx_csum_partial = s->tx_packets - tx_offload_none - s->tx_csum_partial_inner; While tx_packets increases by the number of GSO segments for each SKB, tx_csum_partial_inner will only increase by one, resulting in wrong tx_csum_partial counter. Fixes: bfe6d8d1d433 ("net/mlx5e: Reorganize ethtool statistics") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28net/mlx5e: Don't add/remove 802.1ad rules when changing 802.1Q VLAN filterGal Pressman
Toggling of C-tag VLAN filter should not affect the "any S-tag" steering rule. Fixes: 8a271746a264 ("net/mlx5e: Receive s-tagged packets in promiscuous mode") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28net/mlx5e: Print netdev features correctly in error messageGal Pressman
Use the correct formatting for netdev features. Fixes: 0e405443e803 ("net/mlx5e: Improve set features ndo resiliency") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28net/mlx5e: Check encap entry state when offloading tunneled flowsVlad Buslov
Encap entries cached by the driver could be invalidated due to tunnel destination neighbour state changes. When attempting to offload a flow that uses a cached encap entry, we must check the entry validity and defer the offloading if the entry exists but not valid. When EAGAIN is returned, the flow offloading to hardware takes place by the neigh update code when the tunnel destination neighbour becomes connected. Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28net/mlx5e: Disallow TC offloading of unsupported match/action combinationsOr Gerlitz
When offloading header re-write, the HW may need to adjust checksums along the packet. For IP traffic, and a case where we are asked to modify fields in the IP header, current HW supports that only for TCP and UDP. Enforce it, in this case fail the offloading attempt for non TCP/UDP packets. Fixes: d7e75a325cb2 ('net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions') Fixes: 2f4fe4cab073 ('net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28net/mlx5e: Fix erroneous freeing of encap header bufferPaul Blakey
In case the neighbour for the tunnel destination isn't valid, we send a neighbour update request but we free the encap header buffer. This is wrong, because we still need it for allocating a HW encap entry once the neighbour is available. Fix that by skipping freeing it if we wait for neighbour. Fixes: 232c001398ae ('net/mlx5e: Add support to neighbour update flow') Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28net/mlx5: Check device capability for maximum flow countersRaed Salem
Added check for the maximal number of flow counters attached to rule (FTE). Fixes: bd5251dbf156b ('net/mlx5_core: Introduce flow steering destination of type counter') Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28net/mlx5: Fix FPGA capability locationInbar Karmy
Currently, FPGA capability is located in (mdev)->caps.hca_cur, change the location to be (mdev)->caps.fpga, since hca_cur is reserved for HCA device capabilities. Fixes: e29341fb3a5b ("net/mlx5: FPGA, Add basic support for Innova") Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28net/mlx5e: IPoIB, Fix access to invalid memory addressRoi Dayan
When cleaning rdma netdevice we need to save the mdev pointer because priv is released when we release netdev. This bug was found using the kernel address sanitizer (KASAN). use-after-free in mlx5_rdma_netdev_free+0xe3/0x100 [mlx5_core] Fixes: 48935bbb7ae8 ("net/mlx5e: IPoIB, Add netdevice profile skeleton") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26sctp: Fix a big endian bug in sctp_diag_dump()Dan Carpenter
The sctp_for_each_transport() function takes an pointer to int. The cb->args[] array holds longs so it's only using the high 32 bits. It works on little endian system but will break on big endian 64 bit machines. Fixes: d25adbeb0cdb ("sctp: fix an use-after-free issue in sctp_sock_dump") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26Merge tag 'wireless-drivers-for-davem-2017-09-25' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.14 Quite a lot of fixes this time. Most notable is the brcmfmac fix for a CVE issue. iwlwifi * a couple of bugzilla bugs related to multicast handling * two fixes for WoWLAN bugs that were causing queue hangs and re-initialization problems * two fixes for potential uninitialized variable use reported by Dan Carpenter in relation to a recently introduced patch * a fix for buffer reordering in the newly supported 9000 device family * fix a race when starting aggregation * small fix for a recent patch to wake mac80211 queues * send non-bufferable management frames in the generic queue so they are not sent on queues that are under power-save ath10k * fix a PCI PM related gcc warning brcmfmac * CVE-2017-0786: add length check scan results from firmware * respect passive scan requests from user space qtnfmac * fix race in tx path when using multiple interfaces * cancel ongoing scan when removing the wireless interface ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26Merge branch 'aquantia-fixes'David S. Miller
Igor Russkikh says: ==================== aquantia: Atlantic driver bugfixes und improvements This series contains bugfixes for aQuantia Atlantic driver. Changes in v2: Review comments applied: - min_mtu set removed - extra mtu range check is removed - err codes handling improved ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26atlantic: fix iommu errorsPavel Belous
Call skb_frag_dma_map multiple times if tx length is greater than device max and avoid processing tx ring until entire packet has been sent. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: Pavel Belous <pavel.belous@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26aquantia: Fix transient invalid link down/up indicationsIgor Russkikh
Due to a bug in aquantia atlantic card firmware, it sometimes reports invalid link speed bits. That caused driver to report link down events, although link itself is totally fine. This patch ignores such out of blue readings. Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26aquantia: Fix Tx queue hangupsIgor Russkikh
Driver did a poor job in managing its Tx queues: Sometimes it could stop tx queues due to link down condition in aq_nic_xmit - but never waked up them. That led to Tx path total suspend. This patch fixes this and improves generic queue management: - introduces queue restart counter - uses generic netif_ interface to disable and enable tx path - refactors link up/down condition and introduces dmesg log event when link changes. - introduces new constant for minimum descriptors count required for queue wakeup Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26aquantia: Setup max_mtu in ndev to enable jumbo framesIgor Russkikh
Although hardware is capable for almost 16K MTU, without max_mtu field correctly set it only allows standard MTU to be used. This patch enables max MTU, calculating it from hardware maximum frame size of 16352 octets (including FCS). Fixes: 5513e16421cb ("net: ethernet: aquantia: Fixes for aq_ndev_change_mtu") Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26l2tp: fix race condition in l2tp_tunnel_deleteSabrina Dubroca
If we try to delete the same tunnel twice, the first delete operation does a lookup (l2tp_tunnel_get), finds the tunnel, calls l2tp_tunnel_delete, which queues it for deletion by l2tp_tunnel_del_work. The second delete operation also finds the tunnel and calls l2tp_tunnel_delete. If the workqueue has already fired and started running l2tp_tunnel_del_work, then l2tp_tunnel_delete will queue the same tunnel a second time, and try to free the socket again. Add a dead flag to prevent firing the workqueue twice. Then we can remove the check of queue_work's result that was meant to prevent that race but doesn't. Reproducer: ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 local 192.168.0.2 remote 192.168.0.1 encap udp udp_sport 5000 udp_dport 6000 ip l2tp add session name l2tp1 tunnel_id 3000 session_id 1000 peer_session_id 2000 ip link set l2tp1 up ip l2tp del tunnel tunnel_id 3000 ip l2tp del tunnel tunnel_id 3000 Fixes: f8ccac0e4493 ("l2tp: put tunnel socket release on a workqueue") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmitAlexey Kodanev
When running LTP IPsec tests, KASan might report: BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti] Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0 ... Call Trace: <IRQ> dump_stack+0x63/0x89 print_address_description+0x7c/0x290 kasan_report+0x28d/0x370 ? vti_tunnel_xmit+0xeee/0xff0 [ip_vti] __asan_report_load4_noabort+0x19/0x20 vti_tunnel_xmit+0xeee/0xff0 [ip_vti] ? vti_init_net+0x190/0x190 [ip_vti] ? save_stack_trace+0x1b/0x20 ? save_stack+0x46/0xd0 dev_hard_start_xmit+0x147/0x510 ? icmp_echo.part.24+0x1f0/0x210 __dev_queue_xmit+0x1394/0x1c60 ... Freed by task 0: save_stack_trace+0x1b/0x20 save_stack+0x46/0xd0 kasan_slab_free+0x70/0xc0 kmem_cache_free+0x81/0x1e0 kfree_skbmem+0xb1/0xe0 kfree_skb+0x75/0x170 kfree_skb_list+0x3e/0x60 __dev_queue_xmit+0x1298/0x1c60 dev_queue_xmit+0x10/0x20 neigh_resolve_output+0x3a8/0x740 ip_finish_output2+0x5c0/0xe70 ip_finish_output+0x4ba/0x680 ip_output+0x1c1/0x3a0 xfrm_output_resume+0xc65/0x13d0 xfrm_output+0x1e4/0x380 xfrm4_output_finish+0x5c/0x70 Can be fixed if we get skb->len before dst_output(). Fixes: b9959fd3b0fa ("vti: switch to new ip tunnel code") Fixes: 22e1b23dafa8 ("vti6: Support inter address family tunneling.") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>