summaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2017-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Both conflict were simple overlapping changes. In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the conversion of the driver in net-next to use in-netdev stats. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21ip6mr: fix notification device destructionNikolay Aleksandrov
Andrey Konovalov reported a BUG caused by the ip6mr code which is caused because we call unregister_netdevice_many for a device that is already being destroyed. In IPv4's ipmr that has been resolved by two commits long time ago by introducing the "notify" parameter to the delete function and avoiding the unregister when called from a notifier, so let's do the same for ip6mr. The trace from Andrey: ------------[ cut here ]------------ kernel BUG at net/core/dev.c:6813! invalid opcode: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: netns cleanup_net task: ffff880069208000 task.stack: ffff8800692d8000 RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813 RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297 RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569 RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000 R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070 R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000 FS: 0000000000000000(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0 Call Trace: unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881 unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880 ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346 notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647 call_netdevice_notifiers net/core/dev.c:1663 rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841 unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881 unregister_netdevice_many net/core/dev.c:7880 default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333 ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144 cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463 process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097 worker_thread+0x223/0x19c0 kernel/workqueue.c:2231 kthread+0x35e/0x430 kernel/kthread.c:231 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89 47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f> 0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00 RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0 ---[ end trace e0b29c57e9b3292c ]--- Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-04-20 This adds the basic infrastructure for IPsec hardware offloading, it creates a configuration API and adjusts the packet path. 1) Add the needed netdev features to configure IPsec offloads. 2) Add the IPsec hardware offloading API. 3) Prepare the ESP packet path for hardware offloading. 4) Add gso handlers for esp4 and esp6, this implements the software fallback for GSO packets. 5) Add xfrm replay handler functions for offloading. 6) Change ESP to use a synchronous crypto algorithm on offloading, we don't have the option for asynchronous returns when we handle IPsec at layer2. 7) Add a xfrm validate function to validate_xmit_skb. This implements the software fallback for non GSO packets. 8) Set the inner_network and inner_transport members of the SKB, as well as encapsulation, to reflect the actual positions of these headers, and removes them only once encryption is done on the payload. From Ilan Tayari. 9) Prepare the ESP GRO codepath for hardware offloading. 10) Fix incorrect null pointer check in esp6. From Colin Ian King. 11) Fix for the GSO software fallback path to detect the fallback correctly. From Ilan Tayari. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21net: ipv6: RTF_PCPU should not be settable from userspaceDavid Ahern
Andrey reported a fault in the IPv6 route code: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff880069809600 task.stack: ffff880062dc8000 RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975 RSP: 0018:ffff880062dced30 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006 RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018 RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000 FS: 00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0 Call Trace: ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128 ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212 ... Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit set. Flags passed to the kernel are blindly copied to the allocated rt6_info by ip6_route_info_create making a newly inserted route appear as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set and expects rt->dst.from to be set - which it is not since it is not really a per-cpu copy. The subsequent call to __ip6_dst_alloc then generates the fault. Fix by checking for the flag and failing with EINVAL. Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21ip_tunnel: Allow policy-based routing through tunnelsCraig Gallek
This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. There is no concept of per-packet routing decisions through IPv4 tunnels, so this implementation does not need to work with per-packet route lookups as the v6 implementation may (with IP6_TNL_F_USE_ORIG_FWMARK). Further, since the v4 tunnel ioctls share datastructures (which can not be trivially modified) with the kernel's internal tunnel configuration structures, the mark attribute must be stored in the tunnel structure itself and passed as a parameter when creating or changing tunnel attributes. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21ip6_tunnel: Allow policy-based routing through tunnelsCraig Gallek
This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21ipv6: sr: fix double free of skb after handling invalid SRHDavid Lebrun
The icmpv6_param_prob() function already does a kfree_skb(), this patch removes the duplicate one. Fixes: 1ababeba4a21f3dba3da3523c670b207fb2feb62 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20net: ipv6: Fix UDP early demux lookup with udp_l3mdev_accept=0subashab@codeaurora.org
David Ahern reported that 5425077d73e0c ("net: ipv6: Add early demux handler for UDP unicast") breaks udp_l3mdev_accept=0 since early demux for IPv6 UDP was doing a generic socket lookup which does not require an exact match. Fix this by making UDPv6 early demux match connected sockets only. v1->v2: Take reference to socket after match as suggested by Eric v2->v3: Add comment before break Fixes: 5425077d73e0c ("net: ipv6: Add early demux handler for UDP unicast") Reported-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Eric Dumazet <edumazet@google.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20ipv6: sr: fix out-of-bounds access in SRH validationDavid Lebrun
This patch fixes an out-of-bounds access in seg6_validate_srh() when the trailing data is less than sizeof(struct sr6_tlv). Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
A function in kernel/bpf/syscall.c which got a bug fix in 'net' was moved to kernel/bpf/verifier.c in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-19esp4/6: Fix GSO path for non-GSO SW-crypto packetsIlan Tayari
If esp*_offload module is loaded, outbound packets take the GSO code path, being encapsulated at layer 3, but encrypted in layer 2. validate_xmit_xfrm calls esp*_xmit for that. esp*_xmit was wrongfully detecting these packets as going through hardware crypto offload, while in fact they should be encrypted in software, causing plaintext leakage to the network, and also dropping at the receiver side. Perform the encryption in esp*_xmit, if the SA doesn't have a hardware offload_handle. Also, align esp6 code to esp4 logic. Fixes: fca11ebde3f0 ("esp4: Reorganize esp_output") Fixes: 383d0350f2cc ("esp6: Reorganize esp_output") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-19esp6: fix incorrect null pointer check on xoColin Ian King
The check for xo being null is incorrect, currently it is checking for non-null, it should be checking for null. Detected with CoverityScan, CID#1429349 ("Dereference after null check") Fixes: 7862b4058b9f ("esp: Add gso handlers for esp4 and esp6") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-18mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCUPaul E. McKenney
A group of Linux kernel hackers reported chasing a bug that resulted from their assumption that SLAB_DESTROY_BY_RCU provided an existence guarantee, that is, that no block from such a slab would be reallocated during an RCU read-side critical section. Of course, that is not the case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire slab of blocks. However, there is a phrase for this, namely "type safety". This commit therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order to avoid future instances of this sort of confusion. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-mm@kvack.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> [ paulmck: Add comments mentioning the old name, as requested by Eric Dumazet, in order to help people familiar with the old name find the new one. ] Acked-by: David Rientjes <rientjes@google.com>
2017-04-17net: rtnetlink: plumb extended ack to doit functionDavid Ahern
Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse for doit functions that call it directly. This is the first step to using extended error reporting in rtnetlink. >From here individual subsystems can be updated to set netlink_ext_ack as needed. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17ipv6: sr: fix BUG due to headroom too small after SRH pushDavid Lebrun
When a locally generated packet receives an SRH with two or more segments, the remaining headroom is too small to push an ethernet header. This patch ensures that the headroom is large enough after SRH push. The BUG generated the following trace. [ 192.950285] skbuff: skb_under_panic: text:ffffffff81809675 len:198 put:14 head:ffff88006f306400 data:ffff88006f3063fa tail:0xc0 end:0x2c0 dev:A-1 [ 192.952456] ------------[ cut here ]------------ [ 192.953218] kernel BUG at net/core/skbuff.c:105! [ 192.953411] invalid opcode: 0000 [#1] PREEMPT SMP [ 192.953411] Modules linked in: [ 192.953411] CPU: 5 PID: 3433 Comm: ping6 Not tainted 4.11.0-rc3+ #237 [ 192.953411] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 [ 192.953411] task: ffff88007c2d42c0 task.stack: ffffc90000ef4000 [ 192.953411] RIP: 0010:skb_panic+0x61/0x70 [ 192.953411] RSP: 0018:ffffc90000ef7900 EFLAGS: 00010286 [ 192.953411] RAX: 0000000000000085 RBX: 00000000000086dd RCX: 0000000000000201 [ 192.953411] RDX: 0000000080000201 RSI: ffffffff81d104c5 RDI: 00000000ffffffff [ 192.953411] RBP: ffffc90000ef7920 R08: 0000000000000001 R09: 0000000000000000 [ 192.953411] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 192.953411] R13: ffff88007c5a4000 R14: ffff88007b363d80 R15: 00000000000000b8 [ 192.953411] FS: 00007f94b558b700(0000) GS:ffff88007fd40000(0000) knlGS:0000000000000000 [ 192.953411] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 192.953411] CR2: 00007fff5ecd5080 CR3: 0000000074141000 CR4: 00000000001406e0 [ 192.953411] Call Trace: [ 192.953411] skb_push+0x3b/0x40 [ 192.953411] eth_header+0x25/0xc0 [ 192.953411] neigh_resolve_output+0x168/0x230 [ 192.953411] ? ip6_finish_output2+0x242/0x8f0 [ 192.953411] ip6_finish_output2+0x242/0x8f0 [ 192.953411] ? ip6_finish_output2+0x76/0x8f0 [ 192.953411] ip6_finish_output+0xa8/0x1d0 [ 192.953411] ip6_output+0x64/0x2d0 [ 192.953411] ? ip6_output+0x73/0x2d0 [ 192.953411] ? ip6_dst_check+0xb5/0xc0 [ 192.953411] ? dst_cache_per_cpu_get.isra.2+0x40/0x80 [ 192.953411] seg6_output+0xb0/0x220 [ 192.953411] lwtunnel_output+0xcf/0x210 [ 192.953411] ? lwtunnel_output+0x59/0x210 [ 192.953411] ip6_local_out+0x38/0x70 [ 192.953411] ip6_send_skb+0x2a/0xb0 [ 192.953411] ip6_push_pending_frames+0x48/0x50 [ 192.953411] rawv6_sendmsg+0xa39/0xf10 [ 192.953411] ? __lock_acquire+0x489/0x890 [ 192.953411] ? __mutex_lock+0x1fc/0x970 [ 192.953411] ? __lock_acquire+0x489/0x890 [ 192.953411] ? __mutex_lock+0x1fc/0x970 [ 192.953411] ? tty_ioctl+0x283/0xec0 [ 192.953411] inet_sendmsg+0x45/0x1d0 [ 192.953411] ? _copy_from_user+0x54/0x80 [ 192.953411] sock_sendmsg+0x33/0x40 [ 192.953411] SYSC_sendto+0xef/0x170 [ 192.953411] ? entry_SYSCALL_64_fastpath+0x5/0xc2 [ 192.953411] ? trace_hardirqs_on_caller+0x12b/0x1b0 [ 192.953411] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 192.953411] SyS_sendto+0x9/0x10 [ 192.953411] entry_SYSCALL_64_fastpath+0x1f/0xc2 [ 192.953411] RIP: 0033:0x7f94b453db33 [ 192.953411] RSP: 002b:00007fff5ecd0578 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 192.953411] RAX: ffffffffffffffda RBX: 00007fff5ecd16e0 RCX: 00007f94b453db33 [ 192.953411] RDX: 0000000000000040 RSI: 000055a78352e9c0 RDI: 0000000000000003 [ 192.953411] RBP: 00007fff5ecd1690 R08: 000055a78352c940 R09: 000000000000001c [ 192.953411] R10: 0000000000000000 R11: 0000000000000246 R12: 000055a783321e10 [ 192.953411] R13: 000055a7839890c0 R14: 0000000000000004 R15: 0000000000000000 [ 192.953411] Code: 00 00 48 89 44 24 10 8b 87 c4 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 90 58 d2 81 48 89 04 24 31 c0 e8 4f 70 9a ff <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 8b 97 d8 00 00 [ 192.953411] RIP: skb_panic+0x61/0x70 RSP: ffffc90000ef7900 [ 193.000186] ---[ end trace bd0b89fabdf2f92c ]--- [ 193.000951] Kernel panic - not syncing: Fatal exception in interrupt [ 193.001137] Kernel Offset: disabled [ 193.001169] ---[ end Kernel panic - not syncing: Fatal exception in interrupt Fixes: 19d5a26f5ef8de5dcb78799feaf404d717b1aac3 ("ipv6: sr: expand skb head only if necessary") Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17ipv6: drop non loopback packets claiming to originate from ::1Florian Westphal
We lack a saddr check for ::1. This causes security issues e.g. with acls permitting connections from ::1 because of assumption that these originate from local machine. Assuming a source address of ::1 is local seems reasonable. RFC4291 doesn't allow such a source address either, so drop such packets. Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-04-14 Here's the main batch of Bluetooth & 802.15.4 patches for the 4.12 kernel. - Many fixes to 6LoWPAN, in particular for BLE - New CA8210 IEEE 802.15.4 device driver (accounting for most of the lines of code added in this pull request) - Added Nokia Bluetooth (UART) HCI driver - Some serdev & TTY changes that are dependencies for the Nokia driver (with acks from relevant maintainers and an agreement that these come through the bluetooth tree) - Support for new Intel Bluetooth device - Various other minor cleanups/fixes here and there Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net-timestamp: avoid use-after-free in ip_recv_errorWillem de Bruijn
Syzkaller reported a use-after-free in ip_recv_error at line info->ipi_ifindex = skb->dev->ifindex; This function is called on dequeue from the error queue, at which point the device pointer may no longer be valid. Save ifindex on enqueue in __skb_complete_tx_timestamp, when the pointer is valid or NULL. Store it in temporary storage skb->cb. It is safe to reference skb->dev here, as called from device drivers or dev_queue_xmit. The exception is when called from tcp_ack_tstamp; in that case it is NULL and ifindex is set to 0 (invalid). Do not return a pktinfo cmsg if ifindex is 0. This maintains the current behavior of not returning a cmsg if skb->dev was NULL. On dequeue, the ipv4 path will cast from sock_exterr_skb to in_pktinfo. Both have ifindex as their first element, so no explicit conversion is needed. This is by design, introduced in commit 0b922b7a829c ("net: original ingress device index in PKTINFO"). For ipv6 ip6_datagram_support_cmsg converts to in6_pktinfo. Fixes: 829ae9d61165 ("net-timestamp: allow reading recv cmsg on errqueue with origin tstamp") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net: ipv6: send unsolicited NA on admin upDavid Ahern
ndisc_notify is the ipv6 equivalent to arp_notify. When arp_notify is set to 1, gratuitous arp requests are sent when the device is brought up. The same is expected when ndisc_notify is set to 1 (per ndisc_notify in Documentation/networking/ip-sysctl.txt). The NA is not sent on NETDEV_UP event; add it. Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer address change") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were simply overlapping changes. In the net/ipv4/route.c case the code had simply moved around a little bit and the same fix was made in both 'net' and 'net-next'. In the net/sched/sch_generic.c case a fix in 'net' happened at the same time that a new argument was added to qdisc_hash_add(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-15netfilter: remove nf_ct_is_untrackedFlorian Westphal
This function is now obsolete and always returns false. This change has no effect on generated code. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-15netfilter: kill the fake untracked conntrack objectsFlorian Westphal
resurrect an old patch from Pablo Neira to remove the untracked objects. Currently, there are four possible states of an skb wrt. conntrack. 1. No conntrack attached, ct is NULL. 2. Normal (kmem cache allocated) ct attached. 3. a template (kmalloc'd), not in any hash tables at any point in time 4. the 'untracked' conntrack, a percpu nf_conn object, tagged via IPS_UNTRACKED_BIT in ct->status. Untracked is supposed to be identical to case 1. It exists only so users can check -m conntrack --ctstate UNTRACKED vs. -m conntrack --ctstate INVALID e.g. attempts to set connmark on INVALID or UNTRACKED conntracks is supposed to be a no-op. Thus currently we need to check ct == NULL || nf_ct_is_untracked(ct) in a lot of places in order to avoid altering untracked objects. The other consequence of the percpu untracked object is that all -j NOTRACK (and, later, kfree_skb of such skbs) result in an atomic op (inc/dec the untracked conntracks refcount). This adds a new kernel-private ctinfo state, IP_CT_UNTRACKED, to make the distinction instead. The (few) places that care about packet invalid (ct is NULL) vs. packet untracked now need to test ct == NULL vs. ctinfo == IP_CT_UNTRACKED, but all other places can omit the nf_ct_is_untracked() check. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-14xfrm: Prepare the GRO codepath for hardware offloading.Steffen Klassert
On IPsec hardware offloading, we already get a secpath with valid state attached when the packet enters the GRO handlers. So check for hardware offload and skip the state lookup in this case. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14xfrm: Add encapsulation header offsets while SKB is not encryptedIlan Tayari
Both esp4 and esp6 used to assume that the SKB payload is encrypted and therefore the inner_network and inner_transport offsets are not relevant. When doing crypto offload in the NIC, this is no longer the case and the NIC driver needs these offsets so it can do TX TCP checksum offloading. This patch sets the inner_network and inner_transport members of the SKB, as well as encapsulation, to reflect the actual positions of these headers, and removes them only once encryption is done on the payload. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14esp: Use a synchronous crypto algorithm on offloading.Steffen Klassert
We need a fallback algorithm for crypto offloading to a NIC. This is because packets can be rerouted to other NICs that don't support crypto offloading. The fallback is going to be implemented at layer2 where we know the final output device but can't handle asynchronous returns fron the crypto layer. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14esp: Add gso handlers for esp4 and esp6Steffen Klassert
This patch extends the xfrm_type by an encap function pointer and implements esp4_gso_encap and esp6_gso_encap. These functions doing the basic esp encapsulation for a GSO packet. In case the GSO packet needs to be segmented in software, we add gso_segment functions. This codepath is going to be used on esp hardware offloads. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14esp6: Reorganize esp_outputSteffen Klassert
We need a fallback for ESP at layer 2, so split esp6_output into generic functions that can be used at layer 3 and layer 2 and use them in esp_output. We also add esp6_xmit which is used for the layer 2 fallback. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14esp6: Remame esp_input_done2Steffen Klassert
We are going to export the ipv4 and the ipv6 version of esp_input_done2. They are not static anymore and can't have the same name. So rename the ipv6 version to esp6_input_done2. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14xfrm: Add an IPsec hardware offloading APISteffen Klassert
This patch adds all the bits that are needed to do IPsec hardware offload for IPsec states and ESP packets. We add xfrmdev_ops to the net_device. xfrmdev_ops has function pointers that are needed to manage the xfrm states in the hardware and to do a per packet offloading decision. Joint work with: Ilan Tayari <ilant@mellanox.com> Guy Shapiro <guysh@mellanox.com> Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14xfrm: Add mode handlers for IPsec on layer 2Steffen Klassert
This patch adds a gso_segment and xmit callback for the xfrm_mode and implement these functions for tunnel and transport mode. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-13netlink: pass extended ACK struct to parsing functionsJohannes Berg
Pass the new extended ACK reporting struct to all of the generic netlink parsing functions. For now, pass NULL in almost all callers (except for some in the core.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-126lowpan: Fix IID format for BluetoothLuiz Augusto von Dentz
According to RFC 7668 U/L bit shall not be used: https://wiki.tools.ietf.org/html/rfc7668#section-3.2.2 [Page 10]: In the figure, letter 'b' represents a bit from the Bluetooth device address, copied as is without any changes on any bit. This means that no bit in the IID indicates whether the underlying Bluetooth device address is public or random. |0 1|1 3|3 4|4 6| |0 5|6 1|2 7|8 3| +----------------+----------------+----------------+----------------+ |bbbbbbbbbbbbbbbb|bbbbbbbb11111111|11111110bbbbbbbb|bbbbbbbbbbbbbbbb| +----------------+----------------+----------------+----------------+ Because of this the code cannot figure out the address type from the IP address anymore thus it makes no sense to use peer_lookup_ba as it needs the peer address type. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-04-12ipv6: addrconf: fix 48 bit 6lowpan autoconfigurationAlexander Aring
This patch adds support for 48 bit 6LoWPAN address length autoconfiguration which is the case for BTLE 6LoWPAN. Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-04-12ipv6: Fix idev->addr_list corruptionRabin Vincent
addrconf_ifdown() removes elements from the idev->addr_list without holding the idev->lock. If this happens while the loop in __ipv6_dev_get_saddr() is handling the same element, that function ends up in an infinite loop: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [test:1719] Call Trace: ipv6_get_saddr_eval+0x13c/0x3a0 __ipv6_dev_get_saddr+0xe4/0x1f0 ipv6_dev_get_saddr+0x1b4/0x204 ip6_dst_lookup_tail+0xcc/0x27c ip6_dst_lookup_flow+0x38/0x80 udpv6_sendmsg+0x708/0xba8 sock_sendmsg+0x18/0x30 SyS_sendto+0xb8/0xf8 syscall_common+0x34/0x58 Fixes: 6a923934c33 (Revert "ipv6: Revert optional address flusing on ifdown.") Signed-off-by: Rabin Vincent <rabinv@axis.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08netfilter: ip6_tables: Remove unneccessary commentsArushi Singhal
This comments are obsolete and should go, as there are no set of rules per CPU anymore. Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
2017-04-07netfilter: Remove exceptional & on function nameArushi Singhal
Remove & from function pointers to conform to the style found elsewhere in the file. Done using the following semantic patch // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-07netfilter: Remove unnecessary cast on void pointersimran singhal
The following Coccinelle script was used to detect this: @r@ expression x; void* e; type T; identifier f; @@ ( *((T *)e) | ((T *)x)[...] | ((T*)x)->f | - (T*) e ) Unnecessary parantheses are also remove. Signed-off-by: simran singhal <singhalsimran0@gmail.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-28net: ipv6: Add support for RTM_DELNETCONFDavid Ahern
Send RTM_DELNETCONF notifications when a device is deleted. The message only needs the device index, so modify inet6_netconf_fill_devconf to skip devconf references if it is NULL. Allows a userspace cache to remove entries as devices are deleted. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28net: ipv6: Refactor inet6_netconf_notify_devconf to take eventDavid Ahern
Refactor inet6_netconf_notify_devconf to take the event as an input arg. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28ipv6: add support for NETDEV_RESEND_IGMP eventVlad Yasevich
This patch adds support for NETDEV_RESEND_IGMP event similar to how it works for IPv4. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27ipv6: sr: select DST_CACHE by defaultDavid Lebrun
When CONFIG_IPV6_SEG6_LWTUNNEL is selected, automatically select DST_CACHE. This allows to remove multiple ifdefs. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24tcp: Record Rx hash and NAPI ID in tcp_child_processAlexander Duyck
While working on some recent busy poll changes we found that child sockets were being instantiated without NAPI ID being set. In our first attempt to fix it, it was suggested that we should just pull programming the NAPI ID into the function itself since all callers will need to have it set. In addition to the NAPI ID change I have dropped the code that was populating the Rx hash since it was actually being populated in tcp_get_cookie_sock. Reported-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24ipv6: sr: use dst_cache in seg6_inputDavid Lebrun
We already use dst_cache in seg6_output, when handling locally generated packets. We extend it in seg6_input, to also handle forwarded packets, and avoid unnecessary fib lookups. Performances for SRH encapsulation before the patch: Result: OK: 5656067(c5655678+d388) usec, 5000000 (1000byte,0frags) 884006pps 7072Mb/sec (7072048000bps) errors: 0 Performances after the patch: Result: OK: 4774543(c4774084+d459) usec, 5000000 (1000byte,0frags) 1047220pps 8377Mb/sec (8377760000bps) errors: 0 Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24ipv6: sr: expand skb head only if necessaryDavid Lebrun
To insert or encapsulate a packet with an SRH, we need a large enough skb headroom. Currently, we are using pskb_expand_head to inconditionally increase the size of the headroom by the amount needed by the SRH (and IPv6 header). If this reallocation is performed by another CPU than the one that initially allocated the skb, then when the initial CPU kfree the skb, it will enter the __slab_free slowpath, impacting performances. This patch replaces pskb_expand_head with skb_cow_head, that will reallocate the skb head only if the headroom is not large enough. Performances for SRH encapsulation before the patch: Result: OK: 7348320(c7347271+d1048) usec, 5000000 (1000byte,0frags) 680427pps 5443Mb/sec (5443416000bps) errors: 0 Performances after the patch: Result: OK: 5656067(c5655678+d388) usec, 5000000 (1000byte,0frags) 884006pps 7072Mb/sec (7072048000bps) errors: 0 Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Add sysctl to toggle early demux for tcp and udpsubashab@codeaurora.org
Certain system process significant unconnected UDP workload. It would be preferrable to disable UDP early demux for those systems and enable it for TCP only. By disabling UDP demux, we see these slight gains on an ARM64 system- 782 -> 788Mbps unconnected single stream UDPv4 633 -> 654Mbps unconnected UDPv4 different sources The performance impact can change based on CPU architecure and cache sizes. There will not much difference seen if entire UDP hash table is in cache. Both sysctls are enabled by default to preserve existing behavior. v1->v2: Change function pointer instead of adding conditional as suggested by Stephen. v2->v3: Read once in callers to avoid issues due to compiler optimizations. Also update commit message with the tests. v3->v4: Store and use read once result instead of querying pointer again incorrectly. v4->v5: Refactor to avoid errors due to compilation with IPV6={m,n} Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Suggested-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Tom Herbert <tom@herbertland.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/broadcom/genet/bcmmii.c drivers/net/hyperv/netvsc.c kernel/bpf/hashtab.c Almost entirely overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs.Joel Scherpelz
This commit adds a new sysctl accept_ra_rt_info_min_plen that defines the minimum acceptable prefix length of Route Information Options. The new sysctl is intended to be used together with accept_ra_rt_info_max_plen to configure a range of acceptable prefix lengths. It is useful to prevent misconfigurations from unintentionally blackholing too much of the IPv6 address space (e.g., home routers announcing RIOs for fc00::/7, which is incorrect). Signed-off-by: Joel Scherpelz <jscherpelz@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22ipv6: make sure to initialize sockc.tsflags before first useAlexander Potapenko
In the case udp_sk(sk)->pending is AF_INET6, udpv6_sendmsg() would jump to do_append_data, skipping the initialization of sockc.tsflags. Fix the problem by moving sockc.tsflags initialization earlier. The bug was detected with KMSAN. Fixes: c14ac9451c34 ("sock: enable timestamping using control messages") Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22neighbour: fix nlmsg_pid in notificationsRoopa Prabhu
neigh notifications today carry pid 0 for nlmsg_pid in all cases. This patch fixes it to carry calling process pid when available. Applications (eg. quagga) rely on nlmsg_pid to ignore notifications generated by their own netlink operations. This patch follows the routing subsystem which already sets this correctly. Reported-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. A couple of new features for nf_tables, and unsorted cleanups and incremental updates for the Netfilter tree. More specifically, they are: 1) Allow to check for TCP option presence via nft_exthdr, patch from Phil Sutter. 2) Add symmetric hash support to nft_hash, from Laura Garcia Liebana. 3) Use pr_cont() in ebt_log, from Joe Perches. 4) Remove some dead code in arp_tables reported via static analysis tool, from Colin Ian King. 5) Consolidate nf_tables expression validation, from Liping Zhang. 6) Consolidate set lookup via nft_set_lookup(). 7) Remove unnecessary rcu read lock side in bridge netfilter, from Florian Westphal. 8) Remove unused variable in nf_reject_ipv4, from Tahee Yoo. 9) Pass nft_ctx struct to object initialization indirections, from Florian Westphal. 10) Add code to integrate conntrack helper into nf_tables, also from Florian. 11) Allow to check if interface index or name exists via NFTA_FIB_F_PRESENT, from Phil Sutter. 12) Simplify resolve_normal_ct(), from Florian. 13) Use per-limit spinlock in nft_limit and xt_limit, from Liping Zhang. 14) Use rwlock in nft_set_rbtree set, also from Liping Zhang. 15) One patch to remove a useless printk at netns init path in ipvs, and several patches to document IPVS knobs. 16) Use refcount_t for reference counter in the Netfilter/IPVS code, from Elena Reshetova. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>