summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-04-02Merge branch 'syscalls-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull removal of in-kernel calls to syscalls from Dominik Brodowski: "System calls are interaction points between userspace and the kernel. Therefore, system call functions such as sys_xyzzy() or compat_sys_xyzzy() should only be called from userspace via the syscall table, but not from elsewhere in the kernel. At least on 64-bit x86, it will likely be a hard requirement from v4.17 onwards to not call system call functions in the kernel: It is better to use use a different calling convention for system calls there, where struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands processing over to the actual syscall function. This means that only those parameters which are actually needed for a specific syscall are passed on during syscall entry, instead of filling in six CPU registers with random user space content all the time (which may cause serious trouble down the call chain). Those x86-specific patches will be pushed through the x86 tree in the near future. Moreover, rules on how data may be accessed may differ between kernel data and user data. This is another reason why calling sys_xyzzy() is generally a bad idea, and -- at most -- acceptable in arch-specific code. This patchset removes all in-kernel calls to syscall functions in the kernel with the exception of arch/. On top of this, it cleans up the three places where many syscalls are referenced or prototyped, namely kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h" * 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits) bpf: whitelist all syscalls for error injection kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions kernel/sys_ni: sort cond_syscall() entries syscalls/x86: auto-create compat_sys_*() prototypes syscalls: sort syscall prototypes in include/linux/compat.h net: remove compat_sys_*() prototypes from net/compat.h syscalls: sort syscall prototypes in include/linux/syscalls.h kexec: move sys_kexec_load() prototype to syscalls.h x86/sigreturn: use SYSCALL_DEFINE0 x86: fix sys_sigreturn() return type to be long, not unsigned long x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate() fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate() fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid() kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare() ...
2018-04-02net: socket: add __compat_sys_...msg() helpers; remove in-kernel calls to ↵Dominik Brodowski
compat syscalls Using the net-internal helpers __compat_sys_...msg() allows us to avoid the internal calls to the compat_sys_...msg() syscalls. compat_sys_recvmmsg() is handled in a different patch. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __compat_sys_recvmmsg() helper; remove in-kernel call to ↵Dominik Brodowski
compat syscall Using the net-internal helper __compat_sys_recvmmsg() allows us to avoid the internal calls to the compat_sys_recvmmsg() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __compat_sys_getsockopt() helper; remove in-kernel call to ↵Dominik Brodowski
compat syscall Using the net-internal helper __compat_sys_getsockopt() allows us to avoid the internal calls to the compat_sys_getsockopt() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __compat_sys_setsockopt() helper; remove in-kernel call to ↵Dominik Brodowski
compat syscall Using the net-internal helper __compat_sys_setsockopt() allows us to avoid the internal calls to the compat_sys_setsockopt() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __compat_sys_recvfrom() helper; remove in-kernel call to ↵Dominik Brodowski
compat syscall Using the net-internal helper __compat_sys_recvfrom() allows us to avoid the internal calls to the compat_sys_recvfrom() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: replace call to sys_recv() with __sys_recvfrom()Dominik Brodowski
sys_recv() merely expands the parameters to __sys_recvfrom() by NULL and NULL. Open-code this in the two places which used sys_recv() as a wrapper to __sys_recvfrom(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: replace calls to sys_send() with __sys_sendto()Dominik Brodowski
sys_send() merely expands the parameters to __sys_sendto() by NULL and 0. Open-code this in the two places which used sys_send() as a wrapper to __sys_sendto(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: move check for forbid_cmsg_compat to __sys_...msg()Dominik Brodowski
The non-compat codepaths for sys_...msg() verify that MSG_CMSG_COMPAT is not set. By moving this check to the __sys_...msg() functions (and making it dependent on a static flag passed to this function), we can call the __sys...msg() functions instead of the syscall functions in all cases. __sys_recvmmsg() does not need this trickery, as the check is handled within the do_sys_recvmmsg() function internal to net/socket.c. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add do_sys_recvmmsg() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper do_sys_recvmmsg() allows us to avoid the internal calls to the sys_getsockopt() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_getsockopt() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_getsockopt() allows us to avoid the internal calls to the sys_getsockopt() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_setsockopt() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_setsockopt() allows us to avoid the internal calls to the sys_setsockopt() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_shutdown() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_shutdown() allows us to avoid the internal calls to the sys_shutdown() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_socketpair() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_socketpair() allows us to avoid the internal calls to the sys_socketpair() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_getpeername() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_getpeername() allows us to avoid the internal calls to the sys_getpeername() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_getsockname() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_getsockname() allows us to avoid the internal calls to the sys_getsockname() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_listen() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_listen() allows us to avoid the internal calls to the sys_listen() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_connect() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_connect() allows us to avoid the internal calls to the sys_connect() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_bind() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_bind() allows us to avoid the internal calls to the sys_bind() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_socket() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_socket() allows us to avoid the internal calls to the sys_socket() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_accept4() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_accept4() allows us to avoid the internal calls to the sys_accept4() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_sendto() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_sendto() allows us to avoid the internal calls to the sys_sendto() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_recvfrom() helper; remove in-kernel call to syscallDominik Brodowski
Using the net-internal helper __sys_recvfrom() allows us to avoid the internal calls to the sys_recvfrom() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-03-30Merge tag 'kbuild-fixes-v4.16-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - fix missed rebuild of TRIM_UNUSED_KSYMS - fix rpm-pkg for GNU tar >= 1.29 - include scripts/dtc/include-prefixes/* to kernel header deb-pkg - add -no-integrated-as option ealier to fix building with Clang - fix netfilter Makefile for parallel building * tag 'kbuild-fixes-v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: netfilter: nf_nat_snmp_basic: add correct dependency to Makefile kbuild: rpm-pkg: Support GNU tar >= 1.29 builddeb: Fix header package regarding dtc source links kbuild: set no-integrated-as before incl. arch Makefile kbuild: make scripts/adjust_autoksyms.sh robust against timestamp races
2018-03-30net/ipv6: Fix route leaking between VRFsDavid Ahern
Donald reported that IPv6 route leaking between VRFs is not working. The root cause is the strict argument in the call to rt6_lookup when validating the nexthop spec. ip6_route_check_nh validates the gateway and device (if given) of a route spec. It in turn could call rt6_lookup (e.g., lookup in a given table did not succeed so it falls back to a full lookup) and if so sets the strict argument to 1. That means if the egress device is given, the route lookup needs to return a result with the same device. This strict requirement does not work with VRFs (IPv4 or IPv6) because the oif in the flow struct is overridden with the index of the VRF device to trigger a match on the l3mdev rule and force the lookup to its table. The right long term solution is to add an l3mdev index to the flow struct such that the oif is not overridden. That solution will not backport well, so this patch aims for a simpler solution to relax the strict argument if the route spec device is an l3mdev slave. As done in other places, use the FLOWI_FLAG_SKIP_NH_OIF to know that the RT6_LOOKUP_F_IFACE flag needs to be removed. Fixes: ca254490c8df ("net: Add VRF support to IPv6 stack") Reported-by: Donald Sharp <sharpd@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30ipv6: sr: fix seg6 encap performances with TSO enabledDavid Lebrun
Enabling TSO can lead to abysmal performances when using seg6 in encap mode, such as with the ixgbe driver. This patch adds a call to iptunnel_handle_offloads() to remove the encapsulation bit if needed. Before: root@comp4-seg6bpf:~# iperf3 -c fc00::55 Connecting to host fc00::55, port 5201 [ 4] local fc45::4 port 36592 connected to fc00::55 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 196 KBytes 1.60 Mbits/sec 47 6.66 KBytes [ 4] 1.00-2.00 sec 304 KBytes 2.49 Mbits/sec 100 5.33 KBytes [ 4] 2.00-3.00 sec 284 KBytes 2.32 Mbits/sec 92 5.33 KBytes After: root@comp4-seg6bpf:~# iperf3 -c fc00::55 Connecting to host fc00::55, port 5201 [ 4] local fc45::4 port 43062 connected to fc00::55 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 1.03 GBytes 8.89 Gbits/sec 0 743 KBytes [ 4] 1.00-2.00 sec 1.03 GBytes 8.87 Gbits/sec 0 743 KBytes [ 4] 2.00-3.00 sec 1.03 GBytes 8.87 Gbits/sec 0 743 KBytes Reported-by: Tom Herbert <tom@quantonium.net> Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30net: Fix untag for vlan packets without ethernet headerToshiaki Makita
In some situation vlan packets do not have ethernet headers. One example is packets from tun devices. Users can specify vlan protocol in tun_pi field instead of IP protocol, and skb_vlan_untag() attempts to untag such packets. skb_vlan_untag() (more precisely, skb_reorder_vlan_header() called by it) however did not expect packets without ethernet headers, so in such a case size argument for memmove() underflowed and triggered crash. ==== BUG: unable to handle kernel paging request at ffff8801cccb8000 IP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43 PGD 9cee067 P4D 9cee067 PUD 1d9401063 PMD 1cccb7063 PTE 2810100028101 Oops: 000b [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 17663 Comm: syz-executor2 Not tainted 4.16.0-rc7+ #368 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43 RSP: 0018:ffff8801cc046e28 EFLAGS: 00010287 RAX: ffff8801ccc244c4 RBX: fffffffffffffffe RCX: fffffffffff6c4c2 RDX: fffffffffffffffe RSI: ffff8801cccb7ffc RDI: ffff8801cccb8000 RBP: ffff8801cc046e48 R08: ffff8801ccc244be R09: ffffed0039984899 R10: 0000000000000001 R11: ffffed0039984898 R12: ffff8801ccc244c4 R13: ffff8801ccc244c0 R14: ffff8801d96b7c06 R15: ffff8801d96b7b40 FS: 00007febd562d700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8801cccb8000 CR3: 00000001ccb2f006 CR4: 00000000001606e0 DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Call Trace: memmove include/linux/string.h:360 [inline] skb_reorder_vlan_header net/core/skbuff.c:5031 [inline] skb_vlan_untag+0x470/0xc40 net/core/skbuff.c:5061 __netif_receive_skb_core+0x119c/0x3460 net/core/dev.c:4460 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4627 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4701 netif_receive_skb+0xae/0x390 net/core/dev.c:4725 tun_rx_batched.isra.50+0x5ee/0x870 drivers/net/tun.c:1555 tun_get_user+0x299e/0x3c20 drivers/net/tun.c:1962 tun_chr_write_iter+0xb9/0x160 drivers/net/tun.c:1990 call_write_iter include/linux/fs.h:1782 [inline] new_sync_write fs/read_write.c:469 [inline] __vfs_write+0x684/0x970 fs/read_write.c:482 vfs_write+0x189/0x510 fs/read_write.c:544 SYSC_write fs/read_write.c:589 [inline] SyS_write+0xef/0x220 fs/read_write.c:581 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x454879 RSP: 002b:00007febd562cc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007febd562d6d4 RCX: 0000000000454879 RDX: 0000000000000157 RSI: 0000000020000180 RDI: 0000000000000014 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000006b0 R14: 00000000006fc120 R15: 0000000000000000 Code: 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20 0f 82 03 01 00 00 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f 9f 00 00 00 48 89 d1 <f3> a4 c3 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3b 48 83 ea 20 RIP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43 RSP: ffff8801cc046e28 CR2: ffff8801cccb8000 ==== We don't need to copy headers for packets which do not have preceding headers of vlan headers, so skip memmove() in that case. Fixes: 4bbb3e0e8239 ("net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29ip_tunnel: Resolve ipsec merge conflict properly.David S. Miller
We want to use dev_set_mtu() regardless of how we calculate the mtu value. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-03-29 1) Fix a rcu_read_lock/rcu_read_unlock imbalance in the error path of xfrm_local_error(). From Taehee Yoo. 2) Some VTI MTU fixes. From Stefano Brivio. 3) Fix a too early overwritten skb control buffer on xfrm transport mode. Please note that this pull request has a merge conflict in net/ipv4/ip_tunnel.c. The conflict is between commit f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes") from the net tree and commit 24fc79798b8d ("ip_tunnel: Clamp MTU to bounds on new link") from the ipsec tree. It can be solved as it is currently done in linux-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29netfilter: nf_nat_snmp_basic: add correct dependency to MakefileMasahiro Yamada
nf_nat_snmp_basic_main.c includes a generated header, but the necessary dependency is missing in Makefile. This could cause build error in parallel building. Remove a weird line, and add a correct one. Fixes: cc2d58634e0f ("netfilter: nf_nat_snmp_basic: use asn1 decoder library") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-03-27net/smc: use announced length in sock_recvmsg()Ursula Braun
Not every CLC proposal message needs the maximum buffer length. Due to the MSG_WAITALL flag, it is important to use the peeked real length when receiving the message. Fixes: d63d271ce2b5ce ("smc: switch to sock_recvmsg()") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27llc: properly handle dev_queue_xmit() return valueCong Wang
llc_conn_send_pdu() pushes the skb into write queue and calls llc_conn_send_pdus() to flush them out. However, the status of dev_queue_xmit() is not returned to caller, in this case, llc_conn_state_process(). llc_conn_state_process() needs hold the skb no matter success or failure, because it still uses it after that, therefore we should hold skb before dev_queue_xmit() when that skb is the one being processed by llc_conn_state_process(). For other callers, they can just pass NULL and ignore the return value as they are. Reported-by: Noam Rathaus <noamr@beyondsecurity.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27strparser: Fix sign of err codesDave Watson
strp_parser_err is called with a negative code everywhere, which then calls abort_parser with a negative code. strp_msg_timeout calls abort_parser directly with a positive code. Negate ETIMEDOUT to match signed-ness of other calls. The default abort_parser callback, strp_abort_strp, sets sk->sk_err to err. Also negate the error here so sk_err always holds a positive value, as the rest of the net code expects. Currently a negative sk_err can result in endless loops, or user code that thinks it actually sent/received err bytes. Found while testing net/tls_sw recv path. Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages") Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net sched actions: fix dumping which requires several messages to user spaceCraig Dillabaugh
Fixes a bug in the tcf_dump_walker function that can cause some actions to not be reported when dumping a large number of actions. This issue became more aggrevated when cookies feature was added. In particular this issue is manifest when large cookie values are assigned to the actions and when enough actions are created that the resulting table must be dumped in multiple batches. The number of actions returned in each batch is limited by the total number of actions and the memory buffer size. With small cookies the numeric limit is reached before the buffer size limit, which avoids the code path triggering this bug. When large cookies are used buffer fills before the numeric limit, and the erroneous code path is hit. For example after creating 32 csum actions with the cookie aaaabbbbccccdddd $ tc actions ls action csum total acts 26 action order 0: csum (tcp) action continue index 1 ref 1 bind 0 cookie aaaabbbbccccdddd ..... action order 25: csum (tcp) action continue index 26 ref 1 bind 0 cookie aaaabbbbccccdddd total acts 6 action order 0: csum (tcp) action continue index 28 ref 1 bind 0 cookie aaaabbbbccccdddd ...... action order 5: csum (tcp) action continue index 32 ref 1 bind 0 cookie aaaabbbbccccdddd Note that the action with index 27 is omitted from the report. Fixes: 4b3550ef530c ("[NET_SCHED]: Use nla_nest_start/nla_nest_end")" Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: fix possible out-of-bound read in skb_network_protocol()Eric Dumazet
skb mac header is not necessarily set at the time skb_network_protocol() is called. Use skb->data instead. BUG: KASAN: slab-out-of-bounds in skb_network_protocol+0x46b/0x4b0 net/core/dev.c:2739 Read of size 2 at addr ffff8801b3097a0b by task syz-executor5/14242 CPU: 1 PID: 14242 Comm: syz-executor5 Not tainted 4.16.0-rc6+ #280 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x23c/0x360 mm/kasan/report.c:412 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:443 skb_network_protocol+0x46b/0x4b0 net/core/dev.c:2739 harmonize_features net/core/dev.c:2924 [inline] netif_skb_features+0x509/0x9b0 net/core/dev.c:3011 validate_xmit_skb+0x81/0xb00 net/core/dev.c:3084 validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3142 packet_direct_xmit+0x117/0x790 net/packet/af_packet.c:256 packet_snd net/packet/af_packet.c:2944 [inline] packet_sendmsg+0x3aed/0x60b0 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xca/0x110 net/socket.c:639 ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047 __sys_sendmsg+0xe5/0x210 net/socket.c:2081 Fixes: 19acc327258a ("gso: Handle Trans-Ether-Bridging protocol in skb_network_protocol()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pravin B Shelar <pshelar@ovn.org> Reported-by: Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27Merge tag 'batadv-net-for-davem-20180326' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - fix multicast-via-unicast transmissions for AP isolation and gateway extension, by Linus Luessing (2 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26net: sched, fix OOO packets with pfifo_fastJohn Fastabend
After the qdisc lock was dropped in pfifo_fast we allow multiple enqueue threads and dequeue threads to run in parallel. On the enqueue side the skb bit ooo_okay is used to ensure all related skbs are enqueued in-order. On the dequeue side though there is no similar logic. What we observe is with fewer queues than CPUs it is possible to re-order packets when two instances of __qdisc_run() are running in parallel. Each thread will dequeue a skb and then whichever thread calls the ndo op first will be sent on the wire. This doesn't typically happen because qdisc_run() is usually triggered by the same core that did the enqueue. However, drivers will trigger __netif_schedule() when queues are transitioning from stopped to awake using the netif_tx_wake_* APIs. When this happens netif_schedule() calls qdisc_run() on the same CPU that did the netif_tx_wake_* which is usually done in the interrupt completion context. This CPU is selected with the irq affinity which is unrelated to the enqueue operations. To resolve this we add a RUNNING bit to the qdisc to ensure only a single dequeue per qdisc is running. Enqueue and dequeue operations can still run in parallel and also on multi queue NICs we can still have a dequeue in-flight per qdisc, which is typically per CPU. Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Reported-by: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25ipv6: the entire IPv6 header chain must fit the first fragmentPaolo Abeni
While building ipv6 datagram we currently allow arbitrary large extheaders, even beyond pmtu size. The syzbot has found a way to exploit the above to trigger the following splat: kernel BUG at ./include/linux/skbuff.h:2073! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4230 Comm: syzkaller672661 Not tainted 4.16.0-rc2+ #326 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__skb_pull include/linux/skbuff.h:2073 [inline] RIP: 0010:__ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636 RSP: 0018:ffff8801bc18f0f0 EFLAGS: 00010293 RAX: ffff8801b17400c0 RBX: 0000000000000738 RCX: ffffffff84f01828 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801b415ac18 RBP: ffff8801bc18f360 R08: ffff8801b4576844 R09: 0000000000000000 R10: ffff8801bc18f380 R11: ffffed00367aee4e R12: 00000000000000d6 R13: ffff8801b415a740 R14: dffffc0000000000 R15: ffff8801b45767c0 FS: 0000000001535880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000002000b000 CR3: 00000001b4123001 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip6_finish_skb include/net/ipv6.h:969 [inline] udp_v6_push_pending_frames+0x269/0x3b0 net/ipv6/udp.c:1073 udpv6_sendmsg+0x2a96/0x3400 net/ipv6/udp.c:1343 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg+0xca/0x110 net/socket.c:640 ___sys_sendmsg+0x320/0x8b0 net/socket.c:2046 __sys_sendmmsg+0x1ee/0x620 net/socket.c:2136 SYSC_sendmmsg net/socket.c:2167 [inline] SyS_sendmmsg+0x35/0x60 net/socket.c:2162 do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x4404c9 RSP: 002b:00007ffdce35f948 EFLAGS: 00000217 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004404c9 RDX: 0000000000000003 RSI: 0000000020001f00 RDI: 0000000000000003 RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8 R10: 0000000020000080 R11: 0000000000000217 R12: 0000000000401df0 R13: 0000000000401e80 R14: 0000000000000000 R15: 0000000000000000 Code: ff e8 1d 5e b9 fc e9 15 e9 ff ff e8 13 5e b9 fc e9 44 e8 ff ff e8 29 5e b9 fc e9 c0 e6 ff ff e8 3f f3 80 fc 0f 0b e8 38 f3 80 fc <0f> 0b 49 8d 87 80 00 00 00 4d 8d 87 84 00 00 00 48 89 85 20 fe RIP: __skb_pull include/linux/skbuff.h:2073 [inline] RSP: ffff8801bc18f0f0 RIP: __ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636 RSP: ffff8801bc18f0f0 As stated by RFC 7112 section 5: When a host fragments an IPv6 datagram, it MUST include the entire IPv6 Header Chain in the First Fragment. So this patch addresses the issue dropping datagrams with excessive extheader length. It also updates the error path to report to the calling socket nonnegative pmtu values. The issue apparently predates git history. v1 -> v2: cleanup error path, as per Eric's suggestion Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+91e6f9932ff122fa4410@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25netlink: make sure nladdr has correct size in netlink_connect()Alexander Potapenko
KMSAN reports use of uninitialized memory in the case when |alen| is smaller than sizeof(struct sockaddr_nl), and therefore |nladdr| isn't fully copied from the userspace. Signed-off-by: Alexander Potapenko <glider@google.com> Fixes: 1da177e4c3f41524 ("Linux-2.6.12-rc2") Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25net/ipv4: disable SMC TCP option with SYN CookiesHans Wippel
Currently, the SMC experimental TCP option in a SYN packet is lost on the server side when SYN Cookies are active. However, the corresponding SYNACK sent back to the client contains the SMC option. This causes an inconsistent view of the SMC capabilities on the client and server. This patch disables the SMC option in the SYNACK when SYN Cookies are active to avoid this issue. Fixes: 60e2a7780793b ("tcp: TCP experimental option for SMC") Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Don't pick fixed hash implementation for NFT_SET_EVAL sets, otherwise userspace hits EOPNOTSUPP with valid rules using the meter statement, from Florian Westphal. 2) If you send a batch that flushes the existing ruleset (that contains a NAT chain) and the new ruleset definition comes with a new NAT chain, don't bogusly hit EBUSY. Also from Florian. 3) Missing netlink policy attribute validation, from Florian. 4) Detach conntrack template from skbuff if IP_NODEFRAG is set on, from Paolo Abeni. 5) Cache device names in flowtable object, otherwise we may end up walking over devices going aways given no rtnl_lock is held. 6) Fix incorrect net_device ingress with ingress hooks. 7) Fix crash when trying to read more data than available in UDP packets from the nf_socket infrastructure, from Subash. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-24netfilter: nf_socket: Fix out of bounds access in nf_sk_lookup_slow_v{4,6}Subash Abhinov Kasiviswanathan
skb_header_pointer will copy data into a buffer if data is non linear, otherwise it will return a pointer in the linear section of the data. nf_sk_lookup_slow_v{4,6} always copies data of size udphdr but later accesses memory within the size of tcphdr (th->doff) in case of TCP packets. This causes a crash when running with KASAN with the following call stack - BUG: KASAN: stack-out-of-bounds in xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178 Read of size 2 at addr ffffffe3d417a87c by task syz-executor/28971 CPU: 2 PID: 28971 Comm: syz-executor Tainted: G B W O 4.9.65+ #1 Call trace: [<ffffff9467e8d390>] dump_backtrace+0x0/0x428 arch/arm64/kernel/traps.c:76 [<ffffff9467e8d7e0>] show_stack+0x28/0x38 arch/arm64/kernel/traps.c:226 [<ffffff946842d9b8>] __dump_stack lib/dump_stack.c:15 [inline] [<ffffff946842d9b8>] dump_stack+0xd4/0x124 lib/dump_stack.c:51 [<ffffff946811d4b0>] print_address_description+0x68/0x258 mm/kasan/report.c:248 [<ffffff946811d8c8>] kasan_report_error mm/kasan/report.c:347 [inline] [<ffffff946811d8c8>] kasan_report.part.2+0x228/0x2f0 mm/kasan/report.c:371 [<ffffff946811df44>] kasan_report+0x5c/0x70 mm/kasan/report.c:372 [<ffffff946811bebc>] check_memory_region_inline mm/kasan/kasan.c:308 [inline] [<ffffff946811bebc>] __asan_load2+0x84/0x98 mm/kasan/kasan.c:739 [<ffffff94694d6f04>] __tcp_hdrlen include/linux/tcp.h:35 [inline] [<ffffff94694d6f04>] xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178 Fix this by copying data into appropriate size headers based on protocol. Fixes: a583636a83ea ("inet: refactor inet[6]_lookup functions to take skb") Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-24batman-adv: fix packet loss for broadcasted DHCP packets to a serverLinus Lüssing
DHCP connectivity issues can currently occur if the following conditions are met: 1) A DHCP packet from a client to a server 2) This packet has a multicast destination 3) This destination has a matching entry in the translation table (FF:FF:FF:FF:FF:FF for IPv4, 33:33:00:01:00:02/33:33:00:01:00:03 for IPv6) 4) The orig-node determined by TT for the multicast destination does not match the orig-node determined by best-gateway-selection In this case the DHCP packet will be dropped. The "gateway-out-of-range" check is supposed to only be applied to unicasted DHCP packets to a specific DHCP server. In that case dropping the the unicasted frame forces the client to retry via a broadcasted one, but now directed to the new best gateway. A DHCP packet with broadcast/multicast destination is already ensured to always be delivered to the best gateway. Dropping a multicasted DHCP packet here will only prevent completing DHCP as there is no other fallback. So far, it seems the unicast check was implicitly performed by expecting the batadv_transtable_search() to return NULL for multicast destinations. However, a multicast address could have always ended up in the translation table and in fact is now common. To fix this potential loss of a DHCP client-to-server packet to a multicast address this patch adds an explicit multicast destination check to reliably bail out of the gateway-out-of-range check for such destinations. The issue and fix were tested in the following three node setup: - Line topology, A-B-C - A: gateway client, DHCP client - B: gateway server, hop-penalty increased: 30->60, DHCP server - C: gateway server, code modifications to announce FF:FF:FF:FF:FF:FF Without this patch, A would never transmit its DHCP Discover packet due to an always "out-of-range" condition. With this patch, a full DHCP handshake between A and B was possible again. Fixes: be7af5cf9cae ("batman-adv: refactoring gateway handling code") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-03-24batman-adv: fix multicast-via-unicast transmission with AP isolationLinus Lüssing
For multicast frames AP isolation is only supposed to be checked on the receiving nodes and never on the originating one. Furthermore, the isolation or wifi flag bits should only be intepreted as such for unicast and never multicast TT entries. By injecting flags to the multicast TT entry claimed by a single target node it was verified in tests that this multicast address becomes unreachable, leading to packet loss. Omitting the "src" parameter to the batadv_transtable_search() call successfully skipped the AP isolation check and made the target reachable again. Fixes: 1d8ab8d3c176 ("batman-adv: Modified forwarding behaviour for multicast packets") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-03-23ipv6: fix possible deadlock in rt6_age_examine_exception()Eric Dumazet
syzbot reported a LOCKDEP splat [1] in rt6_age_examine_exception() rt6_age_examine_exception() is called while rt6_exception_lock is held. This lock is the lower one in the lock hierarchy, thus we can not call dst_neigh_lookup() function, as it can fallback to neigh_create() We should instead do a pure RCU lookup. As a bonus we avoid a pair of atomic operations on neigh refcount. [1] WARNING: possible circular locking dependency detected 4.16.0-rc4+ #277 Not tainted syz-executor7/4015 is trying to acquire lock: (&ndev->lock){++--}, at: [<00000000416dce19>] __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928 but task is already holding lock: (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&tbl->lock){++-.}: __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline] _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312 __neigh_create+0x87e/0x1d90 net/core/neighbour.c:528 neigh_create include/net/neighbour.h:315 [inline] ip6_neigh_lookup+0x9a7/0xba0 net/ipv6/route.c:228 dst_neigh_lookup include/net/dst.h:405 [inline] rt6_age_examine_exception net/ipv6/route.c:1609 [inline] rt6_age_exceptions+0x381/0x660 net/ipv6/route.c:1645 fib6_age+0xfb/0x140 net/ipv6/ip6_fib.c:2033 fib6_clean_node+0x389/0x580 net/ipv6/ip6_fib.c:1919 fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1845 fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1893 fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1970 __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1986 fib6_clean_all net/ipv6/ip6_fib.c:1997 [inline] fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2053 ndisc_netdev_event+0x3c2/0x4a0 net/ipv6/ndisc.c:1781 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707 call_netdevice_notifiers net/core/dev.c:1725 [inline] __dev_notify_flags+0x262/0x430 net/core/dev.c:6960 dev_change_flags+0xf5/0x140 net/core/dev.c:6994 devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080 inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919 sock_do_ioctl+0xef/0x390 net/socket.c:957 sock_ioctl+0x36b/0x610 net/socket.c:1081 vfs_ioctl fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686 SYSC_ioctl fs/ioctl.c:701 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 -> #2 (rt6_exception_lock){+.-.}: __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:315 [inline] rt6_flush_exceptions+0x21/0x210 net/ipv6/route.c:1367 fib6_del_route net/ipv6/ip6_fib.c:1677 [inline] fib6_del+0x624/0x12c0 net/ipv6/ip6_fib.c:1761 __ip6_del_rt+0xc7/0x120 net/ipv6/route.c:2980 ip6_del_rt+0x132/0x1a0 net/ipv6/route.c:2993 __ipv6_dev_ac_dec+0x3b1/0x600 net/ipv6/anycast.c:332 ipv6_dev_ac_dec net/ipv6/anycast.c:345 [inline] ipv6_sock_ac_close+0x2b4/0x3e0 net/ipv6/anycast.c:200 inet6_release+0x48/0x70 net/ipv6/af_inet6.c:433 sock_release+0x8d/0x1e0 net/socket.c:594 sock_close+0x16/0x20 net/socket.c:1149 __fput+0x327/0x7e0 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x199/0x270 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x9bb/0x1ad0 kernel/exit.c:865 do_group_exit+0x149/0x400 kernel/exit.c:968 get_signal+0x73a/0x16d0 kernel/signal.c:2469 do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809 exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:265 [inline] do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292 entry_SYSCALL_64_after_hwframe+0x42/0xb7 -> #1 (&(&tb->tb6_lock)->rlock){+.-.}: __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:315 [inline] __ip6_ins_rt+0x56/0x90 net/ipv6/route.c:1007 ip6_route_add+0x141/0x190 net/ipv6/route.c:2955 addrconf_prefix_route+0x44f/0x620 net/ipv6/addrconf.c:2359 fixup_permanent_addr net/ipv6/addrconf.c:3368 [inline] addrconf_permanent_addr net/ipv6/addrconf.c:3391 [inline] addrconf_notify+0x1ad2/0x2310 net/ipv6/addrconf.c:3460 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707 call_netdevice_notifiers net/core/dev.c:1725 [inline] __dev_notify_flags+0x15d/0x430 net/core/dev.c:6958 dev_change_flags+0xf5/0x140 net/core/dev.c:6994 do_setlink+0xa22/0x3bb0 net/core/rtnetlink.c:2357 rtnl_newlink+0xf37/0x1a50 net/core/rtnetlink.c:2965 rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4641 netlink_rcv_skb+0x14b/0x380 net/netlink/af_netlink.c:2444 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4659 netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline] netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xca/0x110 net/socket.c:639 ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047 __sys_sendmsg+0xe5/0x210 net/socket.c:2081 SYSC_sendmsg net/socket.c:2092 [inline] SyS_sendmsg+0x2d/0x50 net/socket.c:2088 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 -> #0 (&ndev->lock){++--}: lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920 __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline] _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312 __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928 ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961 pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392 pneigh_ifdown net/core/neighbour.c:695 [inline] neigh_ifdown+0x149/0x250 net/core/neighbour.c:294 rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874 addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633 addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707 call_netdevice_notifiers net/core/dev.c:1725 [inline] __dev_notify_flags+0x262/0x430 net/core/dev.c:6960 dev_change_flags+0xf5/0x140 net/core/dev.c:6994 devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080 inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919 packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066 sock_do_ioctl+0xef/0x390 net/socket.c:957 sock_ioctl+0x36b/0x610 net/socket.c:1081 vfs_ioctl fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686 SYSC_ioctl fs/ioctl.c:701 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 other info that might help us debug this: Chain exists of: &ndev->lock --> rt6_exception_lock --> &tbl->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&tbl->lock); lock(rt6_exception_lock); lock(&tbl->lock); lock(&ndev->lock); *** DEADLOCK *** 2 locks held by syz-executor7/4015: #0: (rtnl_mutex){+.+.}, at: [<00000000a2f16daa>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 #1: (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292 stack backtrace: CPU: 0 PID: 4015 Comm: syz-executor7 Not tainted 4.16.0-rc4+ #277 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223 check_prev_add kernel/locking/lockdep.c:1863 [inline] check_prevs_add kernel/locking/lockdep.c:1976 [inline] validate_chain kernel/locking/lockdep.c:2417 [inline] __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3431 lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920 __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline] _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312 __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928 ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961 pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392 pneigh_ifdown net/core/neighbour.c:695 [inline] neigh_ifdown+0x149/0x250 net/core/neighbour.c:294 rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874 addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633 addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707 call_netdevice_notifiers net/core/dev.c:1725 [inline] __dev_notify_flags+0x262/0x430 net/core/dev.c:6960 dev_change_flags+0xf5/0x140 net/core/dev.c:6994 devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080 inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919 packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066 sock_do_ioctl+0xef/0x390 net/socket.c:957 sock_ioctl+0x36b/0x610 net/socket.c:1081 vfs_ioctl fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686 SYSC_ioctl fs/ioctl.c:701 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: c757faa8bfa2 ("ipv6: prepare fib6_age() for exception table") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23ip_tunnel: Emit events for post-register MTU changesPetr Machata
For tunnels created with IFLA_MTU, MTU of the netdevice is set by rtnl_create_link() (called from rtnl_newlink()) before the device is registered. However without IFLA_MTU that's not done. rtnl_newlink() proceeds by calling struct rtnl_link_ops.newlink, which via ip_tunnel_newlink() calls register_netdevice(), and that emits NETDEV_REGISTER. Thus any listeners that inspect the netdevice get the MTU of 0. After ip_tunnel_newlink() corrects the MTU after registering the netdevice, but since there's no event, the listeners don't get to know about the MTU until something else happens--such as a NETDEV_UP event. That's not ideal. So instead of setting the MTU directly, go through dev_set_mtu(), which takes care of distributing the necessary NETDEV_PRECHANGEMTU and NETDEV_CHANGEMTU events. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23xfrm: Fix transport mode skb control buffer usage.Steffen Klassert
A recent commit introduced a new struct xfrm_trans_cb that is used with the sk_buff control buffer. Unfortunately it placed the structure in front of the control buffer and overlooked that the IPv4/IPv6 control buffer is still needed for some layer 4 protocols. As a result the IPv4/IPv6 control buffer is overwritten with this structure. Fix this by setting a apropriate header in front of the structure. Fixes acf568ee859f ("xfrm: Reinject transport-mode packets ...") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-22Merge tag 'mac80211-for-davem-2018-03-21' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Two more fixes (in three patches): * ath9k_htc doesn't like QoS NDP frames, use regular ones * hwsim: set up wmediumd for radios created later ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net/ipv6: Handle onlink flag with multipath routesDavid Ahern
For multipath routes the ONLINK flag can be specified per nexthop in rtnh_flags or globally in rtm_flags. Update ip6_route_multipath_add to consider the ONLINK setting coming from rtnh_flags. Each loop over nexthops the config for the sibling route is initialized to the global config and then per nexthop settings overlayed. The flag is 'or'ed into fib6_config to handle the ONLINK flag coming from either rtm_flags or rtnh_flags. Fixes: fc1e64e1092f ("net/ipv6: Add support for onlink flag") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22ipv6: sr: fix NULL pointer dereference when setting encap source addressDavid Lebrun
When using seg6 in encap mode, we call ipv6_dev_get_saddr() to set the source address of the outer IPv6 header, in case none was specified. Using skb->dev can lead to BUG() when it is in an inconsistent state. This patch uses the net_device attached to the skb's dst instead. [940807.667429] BUG: unable to handle kernel NULL pointer dereference at 000000000000047c [940807.762427] IP: ipv6_dev_get_saddr+0x8b/0x1d0 [940807.815725] PGD 0 P4D 0 [940807.847173] Oops: 0000 [#1] SMP PTI [940807.890073] Modules linked in: [940807.927765] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G W 4.16.0-rc1-seg6bpf+ #2 [940808.028988] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26 09/06/2010 [940808.128128] RIP: 0010:ipv6_dev_get_saddr+0x8b/0x1d0 [940808.187667] RSP: 0018:ffff88043fd836b0 EFLAGS: 00010206 [940808.251366] RAX: 0000000000000005 RBX: ffff88042cb1c860 RCX: 00000000000000fe [940808.338025] RDX: 00000000000002c0 RSI: ffff88042cb1c860 RDI: 0000000000004500 [940808.424683] RBP: ffff88043fd83740 R08: 0000000000000000 R09: ffffffffffffffff [940808.511342] R10: 0000000000000040 R11: 0000000000000000 R12: ffff88042cb1c850 [940808.598012] R13: ffffffff8208e380 R14: ffff88042ac8da00 R15: 0000000000000002 [940808.684675] FS: 0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000 [940808.783036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [940808.852975] CR2: 000000000000047c CR3: 00000004255fe000 CR4: 00000000000006e0 [940808.939634] Call Trace: [940808.970041] <IRQ> [940808.995250] ? ip6t_do_table+0x265/0x640 [940809.043341] seg6_do_srh_encap+0x28f/0x300 [940809.093516] ? seg6_do_srh+0x1a0/0x210 [940809.139528] seg6_do_srh+0x1a0/0x210 [940809.183462] seg6_output+0x28/0x1e0 [940809.226358] lwtunnel_output+0x3f/0x70 [940809.272370] ip6_xmit+0x2b8/0x530 [940809.313185] ? ac6_proc_exit+0x20/0x20 [940809.359197] inet6_csk_xmit+0x7d/0xc0 [940809.404173] tcp_transmit_skb+0x548/0x9a0 [940809.453304] __tcp_retransmit_skb+0x1a8/0x7a0 [940809.506603] ? ip6_default_advmss+0x40/0x40 [940809.557824] ? tcp_current_mss+0x24/0x90 [940809.605925] tcp_retransmit_skb+0xd/0x80 [940809.654016] tcp_xmit_retransmit_queue.part.17+0xf9/0x210 [940809.719797] tcp_ack+0xa47/0x1110 [940809.760612] tcp_rcv_established+0x13c/0x570 [940809.812865] tcp_v6_do_rcv+0x151/0x3d0 [940809.858879] tcp_v6_rcv+0xa5c/0xb10 [940809.901770] ? seg6_output+0xdd/0x1e0 [940809.946745] ip6_input_finish+0xbb/0x460 [940809.994837] ip6_input+0x74/0x80 [940810.034612] ? ip6_rcv_finish+0xb0/0xb0 [940810.081663] ipv6_rcv+0x31c/0x4c0 ... Fixes: 6c8702c60b886 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Reported-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>