summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-03-02net: sit: fix memory leak in sit_init_net()Mao Wenan
If register_netdev() is failed to register sitn->fb_tunnel_dev, it will go to err_reg_dev and forget to free netdev(sitn->fb_tunnel_dev). BUG: memory leak unreferenced object 0xffff888378daad00 (size 512): comm "syz-executor.1", pid 4006, jiffies 4295121142 (age 16.115s) hex dump (first 32 bytes): 00 e6 ed c0 83 88 ff ff 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d6dcb63e>] kvmalloc include/linux/mm.h:577 [inline] [<00000000d6dcb63e>] kvzalloc include/linux/mm.h:585 [inline] [<00000000d6dcb63e>] netif_alloc_netdev_queues net/core/dev.c:8380 [inline] [<00000000d6dcb63e>] alloc_netdev_mqs+0x600/0xcc0 net/core/dev.c:8970 [<00000000867e172f>] sit_init_net+0x295/0xa40 net/ipv6/sit.c:1848 [<00000000871019fa>] ops_init+0xad/0x3e0 net/core/net_namespace.c:129 [<00000000319507f6>] setup_net+0x2ba/0x690 net/core/net_namespace.c:314 [<0000000087db4f96>] copy_net_ns+0x1dc/0x330 net/core/net_namespace.c:437 [<0000000057efc651>] create_new_namespaces+0x382/0x730 kernel/nsproxy.c:107 [<00000000676f83de>] copy_namespaces+0x2ed/0x3d0 kernel/nsproxy.c:165 [<0000000030b74bac>] copy_process.part.27+0x231e/0x6db0 kernel/fork.c:1919 [<00000000fff78746>] copy_process kernel/fork.c:1713 [inline] [<00000000fff78746>] _do_fork+0x1bc/0xe90 kernel/fork.c:2224 [<000000001c2e0d1c>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 [<00000000ec48bd44>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<0000000039acff8a>] 0xffffffffffffffff Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02net: dsa: mv88e6xxx: Fix statistics on mv88e6161Andrew Lunn
Despite what the datesheet says, the silicon implements the older way of snapshoting the statistics. Change the op. Reported-by: Chris.Healy@zii.aero Tested-by: Chris.Healy@zii.aero Fixes: 0ac64c394900 ("net: dsa: mv88e6xxx: mv88e6161 uses mv88e6320 stats snapshot") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01geneve: correctly handle ipv6.disable module parameterJiri Benc
When IPv6 is compiled but disabled at runtime, geneve_sock_add returns -EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole operation of bringing up the tunnel. Ignore failure of IPv6 socket creation for metadata based tunnels caused by IPv6 not being available. This is the same fix as what commit d074bf960044 ("vxlan: correctly handle ipv6.disable module parameter") is doing for vxlan. Note there's also commit c0a47e44c098 ("geneve: should not call rt6_lookup() when ipv6 was disabled") which fixes a similar issue but for regular tunnels, while this patch is needed for metadata based tunnels. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2019-03-01 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix sanitation rewrite, from Daniel. 2) fix error path on map_new_fd, from Peng. 3) fix icache flush address, from Paul. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmodeHeiner Kallweit
When debugging another issue I faced an interrupt storm in this driver (88E6390, port 9 in SGMII mode), consisting of alternating link-up / link-down interrupts. Analysis showed that the driver wanted to set a cmode that was set already. But so far mv88e6390x_port_set_cmode() doesn't check this and powers down SERDES, what causes the link to break, and eventually results in the described interrupt storm. Fix this by checking whether the cmode actually changes. We want that the very first call to mv88e6390x_port_set_cmode() always configures the registers, therefore initialize port.cmode with a value that is different from any supported cmode value. We have to take care that we only init the ports cmode once chip->info->num_ports is set. v2: - add small helper and init the number of actual ports only Fixes: 364e9d7776a3 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01bpf: fix sanitation rewrite in case of non-pointersDaniel Borkmann
Marek reported that he saw an issue with the below snippet in that timing measurements where off when loaded as unpriv while results were reasonable when loaded as privileged: [...] uint64_t a = bpf_ktime_get_ns(); uint64_t b = bpf_ktime_get_ns(); uint64_t delta = b - a; if ((int64_t)delta > 0) { [...] Turns out there is a bug where a corner case is missing in the fix d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths"), namely fixup_bpf_calls() only checks whether aux has a non-zero alu_state, but it also needs to test for the case of BPF_ALU_NON_POINTER since in both occasions we need to skip the masking rewrite (as there is nothing to mask). Fixes: d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths") Reported-by: Marek Majkowski <marek@cloudflare.com> Reported-by: Arthur Fabre <afabre@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/ Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-01ipv4: Add ICMPv6 support when parse route ipprotoHangbin Liu
For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers. But for ip -6 route, currently we only support tcp, udp and icmp. Add ICMPv6 support so we can match ipv6-icmp rules for route lookup. v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: eacb9384a3fe ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02MIPS: eBPF: Fix icache flush end addressPaul Burton
The MIPS eBPF JIT calls flush_icache_range() in order to ensure the icache observes the code that we just wrote. Unfortunately it gets the end address calculation wrong due to some bad pointer arithmetic. The struct jit_ctx target field is of type pointer to u32, and as such adding one to it will increment the address being pointed to by 4 bytes. Therefore in order to find the address of the end of the code we simply need to add the number of 4 byte instructions emitted, but we mistakenly add the number of instructions multiplied by 4. This results in the call to flush_icache_range() operating on a memory region 4x larger than intended, which is always wasteful and can cause crashes if we overrun into an unmapped page. Fix this by correcting the pointer arithmetic to remove the bogus multiplication, and use braces to remove the need for a set of brackets whilst also making it obvious that the target field is a pointer. Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.") Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: netdev@vger.kernel.org Cc: bpf@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01lan743x: Fix TX Stall IssueBryan Whitehead
It has been observed that tx queue stalls while downloading from certain web sites (example www.speedtest.net) The cause has been tracked down to a corner case where dma descriptors where not setup properly. And there for a tx completion interrupt was not signaled. This fix corrects the problem by properly marking the end of a multi descriptor transmission. Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: phy: phylink: fix uninitialized variable in phylink_get_mac_stateHeiner Kallweit
When debugging an issue I found implausible values in state->pause. Reason in that state->pause isn't initialized and later only single bits are changed. Also the struct itself isn't initialized in phylink_resolve(). So better initialize state->pause and other not yet initialized fields. v2: - use right function name in subject v3: - initialize additional fields Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: aquantia: regression on cpus with high cores: set mode with 8 queuesDmitry Bogdanov
Recently the maximum number of queues was increased up to 8, but NIC was not fully configured for 8 queues. In setups with more than 4 CPU cores parts of TX traffic gets lost if the kernel routes it to queues 4th-8th. This patch sets a tx hw traffic mode with 8 queues. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202651 Fixes: 71a963cfc50b ("net: aquantia: increase max number of hw queues") Reported-by: Nicholas Johnson <nicholas.johnson@outlook.com.au> Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01selftests: fixes for UDP GROPaolo Abeni
The current implementation for UDP GRO tests is racy: the receiver may flush the RX queue while the sending is still transmitting and incorrectly report RX errors, with a wrong number of packet received. Add explicit timeouts to the receiver for both connection activation (first packet received for UDP) and reception completion, so that in the above critical scenario the receiver will wait for the transfer completion. Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01bpf: drop refcount if bpf_map_new_fd() fails in map_create()Peng Sun
In bpf/syscall.c, map_create() first set map->usercnt to 1, a file descriptor is supposed to return to userspace. When bpf_map_new_fd() fails, drop the refcount. Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID") Signed-off-by: Peng Sun <sironhide0null@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-28net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390XMaxime Chevallier
Upon setting the cmode on 6390 and 6390X, the associated serdes interfaces must be powered off/on. Both 6390X and 6390 share code to do so, but it currently uses the 6390 specific helper mv88e6390_serdes_power() to disable and enable the serdes interface. This call will fail silently on 6390X when trying so set a 10G interface such as XAUI or RXAUI, since mv88e6390_serdes_power() internally grabs the lane number based on modes supported by the 6390, and returns 0 when getting -ENODEV as a lane number. Using mv88e6390x_serdes_power() should be safe here, since we explicitly rule-out all ports but the 9 and 10, and because modes supported by 6390 ports 9 and 10 are a subset of those supported on 6390X. This was tested on 6390X using RXAUI mode. Fixes: 364e9d7776a3 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net: dsa: mv88e6xxx: Fix u64 statisticsAndrew Lunn
The switch maintains u64 counters for the number of octets sent and received. These are kept as two u32's which need to be combined. Fix the combing, which wrongly worked on u16's. Fixes: 80c4627b2719 ("dsa: mv88x6xxx: Refactor getting a single statistic") Reported-by: Chris Healy <Chris.Healy@zii.aero> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28xen-netback: don't populate the hash cache on XenBus disconnectIgor Druzhinin
Occasionally, during the disconnection procedure on XenBus which includes hash cache deinitialization there might be some packets still in-flight on other processors. Handling of these packets includes hashing and hash cache population that finally results in hash cache data structure corruption. In order to avoid this we prevent hashing of those packets if there are no queues initialized. In that case RCU protection of queues guards the hash cache as well. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28xen-netback: fix occasional leak of grant ref mappings under memory pressureIgor Druzhinin
Zero-copy callback flag is not yet set on frag list skb at the moment xenvif_handle_frag_list() returns -ENOMEM. This eventually results in leaking grant ref mappings since xenvif_zerocopy_callback() is never called for these fragments. Those eventually build up and cause Xen to kill Dom0 as the slots get reused for new mappings: "d0v0 Attempt to implicitly unmap a granted PTE c010000329fce005" That behavior is observed under certain workloads where sudden spikes of page cache writes coexist with active atomic skb allocations from network traffic. Additionally, rework the logic to deal with frag_list deallocation in a single place. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28sctp: chunk.c: correct format string for size_t in printkMatthias Maennich
According to Documentation/core-api/printk-formats.rst, size_t should be printed with %zu, rather than %Zu. In addition, using %Zu triggers a warning on clang (-Wformat-extra-args): net/sctp/chunk.c:196:25: warning: data argument not used by format string [-Wformat-extra-args] __func__, asoc, max_data); ~~~~~~~~~~~~~~~~^~~~~~~~~ ./include/linux/printk.h:440:49: note: expanded from macro 'pr_warn_ratelimited' printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~ ./include/linux/printk.h:424:17: note: expanded from macro 'printk_ratelimited' printk(fmt, ##__VA_ARGS__); \ ~~~ ^ Fixes: 5b5e0928f742 ("lib/vsprintf.c: remove %Z support") Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Matthias Maennich <maennich@google.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net: netem: fix skb length BUG_ON in __skb_to_sgvecSheng Lan
It can be reproduced by following steps: 1. virtio_net NIC is configured with gso/tso on 2. configure nginx as http server with an index file bigger than 1M bytes 3. use tc netem to produce duplicate packets and delay: tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90% 4. continually curl the nginx http server to get index file on client 5. BUG_ON is seen quickly [10258690.371129] kernel BUG at net/core/skbuff.c:4028! [10258690.371748] invalid opcode: 0000 [#1] SMP PTI [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.0.0-rc6 #2 [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202 [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002 [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028 [10258690.372094] FS: 0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000 [10258690.372094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0 [10258690.372094] Call Trace: [10258690.372094] <IRQ> [10258690.372094] skb_to_sgvec+0x11/0x40 [10258690.372094] start_xmit+0x38c/0x520 [virtio_net] [10258690.372094] dev_hard_start_xmit+0x9b/0x200 [10258690.372094] sch_direct_xmit+0xff/0x260 [10258690.372094] __qdisc_run+0x15e/0x4e0 [10258690.372094] net_tx_action+0x137/0x210 [10258690.372094] __do_softirq+0xd6/0x2a9 [10258690.372094] irq_exit+0xde/0xf0 [10258690.372094] smp_apic_timer_interrupt+0x74/0x140 [10258690.372094] apic_timer_interrupt+0xf/0x20 [10258690.372094] </IRQ> In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's linear data size and nonlinear data size, thus BUG_ON triggered. Because the skb is cloned and a part of nonlinear data is split off. Duplicate packet is cloned in netem_enqueue() and may be delayed some time in qdisc. When qdisc len reached the limit and returns NET_XMIT_DROP, the skb will be retransmit later in write queue. the skb will be fragmented by tso_fragment(), the limit size that depends on cwnd and mss decrease, the skb's nonlinear data will be split off. The length of the skb cloned by netem will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(), the BUG_ON trigger. To fix it, netem returns NET_XMIT_SUCCESS to upper stack when it clones a duplicate packet. Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()") Signed-off-by: Sheng Lan <lansheng@huawei.com> Reported-by: Qin Ji <jiqin.ji@huawei.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27netlabel: fix out-of-bounds memory accessesPaul Moore
There are two array out-of-bounds memory accesses, one in cipso_v4_map_lvl_valid(), the other in netlbl_bitmap_walk(). Both errors are embarassingly simple, and the fixes are straightforward. As a FYI for anyone backporting this patch to kernels prior to v4.8, you'll want to apply the netlbl_bitmap_walk() patch to cipso_v4_bitmap_walk() as netlbl_bitmap_walk() doesn't exist before Linux v4.8. Reported-by: Jann Horn <jannh@google.com> Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine") Fixes: 3faa8f982f95 ("netlabel: Move bitmap manipulation functions to the NetLabel core.") Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27ipv4: Pass original device to ip_rcv_finish_coreDavid Ahern
ip_route_input_rcu expects the original ingress device (e.g., for proper multicast handling). The skb->dev can be changed by l3mdev_ip_rcv, so dev needs to be saved prior to calling it. This was the behavior prior to the listify changes. Fixes: 5fa12739a53d0 ("net: ipv4: listify ip_rcv_finish") Cc: Edward Cree <ecree@solarflare.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27Merge branch 'pmtu-selftest-fixes'David S. Miller
Paolo Abeni says: ==================== selftests: pmtu: fix and increase coverage This series includes a fixup for the pmtu.sh test script, related to IPv6 address management, and adds coverage for the recently reported and fixed PMTU exception issue v2 -> v3: - more cleanups v1 -> v2: - several script cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27selftests: pmtu: add explicit tests for PMTU exceptions cleanupPaolo Abeni
Add a couple of new tests, explicitly checking that the kernel timely releases PMTU exceptions on related device removal. This is mostly a regression test vs the issue fixed by commit f5b51fe804ec ("ipv6: route: purge exception on removal") Only 2 new test cases have been added, instead of extending all the existing ones, because the reproducer requires executing several commands and would slow down too much the tests otherwise. v2 -> v3: - more cleanup, still from Stefano v1 -> v2: - several script cleanups, as suggested by Stefano Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27selftests: pmtu: disable DAD in all namespacesPaolo Abeni
Otherwise, the configured IPv6 address could be still "tentative" at test time, possibly causing tests failures. We can also drop some sleep along the code and decrease the timeout for most commands so that the test runtime decreases. v1 -> v2: - fix comment (Stefano) Fixes: d1f1b9cbf34c ("selftests: net: Introduce first PMTU test") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: phy: dp83867: add soft reset delayMax Uvarov
Similar to dp83640 delay after soft reset is needed to set up registers correctly. Signed-off-by: Max Uvarov <muvarov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: nfc: Fix NULL dereference on nfc_llcp_build_tlv failsYueHaibing
KASAN report this: BUG: KASAN: null-ptr-deref in nfc_llcp_build_gb+0x37f/0x540 [nfc] Read of size 3 at addr 0000000000000000 by task syz-executor.0/5401 CPU: 0 PID: 5401 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xfa/0x1ce lib/dump_stack.c:113 kasan_report+0x171/0x18d mm/kasan/report.c:321 memcpy+0x1f/0x50 mm/kasan/common.c:130 nfc_llcp_build_gb+0x37f/0x540 [nfc] nfc_llcp_register_device+0x6eb/0xb50 [nfc] nfc_register_device+0x50/0x1d0 [nfc] nfcsim_device_new+0x394/0x67d [nfcsim] ? 0xffffffffc1080000 nfcsim_init+0x6b/0x1000 [nfcsim] do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f9cb79dcc58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003 RBP: 00007f9cb79dcc70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9cb79dd6bc R13: 00000000004bcefb R14: 00000000006f7030 R15: 0000000000000004 nfc_llcp_build_tlv will return NULL on fails, caller should check it, otherwise will trigger a NULL dereference. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: eda21f16a5ed ("NFC: Set MIU and RW values from CONNECT and CC LLCP frames") Fixes: d646960f7986 ("NFC: Initial LLCP support") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: phy: Micrel KSZ8061: link failure after cable connectRajasingh Thavamani
With Micrel KSZ8061 PHY, the link may occasionally not come up after Ethernet cable connect. The vendor's (Microchip, former Micrel) errata sheet 80000688A.pdf descripes the problem and possible workarounds in detail, see below. The batch implements workaround 1, which permanently fixes the issue. DESCRIPTION Link-up may not occur properly when the Ethernet cable is initially connected. This issue occurs more commonly when the cable is connected slowly, but it may occur any time a cable is connected. This issue occurs in the auto-negotiation circuit, and will not occur if auto-negotiation is disabled (which requires that the two link partners be set to the same speed and duplex). END USER IMPLICATIONS When this issue occurs, link is not established. Subsequent cable plug/unplaug cycle will not correct the issue. WORk AROUND There are four approaches to work around this issue: 1. This issue can be prevented by setting bit 15 in MMD device address 1, register 2, prior to connecting the cable or prior to setting the Restart Auto-negotiation bit in register 0h. The MMD registers are accessed via the indirect access registers Dh and Eh, or via the Micrel EthUtil utility as shown here: . if using the EthUtil utility (usually with a Micrel KSZ8061 Evaluation Board), type the following commands: > address 1 > mmd 1 > iw 2 b61a . Alternatively, write the following registers to write to the indirect MMD register: Write register Dh, data 0001h Write register Eh, data 0002h Write register Dh, data 4001h Write register Eh, data B61Ah 2. The issue can be avoided by disabling auto-negotiation in the KSZ8061, either by the strapping option, or by clearing bit 12 in register 0h. Care must be taken to ensure that the KSZ8061 and the link partner will link with the same speed and duplex. Note that the KSZ8061 defaults to full-duplex when auto-negotiation is off, but other devices may default to half-duplex in the event of failed auto-negotiation. 3. The issue can be avoided by connecting the cable prior to powering-up or resetting the KSZ8061, and leaving it plugged in thereafter. 4. If the above measures are not taken and the problem occurs, link can be recovered by setting the Restart Auto-Negotiation bit in register 0h, or by resetting or power cycling the device. Reset may be either hardware reset or software reset (register 0h, bit 15). PLAN This errata will not be corrected in the future revision. Fixes: 7ab59dc15e2f ("drivers/net/phy/micrel_phy: Add support for new PHYs") Signed-off-by: Alexander Onnasch <alexander.onnasch@landisgyr.com> Signed-off-by: Rajasingh Thavamani <T.Rajasingh@landisgyr.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27enc28j60: Correct description of debug module parameterAndy Shevchenko
The netif_msg_init() API takes on input the amount of bits to be set. The description of debug parameter in the enc28j60 module is misleading in this sense and passing 0xffff does not give an expected behaviour. Fix the description of debug module parameter to show what exactly is expected. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: dev: Use unsigned integer as an argument to left-shiftAndy Shevchenko
1 << 31 is Undefined Behaviour according to the C standard. Use U type modifier to avoid theoretical overflow. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27bnxt_en: Drop oversize TX packets to prevent errors.Michael Chan
There have been reports of oversize UDP packets being sent to the driver to be transmitted, causing error conditions. The issue is likely caused by the dst of the SKB switching between 'lo' with 64K MTU and the hardware device with a smaller MTU. Patches are being proposed by Mahesh Bandewar <maheshb@google.com> to fix the issue. In the meantime, add a quick length check in the driver to prevent the error. The driver uses the TX packet size as index to look up an array to setup the TX BD. The array is large enough to support all MTU sizes supported by the driver. The oversize TX packet causes the driver to index beyond the array and put garbage values into the TX BD. Add a simple check to prevent this. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26tipc: fix race condition causing hung sendtoTung Nguyen
When sending multicast messages via blocking socket, if sending link is congested (tsk->cong_link_cnt is set to 1), the sending thread will be put into sleeping state. However, tipc_sk_filter_rcv() is called under socket spin lock but tipc_wait_for_cond() is not. So, there is no guarantee that the setting of tsk->cong_link_cnt to 0 in tipc_sk_proto_rcv() in CPU-1 will be perceived by CPU-0. If that is the case, the sending thread in CPU-0 after being waken up, will continue to see tsk->cong_link_cnt as 1 and put the sending thread into sleeping state again. The sending thread will sleep forever. CPU-0 | CPU-1 tipc_wait_for_cond() | { | // condition_ = !tsk->cong_link_cnt | while ((rc_ = !(condition_))) { | ... | release_sock(sk_); | wait_woken(); | | if (!sock_owned_by_user(sk)) | tipc_sk_filter_rcv() | { | ... | tipc_sk_proto_rcv() | { | ... | tsk->cong_link_cnt--; | ... | sk->sk_write_space(sk); | ... | } | ... | } sched_annotate_sleep(); | lock_sock(sk_); | remove_wait_queue(); | } | } | This commit fixes it by adding memory barrier to tipc_sk_proto_rcv() and tipc_wait_for_cond(). Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26hv_netvsc: Fix IP header checksum for coalesced packetsHaiyang Zhang
Incoming packets may have IP header checksum verified by the host. They may not have IP header checksum computed after coalescing. This patch re-compute the checksum when necessary, otherwise the packets may be dropped, because Linux network stack always checks it. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26Merge branch 'net-fail-route'David S. Miller
David Ahern says: ==================== net: Fail route add with unsupported nexthop attribute RTA_VIA was added for MPLS as a way of specifying a gateway from a different address family. IPv4 and IPv6 do not currently support RTA_VIA so using it leads to routes that are not what the user intended. Catch and fail - returning a proper error message. MPLS on the other hand does not support RTA_GATEWAY since it does not make sense to have a nexthop from the MPLS address family. Similarly, catch and fail - returning a proper error message. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26mpls: Return error for RTA_GATEWAY attributeDavid Ahern
MPLS does not support nexthops with an MPLS address family. Specifically, it does not handle RTA_GATEWAY attribute. Make it clear by returning an error. Fixes: 03c0566542f4c ("mpls: Netlink commands to add, remove, and dump routes") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26ipv6: Return error for RTA_VIA attributeDavid Ahern
IPv6 currently does not support nexthops outside of the AF_INET6 family. Specifically, it does not handle RTA_VIA attribute. If it is passed in a route add request, the actual route added only uses the device which is clearly not what the user intended: $ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0 $ ip ro ls ... 2001:db8:2::/64 dev eth0 metric 1024 pref medium Catch this and fail the route add: $ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0 Error: IPv6 does not support RTA_VIA attribute. Fixes: 03c0566542f4c ("mpls: Netlink commands to add, remove, and dump routes") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26ipv4: Return error for RTA_VIA attributeDavid Ahern
IPv4 currently does not support nexthops outside of the AF_INET family. Specifically, it does not handle RTA_VIA attribute. If it is passed in a route add request, the actual route added only uses the device which is clearly not what the user intended: $ ip ro add 172.16.1.0/24 via inet6 2001:db8:1::1 dev eth0 $ ip ro ls ... 172.16.1.0/24 dev eth0 Catch this and fail the route add: $ ip ro add 172.16.1.0/24 via inet6 2001:db8:1::1 dev eth0 Error: IPv4 does not support RTA_VIA attribute. Fixes: 03c0566542f4c ("mpls: Netlink commands to add, remove, and dump routes") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id()Peng Sun
In bpf/syscall.c, bpf_map_get_fd_by_id() use bpf_map_inc_not_zero() to increase the refcount, both map->refcnt and map->usercnt. Then, if bpf_map_new_fd() fails, should handle map->usercnt too. Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID") Signed-off-by: Peng Sun <sironhide0null@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-25net: avoid use IPCB in cipso_v4_errorNazarov Sergey
Extract IP options in cipso_v4_error and use __icmp_send. Signed-off-by: Sergey Nazarov <s-nazarov@yandex.ru> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: Add __icmp_send helper.Nazarov Sergey
Add __icmp_send function having ip_options struct parameter Signed-off-by: Sergey Nazarov <s-nazarov@yandex.ru> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25tun: remove unnecessary memory barrierTimur Celik
Replace set_current_state with __set_current_state since no memory barrier is needed at this point. Signed-off-by: Timur Celik <mail@timurcelik.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: socket: set sock->sk to NULL after calling proto_ops::release()Eric Biggers
Commit 9060cb719e61 ("net: crypto set sk to NULL when af_alg_release.") fixed a use-after-free in sockfs_setattr() when an AF_ALG socket is closed concurrently with fchownat(). However, it ignored that many other proto_ops::release() methods don't set sock->sk to NULL and therefore allow the same use-after-free: - base_sock_release - bnep_sock_release - cmtp_sock_release - data_sock_release - dn_release - hci_sock_release - hidp_sock_release - iucv_sock_release - l2cap_sock_release - llcp_sock_release - llc_ui_release - rawsock_release - rfcomm_sock_release - sco_sock_release - svc_release - vcc_release - x25_release Rather than fixing all these and relying on every socket type to get this right forever, just make __sock_release() set sock->sk to NULL itself after calling proto_ops::release(). Reproducer that produces the KASAN splat when any of these socket types are configured into the kernel: #include <pthread.h> #include <stdlib.h> #include <sys/socket.h> #include <unistd.h> pthread_t t; volatile int fd; void *close_thread(void *arg) { for (;;) { usleep(rand() % 100); close(fd); } } int main() { pthread_create(&t, NULL, close_thread, NULL); for (;;) { fd = socket(rand() % 50, rand() % 11, 0); fchownat(fd, "", 1000, 1000, 0x1000); close(fd); } } Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: sched: act_tunnel_key: fix NULL pointer dereference during initVlad Buslov
Metadata pointer is only initialized for action TCA_TUNNEL_KEY_ACT_SET, but it is unconditionally dereferenced in tunnel_key_init() error handler. Verify that metadata pointer is not NULL before dereferencing it in tunnel_key_init error handling code. Fixes: ee28bb56ac5b ("net/sched: fix memory leak in act_tunnel_key_init()") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: dsa: fix a leaked reference by adding missing of_node_putWen Yang
The call to of_parse_phandle returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage. Detected by coccinelle with the following warnings: ./net/dsa/port.c:294:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 284, but without a corresponding object release within this function. ./net/dsa/dsa2.c:627:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 618, but without a corresponding object release within this function. ./net/dsa/dsa2.c:630:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 618, but without a corresponding object release within this function. ./net/dsa/dsa2.c:636:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 618, but without a corresponding object release within this function. ./net/dsa/dsa2.c:639:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 618, but without a corresponding object release within this function. Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24tun: fix blocking readTimur Celik
This patch moves setting of the current state into the loop. Otherwise the task may end up in a busy wait loop if none of the break conditions are met. Signed-off-by: Timur Celik <mail@timurcelik.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: dsa: lantiq: Add GPHY firmware filesHauke Mehrtens
This adds the file names of the FW files which this driver handles into the module description. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net/sched: act_skbedit: fix refcount leak when replace failsDavide Caratti
when act_skbedit was converted to use RCU in the data plane, we added an error path, but we forgot to drop the action refcount in case of failure during a 'replace' operation: # tc actions add action skbedit ptype otherhost pass index 100 # tc action show action skbedit total acts 1 action order 0: skbedit ptype otherhost pass index 100 ref 1 bind 0 # tc actions replace action skbedit ptype otherhost drop index 100 RTNETLINK answers: Cannot allocate memory We have an error talking to the kernel # tc action show action skbedit total acts 1 action order 0: skbedit ptype otherhost pass index 100 ref 2 bind 0 Ensure we call tcf_idr_release(), in case 'params_new' allocation failed, also when the action is being replaced. Fixes: c749cdda9089 ("net/sched: act_skbedit: don't use spinlock in the data path") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net/sched: act_ipt: fix refcount leak when replace failsDavide Caratti
After commit 4e8ddd7f1758 ("net: sched: don't release reference on action overwrite"), the error path of all actions was converted to drop refcount also when the action was being overwritten. But we forgot act_ipt_init(), in case allocation of 'tname' was not successful: # tc action add action xt -j LOG --log-prefix hello index 100 tablename: mangle hook: NF_IP_POST_ROUTING target: LOG level warning prefix "hello" index 100 # tc action show action xt total acts 1 action order 0: tablename: mangle hook: NF_IP_POST_ROUTING target LOG level warning prefix "hello" index 100 ref 1 bind 0 # tc action replace action xt -j LOG --log-prefix world index 100 tablename: mangle hook: NF_IP_POST_ROUTING target: LOG level warning prefix "world" index 100 RTNETLINK answers: Cannot allocate memory We have an error talking to the kernel # tc action show action xt total acts 1 action order 0: tablename: mangle hook: NF_IP_POST_ROUTING target LOG level warning prefix "hello" index 100 ref 2 bind 0 Ensure we call tcf_idr_release(), in case 'tname' allocation failed, also when the action is being replaced. Fixes: 4e8ddd7f1758 ("net: sched: don't release reference on action overwrite") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Bug fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: MMU: record maximum physical address width in kvm_mmu_extended_role kvm: x86: Return LA57 feature based on hardware capability x86/kvm/mmu: fix switch between root and guest MMUs s390: vsie: Use effective CRYCBD.31 to check CRYCBD validity
2019-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Hopefully the last pull request for this release. Fingers crossed: 1) Only refcount ESP stats on full sockets, from Martin Willi. 2) Missing barriers in AF_UNIX, from Al Viro. 3) RCU protection fixes in ipv6 route code, from Paolo Abeni. 4) Avoid false positives in untrusted GSO validation, from Willem de Bruijn. 5) Forwarded mesh packets in mac80211 need more tailroom allocated, from Felix Fietkau. 6) Use operstate consistently for linkup in team driver, from George Wilkie. 7) ThunderX bug fixes from Vadim Lomovtsev. Mostly races between VF and PF code paths. 8) Purge ipv6 exceptions during netdevice removal, from Paolo Abeni. 9) nfp eBPF code gen fixes from Jiong Wang. 10) bnxt_en firmware timeout fix from Michael Chan. 11) Use after free in udp/udpv6 error handlers, from Paolo Abeni. 12) Fix a race in x25_bind triggerable by syzbot, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits) net: phy: realtek: Dummy IRQ calls for RTL8366RB tcp: repaired skbs must init their tso_segs net/x25: fix a race in x25_bind() net: dsa: Remove documentation for port_fdb_prepare Revert "bridge: do not add port to router list when receives query with source 0.0.0.0" selftests: fib_tests: sleep after changing carrier. again. net: set static variable an initial value in atl2_probe() net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G bpf, doc: add bpf list as secondary entry to maintainers file udp: fix possible user after free in error handler udpv6: fix possible user after free in error handler fou6: fix proto error handler argument type udpv6: add the required annotation to mib type mdio_bus: Fix use-after-free on device_register fails net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 bnxt_en: Wait longer for the firmware message response to complete. bnxt_en: Fix typo in firmware message timeout logic. nfp: bpf: fix ALU32 high bits clearance bug nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K Documentation: networking: switchdev: Update port parent ID section ...
2019-02-23net: phy: realtek: Dummy IRQ calls for RTL8366RBLinus Walleij
This fixes a regression introduced by commit 0d2e778e38e0ddffab4bb2b0e9ed2ad5165c4bf7 "net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt". This assumes that a PHY cannot trigger interrupt unless it has .config_intr() or .ack_interrupt() implemented. A later patch makes the code assume both need to be implemented for interrupts to be present. But this PHY (which is inside a DSA) will happily fire interrupts without either callback. Implement dummy callbacks for .config_intr() and .ack_interrupt() in the phy header to fix this. Tested on the RTL8366RB on D-Link DIR-685. Fixes: 0d2e778e38e0 ("net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt") Cc: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>