summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-07-23libbpf: fix using uninitialized ioctl resultsIlya Maximets
'channels.max_combined' initialized only on ioctl success and errno is only valid on ioctl failure. The code doesn't produce any runtime issues, but makes memory sanitizers angry: Conditional jump or move depends on uninitialised value(s) at 0x55C056F: xsk_get_max_queues (xsk.c:336) by 0x55C05B2: xsk_create_bpf_maps (xsk.c:354) by 0x55C089F: xsk_setup_xdp_prog (xsk.c:447) by 0x55C0E57: xsk_socket__create (xsk.c:601) Uninitialised value was created by a stack allocation at 0x55C04CD: xsk_get_max_queues (xsk.c:318) Additionally fixed warning on uninitialized bytes in ioctl arguments: Syscall param ioctl(SIOCETHTOOL) points to uninitialised byte(s) at 0x648D45B: ioctl (in /usr/lib64/libc-2.28.so) by 0x55C0546: xsk_get_max_queues (xsk.c:330) by 0x55C05B2: xsk_create_bpf_maps (xsk.c:354) by 0x55C089F: xsk_setup_xdp_prog (xsk.c:447) by 0x55C0E57: xsk_socket__create (xsk.c:601) Address 0x1ffefff378 is on thread 1's stack in frame #1, created by xsk_get_max_queues (xsk.c:318) Uninitialised value was created by a stack allocation at 0x55C04CD: xsk_get_max_queues (xsk.c:318) CC: Magnus Karlsson <magnus.karlsson@intel.com> Fixes: 1cad07884239 ("libbpf: add support for using AF_XDP sockets") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-23Merge branch 'fix-gso_segs'Alexei Starovoitov
Eric Dumazet says: ==================== First patch changes the kernel, second patch adds a new test. Note that other patches might be needed to take care of similar issues in sock_ops_convert_ctx_access() and SOCK_OPS_GET_FIELD() ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-23selftests/bpf: add another gso_segs accessEric Dumazet
Use BPF_REG_1 for source and destination of gso_segs read, to exercise "bpf: fix access to skb_shared_info->gso_segs" fix. Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-23bpf: fix access to skb_shared_info->gso_segsEric Dumazet
It is possible we reach bpf_convert_ctx_access() with si->dst_reg == si->src_reg Therefore, we need to load BPF_REG_AX before eventually mangling si->src_reg. syzbot generated this x86 code : 3: 55 push %rbp 4: 48 89 e5 mov %rsp,%rbp 7: 48 81 ec 00 00 00 00 sub $0x0,%rsp // Might be avoided ? e: 53 push %rbx f: 41 55 push %r13 11: 41 56 push %r14 13: 41 57 push %r15 15: 6a 00 pushq $0x0 17: 31 c0 xor %eax,%eax 19: 48 8b bf c0 00 00 00 mov 0xc0(%rdi),%rdi 20: 44 8b 97 bc 00 00 00 mov 0xbc(%rdi),%r10d 27: 4c 01 d7 add %r10,%rdi 2a: 48 0f b7 7f 06 movzwq 0x6(%rdi),%rdi // Crash 2f: 5b pop %rbx 30: 41 5f pop %r15 32: 41 5e pop %r14 34: 41 5d pop %r13 36: 5b pop %rbx 37: c9 leaveq 38: c3 retq Fixes: d9ff286a0f59 ("bpf: allow BPF programs access skb_shared_info->gso_segs field") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-23bpf: fix narrower loads on s390Ilya Leoshkevich
The very first check in test_pkt_md_access is failing on s390, which happens because loading a part of a struct __sk_buff field produces an incorrect result. The preprocessed code of the check is: { __u8 tmp = *((volatile __u8 *)&skb->len + ((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8))); if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2; }; clang generates the following code for it: 0: 71 21 00 03 00 00 00 00 r2 = *(u8 *)(r1 + 3) 1: 61 31 00 00 00 00 00 00 r3 = *(u32 *)(r1 + 0) 2: 57 30 00 00 00 00 00 ff r3 &= 255 3: 5d 23 00 1d 00 00 00 00 if r2 != r3 goto +29 <LBB0_10> Finally, verifier transforms it to: 0: (61) r2 = *(u32 *)(r1 +104) 1: (bc) w2 = w2 2: (74) w2 >>= 24 3: (bc) w2 = w2 4: (54) w2 &= 255 5: (bc) w2 = w2 The problem is that when verifier emits the code to replace a partial load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a bitwise AND, it assumes that the machine is little endian and incorrectly decides to use a shift. Adjust shift count calculation to account for endianness. Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-22selftests/bpf: fix sendmsg6_prog on s390Ilya Leoshkevich
"sendmsg6: rewrite IP & port (C)" fails on s390, because the code in sendmsg_v6_prog() assumes that (ctx->user_ip6[0] & 0xFFFF) refers to leading IPv6 address digits, which is not the case on big-endian machines. Since checking bitwise operations doesn't seem to be the point of the test, replace two short comparisons with a single int comparison. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22libbpf: Avoid designated initializers for unnamed union membersArnaldo Carvalho de Melo
As it fails to build in some systems with: libbpf.c: In function 'perf_buffer__new': libbpf.c:4515: error: unknown field 'sample_period' specified in initializer libbpf.c:4516: error: unknown field 'wakeup_events' specified in initializer Doing as: attr.sample_period = 1; I.e. not as a designated initializer makes it build everywhere. Cc: Andrii Nakryiko <andriin@fb.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: fb84b8224655 ("libbpf: add perf buffer API") Link: https://lkml.kernel.org/n/tip-hnlmch8qit1ieksfppmr32si@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22libbpf: Fix endianness macro usage for some compilersArnaldo Carvalho de Melo
Using endian.h and its endianness macros makes this code build in a wider range of compilers, as some don't have those macros (__BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, __ORDER_BIG_ENDIAN__), so use instead endian.h's macros (__BYTE_ORDER, __LITTLE_ENDIAN, __BIG_ENDIAN) which makes this code even shorter :-) Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 12ef5634a855 ("libbpf: simplify endianness check") Fixes: e6c64855fd7a ("libbpf: add btf__parse_elf API to load .BTF and .BTF.ext") Link: https://lkml.kernel.org/n/tip-eep5n8vgwcdphw3uc058k03u@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22Merge branch 'bpf-sockmap-tls-fixes'Daniel Borkmann
Jakub Kicinski says: ==================== John says: Resolve a series of splats discovered by syzbot and an unhash TLS issue noted by Eric Dumazet. The main issues revolved around interaction between TLS and sockmap tear down. TLS and sockmap could both reset sk->prot ops creating a condition where a close or unhash op could be called forever. A rare race condition resulting from a missing rcu sync operation was causing a use after free. Then on the TLS side dropping the sock lock and re-acquiring it during the close op could hang. Finally, sockmap must be deployed before tls for current stack assumptions to be met. This is enforced now. A feature series can enable it. To fix this first refactor TLS code so the lock is held for the entire teardown operation. Then add an unhash callback to ensure TLS can not transition from ESTABLISHED to LISTEN state. This transition is a similar bug to the one found and fixed previously in sockmap. Then apply three fixes to sockmap to fix up races on tear down around map free and close. Finally, if sockmap is destroyed before TLS we add a new ULP op update to inform the TLS stack it should not call sockmap ops. This last one appears to be the most commonly found issue from syzbot. v4: - fix some use after frees; - disable disconnect work for offload (ctx lifetime is much more complex); - remove some of the dead code which made it hard to understand (for me) that things work correctly (e.g. the checks TLS is the top ULP); - add selftets. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: add shutdown testsJakub Kicinski
Add test for killing the connection via shutdown. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: close the socket with open recordJakub Kicinski
Add test which sends some data with MSG_MORE and then closes the socket (never calling send without MSG_MORE). This should make sure we clean up open records correctly. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: add a bidirectional testJakub Kicinski
Add a simple test which installs the TLS state for both directions, sends and receives data on both sockets. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: test error codes around TLS ULP installationJakub Kicinski
Test the error codes returned when TCP connection is not in ESTABLISHED state. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: add a test for ULP but no keysJakub Kicinski
Make sure we test the TLS_BASE/TLS_BASE case both with data and the tear down/clean up path. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap/tls, close can race with map freeJohn Fastabend
When a map free is called and in parallel a socket is closed we have two paths that can potentially reset the socket prot ops, the bpf close() path and the map free path. This creates a problem with which prot ops should be used from the socket closed side. If the map_free side completes first then we want to call the original lowest level ops. However, if the tls path runs first we want to call the sockmap ops. Additionally there was no locking around prot updates in TLS code paths so the prot ops could be changed multiple times once from TLS path and again from sockmap side potentially leaving ops pointed at either TLS or sockmap when psock and/or tls context have already been destroyed. To fix this race first only update ops inside callback lock so that TLS, sockmap and lowest level all agree on prot state. Second and a ULP callback update() so that lower layers can inform the upper layer when they are being removed allowing the upper layer to reset prot ops. This gets us close to allowing sockmap and tls to be stacked in arbitrary order but will save that patch for *next trees. v4: - make sure we don't free things for device; - remove the checks which swap the callbacks back only if TLS is at the top. Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com Fixes: 02c558b2d5d6 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap, only create entry if ulp is not already enabledJohn Fastabend
Sockmap does not currently support adding sockets after TLS has been enabled. There never was a real use case for this so it was never added. But, we lost the test for ULP at some point so add it here and fail the socket insert if TLS is enabled. Future work could make sockmap support this use case but fixup the bug here. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap, synchronize_rcu before free'ing mapJohn Fastabend
We need to have a synchronize_rcu before free'ing the sockmap because any outstanding psock references will have a pointer to the map and when they use this could trigger a use after free. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap, sock_map_delete needs to use xchgJohn Fastabend
__sock_map_delete() may be called from a tcp event such as unhash or close from the following trace, tcp_bpf_close() tcp_bpf_remove() sk_psock_unlink() sock_map_delete_from_link() __sock_map_delete() In this case the sock lock is held but this only protects against duplicate removals on the TCP side. If the map is free'd then we have this trace, sock_map_free xchg() <- replaces map entry sock_map_unref() sk_psock_put() sock_map_del_link() The __sock_map_delete() call however uses a read, test, null over the map entry which can result in both paths trying to free the map entry. To fix use xchg in TCP paths as well so we avoid having two references to the same map entry. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: fix transition through disconnect with closeJohn Fastabend
It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE state via tcp_disconnect() without actually calling tcp_close which would then call the tls close callback. Because of this a user could disconnect a socket then put it in a LISTEN state which would break our assumptions about sockets always being ESTABLISHED state. More directly because close() can call unhash() and unhash is implemented by sockmap if a sockmap socket has TLS enabled we can incorrectly destroy the psock from unhash() and then call its close handler again. But because the psock (sockmap socket representation) is already destroyed we call close handler in sk->prot. However, in some cases (TLS BASE/BASE case) this will still point at the sockmap close handler resulting in a circular call and crash reported by syzbot. To fix both above issues implement the unhash() routine for TLS. v4: - add note about tls offload still needing the fix; - move sk_proto to the cold cache line; - split TX context free into "release" and "free", otherwise the GC work itself is in already freed memory; - more TX before RX for consistency; - reuse tls_ctx_free(); - schedule the GC work after we're done with context to avoid UAF; - don't set the unhash in all modes, all modes "inherit" TLS_BASE's callbacks anyway; - disable the unhash hook for TLS_HW. Fixes: 3c4d7559159bf ("tls: kernel TLS support") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: remove sock unlock/lock around strp_done()John Fastabend
The tls close() callback currently drops the sock lock to call strp_done(). Split up the RX cleanup into stopping the strparser and releasing most resources, syncing strparser and finally freeing the context. To avoid the need for a strp_done() call on the cleanup path of device offload make sure we don't arm the strparser until we are sure init will be successful. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: remove close callback sock unlock/lock around TX work flushJohn Fastabend
The tls close() callback currently drops the sock lock, makes a cancel_delayed_work_sync() call, and then relocks the sock. By restructuring the code we can avoid droping lock and then reclaiming it. To simplify this we do the following, tls_sk_proto_close set_bit(CLOSING) set_bit(SCHEDULE) cancel_delay_work_sync() <- cancel workqueue lock_sock(sk) ... release_sock(sk) strp_done() Setting the CLOSING bit prevents the SCHEDULE bit from being cleared by any workqueue items e.g. if one happens to be scheduled and run between when we set SCHEDULE bit and cancel work. Then because SCHEDULE bit is set now no new work will be scheduled. Tested with net selftests and bpf selftests. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: don't call tls_sk_proto_close for hw record offloadJakub Kicinski
The deprecated TOE offload doesn't actually do anything in tls_sk_proto_close() - all TLS code is skipped and context not freed. Remove the callback to make it easier to refactor tls_sk_proto_close(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: don't arm strparser immediately in tls_set_sw_offload()Jakub Kicinski
In tls_set_device_offload_rx() we prepare the software context for RX fallback and proceed to add the connection to the device. Unfortunately, software context prep includes arming strparser so in case of a later error we have to release the socket lock to call strp_done(). In preparation for not releasing the socket lock half way through callbacks move arming strparser into a separate function. Following patches will make use of that. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-19libbpf: sanitize VAR to conservative 1-byte INTAndrii Nakryiko
If VAR in non-sanitized BTF was size less than 4, converting such VAR into an INT with size=4 will cause BTF validation failure due to violationg of STRUCT (into which DATASEC was converted) member size. Fix by conservatively using size=1. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-19libbpf: fix SIGSEGV when BTF loading fails, but .BTF.ext existsAndrii Nakryiko
In case when BTF loading fails despite sanitization, but BPF object has .BTF.ext loaded as well, we free and null obj->btf, but not obj->btf_ext. This leads to an attempt to relocate .BTF.ext later on during bpf_object__load(), which assumes obj->btf is present. This leads to SIGSEGV on null pointer access. Fix bug by freeing and nulling obj->btf_ext as well. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-18tcp: fix tcp_set_congestion_control() use from bpf hookEric Dumazet
Neal reported incorrect use of ns_capable() from bpf hook. bpf_setsockopt(...TCP_CONGESTION...) -> tcp_set_congestion_control() -> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) -> ns_capable_common() -> current_cred() -> rcu_dereference_protected(current->cred, 1) Accessing 'current' in bpf context makes no sense, since packets are processed from softirq context. As Neal stated : The capability check in tcp_set_congestion_control() was written assuming a system call context, and then was reused from a BPF call site. The fix is to add a new parameter to tcp_set_congestion_control(), so that the ns_capable() call is only performed under the right context. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lawrence Brakmo <brakmo@fb.com> Reported-by: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18ag71xx: fix return value check in ag71xx_probe()Wei Yongjun
In case of error, the function of_get_mac_address() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18ag71xx: fix error return code in ag71xx_probe()Wei Yongjun
Fix to return error code -ENOMEM from the dmam_alloc_coherent() error handling case instead of 0, as done elsewhere in this function. Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18usb: qmi_wwan: add D-Link DWM-222 A2 device IDRogan Dawes
Signed-off-by: Rogan Dawes <rogan@dawes.za.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.Michael Chan
Unlike legacy chips, 57500 chips don't need additional VNIC resources for aRFS/ntuple. Fix the code accordingly so that we don't reserve and allocate additional VNICs on 57500 chips. Without this patch, the driver is failing to initialize when it tries to allocate extra VNICs. Fixes: ac33906c67e2 ("bnxt_en: Add support for aRFS on 57500 chips.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18net: dsa: sja1105: Fix missing unlock on error in sk_buff()Wei Yongjun
Add the missing unlock before return from function sk_buff() in the error handling case. Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18gve: replace kfree with kvfreeChuhong Yuan
Variables allocated by kvzalloc should not be freed by kfree. Because they may be allocated by vmalloc. So we replace kfree with kvfree here. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2019-07-18 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) verifier precision propagation fix, from Andrii. 2) BTF size fix for typedefs, from Andrii. 3) a bunch of big endian fixes, from Ilya. 4) wide load from bpf_sock_addr fixes, from Stanislav. 5) a bunch of misc fixes from a number of developers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18selftests/bpf: fix test_xdp_noinline on s390Ilya Leoshkevich
test_xdp_noinline fails on s390 due to a handful of endianness issues. Use ntohs for parsing eth_proto. Replace bswaps with ntohs/htons. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-18selftests/bpf: fix "valid read map access into a read-only array 1" on s390Ilya Leoshkevich
This test looks up a 32-bit map element and then loads it using a 64-bit load. This does not work on s390, which is a big-endian machine. Since the point of this test doesn't seem to be loading a smaller value using a larger load, simply use a 32-bit load. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-18net/mlx5: Replace kfree with kvfreeChuhong Yuan
Variable allocated by kvmalloc should not be freed by kfree. Because it may be allocated by vmalloc. So replace kfree with kvfree here. Fixes: 9b1f298236057 ("net/mlx5: Add support for FW fatal reporter dump") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18MAINTAINERS: update netsec driverIlias Apalodimas
Add myself to maintainers since i provided the XDP and page_pool implementation Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jassi Brar <jaswinder.singh@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18ipv6: Unlink sibling route in case of failureIdo Schimmel
When a route needs to be appended to an existing multipath route, fib6_add_rt2node() first appends it to the siblings list and increments the number of sibling routes on each sibling. Later, the function notifies the route via call_fib6_entry_notifiers(). In case the notification is vetoed, the route is not unlinked from the siblings list, which can result in a use-after-free. Fix this by unlinking the route from the siblings list before returning an error. Audited the rest of the call sites from which the FIB notification chain is called and could not find more problems. Fixes: 2233000cba40 ("net/ipv6: Move call_fib6_entry_notifiers up for route adds") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18Merge tag 'wireless-drivers-for-davem-2019-07-18' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 5.3 First set of fixes for 5.3. iwlwifi * add new cards for 9000 and 20000 series and qu c-step devices ath10k * workaround an uninitialised variable warning rt2x00 * fix rx queue hand on USB ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18liquidio: Replace vmalloc + memset with vzallocChuhong Yuan
Use vzalloc and vzalloc_node instead of using vmalloc and vmalloc_node and then zeroing the allocated memory by memset 0. This simplifies the code. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18udp: Fix typo in net/ipv4/udp.cSu Yanjun
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18net: bcmgenet: use promisc for unsupported filtersJustin Chen
Currently we silently ignore filters if we cannot meet the filter requirements. This will lead to the MAC dropping packets that are expected to pass. A better solution would be to set the NIC to promisc mode when the required filters cannot be met. Also correct the number of MDF filters supported. It should be 17, not 16. Signed-off-by: Justin Chen <justinpopo6@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17ipv6: rt6_check should return NULL if 'from' is NULLDavid Ahern
Paul reported that l2tp sessions were broken after the commit referenced in the Fixes tag. Prior to this commit rt6_check returned NULL if the rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB entry. Restore that behavior. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Paul Donohue <linux-kernel@PaulSD.com> Tested-by: Paul Donohue <linux-kernel@PaulSD.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17tipc: initialize 'validated' field of received packetsJon Maloy
The tipc_msg_validate() function leaves a boolean flag 'validated' in the validated buffer's control block, to avoid performing this action more than once. However, at reception of new packets, the position of this field may already have been set by lower layer protocols, so that the packet is erroneously perceived as already validated by TIPC. We fix this by initializing the said field to 'false' before performing the initial validation. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17Merge branch 'ipv4-relax-source-validation-check-for-loopback-packets'David S. Miller
Cong Wang says: ==================== ipv4: relax source validation check for loopback packets This patchset fixes a corner case when loopback packets get dropped by rp_filter when we route them from veth to lo. Patch 1 is the fix and patch 2 provides a simplified test case for this scenario. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17selftests: add a test case for rp_filterCong Wang
Add a test case to simulate the loopback packet case fixed in the previous patch. This test gets passed after the fix: IPv4 rp_filter tests TEST: rp_filter passes local packets [ OK ] TEST: rp_filter passes loopback packets [ OK ] Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17fib: relax source validation check for loopback packetsCong Wang
In a rare case where we redirect local packets from veth to lo, these packets fail to pass the source validation when rp_filter is turned on, as the tracing shows: <...>-311708 [040] ..s1 7951180.957825: fib_table_lookup: table 254 oif 0 iif 1 src 10.53.180.130 dst 10.53.180.130 tos 0 scope 0 flags 0 <...>-311708 [040] ..s1 7951180.957826: fib_table_lookup_nh: nexthop dev eth0 oif 4 src 10.53.180.130 So, the fib table lookup returns eth0 as the nexthop even though the packets are local and should be routed to loopback nonetheless, but they can't pass the dev match check in fib_info_nh_uses_dev() without this patch. It should be safe to relax this check for this special case, as normally packets coming out of loopback device still have skb_dst so they won't even hit this slow path. Cc: Julian Anastasov <ja@ssi.bg> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17Merge branch 'mlxsw-Two-fixes'David S. Miller
Ido Schimmel says: ==================== mlxsw: Two fixes This patchset contains two fixes for mlxsw. Patch #1 from Petr fixes an issue in which DSCP rewrite can occur even if the egress port was switched to Trust L2 mode where priority mapping is based on PCP. Patch #2 fixes a problem where packets can be learned on a non-existing FID if a tc filter with a redirect action is configured on a bridged port. The problem and fix are explained in detail in the commit message. Please consider both patches for 5.2.y ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17mlxsw: spectrum: Do not process learned records with a dummy FIDIdo Schimmel
The switch periodically sends notifications about learned FDB entries. Among other things, the notification includes the FID (Filtering Identifier) and the port on which the MAC was learned. In case the driver does not have the FID defined on the relevant port, the following error will be periodically generated: mlxsw_spectrum2 0000:06:00.0 swp32: Failed to find a matching {Port, VID} following FDB notification This is not supposed to happen under normal conditions, but can happen if an ingress tc filter with a redirect action is installed on a bridged port. The redirect action will cause the packet's FID to be changed to the dummy FID and a learning notification will be emitted with this FID - which is not defined on the bridged port. Fix this by having the driver ignore learning notifications generated with the dummy FID and delete them from the device. Another option is to chain an ignore action after the redirect action which will cause the device to disable learning, but this means that we need to consume another action whenever a redirect action is used. In addition, the scenario described above is merely a corner case. Fixes: cedbb8b25948 ("mlxsw: spectrum_flower: Set dummy FID before forward action") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Kushnarov <alexanderk@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Alex Kushnarov <alexanderk@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removedPetr Machata
Spectrum systems use DSCP rewrite map to update DSCP field in egressing packets to correspond to priority that the packet has. Whether rewriting will take place is determined at the point when the packet ingresses the switch: if the port is in Trust L3 mode, packet priority is determined from the DSCP map at the port, and DSCP rewrite will happen. If the port is in Trust L2 mode, 802.1p is used for packet prioritization, and no DSCP rewrite will happen. The driver determines the port trust mode based on whether any DSCP prioritization rules are in effect at given port. If there are any, trust level is L3, otherwise it's L2. When the last DSCP rule is removed, the port is switched to trust L2. Under that scenario, if DSCP of a packet should be rewritten, it should be rewritten to 0. However, when switching to Trust L2, the driver neglects to also update the DSCP rewrite map. The last DSCP rule thus remains in effect, and packets egressing through this port, if they have the right priority, will have their DSCP set according to this rule. Fix by first configuring the rewrite map, and only then switching to trust L2 and bailing out. Fixes: b2b1dab6884e ("mlxsw: spectrum: Support ieee_setapp, ieee_delapp") Signed-off-by: Petr Machata <petrm@mellanox.com> Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>