Age | Commit message (Collapse) | Author |
|
syzbot reported:
BUG: KMSAN: uninit-value in strlen+0x3b/0xa0 lib/string.c:484
CPU: 1 PID: 6371 Comm: syz-executor652 Not tainted 4.19.0-rc8+ #70
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x306/0x460 lib/dump_stack.c:113
kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:917
__msan_warning+0x7c/0xe0 mm/kmsan/kmsan_instr.c:500
strlen+0x3b/0xa0 lib/string.c:484
nla_put_string include/net/netlink.h:1011 [inline]
tipc_nl_compat_bearer_enable+0x238/0x7b0 net/tipc/netlink_compat.c:389
__tipc_nl_compat_doit net/tipc/netlink_compat.c:311 [inline]
tipc_nl_compat_doit+0x39f/0xae0 net/tipc/netlink_compat.c:344
tipc_nl_compat_recv+0x147c/0x2760 net/tipc/netlink_compat.c:1107
genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
genl_rcv_msg+0x185c/0x1a20 net/netlink/genetlink.c:626
netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2454
genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x166d/0x1720 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x1391/0x1420 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
___sys_sendmsg+0xe47/0x1200 net/socket.c:2116
__sys_sendmsg net/socket.c:2154 [inline]
__do_sys_sendmsg net/socket.c:2163 [inline]
__se_sys_sendmsg+0x307/0x460 net/socket.c:2161
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
do_syscall_64+0xbe/0x100 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x440179
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffef7beee8 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440179
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000401a00
R13: 0000000000401a90 R14: 0000000000000000 R15: 0000000000000000
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:255 [inline]
kmsan_internal_poison_shadow+0xc8/0x1d0 mm/kmsan/kmsan.c:180
kmsan_kmalloc+0xa4/0x120 mm/kmsan/kmsan_hooks.c:104
kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:113
slab_post_alloc_hook mm/slab.h:446 [inline]
slab_alloc_node mm/slub.c:2727 [inline]
__kmalloc_node_track_caller+0xb43/0x1400 mm/slub.c:4360
__kmalloc_reserve net/core/skbuff.c:138 [inline]
__alloc_skb+0x422/0xe90 net/core/skbuff.c:206
alloc_skb include/linux/skbuff.h:996 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
netlink_sendmsg+0xcaf/0x1420 net/netlink/af_netlink.c:1883
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
___sys_sendmsg+0xe47/0x1200 net/socket.c:2116
__sys_sendmsg net/socket.c:2154 [inline]
__do_sys_sendmsg net/socket.c:2163 [inline]
__se_sys_sendmsg+0x307/0x460 net/socket.c:2161
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
do_syscall_64+0xbe/0x100 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
The root cause is that we don't validate whether bear name is a valid
string in tipc_nl_compat_bearer_enable().
Meanwhile, we also fix the same issue in the following functions:
tipc_nl_compat_bearer_disable()
tipc_nl_compat_link_stat_dump()
tipc_nl_compat_media_set()
tipc_nl_compat_bearer_set()
Reported-by: syzbot+b33d5cae0efd35dbfe77@syzkaller.appspotmail.com
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot reports following splat:
BUG: KMSAN: uninit-value in strlen+0x3b/0xa0 lib/string.c:486
CPU: 1 PID: 11057 Comm: syz-executor0 Not tainted 4.20.0-rc7+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x173/0x1d0 lib/dump_stack.c:113
kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
__msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:295
strlen+0x3b/0xa0 lib/string.c:486
nla_put_string include/net/netlink.h:1154 [inline]
tipc_nl_compat_link_reset_stats+0x1f0/0x360 net/tipc/netlink_compat.c:760
__tipc_nl_compat_doit net/tipc/netlink_compat.c:311 [inline]
tipc_nl_compat_doit+0x3aa/0xaf0 net/tipc/netlink_compat.c:344
tipc_nl_compat_handle net/tipc/netlink_compat.c:1107 [inline]
tipc_nl_compat_recv+0x14d7/0x2760 net/tipc/netlink_compat.c:1210
genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
netlink_rcv_skb+0x444/0x640 net/netlink/af_netlink.c:2477
genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0xf40/0x1020 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
___sys_sendmsg+0xdb9/0x11b0 net/socket.c:2116
__sys_sendmsg net/socket.c:2154 [inline]
__do_sys_sendmsg net/socket.c:2163 [inline]
__se_sys_sendmsg+0x305/0x460 net/socket.c:2161
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x457ec9
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2557338c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457ec9
RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f25573396d4
R13: 00000000004cb478 R14: 00000000004d86c8 R15: 00000000ffffffff
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
slab_post_alloc_hook mm/slab.h:446 [inline]
slab_alloc_node mm/slub.c:2759 [inline]
__kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
__kmalloc_reserve net/core/skbuff.c:137 [inline]
__alloc_skb+0x309/0xa20 net/core/skbuff.c:205
alloc_skb include/linux/skbuff.h:998 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
___sys_sendmsg+0xdb9/0x11b0 net/socket.c:2116
__sys_sendmsg net/socket.c:2154 [inline]
__do_sys_sendmsg net/socket.c:2163 [inline]
__se_sys_sendmsg+0x305/0x460 net/socket.c:2161
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
The uninitialised access happened in tipc_nl_compat_link_reset_stats:
nla_put_string(skb, TIPC_NLA_LINK_NAME, name)
This is because name string is not validated before it's used.
Reported-by: syzbot+e01d94b5a4c266be6e4c@syzkaller.appspotmail.com
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot reported:
BUG: KMSAN: uninit-value in tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373
CPU: 0 PID: 66 Comm: kworker/u4:4 Not tainted 4.17.0-rc3+ #88
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: tipc_rcv tipc_conn_recv_work
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
__msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373
tipc_conn_rcv_from_sock net/tipc/topsrv.c:409 [inline]
tipc_conn_recv_work+0x3cd/0x560 net/tipc/topsrv.c:424
process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279
kthread+0x539/0x720 kernel/kthread.c:239
ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
Local variable description: ----s.i@tipc_conn_recv_work
Variable was created at:
tipc_conn_recv_work+0x65/0x560 net/tipc/topsrv.c:419
process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
In tipc_conn_rcv_from_sock(), it always supposes the length of message
received from sock_recvmsg() is not smaller than the size of struct
tipc_subscr. However, this assumption is false. Especially when the
length of received message is shorter than struct tipc_subscr size,
we will end up touching uninitialized fields in tipc_conn_rcv_sub().
Reported-by: syzbot+8951a3065ee7fd6d6e23@syzkaller.appspotmail.com
Reported-by: syzbot+75e6e042c5bbf691fc82@syzkaller.appspotmail.com
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To ensure parent qdiscs have the same notion of the number of enqueued
packets even after splitting a GSO packet, update the qdisc tree with the
number of packets that was added due to the split.
Reported-by: Pete Heist <pete@heistp.net>
Tested-by: Pete Heist <pete@heistp.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Several qdiscs check on enqueue whether the packet was enqueued to a class
with an empty queue, in which case the class is activated. This is done by
checking if the qlen is exactly 1 after enqueue. However, if GSO splitting
is enabled in the child qdisc, a single packet can result in a qlen longer
than 1. This means the activation check fails, leading to a stalled queue.
Fix this by checking if the queue is empty *before* enqueue, and running
the activation logic if this was the case.
Reported-by: Pete Heist <pete@heistp.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Parent qdiscs may dereference the pointer to the enqueued skb after
enqueue. However, both CAKE and TBF call consume_skb() on the original skb
when splitting GSO packets, leading to a potential use-after-free in the
parent. Fix this by avoiding dereferencing the skb pointer after enqueueing
to the child.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We forgot to update ip6erspan version related info when changing link,
which will cause setting new hwid failed.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 94d7d8f292870 ("ip6_gre: add erspan v2 support")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IPv4 routing tables are flushed in two cases:
1. In response to events in the netdev and inetaddr notification chains
2. When a network namespace is being dismantled
In both cases only routes associated with a dead nexthop group are
flushed. However, a nexthop group will only be marked as dead in case it
is populated with actual nexthops using a nexthop device. This is not
the case when the route in question is an error route (e.g.,
'blackhole', 'unreachable').
Therefore, when a network namespace is being dismantled such routes are
not flushed and leaked [1].
To reproduce:
# ip netns add blue
# ip -n blue route add unreachable 192.0.2.0/24
# ip netns del blue
Fix this by not skipping error routes that are not marked with
RTNH_F_DEAD when flushing the routing tables.
To prevent the flushing of such routes in case #1, add a parameter to
fib_table_flush() that indicates if the table is flushed as part of
namespace dismantle or not.
Note that this problem does not exist in IPv6 since error routes are
associated with the loopback device.
[1]
unreferenced object 0xffff888066650338 (size 56):
comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff ..........ba....
e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00 ...d............
backtrace:
[<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
[<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
[<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
[<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
[<0000000014f62875>] netlink_sendmsg+0x929/0xe10
[<00000000bac9d967>] sock_sendmsg+0xc8/0x110
[<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
[<000000002e94f880>] __sys_sendmsg+0xf7/0x250
[<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
[<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<000000003a8b605b>] 0xffffffffffffffff
unreferenced object 0xffff888061621c88 (size 48):
comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
hex dump (first 32 bytes):
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff kkkkkkkk..&_....
backtrace:
[<00000000733609e3>] fib_table_insert+0x978/0x1500
[<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
[<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
[<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
[<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
[<0000000014f62875>] netlink_sendmsg+0x929/0xe10
[<00000000bac9d967>] sock_sendmsg+0xc8/0x110
[<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
[<000000002e94f880>] __sys_sendmsg+0xf7/0x250
[<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
[<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<000000003a8b605b>] 0xffffffffffffffff
Fixes: 8cced9eff1d4 ("[NETNS]: Enable routing configuration in non-initial namespace.")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In changelink ops, the ip6gre_net pointer is retrieved from
dev_net(dev), which is wrong in case of x-netns. Thus, the tunnel is not
unlinked from its current list and is relinked into another net
namespace. This corrupts the tunnel lists and can later trigger a kernel
oops.
Fix this by retrieving the netns from device private area.
Fixes: c8632fc30bb0 ("net: ip6_gre: Split up ip6gre_changelink()")
Cc: Petr Machata <petrm@mellanox.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This is the first batch of Netfilter fixes for your net tree:
1) Fix endless loop in nf_tables rules netlink dump, from Phil Sutter.
2) Reference counter leak in object from the error path, from Taehee Yoo.
3) Selective rule dump requires table and chain.
4) Fix DNAT with nft_flow_offload reverse route lookup, from wenxu.
5) Use GFP_KERNEL_ACCOUNT in vmalloc allocation from ebtables, from
Shakeel Butt.
6) Set ifindex from route to fix interaction with VRF slave device,
also from wenxu.
7) Use nfct_help() to check for conntrack helper, IPS_HELPER status
flag is only set from explicit helpers via -j CT, from Henry Yen.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When using Kerberos with v4.20, I've observed frequent connection
loss on heavy workloads. I traced it down to the client underrunning
the GSS sequence number window -- NFS servers are required to drop
the RPC with the low sequence number, and also drop the connection
to signal that an RPC was dropped.
Bisected to commit 918f3c1fe83c ("SUNRPC: Improve latency for
interactive tasks").
I've got a one-line workaround for this issue, which is easy to
backport to v4.20 while a more permanent solution is being derived.
Essentially, tk_owner-based sorting is disabled for RPCs that carry
a GSS sequence number.
Fixes: 918f3c1fe83c ("SUNRPC: Improve latency for interactive ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
According to RFC2203, the RPCSEC_GSS sequence numbers are bounded to
an upper limit of MAXSEQ = 0x80000000. Ensure that we handle that
correctly.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
When we resend a request, ensure that the 'rq_bytes_sent' is reset
to zero.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
In the xdp_umem_assign_dev() path, the xsk code does not
check if a queue for which umem is to be created exists.
It leads to a situation where umem is not assigned to any
Tx/Rx queue of a netdevice, without notifying the stack
about an error. This affects both XDP_SKB and XDP_DRV
modes - in case of XDP_DRV_ZC, queue index is checked by
the driver.
This patch fixes xsk code, so that in both XDP_SKB and
XDP_DRV mode of AF_XDP, an error is returned when requested
queue index exceedes an existing maximum.
Fixes: c9b47cc1fabca ("xsk: fix bug when trying to use both copy and zero-copy on one queue id")
Reported-by: Jakub Spizewski <jakub.spizewski@intel.com>
Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Pull networking fixes from David Miller:
1) Fix regression in multi-SKB responses to RTM_GETADDR, from Arthur
Gautier.
2) Fix ipv6 frag parsing in openvswitch, from Yi-Hung Wei.
3) Unbounded recursion in ipv4 and ipv6 GUE tunnels, from Stefano
Brivio.
4) Use after free in hns driver, from Yonglong Liu.
5) icmp6_send() needs to handle the case of NULL skb, from Eric
Dumazet.
6) Missing rcu read lock in __inet6_bind() when operating on mapped
addresses, from David Ahern.
7) Memory leak in tipc-nl_compat_publ_dump(), from Gustavo A. R. Silva.
8) Fix PHY vs r8169 module loading ordering issues, from Heiner
Kallweit.
9) Fix bridge vlan memory leak, from Ido Schimmel.
10) Dev refcount leak in AF_PACKET, from Jason Gunthorpe.
11) Infoleak in ipv6_local_error(), flow label isn't completely
initialized. From Eric Dumazet.
12) Handle mv88e6390 errata, from Andrew Lunn.
13) Making vhost/vsock CID hashing consistent, from Zha Bin.
14) Fix lack of UMH cleanup when it unexpectedly exits, from Taehee Yoo.
15) Bridge forwarding must clear skb->tstamp, from Paolo Abeni.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
bnxt_en: Fix context memory allocation.
bnxt_en: Fix ring checking logic on 57500 chips.
mISDN: hfcsusb: Use struct_size() in kzalloc()
net: clear skb->tstamp in bridge forwarding path
net: bpfilter: disallow to remove bpfilter module while being used
net: bpfilter: restart bpfilter_umh when error occurred
net: bpfilter: use cleanup callback to release umh_info
umh: add exit routine for UMH process
isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
vhost/vsock: fix vhost vsock cid hashing inconsistent
net: stmmac: Prevent RX starvation in stmmac_napi_poll()
net: stmmac: Fix the logic of checking if RX Watchdog must be enabled
net: stmmac: Check if CBS is supported before configuring
net: stmmac: dwxgmac2: Only clear interrupts that are active
net: stmmac: Fix PCI module removal leak
tools/bpf: fix bpftool map dump with bitfields
tools/bpf: test btf bitfield with >=256 struct member offset
bpf: fix bpffs bitfield pretty print
net: ethernet: mediatek: fix warning in phy_start_aneg
tcp: change txhash on SYN-data timeout
...
|
|
This patch uses nfct_help() to detect whether an established connection
needs conntrack helper instead of using test_bit(IPS_HELPER_BIT,
&ct->status).
The reason is that IPS_HELPER_BIT is only set when using explicit CT
target.
However, in the case that a device enables conntrack helper via command
"echo 1 > /proc/sys/net/netfilter/nf_conntrack_helper", the status of
IPS_HELPER_BIT will not present any change, and consequently it loses
the checking ability in the context.
Signed-off-by: Henry Yen <henry.yen@mediatek.com>
Reviewed-by: Ryder Lee <ryder.lee@mediatek.com>
Tested-by: John Crispin <john@phrozen.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Matteo reported forwarding issues inside the linux bridge,
if the enslaved interfaces use the fq qdisc.
Similar to commit 8203e2d844d3 ("net: clear skb->tstamp in
forwarding paths"), we need to clear the tstamp field in
the bridge forwarding path.
Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Reported-and-tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bpfilter.ko module can be removed while functions of the bpfilter.ko
are executing. so panic can occurred. in order to protect that, locks can
be used. a bpfilter_lock protects routines in the
__bpfilter_process_sockopt() but it's not enough because __exit routine
can be executed concurrently.
Now, the bpfilter_umh can not run in parallel.
So, the module do not removed while it's being used and it do not
double-create UMH process.
The members of the umh_info and the bpfilter_umh_ops are protected by
the bpfilter_umh_ops.lock.
test commands:
while :
do
iptables -I FORWARD -m string --string ap --algo kmp &
modprobe -rv bpfilter &
done
splat looks like:
[ 298.623435] BUG: unable to handle kernel paging request at fffffbfff807440b
[ 298.628512] #PF error: [normal kernel read fault]
[ 298.633018] PGD 124327067 P4D 124327067 PUD 11c1a3067 PMD 119eb2067 PTE 0
[ 298.638859] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 298.638859] CPU: 0 PID: 2997 Comm: iptables Not tainted 4.20.0+ #154
[ 298.638859] RIP: 0010:__mutex_lock+0x6b9/0x16a0
[ 298.638859] Code: c0 00 00 e8 89 82 ff ff 80 bd 8f fc ff ff 00 0f 85 d9 05 00 00 48 8b 85 80 fc ff ff 48 bf 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 38 00 0f 85 1d 0e 00 00 48 8b 85 c8 fc ff ff 49 39 47 58 c6
[ 298.638859] RSP: 0018:ffff88810e7777a0 EFLAGS: 00010202
[ 298.638859] RAX: 1ffffffff807440b RBX: ffff888111bd4d80 RCX: 0000000000000000
[ 298.638859] RDX: 1ffff110235ff806 RSI: ffff888111bd5538 RDI: dffffc0000000000
[ 298.638859] RBP: ffff88810e777b30 R08: 0000000080000002 R09: 0000000000000000
[ 298.638859] R10: 0000000000000000 R11: 0000000000000000 R12: fffffbfff168a42c
[ 298.638859] R13: ffff888111bd4d80 R14: ffff8881040e9a05 R15: ffffffffc03a2000
[ 298.638859] FS: 00007f39e3758700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
[ 298.638859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 298.638859] CR2: fffffbfff807440b CR3: 000000011243e000 CR4: 00000000001006f0
[ 298.638859] Call Trace:
[ 298.638859] ? mutex_lock_io_nested+0x1560/0x1560
[ 298.638859] ? kasan_kmalloc+0xa0/0xd0
[ 298.638859] ? kmem_cache_alloc+0x1c2/0x260
[ 298.638859] ? __alloc_file+0x92/0x3c0
[ 298.638859] ? alloc_empty_file+0x43/0x120
[ 298.638859] ? alloc_file_pseudo+0x220/0x330
[ 298.638859] ? sock_alloc_file+0x39/0x160
[ 298.638859] ? __sys_socket+0x113/0x1d0
[ 298.638859] ? __x64_sys_socket+0x6f/0xb0
[ 298.638859] ? do_syscall_64+0x138/0x560
[ 298.638859] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 298.638859] ? __alloc_file+0x92/0x3c0
[ 298.638859] ? init_object+0x6b/0x80
[ 298.638859] ? cyc2ns_read_end+0x10/0x10
[ 298.638859] ? cyc2ns_read_end+0x10/0x10
[ 298.638859] ? hlock_class+0x140/0x140
[ 298.638859] ? sched_clock_local+0xd4/0x140
[ 298.638859] ? sched_clock_local+0xd4/0x140
[ 298.638859] ? check_flags.part.37+0x440/0x440
[ 298.638859] ? __lock_acquire+0x4f90/0x4f90
[ 298.638859] ? set_rq_offline.part.89+0x140/0x140
[ ... ]
Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bpfilter_umh will be stopped via __stop_umh() when the bpfilter
error occurred.
The bpfilter_umh() couldn't start again because there is no restart
routine.
The section of the bpfilter_umh_{start/end} is no longer .init.rodata
because these area should be reused in the restart routine. hence
the section name is changed to .bpfilter_umh.
The bpfilter_ops->start() is restart callback. it will be called when
bpfilter_umh is stopped.
The stop bit means bpfilter_umh is stopped. this bit is set by both
start and stop routine.
Before this patch,
Test commands:
$ iptables -vnL
$ kill -9 <pid of bpfilter_umh>
$ iptables -vnL
[ 480.045136] bpfilter: write fail -32
$ iptables -vnL
All iptables commands will fail.
After this patch,
Test commands:
$ iptables -vnL
$ kill -9 <pid of bpfilter_umh>
$ iptables -vnL
$ iptables -vnL
Now, all iptables commands will work.
Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now, UMH process is killed, do_exit() calls the umh_info->cleanup callback
to release members of the umh_info.
This patch makes bpfilter_umh's cleanup routine to use the
umh_info->cleanup callback.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2019-01-11
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix TCP-BPF support for correctly setting the initial window
via TCP_BPF_IW on an active TFO sender, from Yuchung.
2) Fix a panic in BPF's stack_map_get_build_id()'s ELF parsing on
32 bit archs caused by page_address() returning NULL, from Song.
3) Fix BTF pretty print in kernel and bpftool when bitfield member
offset is greater than 256. Also add test cases, from Yonghong.
4) Fix improper argument handling in xdp1 sample, from Ioana.
5) Install missing tcp_server.py and tcp_client.py files from
BPF selftests, from Anders.
6) Add test_libbpf to gitignore in libbpf and BPF selftests,
from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the forward chain, the iif is changed from slave device to master vrf
device. Thus, flow offload does not find a match on the lower slave
device.
This patch uses the cached route, ie. dst->dev, to update the iif and
oif fields in the flow entry.
After this patch, the following example works fine:
# ip addr add dev eth0 1.1.1.1/24
# ip addr add dev eth1 10.0.0.1/24
# ip link add user1 type vrf table 1
# ip l set user1 up
# ip l set dev eth0 master user1
# ip l set dev eth1 master user1
# nft add table firewall
# nft add flowtable f fb1 { hook ingress priority 0 \; devices = { eth0, eth1 } \; }
# nft add chain f ftb-all {type filter hook forward priority 0 \; policy accept \; }
# nft add rule f ftb-all ct zone 1 ip protocol tcp flow offload @fb1
# nft add rule f ftb-all ct zone 1 ip protocol udp flow offload @fb1
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
memory is already accounted to kmemcg. Do the same for ebtables. The
syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
whole system from a restricted memcg, a potential DoS.
By accounting the ebt_table_info, the memory used for ebt_table_info can
be contained within the memcg of the allocating process. However the
lifetime of ebt_table_info is independent of the allocating process and
is tied to the network namespace. So, the oom-killer will not be able to
relieve the memory pressure due to ebt_table_info memory. The memory for
ebt_table_info is allocated through vmalloc. Currently vmalloc does not
handle the oom-killed allocating process correctly and one large
allocation can bypass memcg limit enforcement. So, with this patch,
at least the small allocations will be contained. For large allocations,
we need to fix vmalloc.
Reported-by: syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Previously upon SYN timeouts the sender recomputes the txhash to
try a different path. However this does not apply on the initial
timeout of SYN-data (active Fast Open). Therefore an active IPv6
Fast Open connection may incur one second RTO penalty to take on
a new path after the second SYN retransmission uses a new flow label.
This patch removes this undesirable behavior so Fast Open changes
the flow label just like the regular connections. This also helps
avoid falsely disabling Fast Open on the sender which triggers
after two consecutive SYN timeouts on Fast Open.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch makes sure the flow label in the IPv6 header
forged in ipv6_local_error() is initialized.
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
CPU: 1 PID: 24675 Comm: syz-executor1 Not tainted 4.20.0-rc7+ #4
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x173/0x1d0 lib/dump_stack.c:113
kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
kmsan_internal_check_memory+0x455/0xb00 mm/kmsan/kmsan.c:675
kmsan_copy_to_user+0xab/0xc0 mm/kmsan/kmsan_hooks.c:601
_copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
copy_to_user include/linux/uaccess.h:177 [inline]
move_addr_to_user+0x2e9/0x4f0 net/socket.c:227
___sys_recvmsg+0x5d7/0x1140 net/socket.c:2284
__sys_recvmsg net/socket.c:2327 [inline]
__do_sys_recvmsg net/socket.c:2337 [inline]
__se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
__x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x457ec9
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8750c06c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457ec9
RDX: 0000000000002000 RSI: 0000000020000400 RDI: 0000000000000005
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8750c076d4
R13: 00000000004c4a60 R14: 00000000004d8140 R15: 00000000ffffffff
Uninit was stored to memory at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
kmsan_save_stack mm/kmsan/kmsan.c:219 [inline]
kmsan_internal_chain_origin+0x134/0x230 mm/kmsan/kmsan.c:439
__msan_chain_origin+0x70/0xe0 mm/kmsan/kmsan_instr.c:200
ipv6_recv_error+0x1e3f/0x1eb0 net/ipv6/datagram.c:475
udpv6_recvmsg+0x398/0x2ab0 net/ipv6/udp.c:335
inet_recvmsg+0x4fb/0x600 net/ipv4/af_inet.c:830
sock_recvmsg_nosec net/socket.c:794 [inline]
sock_recvmsg+0x1d1/0x230 net/socket.c:801
___sys_recvmsg+0x4d5/0x1140 net/socket.c:2278
__sys_recvmsg net/socket.c:2327 [inline]
__do_sys_recvmsg net/socket.c:2337 [inline]
__se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
__x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
slab_post_alloc_hook mm/slab.h:446 [inline]
slab_alloc_node mm/slub.c:2759 [inline]
__kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
__kmalloc_reserve net/core/skbuff.c:137 [inline]
__alloc_skb+0x309/0xa20 net/core/skbuff.c:205
alloc_skb include/linux/skbuff.h:998 [inline]
ipv6_local_error+0x1a7/0x9e0 net/ipv6/datagram.c:334
__ip6_append_data+0x129f/0x4fd0 net/ipv6/ip6_output.c:1311
ip6_make_skb+0x6cc/0xcf0 net/ipv6/ip6_output.c:1775
udpv6_sendmsg+0x3f8e/0x45d0 net/ipv6/udp.c:1384
inet_sendmsg+0x54a/0x720 net/ipv4/af_inet.c:798
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
__sys_sendto+0x8c4/0xac0 net/socket.c:1788
__do_sys_sendto net/socket.c:1800 [inline]
__se_sys_sendto+0x107/0x130 net/socket.c:1796
__x64_sys_sendto+0x6e/0x90 net/socket.c:1796
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
Bytes 4-7 of 28 are uninitialized
Memory access of size 28 starts at ffff8881937bfce0
Data copied to user address 0000000020000000
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This fixes false-positive kmemleak reports about leaked neighbour entries:
unreferenced object 0xffff8885c6e4d0a8 (size 1024):
comm "softirq", pid 0, jiffies 4294922664 (age 167640.804s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 20 2c f3 83 ff ff ff ff ........ ,......
08 c0 ef 5f 84 88 ff ff 01 8c 7d 02 01 00 00 00 ..._......}.....
backtrace:
[<00000000748509fe>] ip6_finish_output2+0x887/0x1e40
[<0000000036d7a0d8>] ip6_output+0x1ba/0x600
[<0000000027ea7dba>] ip6_send_skb+0x92/0x2f0
[<00000000d6e2111d>] udp_v6_send_skb.isra.24+0x680/0x15e0
[<000000000668a8be>] udpv6_sendmsg+0x18c9/0x27a0
[<000000004bd5fa90>] sock_sendmsg+0xb3/0xf0
[<000000008227b29f>] ___sys_sendmsg+0x745/0x8f0
[<000000008698009d>] __sys_sendmsg+0xde/0x170
[<00000000889dacf1>] do_syscall_64+0x9b/0x400
[<0000000081cdb353>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<000000005767ed39>] 0xffffffffffffffff
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call
pskb_may_pull") avoided a read beyond the end of the skb linear
segment by calling pskb_may_pull.
That function can trigger a BUG_ON in pskb_expand_head if the skb is
shared, which it is when when peeking. It can also return ENOMEM.
Avoid both by switching to safer skb_header_pointer.
Fixes: 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull")
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The existing BPF TCP initial congestion window (TCP_BPF_IW) does not
to work on (active) Fast Open sender. This is because it changes the
(initial) window only if data_segs_out is zero -- but data_segs_out
is also incremented on SYN-data. This patch fixes the issue by
proerly accounting for SYN-data additionally.
Fixes: fc7478103c84 ("bpf: Adds support for setting initial cwnd")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Using the following example:
client 1.1.1.7 ---> 2.2.2.7 which dnat to 10.0.0.7 server
The first reply packet (ie. syn+ack) uses an incorrect destination
address for the reverse route lookup since it uses:
daddr = ct->tuplehash[!dir].tuple.dst.u3.ip;
which is 2.2.2.7 in the scenario that is described above, while this
should be:
daddr = ct->tuplehash[dir].tuple.src.u3.ip;
that is 10.0.0.7.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Init missing debug member magic with CRED_MAGIC.
Signed-off-by: Santosh kumar pradhan <santoshkumar.pradhan@wdc.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
'dev' is non NULL when the addr_len check triggers so it must goto a label
that does the dev_put otherwise dev will have a leaked refcount.
This bug causes the ib_ipoib module to become unloadable when using
systemd-network as it triggers this check on InfiniBand links.
Fixes: 99137b7888f4 ("packet: validate address length")
Reported-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Table needs to be specified for selective rule dumps per chain.
Fixes: 241faeceb849c ("netfilter: nf_tables: Speed up selective rule dumps")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
There is no code that decreases the reference count of stateful objects
in error path of the nft_add_set_elem(). this causes a leak of reference
count of stateful objects.
Test commands:
$nft add table ip filter
$nft add counter ip filter c1
$nft add map ip filter m1 { type ipv4_addr : counter \;}
$nft add element ip filter m1 { 1 : c1 }
$nft add element ip filter m1 { 1 : c1 }
$nft delete element ip filter m1 { 1 }
$nft delete counter ip filter c1
Result:
Error: Could not process rule: Device or resource busy
delete counter ip filter c1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
At the second 'nft add element ip filter m1 { 1 : c1 }', the reference
count of the 'c1' is increased then it tries to insert into the 'm1'. but
the 'm1' already has same element so it returns -EEXIST.
But it doesn't decrease the reference count of the 'c1' in the error path.
Due to a leak of the reference count of the 'c1', the 'c1' can't be
removed by 'nft delete counter ip filter c1'.
Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
__nf_tables_dump_rules() stores the current idx value into cb->args[0]
before returning to caller. With multiple chains present, cb->args[0] is
therefore updated after each chain's rules have been traversed. This
though causes the final nf_tables_dump_rules() run (which should return
an skb->len of zero since no rules are left to dump) to continue dumping
rules for each but the first chain. Fix this by moving the cb->args[0]
update to nf_tables_dump_rules().
With no final action to be performed anymore in
__nf_tables_dump_rules(), drop 'out_unfinished' jump label and 'rc'
variable - instead return the appropriate value directly.
Fixes: 241faeceb849c ("netfilter: nf_tables: Speed up selective rule dumps")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When adding / deleting VLANs to / from a bridge port, the bridge driver
first tries to propagate the information via switchdev and falls back to
the 8021q driver in case the underlying driver does not support
switchdev. This can result in a memory leak [1] when VXLAN and mlxsw
ports are enslaved to the bridge:
$ ip link set dev vxlan0 master br0
# No mlxsw ports are enslaved to 'br0', so mlxsw ignores the switchdev
# notification and the bridge driver adds the VLAN on 'vxlan0' via the
# 8021q driver
$ bridge vlan add vid 10 dev vxlan0 pvid untagged
# mlxsw port is enslaved to the bridge
$ ip link set dev swp1 master br0
# mlxsw processes the switchdev notification and the 8021q driver is
# skipped
$ bridge vlan del vid 10 dev vxlan0
This results in 'struct vlan_info' and 'struct vlan_vid_info' being
leaked, as they were allocated by the 8021q driver during VLAN addition,
but never freed as the 8021q driver was skipped during deletion.
Fix this by introducing a new VLAN private flag that indicates whether
the VLAN was added on the port by switchdev or the 8021q driver. If the
VLAN was added by the 8021q driver, then we make sure to delete it via
the 8021q driver as well.
[1]
unreferenced object 0xffff88822d20b1e8 (size 256):
comm "bridge", pid 2532, jiffies 4295216998 (age 1188.830s)
hex dump (first 32 bytes):
e0 42 97 ce 81 88 ff ff 00 00 00 00 00 00 00 00 .B..............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
[<00000000e0178b02>] vlan_vid_add+0x661/0x920
[<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
[<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
[<000000003535392c>] br_vlan_info+0x132/0x410
[<00000000aedaa9dc>] br_afspec+0x75c/0x870
[<00000000f5716133>] br_setlink+0x3dc/0x6d0
[<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
[<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
[<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
[<000000008be8d614>] rtnetlink_rcv+0x21/0x30
[<000000009ab2ca25>] netlink_unicast+0x52f/0x740
[<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
[<000000005d1e2050>] sock_sendmsg+0xbe/0x120
[<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
[<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
unreferenced object 0xffff888227454308 (size 32):
comm "bridge", pid 2532, jiffies 4295216998 (age 1188.882s)
hex dump (first 32 bytes):
88 b2 20 2d 82 88 ff ff 88 b2 20 2d 82 88 ff ff .. -...... -....
81 00 0a 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
[<0000000018050631>] vlan_vid_add+0x3e6/0x920
[<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
[<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
[<000000003535392c>] br_vlan_info+0x132/0x410
[<00000000aedaa9dc>] br_afspec+0x75c/0x870
[<00000000f5716133>] br_setlink+0x3dc/0x6d0
[<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
[<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
[<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
[<000000008be8d614>] rtnetlink_rcv+0x21/0x30
[<000000009ab2ca25>] netlink_unicast+0x52f/0x740
[<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
[<000000005d1e2050>] sock_sendmsg+0xbe/0x120
[<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
[<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
Fixes: d70e42b22dd4 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: bridge@lists.linux-foundation.org
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After receiving data into the page cache, we need to call flush_dcache_page()
for the architectures that define it.
Fixes: 277e4ab7d530b ("SUNRPC: Simplify TCP receive code by switching...")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.20
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The clean up is handled by the caller, rpcrdma_buffer_create(), so this
call to rpcrdma_sendctxs_destroy() leads to a double free.
Fixes: ae72950abf99 ("xprtrdma: Add data structure to manage RDMA Send arguments")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
This should return -ENOMEM if __alloc_workqueue_key() fails, but it
returns success.
Fixes: 6d2d0ee27c7a ("xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueue")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2019-01-08
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix BSD'ism in sendmsg(2) to rewrite unspecified IPv6 dst for
unconnected UDP sockets with [::1] _after_ cgroup BPF invocation,
from Andrey.
2) Follow-up fix to the speculation fix where we need to reject a
corner case for sanitation when ptr and scalars are mixed in the
same alu op. Also, some unrelated minor doc fixes, from Daniel.
3) Fix BPF kselftest's incorrect uses of create_and_get_cgroup()
by not assuming fd of zero value to be the result of an error
case, from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce a new option abort_on_full, default to false. Then
we can get -ENOSPC when the pool is full, or reaches quota.
[ Don't show abort_on_full in /proc/mounts. ]
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In smc_release() we release smc->clcsock before unhash the smc
sock, but a parallel smc_diag_dump() may be still reading
smc->clcsock, therefore this could cause a use-after-free as
reported by syzbot.
Reported-and-tested-by: syzbot+fbd1e5476e4c94c7b34e@syzkaller.appspotmail.com
Fixes: 51f1de79ad8e ("net/smc: replace sock_put worker by socket refcounting")
Cc: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: syzbot+0bf2e01269f1274b4b03@syzkaller.appspotmail.com
Reported-by: syzbot+e3132895630f957306bc@syzkaller.appspotmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
state
When handling DNAT'ed packets on a bridge device, the neighbour cache entry
from lookup was used without checking its state. It means that a cache entry
in the NUD_STALE state will be used directly instead of entering the NUD_DELAY
state to confirm the reachability of the neighbor.
This problem becomes worse after commit 2724680bceee ("neigh: Keep neighbour
cache entries if number of them is small enough."), since all neighbour cache
entries in the NUD_STALE state will be kept in the neighbour table as long as
the number of cache entries does not exceed the value specified in gc_thresh1.
This commit validates the state of a neighbour cache entry before using
the entry.
Signed-off-by: JianJhen Chen <kchen@synology.com>
Reviewed-by: JinLin Chen <jlchen@synology.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a memory leak in case genlmsg_put fails.
Fix this by freeing *args* before return.
Addresses-Coverity-ID: 1476406 ("Resource leak")
Fixes: 46273cf7e009 ("tipc: fix a missing check of genlmsg_put")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Yes indeed, DIV_ROUND_UP is in kernel.h.
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Muyu Yu provided a POC where user root with CAP_NET_ADMIN can create a CAN
frame modification rule that makes the data length code a higher value than
the available CAN frame data size. In combination with a configured checksum
calculation where the result is stored relatively to the end of the data
(e.g. cgw_csum_xor_rel) the tail of the skb (e.g. frag_list pointer in
skb_shared_info) can be rewritten which finally can cause a system crash.
Michael Kubecek suggested to drop frames that have a DLC exceeding the
available space after the modification process and provided a patch that can
handle CAN FD frames too. Within this patch we also limit the length for the
checksum calculations to the maximum of Classic CAN data length (8).
CAN frames that are dropped by these additional checks are counted with the
CGW_DELETED counter which indicates misconfigurations in can-gw rules.
This fixes CVE-2019-3701.
Reported-by: Muyu Yu <ieatmuttonchuan@gmail.com>
Reported-by: Marcus Meissner <meissner@suse.de>
Suggested-by: Michal Kubecek <mkubecek@suse.cz>
Tested-by: Muyu Yu <ieatmuttonchuan@gmail.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.2
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
# define HAVE_JUMP_LABEL
#endif
We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
match to the real kernel capability.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
|
|
I realized the last patch calls dev_get_by_index_rcu in a branch not
holding the rcu lock. Add the calls to rcu_read_lock and rcu_read_unlock.
Fixes: ec90ad334986 ("ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull ceph updates from Ilya Dryomov:
"A fairly quiet round: a couple of messenger performance improvements
from myself and a few cap handling fixes from Zheng"
* tag 'ceph-for-4.21-rc1' of git://github.com/ceph/ceph-client:
ceph: don't encode inode pathes into reconnect message
ceph: update wanted caps after resuming stale session
ceph: skip updating 'wanted' caps if caps are already issued
ceph: don't request excl caps when mount is readonly
ceph: don't update importing cap's mseq when handing cap export
libceph: switch more to bool in ceph_tcp_sendmsg()
libceph: use MSG_SENDPAGE_NOTLAST with ceph_tcp_sendpage()
libceph: use sock_no_sendpage() as a fallback in ceph_tcp_sendpage()
libceph: drop last_piece logic from write_partial_message_data()
ceph: remove redundant assignment
ceph: cleanup splice_dentry()
|
|
sys_sendmsg has supported unspecified destination IPv6 (wildcard) for
unconnected UDP sockets since 876c7f41. When [::] is passed by user as
destination, sys_sendmsg rewrites it with [::1] to be consistent with
BSD (see "BSD'ism" comment in the code).
This didn't work when cgroup-bpf was enabled though since the rewrite
[::] -> [::1] happened before passing control to cgroup-bpf block where
fl6.daddr was updated with passed by user sockaddr_in6.sin6_addr (that
might or might not be changed by BPF program). That way if user passed
[::] as dst IPv6 it was first rewritten with [::1] by original code from
876c7f41, but then rewritten back with [::] by cgroup-bpf block.
It happened even when BPF_CGROUP_UDP6_SENDMSG program was not present
(CONFIG_CGROUP_BPF=y was enough).
The fix is to apply BSD'ism after cgroup-bpf block so that [::] is
replaced with [::1] no matter where it came from: passed by user to
sys_sendmsg or set by BPF_CGROUP_UDP6_SENDMSG program.
Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg")
Reported-by: Nitin Rawat <nitin.rawat@intel.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Similar to c5ee066333eb ("ipv6: Consider sk_bound_dev_if when binding a
socket to an address"), binding a socket to v4 mapped addresses needs to
consider if the socket is bound to a device.
This problem also exists from the beginning of git history.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|