Age | Commit message (Collapse) | Author |
|
virtio_transport_get_ops() and virtio_transport_send_pkt_info()
can only be used on connecting/connected sockets, since a socket
assigned to a transport is required.
This patch adds a WARN_ON() on virtio_transport_get_ops() to check
this requirement, a comment and a returned error on
virtio_transport_send_pkt_info(),
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With multi-transport support, listener sockets are not bound to any
transport. So, calling virtio_transport_reset(), when an error
occurs, on a listener socket produces the following null-pointer
dereference:
BUG: kernel NULL pointer dereference, address: 00000000000000e8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 20 Comm: kworker/0:1 Not tainted 5.5.0-rc1-ste-00003-gb4be21f316ac-dirty #56
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport]
RIP: 0010:virtio_transport_send_pkt_info+0x20/0x130 [vmw_vsock_virtio_transport_common]
Code: 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 10 44 8b 76 20 e8 c0 ba fe ff <48> 8b 80 e8 00 00 00 e8 64 e3 7d c1 45 8b 45 00 41 8b 8c 24 d4 02
RSP: 0018:ffffc900000b7d08 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88807bf12728 RCX: 0000000000000000
RDX: ffff88807bf12700 RSI: ffffc900000b7d50 RDI: ffff888035c84000
RBP: ffffc900000b7d40 R08: ffff888035c84000 R09: ffffc900000b7d08
R10: ffff8880781de800 R11: 0000000000000018 R12: ffff888035c84000
R13: ffffc900000b7d50 R14: 0000000000000000 R15: ffff88807bf12724
FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000e8 CR3: 00000000790f4004 CR4: 0000000000160ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
virtio_transport_reset+0x59/0x70 [vmw_vsock_virtio_transport_common]
virtio_transport_recv_pkt+0x5bb/0xe50 [vmw_vsock_virtio_transport_common]
? detach_buf_split+0xf1/0x130
virtio_transport_rx_work+0xba/0x130 [vmw_vsock_virtio_transport]
process_one_work+0x1c0/0x300
worker_thread+0x45/0x3c0
kthread+0xfc/0x130
? current_work+0x40/0x40
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: sunrpc kvm_intel kvm vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common irqbypass vsock virtio_rng rng_core
CR2: 00000000000000e8
---[ end trace e75400e2ea2fa824 ]---
This happens because virtio_transport_reset() calls
virtio_transport_send_pkt_info() that can be used only on
connecting/connected sockets.
This patch fixes the issue, using virtio_transport_reset_no_sock()
instead of virtio_transport_reset() when we are handling a listener
socket.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
FASTOPEN setsockopt() or sendmsg() may switch the SMC socket to fallback
mode. Once fallback mode is active, the native TCP socket functions are
called. Nevertheless there is a small race window, when FASTOPEN
setsockopt/sendmsg runs in parallel to a connect(), and switch the
socket into fallback mode before connect() takes the sock lock.
Make sure the SMC-specific connect setup is omitted in this case.
This way a syzbot-reported refcount problem is fixed, triggered by
different threads running non-blocking connect() and FASTOPEN_KEY
setsockopt.
Reported-by: syzbot+96d3f9ff6a86d37e44c8@syzkaller.appspotmail.com
Fixes: 6d6dd528d5af ("net/smc: fix refcount non-blocking connect() -part 2")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
At the time commit ce5ec440994b ("tcp: ensure epoll edge trigger
wakeup when write queue is empty") was added to the kernel,
we still had a single write queue, combining rtx and write queues.
Once we moved the rtx queue into a separate rb-tree, testing
if sk_write_queue is empty has been suboptimal.
Indeed, if we have packets in the rtx queue, we probably want
to delay the EPOLLOUT generation at the time incoming packets
will free them, making room, but more importantly avoiding
flooding application with EPOLLOUT events.
Solution is to use tcp_rtx_and_write_queues_empty() helper.
Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Due to how tcp_sendmsg() is implemented, we can have an empty
skb at the tail of the write queue.
Most [1] tcp_write_queue_empty() callers want to know if there is
anything to send (payload and/or FIN)
Instead of checking if the sk_write_queue is empty, we need
to test if tp->write_seq == tp->snd_nxt
[1] tcp_send_fin() was the only caller that expected to
see if an skb was in the write queue, I have changed the code
to reuse the tcp_write_queue_tail() result.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
write queue in error cases") in linux-4.14 stable triggered
various bugs. One of them has been fixed in commit ba2ddb43f270
("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
we still have crashes in some occasions.
Root-cause is that when tcp_sendmsg() has allocated a fresh
skb and could not append a fragment before being blocked
in sk_stream_wait_memory(), tcp_write_xmit() might be called
and decide to send this fresh and empty skb.
Sending an empty packet is not only silly, it might have caused
many issues we had in the past with tp->packets_out being
out of sync.
Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christoph Paasch <cpaasch@apple.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Michal Kubecek and Firo Yang did a very nice analysis of crashes
happening in __inet_lookup_established().
Since a TCP socket can go from TCP_ESTABLISH to TCP_LISTEN
(via a close()/socket()/listen() cycle) without a RCU grace period,
I should not have changed listeners linkage in their hash table.
They must use the nulls protocol (Documentation/RCU/rculist_nulls.txt),
so that a lookup can detect a socket in a hash list was moved in
another one.
Since we added code in commit d296ba60d8e2 ("soreuseport: Resolve
merge conflict for v4/v6 ordering fix"), we have to add
hlist_nulls_add_tail_rcu() helper.
Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Firo Yang <firo.yang@suse.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/netdev/20191120083919.GH27852@unicorn.suse.cz/
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
In commit 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for
doit handlers") we add strict check for inet6_rtm_getaddr(). But we did
the invalid header values check before checking if NETLINK_F_STRICT_CHK
is set. This may break backwards compatibility if user already set the
ifm->ifa_prefixlen, ifm->ifa_flags, ifm->ifa_scope in their netlink code.
I didn't move the nlmsg_len check because I thought it's a valid check.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for doit handlers")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
In the function 'tipc_disc_rcv()', the 'msg_peer_net_hash()' is called
to read the header data field but after the message skb has been freed,
that might result in a garbage value...
This commit fixes it by defining a new local variable to store the data
first, just like the other header fields' handling.
Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a user message is sent, TIPC will check if the socket has faced a
congestion at link layer. If that happens, it will make a sleep to wait
for the congestion to disappear. This leaves a gap for other users to
take over the socket (e.g. multi threads) since the socket is released
as well. Also, in case of connectionless (e.g. SOCK_RDM), user is free
to send messages to various destinations (e.g. via 'sendto()'), then
the socket's preformatted header has to be updated correspondingly
prior to the actual payload message building.
Unfortunately, the latter action is done before the first action which
causes a condition issue that the destination of a certain message can
be modified incorrectly in the middle, leading to wrong destination
when that message is built. Consequently, when the message is sent to
the link layer, it gets stuck there forever because the peer node will
simply reject it. After a number of retransmission attempts, the link
is eventually taken down and the retransmission failure is reported.
This commit fixes the problem by rearranging the order of actions to
prevent the race condition from occurring, so the message building is
'atomic' and its header will not be modified by anyone.
Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commit c55c8edafa91 ("tipc: smooth change between replicast and
broadcast"), we allow instant switching between replicast and broadcast
by sending a dummy 'SYN' packet on the last used link to synchronize
packets on the links. The 'SYN' message is an object of link congestion
also, so if that happens, a 'SOCK_WAKEUP' will be scheduled to be sent
back to the socket...
However, in that commit, we simply use the same socket 'cong_link_cnt'
counter for both the 'SYN' & normal payload message sending. Therefore,
if both the replicast & broadcast links are congested, the counter will
be not updated correctly but overwritten by the latter congestion.
Later on, when the 'SOCK_WAKEUP' messages are processed, the counter is
reduced one by one and eventually overflowed. Consequently, further
activities on the socket will only wait for the false congestion signal
to disappear but never been met.
Because sending the 'SYN' message is vital for the mechanism, it should
be done anyway. This commit fixes the issue by marking the message with
an error code e.g. 'TIPC_ERR_NO_PORT', so its sending should not face a
link congestion, there is no need to touch the socket 'cong_link_cnt'
either. In addition, in the event of any error (e.g. -ENOBUFS), we will
purge the entire payload message queue and make a return immediately.
Fixes: c55c8edafa91 ("tipc: smooth change between replicast and broadcast")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current rbtree for service ranges in the name table is built based
on the 'lower' & 'upper' range values resulting in a flaw in the rbtree
searching. Some issues have been observed in case of range overlapping:
Case #1: unable to withdraw a name entry:
After some name services are bound, all of them are withdrawn by user
but one remains in the name table forever. This corrupts the table and
that service becomes dummy i.e. no real port.
E.g.
/
{22, 22}
/
/
---> {10, 50}
/ \
/ \
{10, 30} {20, 60}
The node {10, 30} cannot be removed since the rbtree searching stops at
the node's ancestor i.e. {10, 50}, so starting from it will never reach
the finding node.
Case #2: failed to send data in some cases:
E.g. Two service ranges: {20, 60}, {10, 50} are bound. The rbtree for
this service will be one of the two cases below depending on the order
of the bindings:
{20, 60} {10, 50} <--
/ \ / \
/ \ / \
{10, 50} NIL <-- NIL {20, 60}
(a) (b)
Now, try to send some data to service {30}, there will be two results:
(a): Failed, no route to host.
(b): Ok.
The reason is that the rbtree searching will stop at the pointing node
as shown above.
Case #3: Same as case #2b above but if the data sending's scope is
local and the {10, 50} is published by a peer node, then it will result
in 'no route to host' even though the other {20, 60} is for example on
the local node which should be able to get the data.
The issues are actually due to the way we built the rbtree. This commit
fixes it by introducing an additional field to each node - named 'max',
which is the largest 'upper' of that node subtree. The 'max' value for
each subtrees will be propagated correctly whenever a node is inserted/
removed or the tree is rebalanced by the augmented rbtree callbacks.
By this way, we can change the rbtree searching appoarch to solve the
issues above. Another benefit from this is that we can now improve the
searching for a next range matching e.g. in case of multicast, so get
rid of the unneeded looping over all nodes in the tree.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is softlockup when using TPACKET_V3:
...
NMI watchdog: BUG: soft lockup - CPU#2 stuck for 60010ms!
(__irq_svc) from [<c0558a0c>] (_raw_spin_unlock_irqrestore+0x44/0x54)
(_raw_spin_unlock_irqrestore) from [<c027b7e8>] (mod_timer+0x210/0x25c)
(mod_timer) from [<c0549c30>]
(prb_retire_rx_blk_timer_expired+0x68/0x11c)
(prb_retire_rx_blk_timer_expired) from [<c027a7ac>]
(call_timer_fn+0x90/0x17c)
(call_timer_fn) from [<c027ab6c>] (run_timer_softirq+0x2d4/0x2fc)
(run_timer_softirq) from [<c021eaf4>] (__do_softirq+0x218/0x318)
(__do_softirq) from [<c021eea0>] (irq_exit+0x88/0xac)
(irq_exit) from [<c0240130>] (msa_irq_exit+0x11c/0x1d4)
(msa_irq_exit) from [<c0209cf0>] (handle_IPI+0x650/0x7f4)
(handle_IPI) from [<c02015bc>] (gic_handle_irq+0x108/0x118)
(gic_handle_irq) from [<c0558ee4>] (__irq_usr+0x44/0x5c)
...
If __ethtool_get_link_ksettings() is failed in
prb_calc_retire_blk_tmo(), msec and tmo will be zero, so tov_in_jiffies
is zero and the timer expire for retire_blk_timer is turn to
mod_timer(&pkc->retire_blk_timer, jiffies + 0),
which will trigger cpu usage of softirq is 100%.
Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Tested-by: Xiao Jiangfeng <xiaojiangfeng@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Wait for rcu grace period after releasing netns in ctnetlink,
from Florian Westphal.
2) Incorrect command type in flowtable offload ndo invocation,
from wenxu.
3) Incorrect callback type in flowtable offload flow tuple
updates, also from wenxu.
4) Fix compile warning on flowtable offload infrastructure due to
possible reference to uninitialized variable, from Nathan Chancellor.
5) Do not inline nf_ct_resolve_clash(), this is called from slow
path / stress situations. From Florian Westphal.
6) Missing IPv6 flow selector description in flowtable offload.
7) Missing check for NETDEV_UNREGISTER in nf_tables offload
infrastructure, from wenxu.
8) Update NAT selftest to use randomized netns names, from
Florian Westphal.
9) Restore nfqueue bridge support, from Marco Oliverio.
10) Compilation warning in SCTP_CHUNKMAP_*() on xt_sctp header.
From Phil Sutter.
11) Fix bogus lookup/get match for non-anonymous rbtree sets.
12) Missing netlink validation for NFT_SET_ELEM_INTERVAL_END
elements.
13) Missing netlink validation for NFT_DATA_VALUE after
nft_data_init().
14) If rule specifies no actions, offload infrastructure returns
EOPNOTSUPP.
15) Module refcount leak in object updates.
16) Missing sanitization for ARP traffic from br_netfilter, from
Eric Dumazet.
17) Compilation breakage on big-endian due to incorrect memcpy()
size in the flowtable offload infrastructure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In function 'memcpy',
inlined from 'flow_offload_mangle' at net/netfilter/nf_flow_table_offload.c:112:2,
inlined from 'flow_offload_port_dnat' at net/netfilter/nf_flow_table_offload.c:373:2,
inlined from 'nf_flow_rule_route_ipv4' at net/netfilter/nf_flow_table_offload.c:424:3:
./include/linux/string.h:376:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
376 | __read_overflow2();
| ^~~~~~~~~~~~~~~~~~
The original u8* was done in the hope to make this more adaptable but
consensus is to keep this like it is in tc pedit.
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This is needed, because if the flag X25_ACCPT_APPRV_FLAG is not set on a
socket (manual call confirmation) and the channel is cleared by remote
before the manual call confirmation was sent, this situation needs to
be handled.
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Syzbot found a crash:
BUG: KMSAN: uninit-value in crc32_body lib/crc32.c:112 [inline]
BUG: KMSAN: uninit-value in crc32_le_generic lib/crc32.c:179 [inline]
BUG: KMSAN: uninit-value in __crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
Call Trace:
crc32_body lib/crc32.c:112 [inline]
crc32_le_generic lib/crc32.c:179 [inline]
__crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
chksum_update+0xb2/0x110 crypto/crc32c_generic.c:90
crypto_shash_update+0x4c5/0x530 crypto/shash.c:107
crc32c+0x150/0x220 lib/libcrc32c.c:47
sctp_csum_update+0x89/0xa0 include/net/sctp/checksum.h:36
__skb_checksum+0x1297/0x12a0 net/core/skbuff.c:2640
sctp_compute_cksum include/net/sctp/checksum.h:59 [inline]
sctp_packet_pack net/sctp/output.c:528 [inline]
sctp_packet_transmit+0x40fb/0x4250 net/sctp/output.c:597
sctp_outq_flush_transports net/sctp/outqueue.c:1146 [inline]
sctp_outq_flush+0x1823/0x5d80 net/sctp/outqueue.c:1194
sctp_outq_uncork+0xd0/0xf0 net/sctp/outqueue.c:757
sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1781 [inline]
sctp_side_effects net/sctp/sm_sideeffect.c:1184 [inline]
sctp_do_sm+0x8fe1/0x9720 net/sctp/sm_sideeffect.c:1155
sctp_primitive_REQUESTHEARTBEAT+0x175/0x1a0 net/sctp/primitive.c:185
sctp_apply_peer_addr_params+0x212/0x1d40 net/sctp/socket.c:2433
sctp_setsockopt_peer_addr_params net/sctp/socket.c:2686 [inline]
sctp_setsockopt+0x189bb/0x19090 net/sctp/socket.c:4672
The issue was caused by transport->ipaddr set with uninit addr param, which
was passed by:
sctp_transport_init net/sctp/transport.c:47 [inline]
sctp_transport_new+0x248/0xa00 net/sctp/transport.c:100
sctp_assoc_add_peer+0x5ba/0x2030 net/sctp/associola.c:611
sctp_process_param net/sctp/sm_make_chunk.c:2524 [inline]
where 'addr' is set by sctp_v4_from_addr_param(), and it doesn't initialize
the padding of addr->v4.
Later when calling sctp_make_heartbeat(), hbinfo.daddr(=transport->ipaddr)
will become the part of skb, and the issue occurs.
This patch is to fix it by initializing the padding of addr->v4 in
sctp_v4_from_addr_param(), as well as other functions that do the similar
thing, and these functions shouldn't trust that the caller initializes the
memory, as Marcelo suggested.
Reported-by: syzbot+6dcbfea81cd3d4dd0b02@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
neigh_cleanup() has not been used for seven years, and was a wrong design.
Messing with shared pointer in bond_neigh_init() without proper
memory barriers would at least trigger syzbot complains eventually.
It is time to remove this stuff.
Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2019-12-08
this is a pull request of 13 patches for net/master.
The first two patches are by Dan Murphy. He adds himself as a maintainer to the
m-can MMIO and tcan SPI driver.
The next two patches the j1939 stack. The first one is by Oleksij Rempel and
fixes a locking problem found by the syzbot, the second one is by me an fixes a
mistake in the documentation.
Srinivas Neeli fixes missing RX CAN packets on CANFD2.0 in the xilinx driver.
Sean Nyekjaer fixes a possible deadlock in the the flexcan driver after
suspend/resume. Joakim Zhang contributes two patches for the flexcan driver
that fix problems with the low power enter/exit.
The next 4 patches all target the tcan part of the m_can driver. Sean Nyekjaer
adds the required delay after reset and fixes the device tree binding example.
Dan Murphy's patches make the wake-gpio optional.
In the last patch Xiaolong Huang fixes several kernel memory info leaks to the
USB device in the kvaser_usb_leaf driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot is kind enough to remind us we need to call skb_may_pull()
BUG: KMSAN: uninit-value in br_nf_forward_arp+0xe61/0x1230 net/bridge/br_netfilter_hooks.c:665
CPU: 1 PID: 11631 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
__msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245
br_nf_forward_arp+0xe61/0x1230 net/bridge/br_netfilter_hooks.c:665
nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline]
nf_hook_slow+0x18b/0x3f0 net/netfilter/core.c:512
nf_hook include/linux/netfilter.h:260 [inline]
NF_HOOK include/linux/netfilter.h:303 [inline]
__br_forward+0x78f/0xe30 net/bridge/br_forward.c:109
br_flood+0xef0/0xfe0 net/bridge/br_forward.c:234
br_handle_frame_finish+0x1a77/0x1c20 net/bridge/br_input.c:162
nf_hook_bridge_pre net/bridge/br_input.c:245 [inline]
br_handle_frame+0xfb6/0x1eb0 net/bridge/br_input.c:348
__netif_receive_skb_core+0x20b9/0x51a0 net/core/dev.c:4830
__netif_receive_skb_one_core net/core/dev.c:4927 [inline]
__netif_receive_skb net/core/dev.c:5043 [inline]
process_backlog+0x610/0x13c0 net/core/dev.c:5874
napi_poll net/core/dev.c:6311 [inline]
net_rx_action+0x7a6/0x1aa0 net/core/dev.c:6379
__do_softirq+0x4a1/0x83a kernel/softirq.c:293
do_softirq_own_stack+0x49/0x80 arch/x86/entry/entry_64.S:1091
</IRQ>
do_softirq kernel/softirq.c:338 [inline]
__local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:190
local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
rcu_read_unlock_bh include/linux/rcupdate.h:688 [inline]
__dev_queue_xmit+0x38e8/0x4200 net/core/dev.c:3819
dev_queue_xmit+0x4b/0x60 net/core/dev.c:3825
packet_snd net/packet/af_packet.c:2959 [inline]
packet_sendmsg+0x8234/0x9100 net/packet/af_packet.c:2984
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg net/socket.c:657 [inline]
__sys_sendto+0xc44/0xc70 net/socket.c:1952
__do_sys_sendto net/socket.c:1964 [inline]
__se_sys_sendto+0x107/0x130 net/socket.c:1960
__x64_sys_sendto+0x6e/0x90 net/socket.c:1960
do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45a679
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f0a3c9e5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 000000000045a679
RDX: 000000000000000e RSI: 0000000020000200 RDI: 0000000000000003
RBP: 000000000075bf20 R08: 00000000200000c0 R09: 0000000000000014
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f0a3c9e66d4
R13: 00000000004c8ec1 R14: 00000000004dfe28 R15: 00000000ffffffff
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86
slab_alloc_node mm/slub.c:2773 [inline]
__kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381
__kmalloc_reserve net/core/skbuff.c:141 [inline]
__alloc_skb+0x306/0xa10 net/core/skbuff.c:209
alloc_skb include/linux/skbuff.h:1049 [inline]
alloc_skb_with_frags+0x18c/0xa80 net/core/skbuff.c:5662
sock_alloc_send_pskb+0xafd/0x10a0 net/core/sock.c:2244
packet_alloc_skb net/packet/af_packet.c:2807 [inline]
packet_snd net/packet/af_packet.c:2902 [inline]
packet_sendmsg+0x63a6/0x9100 net/packet/af_packet.c:2984
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg net/socket.c:657 [inline]
__sys_sendto+0xc44/0xc70 net/socket.c:1952
__do_sys_sendto net/socket.c:1964 [inline]
__se_sys_sendto+0x107/0x130 net/socket.c:1960
__x64_sys_sendto+0x6e/0x90 net/socket.c:1960
do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: c4e70a87d975 ("netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
If the rule only specifies the matching side, return EOPNOTSUPP.
Otherwise, the front-end relies on the drivers to reject this rule.
Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use __nft_obj_type_get() instead, otherwise there is a module reference
counter leak.
Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Userspace might bogusly sent NFT_DATA_VERDICT in several netlink
attributes that assume NFT_DATA_VALUE. Moreover, make sure that error
path invokes nft_data_release() to decrement the reference count on the
chain object.
Fixes: 96518518cc41 ("netfilter: add nftables")
Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Only NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_FLAGS make sense for elements
whose NFT_SET_ELEM_INTERVAL_END flag is set on.
Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
sets
The existing rbtree implementation might store consecutive elements
where the closing element and the opening element might overlap, eg.
[ a, a+1) [ a+1, a+2)
This patch removes the optimization for non-anonymous sets in the exact
matching case, where it is assumed to stop searching in case that the
closing element is found. Instead, invalidate candidate interval and
keep looking further in the tree.
The lookup/get operation might return false, while there is an element
in the rbtree. Moreover, the get operation returns true as if a+2 would
be in the tree. This happens with named sets after several set updates.
The existing lookup optimization (that only works for the anonymous
sets) might not reach the opening [ a+1,... element if the closing
...,a+1) is found in first place when walking over the rbtree. Hence,
walking the full tree in that case is needed.
This patch fixes the lookup and get operations.
Fixes: e701001e7cbe ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates")
Fixes: ba0e4d9917b4 ("netfilter: nf_tables: get set elements via netlink")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pull networking fixes from David Miller:
1) More jumbo frame fixes in r8169, from Heiner Kallweit.
2) Fix bpf build in minimal configuration, from Alexei Starovoitov.
3) Use after free in slcan driver, from Jouni Hogander.
4) Flower classifier port ranges don't work properly in the HW offload
case, from Yoshiki Komachi.
5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin.
6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk.
7) Fix flow dissection in dsa TX path, from Alexander Lobakin.
8) Stale syncookie timestampe fixes from Guillaume Nault.
[ Did an evil merge to silence a warning introduced by this pull - Linus ]
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
r8169: fix rtl_hw_jumbo_disable for RTL8168evl
net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
r8169: add missing RX enabling for WoL on RTL8125
vhost/vsock: accept only packets with the right dst_cid
net: phy: dp83867: fix hfs boot in rgmii mode
net: ethernet: ti: cpsw: fix extra rx interrupt
inet: protect against too small mtu values.
gre: refetch erspan header from skb->data after pskb_may_pull()
pppoe: remove redundant BUG_ON() check in pppoe_pernet
tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
tcp: tighten acceptance of ACKs not matching a child socket
tcp: fix rejected syncookies due to stale timestamps
lpc_eth: kernel BUG on remove
tcp: md5: fix potential overestimation of TCP option space
net: sched: allow indirect blocks to bind to clsact in TC
net: core: rename indirect block ingress cb function
net-sysfs: Call dev_hold always in netdev_queue_add_kobject
net: dsa: fix flow dissection on Tx path
net/tls: Fix return values to avoid ENOTSUPP
net: avoid an indirect call in ____sys_recvmsg()
...
|
|
syzbot reproduced following crash:
===============================================================================
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9844 Comm: syz-executor.0 Not tainted 5.4.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:__lock_acquire+0x1254/0x4a00 kernel/locking/lockdep.c:3828
Code: 00 0f 85 96 24 00 00 48 81 c4 f0 00 00 00 5b 41 5c 41 5d 41 5e 41
5f 5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02
00 0f 85 0b 28 00 00 49 81 3e 20 19 78 8a 0f 84 5f ee ff
RSP: 0018:ffff888099c3fb48 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000218 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff888099c3fc60 R08: 0000000000000001 R09: 0000000000000001
R10: fffffbfff146e1d0 R11: ffff888098720400 R12: 00000000000010c0
R13: 0000000000000000 R14: 00000000000010c0 R15: 0000000000000000
FS: 00007f0559e98700(0000) GS:ffff8880ae800000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe4d89e0000 CR3: 0000000099606000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4485
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
_raw_spin_lock_bh+0x33/0x50 kernel/locking/spinlock.c:175
spin_lock_bh include/linux/spinlock.h:343 [inline]
j1939_jsk_del+0x32/0x210 net/can/j1939/socket.c:89
j1939_sk_bind+0x2ea/0x8f0 net/can/j1939/socket.c:448
__sys_bind+0x239/0x290 net/socket.c:1648
__do_sys_bind net/socket.c:1659 [inline]
__se_sys_bind net/socket.c:1657 [inline]
__x64_sys_bind+0x73/0xb0 net/socket.c:1657
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45a679
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f0559e97c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000000000045a679
RDX: 0000000000000018 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f0559e986d4
R13: 00000000004c09e9 R14: 00000000004d37d0 R15: 00000000ffffffff
Modules linked in:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 9844 at kernel/locking/mutex.c:1419
mutex_trylock+0x279/0x2f0 kernel/locking/mutex.c:1427
===============================================================================
This issues was caused by null pointer deference. Where j1939_sk_bind()
was using currently not existing priv.
Possible scenario may look as following:
cpu0 cpu1
bind()
bind()
j1939_sk_bind()
j1939_sk_bind()
priv = jsk->priv;
priv = jsk->priv;
lock_sock(sock->sk);
priv = j1939_netdev_start(ndev);
j1939_jsk_add(priv, jsk);
jsk->priv = priv;
relase_sock(sock->sk);
lock_sock(sock->sk);
j1939_jsk_del(priv, jsk);
..... ooops ......
With this patch we move "priv = jsk->priv;" after the lock, to avoid
assigning of wrong priv pointer.
Reported-by: syzbot+99e9e1b200a1e363237d@syzkaller.appspotmail.com
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: linux-stable <stable@vger.kernel.org> # >= v5.4
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Pull nfsd updates from Bruce Fields:
"This is a relatively quiet cycle for nfsd, mainly various bugfixes.
Possibly most interesting is Trond's fixes for some callback races
that were due to my incomplete understanding of rpc client shutdown.
Unfortunately at the last minute I've started noticing a new
intermittent failure to send callbacks. As the logic seems basically
correct, I'm leaving Trond's patches in for now, and hope to find a
fix in the next week so I don't have to revert those patches"
* tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits)
nfsd: depend on CRYPTO_MD5 for legacy client tracking
NFSD fixing possible null pointer derefering in copy offload
nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
nfsd: Ensure CLONE persists data and metadata changes to the target file
SUNRPC: Fix backchannel latency metrics
nfsd: restore NFSv3 ACL support
nfsd: v4 support requires CRYPTO_SHA256
nfsd: Fix cld_net->cn_tfm initialization
lockd: remove __KERNEL__ ifdefs
sunrpc: remove __KERNEL__ ifdefs
race in exportfs_decode_fh()
nfsd: Drop LIST_HEAD where the variable it declares is never used.
nfsd: document callback_wq serialization of callback code
nfsd: mark cb path down on unknown errors
nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()
nfsd: minor 4.1 callback cleanup
SUNRPC: Fix svcauth_gss_proxy_init()
SUNRPC: Trace gssproxy upcall results
sunrpc: fix crash when cache_head become valid before update
nfsd: remove private bin2hex implementation
...
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Features:
- NFSv4.2 now supports cross device offloaded copy (i.e. offloaded
copy of a file from one source server to a different target
server).
- New RDMA tracepoints for debugging congestion control and Local
Invalidate WRs.
Bugfixes and cleanups
- Drop the NFSv4.1 session slot if nfs4_delegreturn_prepare waits for
layoutreturn
- Handle bad/dead sessions correctly in nfs41_sequence_process()
- Various bugfixes to the delegation return operation.
- Various bugfixes pertaining to delegations that have been revoked.
- Cleanups to the NFS timespec code to avoid unnecessary conversions
between timespec and timespec64.
- Fix unstable RDMA connections after a reconnect
- Close race between waking an RDMA sender and posting a receive
- Wake pending RDMA tasks if connection fails
- Fix MR list corruption, and clean up MR usage
- Fix another RPCSEC_GSS issue with MIC buffer space"
* tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
SUNRPC: Capture completion of all RPC tasks
SUNRPC: Fix another issue with MIC buffer space
NFS4: Trace lock reclaims
NFS4: Trace state recovery operation
NFSv4.2 fix memory leak in nfs42_ssc_open
NFSv4.2 fix kfree in __nfs42_copy_file_range
NFS: remove duplicated include from nfs4file.c
NFSv4: Make _nfs42_proc_copy_notify() static
NFS: Fallocate should use the nfs4_fattr_bitmap
NFS: Return -ETXTBSY when attempting to write to a swapfile
fs: nfs: sysfs: Remove NULL check before kfree
NFS: remove unneeded semicolon
NFSv4: add declaration of current_stateid
NFSv4.x: Drop the slot if nfs4_delegreturn_prepare waits for layoutreturn
NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()
nfsv4: Move NFSPROC4_CLNT_COPY_NOTIFY to end of list
SUNRPC: Avoid RPC delays when exiting suspend
NFS: Add a tracepoint in nfs_fh_to_dentry()
NFSv4: Don't retry the GETATTR on old stateid in nfs4_delegreturn_done()
NFSv4: Handle NFS4ERR_OLD_STATEID in delegreturn
...
|
|
Use the new tcf_proto_check_kind() helper to make sure user
provided value is well formed.
BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:606 [inline]
BUG: KMSAN: uninit-value in string+0x4be/0x600 lib/vsprintf.c:668
CPU: 0 PID: 12358 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
__msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245
string_nocheck lib/vsprintf.c:606 [inline]
string+0x4be/0x600 lib/vsprintf.c:668
vsnprintf+0x218f/0x3210 lib/vsprintf.c:2510
__request_module+0x2b1/0x11c0 kernel/kmod.c:143
tcf_proto_lookup_ops+0x171/0x700 net/sched/cls_api.c:139
tc_chain_tmplt_add net/sched/cls_api.c:2730 [inline]
tc_ctl_chain+0x1904/0x38a0 net/sched/cls_api.c:2850
rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5224
netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5242
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg net/socket.c:657 [inline]
___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
__sys_sendmsg net/socket.c:2356 [inline]
__do_sys_sendmsg net/socket.c:2365 [inline]
__se_sys_sendmsg+0x305/0x460 net/socket.c:2363
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45a649
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f0790795c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000000000045a649
RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000006
RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f07907966d4
R13: 00000000004c8db5 R14: 00000000004df630 R15: 00000000ffffffff
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86
slab_alloc_node mm/slub.c:2773 [inline]
__kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381
__kmalloc_reserve net/core/skbuff.c:141 [inline]
__alloc_skb+0x306/0xa10 net/core/skbuff.c:209
alloc_skb include/linux/skbuff.h:1049 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline]
netlink_sendmsg+0x783/0x1330 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg net/socket.c:657 [inline]
___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
__sys_sendmsg net/socket.c:2356 [inline]
__do_sys_sendmsg net/socket.c:2365 [inline]
__se_sys_sendmsg+0x305/0x460 net/socket.c:2363
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 6f96c3c6904c ("net_sched: fix backward compatibility for TCA_KIND")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot was once again able to crash a host by setting a very small mtu
on loopback device.
Let's make inetdev_valid_mtu() available in include/net/ip.h,
and use it in ip_setup_cork(), so that we protect both ip_append_page()
and __ip_append_data()
Also add a READ_ONCE() when the device mtu is read.
Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
even if other code paths might write over this field.
Add a big comment in include/linux/netdevice.h about dev->mtu
needing READ_ONCE()/WRITE_ONCE() annotations.
Hopefully we will add the missing ones in followup patches.
[1]
refcount_t: saturated; leaking memory.
WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
panic+0x2e3/0x75c kernel/panic.c:221
__warn.cold+0x2f/0x3e kernel/panic.c:582
report_bug+0x289/0x300 lib/bug.c:195
fixup_bug arch/x86/kernel/traps.c:174 [inline]
fixup_bug arch/x86/kernel/traps.c:169 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
RSP: 0018:ffff88809689f550 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
refcount_add include/linux/refcount.h:193 [inline]
skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
kernel_sendpage+0x92/0xf0 net/socket.c:3794
sock_sendpage+0x8b/0xc0 net/socket.c:936
pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
splice_from_pipe_feed fs/splice.c:512 [inline]
__splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
splice_from_pipe+0x108/0x170 fs/splice.c:671
generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
do_splice_from fs/splice.c:861 [inline]
direct_splice_actor+0x123/0x190 fs/splice.c:1035
splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
do_sendfile+0x597/0xd00 fs/read_write.c:1464
__do_sys_sendfile64 fs/read_write.c:1525 [inline]
__se_sys_sendfile64 fs/read_write.c:1511 [inline]
__x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441409
Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..
Fixes: 1470ddf7f8ce ("inet: Remove explicit write references to sk/inet in ip_append_data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After pskb_may_pull() we should always refetch the header
pointers from the skb->data in case it got reallocated.
In gre_parse_header(), the erspan header is still fetched
from the 'options' pointer which is fetched before
pskb_may_pull().
Found this during code review of a KMSAN bug report.
Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup")
Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Bridge packets that are forwarded have skb->dst == NULL and get
dropped by the check introduced by
b60a77386b1d4868f72f6353d35dabe5fbe981f2 (net: make skb_dst_force
return true when dst is refcounted).
To fix this we check skb_dst() before skb_dst_force(), so we don't
drop skb packet with dst == NULL. This holds also for skb at the
PRE_ROUTING hook so we remove the second check.
Fixes: b60a77386b1d ("net: make skb_dst_force return true when dst is refcounted")
Signed-off-by: Marco Oliverio <marco.oliverio@tanaza.com>
Signed-off-by: Rocco Folino <rocco.folino@tanaza.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Back in 2008, Adam Langley fixed the corner case of packets for flows
having all of the following options : MD5 TS SACK
Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
can be cooked from the remaining 8 bytes.
tcp_established_options() correctly sets opts->num_sack_blocks
to zero, but returns 36 instead of 32.
This means TCP cooks packets with 4 extra bytes at the end
of options, containing unitialized bytes.
Fixes: 33ad798c924b ("tcp: options clean up")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a device is bound to a clsact qdisc, bind events are triggered to
registered drivers for both ingress and egress. However, if a driver
registers to such a device using the indirect block routines then it is
assumed that it is only interested in ingress offload and so only replays
ingress bind/unbind messages.
The NFP driver supports the offload of some egress filters when
registering to a block with qdisc of type clsact. However, on unregister,
if the block is still active, it will not receive an unbind egress
notification which can prevent proper cleanup of other registered
callbacks.
Modify the indirect block callback command in TC to send messages of
ingress and/or egress bind depending on the qdisc in use. NFP currently
supports egress offload for TC flower offload so the changes are only
added to TC.
Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With indirect blocks, a driver can register for callbacks from a device
that is does not 'own', for example, a tunnel device. When registering to
or unregistering from a new device, a callback is triggered to generate
a bind/unbind event. This, in turn, allows the driver to receive any
existing rules or to properly clean up installed rules.
When first added, it was assumed that all indirect block registrations
would be for ingress offloads. However, the NFP driver can, in some
instances, support clsact qdisc binds for egress offload.
Change the name of the indirect block callback command in flow_offload to
remove the 'ingress' identifier from it. While this does not change
functionality, a follow up patch will implement a more more generic
callback than just those currently just supporting ingress offload.
Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dev_hold has to be called always in netdev_queue_add_kobject.
Otherwise usage count drops below 0 in case of failure in
kobject_init_and_add.
Fixes: b8eb718348b8 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Miller <davem@davemloft.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an
ability to override protocol and network offset during flow dissection
for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
in order to fix skb hashing for RPS on Rx path.
However, skb_hash() and added part of code can be invoked not only on
Rx, but also on Tx path if we have a multi-queued device and:
- kernel is running on UP system or
- XPS is not configured.
The call stack in this two cases will be like: dev_queue_xmit() ->
__dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
skb_tx_hash() -> skb_get_hash().
The problem is that skbs queued for Tx have both network offset and
correct protocol already set up even after inserting a CPU tag by DSA
tagger, so calling tag_ops->flow_dissect() on this path actually only
breaks flow dissection and hashing.
This can be observed by adding debug prints just before and right after
tag_ops->flow_dissect() call to the related block of code:
Before the patch:
Rx path (RPS):
[ 19.240001] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 19.244271] tag_ops->flow_dissect()
[ 19.247811] Rx: proto: 0x0800, nhoff: 8 /* ETH_P_IP */
[ 19.215435] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 19.219746] tag_ops->flow_dissect()
[ 19.223241] Rx: proto: 0x0806, nhoff: 8 /* ETH_P_ARP */
[ 18.654057] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 18.658332] tag_ops->flow_dissect()
[ 18.661826] Rx: proto: 0x8100, nhoff: 8 /* ETH_P_8021Q */
Tx path (UP system):
[ 18.759560] Tx: proto: 0x0800, nhoff: 26 /* ETH_P_IP */
[ 18.763933] tag_ops->flow_dissect()
[ 18.767485] Tx: proto: 0x920b, nhoff: 34 /* junk */
[ 22.800020] Tx: proto: 0x0806, nhoff: 26 /* ETH_P_ARP */
[ 22.804392] tag_ops->flow_dissect()
[ 22.807921] Tx: proto: 0x920b, nhoff: 34 /* junk */
[ 16.898342] Tx: proto: 0x86dd, nhoff: 26 /* ETH_P_IPV6 */
[ 16.902705] tag_ops->flow_dissect()
[ 16.906227] Tx: proto: 0x920b, nhoff: 34 /* junk */
After:
Rx path (RPS):
[ 16.520993] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 16.525260] tag_ops->flow_dissect()
[ 16.528808] Rx: proto: 0x0800, nhoff: 8 /* ETH_P_IP */
[ 15.484807] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 15.490417] tag_ops->flow_dissect()
[ 15.495223] Rx: proto: 0x0806, nhoff: 8 /* ETH_P_ARP */
[ 17.134621] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 17.138895] tag_ops->flow_dissect()
[ 17.142388] Rx: proto: 0x8100, nhoff: 8 /* ETH_P_8021Q */
Tx path (UP system):
[ 15.499558] Tx: proto: 0x0800, nhoff: 26 /* ETH_P_IP */
[ 20.664689] Tx: proto: 0x0806, nhoff: 26 /* ETH_P_ARP */
[ 18.565782] Tx: proto: 0x86dd, nhoff: 26 /* ETH_P_IPV6 */
In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
to prevent code from calling tag_ops->flow_dissect() on Tx.
I also decided to initialize 'offset' variable so tagger callbacks can
now safely leave it untouched without provoking a chaos.
Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection")
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ENOTSUPP is not available in userspace, for example:
setsockopt failed, 524, Unknown error 524
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CONFIG_RETPOLINE=y made indirect calls expensive.
gcc seems to add an indirect call in ____sys_recvmsg().
Rewriting the code slightly makes sure to avoid this indirection.
Alternative would be to not call sock_recvmsg() and instead
use security_socket_recvmsg() and sock_recvmsg_nosec(),
but this is less readable IMO.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: David Laight <David.Laight@aculab.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to set/get/dump, the tipc uses the generic netlink
infrastructure. So, when tipc module is inserted, init function
calls genl_register_family().
After genl_register_family(), set/get/dump commands are immediately
allowed and these callbacks internally use the net_generic.
net_generic is allocated by register_pernet_device() but this
is called after genl_register_family() in the __init function.
So, these callbacks would use un-initialized net_generic.
Test commands:
#SHELL1
while :
do
modprobe tipc
modprobe -rv tipc
done
#SHELL2
while :
do
tipc link list
done
Splat looks like:
[ 59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled
[ 59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194
[ 59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc]
[ 59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00
[ 59.622550][ T2780] NET: Registered protocol family 30
[ 59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202
[ 59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907
[ 59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149
[ 59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1
[ 59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40
[ 59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328
[ 59.624639][ T2788] FS: 00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
[ 59.624645][ T2788] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 59.625875][ T2780] tipc: Started in single node mode
[ 59.626128][ T2788] CR2: 00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0
[ 59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 59.636478][ T2788] Call Trace:
[ 59.637025][ T2788] tipc_nl_add_bc_link+0x179/0x1470 [tipc]
[ 59.638219][ T2788] ? lock_downgrade+0x6e0/0x6e0
[ 59.638923][ T2788] ? __tipc_nl_add_link+0xf90/0xf90 [tipc]
[ 59.639533][ T2788] ? tipc_nl_node_dump_link+0x318/0xa50 [tipc]
[ 59.640160][ T2788] ? mutex_lock_io_nested+0x1380/0x1380
[ 59.640746][ T2788] tipc_nl_node_dump_link+0x4fd/0xa50 [tipc]
[ 59.641356][ T2788] ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc]
[ 59.642088][ T2788] ? __skb_ext_del+0x270/0x270
[ 59.642594][ T2788] genl_lock_dumpit+0x85/0xb0
[ 59.643050][ T2788] netlink_dump+0x49c/0xed0
[ 59.643529][ T2788] ? __netlink_sendskb+0xc0/0xc0
[ 59.644044][ T2788] ? __netlink_dump_start+0x190/0x800
[ 59.644617][ T2788] ? __mutex_unlock_slowpath+0xd0/0x670
[ 59.645177][ T2788] __netlink_dump_start+0x5a0/0x800
[ 59.645692][ T2788] genl_rcv_msg+0xa75/0xe90
[ 59.646144][ T2788] ? __lock_acquire+0xdfe/0x3de0
[ 59.646692][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320
[ 59.647340][ T2788] ? genl_lock_dumpit+0xb0/0xb0
[ 59.647821][ T2788] ? genl_unlock+0x20/0x20
[ 59.648290][ T2788] ? genl_parallel_done+0xe0/0xe0
[ 59.648787][ T2788] ? find_held_lock+0x39/0x1d0
[ 59.649276][ T2788] ? genl_rcv+0x15/0x40
[ 59.649722][ T2788] ? lock_contended+0xcd0/0xcd0
[ 59.650296][ T2788] netlink_rcv_skb+0x121/0x350
[ 59.650828][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320
[ 59.651491][ T2788] ? netlink_ack+0x940/0x940
[ 59.651953][ T2788] ? lock_acquire+0x164/0x3b0
[ 59.652449][ T2788] genl_rcv+0x24/0x40
[ 59.652841][ T2788] netlink_unicast+0x421/0x600
[ ... ]
Fixes: 7e4369057806 ("tipc: fix a slab object leak")
Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When user runs a command like
tc qdisc add dev eth1 root mqprio
KASAN stack-out-of-bounds warning is emitted.
Currently, NLA_ALIGN macro used in mqprio_dump provides too large
buffer size as argument for nla_put and memcpy down the call stack.
The flow looks like this:
1. nla_put expects exact object size as an argument;
2. Later it provides this size to memcpy;
3. To calculate correct padding for SKB, nla_put applies NLA_ALIGN
macro itself.
Therefore, NLA_ALIGN should not be applied to the nla_put parameter.
Otherwise it will lead to out-of-bounds memory access in memcpy.
Fixes: 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull more block and io_uring updates from Jens Axboe:
"I wasn't expecting this to be so big, and if I was, I would have used
separate branches for this. Going forward I'll be doing separate
branches for the current tree, just like for the next kernel version
tree. In any case, this contains:
- Series from Christoph that fixes an inherent race condition with
zoned devices and revalidation.
- null_blk zone size fix (Damien)
- Fix for a regression in this merge window that caused busy spins by
sending empty disk uevents (Eric)
- Fix for a regression in this merge window for bfq stats (Hou)
- Fix for io_uring creds allocation failure handling (me)
- io_uring -ERESTARTSYS send/recvmsg fix (me)
- Series that fixes the need for applications to retain state across
async request punts for io_uring. This one is a bit larger than I
would have hoped, but I think it's important we get this fixed for
5.5.
- connect(2) improvement for io_uring, handling EINPROGRESS instead
of having applications needing to poll for it (me)
- Have io_uring use a hash for poll requests instead of an rbtree.
This turned out to work much better in practice, so I think we
should make the switch now. For some workloads, even with a fair
amount of cancellations, the insertion sort is just too expensive.
(me)
- Various little io_uring fixes (me, Jackie, Pavel, LimingWu)
- Fix for brd unaligned IO, and a warning for the future (Ming)
- Fix for a bio integrity data leak (Justin)
- bvec_iter_advance() improvement (Pavel)
- Xen blkback page unmap fix (SeongJae)
The major items in here are all well tested, and on the liburing side
we continue to add regression and feature test cases. We're up to 50
topic cases now, each with anywhere from 1 to more than 10 cases in
each"
* tag 'for-linus-20191205' of git://git.kernel.dk/linux-block: (33 commits)
block: fix memleak of bio integrity data
io_uring: fix a typo in a comment
bfq-iosched: Ensure bio->bi_blkg is valid before using it
io_uring: hook all linked requests via link_list
io_uring: fix error handling in io_queue_link_head
io_uring: use hash table for poll command lookups
io-wq: clear node->next on list deletion
io_uring: ensure deferred timeouts copy necessary data
io_uring: allow IO_SQE_* flags on IORING_OP_TIMEOUT
null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONED
brd: warn on un-aligned buffer
brd: remove max_hw_sectors queue limit
xen/blkback: Avoid unmapping unmapped grant pages
io_uring: handle connect -EINPROGRESS like -EAGAIN
block: set the zone size in blk_revalidate_disk_zones atomically
block: don't handle bio based drivers in blk_revalidate_disk_zones
block: allocate the zone bitmaps lazily
block: replace seq_zones_bitmap with conv_zones_bitmap
block: simplify blkdev_nr_zones
block: remove the empty line at the end of blk-zoned.c
...
|
|
Pull ceph updates from Ilya Dryomov:
"The two highlights are a set of improvements to how rbd read-only
mappings are handled and a conversion to the new mount API (slightly
complicated by the fact that we had a common option parsing framework
that called out into rbd and the filesystem instead of them calling
into it).
Also included a few scattered fixes and a MAINTAINERS update for rbd,
adding Dongsheng as a reviewer"
* tag 'ceph-for-5.5-rc1' of git://github.com/ceph/ceph-client:
libceph, rbd, ceph: convert to use the new mount API
rbd: ask for a weaker incompat mask for read-only mappings
rbd: don't query snapshot features
rbd: remove snapshot existence validation code
rbd: don't establish watch for read-only mappings
rbd: don't acquire exclusive lock for read-only mappings
rbd: disallow read-write partitions on images mapped read-only
rbd: treat images mapped read-only seriously
rbd: introduce RBD_DEV_FLAG_READONLY
rbd: introduce rbd_is_snap()
ceph: don't leave ino field in ceph_mds_request_head uninitialized
ceph: tone down loglevel on ceph_mdsc_build_path warning
rbd: update MAINTAINERS info
ceph: fix geting random mds from mdsmap
rbd: fix spelling mistake "requeueing" -> "requeuing"
ceph: make several helper accessors take const pointers
libceph: drop unnecessary check from dispatch() in mon_client.c
|
|
hsr_dev_xmit() calls hsr_port_get_hsr() to find master node and that would
return NULL if master node is not existing in the list.
But hsr_dev_xmit() doesn't check return pointer so a NULL dereference
could occur.
Test commands:
ip netns add nst
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
ip link set veth1 netns nst
ip link set veth3 netns nst
ip link set veth0 up
ip link set veth2 up
ip link add hsr0 type hsr slave1 veth0 slave2 veth2
ip a a 192.168.100.1/24 dev hsr0
ip link set hsr0 up
ip netns exec nst ip link set veth1 up
ip netns exec nst ip link set veth3 up
ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
ip netns exec nst ip link set hsr1 up
hping3 192.168.100.2 -2 --flood &
modprobe -rv hsr
Splat looks like:
[ 217.351122][ T1635] kasan: CONFIG_KASAN_INLINE enabled
[ 217.352969][ T1635] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 217.354297][ T1635] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 217.355507][ T1635] CPU: 1 PID: 1635 Comm: hping3 Not tainted 5.4.0+ #192
[ 217.356472][ T1635] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 217.357804][ T1635] RIP: 0010:hsr_dev_xmit+0x34/0x90 [hsr]
[ 217.373010][ T1635] Code: 48 8d be 00 0c 00 00 be 04 00 00 00 48 83 ec 08 e8 21 be ff ff 48 8d 78 10 48 ba 00 b
[ 217.376919][ T1635] RSP: 0018:ffff8880cd8af058 EFLAGS: 00010202
[ 217.377571][ T1635] RAX: 0000000000000000 RBX: ffff8880acde6840 RCX: 0000000000000002
[ 217.379465][ T1635] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 0000000000000010
[ 217.380274][ T1635] RBP: ffff8880acde6840 R08: ffffed101b440d5d R09: 0000000000000001
[ 217.381078][ T1635] R10: 0000000000000001 R11: ffffed101b440d5c R12: ffff8880bffcc000
[ 217.382023][ T1635] R13: ffff8880bffcc088 R14: 0000000000000000 R15: ffff8880ca675c00
[ 217.383094][ T1635] FS: 00007f060d9d1740(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
[ 217.384289][ T1635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 217.385009][ T1635] CR2: 00007faf15381dd0 CR3: 00000000d523c001 CR4: 00000000000606e0
[ 217.385940][ T1635] Call Trace:
[ 217.386544][ T1635] dev_hard_start_xmit+0x160/0x740
[ 217.387114][ T1635] __dev_queue_xmit+0x1961/0x2e10
[ 217.388118][ T1635] ? check_object+0xaf/0x260
[ 217.391466][ T1635] ? __alloc_skb+0xb9/0x500
[ 217.392017][ T1635] ? init_object+0x6b/0x80
[ 217.392629][ T1635] ? netdev_core_pick_tx+0x2e0/0x2e0
[ 217.393175][ T1635] ? __alloc_skb+0xb9/0x500
[ 217.393727][ T1635] ? rcu_read_lock_sched_held+0x90/0xc0
[ 217.394331][ T1635] ? rcu_read_lock_bh_held+0xa0/0xa0
[ 217.395013][ T1635] ? kasan_unpoison_shadow+0x30/0x40
[ 217.395668][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0
[ 217.396280][ T1635] ? __kmalloc_node_track_caller+0x3a8/0x3f0
[ 217.399007][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0
[ 217.400093][ T1635] ? __kmalloc_reserve.isra.46+0x2e/0xb0
[ 217.401118][ T1635] ? memset+0x1f/0x40
[ 217.402529][ T1635] ? __alloc_skb+0x317/0x500
[ 217.404915][ T1635] ? arp_xmit+0xca/0x2c0
[ ... ]
Fixes: 311633b60406 ("hsr: switch ->dellink() to ->ndo_uninit()")
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
blocking_notifier_chain_cond_register() does not consider system_booting
state, which is the only difference between this function and
blocking_notifier_cain_register(). This can be a bug and is a piece of
duplicate code.
Delete blocking_notifier_chain_cond_register()
Link: http://lkml.kernel.org/r/1568861888-34045-4-git-send-email-nixiaoming@huawei.com
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The skb_mpls_push was not updating ethertype of an ethernet packet if
the packet was originally received from a non ARPHRD_ETHER device.
In the below OVS data path flow, since the device corresponding to
port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does
not update the ethertype of the packet even though the previous
push_eth action had added an ethernet header to the packet.
recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4
Fixes: 8822e270d697 ("net: core: move push MPLS functionality from OvS to core helper")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In a recent change to the SPI subsystem [1], a new `delay` struct was added
to replace the `delay_usecs`. This change replaces the current `delay_secs`
with `delay` for this driver.
The `spi_transfer_delay_exec()` function [in the SPI framework] makes sure
that both `delay_usecs` & `delay` are used (in this order to preserve
backwards compatibility).
[1] commit bebcfd272df6485 ("spi: introduce `delay` field for
`spi_transfer` + spi_transfer_delay_exec()")
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A lockdep splat was observed when trying to remove an xdp memory
model from the table since the mutex was obtained when trying to
remove the entry, but not before the table walk started:
Fix the splat by obtaining the lock before starting the table walk.
Fixes: c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.")
Reported-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The act_ct TC module shares a common conntrack and NAT infrastructure
exposed via netfilter. It's possible that a packet needs both SNAT and
DNAT manipulation, due to e.g. tuple collision. Netfilter can support
this because it runs through the NAT table twice - once on ingress and
again after egress. The act_ct action doesn't have such capability.
Like netfilter hook infrastructure, we should run through NAT twice to
keep the symmetry.
Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|