Age | Commit message (Collapse) | Author |
|
TCP resets cause instant transition from established to closed state
provided the reset is in-window. Endpoints that implement RFC 5961
require resets to match the next expected sequence number.
RST segments that are in-window (but that do not match RCV.NXT) are
ignored, and a "challenge ACK" is sent back.
Main problem for conntrack is that its a middlebox, i.e. whereas an end
host might have ACK'd SEQ (and would thus accept an RST with this
sequence number), conntrack might not have seen this ACK (yet).
Therefore we can't simply flag RSTs with non-exact match as invalid.
This updates RST processing as follows:
1. If the connection is in a state other than ESTABLISHED, nothing is
changed, RST is subject to normal in-window check.
2. If the RSTs sequence number either matches exactly RCV.NXT,
connection state moves to CLOSE.
3. The same applies if the RST sequence number aligns with a previous
packet in the same direction.
In all other cases, the connection remains in ESTABLISHED state.
If the normal-in-window check passes, the timeout will be lowered
to that of CLOSE.
If the peer sends a challenge ack, connection timeout will be reset.
If the challenge ACK triggers another RST (RST was valid after all),
this 2nd RST will match expected sequence and conntrack state changes to
CLOSE.
If no challenge ACK is received, the connection will time out after
CLOSE seconds (10 seconds by default), just like without this patch.
Packetdrill test case:
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
// Receive a segment.
0.210 < P. 1:1001(1000) ack 1 win 46
0.210 > . 1:1(0) ack 1001
// Application writes 1000 bytes.
0.250 write(4, ..., 1000) = 1000
0.250 > P. 1:1001(1000) ack 1001
// First reset, old sequence. Conntrack (correctly) considers this
// invalid due to failed window validation (regardless of this patch).
0.260 < R 2:2(0) ack 1001 win 260
// 2nd reset, but too far ahead sequence. Same: correctly handled
// as invalid.
0.270 < R 99990001:99990001(0) ack 1001 win 260
// in-window, but not exact sequence.
// Current Linux kernels might reply with a challenge ack, and do not
// remove connection.
// Without this patch, conntrack state moves to CLOSE.
// With patch, timeout is lowered like CLOSE, but connection stays
// in ESTABLISHED state.
0.280 < R 1010:1010(0) ack 1001 win 260
// Expect challenge ACK
0.281 > . 1001:1001(0) ack 1001 win 501
// With or without this patch, RST will cause connection
// to move to CLOSE (sequence number matches)
// 0.282 < R 1001:1001(0) ack 1001 win 260
// ACK
0.300 < . 1001:1001(0) ack 1001 win 257
// more data could be exchanged here, connection
// is still established
// Client closes the connection.
0.610 < F. 1001:1001(0) ack 1001 win 260
0.650 > . 1001:1001(0) ack 1002
// Close the connection without reading outstanding data
0.700 close(4) = 0
// so one more reset. Will be deemed acceptable with patch as well:
// connection is already closing.
0.701 > R. 1001:1001(0) ack 1002 win 501
// End packetdrill test case.
With patch, this generates following conntrack events:
[NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
Without patch, first RST moves connection to close, whereas socket state
does not change until FIN is received.
[NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Change the data type of the following variables from int to bool
across ipvs code:
- found
- loop
- need_full_dest
- need_full_svc
- payload_csum
Also change the following functions to use bool full_entry param
instead of int:
- ip_vs_genl_parse_dest()
- ip_vs_genl_parse_service()
This patch does not change any functionality but makes the source
code slightly easier to read.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Without hardware pnetid support there must currently be a pnet
table configured to determine the IB device port to be used for SMC
RDMA traffic. This patch enables a setup without pnet table, if
the used handshake interface belongs already to a RoCE port.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As per RFC 8033, it is sufficient for the drop probability
decay factor to have a value of (1 - 1/64) instead of 98%.
This avoids the need to do slow division.
Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to Documentation/core-api/printk-formats.rst, size_t should be
printed with %zu, rather than %Zu.
In addition, using %Zu triggers a warning on clang (-Wformat-extra-args):
net/sctp/chunk.c:196:25: warning: data argument not used by format string [-Wformat-extra-args]
__func__, asoc, max_data);
~~~~~~~~~~~~~~~~^~~~~~~~~
./include/linux/printk.h:440:49: note: expanded from macro 'pr_warn_ratelimited'
printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
./include/linux/printk.h:424:17: note: expanded from macro 'printk_ratelimited'
printk(fmt, ##__VA_ARGS__); \
~~~ ^
Fixes: 5b5e0928f742 ("lib/vsprintf.c: remove %Z support")
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Matthias Maennich <maennich@google.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It can be reproduced by following steps:
1. virtio_net NIC is configured with gso/tso on
2. configure nginx as http server with an index file bigger than 1M bytes
3. use tc netem to produce duplicate packets and delay:
tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90%
4. continually curl the nginx http server to get index file on client
5. BUG_ON is seen quickly
[10258690.371129] kernel BUG at net/core/skbuff.c:4028!
[10258690.371748] invalid opcode: 0000 [#1] SMP PTI
[10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.0.0-rc6 #2
[10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
[10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
[10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
[10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
[10258690.372094] FS: 0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
[10258690.372094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
[10258690.372094] Call Trace:
[10258690.372094] <IRQ>
[10258690.372094] skb_to_sgvec+0x11/0x40
[10258690.372094] start_xmit+0x38c/0x520 [virtio_net]
[10258690.372094] dev_hard_start_xmit+0x9b/0x200
[10258690.372094] sch_direct_xmit+0xff/0x260
[10258690.372094] __qdisc_run+0x15e/0x4e0
[10258690.372094] net_tx_action+0x137/0x210
[10258690.372094] __do_softirq+0xd6/0x2a9
[10258690.372094] irq_exit+0xde/0xf0
[10258690.372094] smp_apic_timer_interrupt+0x74/0x140
[10258690.372094] apic_timer_interrupt+0xf/0x20
[10258690.372094] </IRQ>
In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's
linear data size and nonlinear data size, thus BUG_ON triggered.
Because the skb is cloned and a part of nonlinear data is split off.
Duplicate packet is cloned in netem_enqueue() and may be delayed
some time in qdisc. When qdisc len reached the limit and returns
NET_XMIT_DROP, the skb will be retransmit later in write queue.
the skb will be fragmented by tso_fragment(), the limit size
that depends on cwnd and mss decrease, the skb's nonlinear
data will be split off. The length of the skb cloned by netem
will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(),
the BUG_ON trigger.
To fix it, netem returns NET_XMIT_SUCCESS to upper stack
when it clones a duplicate packet.
Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()")
Signed-off-by: Sheng Lan <lansheng@huawei.com>
Reported-by: Qin Ji <jiqin.ji@huawei.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need this functionality for the io_uring file registration, but
we cannot rely on it since CONFIG_UNIX can be modular. Move the helpers
to a separate file, that's always builtin to the kernel if CONFIG_UNIX is
m/y.
No functional changes in this patch, just moving code around.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.
IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe data structures. The SQ
ring is an index into the io_uring_sqe array, which makes it possible
to submit a batch of IOs without them being contiguous in the ring.
The CQ ring is always contiguous, as completion events are inherently
unordered, and hence any io_uring_cqe entry can point back to an
arbitrary submission.
Two new system calls are added for this:
io_uring_setup(entries, params)
Sets up an io_uring instance for doing async IO. On success,
returns a file descriptor that the application can mmap to
gain access to the SQ ring, CQ ring, and io_uring_sqes.
io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
Initiates IO against the rings mapped to this fd, or waits for
them to complete, or both. The behavior is controlled by the
parameters passed in. If 'to_submit' is non-zero, then we'll
try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
kernel will wait for 'min_complete' events, if they aren't
already available. It's valid to set IORING_ENTER_GETEVENTS
and 'min_complete' == 0 at the same time, this allows the
kernel to return already completed events without waiting
for them. This is useful only for polling, as for IRQ
driven IO, the application can just check the CQ ring
without entering the kernel.
With this setup, it's possible to do async IO with a single system
call. Future developments will enable polled IO with this interface,
and polled submission as well. The latter will enable an application
to do IO without doing ANY system calls at all.
For IRQ driven IO, an application only needs to enter the kernel for
completions if it wants to wait for them to occur.
Each io_uring is backed by a workqueue, to support buffered async IO
as well. We will only punt to an async context if the command would
need to wait for IO on the device side. Any data that can be accessed
directly in the page cache is done inline. This avoids the slowness
issue of usual threadpools, since cached data is accessed as quickly
as a sync interface.
Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The csum calculation is different for IPv4/6. For VLAN packets,
tc_skb_protocol returns the VLAN protocol rather than the packet's one
(e.g. IPv4/6), so csum is not calculated. Furthermore, VLAN may not be
stripped so csum is not calculated in this case too. Calculate the
csum for those cases.
Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are two array out-of-bounds memory accesses, one in
cipso_v4_map_lvl_valid(), the other in netlbl_bitmap_walk(). Both
errors are embarassingly simple, and the fixes are straightforward.
As a FYI for anyone backporting this patch to kernels prior to v4.8,
you'll want to apply the netlbl_bitmap_walk() patch to
cipso_v4_bitmap_walk() as netlbl_bitmap_walk() doesn't exist before
Linux v4.8.
Reported-by: Jann Horn <jannh@google.com>
Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine")
Fixes: 3faa8f982f95 ("netlabel: Move bitmap manipulation functions to the NetLabel core.")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ip_route_input_rcu expects the original ingress device (e.g., for
proper multicast handling). The skb->dev can be changed by l3mdev_ip_rcv,
so dev needs to be saved prior to calling it. This was the behavior prior
to the listify changes.
Fixes: 5fa12739a53d0 ("net: ipv4: listify ip_rcv_finish")
Cc: Edward Cree <ecree@solarflare.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tunnel key action params->tcft_enc_metadata is only set when action is
TCA_TUNNEL_KEY_ACT_SET. However, metadata pointer is incorrectly
dereferenced during tunnel key init and release without verifying that
action is if correct type, which causes NULL pointer dereference. Metadata
tunnel dst_cache is also leaked on action overwrite.
Fix metadata handling:
- Verify that metadata pointer is not NULL before dereferencing it in
tunnel_key_init error handling code.
- Move dst_cache destroy code into tunnel_key_release_params() function
that is called in both action overwrite and release cases (fixes resource
leak) and verifies that actions has correct type before dereferencing
metadata pointer (fixes NULL pointer dereference).
Oops with KASAN enabled during tdc tests execution:
[ 261.080482] ==================================================================
[ 261.088049] BUG: KASAN: null-ptr-deref in dst_cache_destroy+0x21/0xa0
[ 261.094613] Read of size 8 at addr 00000000000000b0 by task tc/2976
[ 261.102524] CPU: 14 PID: 2976 Comm: tc Not tainted 5.0.0-rc7+ #157
[ 261.108844] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 261.116726] Call Trace:
[ 261.119234] dump_stack+0x9a/0xeb
[ 261.122625] ? dst_cache_destroy+0x21/0xa0
[ 261.126818] ? dst_cache_destroy+0x21/0xa0
[ 261.131004] kasan_report+0x176/0x192
[ 261.134752] ? idr_get_next+0xd0/0x120
[ 261.138578] ? dst_cache_destroy+0x21/0xa0
[ 261.142768] dst_cache_destroy+0x21/0xa0
[ 261.146799] tunnel_key_release+0x3a/0x50 [act_tunnel_key]
[ 261.152392] tcf_action_cleanup+0x2c/0xc0
[ 261.156490] tcf_generic_walker+0x4c2/0x5c0
[ 261.160794] ? tcf_action_dump_1+0x390/0x390
[ 261.165163] ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key]
[ 261.170865] ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key]
[ 261.176641] tca_action_gd+0x600/0xa40
[ 261.180482] ? tca_get_fill.constprop.17+0x200/0x200
[ 261.185548] ? __lock_acquire+0x588/0x1d20
[ 261.189741] ? __lock_acquire+0x588/0x1d20
[ 261.193922] ? mark_held_locks+0x90/0x90
[ 261.197944] ? mark_held_locks+0x90/0x90
[ 261.202018] ? __nla_parse+0xfe/0x190
[ 261.205774] tc_ctl_action+0x218/0x230
[ 261.209614] ? tcf_action_add+0x230/0x230
[ 261.213726] rtnetlink_rcv_msg+0x3a5/0x600
[ 261.217910] ? lock_downgrade+0x2d0/0x2d0
[ 261.222006] ? validate_linkmsg+0x400/0x400
[ 261.226278] ? find_held_lock+0x6d/0xd0
[ 261.230200] ? match_held_lock+0x1b/0x210
[ 261.234296] ? validate_linkmsg+0x400/0x400
[ 261.238567] netlink_rcv_skb+0xc7/0x1f0
[ 261.242489] ? netlink_ack+0x470/0x470
[ 261.246319] ? netlink_deliver_tap+0x1f3/0x5a0
[ 261.250874] netlink_unicast+0x2ae/0x350
[ 261.254884] ? netlink_attachskb+0x340/0x340
[ 261.261647] ? _copy_from_iter_full+0xdd/0x380
[ 261.268576] ? __virt_addr_valid+0xb6/0xf0
[ 261.275227] ? __check_object_size+0x159/0x240
[ 261.282184] netlink_sendmsg+0x4d3/0x630
[ 261.288572] ? netlink_unicast+0x350/0x350
[ 261.295132] ? netlink_unicast+0x350/0x350
[ 261.301608] sock_sendmsg+0x6d/0x80
[ 261.307467] ___sys_sendmsg+0x48e/0x540
[ 261.313633] ? copy_msghdr_from_user+0x210/0x210
[ 261.320545] ? save_stack+0x89/0xb0
[ 261.326289] ? __lock_acquire+0x588/0x1d20
[ 261.332605] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 261.340063] ? mark_held_locks+0x90/0x90
[ 261.346162] ? do_filp_open+0x138/0x1d0
[ 261.352108] ? may_open_dev+0x50/0x50
[ 261.357897] ? match_held_lock+0x1b/0x210
[ 261.364016] ? __fget_light+0xa6/0xe0
[ 261.369840] ? __sys_sendmsg+0xd2/0x150
[ 261.375814] __sys_sendmsg+0xd2/0x150
[ 261.381610] ? __ia32_sys_shutdown+0x30/0x30
[ 261.388026] ? lock_downgrade+0x2d0/0x2d0
[ 261.394182] ? mark_held_locks+0x1c/0x90
[ 261.400230] ? do_syscall_64+0x1e/0x280
[ 261.406172] do_syscall_64+0x78/0x280
[ 261.411932] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 261.419103] RIP: 0033:0x7f28e91a8b87
[ 261.424791] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
[ 261.448226] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 261.458183] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87
[ 261.467728] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003
[ 261.477342] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c
[ 261.486970] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
[ 261.496599] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480
[ 261.506281] ==================================================================
[ 261.516076] Disabling lock debugging due to kernel taint
[ 261.523979] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[ 261.534413] #PF error: [normal kernel read fault]
[ 261.541730] PGD 8000000317400067 P4D 8000000317400067 PUD 316878067 PMD 0
[ 261.551294] Oops: 0000 [#1] SMP KASAN PTI
[ 261.557985] CPU: 14 PID: 2976 Comm: tc Tainted: G B 5.0.0-rc7+ #157
[ 261.568306] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 261.578874] RIP: 0010:dst_cache_destroy+0x21/0xa0
[ 261.586413] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40
[ 261.611247] RSP: 0018:ffff888316447160 EFLAGS: 00010282
[ 261.619564] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071
[ 261.629862] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297
[ 261.640149] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89
[ 261.650467] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0
[ 261.660785] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001
[ 261.671110] FS: 00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000
[ 261.682447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 261.691491] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0
[ 261.701283] Call Trace:
[ 261.706374] tunnel_key_release+0x3a/0x50 [act_tunnel_key]
[ 261.714522] tcf_action_cleanup+0x2c/0xc0
[ 261.721208] tcf_generic_walker+0x4c2/0x5c0
[ 261.728074] ? tcf_action_dump_1+0x390/0x390
[ 261.734996] ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key]
[ 261.743247] ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key]
[ 261.751557] tca_action_gd+0x600/0xa40
[ 261.757991] ? tca_get_fill.constprop.17+0x200/0x200
[ 261.765644] ? __lock_acquire+0x588/0x1d20
[ 261.772461] ? __lock_acquire+0x588/0x1d20
[ 261.779266] ? mark_held_locks+0x90/0x90
[ 261.785880] ? mark_held_locks+0x90/0x90
[ 261.792470] ? __nla_parse+0xfe/0x190
[ 261.798738] tc_ctl_action+0x218/0x230
[ 261.805145] ? tcf_action_add+0x230/0x230
[ 261.811760] rtnetlink_rcv_msg+0x3a5/0x600
[ 261.818564] ? lock_downgrade+0x2d0/0x2d0
[ 261.825433] ? validate_linkmsg+0x400/0x400
[ 261.832256] ? find_held_lock+0x6d/0xd0
[ 261.838624] ? match_held_lock+0x1b/0x210
[ 261.845142] ? validate_linkmsg+0x400/0x400
[ 261.851729] netlink_rcv_skb+0xc7/0x1f0
[ 261.857976] ? netlink_ack+0x470/0x470
[ 261.864132] ? netlink_deliver_tap+0x1f3/0x5a0
[ 261.870969] netlink_unicast+0x2ae/0x350
[ 261.877294] ? netlink_attachskb+0x340/0x340
[ 261.883962] ? _copy_from_iter_full+0xdd/0x380
[ 261.890750] ? __virt_addr_valid+0xb6/0xf0
[ 261.897188] ? __check_object_size+0x159/0x240
[ 261.903928] netlink_sendmsg+0x4d3/0x630
[ 261.910112] ? netlink_unicast+0x350/0x350
[ 261.916410] ? netlink_unicast+0x350/0x350
[ 261.922656] sock_sendmsg+0x6d/0x80
[ 261.928257] ___sys_sendmsg+0x48e/0x540
[ 261.934183] ? copy_msghdr_from_user+0x210/0x210
[ 261.940865] ? save_stack+0x89/0xb0
[ 261.946355] ? __lock_acquire+0x588/0x1d20
[ 261.952358] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 261.959468] ? mark_held_locks+0x90/0x90
[ 261.965248] ? do_filp_open+0x138/0x1d0
[ 261.970910] ? may_open_dev+0x50/0x50
[ 261.976386] ? match_held_lock+0x1b/0x210
[ 261.982210] ? __fget_light+0xa6/0xe0
[ 261.987648] ? __sys_sendmsg+0xd2/0x150
[ 261.993263] __sys_sendmsg+0xd2/0x150
[ 261.998613] ? __ia32_sys_shutdown+0x30/0x30
[ 262.004555] ? lock_downgrade+0x2d0/0x2d0
[ 262.010236] ? mark_held_locks+0x1c/0x90
[ 262.015758] ? do_syscall_64+0x1e/0x280
[ 262.021234] do_syscall_64+0x78/0x280
[ 262.026500] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 262.033207] RIP: 0033:0x7f28e91a8b87
[ 262.038421] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
[ 262.060708] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 262.070112] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87
[ 262.079087] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003
[ 262.088122] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c
[ 262.097157] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
[ 262.106207] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480
[ 262.115271] Modules linked in: act_tunnel_key act_skbmod act_simple act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 act_csum libcrc32c act_meta_skbtcindex act_meta_skbprio act_meta_mark act_ife ife act_police act_sample psample act_gact veth nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc intel_rapl sb_edac mlx5_ib x86_pkg_temp_thermal sunrpc intel_powerclamp coretemp ib_uverbs kvm_intel ib_core kvm irqbypass mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel intel_cstate mlxfw iTCO_wdt devlink intel_uncore iTCO_vendor_support ipmi_ssif ptp mei_me intel_rapl_perf ioatdma joydev pps_core ses mei i2c_i801 pcspkr enclosure lpc_ich dca wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pcc_cpufreq ast i2c_algo_bit drm_kms_helper ttm drm mpt3sas raid_class scsi_transport_sas
[ 262.204393] CR2: 00000000000000b0
[ 262.210390] ---[ end trace 2e41d786f2c7901a ]---
[ 262.226790] RIP: 0010:dst_cache_destroy+0x21/0xa0
[ 262.234083] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40
[ 262.258311] RSP: 0018:ffff888316447160 EFLAGS: 00010282
[ 262.266304] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071
[ 262.276251] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297
[ 262.286208] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89
[ 262.296183] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0
[ 262.306157] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001
[ 262.316139] FS: 00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000
[ 262.327146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 262.335815] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0
Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current fib_multipath_hash_policy can make hash based on the L3 or
L4. But it only work on the outer IP. So a specific tunnel always
has the same hash value. But a specific tunnel may contain so many
inner connections.
This patch provide a generic multipath_hash in floi_common. It can
make a user-define hash which can mix with L3 or L4 hash.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KASAN report this:
BUG: KASAN: null-ptr-deref in nfc_llcp_build_gb+0x37f/0x540 [nfc]
Read of size 3 at addr 0000000000000000 by task syz-executor.0/5401
CPU: 0 PID: 5401 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xfa/0x1ce lib/dump_stack.c:113
kasan_report+0x171/0x18d mm/kasan/report.c:321
memcpy+0x1f/0x50 mm/kasan/common.c:130
nfc_llcp_build_gb+0x37f/0x540 [nfc]
nfc_llcp_register_device+0x6eb/0xb50 [nfc]
nfc_register_device+0x50/0x1d0 [nfc]
nfcsim_device_new+0x394/0x67d [nfcsim]
? 0xffffffffc1080000
nfcsim_init+0x6b/0x1000 [nfcsim]
do_one_initcall+0xfa/0x5ca init/main.c:887
do_init_module+0x204/0x5f6 kernel/module.c:3460
load_module+0x66b2/0x8570 kernel/module.c:3808
__do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f9cb79dcc58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
RBP: 00007f9cb79dcc70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9cb79dd6bc
R13: 00000000004bcefb R14: 00000000006f7030 R15: 0000000000000004
nfc_llcp_build_tlv will return NULL on fails, caller should check it,
otherwise will trigger a NULL dereference.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: eda21f16a5ed ("NFC: Set MIU and RW values from CONNECT and CC LLCP frames")
Fixes: d646960f7986 ("NFC: Initial LLCP support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that we have converted all possible callers to using a switchdev
notifier for attributes we do not have a need for implementing
switchdev_ops anymore, and this can be removed from all drivers the
net_device structure.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Drop switchdev_ops.switchdev_port_attr_set. Drop the uses of this field
from all clients, which were migrated to use switchdev notification in
the previous patches.
Add a new function switchdev_port_attr_notify() that sends the switchdev
notifications SWITCHDEV_PORT_ATTR_SET and calls the blocking (process)
notifier chain.
We have one odd case within net/bridge/br_switchdev.c with the
SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier that
requires executing from atomic context, we deal with that one
specifically.
Drop __switchdev_port_attr_set() and update switchdev_port_attr_set()
likewise.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Following patches will change the way we communicate setting a port's
attribute and use notifiers towards that goal.
Prepare DSA to support receiving notifier events targeting
SWITCHDEV_PORT_ATTR_SET from both atomic and process context and use a
small helper to translate the event notifier into something that
dsa_slave_port_attr_set() can process.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for allowing switchdev enabled drivers to veto specific
attribute settings from within the context of the caller, introduce a
new switchdev notifier type for port attributes.
Suggested-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 31a998487641 ("net: sched: fw: don't set arg->stop in
fw_walk() when empty")
Cls API function tcf_proto_is_empty() was changed in commit
6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
to no longer depend on arg->stop to determine that classifier instance is
empty. Instead, it adds dedicated arg->nonempty field, which makes the fix
in fw classifier no longer necessary.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Initialize the .cmd member by using a designated struct
initializer. This fixes warning of missing field initializers,
and makes code a little easier to read.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
hashtable is never used for 2-byte keys, remove nft_hash_key().
Fixes: e240cd0df481 ("netfilter: nf_tables: place all set backends in one single module")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use the element from the loop iteration, not the same element we want to
deactivate otherwise this branch always evaluates true.
Fixes: 6c03ae210ce3 ("netfilter: nft_set_hash: add non-resizable hashtable implementation")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Call jhash_1word() for the 4-bytes key case from the insertion and
deactivation path, otherwise big endian arch set lookups fail.
Fixes: 446a8268b7f5 ("netfilter: nft_set_hash: add lookup variant for fixed size hashtable")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Empty case is fine and does not switch fall-through
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
No need to dirty a cache line if timeout is unchanged.
Also, WARN() is useless here: we crash on 'skb->len' access
if skb is NULL.
Last, ct->timeout is u32, not 'unsigned long' so adapt the
function prototype accordingly.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The l3proto name is gone, its header file is the last trace.
While at it, also remove nf_nat_core.h, its very small and all users
include nf_nat.h too.
before:
text data bss dec hex filename
22948 1612 4136 28696 7018 nf_nat.ko
after removal of l3proto register/unregister functions:
text data bss dec hex filename
22196 1516 4136 27848 6cc8 nf_nat.ko
checkpatch complains about overly long lines, but line breaks
do not make things more readable and the line length gets smaller
here, not larger.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
All l3proto function pointers have been removed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We can now use direct calls.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We can now use direct calls.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We can now use direct calls.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
after ipv4/6 nat tracker merge, there are no external callers, so
make last function static and remove the header.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
before:
text data bss dec hex filename
16566 1576 4136 22278 5706 nf_nat.ko
3598 844 0 4442 115a nf_nat_ipv6.ko
3187 844 0 4031 fbf nf_nat_ipv4.ko
after:
text data bss dec hex filename
22948 1612 4136 28696 7018 nf_nat.ko
... with ipv4/v6 nat now provided directly via nf_nat.ko.
Also changes:
ret = nf_nat_ipv4_fn(priv, skb, state);
if (ret != NF_DROP && ret != NF_STOLEN &&
into
if (ret != NF_ACCEPT)
return ret;
everywhere.
The nat hooks never should return anything other than
ACCEPT or DROP (and the latter only in rare error cases).
The original code uses multi-line ANDing including assignment-in-if:
if (ret != NF_DROP && ret != NF_STOLEN &&
!(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
(ct = nf_ct_get(skb, &ctinfo)) != NULL) {
I removed this while moving, breaking those in separate conditionals
and moving the assignments into extra lines.
checkpatch still generates some warnings:
1. Overly long lines (of moved code).
Breaking them is even more ugly. so I kept this as-is.
2. use of extern function declarations in a .c file.
This is necessary evil, we must call
nf_nat_l3proto_register() from the nat core now.
All l3proto related functions are removed later in this series,
those prototypes are then removed as well.
v2: keep empty nf_nat_ipv6_csum_update stub for CONFIG_IPV6=n case.
v3: remove IS_ENABLED(NF_NAT_IPV4/6) tests, NF_NAT_IPVx toggles
are removed here.
v4: also get rid of the assignments in conditionals.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
None of these functions calls any external functions, moving them allows
to avoid both the indirection and a need to export these symbols.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Before:
text data bss dec hex filename
13916 1412 4128 19456 4c00 nf_nat.ko
4510 968 4 5482 156a nf_nat_ipv4.ko
5146 944 8 6098 17d2 nf_nat_ipv6.ko
After:
text data bss dec hex filename
16566 1576 4136 22278 5706 nf_nat.ko
3187 844 0 4031 fbf nf_nat_ipv4.ko
3598 844 0 4442 115a nf_nat_ipv6.ko
... so no drastic changes in combined size.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
They are however frequently triggered by syzkaller, so remove them.
ebtables userspace should never trigger any of these, so there is little
value in making them pr_debug (or ratelimited).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The Amanda CONNECT command has been updated to establish an optional
fourth connection [0]. Previously, a CONNECT command would look like:
CONNECT DATA port0 MESG port1 INDEX port2
nf_conntrack_amanda analyses the CONNECT command string in order to
learn the port numbers of the related DATA, MESG and INDEX streams. As
of amanda v3.4, the CONNECT command can advertise an additional port:
CONNECT DATA port0 MESG port1 INDEX port2 STATE port3
The new STATE stream is not handled, thus the connection on the STATE
port cannot be established.
The patch adds support for STATE streams to the amanda conntrack helper.
I tested with max_expected = 3, leaving the other patch hunks
unmodified. Amanda reports "connection refused" and aborts. After I set
max_expected to 4, the backup completes successfully.
[0] https://github.com/zmanda/amanda/commit/3b8384fc9f2941e2427f44c3aee29f561ed67894#diff-711e502fc81a65182c0954765b42919eR456
Signed-off-by: Florian Tham <tham@fidion.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add .release_ops, that is called in case of error at a later stage in
the expression initialization path, ie. .select_ops() has been already
set up operations and that needs to be undone. This allows us to unwind
.select_ops from the error path, ie. release the dynamic operations for
this extension.
Moreover, allocate one single operation instead of recycling them, this
comes at the cost of consuming a bit more memory per rule, but it
simplifies the infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use div_u64() to resolve build failures on 32-bit platforms.
Fixes: 3f7ae5f3dc52 ("net: sched: pie: add more cases to auto-tune alpha and beta")
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When sending multicast messages via blocking socket,
if sending link is congested (tsk->cong_link_cnt is set to 1),
the sending thread will be put into sleeping state. However,
tipc_sk_filter_rcv() is called under socket spin lock but
tipc_wait_for_cond() is not. So, there is no guarantee that
the setting of tsk->cong_link_cnt to 0 in tipc_sk_proto_rcv() in
CPU-1 will be perceived by CPU-0. If that is the case, the sending
thread in CPU-0 after being waken up, will continue to see
tsk->cong_link_cnt as 1 and put the sending thread into sleeping
state again. The sending thread will sleep forever.
CPU-0 | CPU-1
tipc_wait_for_cond() |
{ |
// condition_ = !tsk->cong_link_cnt |
while ((rc_ = !(condition_))) { |
... |
release_sock(sk_); |
wait_woken(); |
| if (!sock_owned_by_user(sk))
| tipc_sk_filter_rcv()
| {
| ...
| tipc_sk_proto_rcv()
| {
| ...
| tsk->cong_link_cnt--;
| ...
| sk->sk_write_space(sk);
| ...
| }
| ...
| }
sched_annotate_sleep(); |
lock_sock(sk_); |
remove_wait_queue(); |
} |
} |
This commit fixes it by adding memory barrier to tipc_sk_proto_rcv()
and tipc_wait_for_cond().
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This pointer is RCU protected, so proper primitives should be used.
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MPLS does not support nexthops with an MPLS address family.
Specifically, it does not handle RTA_GATEWAY attribute. Make it
clear by returning an error.
Fixes: 03c0566542f4c ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IPv6 currently does not support nexthops outside of the AF_INET6 family.
Specifically, it does not handle RTA_VIA attribute. If it is passed
in a route add request, the actual route added only uses the device
which is clearly not what the user intended:
$ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0
$ ip ro ls
...
2001:db8:2::/64 dev eth0 metric 1024 pref medium
Catch this and fail the route add:
$ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0
Error: IPv6 does not support RTA_VIA attribute.
Fixes: 03c0566542f4c ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IPv4 currently does not support nexthops outside of the AF_INET family.
Specifically, it does not handle RTA_VIA attribute. If it is passed
in a route add request, the actual route added only uses the device
which is clearly not what the user intended:
$ ip ro add 172.16.1.0/24 via inet6 2001:db8:1::1 dev eth0
$ ip ro ls
...
172.16.1.0/24 dev eth0
Catch this and fail the route add:
$ ip ro add 172.16.1.0/24 via inet6 2001:db8:1::1 dev eth0
Error: IPv4 does not support RTA_VIA attribute.
Fixes: 03c0566542f4c ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tso_fragment() is only called for packets still in write queue.
Remove the tcp_queue parameter to make this more obvious,
even if the comment clearly states this.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This might speedup tcp_twsk_destructor() a bit,
avoiding a cache line miss.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We prefer static_branch_unlikely() over static_key_false() these days.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This helper is used only once, and its name is no longer relevant.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Function walker_check_empty() incorrectly verifies that tp pointer is not
NULL, instead of actual filter pointer. Fix conditional to check the right
pointer. Adjust filter pointer naming accordingly to other cls API
functions.
Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the incorrect reference link to RFC 8033
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 76726ccb7f46 ("devlink: add flash update command") and
commit 2d8dc5bbf4e7 ("devlink: Add support for reload")
access devlink ops without NULL-checking. There is, however, no
driver which would pass in NULL ops, so let's just make that
a requirement. Remove the now unnecessary NULL-checking.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|