Age | Commit message (Collapse) | Author |
|
peakrate
For high speed adapter like Mellanox CX-5 card, it can reach upto
100 Gbits per second bandwidth. Currently htb already supports 64bit rate
in tc utility. However police action rate and peakrate are still limited
to 32bit value (upto 32 Gbits per second). Add 2 new attributes
TCA_POLICE_RATE64 and TCA_POLICE_RATE64 in kernel for 64bit support
so that tc utility can use them for 64bit rate and peakrate value to
break the 32bit limit, and still keep the backward binary compatibility.
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Offloaded OvS datapath rules are translated one to one to tc rules,
for example the following simplified OvS rule:
recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
Will be translated to the following tc rule:
$ tc filter add dev dev1 ingress \
prio 1 chain 0 proto ip \
flower tcp ct_state -trk \
action ct pipe \
action goto chain 2
Received packets will first travel though tc, and if they aren't stolen
by it, like in the above rule, they will continue to OvS datapath.
Since we already did some actions (action ct in this case) which might
modify the packets, and updated action stats, we would like to continue
the proccessing with the correct recirc_id in OvS (here recirc_id(2))
where we left off.
To support this, introduce a new skb extension for tc, which
will be used for translating tc chain to ovs recirc_id to
handle these miss cases. Last tc chain index will be set
by tc goto chain action and read by OvS datapath.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Variable port_rate is being initialized with a value that is never read
and is being re-assigned a little later on. The assignment is redundant
and hence can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
r8152 conflicts are the NAPI fixes in 'net' overlapping with
some tasklet stuff in net-next
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The discussion to be made is absolutely the same as in the case of
previous patch ("taprio: Set default link speed to 10 Mbps in
taprio_set_picos_per_byte"). Nothing is lost when setting a default.
Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
Fixes: e0a7683d30e9 ("net/sched: cbs: fix port_rate miscalculation")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The taprio budget needs to be adapted at runtime according to interface
link speed. But that handling is problematic.
For one thing, installing a qdisc on an interface that doesn't have
carrier is not illegal. But taprio prints the following stack trace:
[ 31.851373] ------------[ cut here ]------------
[ 31.856024] WARNING: CPU: 1 PID: 207 at net/sched/sch_taprio.c:481 taprio_dequeue+0x1a8/0x2d4
[ 31.864566] taprio: dequeue() called with unknown picos per byte.
[ 31.864570] Modules linked in:
[ 31.873701] CPU: 1 PID: 207 Comm: tc Not tainted 5.3.0-rc5-01199-g8838fe023cd6 #1689
[ 31.881398] Hardware name: Freescale LS1021A
[ 31.885661] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14)
[ 31.893368] [<c030d8cc>] (show_stack) from [<c10ac958>] (dump_stack+0xb4/0xc8)
[ 31.900555] [<c10ac958>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8)
[ 31.907395] [<c0349d04>] (__warn) from [<c0349d64>] (warn_slowpath_fmt+0x48/0x6c)
[ 31.914841] [<c0349d64>] (warn_slowpath_fmt) from [<c0f38db4>] (taprio_dequeue+0x1a8/0x2d4)
[ 31.923150] [<c0f38db4>] (taprio_dequeue) from [<c0f227b0>] (__qdisc_run+0x90/0x61c)
[ 31.930856] [<c0f227b0>] (__qdisc_run) from [<c0ec82ac>] (net_tx_action+0x12c/0x2bc)
[ 31.938560] [<c0ec82ac>] (net_tx_action) from [<c0302298>] (__do_softirq+0x130/0x3c8)
[ 31.946350] [<c0302298>] (__do_softirq) from [<c03502a0>] (irq_exit+0xbc/0xd8)
[ 31.953536] [<c03502a0>] (irq_exit) from [<c03a4808>] (__handle_domain_irq+0x60/0xb4)
[ 31.961328] [<c03a4808>] (__handle_domain_irq) from [<c0754478>] (gic_handle_irq+0x58/0x9c)
[ 31.969638] [<c0754478>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90)
[ 31.977076] Exception stack(0xe8167b20 to 0xe8167b68)
[ 31.982100] 7b20: e9d4bd80 00000cc0 000000cf 00000000 e9d4bd80 c1f38958 00000cc0 c1f38960
[ 31.990234] 7b40: 00000001 000000cf 00000004 e9dc0800 00000000 e8167b70 c0f478ec c0f46d94
[ 31.998363] 7b60: 60070013 ffffffff
[ 32.001833] [<c0301a8c>] (__irq_svc) from [<c0f46d94>] (netlink_trim+0x18/0xd8)
[ 32.009104] [<c0f46d94>] (netlink_trim) from [<c0f478ec>] (netlink_broadcast_filtered+0x34/0x414)
[ 32.017930] [<c0f478ec>] (netlink_broadcast_filtered) from [<c0f47cec>] (netlink_broadcast+0x20/0x28)
[ 32.027102] [<c0f47cec>] (netlink_broadcast) from [<c0eea378>] (rtnetlink_send+0x34/0x88)
[ 32.035238] [<c0eea378>] (rtnetlink_send) from [<c0f25890>] (notify_and_destroy+0x2c/0x44)
[ 32.043461] [<c0f25890>] (notify_and_destroy) from [<c0f25e08>] (qdisc_graft+0x398/0x470)
[ 32.051595] [<c0f25e08>] (qdisc_graft) from [<c0f27a00>] (tc_modify_qdisc+0x3a4/0x724)
[ 32.059470] [<c0f27a00>] (tc_modify_qdisc) from [<c0ee4c84>] (rtnetlink_rcv_msg+0x260/0x2ec)
[ 32.067864] [<c0ee4c84>] (rtnetlink_rcv_msg) from [<c0f4a988>] (netlink_rcv_skb+0xb8/0x110)
[ 32.076172] [<c0f4a988>] (netlink_rcv_skb) from [<c0f4a170>] (netlink_unicast+0x1b4/0x22c)
[ 32.084392] [<c0f4a170>] (netlink_unicast) from [<c0f4a5e4>] (netlink_sendmsg+0x33c/0x380)
[ 32.092614] [<c0f4a5e4>] (netlink_sendmsg) from [<c0ea9f40>] (sock_sendmsg+0x14/0x24)
[ 32.100403] [<c0ea9f40>] (sock_sendmsg) from [<c0eaa780>] (___sys_sendmsg+0x214/0x228)
[ 32.108279] [<c0eaa780>] (___sys_sendmsg) from [<c0eabad0>] (__sys_sendmsg+0x50/0x8c)
[ 32.116068] [<c0eabad0>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54)
[ 32.123938] Exception stack(0xe8167fa8 to 0xe8167ff0)
[ 32.128960] 7fa0: b6fa68c8 000000f8 00000003 bea142d0 00000000 00000000
[ 32.137093] 7fc0: b6fa68c8 000000f8 0052154c 00000128 5d6468a2 00000000 00000028 00558c9c
[ 32.145224] 7fe0: 00000070 bea14278 00530d64 b6e17e64
[ 32.150659] ---[ end trace 2139c9827c3e5177 ]---
This happens because the qdisc ->dequeue callback gets called. Which
again is not illegal, the qdisc will dequeue even when the interface is
up but doesn't have carrier (and hence SPEED_UNKNOWN), and the frames
will be dropped further down the stack in dev_direct_xmit().
And, at the end of the day, for what? For calculating the initial budget
of an interface which is non-operational at the moment and where frames
will get dropped anyway.
So if we can't figure out the link speed, default to SPEED_10 and move
along. We can also remove the runtime check now.
Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
taprio_init may fail earlier than this line:
list_add(&q->taprio_list, &taprio_list);
i.e. due to the net device not being multi queue.
Attempting to remove q from the global taprio_list when it is not part
of it will result in a kernel panic.
Fix it by matching list_add and list_del better to one another in the
order of operations. This way we can keep the deletion unconditional
and with lower complexity - O(1).
Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recent rtnl lock removal patch changed flow_action infra to require proper
cleanup besides simple memory deallocation. However, matchall classifier
was not updated to call tc_cleanup_flow_action(). Add proper cleanup to
mall_replace_hw_filter() and mall_reoffload().
Fixes: 5a6ff4b13d59 ("net: sched: take reference to action dev before calling offloads")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement this callback in order to get the offloaded stats added to the
kernel stats.
Reported-by: Pengfei Liu <pengfeil@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of
'TCQ_F_NOLOCK' bit in the parent qdisc, we can't assume anymore that
per-cpu counters are there in the error path of skb_array_produce().
Otherwise, the following splat can be seen:
Unable to handle kernel paging request at virtual address 0000600dea430008
Mem abort info:
ESR = 0x96000005
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000005
CM = 0, WnR = 0
user pgtable: 64k pages, 48-bit VAs, pgdp = 000000007b97530e
[0000600dea430008] pgd=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000005 [#1] SMP
[...]
pstate: 10000005 (nzcV daif -PAN -UAO)
pc : pfifo_fast_enqueue+0x524/0x6e8
lr : pfifo_fast_enqueue+0x46c/0x6e8
sp : ffff800d39376fe0
x29: ffff800d39376fe0 x28: 1ffff001a07d1e40
x27: ffff800d03e8f188 x26: ffff800d03e8f200
x25: 0000000000000062 x24: ffff800d393772f0
x23: 0000000000000000 x22: 0000000000000403
x21: ffff800cca569a00 x20: ffff800d03e8ee00
x19: ffff800cca569a10 x18: 00000000000000bf
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: ffff1001a726edd0
x13: 1fffe4000276a9a4 x12: 0000000000000000
x11: dfff200000000000 x10: ffff800d03e8f1a0
x9 : 0000000000000003 x8 : 0000000000000000
x7 : 00000000f1f1f1f1 x6 : ffff1001a726edea
x5 : ffff800cca56a53c x4 : 1ffff001bf9a8003
x3 : 1ffff001bf9a8003 x2 : 1ffff001a07d1dcb
x1 : 0000600dea430000 x0 : 0000600dea430008
Process ping (pid: 6067, stack limit = 0x00000000dc0aa557)
Call trace:
pfifo_fast_enqueue+0x524/0x6e8
htb_enqueue+0x660/0x10e0 [sch_htb]
__dev_queue_xmit+0x123c/0x2de0
dev_queue_xmit+0x24/0x30
ip_finish_output2+0xc48/0x1720
ip_finish_output+0x548/0x9d8
ip_output+0x334/0x788
ip_local_out+0x90/0x138
ip_send_skb+0x44/0x1d0
ip_push_pending_frames+0x5c/0x78
raw_sendmsg+0xed8/0x28d0
inet_sendmsg+0xc4/0x5c0
sock_sendmsg+0xac/0x108
__sys_sendto+0x1ac/0x2a0
__arm64_sys_sendto+0xc4/0x138
el0_svc_handler+0x13c/0x298
el0_svc+0x8/0xc
Code: f9402e80 d538d081 91002000 8b010000 (885f7c03)
Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags',
before dereferencing 'qdisc->cpu_qstats'.
Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too")
CC: Paolo Abeni <pabeni@redhat.com>
CC: Stefano Brivio <sbrivio@redhat.com>
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Action sample doesn't properly handle psample_group pointer in overwrite
case. Following issues need to be fixed:
- In tcf_sample_init() function RCU_INIT_POINTER() is used to set
s->psample_group, even though we neither setting the pointer to NULL, nor
preventing concurrent readers from accessing the pointer in some way.
Use rcu_swap_protected() instead to safely reset the pointer.
- Old value of s->psample_group is not released or deallocated in any way,
which results resource leak. Use psample_group_put() on non-NULL value
obtained with rcu_swap_protected().
- The function psample_group_put() that released reference to struct
psample_group pointed by rcu-pointer s->psample_group doesn't respect rcu
grace period when deallocating it. Extend struct psample_group with rcu
head and use kfree_rcu when freeing it.
Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of
'TCQ_F_NOLOCK' bit in the parent qdisc, we need to be sure that per-cpu
counters are present when 'reset()' is called for pfifo_fast qdiscs.
Otherwise, the following script:
# tc q a dev lo handle 1: root htb default 100
# tc c a dev lo parent 1: classid 1:100 htb \
> rate 95Mbit ceil 100Mbit burst 64k
[...]
# tc f a dev lo parent 1: protocol arp basic classid 1:100
[...]
# tc q a dev lo parent 1:100 handle 100: pfifo_fast
[...]
# tc q d dev lo root
can generate the following splat:
Unable to handle kernel paging request at virtual address dfff2c01bd148000
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
[dfff2c01bd148000] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] SMP
[...]
pstate: 80000005 (Nzcv daif -PAN -UAO)
pc : pfifo_fast_reset+0x280/0x4d8
lr : pfifo_fast_reset+0x21c/0x4d8
sp : ffff800d09676fa0
x29: ffff800d09676fa0 x28: ffff200012ee22e4
x27: dfff200000000000 x26: 0000000000000000
x25: ffff800ca0799958 x24: ffff1001940f332b
x23: 0000000000000007 x22: ffff200012ee1ab8
x21: 0000600de8a40000 x20: 0000000000000000
x19: ffff800ca0799900 x18: 0000000000000000
x17: 0000000000000002 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: ffff1001b922e6e2
x11: 1ffff001b922e6e1 x10: 0000000000000000
x9 : 1ffff001b922e6e1 x8 : dfff200000000000
x7 : 0000000000000000 x6 : 0000000000000000
x5 : 1fffe400025dc45c x4 : 1fffe400025dc357
x3 : 00000c01bd148000 x2 : 0000600de8a40000
x1 : 0000000000000007 x0 : 0000600de8a40004
Call trace:
pfifo_fast_reset+0x280/0x4d8
qdisc_reset+0x6c/0x370
htb_reset+0x150/0x3b8 [sch_htb]
qdisc_reset+0x6c/0x370
dev_deactivate_queue.constprop.5+0xe0/0x1a8
dev_deactivate_many+0xd8/0x908
dev_deactivate+0xe4/0x190
qdisc_graft+0x88c/0xbd0
tc_get_qdisc+0x418/0x8a8
rtnetlink_rcv_msg+0x3a8/0xa78
netlink_rcv_skb+0x18c/0x328
rtnetlink_rcv+0x28/0x38
netlink_unicast+0x3c4/0x538
netlink_sendmsg+0x538/0x9a0
sock_sendmsg+0xac/0xf8
___sys_sendmsg+0x53c/0x658
__sys_sendmsg+0xc8/0x140
__arm64_sys_sendmsg+0x74/0xa8
el0_svc_handler+0x164/0x468
el0_svc+0x10/0x14
Code: 910012a0 92400801 d343fc03 11000c21 (38fb6863)
Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags',
before dereferencing 'qdisc->cpu_qstats'.
Changes since v1:
- coding style improvements, thanks to Stefano Brivio
Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too")
CC: Paolo Abeni <pabeni@redhat.com>
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The net pointer in struct xt_tgdtor_param is not explicitly
initialized therefore is still NULL when dereferencing it.
So we have to find a way to pass the correct net pointer to
ipt_destroy_target().
The best way I find is just saving the net pointer inside the per
netns struct tcf_idrinfo, which could make this patch smaller.
Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset")
Reported-and-tested-by: itugrok@yahoo.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Don't manually take rtnl lock in flower classifier before calling cls
hardware offloads API. Instead, pass rtnl lock status via 'rtnl_held'
parameter.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to remove dependency on rtnl lock, modify tc_setup_flow_action()
to copy tunnel info, instead of just saving pointer to tunnel_key action
tunnel info. This is necessary to prevent concurrent action overwrite from
releasing tunnel info while it is being used by rtnl-unlocked driver.
Implement helper tcf_tunnel_info_copy() that is used to copy tunnel info
with all its options to dynamically allocated memory block. Modify
tc_cleanup_flow_action() to free dynamically allocated tunnel info.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to remove dependency on rtnl lock when calling hardware offload
API, take reference to action mirred dev when initializing flow_action
structure in tc_setup_flow_action(). Implement function
tc_cleanup_flow_action(), use it to release the device after hardware
offload API is done using it.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to allow using new flow_action infrastructure from unlocked
classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held'
argument. Take rtnl lock before accessing tc_action data. This is necessary
to protect from concurrent action replace.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to remove dependency on rtnl lock from offloads code of
classifiers, take rtnl lock conditionally before executing driver
callbacks. Only obtain rtnl lock if block is bound to devices that require
it.
Block bind/unbind code is rtnl-locked and obtains block->cb_lock while
holding rtnl lock. Obtain locks in same order in tc_setup_cb_*() functions
to prevent deadlock.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend struct flow_block_offload with "unlocked_driver_cb" flag to allow
registering and unregistering block hardware offload callbacks that do not
require caller to hold rtnl lock. Extend tcf_block with additional
lockeddevcnt counter that is incremented for each non-unlocked driver
callback attached to device. This counter is necessary to conditionally
obtain rtnl lock before calling hardware callbacks in following patches.
Register mlx5 tc block offload callbacks as "unlocked".
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To remove dependency on rtnl lock, extend classifier ops with new
ops->hw_add() and ops->hw_del() callbacks. Call them from cls API while
holding cb_lock every time filter if successfully added to or deleted from
hardware.
Implement the new API in flower classifier. Use it to manage hw_filters
list under cb_lock protection, instead of relying on rtnl lock to
synchronize with concurrent fl_reoffload() call.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Without rtnl lock protection filters can no longer safely manage block
offloads counter themselves. Refactor cls API to protect block offloadcnt
with tcf_block->cb_lock that is already used to protect driver callback
list and nooffloaddevcnt counter. The counter can be modified by concurrent
tasks by new functions that execute block callbacks (which is safe with
previous patch that changed its type to atomic_t), however, block
bind/unbind code that checks the counter value takes cb_lock in write mode
to exclude any concurrent modifications. This approach prevents race
conditions between bind/unbind and callback execution code but allows for
concurrency for tc rule update path.
Move block offload counter, filter in hardware counter and filter flags
management from classifiers into cls hardware offloads API. Make functions
tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() to be cls API
private. Implement following new cls API to be used instead:
tc_setup_cb_add() - non-destructive filter add. If filter that wasn't
already in hardware is successfully offloaded, increment block offloads
counter, set filter in hardware counter and flag. On failure, previously
offloaded filter is considered to be intact and offloads counter is not
decremented.
tc_setup_cb_replace() - destructive filter replace. Release existing
filter block offload counter and reset its in hardware counter and flag.
Set new filter in hardware counter and flag. On failure, previously
offloaded filter is considered to be destroyed and offload counter is
decremented.
tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
offloads counter.
tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and
call tc_cls_offload_cnt_update() if cb() didn't return an error.
Refactor all offload-capable classifiers to atomically offload filters to
hardware, change block offload counter, and set filter in hardware counter
and flag by means of the new cls API functions.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As a preparation for running proto ops functions without rtnl lock, change
offload counter type to atomic. This is necessary to allow updating the
counter by multiple concurrent users when offloading filters to hardware
from unlocked classifiers.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to remove dependency on rtnl lock, extend tcf_block with 'cb_lock'
rwsem and use it to protect flow_block->cb_list and related counters from
concurrent modification. The lock is taken in read mode for read-only
traversal of cb_list in tc_setup_cb_call() and write mode in all other
cases. This approach ensures that:
- cb_list is not changed concurrently while filters is being offloaded on
block.
- block->nooffloaddevcnt is checked while holding the lock in read mode,
but is only changed by bind/unbind code when holding the cb_lock in write
mode to prevent concurrent modification.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Merge conflict of mlx5 resolved using instructions in merge
commit 9566e650bf7fdf58384bb06df634f7531ca3a97e.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
net/sched/sch_taprio.c:680:32: warning:
entry_list_policy defined but not used [-Wunused-const-variable=]
One of the points of commit a3d43c0d56f1 ("taprio: Add support adding
an admin schedule") is that it removes support (it now returns "not
supported") for schedules using the TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY
attribute (which were never used), the parsing of those types of schedules
was the only user of this policy. So removing this policy should be fine.
Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add get_fill_size() routine used to calculate the action size
when building a batch of events.
Fixes: ca9b0e27e ("pkt_action: add new action skbedit")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'prev_drop_count'
Fixes gcc '-Wunused-but-set-variable' warning:
net/sched/sch_fq_codel.c: In function fq_codel_dequeue:
net/sched/sch_fq_codel.c:288:23: warning: variable prev_ecn_mark set but not used [-Wunused-but-set-variable]
net/sched/sch_fq_codel.c:288:6: warning: variable prev_drop_count set but not used [-Wunused-but-set-variable]
They are not used since commit 77ddaff218fc ("fq_codel: Kill
useless per-flow dropped statistic")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In error case, all entries should be freed from the sched list
before deleting it. For simplicity use rcu way.
Fixes: 5a781ccbd19e46 ("tc: Add support for configuring the taprio scheduler")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It provide a callback list to find the blocks of tc
and nft subsystems
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
move tc indirect block to flow_offload and rename
it to flow indirect block.The nf_tables can use the
indr block architecture.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch make indr_block_call don't access struct tc_indr_block_cb
and tc_indr_block_dev directly
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the tcf_block in the tc_indr_block_dev for muti-subsystem
support.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch make tc_indr_block_ing_cmd can't access struct
tc_indr_block_dev and tc_indr_block_cb.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Just minor overlapping changes in the conflicts here.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TC mirred actions (redirect and mirred) can send to egress or ingress of a
device. Currently only egress is used for hw offload rules.
Modify the intermediate representation for hw offload to include mirred
actions that go to ingress. This gives drivers access to such rules and
can decide whether or not to offload them.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TC rules can impliment skbedit actions. Currently actions that modify the
skb mark are passed to offloading drivers via the hardware intermediate
representation in the flow_offload API.
Extend this to include skbedit actions that modify the packet type of the
skb. Such actions may be used to set the ptype to HOST when redirecting a
packet to ingress.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is almost impossible to get anything other than a 0 out of
flow->dropped statistic with a tc class dump, as it resets to 0
on every round.
It also conflates ecn marks with drops.
It would have been useful had it kept a cumulative drop count, but
it doesn't. This patch doesn't change the API, it just stops
tracking a stat and state that is impossible to measure and nobody
uses.
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the field fq_codel is often used with a smaller memory or
packet limit than the default, and when the bulk dropper is hit,
the drop pattern bifircates into one that more slowly increases
the codel drop rate and hits the bulk dropper more than it should.
The scan through the 1024 queues happens more often than it needs to.
This patch increases the codel count in the bulk dropper, but
does not change the drop rate there, relying on the next codel round
to deliver the next packet at the original drop rate
(after that burst of loss), then escalate to a higher signaling rate.
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add get_fill_size() routine used to calculate the action size
when building a batch of events.
Fixes: c7e2b9689 ("sched: introduce vlan action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently init call of all actions (except ipt) init their 'parm'
structure as a direct pointer to nla data in skb. This leads to race
condition when some of the filter actions were initialized successfully
(and were assigned with idr action index that was written directly
into nla data), but then were deleted and retried (due to following
action module missing or classifier-initiated retry), in which case
action init code tries to insert action to idr with index that was
assigned on previous iteration. During retry the index can be reused
by another action that was inserted concurrently, which causes
unintended action sharing between filters.
To fix described race condition, save action idr index to temporary
stack-allocated variable instead on nla data.
Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In dequeue_func(), there is an if statement on line 74 to check whether
skb is NULL:
if (skb)
When skb is NULL, it is used on line 77:
prefetch(&skb->end);
Thus, a possible null-pointer dereference may occur.
To fix this bug, skb->end is used when skb is not NULL.
This bug is found by a static analysis tool STCheck written by us.
Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM")
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
act_ife at least requires TCA_IFE_PARMS, so we have to bail out
when there is no attribute passed in.
Reported-by: syzbot+fbb5b288c9cb6a2eeac4@syzkaller.appspotmail.com
Fixes: ef6980b6becb ("introduce IFE action")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A recent addition to TC actions is the ability to manipulate the MPLS
headers on packets.
In preparation to offload such actions to hardware, update the IR code to
accept and prepare the new actions.
Note that no driver currently impliments the MPLS dec_ttl action so this
is not included.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In function int tc_new_tfilter() q pointer can be NULL when adding filter
on a shared block. With recent change that resets TCQ_F_CAN_BYPASS after
filter creation, following NULL pointer dereference happens in case parent
block is shared:
[ 212.925060] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 212.925445] #PF: supervisor write access in kernel mode
[ 212.925709] #PF: error_code(0x0002) - not-present page
[ 212.925965] PGD 8000000827923067 P4D 8000000827923067 PUD 827924067 PMD 0
[ 212.926302] Oops: 0002 [#1] SMP KASAN PTI
[ 212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G B 5.2.0+ #512
[ 212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40
[ 212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 f
b 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8
[ 212.928607] RSP: 0018:ffff88884fd5f5d8 EFLAGS: 00010296
[ 212.928905] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000
[ 212.929201] RDX: 0000000000000007 RSI: 0000000000000004 RDI: 0000000000000297
[ 212.929402] RBP: ffff88886bedd600 R08: ffffffffb91d4b51 R09: fffffbfff7616e4d
[ 212.929609] R10: fffffbfff7616e4c R11: ffffffffbb0b7263 R12: ffff88886bc61040
[ 212.929803] R13: ffff88884fd5f950 R14: ffffc900039c5000 R15: ffff88835e927680
[ 212.929999] FS: 00007fe7c50b6480(0000) GS:ffff88886f980000(0000) knlGS:0000000000000000
[ 212.930235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 212.930394] CR2: 0000000000000010 CR3: 000000085bd04002 CR4: 00000000001606e0
[ 212.930588] Call Trace:
[ 212.930682] ? tc_del_tfilter+0xa40/0xa40
[ 212.930811] ? __lock_acquire+0x5b5/0x2460
[ 212.930948] ? find_held_lock+0x85/0xa0
[ 212.931081] ? tc_del_tfilter+0xa40/0xa40
[ 212.931201] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 212.931332] ? rtnl_dellink+0x490/0x490
[ 212.931454] ? lockdep_hardirqs_on+0x260/0x260
[ 212.931589] ? netlink_deliver_tap+0xab/0x5a0
[ 212.931717] ? match_held_lock+0x1b/0x240
[ 212.931844] netlink_rcv_skb+0xd0/0x200
[ 212.931958] ? rtnl_dellink+0x490/0x490
[ 212.932079] ? netlink_ack+0x440/0x440
[ 212.932205] ? netlink_deliver_tap+0x161/0x5a0
[ 212.932335] ? lock_downgrade+0x360/0x360
[ 212.932457] ? lock_acquire+0xe5/0x210
[ 212.932579] netlink_unicast+0x296/0x350
[ 212.932705] ? netlink_attachskb+0x390/0x390
[ 212.932834] ? _copy_from_iter_full+0xe0/0x3a0
[ 212.932976] netlink_sendmsg+0x394/0x600
[ 212.937998] ? netlink_unicast+0x350/0x350
[ 212.943033] ? move_addr_to_kernel.part.0+0x90/0x90
[ 212.948115] ? netlink_unicast+0x350/0x350
[ 212.953185] sock_sendmsg+0x96/0xa0
[ 212.958099] ___sys_sendmsg+0x482/0x520
[ 212.962881] ? match_held_lock+0x1b/0x240
[ 212.967618] ? copy_msghdr_from_user+0x250/0x250
[ 212.972337] ? lock_downgrade+0x360/0x360
[ 212.976973] ? rwlock_bug.part.0+0x60/0x60
[ 212.981548] ? __mod_node_page_state+0x1f/0xa0
[ 212.986060] ? match_held_lock+0x1b/0x240
[ 212.990567] ? find_held_lock+0x85/0xa0
[ 212.994989] ? do_user_addr_fault+0x349/0x5b0
[ 212.999387] ? lock_downgrade+0x360/0x360
[ 213.003713] ? find_held_lock+0x85/0xa0
[ 213.007972] ? __fget_light+0xa1/0xf0
[ 213.012143] ? sockfd_lookup_light+0x91/0xb0
[ 213.016165] __sys_sendmsg+0xba/0x130
[ 213.020040] ? __sys_sendmsg_sock+0xb0/0xb0
[ 213.023870] ? handle_mm_fault+0x337/0x470
[ 213.027592] ? page_fault+0x8/0x30
[ 213.031316] ? lockdep_hardirqs_off+0xbe/0x100
[ 213.034999] ? mark_held_locks+0x24/0x90
[ 213.038671] ? do_syscall_64+0x1e/0xe0
[ 213.042297] do_syscall_64+0x74/0xe0
[ 213.045828] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 213.049354] RIP: 0033:0x7fe7c527c7b8
[ 213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f
0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
[ 213.060269] RSP: 002b:00007ffc3f7908a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 213.064144] RAX: ffffffffffffffda RBX: 000000005d34716f RCX: 00007fe7c527c7b8
[ 213.068094] RDX: 0000000000000000 RSI: 00007ffc3f790910 RDI: 0000000000000003
[ 213.072109] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fe7c5340cc0
[ 213.076113] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000080
[ 213.080146] R13: 0000000000480640 R14: 0000000000000080 R15: 0000000000000000
[ 213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common
[<1;69;32Msb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pc
lmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi
_ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helpe
r ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas
[ 213.112326] CR2: 0000000000000010
[ 213.117429] ---[ end trace adb58eb0a4ee6283 ]---
Verify that q pointer is not NULL before setting the 'flags' field.
Fixes: 3f05e6886a59 ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This object stores the flow block callbacks that are attached to this
block. Update flow_block_cb_lookup() to take this new object.
This patch restores the block sharing feature.
Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rename this type definition and adapt users.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
notably fq_codel, it makes no sense to let packets bypass the TC
filters we setup in any scenario, otherwise our packets steering
policy could not be enforced.
This can be reproduced easily with the following script:
ip li add dev dummy0 type dummy
ifconfig dummy0 up
tc qd add dev dummy0 root fq_codel
tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
ping -I dummy0 192.168.112.1
Without this patch, packets are sent directly to dummy0 without
hitting any of the filters. With this patch, packets are redirected
to loopback as expected.
This fix is not perfect, it only unsets the flag but does not set it back
because we have to save the information somewhere in the qdisc if we
really want that. Note, both fq_codel and sfq clear this flag in their
->bind_tcf() but this is clearly not sufficient when we don't use any
class ID.
Fixes: 23624935e0c4 ("net_sched: TCQ_F_CAN_BYPASS generalization")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If NF_NAT is m and NET_ACT_CT is y, build fails:
net/sched/act_ct.o: In function `tcf_ct_act':
act_ct.c:(.text+0x21ac): undefined reference to `nf_ct_nat_ext_add'
act_ct.c:(.text+0x229a): undefined reference to `nf_nat_icmp_reply_translation'
act_ct.c:(.text+0x233a): undefined reference to `nf_nat_setup_info'
act_ct.c:(.text+0x234a): undefined reference to `nf_nat_alloc_null_binding'
act_ct.c:(.text+0x237c): undefined reference to `nf_nat_packet'
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During the review of the iproute2 patches for txtime-assist mode, it was
pointed out that it does not make sense for the txtime-delay parameter to
be negative. So, change the type of the parameter from s32 to u32.
Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After recent refactoring of block offlads infrastructure, indr_dev->block
pointer is dereferenced before it is verified to be non-NULL. Example stack
trace where this behavior leads to NULL-pointer dereference error when
creating vxlan dev on system with mlx5 NIC with offloads enabled:
[ 1157.852938] ==================================================================
[ 1157.866877] BUG: KASAN: null-ptr-deref in tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.880877] Read of size 4 at addr 0000000000000090 by task ip/3829
[ 1157.901637] CPU: 22 PID: 3829 Comm: ip Not tainted 5.2.0-rc6+ #488
[ 1157.914438] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 1157.929031] Call Trace:
[ 1157.938318] dump_stack+0x9a/0xeb
[ 1157.948362] ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.960262] ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.972082] __kasan_report+0x176/0x192
[ 1157.982513] ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.994348] kasan_report+0xe/0x20
[ 1158.004324] tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1158.015950] ? tcf_block_setup+0x430/0x430
[ 1158.026558] ? kasan_unpoison_shadow+0x30/0x40
[ 1158.037464] __tc_indr_block_cb_register+0x5f5/0xf20
[ 1158.049288] ? mlx5e_rep_indr_tc_block_unbind+0xa0/0xa0 [mlx5_core]
[ 1158.062344] ? tc_indr_block_dev_put.part.47+0x5c0/0x5c0
[ 1158.074498] ? rdma_roce_rescan_device+0x20/0x20 [ib_core]
[ 1158.086580] ? br_device_event+0x98/0x480 [bridge]
[ 1158.097870] ? strcmp+0x30/0x50
[ 1158.107578] mlx5e_nic_rep_netdevice_event+0xdd/0x180 [mlx5_core]
[ 1158.120212] notifier_call_chain+0x6d/0xa0
[ 1158.130753] register_netdevice+0x6fc/0x7e0
[ 1158.141322] ? netdev_change_features+0xa0/0xa0
[ 1158.152218] ? vxlan_config_apply+0x210/0x310 [vxlan]
[ 1158.163593] __vxlan_dev_create+0x2ad/0x520 [vxlan]
[ 1158.174770] ? vxlan_changelink+0x490/0x490 [vxlan]
[ 1158.185870] ? rcu_read_unlock+0x60/0x60 [vxlan]
[ 1158.196798] vxlan_newlink+0x99/0xf0 [vxlan]
[ 1158.207303] ? __vxlan_dev_create+0x520/0x520 [vxlan]
[ 1158.218601] ? rtnl_create_link+0x3d0/0x450
[ 1158.228900] __rtnl_newlink+0x8a7/0xb00
[ 1158.238701] ? stack_access_ok+0x35/0x80
[ 1158.248450] ? rtnl_link_unregister+0x1a0/0x1a0
[ 1158.258735] ? find_held_lock+0x6d/0xd0
[ 1158.268379] ? is_bpf_text_address+0x67/0xf0
[ 1158.278330] ? lock_acquire+0xc1/0x1f0
[ 1158.287686] ? is_bpf_text_address+0x5/0xf0
[ 1158.297449] ? is_bpf_text_address+0x86/0xf0
[ 1158.307310] ? kernel_text_address+0xec/0x100
[ 1158.317155] ? arch_stack_walk+0x92/0xe0
[ 1158.326497] ? __kernel_text_address+0xe/0x30
[ 1158.336213] ? unwind_get_return_address+0x2f/0x50
[ 1158.346267] ? create_prof_cpu_mask+0x20/0x20
[ 1158.355936] ? arch_stack_walk+0x92/0xe0
[ 1158.365117] ? stack_trace_save+0x8a/0xb0
[ 1158.374272] ? stack_trace_consume_entry+0x80/0x80
[ 1158.384226] ? match_held_lock+0x33/0x210
[ 1158.393216] ? kasan_unpoison_shadow+0x30/0x40
[ 1158.402593] rtnl_newlink+0x53/0x80
[ 1158.410925] rtnetlink_rcv_msg+0x3a5/0x600
[ 1158.419777] ? validate_linkmsg+0x400/0x400
[ 1158.428620] ? find_held_lock+0x6d/0xd0
[ 1158.437117] ? match_held_lock+0x1b/0x210
[ 1158.445760] ? validate_linkmsg+0x400/0x400
[ 1158.454642] netlink_rcv_skb+0xc7/0x1f0
[ 1158.463150] ? netlink_ack+0x470/0x470
[ 1158.471538] ? netlink_deliver_tap+0x1f3/0x5a0
[ 1158.480607] netlink_unicast+0x2ae/0x350
[ 1158.489099] ? netlink_attachskb+0x340/0x340
[ 1158.497935] ? _copy_from_iter_full+0xde/0x3b0
[ 1158.506945] ? __virt_addr_valid+0xb6/0xf0
[ 1158.515578] ? __check_object_size+0x159/0x240
[ 1158.524515] netlink_sendmsg+0x4d3/0x630
[ 1158.532879] ? netlink_unicast+0x350/0x350
[ 1158.541400] ? netlink_unicast+0x350/0x350
[ 1158.549805] sock_sendmsg+0x94/0xa0
[ 1158.557561] ___sys_sendmsg+0x49d/0x570
[ 1158.565625] ? copy_msghdr_from_user+0x210/0x210
[ 1158.574457] ? __fput+0x1e2/0x330
[ 1158.581948] ? __kasan_slab_free+0x130/0x180
[ 1158.590407] ? kmem_cache_free+0xb6/0x2d0
[ 1158.598574] ? mark_lock+0xc7/0x790
[ 1158.606177] ? task_work_run+0xcf/0x100
[ 1158.614165] ? exit_to_usermode_loop+0x102/0x110
[ 1158.622954] ? __lock_acquire+0x963/0x1ee0
[ 1158.631199] ? lockdep_hardirqs_on+0x260/0x260
[ 1158.639777] ? match_held_lock+0x1b/0x210
[ 1158.647918] ? lockdep_hardirqs_on+0x260/0x260
[ 1158.656501] ? match_held_lock+0x1b/0x210
[ 1158.664643] ? __fget_light+0xa6/0xe0
[ 1158.672423] ? __sys_sendmsg+0xd2/0x150
[ 1158.680334] __sys_sendmsg+0xd2/0x150
[ 1158.688063] ? __ia32_sys_shutdown+0x30/0x30
[ 1158.696435] ? lock_downgrade+0x2e0/0x2e0
[ 1158.704541] ? mark_held_locks+0x1a/0x90
[ 1158.712611] ? mark_held_locks+0x1a/0x90
[ 1158.720619] ? do_syscall_64+0x1e/0x2c0
[ 1158.728530] do_syscall_64+0x78/0x2c0
[ 1158.736254] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1158.745414] RIP: 0033:0x7f62d505cb87
[ 1158.753070] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00[87/1817]
48 89 f3 48
[ 1158.780924] RSP: 002b:00007fffd9832268 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1158.793204] RAX: ffffffffffffffda RBX: 000000005d26048f RCX: 00007f62d505cb87
[ 1158.805111] RDX: 0000000000000000 RSI: 00007fffd98322d0 RDI: 0000000000000003
[ 1158.817055] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 1158.828987] R10: 00007f62d50ce260 R11: 0000000000000246 R12: 0000000000000001
[ 1158.840909] R13: 000000000067e540 R14: 0000000000000000 R15: 000000000067ed20
[ 1158.852873] ==================================================================
Introduce new function tcf_block_non_null_shared() that verifies block
pointer before dereferencing it to obtain index. Use the function in
tc_indr_block_ing_cmd() to prevent NULL pointer dereference.
Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|