Age | Commit message (Collapse) | Author |
|
If crashed kernel does not shutdown the NIC properly, PCIe FLR
is required in the kdump kernel in order to initialize all the
functions properly.
Fixes: d629522e1d66 ("bnxt_en: Reduce memory usage when running in kdump kernel.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Especially when bnxt_shutdown() is called during kexec, we need to
disable MSIX and disable Bus Master to completely quiesce the device.
Make these 2 calls unconditionally in the shutdown method.
Fixes: c20dc142dd7b ("bnxt_en: Disable bus master during PCI shutdown and driver unload.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While it is not yet understood why a TX underflow can easily occur
for SGMII interfaces resulting in a TX wedge. It has been found that
disabling/re-enabling the LMAC resolves the issue.
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Reviewed-by: Robert Jones <rjones@gateworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The fw_status field is only 8 bits, so fix the read. Also,
we only want to look at the one status bit, to allow for future
use of the other bits, and watch for a bad PCI read.
Fixes: 97ca486592c0 ("ionic: add heartbeat check")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
at91ether_init was handling the phy mode and speed but since the switch to
phylink, the NCFGR register got overwritten by macb_mac_config(). The issue
is that the RM9200_RMII bit and the MACB_CLK_DIV32 field are cleared
but never restored as they conflict with the PAE, GBE and PCSSEL bits.
Add new capability to differentiate between EMAC and the other versions of
the IP and use it to set and avoid clearing the relevant bits.
Also, this fixes a NULL pointer dereference in macb_mac_link_up as the EMAC
doesn't use any rings/bufffers/queues.
Fixes: 7897b071ac3b ("net: macb: convert to phylink")
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The configuration/command below is failing when the VF in the xml
file is already bound to the host iavf driver.
pci_0000_af_0_0.xml:
<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0x0000' bus='0xaf' slot='0x0' function='0x0'/>
</source>
<mac address='00:de:ad:00:11:01'/>
</interface>
> virsh attach-device domain_name pci_0000_af_0_0.xml
error: Failed to attach device from pci_0000_af_0_0.xml
error: Cannot set interface MAC/vlanid to 00:de:ad:00:11:01/0 for
ifname ens1f1 vf 0: Device or resource busy
This is failing because the VF has not been completely removed/reset
after being unbound (via the virsh command above) from the host iavf
driver and ice_set_vf_mac() checks if the VF is disabled before waiting
for the reset to finish.
Fix this by waiting for the VF remove/reset process to happen before
checking if the VF is disabled. Also, since many functions for VF
administration on the PF were more or less calling the same 3 functions
(ice_wait_on_vf_reset(), ice_is_vf_disabled(), and ice_check_vf_init())
move these into the helper function ice_check_vf_ready_for_cfg(). Then
call this function in any flow that attempts to configure/query a VF
from the PF.
Lastly, increase the maximum wait time in ice_wait_on_vf_reset() to
800ms, and modify/add the #define(s) that determine the wait time.
This was done for robustness because in rare/stress cases VF removal can
take a max of ~800ms and previously the wait was a max of ~300ms.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Remove code that tell the OS that link is going down when user
change flow control via ethtool. When link is up it isn't certain
that link goes down after 0x0605 aq command. If link doesn't go
down, OS thinks that link is down, but physical link is up. To
reset this state user have to take interface down and up.
If link goes down after 0x0605 command, FW send information
about that and after that driver tells the OS that the link goes
down. So this code in ethtool is unnecessary.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Currently if a user sets an odd [tx|rx]-usecs value through ethtool,
the request is denied because the hardware is set to have an ITR
granularity of 2us. This caused poor customer experience. Fix this by
aligning to a register allowed value, which results in rounding down.
Also, print a once per ring container type message to be clear about
our intentions.
Also, change the ITR_TO_REG define to be the bitwise and of the ITR
setting and the ICE_ITR_MASK. This makes the purpose of ITR_TO_REG more
obvious.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
On flow table creation, send the relevant flags according to what the FW
currently supports.
When FW doesn't support reformat option over SW-steering managed table,
the driver shouldn't pass this.
Fixes: 988fd6b32d07 ("net/mlx5: DR, Pass table flags at creation to lower layer")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The pool sizes represent the pool sizes in the fw. when we request
a pool size from fw, it will return the next possible group.
We track how many pools the fw has left and start requesting groups
from the big to the small.
When we start request 4k group, which doesn't exists in fw, fw
wants to allocate the next possible size, 64k, but will fail since
its exhausted. The correct smallest pool size in fw is 128 and not 4k.
Fixes: 39ac237ce009 ("net/mlx5: E-Switch, Refactor chains and priorities")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
There is no need to reset all vf config (except link state) between
legacy and switchdev modes changes.
Also, set link state to AUTO, when legacy enabled.
Fixes: 3b83b6c2e024 ("net/mlx5e: Clear VF config when switching modes")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Set vport gvmi in the tag, only when source gvmi is set in the bit mask.
Fixes: 26d688e3 ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
When health reporters are not supported, recovery function is invoked
directly, not via devlink health reporters.
In this direct flow, the recover function input parameter was passed
incorrectly and is causing a kernel oops. This patch is fixing the input
parameter.
Following call trace is observed on rx error health reporting.
Internal error: Oops: 96000007 [#1] PREEMPT SMP
Process kworker/u16:4 (pid: 4584, stack limit = 0x00000000c9e45703)
Call trace:
mlx5e_rx_reporter_err_rq_cqe_recover+0x30/0x164 [mlx5_core]
mlx5e_health_report+0x60/0x6c [mlx5_core]
mlx5e_reporter_rq_cqe_err+0x6c/0x90 [mlx5_core]
mlx5e_rq_err_cqe_work+0x20/0x2c [mlx5_core]
process_one_work+0x168/0x3d0
worker_thread+0x58/0x3d0
kthread+0x108/0x134
Fixes: c50de4af1d63 ("net/mlx5e: Generalize tx reporter's functionality")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Initialize RQ doorbell counters to zero prior to moving an RQ from RST
to RDY state. Per HW spec, when RQ is back to RDY state, the descriptor
ID on the completion is reset. The doorbell record must comply.
Fixes: 8276ea1353a4 ("net/mlx5e: Report and recover from CQE with error on RQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reported-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
rtnl_bridge_getlink is protected by rcu lock, so mlx5_eswitch_get_vepa
cannot take mutex lock. Two possible issues can happen:
1. User at the same time change vepa mode via RTM_SETLINK command.
2. User at the same time change the switchdev mode via devlink netlink
interface.
Case 1 cannot happen because rtnl executes one message in order.
Case 2 can happen but we do not expect user to change the switchdev mode
when changing vepa. Even if a user does it, so he will read a value
which is no longer valid.
Fixes: 8da202b24913 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If an event is added while the rdma workqueue is being destroyed
it could lead to several races, list corruption, null pointer
dereference during queue_work or init_queue.
This fixes the race between the two flows which can occur during
shutdown.
A kref object and a completion object are added to the rdma_dev
structure, these are initialized before the workqueue is created.
The refcnt is used to indicate work is being added to the
workqueue and ensures the cleanup flow won't start while we're in
the middle of adding the event.
Once the work is added, the refcnt is decreased and the cleanup flow
is safe to run.
Fixes: cee9fbd8e2e ("qede: Add qedr framework")
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The reserved member should be named reserved3.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Each extracted frame on Ocelot has an IFH. The frame and IFH are extracted
by reading chuncks of 4 bytes from a register.
In case the IFH and frames were read corretly it would try to read the next
frame. In case there are no more frames in the queue, it checks if there
were any previous errors and in that case clear the queue. But this check
will always succeed also when there are no errors. Because when extracting
the IFH the error is checked against 4(number of bytes read) and then the
error is set only if the extraction of the frame failed. So in a happy case
where there are no errors the err variable is still 4. So it could be
a case where after the check that there are no more frames in the queue, a
frame will arrive in the queue but because the error is not reseted, it
would try to flush the queue. So the frame will be lost.
The fix consist in resetting the error after reading the IFH.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The call to of_get_mac_address() can return -EPROBE_DEFER, for instance
when the MAC address is read from a NVMEM driver that did not probe yet.
Cc: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The Micrel KSZ8851-16MLLI datasheet DS00002357B page 12 states that
BE[3:0] signals are active high. This contradicts the measurements
of the behavior of the actual chip, where these signals behave as
active low. For example, to read the CIDER register, the bus must
expose 0xc0c0 during the address phase, which means BE[3:0]=4'b1100.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The packet data written to and read from Micrel KSZ8851-16MLLI must be
byte-swapped in 16-bit mode, add this byte-swapping.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This driver is mixing 8-bit and 16-bit bus accessors for reasons unknown,
however the speculation is that this was some sort of attempt to support
the 8-bit bus mode.
As per the KS8851-16MLL documentation, all two registers accessed via the
8-bit accessors are internally 16-bit registers, so reading them using
16-bit accessors is fine. The KS_CCR read can be converted to 16-bit read
outright, as it is already a concatenation of two 8-bit reads of that
register. The KS_RXQCR accesses are 8-bit only, however writing the top
8 bits of the register is OK as well, since the driver caches the entire
16-bit register value anyway.
Finally, the driver is not used by any hardware in the kernel right now.
The only hardware available to me is one with 16-bit bus, so I have no
way to test the 8-bit bus mode, however it is unlikely this ever really
worked anyway. If the 8-bit bus mode is ever required, it can be easily
added by adjusting the 16-bit accessors to do 2 consecutive accesses,
which is how this should have been done from the beginning.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the "struct bonding", there is stats_lock.
This lock protects "bond_stats" in the "struct bonding".
bond_stats is updated in the bond_get_stats() and this function would be
executed concurrently. So, the lock is needed.
Bonding interfaces would be nested.
So, either stats_lock should use dynamic lockdep class key or stats_lock
should be used by spin_lock_nested(). In the current code, stats_lock is
using a dynamic lockdep class key.
But there is no updating stats_lock_key routine So, lockdep warning
will occur.
Test commands:
ip link add bond0 type bond
ip link add bond1 type bond
ip link set bond0 master bond1
ip link set bond0 nomaster
ip link set bond1 master bond0
Splat looks like:
[ 38.420603][ T957] 5.5.0+ #394 Not tainted
[ 38.421074][ T957] ------------------------------------------------------
[ 38.421837][ T957] ip/957 is trying to acquire lock:
[ 38.422399][ T957] ffff888063262cd8 (&bond->stats_lock_key#2){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding]
[ 38.423528][ T957]
[ 38.423528][ T957] but task is already holding lock:
[ 38.424526][ T957] ffff888065fd2cd8 (&bond->stats_lock_key){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding]
[ 38.426075][ T957]
[ 38.426075][ T957] which lock already depends on the new lock.
[ 38.426075][ T957]
[ 38.428536][ T957]
[ 38.428536][ T957] the existing dependency chain (in reverse order) is:
[ 38.429475][ T957]
[ 38.429475][ T957] -> #1 (&bond->stats_lock_key){+.+.}:
[ 38.430273][ T957] _raw_spin_lock+0x30/0x70
[ 38.430812][ T957] bond_get_stats+0x90/0x4d0 [bonding]
[ 38.431451][ T957] dev_get_stats+0x1ec/0x270
[ 38.432088][ T957] bond_get_stats+0x1a5/0x4d0 [bonding]
[ 38.432767][ T957] dev_get_stats+0x1ec/0x270
[ 38.433322][ T957] rtnl_fill_stats+0x44/0xbe0
[ 38.433866][ T957] rtnl_fill_ifinfo+0xeb2/0x3720
[ 38.434474][ T957] rtmsg_ifinfo_build_skb+0xca/0x170
[ 38.435081][ T957] rtmsg_ifinfo_event.part.33+0x1b/0xb0
[ 38.436848][ T957] rtnetlink_event+0xcd/0x120
[ 38.437455][ T957] notifier_call_chain+0x90/0x160
[ 38.438067][ T957] netdev_change_features+0x74/0xa0
[ 38.438708][ T957] bond_compute_features.isra.45+0x4e6/0x6f0 [bonding]
[ 38.439522][ T957] bond_enslave+0x3639/0x47b0 [bonding]
[ 38.440225][ T957] do_setlink+0xaab/0x2ef0
[ 38.440786][ T957] __rtnl_newlink+0x9c5/0x1270
[ 38.441463][ T957] rtnl_newlink+0x65/0x90
[ 38.442075][ T957] rtnetlink_rcv_msg+0x4a8/0x890
[ 38.442774][ T957] netlink_rcv_skb+0x121/0x350
[ 38.443451][ T957] netlink_unicast+0x42e/0x610
[ 38.444282][ T957] netlink_sendmsg+0x65a/0xb90
[ 38.444992][ T957] ____sys_sendmsg+0x5ce/0x7a0
[ 38.445679][ T957] ___sys_sendmsg+0x10f/0x1b0
[ 38.446365][ T957] __sys_sendmsg+0xc6/0x150
[ 38.447007][ T957] do_syscall_64+0x99/0x4f0
[ 38.447668][ T957] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 38.448538][ T957]
[ 38.448538][ T957] -> #0 (&bond->stats_lock_key#2){+.+.}:
[ 38.449554][ T957] __lock_acquire+0x2d8d/0x3de0
[ 38.450148][ T957] lock_acquire+0x164/0x3b0
[ 38.450711][ T957] _raw_spin_lock+0x30/0x70
[ 38.451292][ T957] bond_get_stats+0x90/0x4d0 [bonding]
[ 38.451950][ T957] dev_get_stats+0x1ec/0x270
[ 38.452425][ T957] bond_get_stats+0x1a5/0x4d0 [bonding]
[ 38.453362][ T957] dev_get_stats+0x1ec/0x270
[ 38.453825][ T957] rtnl_fill_stats+0x44/0xbe0
[ 38.454390][ T957] rtnl_fill_ifinfo+0xeb2/0x3720
[ 38.456257][ T957] rtmsg_ifinfo_build_skb+0xca/0x170
[ 38.456998][ T957] rtmsg_ifinfo_event.part.33+0x1b/0xb0
[ 38.459351][ T957] rtnetlink_event+0xcd/0x120
[ 38.460086][ T957] notifier_call_chain+0x90/0x160
[ 38.460829][ T957] netdev_change_features+0x74/0xa0
[ 38.461752][ T957] bond_compute_features.isra.45+0x4e6/0x6f0 [bonding]
[ 38.462705][ T957] bond_enslave+0x3639/0x47b0 [bonding]
[ 38.463476][ T957] do_setlink+0xaab/0x2ef0
[ 38.464141][ T957] __rtnl_newlink+0x9c5/0x1270
[ 38.464897][ T957] rtnl_newlink+0x65/0x90
[ 38.465522][ T957] rtnetlink_rcv_msg+0x4a8/0x890
[ 38.466215][ T957] netlink_rcv_skb+0x121/0x350
[ 38.466895][ T957] netlink_unicast+0x42e/0x610
[ 38.467583][ T957] netlink_sendmsg+0x65a/0xb90
[ 38.468285][ T957] ____sys_sendmsg+0x5ce/0x7a0
[ 38.469202][ T957] ___sys_sendmsg+0x10f/0x1b0
[ 38.469884][ T957] __sys_sendmsg+0xc6/0x150
[ 38.470587][ T957] do_syscall_64+0x99/0x4f0
[ 38.471245][ T957] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 38.472093][ T957]
[ 38.472093][ T957] other info that might help us debug this:
[ 38.472093][ T957]
[ 38.473438][ T957] Possible unsafe locking scenario:
[ 38.473438][ T957]
[ 38.474898][ T957] CPU0 CPU1
[ 38.476234][ T957] ---- ----
[ 38.480171][ T957] lock(&bond->stats_lock_key);
[ 38.480808][ T957] lock(&bond->stats_lock_key#2);
[ 38.481791][ T957] lock(&bond->stats_lock_key);
[ 38.482754][ T957] lock(&bond->stats_lock_key#2);
[ 38.483416][ T957]
[ 38.483416][ T957] *** DEADLOCK ***
[ 38.483416][ T957]
[ 38.484505][ T957] 3 locks held by ip/957:
[ 38.485048][ T957] #0: ffffffffbccf6230 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x457/0x890
[ 38.486198][ T957] #1: ffff888065fd2cd8 (&bond->stats_lock_key){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding]
[ 38.487625][ T957] #2: ffffffffbc9254c0 (rcu_read_lock){....}, at: bond_get_stats+0x5/0x4d0 [bonding]
[ 38.488897][ T957]
[ 38.488897][ T957] stack backtrace:
[ 38.489646][ T957] CPU: 1 PID: 957 Comm: ip Not tainted 5.5.0+ #394
[ 38.490497][ T957] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 38.492810][ T957] Call Trace:
[ 38.493219][ T957] dump_stack+0x96/0xdb
[ 38.493709][ T957] check_noncircular+0x371/0x450
[ 38.494344][ T957] ? lookup_address+0x60/0x60
[ 38.494923][ T957] ? print_circular_bug.isra.35+0x310/0x310
[ 38.495699][ T957] ? hlock_class+0x130/0x130
[ 38.496334][ T957] ? __lock_acquire+0x2d8d/0x3de0
[ 38.496979][ T957] __lock_acquire+0x2d8d/0x3de0
[ 38.497607][ T957] ? register_lock_class+0x14d0/0x14d0
[ 38.498333][ T957] ? check_chain_key+0x236/0x5d0
[ 38.499003][ T957] lock_acquire+0x164/0x3b0
[ 38.499800][ T957] ? bond_get_stats+0x90/0x4d0 [bonding]
[ 38.500706][ T957] _raw_spin_lock+0x30/0x70
[ 38.501435][ T957] ? bond_get_stats+0x90/0x4d0 [bonding]
[ 38.502311][ T957] bond_get_stats+0x90/0x4d0 [bonding]
[ ... ]
But, there is another problem.
The dynamic lockdep class key is protected by RTNL, but bond_get_stats()
would be called outside of RTNL.
So, it would use an invalid dynamic lockdep class key.
In order to fix this issue, stats_lock uses spin_lock_nested() instead of
a dynamic lockdep key.
The bond_get_stats() calls bond_get_lowest_level_rcu() to get the correct
nest level value, which will be used by spin_lock_nested().
The "dev->lower_level" indicates lower nest level value, but this value
is invalid outside of RTNL.
So, bond_get_lowest_level_rcu() returns valid lower nest level value in
the RCU critical section.
bond_get_lowest_level_rcu() will be work only when LOCKDEP is enabled.
Fixes: 089bca2caed0 ("bonding: use dynamic lockdep key instead of subclass")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After bond_release(), netdev_update_lockdep_key() should be called.
But both ioctl path and attribute path don't call
netdev_update_lockdep_key().
This patch adds missing netdev_update_lockdep_key().
Test commands:
ip link add bond0 type bond
ip link add bond1 type bond
ifenslave bond0 bond1
ifenslave -d bond0 bond1
ifenslave bond1 bond0
Splat looks like:
[ 29.501182][ T1046] WARNING: possible circular locking dependency detected
[ 29.501945][ T1039] hardirqs last disabled at (1962): [<ffffffffac6c807f>] handle_mm_fault+0x13f/0x700
[ 29.503442][ T1046] 5.5.0+ #322 Not tainted
[ 29.503447][ T1046] ------------------------------------------------------
[ 29.504277][ T1039] softirqs last enabled at (1180): [<ffffffffade00678>] __do_softirq+0x678/0x981
[ 29.505443][ T1046] ifenslave/1046 is trying to acquire lock:
[ 29.505886][ T1039] softirqs last disabled at (1169): [<ffffffffac19c18a>] irq_exit+0x17a/0x1a0
[ 29.509997][ T1046] ffff88805d5da280 (&dev->addr_list_lock_key#3){+...}, at: dev_mc_sync_multiple+0x95/0x120
[ 29.511243][ T1046]
[ 29.511243][ T1046] but task is already holding lock:
[ 29.512192][ T1046] ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding]
[ 29.514124][ T1046]
[ 29.514124][ T1046] which lock already depends on the new lock.
[ 29.514124][ T1046]
[ 29.517297][ T1046]
[ 29.517297][ T1046] the existing dependency chain (in reverse order) is:
[ 29.518231][ T1046]
[ 29.518231][ T1046] -> #1 (&dev->addr_list_lock_key#4){+...}:
[ 29.519076][ T1046] _raw_spin_lock+0x30/0x70
[ 29.519588][ T1046] dev_mc_sync_multiple+0x95/0x120
[ 29.520208][ T1046] bond_enslave+0x448d/0x47b0 [bonding]
[ 29.520862][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding]
[ 29.521640][ T1046] __bond_opt_set+0x1ff/0xbb0 [bonding]
[ 29.522438][ T1046] __bond_opt_set_notify+0x2b/0xf0 [bonding]
[ 29.523251][ T1046] bond_opt_tryset_rtnl+0x92/0xf0 [bonding]
[ 29.524082][ T1046] bonding_sysfs_store_option+0x8a/0xf0 [bonding]
[ 29.524959][ T1046] kernfs_fop_write+0x276/0x410
[ 29.525620][ T1046] vfs_write+0x197/0x4a0
[ 29.526218][ T1046] ksys_write+0x141/0x1d0
[ 29.526818][ T1046] do_syscall_64+0x99/0x4f0
[ 29.527430][ T1046] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 29.528265][ T1046]
[ 29.528265][ T1046] -> #0 (&dev->addr_list_lock_key#3){+...}:
[ 29.529272][ T1046] __lock_acquire+0x2d8d/0x3de0
[ 29.529935][ T1046] lock_acquire+0x164/0x3b0
[ 29.530638][ T1046] _raw_spin_lock+0x30/0x70
[ 29.531187][ T1046] dev_mc_sync_multiple+0x95/0x120
[ 29.531790][ T1046] bond_enslave+0x448d/0x47b0 [bonding]
[ 29.532451][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding]
[ 29.533163][ T1046] __bond_opt_set+0x1ff/0xbb0 [bonding]
[ 29.533789][ T1046] __bond_opt_set_notify+0x2b/0xf0 [bonding]
[ 29.534595][ T1046] bond_opt_tryset_rtnl+0x92/0xf0 [bonding]
[ 29.535500][ T1046] bonding_sysfs_store_option+0x8a/0xf0 [bonding]
[ 29.536379][ T1046] kernfs_fop_write+0x276/0x410
[ 29.537057][ T1046] vfs_write+0x197/0x4a0
[ 29.537640][ T1046] ksys_write+0x141/0x1d0
[ 29.538251][ T1046] do_syscall_64+0x99/0x4f0
[ 29.538870][ T1046] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 29.539659][ T1046]
[ 29.539659][ T1046] other info that might help us debug this:
[ 29.539659][ T1046]
[ 29.540953][ T1046] Possible unsafe locking scenario:
[ 29.540953][ T1046]
[ 29.541883][ T1046] CPU0 CPU1
[ 29.542540][ T1046] ---- ----
[ 29.543209][ T1046] lock(&dev->addr_list_lock_key#4);
[ 29.543880][ T1046] lock(&dev->addr_list_lock_key#3);
[ 29.544873][ T1046] lock(&dev->addr_list_lock_key#4);
[ 29.545863][ T1046] lock(&dev->addr_list_lock_key#3);
[ 29.546525][ T1046]
[ 29.546525][ T1046] *** DEADLOCK ***
[ 29.546525][ T1046]
[ 29.547542][ T1046] 5 locks held by ifenslave/1046:
[ 29.548196][ T1046] #0: ffff88806044c478 (sb_writers#5){.+.+}, at: vfs_write+0x3bb/0x4a0
[ 29.549248][ T1046] #1: ffff88805af00890 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cf/0x410
[ 29.550343][ T1046] #2: ffff88805b8b54b0 (kn->count#157){.+.+}, at: kernfs_fop_write+0x1f2/0x410
[ 29.551575][ T1046] #3: ffffffffaecf4cf0 (rtnl_mutex){+.+.}, at: bond_opt_tryset_rtnl+0x5f/0xf0 [bonding]
[ 29.552819][ T1046] #4: ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding]
[ 29.554175][ T1046]
[ 29.554175][ T1046] stack backtrace:
[ 29.554907][ T1046] CPU: 0 PID: 1046 Comm: ifenslave Not tainted 5.5.0+ #322
[ 29.555854][ T1046] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 29.557064][ T1046] Call Trace:
[ 29.557504][ T1046] dump_stack+0x96/0xdb
[ 29.558054][ T1046] check_noncircular+0x371/0x450
[ 29.558723][ T1046] ? print_circular_bug.isra.35+0x310/0x310
[ 29.559486][ T1046] ? hlock_class+0x130/0x130
[ 29.560100][ T1046] ? __lock_acquire+0x2d8d/0x3de0
[ 29.560761][ T1046] __lock_acquire+0x2d8d/0x3de0
[ 29.561366][ T1046] ? register_lock_class+0x14d0/0x14d0
[ 29.562045][ T1046] ? find_held_lock+0x39/0x1d0
[ 29.562641][ T1046] lock_acquire+0x164/0x3b0
[ 29.563199][ T1046] ? dev_mc_sync_multiple+0x95/0x120
[ 29.563872][ T1046] _raw_spin_lock+0x30/0x70
[ 29.564464][ T1046] ? dev_mc_sync_multiple+0x95/0x120
[ 29.565146][ T1046] dev_mc_sync_multiple+0x95/0x120
[ 29.565793][ T1046] bond_enslave+0x448d/0x47b0 [bonding]
[ 29.566487][ T1046] ? bond_update_slave_arr+0x940/0x940 [bonding]
[ 29.567279][ T1046] ? bstr_printf+0xc20/0xc20
[ 29.567857][ T1046] ? stack_trace_consume_entry+0x160/0x160
[ 29.568614][ T1046] ? deactivate_slab.isra.77+0x2c5/0x800
[ 29.569320][ T1046] ? check_chain_key+0x236/0x5d0
[ 29.569939][ T1046] ? sscanf+0x93/0xc0
[ 29.570442][ T1046] ? vsscanf+0x1e20/0x1e20
[ 29.571003][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding]
[ ... ]
Fixes: ab92d68fc22f ("net: core: add generic lockdep keys")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to ensure that the default VID is untagged otherwise the switch
will be sending tagged frames and the results can be problematic. This
is especially true with b53 switches that use VID 0 as their default
VLAN since VID 0 has a special meaning.
Fixes: fea83353177a ("net: dsa: b53: Fix default VLAN ID")
Fixes: 061f6a505ac3 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
synchronize_net() is a wrapper around synchronize_rcu(), so there's no
point in having synchronize_net and synchronize_rcu back to back,
despite the documentation comment suggesting maybe it's somewhat useful,
"Wait for packets currently being received to be done." This commit
removes the extra call.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It turns out there's an easy way to get packets queued up while still
having an MTU of zero, and that's via persistent keep alive. This commit
makes sure that in whatever condition, we don't wind up dividing by
zero. Note that an MTU of zero for a wireguard interface is something
quasi-valid, so I don't think the correct fix is to limit it via
min_mtu. This can be reproduced easily with:
ip link add wg0 type wireguard
ip link add wg1 type wireguard
ip link set wg0 up mtu 0
ip link set wg1 up
wg set wg0 private-key <(wg genkey)
wg set wg1 listen-port 1 private-key <(wg genkey) peer $(wg show wg0 public-key)
wg set wg0 peer $(wg show wg1 public-key) persistent-keepalive 1 endpoint 127.0.0.1:1
However, while min_mtu=0 seems fine, it makes sense to restrict the
max_mtu. This commit also restricts the maximum MTU to the greatest
number for which rounding up to the padding multiple won't overflow a
signed integer. Packets this large were always rejected anyway
eventually, due to checks deeper in, but it seems more sound not to even
let the administrator configure something that won't work anyway.
We use this opportunity to clean up this function a bit so that it's
clear which paths we're expecting.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is a small optimization that prevents more expensive comparisons
from happening when they are no longer necessary, by clearing the
last_under_load variable whenever we wind up in a state where we were
under load but we no longer are.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Suggested-by: Matt Dunwoodie <ncon@noconroy.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mii management register in iproc mdio block
does not have a retention register so it is lost on suspend.
Save and restore value of register while resuming from suspend.
Fixes: bb1a619735b4 ("net: phy: Initialize mdio clock at probe function")
Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
fix static checker warning:
drivers/net/ethernet/aquantia/atlantic/aq_filters.c:166 aq_check_approve_fvlan()
error: passing untrusted data to 'test_bit()'
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 7975d2aff5af: ("net: aquantia: add support of rx-vlan-filter offload")
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
during hibernation freeze, aq_nic_stop could be invoked
on a stopped device. That may cause panic on access to
not yet allocated vector/ring structures.
Add a check to stop device if it is not yet stopped.
Similiarly after freeze in hibernation thaw, aq_nic_start
could be invoked on a not initialized net device.
Result will be the same.
Add a check to start device if it is initialized.
In our case, this is the same as started.
Fixes: 8aaa112a57c1 ("net: atlantic: refactoring pm logic")
Signed-off-by: Pavel Belous <pbelous@marvell.com>
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Code inspection found that in case of mapping error we do return current
'ret' value. But beside error, it is used to count number of descriptors
allocated for the packet. In that case map_skb function could return '1'.
Changing it to return zero (number of mapped descriptors for skb)
Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Pavel Belous <pbelous@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb->len is used to calculate statistics after xmit invocation.
Under a stress load it may happen that skb will be xmited,
rx interrupt will come and skb will be freed, all before xmit function
is even returned.
Eventually, skb->len will access unallocated area.
Moving stats calculation into tx_clean routine.
Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
Reported-by: Christophe Vu-Brugier <cvubrugier@fastmail.fm>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Pavel Belous <pbelous@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add checks to not enable multiple loopback modes simultaneously,
It was also discovered that for dma loopback to function correctly
promisc mode should be enabled on device.
Fixes: ea4b4d7fc106 ("net: atlantic: loopback tests via private flags")
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Clock adjustment data should be passed to FW as well, otherwise in some
cases a drift was observed when using GPIO features.
Signed-off-by: Egor Pomozov <epomozov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Artificial HW reliability tests revealed a possible hangup in
the driver. Normally, when device disappears from bus, all
register reads returns 0xFFFFFFFF.
At remote procedure invocation towards FW there is a logic
where result is compared with -1 in a loop.
That caused an infinite loop if hardware due to some issues
disappears from bus.
Add extra result checks to prevent this.
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Yet another checksum offload compatibility issue was found.
The known issue is that AQC HW marks tcp packets with 0xFFFF checksum
as invalid (1). This is workarounded in driver, passing all the suspicious
packets up to the stack for further csum validation.
Another HW problem (2) is that it hides invalid csum of LRO aggregated
packets inside of the individual descriptors. That was workarounded
by forced scan of all LRO descriptors for checksum errors.
However the scan logic was joint for both LRO and multi-descriptor
packets (jumbos). And this causes the issue.
We have to drop LRO packets with the detected bad checksum
because of (2), but we have to pass jumbo packets to stack because of (1).
When using windows tcp partner with jumbo frames but with LSO disabled
driver discards such frames as bad checksummed. But only LRO frames
should be dropped, not jumbos.
On such a configurations tcp stream have a chance of drops and stucks.
(1) 76f254d4afe2 ("net: aquantia: tcp checksum 0xffff being handled incorrectly")
(2) d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum")
Fixes: d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum")
Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Because of autosuspend, at91ether_start is called with clocks disabled.
Ensure that pm_runtime doesn't suspend the interface as soon as it is
opened as there is no pm_runtime support is the other relevant parts of the
platform support for at91rm9200.
Fixes: d54f89af6cc4 ("net: macb: Add pm runtime support")
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The IPv6 address defined in struct in6_addr is specified as
big endian, but there is no specified endian in struct
hclge_fd_rule_tuples, so it will cause a problem if directly
use memcpy() to copy ipv6 address between these two structures
since this field in struct hclge_fd_rule_tuples is little endian.
This patch fixes this problem by using be32_to_cpu() to convert
endian of IPv6 address of struct in6_addr before copying.
Fixes: d93ed94fbeaf ("net: hns3: add aRFS support for PF")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When enabling 4 TC after setting the bandwidth of VF, the bandwidth
of VF will resume to default value, because of the qset resources
changed in this case.
This patch fixes it by using a fixed VF's qset resources according to
HNAE3_MAX_TC macro.
Fixes: ee9e44248f52 ("net: hns3: add support for configuring bandwidth of VF on the host")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the current process, the management table is missing after the
IMP reset. This patch adds the management table to the reset process.
Fixes: f5aac71c0327 ("net: hns3: add manager table initialization for hardware")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Because wireguard is calling icmp from network device context, it should
use the ndo helper so that the rate limiting applies correctly. This
commit adds a small test to the wireguard test suite to ensure that the
new functions continue doing the right thing in the context of
wireguard. It does this by setting up a condition that will definately
evoke an icmp error message from the driver, but along a nat'd path.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Because sunvnet is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly. While we're
at it, doing the additional route lookup before calling icmp_ndo_send is
superfluous, since this is the job of the icmp code in the first place.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Because gtp is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is a collection of trivial fixes including fixing whitespace, typos,
function headers, reverse Christmas tree, etc.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Use the correct netif_msg_[tx,rx]_error() function to determine whether to
print the MDD event type.
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
1. Remove local variable num_q_vectors and use vsi->num_q_vectors instead
2. Remove local variable pf and pass vsi->back to ice_pf_to_dev
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Formatting strings in print function calls (like dev_info, dev_err, etc.)
can exceed 80 columns without making checkpatch unhappy. So remove
newlines where applicable and make print statements more compact.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Use ice_pf_to_dev(pf) instead of &pf->pdev->dev
Use ice_pf_to_dev(vsi->back) instead of &vsi->back->pdev->dev
When a pointer to the pf instance is available, use ice_pf_to_dev
instead of ice_hw_to_dev
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|