summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-09-17mptcp: Fix unsigned 'max_seq' compared with zero in mptcp_data_queue_ofoYe Bin
Fixes coccicheck warnig: net/mptcp/protocol.c:164:11-18: WARNING: Unsigned expression compared with zero: max_seq > 0 Fixes: ab174ad8ef76 ("mptcp: move ooo skbs into msk out of order queue") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17Merge branch ↵David S. Miller
'net-marvell-prestera-Add-Switchdev-driver-for-Prestera-family-ASIC-device-98DX3255-AC3x' Vadym Kochan says: ==================== net: marvell: prestera: Add Switchdev driver for Prestera family ASIC device 98DX3255 (AC3x) Marvell Prestera 98DX3255 integrates up to 24 ports of 1GbE with 8 ports of 10GbE uplinks or 2 ports of 40Gbps stacking for a largely wireless SMB deployment. Prestera Switchdev is a firmware based driver that operates via PCI bus. The current implementation supports only boards designed for the Marvell Switchdev solution and requires special firmware. This driver implementation includes only L1, basic L2 support, and RX/TX. The core Prestera switching logic is implemented in prestera_main.c, there is an intermediate hw layer between core logic and firmware. It is implemented in prestera_hw.c, the purpose of it is to encapsulate hw related logic, in future there is a plan to support more devices with different HW related configurations. The following Switchdev features are supported: - VLAN-aware bridge offloading - VLAN-unaware bridge offloading - FDB offloading (learning, ageing) - Switchport configuration The original firmware image is uploaded to the linux-firmware repository. PATCH v9: 1) Replace read_poll_timeout_atomic() by original 'do {} while()' loop because it works much better than read_poll_timeout_atomic() considering the TX rate. Also it fixes warning reported on v8. 2) Use ENOENT instead of EEXIST when item is not found in few places - prestera_hw.c and prestera_rxtx.c Patches updated: [1] net: marvell: prestera: Add driver for Prestera family ASIC devices PATCH v8: 1) Put license in one line. 2) Sort includes. 3) Add missing comma for last enum member 4) Return original error code from last called func in places where instead other error code was used. 5) Add comma for last member in initialized struct in prestera_hw.c 6) Do not initialize 'int err = 0' where it is not needed. 7) Simplify device-tree "marvell,prestera" node parsing by removing not needed checking on 'np == NULL'. 8) Use u32p_replace_bits() instead of open-coded ((word & ~mask) | val) 9) Use dev_warn_ratelimited() instead of pr_warn_ratelimited to indicate the device instance in prestera_rxtx.c 10) Simplify circular buffer list creation in prestera_sdma_{rx,tx}_init() by using do { } while (prev != tail) construction. 11) Use MSEC_PER_SEC instead of hard-coded 1000. 12) Use traditional error handling pattern: err = F(); if (err) return err; 13) Use ether_addr_copy() instead of memcpy() for mac FDB copying in prestera_hw.c 14) Drop swdev->ageing_time member which is not used. 15) Fix ageing macro to be in ms instead of seconds. Patches updated: [1] net: marvell: prestera: Add driver for Prestera family ASIC devices [2] net: marvell: prestera: Add PCI interface support [3] net: marvell: prestera: Add basic devlink support [4] net: marvell: prestera: Add ethtool interface support [5] net: marvell: prestera: Add Switchdev driver implementation PATCH v7: 1) Use ether_addr_copy() in prestera_main.c:prestera_port_set_mac_address() instead of memcpy(). 2) Removed not needed device's DMA address range check on dma_pool_alloc() in prestera_rxtx.c:prestera_sdma_buf_init(), this should be handled by dma_xxx() API considerig device's DMA mask. 3) Removed not needed device's DMA address range check on dma_map_single() in prestera_rxtx.c:prestera_sdma_rx_skb_alloc(), this should be handled by dma_xxx() API considerig device's DMA mask. 4) Add comment about port mac address limitation in the code where it is used and checked - prestera_main.c: - prestera_is_valid_mac_addr() - prestera_port_create() 5) Add missing destroy_workqueue(swdev_wq) in prestera_switchdev.c:prestera_switchdev_init() on error path handling. Patches updated: [1] net: marvell: prestera: Add driver for Prestera family ASIC devices [5] net: marvell: prestera: Add Switchdev driver implementation PATCH v6: 1) Use rwlock to protect port list on create/delete stages. The list is mostly readable by fw event handler or packets receiver, but updated only on create/delete port which are performed on switch init/fini stages. 2) Remove not needed variable initialization in prestera_dsa.c:prestera_dsa_parse() 3) Get rid of bounce buffer used by tx handler in prestera_rxtx.c, the bounce buffer should be handled by dma_xxx API via swiotlb. 4) Fix PRESTERA_SDMA_RX_DESC_PKT_LEN macro by using correct GENMASK(13, 0) in prestera_rxtx.c Patches updated: [1] net: marvell: prestera: Add driver for Prestera family ASIC devices PATCH v5: 0) add Co-developed tags for people who was involved in development. 1) Make SPDX license as separate comment 2) Change 'u8 *' -> 'void *', It allows to avoid not-needed u8* casting. 3) Remove "," in terminated enum's. 4) Use GENMASK(end, start) where it is applicable in. 5) Remove not-needed 'u8 *' casting. 6) Apply common error-check pattern 7) Use ether_addr_copy instead of memcpy 8) Use define for maximum MAC address range (255) 9) Simplify prestera_port_state_set() in prestera_main.c by using separate if-blocks for state setting: if (is_up) { ... } else { ... } which makes logic more understandable. 10) Simplify sdma tx wait logic when checking/updating tx_ring->burst. 11) Remove not-needed packed & aligned attributes 12) Use USEC_PER_MSEC as multiplier when converting ms -> usec on calling readl_poll_timeout. 13) Simplified some error path handling by simple return error code in. 14) Remove not-needed err assignment in. 15) Use dev_err() in prestera_devlink_register(...). Patches updated: [1] net: marvell: prestera: Add driver for Prestera family ASIC devices [2] net: marvell: prestera: Add PCI interface support [3] net: marvell: prestera: Add basic devlink support [4] net: marvell: prestera: Add ethtool interface support [5] net: marvell: prestera: Add Switchdev driver implementation PATCH v4: 1) Use prestera_ prefix in netdev_ops variable. 2) Kconfig: use 'default PRESTERA' build type for CONFIG_PRESTERA_PCI to be synced by default with prestera core module. 3) Use memcpy_xxio helpers in prestera_pci.c for IO buffer copying. 4) Generate fw image path via snprintf() instead of macroses. 5) Use pcim_ helpers in prestera_pci.c which simplified the probe/remove logic. 6) Removed not needed initializations of variables which are used in readl_poll_xxx() helpers. 7) Fixed few grammar mistakes in patch[2] description. 8) Export only prestera_ethtool_ops struct instead of each ethtool handler. 9) Add check for prestera_dev_check() in switchdev event handling to make sure there is no wrong topology. Patches updated: [1] net: marvell: prestera: Add driver for Prestera family ASIC devices [2] net: marvell: prestera: Add PCI interface support [4] net: marvell: prestera: Add ethtool interface support [5] net: marvell: prestera: Add Switchdev driver implementation PATCH v3: 1) Simplify __be32 type casting in prestera_dsa.c 2) Added per-patch changelog under "---" line. PATCH v2: 1) Use devlink_port_type_clear() 2) Add _MS prefix to timeout defines. 3) Remove not-needed packed attribute from the firmware ipc structs, also the firmware image needs to be uploaded too (will do it soon). 4) Introduce prestera_hw_switch_fini(), to be mirrored with init and do simple validation if the event handlers are unregistered. 5) Use kfree_rcu() for event handler unregistering. 6) Get rid of rcu-list usage when dealing with ports, not needed for now. 7) Little spelling corrections in the error/info messages. 8) Make pci probe & remove logic mirrored. 9) Get rid of ETH_FCS_LEN in headroom setting, not needed. PATCH: 1) Fixed W=1 warnings 2) Renamed PCI driver name to be more generic "Prestera DX" because there will be more devices supported. 3) Changed firmware image dir path: marvell/ -> mrvl/prestera/ to be aligned with location in linux-firmware.git (if such will be accepted). RFC v3: 1) Fix prestera prefix in prestera_rxtx.c 2) Protect concurrent access from multiple ports on multiple CPU system on tx path by spinlock in prestera_rxtx.c 3) Try to get base mac address from device-tree, otherwise use a random generated one. 4) Move ethtool interface support into separate prestera_ethtool.c file. 5) Add basic devlink support and get rid of physical port naming ops. 6) Add STP support in Switchdev driver. 7) Removed MODULE_AUTHOR 8) Renamed prestera.c -> prestera_main.c, and kernel module to prestera.ko RFC v2: 1) Use "pestera_" prefix in struct's and functions instead of mvsw_pr_ 2) Original series split into additional patches for Switchdev ethtool support. 3) Use major and minor firmware version numbers in the firmware image filename. 4) Removed not needed prints. 5) Use iopoll API for waiting on register's value in prestera_pci.c 6) Use standart approach for describing PCI ID matching section instead of using custom wrappers in prestera_pci.c 7) Add RX/TX support in prestera_rxtx.c. 8) Rewritten prestera_switchdev.c with following changes: - handle netdev events from prestera.c - use struct prestera_bridge for bridge objects, and get rid of struct prestera_bridge_device which may confuse. - use refcount_t 9) Get rid of macro usage for sending fw requests in prestera_hw.c 10) Add base_mac setting as module parameter. base_mac is required for generation default port's mac. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17dt-bindings: marvell,prestera: Add description for device-tree bindingsVadym Kochan
Add brief description how to configure base mac address binding in device-tree. Describe requirement for the PCI port which is connected to the ASIC, to allow access to the firmware related registers. Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net: marvell: prestera: Add Switchdev driver implementationVadym Kochan
The following features are supported: - VLAN-aware bridge offloading - VLAN-unaware bridge offloading - FDB offloading (learning, ageing) - Switchport configuration Currently there are some limitations like: - Only 1 VLAN-aware bridge instance supported - FDB ageing timeout parameter is set globally per device Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Co-developed-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu> Signed-off-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu> Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu> Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net: marvell: prestera: Add ethtool interface supportVadym Kochan
The ethtool API provides support for the configuration of the following features: speed and duplex, auto-negotiation, MDI-x, forward error correction, port media type. The API also provides information about the port status, hardware and software statistic. The following limitation exists: - port media type should be configured before speed setting - ethtool -m option is not supported - ethtool -p option is not supported - ethtool -r option is supported for RJ45 port only - the following combination of parameters is not supported: ethtool -s sw1pX port XX autoneg on - forward error correction feature is supported only on SFP ports, 10G speed - auto-negotiation and MDI-x features are not supported on Copper-to-Fiber SFP module Co-developed-by: Andrii Savka <andrii.savka@plvision.eu> Signed-off-by: Andrii Savka <andrii.savka@plvision.eu> Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net: marvell: prestera: Add basic devlink supportVadym Kochan
Add very basic support for devlink interface: - driver name - fw version - devlink ports Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net: marvell: prestera: Add PCI interface supportVadym Kochan
Add PCI interface driver for Prestera Switch ASICs family devices, which provides: - Firmware loading mechanism - Requests & events handling to/from the firmware - Access to the firmware on the bus level The firmware has to be loaded each time the device is reset. The driver is loading it from: /lib/firmware/mrvl/prestera/mvsw_prestera_fw-v{MAJOR}.{MINOR}.img The full firmware image version is located within the internal header and consists of 3 numbers - MAJOR.MINOR.PATCH. Additionally, driver has hard-coded minimum supported firmware version which it can work with: MAJOR - reflects the support on ABI level between driver and loaded firmware, this number should be the same for driver and loaded firmware. MINOR - this is the minimum supported version between driver and the firmware. PATCH - indicates only fixes, firmware ABI is not changed. Firmware image file name contains only MAJOR and MINOR numbers to make driver be compatible with any PATCH version. Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net: marvell: prestera: Add driver for Prestera family ASIC devicesVadym Kochan
Marvell Prestera 98DX326x integrates up to 24 ports of 1GbE with 8 ports of 10GbE uplinks or 2 ports of 40Gbps stacking for a largely wireless SMB deployment. The current implementation supports only boards designed for the Marvell Switchdev solution and requires special firmware. The core Prestera switching logic is implemented in prestera_main.c, there is an intermediate hw layer between core logic and firmware. It is implemented in prestera_hw.c, the purpose of it is to encapsulate hw related logic, in future there is a plan to support more devices with different HW related configurations. This patch contains only basic switch initialization and RX/TX support over SDMA mechanism. Currently supported devices have DMA access range <= 32bit and require ZONE_DMA to be enabled, for such cases SDMA driver checks if the skb allocated in proper range supported by the Prestera device. Also meanwhile there is no TX interrupt support in current firmware version so recycling work is scheduled on each xmit. Port's mac address is generated from the switch base mac which may be provided via device-tree (static one or as nvme cell), or randomly generated. This is required by the firmware. Co-developed-by: Andrii Savka <andrii.savka@plvision.eu> Signed-off-by: Andrii Savka <andrii.savka@plvision.eu> Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Co-developed-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu> Signed-off-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu> Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu> Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu> Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17genetlink: Remove unused function genl_err_attr()YueHaibing
It is never used, so can remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net/sched: Remove unused function qdisc_queue_drop_head()YueHaibing
It is not used since commit a09ceb0e0814 ("sched: remove qdisc->drop") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17selftests: mptcp: interpret \n as a new lineMatthieu Baerts
In case of errors, this message was printed: (...) # read: Resource temporarily unavailable # client exit code 0, server 3 # \nnetns ns1-0-BJlt5D socket stat for 10003: (...) Obviously, the idea was to add a new line before the socket stat and not print "\nnetns". Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN") Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net/packet: Fix a comment about mac_headerXie He
1. Change all "dev->hard_header" to "dev->header_ops" 2. On receiving incoming frames when header_ops == NULL: The comment only says what is wrong, but doesn't say what is right. This patch changes the comment to make it clear what is right. 3. On transmitting and receiving outgoing frames when header_ops == NULL: The comment explains that the LL header will be later added by the driver. However, I think it's better to simply say that the LL header is invisible to us. This phrasing is better from a software engineering perspective, because this makes it clear that what happens in the driver should be hidden from us and we should not care about what happens internally in the driver. 4. On resuming the LL header (for RAW frames) when header_ops == NULL: The comment says we are "unlikely" to restore the LL header. However, we should say that we are "unable" to restore it. It's not possible (rather than not likely) to restore it, because: 1) There is no way for us to restore because the LL header internally processed by the driver should be invisible to us. 2) In function packet_rcv and tpacket_rcv, the code only tries to restore the LL header when header_ops != NULL. Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17Merge branch 'net-hns3-updates-for-next'David S. Miller
Huazhong Tan says: ==================== net: hns3: updates for -next There are some optimizations related to IO path. Change since V1: - fixes a unsuitable handling in hns3_lb_clear_tx_ring() of #6 which pointed out by Saeed Mahameed. previous version: V1: https://patchwork.ozlabs.org/project/netdev/cover/1600085217-26245-1-git-send-email-tanhuazhong@huawei.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net: hns3: use napi_consume_skb() when cleaning tx descYunsheng Lin
Use napi_consume_skb() to batch consuming skb when cleaning tx desc in NAPI polling. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net: hns3: use writel() to optimize the barrier operationYunsheng Lin
writel() can be used to order I/O vs memory by default when writing portable drivers. Use writel() to replace wmb() + writel_relaxed(), and writel() is dma_wmb() + writel_relaxed() for ARM64, so there is an optimization here because dma_wmb() is a lighter barrier than wmb(). Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net: hns3: optimize the rx clean processYunsheng Lin
Currently HNS3_RING_RX_RING_FBDNUM_REG register is read to determine how many rx desc can be cleaned. To avoid the register read operation in the critical data path, use the valid bit in the rx desc to determine if a specific rx desc can be cleaned. The hns3 driver clear valid bit in the rx desc before notifying the rx desc to the hw, and hw will only set the valid bit of the rx desc after corresponding buffer is filled with packet data and other field in the rx desc is set accordingly. Add hns3_rx_ring_move_fw() function to clear the valid bit in the rx desc before moving rx ring's next_to_clean forward to avoid double cleaning a rx desc, also add a dma_rmb() barrier in hns3_handle_rx_bd() to make sure valid bit is set before reading other field in the rx desc. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net: hns3: optimize the tx clean processYunsheng Lin
Currently HNS3_RING_TX_RING_HEAD_REG register is read to determine how many tx desc can be cleaned. To avoid the register read operation in the critical data path, use the valid bit in the tx desc to determine if a specific tx desc can be cleaned. The hns3 driver sets valid bit in the tx desc before ringing a doorbell to the hw, and hw will only clear the valid bit of the tx desc after corresponding packet is sent out to the wire. And because next_to_use for tx ring is a changing variable when the driver is filling the tx desc, so reuse the pull_len for rx ring to record the tx desc that has notified to the hw, so that hns3_nic_reclaim_desc() can decide how many tx desc's valid bit need checking when reclaiming tx desc. And io_err_cnt stat is also removed for it is not used anymore. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net: hns3: batch tx doorbell operationYunsheng Lin
Use netdev_xmit_more() to defer the tx doorbell operation when the skb is passed to the driver continuously. By doing this we can improve the overall xmit performance by avoid some doorbell operations. Also, the tx_err_cnt stat is not used, so rename it to tx_more stat. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17net: hns3: batch the page reference count updatesYunsheng Lin
Batch the page reference count updates instead of doing them one at a time. By doing this we can improve the overall receive performance by avoid some atomic increment operations when the rx page is reused. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16cxgb4vf: convert to use DEFINE_SEQ_ATTRIBUTE macroLiu Shixin
Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16ionic: dynamic interrupt moderationShannon Nelson
Use the dim library to manage dynamic interrupt moderation in ionic. v3: rebase v2: untangled declarations in ionic_dim_work() Signed-off-by: Shannon Nelson <snelson@pensando.io> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16net/smc: check variable before dereferencing in smc_close.cKarsten Graul
smc->clcsock and smc->clcsock->sk are used before the check if they can be dereferenced. Fix this by checking the variables first. Fixes: a60a2b1e0af1 ("net/smc: reduce active tcp_listen workers") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16net: bridge: mcast: don't ignore return value of __grp_src_toex_exclNikolay Aleksandrov
When we're handling TO_EXCLUDE report in EXCLUDE filter mode we should not ignore the return value of __grp_src_toex_excl() as we'll miss sending notifications about group changes. Fixes: 5bf1e00b6849 ("net: bridge: mcast: support for IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16net: stmmac: Add support to Ethtool get/set ring parametersSong, Yoong Siang
This patch add support to --show-ring & --set-ring Ethtool functions: - Adding min, max, power of two check to new ring parameter's value. - Bring down the network interface before changing the value of ring parameters. - Bring up the network interface after changing the value of ring parameters. Signed-off-by: Song, Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16Merge branch 'mlxsw-Refactor-headroom-management'David S. Miller
Ido Schimmel says: ==================== mlxsw: Refactor headroom management Petr says: On Spectrum, port buffers, also called port headroom, is where packets are stored while they are parsed and the forwarding decision is being made. For lossless traffic flows, in case shared buffer admission is not allowed, headroom is also where to put the extra traffic received before the sent PAUSE takes effect. Another aspect of the port headroom is the so called internal buffer, which is used for egress mirroring. Linux supports two DCB interfaces related to the headroom: dcbnl_setbuffer for configuration, and dcbnl_getbuffer for inspection. In order to make it possible to implement these interfaces, it is first necessary to clean up headroom handling, which is currently strewn in several places in the driver. The end goal is an architecture whereby it is possible to take a copy of the current configuration, adjust parameters, and then hand the proposed configuration over to the system to implement it. When everything works, the proposed configuration is accepted and saved. First, this centralizes the reconfiguration handling to one function, which takes care of coordinating buffer size changes and priority map changes to avoid introducing drops. Second, the fact that the configuration is all in one place makes it easy to keep a backup and handle error path rollbacks, which were previously hard to understand. Patch #1 introduces struct mlxsw_sp_hdroom, which will keep port headroom configuration. Patch #2 unifies handling of delay provision between PFC and PAUSE. From now on, delay is to be measured in bytes of extra space, and will not include MTU. PFC handler sets the delay directly from the parameter it gets through the DCB interface. For PAUSE, MLXSW_SP_PAUSE_DELAY is converted to have the same meaning. In patches #3-#5, MTU, lossiness and priorities are gradually moved over to struct mlxsw_sp_hdroom. In patches #6-#11, handling of buffer resizing and priority maps is moved from spectrum.c and spectrum_dcb.c to spectrum_buffers.c. The API is gradually adapted so that struct mlxsw_sp_hdroom becomes the main interface through which the various clients express how the headroom should be configured. Patch #12 is a small cleanup that the previous transformation made possible. In patch #13, the port init code becomes a boring client of the headroom code, instead of rolling its own thing. Patches #14 and #15 move handling of internal mirroring buffer to the new headroom code as well. Previously, this code was in the SPAN module. This patchset converts the SPAN module to another boring client of the headroom code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum_buffers: Manage internal buffer in the hdroom codePetr Machata
Traffic mirroring modes that are in-chip implemented on egress need an internal buffer to work. As the only client, the SPAN module was managing the buffer so far. However logically it belongs to the buffers module. E.g. buffer size validation needs to take the size of the internal buffer into account. Therefore move the related code from SPAN to spectrum_buffers. Move over the callbacks that determine the minimum buffer size as a function of maximum speed and MTU. Add a field describing the internal buffer to struct mlxsw_sp_hdroom. Extend mlxsw_sp_hdroom_bufs_reset_sizes() to take care of sizing the internal buffer as well. Change the SPAN module to invoke that function and mlxsw_sp_hdroom_configure() like all the other hdroom clients. Drop the now-unnecessary mlxsw_sp_span_port_buffer_disable(). Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum_buffers: Introduce shared buffer opsPetr Machata
The size of the internal buffer is currently calculated in the SPAN module. Logically it belongs to the spectrum_buffers module, where it should be moved. However, that being a chip-specific operation, it needs dynamic dispatch. There currently is a chip-specific structure for description of shared buffer values, struct mlxsw_sp_sb_vals. However placing ops into this structure would be confusing. Therefore introduce a new per-chip structure, currently empty, and initialize the ops pointer as appropriate. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum_buffers: Convert mlxsw_sp_port_headroom_init()Petr Machata
Currently mlxsw_sp_port_headroom_init() configures both priomap and buffers by hand. Additionally, for port buffers, it configures buffer 0 with a size that it will never again have if PFC configuration is touched. Rewrite the init code to become a client of the new hdroom code. The only difference in invocation is that the configuration is forced, so that it is issued even if the desired configuration happens to match what is contained in (hitherto not initialized with meaningful values) mlxsw_sp_port->hdroom. Since now mlxsw_sp_port_headroom_init() initializes all the PG buffers to meaningful values, mlxsw_sp_hdroom_configure_buffers() can avoid querying the current configuration, and can fill the whole PBMC itself. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum_buffers: Inline mlxsw_sp_sb_max_headroom_cells()Petr Machata
This function is now only used from the buffers module, and is a trivial field reference. Just inline it and drop the related artifacts. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum_buffers: Move here the new headroom codePetr Machata
Move all the headroom code to the spectrum_buffers module, where it belongs. Rename mlxsw_sp_pg_buf_threshold_get() and mlxsw_sp_pg_buf_pack() to ..._hdroom_... to match the naming convention of the new headroom code. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum: Move here the three-step headroom configuration from DCBPetr Machata
The ETS handler performs the headroom configuration in three steps: first it resizes the buffers and adds any new ones. Then it redirects priorities to the new buffers. And finally it sets the size of the now-unused buffers to zero. This way no packet drops are introduced. This sort of careful approach will also be useful for configuring port buffer sizes and priority map by hand, through dcbnl_setbuffer. Therefore move the code from the DCB handler to the generic headroom function. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum_dcb: Convert mlxsw_sp_port_pg_prio_map() to hdroom codePetr Machata
The new hdroom code has certain conventions: iteration over priorities is done through a variable named `prio', configuration is not pushed unless it is dirty, but a `force' flag can be used to override this, updated configuration is written to port. Convert the function mlxsw_sp_port_pg_prio_map() to use these conventions and rename appropriately to fit in. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum_dcb: Convert ETS handler fully to mlxsw_sp_hdroom_configure()Petr Machata
The ETS handler performs the headroom configuration in three steps: first it resizes the buffers and adds any new ones. Then it redirects priorities to the new buffers. And finally it sets the size of the now-unused buffers to zero. This way no packet drops are introduced. Both of the buffer size configuration operations are simply buffer size configurations, there is no material difference between setting buffers to zero and any other value. Therefore simply invoke the same mlxsw_sp_hdroom_configure(), and drop mlxsw_sp_port_pg_destroy() and mlxsw_sp_ets_has_pg() which are now unused. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum: Split headroom autoresize out of buffer configurationPetr Machata
Split mlxsw_sp_port_headroom_set() to three functions. mlxsw_sp_hdroom_bufs_reset_sizes() changes the sizes of the individual PG buffers, and mlxsw_sp_hdroom_configure_buffers() will actually apply the configuration. A third function, mlxsw_sp_hdroom_bufs_fit(), verifies that the requested buffer configuration matches total headroom size requirements. Add wrappers, mlxsw_sp_hdroom_configure() and __..., that will eventually perform full headroom configuration, but for now, only have them verify the configured headroom size, and invoke mlxsw_sp_hdroom_configure_buffers(). Have them take the `force` argument to prepare for a later patch, even though it is currently unused. Note that the loop in mlxsw_sp_hdroom_configure_buffers() only goes through DCBX_MAX_BUFFERS. Since there is no logic to configure the control buffer, it needs to keep the values queried from the FW. Eventually this function should configure all the PGs. Note that conversion of __mlxsw_sp_dcbnl_ieee_setets() is not trivial. That function performs the headroom configuration in three steps: first it resizes the buffers and adds any new ones. Then it redirects priorities to the new buffers. And finally it sets the size of the now-unused buffers to zero. This way no packet drops are introduced. So after invoking mlxsw_sp_hdroom_bufs_reset_sizes(), tweak the configuration to keep the old sizes of PG buffers for those buffers whose size was set to zero. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum: Track buffer sizes in struct mlxsw_sp_hdroomPetr Machata
So far, port buffers were always autoconfigured. When dcbnl_setbuffer callback is implemented, it will allow the user to change the buffer size configuration by hand. The sizes therefore need to be a configuration parameter, not always deduced, and therefore belong to struct mlxsw_sp_hdroom, where the configuration routine should take them from. Update mlxsw_sp_port_headroom_set() to update these sizes. Have the function update the sizes even for the case that a given buffer is not used. Additionally, change the loop iteration end to DCBX_MAX_BUFFERS instead of IEEE_8021QAZ_MAX_TCS. The value is the same, but the semantics differ. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum: Track lossiness in struct mlxsw_sp_hdroomPetr Machata
Client-side configuration has lossiness as an attribute of a priority. Therefore add a "lossy" attribute to struct mlxsw_sp_hdroom_prio. To a Spectrum ASIC, lossiness is a feature of a port buffer. Therefore add struct mlxsw_sp_hdroom_buf, which in the following patches will get more attributes, but right now only use it to track port buffer lossiness. Instead of passing around the primary indicators of PFC and pause_en, add a function mlxsw_sp_hdroom_bufs_reset_lossiness() to compute the buffer lossiness from the priority map and priority lossiness. Change mlxsw_sp_port_headroom_set() to take the buffer lossy flag from the headroom configuration. Have the PFC and pause handlers configure priority lossiness in mlxsw_sp_hdroom, from where it will propagate. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum: Track priorities in struct mlxsw_sp_hdroomPetr Machata
The mapping from priorities to buffers determines which buffers should be configured. Lossiness of these priorities combined with the mapping determines whether a given buffer should be lossy. Currently this configuration is stored implicitly in DCB ETS, PFC and ethtool PAUSE configuration. Keeping it together with the rest of the headroom configuration and deriving it as needed from PFC / ETS / PAUSE will make things clearer. To that end, add a field "prios" to struct mlxsw_sp_hdroom. Previously, __mlxsw_sp_port_headroom_set() took prio_tc as an argument, and assumed that the same mapping as we use on the egress should be used on ingress as well. Instead, track this configuration at each priority, so that it can be adjusted flexibly. In the following patches, as dcbnl_setbuffer is implemented, it will need to store its own mapping, and it will also be sometimes necessary to revert back to the original ETS mapping. Therefore track two buffer indices: the one for chip configuration (buf_idx), and the source one (ets_buf_idx). Introduce a function to configure the chip-level buffer index, and for now have it simply copy the ETS mapping over to the chip mapping. Update the ETS handler to project prio_tc to the ets_buf_idx and invoke the buf_idx recomputation. Now that there is a canonical place to look for this configuration, mlxsw_sp_port_headroom_set() does not need to invent def_prio_tc to use if DCB is compiled out. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum: Track MTU in struct mlxsw_sp_hdroomPetr Machata
MTU influences sizes of auto-allocated buffers. Make it a part of port buffer configuration and have __mlxsw_sp_port_headroom_set() take it from there, instead of as an argument. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum: Unify delay handling between PFC and pausePetr Machata
When a priority is marked as lossless using DCB PFC, or when pause frames are enabled on a port, mlxsw adds to port buffers an extra space to cover the traffic that will arrive between the time that a pause or PFC frame is emitted, and the time traffic actually stops. This is called the delay. The concept is the same in PFC and pause, however the way the extra buffer space is calculated differs. In this patch, unify this handling. Delay is to be measured in bytes of extra space, and will not include MTU. PFC handler sets the delay directly from the parameter it gets through the DCB interface. To convert pause handler, move MLXSW_SP_PAUSE_DELAY to ethtool module, convert to bytes, and reduce it by maximum MTU, and divide by two. Then it has the same meaning as the delay_bytes set by the PFC handler. Keep the delay_bytes value in struct mlxsw_sp_hdroom introduced in the previous patch. Change PFC and pause handlers to store the new delay value there and have __mlxsw_sp_port_headroom_set() take it from there. Instead of mlxsw_sp_pfc_delay_get() and mlxsw_sp_pg_buf_delay_get(), introduce mlxsw_sp_hdroom_buf_delay_get() to calculate the delay provision. Drop the unnecessary MLXSW_SP_CELL_FACTOR, and instead add an explanatory comment describing the formula used. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16mlxsw: spectrum_buffers: Add struct mlxsw_sp_hdroomPetr Machata
The port headroom handling is currently strewn across several modules and tricky to follow: MTU, DCB PFC, DCB ETS and ethtool pause all influence the settings, and then there is the completely separate initial configuraion in spectrum_buffers. A following patch will implement the dcbnl_setbuffer callback, which is going to further complicate the landscape. In order to simplify work with port buffers, the following patches are going to centralize all port-buffer handling in spectrum_buffers. As a first step, introduce a (currently empty) struct mlxsw_sp_hdroom that will keep the configuration parameters, and allocate and free it in appropriate places. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16Merge tag 'mlx5-updates-2020-09-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-09-15 Various updates to mlx5 driver, 1) Eli adds support for TC trap action. 2) Eran, minor improvements to clock.c code structure 3) Better handling of error reporting in LAG from Jianbo 4) IPv6 traffic class (DSCP) header rewrite support from Maor 5) Ofer Levi adds support for CQE compression of multi-strides packets 6) Vu, Enables use of vport meta data by default. 7) Some minor code cleanup ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15Merge branch 'nexthop-Small-changes'David S. Miller
Ido Schimmel says: ==================== nexthop: Small changes This patch set contains a few small changes that I split out of the RFC I sent last week [1]. Main change is the conversion of the nexthop notification chain to a blocking chain so that it could be reused by device drivers for nexthop objects programming in the future. Tested with fib_nexthops.sh: Tests passed: 164 Tests failed: 0 [1] https://lore.kernel.org/netdev/20200908091037.2709823-1-idosch@idosch.org/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15selftests: fib_nexthops: Test cleanup of FDB entries following nexthop deletionIdo Schimmel
Commit c7cdbe2efc40 ("vxlan: support for nexthop notifiers") registered a listener in the VXLAN driver to the nexthop notification chain. Its purpose is to cleanup FDB entries that use a nexthop that is being deleted. Test that such FDB entries are removed when the nexthop group that they use is deleted. Test that entries are not deleted when a single nexthop in the group is deleted. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15nexthop: Only emit a notification when nexthop is actually deletedIdo Schimmel
Currently, the in-kernel delete notification is emitted from the error path of nexthop_add() and replace_nexthop(), which can be confusing to in-kernel listeners as they are not familiar with the nexthop. Instead, only emit the notification when the nexthop is actually deleted. The following sub-cases are covered: 1. User space deletes the nexthop 2. The nexthop is deleted by the kernel due to a netdev event (e.g., nexthop device going down) 3. A group is deleted because its last nexthop is being deleted 4. The network namespace of the nexthop device is deleted Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15nexthop: Convert to blocking notification chainIdo Schimmel
Currently, the only listener of the nexthop notification chain is the VXLAN driver. Subsequent patches will add more listeners (e.g., device drivers such as netdevsim) that need to be able to block when processing notifications. Therefore, convert the notification chain to a blocking one. This is safe as notifications are always emitted from process context. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15nexthop: Remove NEXTHOP_EVENT_ADDIdo Schimmel
Not used anywhere. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15nexthop: Remove unused function declaration from header fileIdo Schimmel
Not used or implemented anywhere. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15chelsio/chtls: Re-add dependencies on CHELSIO_T4 to fix modular CHELSIO_T4Geert Uytterhoeven
As CHELSIO_INLINE_CRYPTO is bool, and CHELSIO_T4 is tristate, the dependency of CHELSIO_INLINE_CRYPTO on CHELSIO_T4 is not sufficient to protect CRYPTO_DEV_CHELSIO_TLS and CHELSIO_IPSEC_INLINE. The latter two are also tristate, hence if CHELSIO_T4=n, they cannot be builtin, as that would lead to link failures like: drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c:259: undefined reference to `cxgb4_port_viid' and drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c:752: undefined reference to `cxgb4_reclaim_completed_tx' Fix this by re-adding dependencies on CHELSIO_T4 to tristate symbols. The dependency of CHELSIO_INLINE_CRYPTO on CHELSIO_T4 is kept to avoid asking the user. Fixes: 6bd860ac1c2a0ec2 ("chelsio/chtls: CHELSIO_INLINE_CRYPTO should depend on CHELSIO_T4") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15Merge branch 'mlxsw-Introduce-fw_fatal-health-reporter-and-test-cmd-toDavid S. Miller
-trigger-test-event' Ido Schimmel says: ==================== mlxsw: Introduce fw_fatal health reporter and test cmd to trigger test event Jiri says: This patch set introduces a health reporter for mlxsw that reports FW fatal events. Alongside that, it introduces a test command that is used to trigger a dummy FW fatal event by user: $ sudo devlink health test pci/0000:03:00.0 reporter fw_fatal $ devlink health pci/0000:03:00.0: reporter fw_fatal state error error 1 recover 0 last_dump_date 2020-07-27 last_dump_time 16:33:27 auto_dump true $ sudo devlink health dump show pci/0000:03:00.0 reporter fw_fatal -j -p { "irisc_id": 0, "event": [ "id": 3 ], "method": "query", "long_process": false, "command_type": "mad", "reg_attr_id": 0 } As a dependency, the FW validation and flashing is moved to core.c. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15mlxsw: core: Introduce fw_fatal health reporterJiri Pirko
Introduce devlink health reporter to report FW fatal events. Implement the event listener using MFDE trap and enable the events to be propagated using MFGD register configuration. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>