summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-01-19bonding: add a vlan+srcmac tx hashing optionJarod Wilson
This comes from an end-user request, where they're running multiple VMs on hosts with bonded interfaces connected to some interest switch topologies, where 802.3ad isn't an option. They're currently running a proprietary solution that effectively achieves load-balancing of VMs and bandwidth utilization improvements with a similar form of transmission algorithm. Basically, each VM has it's own vlan, so it always sends its traffic out the same interface, unless that interface fails. Traffic gets split between the interfaces, maintaining a consistent path, with failover still available if an interface goes down. Unlike bond_eth_hash(), this hash function is using the full source MAC address instead of just the last byte, as there are so few components to the hash, and in the no-vlan case, we would be returning just the last byte of the source MAC as the hash value. It's entirely possible to have two NICs in a bond with the same last byte of their MAC, but not the same MAC, so this adjustment should guarantee distinct hashes in all cases. This has been rudimetarily tested to provide similar results to the proprietary solution it is aiming to replace. A patch for iproute2 is also posted, to properly support the new mode there as well. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Thomas Davis <tadavis@lbl.gov> Signed-off-by: Jarod Wilson <jarod@redhat.com> Link: https://lore.kernel.org/r/20210119010927.1191922-1-jarod@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: fix GSO for SG-enabled devicesPaolo Abeni
The commit dbd50f238dec ("net: move the hsize check to the else block in skb_segment") introduced a data corruption for devices supporting scatter-gather. The problem boils down to signed/unsigned comparison given unexpected results: if signed 'hsize' is negative, it will be considered greater than a positive 'len', which is unsigned. This commit addresses resorting to the old checks order, so that 'hsize' never has a negative value when compared with 'len'. v1 -> v2: - reorder hsize checks instead of explicit cast (Alex) Bisected-by: Matthieu Baerts <matthieu.baerts@tessares.net> Fixes: dbd50f238dec ("net: move the hsize check to the else block in skb_segment") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/861947c2d2d087db82af93c21920ce8147d15490.1611074818.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19octeontx2-af: Remove unneeded semicolonsXu Wang
fix semicolon.cocci warnings: ./drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c:272:2-3: Unneeded semicolon drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c:1788:3-4: Unneeded semicolon drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c:1809:3-4: Unneeded semicolon drivers/net/ethernet/marvell/octeontx2/af/rvu.c:1326:2-3: Unneeded semicolon Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Link: https://lore.kernel.org/r/20210119075059.17493-1-vulab@iscas.ac.cn Link: https://lore.kernel.org/r/20210119075507.17699-1-vulab@iscas.ac.cn Link: https://lore.kernel.org/r/20210119080037.17931-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: smsc911x: Make Runtime PM handling more fine-grainedGeert Uytterhoeven
Currently the smsc911x driver has mininal power management: during driver probe, the device is powered up, and during driver remove, it is powered down. Improve power management by making it more fine-grained: 1. Power the device down when driver probe is finished, 2. Power the device (down) when it is opened (closed), 3. Make sure the device is powered during PHY access. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20210118150857.796943-1-geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19selftests: forwarding: Fix spelling mistake "succeded" -> "succeeded"Colin Ian King
There are two spelling mistakes in check_fail messages. Fix them. Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210118111902.71096-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: tun: fix misspellings using codespell toolMenglong Dong
Some typos are found out by codespell tool: $ codespell -w -i 3 ./drivers/net/tun.c aovid ==> avoid Fix typos found by codespell. Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Link: https://lore.kernel.org/r/20210118111539.35886-1-dong.menglong@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19taprio: boolean values to a bool variableJiapeng Zhong
Fix the following coccicheck warnings: ./net/sched/sch_taprio.c:393:3-16: WARNING: Assignment of 0/1 to bool variable. ./net/sched/sch_taprio.c:375:2-15: WARNING: Assignment of 0/1 to bool variable. ./net/sched/sch_taprio.c:244:4-19: WARNING: Assignment of 0/1 to bool variable. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Link: https://lore.kernel.org/r/1610958662-71166-1-git-send-email-abaci-bugfix@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19Merge branch 'net-ethernet-ti-am65-cpsw-nuss-introduce-support-for-am64x-cpsw3g'Jakub Kicinski
Grygorii Strashko says: ==================== net: ethernet: ti: am65-cpsw-nuss: introduce support for am64x cpsw3g This series introduces basic support for recently introduced TI K3 AM642x SoC [1] which contains 3 port (2 external ports) CPSW3g module. The CPSW3g integrated in MAIN domain and can be configured in multi port or switch modes. In this series only multi port mode is enabled. The initial version of switchdev support was introduced by Vignesh Raghavendra [2] and work is in progress. The overall functionality and DT bindings are similar to other K3 CPSWxg versions, so DT binding changes are minimal and driver is mostly re-used for TI K3 AM642x CPSW3g. The main difference is that TI K3 AM642x SoC is not fully DMA coherent and instead DMA coherency is supported per DMA channel. Patches 1-2 - DT bindings update Patches 3-4 - Update driver to support changed DMA coherency model Patches 5-6 - adds TI K3 AM642x SoC platform data and so enable CPSW3g [1] https://www.ti.com/lit/pdf/spruim2 [2] https://patchwork.ozlabs.org/project/netdev/cover/20201130082046.16292-1-vigneshr@ti.com/ ==================== Link: https://lore.kernel.org/r/20210115192853.5469-1-grygorii.strashko@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: ethernet: ti: am65-cpsw: add support for am64x cpsw3gVignesh Raghavendra
The TI AM64x SoCs Gigabit Ethernet Switch subsystem (CPSW3g NUSS) has three ports (2 ext. ports) and provides Ethernet packet communication for the device and can be configured in multi port mode or as an Ethernet switch. This patch adds support for the corresponding CPSW3g version. Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: ti: cpsw_ale: add driver data for AM64 CPSW3gVignesh Raghavendra
The AM642x CPSW3g is similar to j721e-cpswxg except its ALE table size is 512 entries. Add entry for the same. Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: ethernet: ti: am65-cpsw-nuss: Support for transparent ASEL handlingPeter Ujfalusi
Use the glue layer's functions to convert the dma_addr_t to and from CPPI5 address (with the ASEL bits), which should be used within the descriptors and data buffers. - Per channel coherency support The DMAs use the 'ASEL' bits to select data and configuration fetch path. The ASEL bits are placed at the unused parts of any address field used by the DMAs (pointers to descriptors, addresses in descriptors, ring base addresses). The ASEL is not part of the address (the DMAs can address 48bits). Individual channels can be configured to be coherent (via ACP port) or non coherent individually by configuring the ASEL to appropriate value. [1] https://lore.kernel.org/patchwork/cover/1350756/ Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: ethernet: ti: am65-cpsw-nuss: Use DMA device for DMA APIPeter Ujfalusi
For DMA API the DMA device should be used as cpsw does not accesses to descriptors or data buffers in any ways. The DMA does. Also, drop dma_coerce_mask_and_coherent() setting on CPSW device, as it should be done by DMA driver which does data movement. This is required for adding AM64x CPSW3g support where DMA coherency supported per DMA channel. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19dt-binding: net: ti: k3-am654-cpsw-nuss: update bindings for am64x cpsw3gGrygorii Strashko
Update DT binding for recently introduced TI K3 AM642x SoC [1] which contains 3 port (2 external ports) CPSW3g module. The CPSW3g integrated in MAIN domain and can be configured in multi port or switch modes. The overall functionality and DT bindings are similar to other K3 CPSWxg versions, so DT binding changes are minimal: - reword description - add new compatible 'ti,am642-cpsw-nuss' - allow 2 external ports child nodes - add missed 'assigned-clock' props [1] https://www.ti.com/lit/pdf/spruim2 Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19dt-binding: ti: am65x-cpts: add assigned-clock and power-domains propsGrygorii Strashko
The CPTS clock is usually a clk-mux which allows to select CPTS reference clock by using 'assigned-clock-parents', 'assigned-clocks' DT properties. Also depending on integration the power-domains has to be specified to enable CPTS IP. Hence add 'assigned-clock-parents', 'assigned-clocks' and 'power-domains' properties to the CPTS DT bindings to avoid dtbs_check warnings: cpts@310d0000: 'assigned-clock-parents', 'assigned-clocks' do not match any of the regexes: 'pinctrl-[0-9]+' cpts@310d0000: 'power-domains' does not match any of the regexes: 'pinctrl-[0-9]+' Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19Merge branch ↵Jakub Kicinski
'net-support-sctp-crc-csum-offload-for-tunneling-packets-in-some-drivers' Xin Long says: ==================== net: support SCTP CRC csum offload for tunneling packets in some drivers This patchset introduces inline function skb_csum_is_sctp(), and uses it to validate it's a sctp CRC csum offload packet, to make SCTP CRC csum offload for tunneling packets supported in some HW drivers. ==================== Link: https://lore.kernel.org/r/cover.1610777159.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: ixgbevf: use skb_csum_is_sctp instead of protocol checkXin Long
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC checksum offload packet, and yet it also makes ixgbevf support SCTP CRC checksum offload for UDP and GRE encapped packets, just as it does in igb driver. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: ixgbe: use skb_csum_is_sctp instead of protocol checkXin Long
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC checksum offload packet, and yet it also makes ixgbe support SCTP CRC checksum offload for UDP and GRE encapped packets, just as it does in igb driver. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: igc: use skb_csum_is_sctp instead of protocol checkXin Long
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC checksum offload packet, and yet it also makes igc support SCTP CRC checksum offload for UDP and GRE encapped packets, just as it does in igb driver. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: igbvf: use skb_csum_is_sctp instead of protocol checkXin Long
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC checksum offload packet, and yet it also makes igbvf support SCTP CRC checksum offload for UDP and GRE encapped packets, just as it does in igb driver. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: igb: use skb_csum_is_sctp instead of protocol checkXin Long
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC checksum offload packet, and there is no need to parse the packet to check its proto field, especially when it's a UDP or GRE encapped packet. So this patch also makes igb support SCTP CRC checksum offload for UDP and GRE encapped packets. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: add inline function skb_csum_is_sctpXin Long
This patch is to define a inline function skb_csum_is_sctp(), and also replace all places where it checks if it's a SCTP CSUM skb. This function would be used later in many networking drivers in the following patches. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19mdio, phy: fix -Wshadow warnings triggered by nested container_of()Alexander Lobakin
container_of() macro hides a local variable '__mptr' inside. This becomes a problem when several container_of() are nested in each other within single line or plain macros. As C preprocessor doesn't support generating random variable names, the sole solution is to avoid defining macros that consist only of container_of() calls, or they will self-shadow '__mptr' each time: In file included from ./include/linux/bitmap.h:10, from drivers/net/phy/phy_device.c:12: drivers/net/phy/phy_device.c: In function ‘phy_device_release’: ./include/linux/kernel.h:693:8: warning: declaration of ‘__mptr’ shadows a previous local [-Wshadow] 693 | void *__mptr = (void *)(ptr); \ | ^~~~~~ ./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’ 647 | #define to_phy_device(d) container_of(to_mdio_device(d), \ | ^~~~~~~~~~~~ ./include/linux/mdio.h:52:27: note: in expansion of macro ‘container_of’ 52 | #define to_mdio_device(d) container_of(d, struct mdio_device, dev) | ^~~~~~~~~~~~ ./include/linux/phy.h:647:39: note: in expansion of macro ‘to_mdio_device’ 647 | #define to_phy_device(d) container_of(to_mdio_device(d), \ | ^~~~~~~~~~~~~~ drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’ 217 | kfree(to_phy_device(dev)); | ^~~~~~~~~~~~~ ./include/linux/kernel.h:693:8: note: shadowed declaration is here 693 | void *__mptr = (void *)(ptr); \ | ^~~~~~ ./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’ 647 | #define to_phy_device(d) container_of(to_mdio_device(d), \ | ^~~~~~~~~~~~ drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’ 217 | kfree(to_phy_device(dev)); | ^~~~~~~~~~~~~ As they are declared in header files, these warnings are highly repetitive and very annoying (along with the one from linux/pci.h). Convert the related macros from linux/{mdio,phy}.h to static inlines to avoid self-shadowing and potentially improve bug-catching. No functional changes implied. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210116161246.67075-1-alobakin@pm.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19vhost_net: avoid tx queue stuck when sendmsg failsYunjian Wang
Currently the driver doesn't drop a packet which can't be sent by tun (e.g bad packet). In this case, the driver will always process the same packet lead to the tx queue stuck. To fix this issue: 1. in the case of persistent failure (e.g bad packet), the driver can skip this descriptor by ignoring the error. 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM), the driver schedules the worker to try again. Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/1610685980-38608-1-git-send-email-wangyunjian@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: hns: fix variable used when DEBUG is definedTom Rix
When DEBUG is defined this error occurs drivers/net/ethernet/hisilicon/hns/hns_enet.c:1505:36: error: ‘struct net_device’ has no member named ‘ae_handle’; did you mean ‘rx_handler’? assert(skb->queue_mapping < ndev->ae_handle->q_num); ^~~~~~~~~ ae_handle is an element of struct hns_nic_priv, so change ndev to priv. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20210117191044.533725-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18arcnet: fix macro name when DEBUG is definedTom Rix
When DEBUG is defined this error occurs drivers/net/arcnet/com20020_cs.c:70:15: error: ‘com20020_REG_W_ADDR_HI’ undeclared (first use in this function); did you mean ‘COM20020_REG_W_ADDR_HI’? ioaddr, com20020_REG_W_ADDR_HI); ^~~~~~~~~~~~~~~~~~~~~~ From reviewing the context, the suggestion is what is meant. Signed-off-by: Tom Rix <trix@redhat.com> Acked-by: Joe Perches <joe@perches.com> Link: https://lore.kernel.org/r/20210117181519.527625-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18Merge branch 'tls-device-offload-for-bond'Jakub Kicinski
Tariq Toukan says: ==================== TLS device offload for Bond This series opens TX and RX TLS device offload for bond interfaces. This allows bond interfaces to benefit from capable lower devices. We add a new ndo_sk_get_lower_dev() to be used to get the lower dev that corresponds to a given socket. The TLS module uses it to interact directly with the lowest device in chain, and invoke the control operations in tlsdev_ops. This means that the bond interface doesn't have his own struct tlsdev_ops instance and derived logic/callbacks. To keep simple track of the HW and SW TLS contexts, we bind each socket to a specific lower device for the socket's whole lifetime. This is logically valid (and similar to the SW kTLS behavior) in the following bond configuration, so we restrict the offload support to it: ((mode == balance-xor) or (mode == 802.3ad)) and xmit_hash_policy == layer3+4. In this design, TLS TX/RX offload feature flags of the bond device are independent from the lower devices. They reflect the current features state, but are not directly controllable. This is because the bond driver is bypassed by the call to ndo_sk_get_lower_dev(), without him knowing who the caller is. The bond TLS feature flags are set/cleared only according to the configuration of the mode and xmit_hash_policy. Bypass is true only for the control flow. Packets in fast path still go through the bond logic. The design here differs from the xfrm/ipsec offload, where the bond driver has his own copy of struct xfrmdev_ops and callbacks. ==================== Link: https://lore.kernel.org/r/20210117145949.8632-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/tls: Except bond interface from some TLS checksTariq Toukan
In the tls_dev_event handler, ignore tlsdev_ops requirement for bond interfaces, they do not exist as the interaction is done directly with the lower device. Also, make the validate function pass when it's called with the upper bond interface. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/tls: Device offload to use lowest netdevice in chainTariq Toukan
Do not call the tls_dev_ops of upper devices. Instead, ask them for the proper lowest device and communicate with it directly. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/bonding: Declare TLS RX device offload supportTariq Toukan
Following the description in previous patch (for TX): As the bond interface is being bypassed by the TLS module, interacting directly against the lower devs, there is no way for the bond interface to disable its device offload capabilities, as long as the mode/policy config allows it. Hence, the feature flag is not directly controllable, but just reflects the offload status based on the logic under bond_sk_check(). Here we just declare RX device offload support, and expose it via the NETIF_F_HW_TLS_RX flag. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/bonding: Implement TLS TX device offloadTariq Toukan
Implement TLS TX device offload for bonding interfaces. This allows kTLS sockets running on a bond to benefit from the device offload on capable lower devices. To allow a simple and fast maintenance of the TLS context in SW and lower devices, we bind the TLS socket to a specific lower dev. To achieve a behavior similar to SW kTLS, we support only balance-xor and 802.3ad modes, with xmit_hash_policy=layer3+4. This is enforced in bond_sk_check(), done in a previous patch. For the above configuration, the SW implementation keeps picking the same exact lower dev for all the socket's SKBs. The device offload behaves similarly, making the decision once at the connection creation. Per socket, the TLS module should work directly with the lowest netdev in chain, to call the tls_dev_ops operations. As the bond interface is being bypassed by the TLS module, interacting directly against the lower devs, there is no way for the bond interface to disable its device offload capabilities, as long as the mode/policy config allows it. Hence, the feature flag is not directly controllable, but just reflects the current offload status based on the logic under bond_sk_check(). Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/bonding: Take update_features call out of XFRM funcitonTariq Toukan
In preparation for more cases that call netdev_update_features(). While here, move the features logic to the stage where struct bond is already updated, and pass it as the only parameter to function bond_set_xfrm_features(). Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/bonding: Implement ndo_sk_get_lower_devTariq Toukan
Add ndo_sk_get_lower_dev() implementation for bond interfaces. Support only for the cases where the socket's and SKBs' hash yields identical value for the whole connection lifetime. Here we restrict it to L3+4 sockets only, with xmit_hash_policy==LAYER34 and bond modes xor/802.3ad. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/bonding: Take IP hash logic into a helperTariq Toukan
Hash logic on L3 will be used in a downstream patch for one more use case. Take it to a function for a better code reuse. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: netdevice: Add operation ndo_sk_get_lower_devTariq Toukan
ndo_sk_get_lower_dev returns the lower netdev that corresponds to a given socket. Additionally, we implement a helper netdev_sk_get_lowest_dev() to get the lowest one in chain. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/qla3xxx: switch from 'pci_' to 'dma_' APIChristophe JAILLET
The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below and has been hand modified to replace GFP_ with a correct flag. It has been compile tested. When memory is allocated in 'ql_alloc_net_req_rsp_queues()' GFP_KERNEL can be used because it is only called from 'ql_alloc_mem_resources()' which already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see below) When memory is allocated in 'ql_alloc_buffer_queues()' GFP_KERNEL can be used because this flag is already used just a few line above. When memory is allocated in 'ql_alloc_small_buffers()' GFP_KERNEL can be used because it is only called from 'ql_alloc_mem_resources()' which already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see above) When memory is allocated in 'ql_alloc_mem_resources()' GFP_KERNEL can be used because this function already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see above) While at it, use 'dma_set_mask_and_coherent()' instead of 'dma_set_mask()/ dma_set_coherent_mask()' in order to slightly simplify code. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20210117081542.560021-1-christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net_sched: fix RTNL deadlock again caused by request_module()Cong Wang
tcf_action_init_1() loads tc action modules automatically with request_module() after parsing the tc action names, and it drops RTNL lock and re-holds it before and after request_module(). This causes a lot of troubles, as discovered by syzbot, because we can be in the middle of batch initializations when we create an array of tc actions. One of the problem is deadlock: CPU 0 CPU 1 rtnl_lock(); for (...) { tcf_action_init_1(); -> rtnl_unlock(); -> request_module(); rtnl_lock(); for (...) { tcf_action_init_1(); -> tcf_idr_check_alloc(); // Insert one action into idr, // but it is not committed until // tcf_idr_insert_many(), then drop // the RTNL lock in the _next_ // iteration -> rtnl_unlock(); -> rtnl_lock(); -> a_o->init(); -> tcf_idr_check_alloc(); // Now waiting for the same index // to be committed -> request_module(); -> rtnl_lock() // Now waiting for RTNL lock } rtnl_unlock(); } rtnl_unlock(); This is not easy to solve, we can move the request_module() before this loop and pre-load all the modules we need for this netlink message and then do the rest initializations. So the loop breaks down to two now: for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { struct tc_action_ops *a_o; a_o = tc_action_load_ops(name, tb[i]...); ops[i - 1] = a_o; } for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { act = tcf_action_init_1(ops[i - 1]...); } Although this looks serious, it only has been reported by syzbot, so it seems hard to trigger this by humans. And given the size of this patch, I'd suggest to make it to net-next and not to backport to stable. This patch has been tested by syzbot and tested with tdc.py by me. Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together") Reported-and-tested-by: syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com Reported-by: syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20210117005657.14810-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: phy: national: remove definition of DEBUGTom Rix
Defining DEBUG should only be done in development. So remove DEBUG. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20210115235346.289611-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18Merge branch 'net-make-udp-tunnel-devices-support-fraglist'Jakub Kicinski
Xin Long says: ==================== net: make udp tunnel devices support fraglist Like GRE device, UDP tunnel devices should also support fraglist, so that some protocol (like SCTP) HW GSO that requires NETIF_F_FRAGLIST in the dev can work. Especially when the lower device support both NETIF_F_GSO_UDP_TUNNEL and NETIF_F_GSO_SCTP. ==================== Link: https://lore.kernel.org/r/cover.1610704037.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18bareudp: add NETIF_F_FRAGLIST flag for dev featuresXin Long
Like vxlan and geneve, bareudp also needs this dev feature to support some protocol's HW GSO. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18geneve: add NETIF_F_FRAGLIST flag for dev featuresXin Long
Some protocol HW GSO requires fraglist supported by the device, like SCTP. Without NETIF_F_FRAGLIST set in the dev features of geneve, it would have to do SW GSO before the packets enter the driver, even when the geneve dev and lower dev (like veth) both have the feature of NETIF_F_GSO_SCTP. So this patch is to add it for geneve. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18vxlan: add NETIF_F_FRAGLIST flag for dev featuresXin Long
Some protocol HW GSO requires fraglist supported by the device, like SCTP. Without NETIF_F_FRAGLIST set in the dev features of vxlan, it would have to do SW GSO before the packets enter the driver, even when the vxlan dev and lower dev (like veth) both have the feature of NETIF_F_GSO_SCTP. So this patch is to add it for vxlan. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18hv_netvsc: Add (more) validation for untrusted Hyper-V valuesAndrea Parri (Microsoft)
For additional robustness in the face of Hyper-V errors or malicious behavior, validate all values that originate from packets that Hyper-V has sent to the guest. Ensure that invalid values cannot cause indexing off the end of an array, or subvert an existing validation via integer overflow. Ensure that outgoing packets do not have any leftover guest memory that has not been zeroed out. Reported-by: Juan Vazquez <juvazq@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20210114202628.119541-1-parri.andrea@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: bridge: check vlan with eth_type_vlan() methodMenglong Dong
Replace some checks for ETH_P_8021Q and ETH_P_8021AD with eth_type_vlan(). Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210117080950.122761-1-dong.menglong@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18Merge branch 'net-ipa-interconnect-improvements'Jakub Kicinski
Alex Elder says: ==================== net: ipa: interconnect improvements The main outcome of this series is to allow the number of interconnects used by the IPA to differ from the three that are implemented now. With this series in place, any number of interconnects can now be used, all specified in the configuration data for a specific platform. A few minor interconnect-related cleanups are implemented as well. ==================== Link: https://lore.kernel.org/r/20210115125050.20555-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: ipa: allow arbitrary number of interconnectsAlex Elder
Currently we assume that the IPA hardware has exactly three interconnects. But that won't be guaranteed for all platforms, so allow any number of interconnects to be specified in the configuration data. For each platform, define an array of interconnect data entries (still associated with the IPA clock structure), and record the number of entries initialized in that array. Loop over all entries in this array when initializing, enabling, disabling, or tearing down the set of interconnects. With this change we no longer need the ipa_interconnect_id enumerated type, so get rid of it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: ipa: clean up interconnect initializationAlex Elder
Pass an the address of an IPA interconnect structure and its configuration data to ipa_interconnect_init_one() and have that function initialize all the structure's fields. Change the function to simply return an error code. Introduce ipa_interconnect_exit_one() to encapsulate the cleanup of an IPA interconnect structure. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: ipa: add interconnect name to configuration dataAlex Elder
Add the name to the configuration data for each interconnect. Use this information rather than a constant string during initialization. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: ipa: store average and peak interconnect bandwidthAlex Elder
Add fields in the ipa_interconnect structure to hold the average and peak bandwidth values for the interconnect. Pass the configuring data for interconnects to ipa_interconnect_init() so these values can be recorded, and use them when enabling the interconnects. There's no longer any need to keep a copy of the interconnect data after initialization. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: ipa: introduce an IPA interconnect structureAlex Elder
Rather than having separate pointers for the memory, imem, and config interconnect paths, maintain an array of ipa_interconnect structures each of which contains a pointer to a path. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: ipa: don't return an error from ipa_interconnect_disable()Alex Elder
If disabling interconnects fails there's not a lot we can do. The only two callers of ipa_interconnect_disable() ignore the return value, so just give the function a void return type. Print an error message if disabling any of the interconnects is not successful. Return (and print) only the first error seen. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>