summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)Author
2021-02-01net/mlx5e: Increase indirection RQ table size to 256Noam Stolero
Increasing the indirection RQ table size from 128 to 256 improves the packet distribution over the NIC HW queues for various cases. Let's take a look at the following scenario: Assuming RSS result distributed uniformly and indirection table is filled with queues in a cyclic manner. Let N be the number of queues on a given setup. If 256%N = 128%N = 0, then all queues have the same probability to be chosen for a given RSS result. This case doesn't improves nor degrade by this change. If 256%N != 0 and 128%N != 0, there is a remainder which will favor some queues. Increasing the indirection RQ table size to 256 reduce the ratio between the favored queues probability to be selected to the rest of the queues and improves the distribution. For example, let's assume the number of queues is 56. For a table size of 128, we have 128%56=16 queues which will have a 3/128 probability to be chosen and 2/128 for the rest 40. 16 queues have 1.5 times the probability to be chosen over the other 40. For a table size of 256, we have 256%56=32 queues which will have a 5/256 probability to be chosen and 4/256 probability for the rest 24 queues. Here 32 queues have 1.25 more probability to be chosen over the other 24. This shows that the larger indirection table size would more likely cause an even distribution. This change also aligns our mlx5 driver's indirection table size with other vendors. Signed-off-by: Noam Stolero <noams@nvidia.com> Reviewed-by: Tal Gilboa <talgi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5e: Enable napi in channel's activation stageTariq Toukan
The channel's napi is first needed upon activation, not creation. Minimize its enabled scope by moving it from the channel's open/close stage into the activate/deactivate stage. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5e: Move representor neigh init into profile enableRoi Dayan
Also cleanup neigh in profile disable. This is for logical separation. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5e: Avoid false lock depenency warning on tc_htRoi Dayan
To avoid false lock dependency warning set the tc_ht lock class different than the lock class of the ht being used when deleting last flow from a group and then deleting a group, we get into del_sw_flow_group() which call rhashtable_destroy on fg->ftes_hash which will take ht->mutex but it's different than the ht->mutex here. ====================================================== WARNING: possible circular locking dependency detected 5.11.0-rc4_net_next_mlx5_949fdcc #1 Not tainted ------------------------------------------------------ modprobe/12950 is trying to acquire lock: ffff88816510f910 (&node->lock){++++}-{3:3}, at: mlx5_del_flow_rules+0x2a/0x210 [mlx5_core] but task is already holding lock: ffff88815834e3e8 (&ht->mutex){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x37/0x340 which lock already depends on the new lock. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5e: Move set vxlan nic info to profile initRoi Dayan
Since its profile dependent let's init the vxlan info as part of profile initialization. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5e: Move netif_carrier_off() out of mlx5e_priv_init()Roi Dayan
It's not part of priv initialization. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5e: Refactor mlx5e_netdev_init/cleanup to mlx5e_priv_init/cleanupRoi Dayan
We actually initialize priv and not netdev. The only call to set netdev carrier will be moved in the following commit. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mxl5e: Add change profile methodSaeed Mahameed
Port nic netdevice will be used as uplink representor in downstream patches. Add change profile method to allow changing a mlx5e netdevice profile dynamically. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com>
2021-02-01net/mlx5e: Separate between netdev objects and mlx5e profiles initializationSaeed Mahameed
1) Initialize netdevice features and structures on netdevice allocation and outside of the mlx5e profile. 2) As now mlx5e netdevice private params will be setup on profile init only after netdevice features are already set, we add a call to netde_update_features() to resolve any conflict. This is nice since we reuse the fix_features ndo code if a profile wants different default features, instead of duplicating features conflict resolution code on profile initialization. 3) With this we achieve total separation between mlx5e profiles and netdevices, and will allow replacing mlx5e profiles on the fly to reuse the same netdevice for multiple profiles. e.g. for uplink representor profile as shown in the following patch 4) Profile callbacks are not allowed to touch netdev->features directly anymore, since in downstream patch we will detach/attach netdev dynamically to profile, hence we move the code dealing with netdev->features from profile->init() to fix_features ndo, and we will call netdev_update_features() on mlx5e_attach_netdev(profile, netdev); Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com>
2021-02-01Merge branch '1GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-02-01 This series contains updates to igc and i40e drivers. Kai-Heng Feng fixes igc to report unknown speed and duplex during suspend as an attempted read will cause errors. Kevin Lo sets the default value to -IGC_ERR_NVM instead of success for writing shadow RAM as this could miss a timeout. Also propagates the return value for Flow Control configuration to properly pass on errors for igc. Aleksandr reverts commit 2ad1274fa35a ("i40e: don't report link up for a VF who hasn't enabled queues") as this can cause link flapping. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues" igc: check return value of ret_val in igc_config_fc_after_link_up igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr igc: Report speed and duplex as unknown when device is runtime suspended ==================== Link: https://lore.kernel.org/r/20210201214618.852831-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01ibmvnic: remove unnecessary rmb() inside ibmvnic_pollLijun Pan
rmb() can be removed since: 1. pending_scrq() has dma_rmb() at the function end; 2. dma_rmb(), though weaker, is enough here. Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Acked-by: Dwip Banerjee <dnbanerg@us.ibm.com> Acked-by: Thomas Falcon <tlfalcon@linux.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01ibmvnic: rework to ensure SCRQ entry reads are properly orderedLijun Pan
Move the dma_rmb() between pending_scrq() and ibmvnic_next_scrq() into the end of pending_scrq() to save the duplicated code since this dma_rmb will be used 3 times. Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Acked-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01Merge tag 'mlx5-dr-2021-01-29' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-dr-2021-01-29 Add support for Connect-X6DX Software steering This series adds SW Steering support for Connect-X6DX. Since the STE and actions formats are different on this new HW, we implemented the HW specific STEv1 layer on the infrastructure implemented in previous mlx5 DR patchset to support all the functionalities as previous devices. Most of the code in this series very is low level HW specific, we implement the function pointers for the generic SW steering layer. * tag 'mlx5-dr-2021-01-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: DR, Allow SW steering for sw_owner_v2 devices net/mlx5: DR, Copy all 64B whenever replacing STE in the head of miss-list net/mlx5: DR, Use HW specific logic API when writing STE net/mlx5: DR, Use the right size when writing partial STE into HW net/mlx5: DR, Add STEv1 modify header logic net/mlx5: DR, Add STEv1 action apply logic net/mlx5: DR, Add STEv1 setters and getters net/mlx5: DR, Allow native protocol support for HW STEv1 net/mlx5: DR, Add HW STEv1 match logic net/mlx5: DR, Add match STEv1 structs to ifc net/mlx5: DR, Fix potential shift wrapping of 32-bit value ==================== Link: https://lore.kernel.org/r/20210130022618.317351-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01ibmvnic: device remove has higher precedence over resetLijun Pan
Returning -EBUSY in ibmvnic_remove() does not actually hold the removal procedure since driver core doesn't care for the return value (see __device_release_driver() in drivers/base/dd.c calling dev->bus->remove()) though vio_bus_remove (in arch/powerpc/platforms/pseries/vio.c) records the return value and passes it on. [1] During the device removal precedure, checking for resetting bit is dropped so that we can continue executing all the cleanup calls in the rest of the remove function. Otherwise, it can cause latent memory leaks and kernel crashes. [1] https://lore.kernel.org/linuxppc-dev/20210117101242.dpwayq6wdgfdzirl@pengutronix.de/T/#m48f5befd96bc9842ece2a3ad14f4c27747206a53 Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Fixes: 7d7195a026ba ("ibmvnic: Do not process device remove during device reset") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Link: https://lore.kernel.org/r/20210129043402.95744-1-ljp@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01net: dsa: hellcreek: Report FDB table occupancyKurt Kanzenbach
Report the FDB table size and occupancy via devlink. The actual size depends on the used Hellcreek version: |root@tsn:~# devlink resource show platform/ff240000.switch |platform/ff240000.switch: | name VLAN size 4096 occ 2 unit entry dpipe_tables none | name FDB size 256 occ 6 unit entry dpipe_tables none Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01net: dsa: hellcreek: Report VLAN table occupancyKurt Kanzenbach
The VLAN membership configuration is cached in software already. So, it can be reported via devlink. Add support for it: |root@tsn:~# devlink resource show platform/ff240000.switch |platform/ff240000.switch: | name VLAN size 4096 occ 4 unit entry dpipe_tables none Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01net: dsa: mv88e6xxx: override existent unicast portvec in port_fdb_addDENG Qingfang
Having multiple destination ports for a unicast address does not make sense. Make port_db_load_purge override existent unicast portvec instead of adding a new port bit. Fixes: 884729399260 ("net: dsa: mv88e6xxx: handle multiple ports in ATU") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20210130134334.10243-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"Aleksandr Loktionov
This reverts commit 2ad1274fa35ace5c6360762ba48d33b63da2396c VF queues were not brought up when PF was brought up after being downed if the VF driver disabled VFs queues during PF down. This could happen in some older or external VF driver implementations. The problem was that PF driver used vf->queues_enabled as a condition to decide what link-state it would send out which caused the issue. Remove the check for vf->queues_enabled in the VF link notify. Now VF will always be notified of the current link status. Also remove the queues_enabled member from i40e_vf structure as it is not used anymore. Otherwise VNF implementation was broken and caused a link flap. The original commit was a workaround to avoid breaking existing VFs though it's really a fault of the VF code not the PF. The commit should be safe to revert as all of the VFs we know of have been fixed. Also, since we now know there is a related bug in the workaround, removing it is preferred. Fixes: 2ad1274fa35a ("i40e: don't report link up for a VF who hasn't enabled") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-01igc: check return value of ret_val in igc_config_fc_after_link_upKevin Lo
Check return value from ret_val to make error check actually work. Fixes: 4eb8080143a9 ("igc: Add setup link functionality") Signed-off-by: Kevin Lo <kevlo@kevlo.org> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-01igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwrKevin Lo
This patch sets the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr. Without this change it wouldn't lead to a shadow RAM write EEWR timeout. Fixes: ab4056126813 ("igc: Add NVM support") Signed-off-by: Kevin Lo <kevlo@kevlo.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-01igc: Report speed and duplex as unknown when device is runtime suspendedKai-Heng Feng
Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown when device is runtime suspended"), if we try to read speed and duplex sysfs while the device is runtime suspended, igc will complain and stops working: [ 123.449883] igc 0000:03:00.0 enp3s0: PCIe link lost, device now detached [ 123.450052] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 123.450056] #PF: supervisor read access in kernel mode [ 123.450058] #PF: error_code(0x0000) - not-present page [ 123.450059] PGD 0 P4D 0 [ 123.450064] Oops: 0000 [#1] SMP NOPTI [ 123.450068] CPU: 0 PID: 2525 Comm: udevadm Tainted: G U W OE 5.10.0-1002-oem #2+rkl2-Ubuntu [ 123.450078] RIP: 0010:igc_rd32+0x1c/0x90 [igc] [ 123.450080] Code: c0 5d c3 b8 fd ff ff ff c3 0f 1f 44 00 00 0f 1f 44 00 00 55 89 f0 48 89 e5 41 56 41 55 41 54 49 89 c4 53 48 8b 57 08 48 01 d0 <44> 8b 28 41 83 fd ff 74 0c 5b 44 89 e8 41 5c 41 5d 4 [ 123.450083] RSP: 0018:ffffb0d100d6fcc0 EFLAGS: 00010202 [ 123.450085] RAX: 0000000000000008 RBX: ffffb0d100d6fd30 RCX: 0000000000000000 [ 123.450087] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff945a12716c10 [ 123.450089] RBP: ffffb0d100d6fce0 R08: ffff945a12716550 R09: ffff945a09874000 [ 123.450090] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000008 [ 123.450092] R13: ffff945a12716000 R14: ffff945a037da280 R15: ffff945a037da290 [ 123.450094] FS: 00007f3b34c868c0(0000) GS:ffff945b89200000(0000) knlGS:0000000000000000 [ 123.450096] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 123.450098] CR2: 0000000000000008 CR3: 00000001144de006 CR4: 0000000000770ef0 [ 123.450100] PKRU: 55555554 [ 123.450101] Call Trace: [ 123.450111] igc_ethtool_get_link_ksettings+0xd6/0x1b0 [igc] [ 123.450118] __ethtool_get_link_ksettings+0x71/0xb0 [ 123.450123] duplex_show+0x74/0xc0 [ 123.450129] dev_attr_show+0x1d/0x40 [ 123.450134] sysfs_kf_seq_show+0xa1/0x100 [ 123.450137] kernfs_seq_show+0x27/0x30 [ 123.450142] seq_read+0xb7/0x400 [ 123.450148] ? common_file_perm+0x72/0x170 [ 123.450151] kernfs_fop_read+0x35/0x1b0 [ 123.450155] vfs_read+0xb5/0x1b0 [ 123.450157] ksys_read+0x67/0xe0 [ 123.450160] __x64_sys_read+0x1a/0x20 [ 123.450164] do_syscall_64+0x38/0x90 [ 123.450168] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 123.450170] RIP: 0033:0x7f3b351fe142 [ 123.450173] Code: c0 e9 c2 fe ff ff 50 48 8d 3d 3a ca 0a 00 e8 f5 19 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 [ 123.450174] RSP: 002b:00007fffef2ec138 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 123.450177] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b351fe142 [ 123.450179] RDX: 0000000000001001 RSI: 00005644c047f070 RDI: 0000000000000003 [ 123.450180] RBP: 00007fffef2ec340 R08: 00005644c047f070 R09: 00007f3b352d9320 [ 123.450182] R10: 00005644c047c010 R11: 0000000000000246 R12: 00005644c047cbf0 [ 123.450184] R13: 00005644c047e6d0 R14: 0000000000000003 R15: 00007fffef2ec140 [ 123.450189] Modules linked in: rfcomm ccm cmac algif_hash algif_skcipher af_alg bnep toshiba_acpi industrialio toshiba_haps hp_accel lis3lv02d btusb btrtl btbcm btintel bluetooth ecdh_generic ecc joydev input_leds nls_iso8859_1 snd_sof_pci snd_sof_intel_byt snd_sof_intel_ipc snd_sof_intel_hda_common snd_soc_hdac_hda snd_hda_codec_hdmi snd_sof_xtensa_dsp snd_sof_intel_hda snd_sof snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core ath10k_pci snd_hwdep intel_rapl_msr intel_rapl_common ath10k_core soundwire_bus snd_soc_core x86_pkg_temp_thermal ath intel_powerclamp snd_compress ac97_bus snd_pcm_dmaengine mac80211 snd_pcm coretemp snd_seq_midi snd_seq_midi_event snd_rawmidi kvm_intel cfg80211 snd_seq snd_seq_device snd_timer mei_hdcp kvm libarc4 snd crct10dif_pclmul ghash_clmulni_intel aesni_intel mei_me dell_wmi [ 123.450266] dell_smbios soundcore sparse_keymap dcdbas crypto_simd cryptd mei dell_uart_backlight glue_helper ee1004 wmi_bmof intel_wmi_thunderbolt dell_wmi_descriptor mac_hid efi_pstore acpi_pad acpi_tad intel_cstate sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec crc32_pclmul rc_core drm intel_lpss_pci i2c_i801 ahci igc intel_lpss i2c_smbus idma64 xhci_pci libahci virt_dma xhci_pci_renesas wmi video pinctrl_tigerlake [ 123.450335] CR2: 0000000000000008 [ 123.450338] ---[ end trace 9f731e38b53c35cc ]--- The more generic approach will be wrap get_link_ksettings() with begin() and complete() callbacks, and calls runtime resume and runtime suspend routine respectively. However, igc is like igb, runtime resume routine uses rtnl_lock() which upper ethtool layer also uses. So to prevent a deadlock on rtnl, take a different approach, use pm_runtime_suspended() to avoid reading register while device is runtime suspended. Fixes: 8c5ad0dae93c ("igc: Add ethtool support") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-01-29net: dsa: felix: perform switch setup for tag_8021qVladimir Oltean
Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: dsa: add a second tagger for Ocelot switches based on tag_8021qVladimir Oltean
There are use cases for which the existing tagger, based on the NPI (Node Processor Interface) functionality, is insufficient. Namely: - Frames injected through the NPI port bypass the frame analyzer, so no source address learning is performed, no TSN stream classification, etc. - Flow control is not functional over an NPI port (PAUSE frames are encapsulated in the same Extraction Frame Header as all other frames) - There can be at most one NPI port configured for an Ocelot switch. But in NXP LS1028A and T1040 there are two Ethernet CPU ports. The non-NPI port is currently either disabled, or operated as a plain user port (albeit an internally-facing one). Having the ability to configure the two CPU ports symmetrically could pave the way for e.g. creating a LAG between them, to increase bandwidth seamlessly for the system. So there is a desire to have an alternative to the NPI mode. This change keeps the default tagger for the Seville and Felix switches as "ocelot", but it can be changed via the following device attribute: echo ocelot-8021q > /sys/class/<dsa-master>/dsa/tagging Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: dsa: felix: convert to the new .change_tag_protocol DSA APIVladimir Oltean
In expectation of the new tag_ocelot_8021q tagger implementation, we need to be able to do runtime switchover between one tagger and another. So we must structure the existing code for the current NPI-based tagger in a certain way. We move the felix_npi_port_init function in expectation of the future driver configuration necessary for tag_ocelot_8021q: we would like to not have the NPI-related bits interspersed with the tag_8021q bits. The conversion from this: ocelot_write_rix(ocelot, ANA_PGID_PGID_PGID(GENMASK(ocelot->num_phys_ports, 0)), ANA_PGID_PGID, PGID_UC); to this: cpu_flood = ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)); ocelot_rmw_rix(ocelot, cpu_flood, cpu_flood, ANA_PGID_PGID, PGID_UC); is perhaps non-trivial, but is nonetheless non-functional. The PGID_UC (replicator for unknown unicast) is already configured out of hardware reset to flood to all ports except ocelot->num_phys_ports (the CPU port module). All we change is that we use a read-modify-write to only add the CPU port module to the unknown unicast replicator, as opposed to doing a full write to the register. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: mscc: ocelot: don't use NPI tag prefix for the CPU port moduleVladimir Oltean
Context: Ocelot switches put the injection/extraction frame header in front of the Ethernet header. When used in NPI mode, a DSA master would see junk instead of the destination MAC address, and it would most likely drop the packets. So the Ocelot frame header can have an optional prefix, which is just "ff:ff:ff:ff:ff:fe > ff:ff:ff:ff:ff:ff" padding put before the actual tag (still before the real Ethernet header) such that the DSA master thinks it's looking at a broadcast frame with a strange EtherType. Unfortunately, a lesson learned in commit 69df578c5f4b ("net: mscc: ocelot: eliminate confusion between CPU and NPI port") seems to have been forgotten in the meanwhile. The CPU port module and the NPI port have independent settings for the length of the tag prefix. However, the driver is using the same variable to program both of them. There is no reason really to use any tag prefix with the CPU port module, since that is not connected to any Ethernet port. So this patch makes the inj_prefix and xtr_prefix variables apply only to the NPI port (which the switchdev ocelot_vsc7514 driver does not use). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: mscc: ocelot: reapply bridge forwarding mask on bonding join/leaveVladimir Oltean
Applying the bridge forwarding mask currently is done only on the STP state changes for any port. But it depends on both STP state changes, and bonding interface state changes. Export the bit that recalculates the forwarding mask so that it could be reused, and call it when a port starts and stops offloading a bonding interface. Now that the logic is split into a separate function, we can rename "p" into "port", since the "port" variable was already taken in ocelot_bridge_stp_state_set. Also, we can rename "i" into "lag", to make it more clear what is it that we're iterating through. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: mscc: ocelot: store a namespaced VCAP filter IDVladimir Oltean
We will be adding some private VCAP filters that should not interfere in any way with the filters added using tc-flower. So we need to allocate some IDs which will not be used by tc. Currently ocelot uses an u32 id derived from the flow cookie, which in itself is an unsigned long. This is a problem in itself, since on 64 bit systems, sizeof(unsigned long)=8, so the driver is already truncating these. Create a struct ocelot_vcap_id which contains the full unsigned long cookie from tc, as well as a boolean that is supposed to namespace the filters added by tc with the ones that aren't. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: mscc: ocelot: export VCAP structures to include/soc/msccVladimir Oltean
The Felix driver will need to preinstall some VCAP filters for its tag_8021q implementation (outside of the tc-flower offload logic), so these need to be exported to the common includes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29r8169: work around RTL8125 UDP hw bugHeiner Kallweit
It was reported that on RTL8125 network breaks under heavy UDP load, e.g. torrent traffic ([0], from comment 27). Realtek confirmed a hw bug and provided me with a test version of the r8125 driver including a workaround. Tests confirmed that the workaround fixes the issue. I modified the original version of the workaround to meet mainline code style. [0] https://bugzilla.kernel.org/show_bug.cgi?id=209839 v2: - rebased to net v3: - make rtl_skb_is_udp() more robust and use skb_header_pointer() to access the ip(v6) header v4: - remove dependency on ptp_classify.h - replace magic number with offsetof(struct udphdr, len) Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125") Tested-by: xplo <xplo.bn@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/6e453d49-1801-e6de-d5f7-d7e6c7526c8f@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29vmxnet3: Remove buf_info from device accessible structuresRonak Doshi
buf_info structures in RX & TX queues are private driver data that do not need to be visible to the device. Although there is physical address and length in the queue descriptor that points to these structures, their layout is not standardized, and device never looks at them. So lets allocate these structures in non-DMA-able memory, and fill physical address as all-ones and length as zero in the queue descriptor. That should alleviate worries brought by Martin Radev in https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210104/022829.html that malicious vmxnet3 device could subvert SVM/TDX guarantees. Signed-off-by: Petr Vandrovec <petr@vmware.com> Signed-off-by: Ronak Doshi <doshir@vmware.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: dsa: hellcreek: Add missing TAPRIO dependencyKurt Kanzenbach
Add missing dependency to TAPRIO to avoid build failures such as: |ERROR: modpost: "taprio_offload_get" [drivers/net/dsa/hirschmann/hellcreek_sw.ko] undefined! |ERROR: modpost: "taprio_offload_free" [drivers/net/dsa/hirschmann/hellcreek_sw.ko] undefined! Fixes: 24dfc6eb39b2 ("net: dsa: hellcreek: Add TAPRIO offloading support") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20210128163338.22665-1-kurt@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: arcnet: Fix RESET flag handlingAhmed S. Darwish
The main arcnet interrupt handler calls arcnet_close() then arcnet_open(), if the RESET status flag is encountered. This is invalid: 1) In general, interrupt handlers should never call ->ndo_stop() and ->ndo_open() functions. They are usually full of blocking calls and other methods that are expected to be called only from drivers init and exit code paths. 2) arcnet_close() contains a del_timer_sync(). If the irq handler interrupts the to-be-deleted timer, del_timer_sync() will just loop forever. 3) arcnet_close() also calls tasklet_kill(), which has a warning if called from irq context. 4) For device reset, the sequence "arcnet_close(); arcnet_open();" is not complete. Some children arcnet drivers have special init/exit code sequences, which then embed a call to arcnet_open() and arcnet_close() accordingly. Check drivers/net/arcnet/com20020.c. Run the device RESET sequence from a scheduled workqueue instead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20210128194802.727770-1-a.darwish@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: hns3: add debugfs support for tm nodes, priority and qset infoGuangbin Huang
In order to query tm info of nodes, priority and qset for debugging, adds three debugfs files tm_nodes, tm_priority and tm_qset in newly created tm directory. Unlike previous debugfs commands, these three files just support read ops, so they only support to use cat command to dump their info. The new tm file style is acccording to suggestion from Jakub Kicinski's opinion as link https://lkml.org/lkml/2020/9/29/2101. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: hns3: add interfaces to query information of tm priority/qsetGuangbin Huang
Add some interfaces to get information of tm priority and qset, then they can be used by debugfs. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29Merge tag 'linux-can-next-for-5.12-20210129' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== linux-can-next-for-5.12-20210129 All patches are by me and target the mcp251xfd driver. The first 4 patches update the information regarding the "85% of (FSYSCLK/2)" errata. The other 4 are misc cleanups, unitfy error messages, add missing postfix to a macro, simplify the return of a function, and make use of dev_err_probe() in the mcp251xfd_probe() function. ==================== Link: https://lore.kernel.org/r/20210129084302.3040284-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: mhi: Get rid of local rx queue countLoic Poulain
Use the new mhi_get_free_desc_count helper to track queue usage instead of relying on the locally maintained rx_queued count. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: mhi: Get RX queue size from MHI coreLoic Poulain
The RX queue size can be determined at runtime by retrieving the number of available transfer descriptors. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net/ethernet: convert to use module_platform_driver in octeon_mgmt.cdingsenjie
Simplify the code by using module_platform_driver macro for octeon_mgmt. Signed-off-by: dingsenjie <dingsenjie@yulong.com> Link: https://lore.kernel.org/r/20210128035330.17676-1-dingsenjie@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net/mlx5: DR, Allow SW steering for sw_owner_v2 devicesYevgeny Kliteynik
Allow sw_owner_v2 based on sw_format_version. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-29net/mlx5: DR, Copy all 64B whenever replacing STE in the head of miss-listYevgeny Kliteynik
Till now the code assumed that need to copy reduced size of the ste because the rest is the mask part which shouldn't be changed. This is not true for all types of HW (like STEv1). Take all 64B from the new STE and write them in the replaced STE place. This change will make it easier to handle all STE HW types because we have all the data that is about to be written into HW. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-29net/mlx5: DR, Use HW specific logic API when writing STEYevgeny Kliteynik
STEv0 format and STEv1 HW format are different, each has a different order: STEv0: CTRL 32B, TAG 16B, BITMASK 16B STEv1: CTRL 32B, BITMASK 16B, TAG 16B To make this transparent to upper layers we introduce a new ste_ctx function to format the STE prior to writing it. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-29net/mlx5: DR, Use the right size when writing partial STE into HWYevgeny Kliteynik
In these cases we need to update only the ctrl area of the STE. So it is better to write only the control 32B and avoid copying the unneeded reduced 48B (control 32B + tag 16B). Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-29net/mlx5: DR, Add STEv1 modify header logicYevgeny Kliteynik
Add HW specific modify header fields and logic to STEv1 file. Since STEv0 and STEv1 modify actions values are different, each version has its own implementation. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-29net/mlx5: DR, Add STEv1 action apply logicYevgeny Kliteynik
Add HW specific action apply logic to STEv1. Since STEv0 and STEv1 actions format is different, each version has its implementation. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-29net/mlx5: DR, Add STEv1 setters and gettersYevgeny Kliteynik
Add HW specific setter and getters to STEv1 file. Since STEv0 and STEv1 format are different, each version should implemented different setters and getters. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-29net/mlx5: DR, Allow native protocol support for HW STEv1Yevgeny Kliteynik
Some flex parser protocols are native as part of STEv1. The check for supported protocols was modified to allow this. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-29net/mlx5: DR, Add HW STEv1 match logicYevgeny Kliteynik
Add STEv1 match logic to a new file. This file will be used for HW specific STEv1. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-29net/mlx5: DR, Add match STEv1 structs to ifcYevgeny Kliteynik
Add mlx5_ifc_dr_ste_v1.h - a new header with HW specific STE structs for version 1. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-29net/mlx5: DR, Fix potential shift wrapping of 32-bit valueYevgeny Kliteynik
Fix 32-bit variable shift wrapping in dr_ste_v0_get_miss_addr. Fixes: 6b93b400aa88 ("net/mlx5: DR, Move STEv0 setters and getters") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-29net/mlx5: CT: Add support for matching on ct_state reply flagPaul Blakey
Add support for matching on ct_state reply flag. Example: $ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \ ct_state +trk+est+rpl \ action mirred egress redirect dev ens1f0_1 $ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \ ct_state +trk+est-rpl \ action mirred egress redirect dev ens1f0_0 Signed-off-by: Paul Blakey <paulb@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>