summaryrefslogtreecommitdiff
path: root/net/dsa/slave.c
AgeCommit message (Collapse)Author
2021-05-10net: dsa: fix error code getting shifted with 4 in dsa_slave_get_sset_countVladimir Oltean
DSA implements a bunch of 'standardized' ethtool statistics counters, namely tx_packets, tx_bytes, rx_packets, rx_bytes. So whatever the hardware driver returns in .get_sset_count(), we need to add 4 to that. That is ok, except that .get_sset_count() can return a negative error code, for example: b53_get_sset_count -> phy_ethtool_get_sset_count -> return -EIO -EIO is -5, and with 4 added to it, it becomes -1, aka -EPERM. One can imagine that certain error codes may even become positive, although based on code inspection I did not see instances of that. Check the error code first, if it is negative return it as-is. Based on a similar patch for dsa_master_get_strings from Dan Carpenter: https://patchwork.kernel.org/project/netdevbpf/patch/YJaSe3RPgn7gKxZv@mwanda/ Fixes: 91da11f870f0 ("net: Distributed Switch Architecture protocol support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: free skb->cb usage in core driverYangbo Lu
Free skb->cb usage in core driver and let device drivers decide to use or not. The reason having a DSA_SKB_CB(skb)->clone was because dsa_skb_tx_timestamp() which may set the clone pointer was called before p->xmit() which would use the clone if any, and the device driver has no way to initialize the clone pointer. This patch just put memset(skb->cb, 0, sizeof(skb->cb)) at beginning of dsa_slave_xmit(). Some new features in the future, like one-step timestamp may need more bytes of skb->cb to use in dsa_skb_tx_timestamp(), and p->xmit(). Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: no longer clone skb in core driverYangbo Lu
It was a waste to clone skb directly in dsa_skb_tx_timestamp(). For one-step timestamping, a clone was not needed. For any failure of port_txtstamp (this may usually happen), the skb clone had to be freed. So this patch moves skb cloning for tx timestamp out of dsa core, and let drivers clone skb in port_txtstamp if they really need. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: no longer identify PTP packet in core driverYangbo Lu
Move ptp_classify_raw out of dsa core driver for handling tx timestamp request. Let device drivers do this if they want. Not all drivers want to limit tx timestamping for only PTP packet. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: check tx timestamp request in core driverYangbo Lu
Check tx timestamp request in core driver at very beginning of dsa_skb_tx_timestamp(), so that most skbs not requiring tx timestamp just return. And drop such checking in device drivers. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20net: dsa: enable selftest support for all switches by defaultOleksij Rempel
Most of generic selftest should be able to work with probably all ethernet controllers. The DSA switches are not exception, so enable it by default at least for DSA. This patch was tested with SJA1105 and AR9331. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16net: bridge: switchdev: include local flag in FDB notificationsVladimir Oltean
As explained in bugfix commit 6ab4c3117aec ("net: bridge: don't notify switchdev for local FDB addresses") as well as in this discussion: https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/ the switchdev notifiers for FDB entries managed to have a zero-day bug, which was that drivers would not know what to do with local FDB entries, because they were not told that they are local. The bug fix was to simply not notify them of those addresses. Let us now add the 'is_local' bit to bridge FDB entries, and make all drivers ignore these entries by their own choice. Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13of: net: pass the dst buffer to of_get_mac_address()Michael Walle
of_get_mac_address() returns a "const void*" pointer to a MAC address. Lately, support to fetch the MAC address by an NVMEM provider was added. But this will only work with platform devices. It will not work with PCI devices (e.g. of an integrated root complex) and esp. not with DSA ports. There is an of_* variant of the nvmem binding which works without devices. The returned data of a nvmem_cell_read() has to be freed after use. On the other hand the return of_get_mac_address() points to some static data without a lifetime. The trick for now, was to allocate a device resource managed buffer which is then returned. This will only work if we have an actual device. Change it, so that the caller of of_get_mac_address() has to supply a buffer where the MAC address is written to. Unfortunately, this will touch all drivers which use the of_get_mac_address(). Usually the code looks like: const char *addr; addr = of_get_mac_address(np); if (!IS_ERR(addr)) ether_addr_copy(ndev->dev_addr, addr); This can then be simply rewritten as: of_get_mac_address(np, ndev->dev_addr); Sometimes is_valid_ether_addr() is used to test the MAC address. of_get_mac_address() already makes sure, it just returns a valid MAC address. Thus we can just test its return code. But we have to be careful if there are still other sources for the MAC address before the of_get_mac_address(). In this case we have to keep the is_valid_ether_addr() call. The following coccinelle patch was used to convert common cases to the new style. Afterwards, I've manually gone over the drivers and fixed the return code variable: either used a new one or if one was already available use that. Mansour Moufid, thanks for that coccinelle patch! <spml> @a@ identifier x; expression y, z; @@ - x = of_get_mac_address(y); + x = of_get_mac_address(y, z); <... - ether_addr_copy(z, x); ...> @@ identifier a.x; @@ - if (<+... x ...+>) {} @@ identifier a.x; @@ if (<+... x ...+>) { ... } - else {} @@ identifier a.x; expression e; @@ - if (<+... x ...+>@e) - {} - else + if (!(e)) {...} @@ expression x, y, z; @@ - x = of_get_mac_address(y, z); + of_get_mac_address(y, z); ... when != x </spml> All drivers, except drivers/net/ethernet/aeroflex/greth.c, were compile-time tested. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24dsa: slave: add support for TC_SETUP_FTPablo Neira Ayuso
The dsa infrastructure provides a well-defined hierarchy of devices, pass up the call to set up the flow block to the master device. From the software dataplane, the netfilter infrastructure uses the dsa slave devices to refer to the input and output device for the given skbuff. Similarly, the flowtable definition in the ruleset refers to the dsa slave port devices. This patch adds the glue code to call ndo_setup_tc with TC_SETUP_FT with the master device via the dsa slave devices. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24net: dsa: resolve forwarding path for dsa slave portsFelix Fietkau
Add .ndo_fill_forward_path for dsa slave port devices Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-23net: dsa: sync up switchdev objects and port attributes when joining the bridgeVladimir Oltean
If we join an already-created bridge port, such as a bond master interface, then we can miss the initial switchdev notifications emitted by the bridge for this port, while it wasn't offloaded by anybody. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-23net: dsa: pass extack to dsa_port_{bridge,lag}_joinVladimir Oltean
This is a pretty noisy change that was broken out of the larger change for replaying switchdev attributes and objects at bridge join time, which is when these extack objects are actually used. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08net: dsa: fix switchdev objects on bridge master mistakenly being applied on ↵Vladimir Oltean
ports Tobias reports that after the blamed patch, VLAN objects being added to a bridge device are being added to all slave ports instead (swp2, swp3). ip link add br0 type bridge vlan_filtering 1 ip link set swp2 master br0 ip link set swp3 master br0 bridge vlan add dev br0 vid 100 self This is because the fix was too broad: we made dsa_port_offloads_netdev say "yes, I offload the br0 bridge" for all slave ports, but we didn't add the checks whether the switchdev object was in fact meant for the physical port or for the bridge itself. So we are reacting on events in a way in which we shouldn't. The reason why the fix was too broad is because the question itself, "does this DSA port offload this netdev", was too broad in the first place. The solution is to disambiguate the question and separate it into two different functions, one to be called for each switchdev attribute / object that has an orig_dev == net_bridge (dsa_port_offloads_bridge), and the other for orig_dev == net_bridge_port (*_offloads_bridge_port). In the case of VLAN objects on the bridge interface, this solves the problem because we know that VLAN objects are per bridge port and not per bridge. And when orig_dev is equal to the net_bridge, we offload it as a bridge, but not as a bridge port; that's how we are able to skip reacting on those events. Note that this is compatible with future plans to have explicit offloading of VLAN objects on the bridge interface as a bridge port (in DSA, this signifies that we should add that VLAN towards the CPU port). Fixes: 99b8202b179f ("net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored") Reported-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Tested-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16net: dsa: add MRP supportHoratiu Vultur
Add support for offloading MRP in HW. Currently implement the switchdev calls 'SWITCHDEV_OBJ_ID_MRP', 'SWITCHDEV_OBJ_ID_RING_ROLE_MRP', to allow to create MRP instances and to set the role of these instances. Add DSA_NOTIFIER_MRP_ADD/DEL and DSA_NOTIFIER_MRP_ADD/DEL_RING_ROLE which calls to .port_mrp_add/del and .port_mrp_add/del_ring_role in the DSA driver for the switch. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: propagate extack to .port_vlan_filteringVladimir Oltean
Some drivers can't dynamically change the VLAN filtering option, or impose some restrictions, it would be nice to propagate this info through netlink instead of printing it to a kernel log that might never be read. Also netlink extack includes the module that emitted the message, which means that it's easier to figure out which ones are driver-generated errors as opposed to command misuse. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: propagate extack to .port_vlan_addVladimir Oltean
Allow drivers to communicate their restrictions to user space directly, instead of printing to the kernel log. Where the conversion would have been lossy and things like VLAN ID could no longer be conveyed (due to the lack of support for printf format specifier in netlink extack), I chose to keep the messages in full form to the kernel log only, and leave it up to individual driver maintainers to move more messages to extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: dsa: act as passthrough for bridge port flagsVladimir Oltean
There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be expressed by the bridge through switchdev, and not all of them can be emulated by DSA mid-layer API at the same time. One possible configuration is when the bridge offloads the port flags using a mask that has a single bit set - therefore only one feature should change. However, DSA currently groups together unicast and multicast flooding in the .port_egress_floods method, which limits our options when we try to add support for turning off broadcast flooding: do we extend .port_egress_floods with a third parameter which b53 and mv88e6xxx will ignore? But that means that the DSA layer, which currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will see that .port_egress_floods is implemented, and will report that all 3 types of flooding are supported - not necessarily true. Another configuration is when the user specifies more than one flag at the same time, in the same netlink message. If we were to create one individual function per offloadable bridge port flag, we would limit the expressiveness of the switch driver of refusing certain combinations of flag values. For example, a switch may not have an explicit knob for flooding of unknown multicast, just for flooding in general. In that case, the only correct thing to do is to allow changes to BR_FLOOD and BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having a separate .port_set_unicast_flood and .port_set_multicast_flood would not allow the driver to possibly reject that. Also, DSA doesn't consider it necessary to inform the driver that a SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it just calls .port_egress_floods for the CPU port. When we'll add support for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real problem because the flood settings will need to be held statefully in the DSA middle layer, otherwise changing the mrouter port attribute will impact the flooding attribute. And that's _assuming_ that the underlying hardware doesn't have anything else to do when a multicast router attaches to a port than flood unknown traffic to it. If it does, there will need to be a dedicated .port_set_mrouter anyway. So we need to let the DSA drivers see the exact form that the bridge passes this switchdev attribute in, otherwise we are standing in the way. Therefore we also need to use this form of language when communicating to the driver that it needs to configure its initial (before bridge join) and final (after bridge leave) port flags. The b53 and mv88e6xxx drivers are converted to the passthrough API and their implementation of .port_egress_floods is split into two: a function that configures unicast flooding and another for multicast. The mv88e6xxx implementation is quite hairy, and it turns out that the implementations of unknown unicast flooding are actually the same for 6185 and for 6352: behind the confusing names actually lie two individual bits: NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2) NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3) so there was no reason to entangle them in the first place. Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of PORT_CTL0, which has the exact same bit index. I have left the implementations separate though, for the only reason that the names are different enough to confuse me, since I am not able to double-check with a user manual. The multicast flooding setting for 6185 is in a different register than for 6352 though. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: switchdev: propagate extack to port attributesVladimir Oltean
When a struct switchdev_attr is notified through switchdev, there is no way to report informational messages, unlike for struct switchdev_obj. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11net: dsa: add support for offloading HSRGeorge McCollister
Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion tag removal, duplicate generation and forwarding on DSA switches. Add DSA_NOTIFIER_HSR_JOIN and DSA_NOTIFIER_HSR_LEAVE which trigger calls to .port_hsr_join and .port_hsr_leave in the DSA driver for the switch. The DSA switch driver should then set netdev feature flags for the HSR/PRP operation that it offloads. NETIF_F_HW_HSR_TAG_INS NETIF_F_HW_HSR_TAG_RM NETIF_F_HW_HSR_FWD NETIF_F_HW_HSR_DUP Signed-off-by: George McCollister <george.mccollister@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-06net: dsa: make assisted_learning_on_cpu_port bypass offloaded LAG interfacesVladimir Oltean
Given the following topology, and focusing only on Box A: Box A +----------------------------------+ | Board 1 br0 | | +---------+ | | / \ | | | | | | | bond0 | | | +-----+ | |192.168.1.1 | / \ | | eno0 swp0 swp1 swp2 | +---|--------|-------|-------|-----+ | | | | +--------+ | | Cable | | Cable| |Cable Cable | | +--------+ | | | | | | +---|--------|-------|-------|-----+ | eno0 swp0 swp1 swp2 | |192.168.1.2 | \ / | | | +-----+ | | | bond0 | | | | | | \ / | | +---------+ | | Board 2 br0 | +----------------------------------+ Box B The assisted_learning_on_cpu_port logic will see that swp0 is bridged with a "foreign interface" (bond0) and will therefore install all addresses learnt by the software bridge towards bond0 (including the address of eno0 on Box B) as static addresses towards the CPU port. But that's not what we want - bond0 is not really a "foreign interface" but one we can offload including L2 forwarding from/towards it. So we need to refine our logic for assisted learning such that, whenever we see an address learnt on a non-DSA interface, we search through the tree for any port that offloads that non-DSA interface. Some confusion might arise as to why we search through the whole tree instead of just the local switch returned by dsa_slave_dev_lower_find. Or a different angle of the same confusion: why does dsa_slave_dev_lower_find(br_dev) return a single dp that's under br_dev instead of the whole list of bridged DSA ports? To answer the second question, it should be enough to install the static FDB entry on the CPU port of a single switch in the tree, because dsa_port_fdb_add uses DSA_NOTIFIER_FDB_ADD which ensures that all other switches in the tree get notified of that address, and add the entry themselves using dsa_towards_port(). This should help understand the answer to the first question: the port returned by dsa_slave_dev_lower_find may not be on the same switch as the ports that offload the LAG. Nonetheless, if the driver implements .crosschip_lag_join and .crosschip_bridge_join as mv88e6xxx does, there still isn't any reason for trapping addresses learnt on the remote LAG towards the CPU, and we should prevent that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06net: dsa: automatically bring user ports down when master goes downVladimir Oltean
This is not fixing any actual bug that I know of, but having a DSA interface that is up even when its lower (master) interface is down is one of those things that just do not sound right. Yes, DSA checks if the master is up before actually bringing the user interface up, but nobody prevents bringing the master interface down immediately afterwards... Then the user ports would attempt dev_queue_xmit on an interface that is down, and wonder what's wrong. This patch prevents that from happening. NETDEV_GOING_DOWN is the notification emitted _before_ the master actually goes down, and we are protected by the rtnl_mutex, so all is well. For those of you reading this because you were doing switch testing such as latency measurements for autonomously forwarded traffic, and you needed a controlled environment with no extra packets sent by the network stack, this patch breaks that, because now the user ports go down too, which may shut down the PHY etc. But please don't do it like that, just do instead: tc qdisc add dev eno2 clsact tc filter add dev eno2 egress flower action drop Tested with two cascaded DSA switches: $ ip link set eno2 down sja1105 spi2.0 sw0p2: Link is Down mscc_felix 0000:00:00.5 swp0: Link is Down fsl_enetc 0000:00:00.2 eno2: Link is Down Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06net: dsa: automatically bring up DSA master when opening user portVladimir Oltean
DSA wants the master interface to be open before the user port is due to historical reasons. The promiscuity of interfaces that are down used to have issues, as referenced Lennert Buytenhek in commit df02c6ff2e39 ("dsa: fix master interface allmulti/promisc handling"). The bugfix mentioned there, commit b6c40d68ff64 ("net: only invoke dev->change_rx_flags when device is UP"), was basically a "don't do that" approach to working around the promiscuity while down issue. Further work done by Vlad Yasevich in commit d2615bf45069 ("net: core: Always propagate flag changes to interfaces") has resolved the underlying issue, and it is strictly up to the DSA and 8021q drivers now, it is no longer mandated by the networking core that the master interface must be up when changing its promiscuity. From DSA's point of view, deciding to error out in dsa_slave_open because the master isn't up is (a) a bad user experience and (b) knocking at an open door. Even if there still was an issue with promiscuity while down, DSA could still just open the master and avoid it. Doing it this way has the additional benefit that user space can now remove DSA-specific workarounds, like systemd-networkd with BindCarrier: https://github.com/systemd/systemd/issues/7478 And we can finally remove one of the 2 bullets in the "Common pitfalls using DSA setups" chapter. Tested with two cascaded DSA switches: $ ip link set sw0p2 up fsl_enetc 0000:00:00.2 eno2: configuring for fixed/internal link mode fsl_enetc 0000:00:00.2 eno2: Link is Up - 1Gbps/Full - flow control rx/tx mscc_felix 0000:00:00.5 swp0: configuring for fixed/sgmii link mode mscc_felix 0000:00:00.5 swp0: Link is Up - 1Gbps/Full - flow control off 8021q: adding VLAN 0 to HW filter on device swp0 sja1105 spi2.0 sw0p2: configuring for phy/rgmii-id link mode IPv6: ADDRCONF(NETDEV_CHANGE): eno2: link becomes ready IPv6: ADDRCONF(NETDEV_CHANGE): swp0: link becomes ready Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: dsa: allow changing the tag protocol via the "tagging" device attributeVladimir Oltean
Currently DSA exposes the following sysfs: $ cat /sys/class/net/eno2/dsa/tagging ocelot which is a read-only device attribute, introduced in the kernel as commit 98cdb4807123 ("net: dsa: Expose tagging protocol to user-space"), and used by libpcap since its commit 993db3800d7d ("Add support for DSA link-layer types"). It would be nice if we could extend this device attribute by making it writable: $ echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging This is useful with DSA switches that can make use of more than one tagging protocol. It may be useful in dsa_loop in the future too, to perform offline testing of various taggers, or for changing between dsa and edsa on Marvell switches, if that is desirable. In terms of implementation, drivers can support this feature by implementing .change_tag_protocol, which should always leave the switch in a consistent state: either with the new protocol if things went well, or with the old one if something failed. Teardown of the old protocol, if necessary, must be handled by the driver. Some things remain as before: - The .get_tag_protocol is currently only called at probe time, to load the initial tagging protocol driver. Nonetheless, new drivers should report the tagging protocol in current use now. - The driver should manage by itself the initial setup of tagging protocol, no later than the .setup() method, as well as destroying resources used by the last tagger in use, no earlier than the .teardown() method. For multi-switch DSA trees, error handling is a bit more complicated, since e.g. the 5th out of 7 switches may fail to change the tag protocol. When that happens, a revert to the original tag protocol is attempted, but that may fail too, leaving the tree in an inconsistent state despite each individual switch implementing .change_tag_protocol transactionally. Since the intersection between drivers that implement .change_tag_protocol and drivers that support D in DSA is currently the empty set, the possibility for this error to happen is ignored for now. Testing: $ insmod mscc_felix.ko [ 79.549784] mscc_felix 0000:00:00.5: Adding to iommu group 14 [ 79.565712] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517 $ insmod tag_ocelot.ko $ rmmod mscc_felix.ko $ insmod mscc_felix.ko [ 97.261724] libphy: VSC9959 internal MDIO bus: probed [ 97.267363] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 0 [ 97.274998] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 1 [ 97.282561] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 2 [ 97.289700] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 3 [ 97.599163] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL) [ 97.862034] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL) [ 97.950731] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode [ 97.964278] 8021q: adding VLAN 0 to HW filter on device swp0 [ 98.146161] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL) [ 98.238649] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode [ 98.251845] 8021q: adding VLAN 0 to HW filter on device swp1 [ 98.433916] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL) [ 98.485542] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode [ 98.503584] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control rx/tx [ 98.527948] device eno2 entered promiscuous mode [ 98.544755] DSA: tree 0 setup $ ping 10.0.0.1 PING 10.0.0.1 (10.0.0.1): 56 data bytes 64 bytes from 10.0.0.1: seq=0 ttl=64 time=2.337 ms 64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.754 ms ^C - 10.0.0.1 ping statistics - 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max = 0.754/1.545/2.337 ms $ cat /sys/class/net/eno2/dsa/tagging ocelot $ cat ./test_ocelot_8021q.sh #!/bin/bash ip link set swp0 down ip link set swp1 down ip link set swp2 down ip link set swp3 down ip link set swp5 down ip link set eno2 down echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging ip link set eno2 up ip link set swp0 up ip link set swp1 up ip link set swp2 up ip link set swp3 up ip link set swp5 up $ ./test_ocelot_8021q.sh ./test_ocelot_8021q.sh: line 9: echo: write error: Protocol not available $ rmmod tag_ocelot.ko rmmod: can't unload module 'tag_ocelot': Resource temporarily unavailable $ insmod tag_ocelot_8021q.ko $ ./test_ocelot_8021q.sh $ cat /sys/class/net/eno2/dsa/tagging ocelot-8021q $ rmmod tag_ocelot.ko $ rmmod tag_ocelot_8021q.ko rmmod: can't unload module 'tag_ocelot_8021q': Resource temporarily unavailable $ ping 10.0.0.1 PING 10.0.0.1 (10.0.0.1): 56 data bytes 64 bytes from 10.0.0.1: seq=0 ttl=64 time=0.953 ms 64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.787 ms 64 bytes from 10.0.0.1: seq=2 ttl=64 time=0.771 ms $ rmmod mscc_felix.ko [ 645.544426] mscc_felix 0000:00:00.5: Link is Down [ 645.838608] DSA: tree 0 torn down $ rmmod tag_ocelot_8021q.ko Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net: dsa: set configure_vlan_while_not_filtering to true by defaultVladimir Oltean
As explained in commit 54a0ed0df496 ("net: dsa: provide an option for drivers to always receive bridge VLANs"), DSA has historically been skipping VLAN switchdev operations when the bridge wasn't in vlan_filtering mode, but the reason why it was doing that has never been clear. So the configure_vlan_while_not_filtering option is there merely to preserve functionality for existing drivers. It isn't some behavior that drivers should opt into. Ideally, when all drivers leave this flag set, we can delete the dsa_port_skip_vlan_configuration() function. New drivers always seem to omit setting this flag, for some reason. So let's reverse the logic: the DSA core sets it by default to true before the .setup() callback, and legacy drivers can turn it off. This way, new drivers get the new behavior by default, unless they explicitly set the flag to false, which is more obvious during review. Remove the assignment from drivers which were setting it to true, and add the assignment to false for the drivers that didn't previously have it. This way, it should be easier to see how many we have left. The following drivers: lan9303, mv88e6060 were skipped from setting this flag to false, because they didn't have any VLAN offload ops in the first place. The Broadcom Starfighter 2 driver calls the common b53_switch_alloc and therefore also inherits the configure_vlan_while_not_filtering=true behavior. Also, print a message through netlink extack every time a VLAN has been skipped. This is mildly annoying on purpose, so that (a) it is at least clear that VLANs are being skipped - the legacy behavior in itself is confusing, and the extack should be much more difficult to miss, unlike kernel logs - and (b) people have one more incentive to convert to the new behavior. No behavior change except for the added prints is intended at this time. $ ip link add br0 type bridge vlan_filtering 0 $ ip link set sw0p2 master br0 [ 60.315148] br0: port 1(sw0p2) entered blocking state [ 60.320350] br0: port 1(sw0p2) entered disabled state [ 60.327839] device sw0p2 entered promiscuous mode [ 60.334905] br0: port 1(sw0p2) entered blocking state [ 60.340142] br0: port 1(sw0p2) entered forwarding state Warning: dsa_core: skipping configuration of VLAN. # This was the pvid $ bridge vlan add dev sw0p2 vid 100 Warning: dsa_core: skipping configuration of VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210115231919.43834-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: dsa: Link aggregation supportTobias Waldekranz
Monitor the following events and notify the driver when: - A DSA port joins/leaves a LAG. - A LAG, made up of DSA ports, joins/leaves a bridge. - A DSA port in a LAG is enabled/disabled (enabled meaning "distributing" in 802.3ad LACP terms). When a LAG joins a bridge, the DSA subsystem will treat that as each individual port joining the bridge. The driver may look at the port's LAG device pointer to see if it is associated with any LAG, if that is required. This is analogue to how switchdev events are replicated out to all lower devices when reaching e.g. a LAG. Drivers can optionally request that DSA maintain a linear mapping from a LAG ID to the corresponding netdev by setting ds->num_lag_ids to the desired size. In the event that the hardware is not capable of offloading a particular LAG for any reason (the typical case being use of exotic modes like broadcast), DSA will take a hands-off approach, allowing the LAG to be formed as a pure software construct. This is reported back through the extended ACK, but is otherwise transparent to the user. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: dsa: Don't offload port attributes on standalone portsTobias Waldekranz
In a situation where a standalone port is indirectly attached to a bridge (e.g. via a LAG) which is not offloaded, do not offload any port attributes either. The port should behave as a standard NIC. Previously, on mv88e6xxx, this meant that in the following setup: br0 / team0 / \ swp0 swp1 If vlan filtering was enabled on br0, swp0's and swp1's QMode was set to "secure". This caused all untagged packets to be dropped, as their default VID (0) was not loaded into the VTU. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-12net: dsa: add optional stats64 supportOleksij Rempel
Allow DSA drivers to export stats64 Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11net: dsa: remove obsolete comments about switchdev transactionsVladimir Oltean
Now that all port object notifiers were converted to be non-transactional, we can remove the comments that say otherwise. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11net: switchdev: remove the transaction structure from port attributesVladimir Oltean
Since the introduction of the switchdev API, port attributes were transmitted to drivers for offloading using a two-step transactional model, with a prepare phase that was supposed to catch all errors, and a commit phase that was supposed to never fail. Some classes of failures can never be avoided, like hardware access, or memory allocation. In the latter case, merely attempting to move the memory allocation to the preparation phase makes it impossible to avoid memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused transaction item queue") which has removed the unused mechanism of passing on the allocated memory between one phase and another. It is time we admit that separating the preparation from the commit phase is something that is best left for the driver to decide, and not something that should be baked into the API, especially since there are no switchdev callers that depend on this. This patch removes the struct switchdev_trans member from switchdev port attribute notifier structures, and converts drivers to not look at this member. In part, this patch contains a revert of my previous commit 2e554a7a5d8a ("net: dsa: propagate switchdev vlan_filtering prepare phase to drivers"). For the most part, the conversion was trivial except for: - Rocker's world implementation based on Broadcom OF-DPA had an odd implementation of ofdpa_port_attr_bridge_flags_set. The conversion was done mechanically, by pasting the implementation twice, then only keeping the code that would get executed during prepare phase on top, then only keeping the code that gets executed during the commit phase on bottom, then simplifying the resulting code until this was obtained. - DSA's offloading of STP state, bridge flags, VLAN filtering and multicast router could be converted right away. But the ageing time could not, so a shim was introduced and this was left for a further commit. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11net: switchdev: remove the transaction structure from port object notifiersVladimir Oltean
Since the introduction of the switchdev API, port objects were transmitted to drivers for offloading using a two-step transactional model, with a prepare phase that was supposed to catch all errors, and a commit phase that was supposed to never fail. Some classes of failures can never be avoided, like hardware access, or memory allocation. In the latter case, merely attempting to move the memory allocation to the preparation phase makes it impossible to avoid memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused transaction item queue") which has removed the unused mechanism of passing on the allocated memory between one phase and another. It is time we admit that separating the preparation from the commit phase is something that is best left for the driver to decide, and not something that should be baked into the API, especially since there are no switchdev callers that depend on this. This patch removes the struct switchdev_trans member from switchdev port object notifier structures, and converts drivers to not look at this member. Where driver conversion is trivial (like in the case of the Marvell Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is done in this patch. Where driver conversion needs more attention (DSA, Mellanox Spectrum), the conversion is left for subsequent patches and here we only fake the prepare/commit phases at a lower level, just not in the switchdev notifier itself. Where the code has a natural structure that is best left alone as a preparation and a commit phase (as in the case of the Ocelot switch), that structure is left in place, just made to not depend upon the switchdev transactional model. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11net: switchdev: remove vid_begin -> vid_end range from VLAN objectsVladimir Oltean
The call path of a switchdev VLAN addition to the bridge looks something like this today: nbp_vlan_init | __br_vlan_set_default_pvid | | | | | br_afspec | | | | | | | v | | | br_process_vlan_info | | | | | | | v | | | br_vlan_info | | | / \ / | | / \ / | | / \ / | | / \ / v v v v v nbp_vlan_add br_vlan_add ------+ | ^ ^ | | | / | | | | / / / | \ br_vlan_get_master/ / v \ ^ / / br_vlan_add_existing \ | / / | \ | / / / \ | / / / \ | / / / \ | / / / v | | v / __vlan_add / / | / / | / v | / __vlan_vid_add | / \ | / v v v br_switchdev_port_vlan_add The ranges UAPI was introduced to the bridge in commit bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec) have always been passed one by one, through struct bridge_vlan_info tmp_vinfo, to br_vlan_info. So the range never went too far in depth. Then Scott Feldman introduced the switchdev_port_bridge_setlink function in commit 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink"). That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made full use of the range. But switchdev_port_bridge_setlink was called like this: br_setlink -> br_afspec -> switchdev_port_bridge_setlink Basically, the switchdev and the bridge code were not tightly integrated. Then commit 41c498b9359e ("bridge: restore br_setlink back to original") came, and switchdev drivers were required to implement .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while. In the meantime, commits such as 0944d6b5a2fa ("bridge: try switchdev op first in __vlan_vid_add/del") finally made switchdev penetrate the br_vlan_info() barrier and start to develop the call path we have today. But remember, br_vlan_info() still receives VLANs one by one. Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit 29ab586c3d83 ("net: switchdev: Remove bridge bypass support from switchdev") so that drivers would not implement .ndo_bridge_setlink any longer. The switchdev_port_bridge_setlink also got deleted. This refactoring removed the parallel bridge_setlink implementation from switchdev, and left the only switchdev VLAN objects to be the ones offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add (the latter coming from commit 9c86ce2c1ae3 ("net: bridge: Notify about bridge VLANs")). That is to say, today the switchdev VLAN object ranges are not used in the kernel. Refactoring the above call path is a bit complicated, when the bridge VLAN call path is already a bit complicated. Let's go off and finish the job of commit 29ab586c3d83 by deleting the bogus iteration through the VLAN ranges from the drivers. Some aspects of this feature never made too much sense in the first place. For example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID flag supposed to mean, when a port can obviously have a single pvid? This particular configuration _is_ denied as of commit 6623c60dc28e ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API perspective, the driver still has to play pretend, and only offload the vlan->vid_end as pvid. And the addition of a switchdev VLAN object can modify the flags of another, completely unrelated, switchdev VLAN object! (a VLAN that is PVID will invalidate the PVID flag from whatever other VLAN had previously been offloaded with switchdev and had that flag. Yet switchdev never notifies about that change, drivers are supposed to guess). Nonetheless, having a VLAN range in the API makes error handling look scarier than it really is - unwinding on errors and all of that. When in reality, no one really calls this API with more than one VLAN. It is all unnecessary complexity. And despite appearing pretentious (two-phase transactional model and all), the switchdev API is really sloppy because the VLAN addition and removal operations are not paired with one another (you can add a VLAN 100 times and delete it just once). The bridge notifies through switchdev of a VLAN addition not only when the flags of an existing VLAN change, but also when nothing changes. There are switchdev drivers out there who don't like adding a VLAN that has already been added, and those checks don't really belong at driver level. But the fact that the API contains ranges is yet another factor that prevents this from being addressed in the future. Of the existing switchdev pieces of hardware, it appears that only Mellanox Spectrum supports offloading more than one VLAN at a time, through mlxsw_sp_port_vlan_set. I have kept that code internal to the driver, because there is some more bookkeeping that makes use of it, but I deleted it from the switchdev API. But since the switchdev support for ranges has already been de facto deleted by a Mellanox employee and nobody noticed for 4 years, I'm going to assume it's not a biggie. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09net: dsa: dsa_legacy_fdb_{add,del} can be staticVladimir Oltean
Introduced in commit 37b8da1a3c68 ("net: dsa: Move FDB add/del implementation inside DSA") in net/dsa/legacy.c, these functions were moved again to slave.c as part of commit 2a93c1a3651f ("net: dsa: Allow compiling out legacy support"), before actually deleting net/dsa/slave.c in 93e86b3bc842 ("net: dsa: Remove legacy probing support"). Along with that movement there should have been a deletion of the prototypes from dsa_priv.h, they are not useful. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210108233054.1222278-1-olteanv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: remove the DSA specific notifiersVladimir Oltean
This effectively reverts commit 60724d4bae14 ("net: dsa: Add support for DSA specific notifiers"). The reason is that since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), it appears that there is a generic way to achieve the same purpose. The only user thus far, the Broadcom SYSTEMPORT driver, was converted to use the generic notifiers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: export dsa_slave_dev_checkVladimir Oltean
Using the NETDEV_CHANGEUPPER notifications, drivers can be aware when they are enslaved to e.g. a bridge by calling netif_is_bridge_master(). Export this helper from DSA to get the equivalent functionality of determining whether the upper interface of a CHANGEUPPER notifier is a DSA switch interface or not. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge ↵Vladimir Oltean
neighbors Some DSA switches (and not only) cannot learn source MAC addresses from packets injected from the CPU. They only perform hardware address learning from inbound traffic. This can be problematic when we have a bridge spanning some DSA switch ports and some non-DSA ports (which we'll call "foreign interfaces" from DSA's perspective). There are 2 classes of problems created by the lack of learning on CPU-injected traffic: - excessive flooding, due to the fact that DSA treats those addresses as unknown - the risk of stale routes, which can lead to temporary packet loss To illustrate the second class, consider the following situation, which is common in production equipment (wireless access points, where there is a WLAN interface and an Ethernet switch, and these form a single bridging domain). AP 1: +------------------------------------------------------------------------+ | br0 | +------------------------------------------------------------------------+ +------------+ +------------+ +------------+ +------------+ +------------+ | swp0 | | swp1 | | swp2 | | swp3 | | wlan0 | +------------+ +------------+ +------------+ +------------+ +------------+ | ^ ^ | | | | | | | Client A Client B | | | +------------+ +------------+ +------------+ +------------+ +------------+ | swp0 | | swp1 | | swp2 | | swp3 | | wlan0 | +------------+ +------------+ +------------+ +------------+ +------------+ +------------------------------------------------------------------------+ | br0 | +------------------------------------------------------------------------+ AP 2 - br0 of AP 1 will know that Clients A and B are reachable via wlan0 - the hardware fdb of a DSA switch driver today is not kept in sync with the software entries on other bridge ports, so it will not know that clients A and B are reachable via the CPU port UNLESS the hardware switch itself performs SA learning from traffic injected from the CPU. Nonetheless, a substantial number of switches don't. - the hardware fdb of the DSA switch on AP 2 may autonomously learn that Client A and B are reachable through swp0. Therefore, the software br0 of AP 2 also may or may not learn this. In the example we're illustrating, some Ethernet traffic has been going on, and br0 from AP 2 has indeed learnt that it can reach Client B through swp0. One of the wireless clients, say Client B, disconnects from AP 1 and roams to AP 2. The topology now looks like this: AP 1: +------------------------------------------------------------------------+ | br0 | +------------------------------------------------------------------------+ +------------+ +------------+ +------------+ +------------+ +------------+ | swp0 | | swp1 | | swp2 | | swp3 | | wlan0 | +------------+ +------------+ +------------+ +------------+ +------------+ | ^ | | | Client A | | | Client B | | | v +------------+ +------------+ +------------+ +------------+ +------------+ | swp0 | | swp1 | | swp2 | | swp3 | | wlan0 | +------------+ +------------+ +------------+ +------------+ +------------+ +------------------------------------------------------------------------+ | br0 | +------------------------------------------------------------------------+ AP 2 - br0 of AP 1 still knows that Client A is reachable via wlan0 (no change) - br0 of AP 1 will (possibly) know that Client B has left wlan0. There are cases where it might never find out though. Either way, DSA today does not process that notification in any way. - the hardware FDB of the DSA switch on AP 1 may learn autonomously that Client B can be reached via swp0, if it receives any packet with Client 1's source MAC address over Ethernet. - the hardware FDB of the DSA switch on AP 2 still thinks that Client B can be reached via swp0. It does not know that it has roamed to wlan0, because it doesn't perform SA learning from the CPU port. Now Client A contacts Client B. AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet segment. AP 2 sees a frame on swp0 and its fdb says that the destination is swp0. Hairpinning is disabled => drop. This problem comes from the fact that these switches have a 'blind spot' for addresses coming from software bridging. The generic solution is not to assume that hardware learning can be enabled somehow, but to listen to more bridge learning events. It turns out that the bridge driver does learn in software from all inbound frames, in __br_handle_local_finish. A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the addresses serviced by the bridge on 'foreign' interfaces. The software bridge also does the right thing on migration, by notifying that the old entry is deleted, so that does not need to be special-cased in DSA. When it is deleted, we just need to delete our static FDB entry towards the CPU too, and wait. The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE events received on its own interfaces, such as static FDB entries. Luckily we can change that, and DSA can listen to all switchdev FDB add/del events in the system and figure out if those events were emitted by a bridge that spans at least one of DSA's own ports. In case that is true, DSA will also offload that address towards its own CPU port, in the eventuality that there might be bridge clients attached to the DSA switch who want to talk to the station connected to the foreign interface. In terms of implementation, we need to keep the fdb_info->added_by_user check for the case where the switchdev event was targeted directly at a DSA switch port. But we don't need to look at that flag for snooped events. So the check is currently too late, we need to move it earlier. This also simplifies the code a bit, since we avoid uselessly allocating and freeing switchdev_work. We could probably do some improvements in the future. For example, multi-bridge support is rudimentary at the moment. If there are two bridges spanning a DSA switch's ports, and both of them need to service the same MAC address, then what will happen is that the migration of one of those stations will trigger the deletion of the FDB entry from the CPU port while it is still used by other bridge. That could be improved with reference counting but is left for another time. This behavior needs to be enabled at driver level by setting ds->assisted_learning_on_cpu_port = true. This is because we don't want to inflict a potential performance penalty (accesses through MDIO/I2C/SPI are expensive) to hardware that really doesn't need it because address learning on the CPU port works there. Reported-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: exit early in dsa_slave_switchdev_event if we can't program the FDBVladimir Oltean
Right now, the following would happen for a switch driver that does not implement .port_fdb_add or .port_fdb_del. dsa_slave_switchdev_event returns NOTIFY_OK and schedules: -> dsa_slave_switchdev_event_work -> dsa_port_fdb_add -> dsa_port_notify(DSA_NOTIFIER_FDB_ADD) -> dsa_switch_fdb_add -> if (!ds->ops->port_fdb_add) return -EOPNOTSUPP; -> an error is printed with dev_dbg, and dsa_fdb_offload_notify(switchdev_work) is not called. We can avoid scheduling the worker for nothing and say NOTIFY_DONE. Because we don't call dsa_fdb_offload_notify, the static FDB entry will remain just in the software bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: move switchdev event implementation under the same switch/case ↵Vladimir Oltean
statement We'll need to start listening to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events even for interfaces where dsa_slave_dev_check returns false, so we need that check inside the switch-case statement for SWITCHDEV_FDB_*. This movement also avoids a useless allocation / free of switchdev_work on the untreated "default event" case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_workVladimir Oltean
Currently DSA doesn't add FDB entries on the CPU port, because it only does so through switchdev, which is associated with a net_device, and there are none of those for the CPU port. But actually FDB addresses on the CPU port have some use cases of their own, if the switchdev operations are initiated from within the DSA layer. There is just one problem with the existing code: it passes a structure in dsa_switchdev_event_work which was retrieved directly from switchdev, so it contains a net_device. We need to generalize the contents to something that covers the CPU port as well: the "ds, port" tuple is fine for that. Note that the new procedure for notifying the successful FDB offload is inspired from the rocker model. Also, nothing was being done if added_by_user was false. Let's check for that a lot earlier, and don't actually bother to schedule the worker for nothing. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: be louder when a non-legacy FDB operation failsVladimir Oltean
The dev_close() call was added in commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification") "to indicate inconsistent situation" when we could not delete an FDB entry from the port. bridge fdb del d8:58:d7:00:ca:6d dev swp0 self master It is a bit drastic and at the same time not helpful if the above fails to only print with netdev_dbg log level, but on the other hand to bring the interface down. So increase the verbosity of the error message, and drop dev_close(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-08net: dsa: print the MTU value that could not be setRasmus Villemoes
These warnings become somewhat more informative when they include the MTU value that could not be set and not just the errno. Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Link: https://lore.kernel.org/r/20201205133944.10182-1-rasmus.villemoes@prevas.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20net: dsa: avoid potential use-after-free errorChristian Eggers
If dsa_switch_ops::port_txtstamp() returns false, clone will be freed immediately. Shouldn't store a pointer to freed memory. Signed-off-by: Christian Eggers <ceggers@arri.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201119110906.25558-1-ceggers@arri.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-09net: dsa: use net core stats64 handlingHeiner Kallweit
Use netdev->tstats instead of a member of dsa_slave_priv for storing a pointer to the per-cpu counters. This allows us to use core functionality for statistics handling. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-05net: dsa: Give drivers the chance to veto certain upper devicesVladimir Oltean
Some switches rely on unique pvids to ensure port separation in standalone mode, because they don't have a port forwarding matrix configurable in hardware. So, setups like a group of 2 uppers with the same VLAN, swp0.100 and swp1.100, will cause traffic tagged with VLAN 100 to be autonomously forwarded between these switch ports, in spite of there being no bridge between swp0 and swp1. These drivers need to prevent this from happening. They need to have VLAN filtering enabled in standalone mode (so they'll drop frames tagged with unknown VLANs) and they can only accept an 8021q upper on a port as long as it isn't installed on any other port too. So give them the chance to veto bad user requests. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> [Kurt: Pass info instead of ptr] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02net: dsa: implement a central TX reallocation procedureVladimir Oltean
At the moment, taggers are left with the task of ensuring that the skb headers are writable (which they aren't, if the frames were cloned for TX timestamping, for flooding by the bridge, etc), and that there is enough space in the skb data area for the DSA tag to be pushed. Moreover, the life of tail taggers is even harder, because they need to ensure that short frames have enough padding, a problem that normal taggers don't have. The principle of the DSA framework is that everything except for the most intimate hardware specifics (like in this case, the actual packing of the DSA tag bits) should be done inside the core, to avoid having code paths that are very rarely tested. So provide a TX reallocation procedure that should cover the known needs of DSA today. Note that this patch also gives the network stack a good hint about the headroom/tailroom it's going to need. Up till now it wasn't doing that. So the reallocation procedure should really be there only for the exceptional cases, and for cloned packets which need to be unshared. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Christian Eggers <ceggers@arri.de> # For tail taggers only Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13net: dsa: use new function dev_fetch_sw_netstatsHeiner Kallweit
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/b6047017-8226-6b7e-a3cd-064e69fdfa27@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing it's initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20net: dsa: install VLANs into the master's RX filter tooVladimir Oltean
Most DSA switch tags shift the EtherType to the right, causing the master to not parse the VLAN as VLAN. However, not all switches do that (example: tail tags, tag_8021q etc), and if the DSA master has "rx-vlan-filter: on" in ethtool -k, then we have a problem. Therefore, we could populate the VLAN table of the master, just in case (for some switches it will not make a difference), so that network I/O can work even with a VLAN filtering master. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20net: dsa: allow 8021q uppers while the bridge has vlan_filtering=0Vladimir Oltean
When the bridge has VLAN awareness disabled there isn't any duplication of functionality, since the bridge does not process VLAN. Don't deny adding 8021q uppers to DSA switch ports in that case. The switch is supposed to simply pass traffic leaving the VLAN tag as-is, and the stack will happily strip the VLAN tag for all 8021q uppers that exist. We need to ensure that there are no 8021q uppers when the user attempts to enable bridge vlan_filtering. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20net: dsa: convert denying bridge VLAN with existing 8021q upper to ↵Vladimir Oltean
PRECHANGEUPPER This is checking for the following order of operations, and makes sure to deny that configuration: ip link add link swp2 name swp2.100 type vlan id 100 ip link add br0 type bridge vlan_filtering 1 ip link set swp2 master br0 bridge vlan add dev swp2 vid 100 Instead of using vlan_for_each(), which looks at the VLAN filters installed with vlan_vid_add(), just track the 8021q uppers. This has the advantage of freeing up the vlan_vid_add() call for actual VLAN filtering. There is another change in this patch. The check is moved in slave.c, from switch.c. I don't think it makes sense to have this 8021q upper check for each switch port that gets notified of that VLAN addition (these include DSA links and CPU ports, we know those can't have 8021q uppers because they don't have a net_device registered for them), so just do it in slave.c, for that one slave interface. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20net: dsa: convert check for 802.1Q upper when bridged into PRECHANGEUPPERVladimir Oltean
DSA tries to prevent having a VLAN added by a bridge and by an 802.1Q upper at the same time. It does that by checking the VID in .ndo_vlan_rx_add_vid(), since that's something that the 8021q module calls, via vlan_vid_add(). When a VLAN matches in both subsystems, this check returns -EBUSY. However the vlan_vid_add() function isn't specific to the 8021q module in any way at all. It is simply the kernel's way to tell an interface to add a VLAN to its RX filter and not drop that VLAN. So there's no reason to return -EBUSY when somebody tries to call vlan_vid_add() for a VLAN that was installed by the bridge. The proper behavior is to accept that configuration. So what's wrong is how DSA checks that it has an 8021q upper. It should look at the actual uppers for that, not just assume that the 8021q module was somewhere in the call stack of .ndo_vlan_rx_add_vid(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>