summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-12-18s390/qeth: wake up all waiters from qeth_irq()Julian Wiedmann
card->wait_q is shared by different users, for different wake-up conditions. qeth_irq() can potentially trigger multiple of these conditions: 1) A change to channel->irq_pending, which qeth_send_control_data() is waiting for. 2) A change to card->state, which qeth_clear_channel() and qeth_halt_channel() are waiting for. As qeth_irq() does only a single wake_up(), we might miss to wake up a second eligible waiter. Luckily all waiters are guarded with a timeout, so this situation should recover on its own eventually. To make things work robustly, add an additional wake_up() for changes to channel->state. And extract a helper that updates channel->irq_pending along with the needed wake_up(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18s390/qeth: only handle IRQs while device is onlineJulian Wiedmann
A qeth device that's offline should not be receiving any IRQs - all pending IOs have been terminated, and we avoid starting any new ones. So rather than immediately registering the IRQ handler when the device is probed, only register it while the device is online. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18Merge branch 'stmmac-taprio'David S. Miller
Jose Abreu says: ==================== net: stmmac: TSN support using TAPRIO API This series adds TSN support (EST and Frame Preemption) for stmmac driver. 1) Adds the HW specific support for EST in GMAC5+ cores. 2) Adds the HW specific support for EST in XGMAC3+ cores. 3) Integrates EST HW specific support with TAPRIO scheduler API. 4) Adds the Frame Preemption suppor on stmmac TAPRIO implementation. 5) Adds the HW specific support for Frame Preemption in GMAC5+ cores. 6) Adds the HW specific support for Frame Preemption in XGMAC3+ cores. 7) Adds support for HW debug counters for Frame Preemption available in GMAC5+ cores. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: stmmac: mmc: Add Frame Preemption counters on GMAC5+ coresJose Abreu
This can be useful for debug. Add these counters on GMAC5+ cores just like we did for XGMAC. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: stmmac: xgmac3+: Add support for Frame PreemptionJose Abreu
Adds the HW specific support for Frame Preemption on XGMAC3+ cores. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: stmmac: gmac5+: Add support for Frame PreemptionJose Abreu
Adds the HW specific support for Frame Preemption on GMAC5+ cores. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: stmmac: Add Frame Preemption support using TAPRIO APIJose Abreu
Adds the support for Frame Preemption using TAPRIO API. This works along with EST feature and allows to select if preemptable traffic shall be sent during specific queues opening time. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: stmmac: Integrate EST with TAPRIO scheduler APIJose Abreu
Now that we have the EST code for XGMAC and QoS we can use it with the TAPRIO scheduler. Integrate it into the main driver and use the API to configure the EST feature. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: stmmac: Add basic EST support for XGMACJose Abreu
Adds the support for EST in XGMAC cores. This feature allows to offload scheduling of queues opening time to the IP. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: stmmac: Add basic EST support for GMAC5+Jose Abreu
Adds the support for EST in GMAC5+ cores. This feature allows to offload scheduling of queues opening time to the IP. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18Merge branch 'stmmac-next'David S. Miller
Jose Abreu says: ==================== net: stmmac: Improvements for -next Misc improvements for stmmac. 1) Adds more information regarding HW Caps in the DebugFS file. 2) Allows interrupts to be independently enabled or disabled so that we don't have to schedule both TX and RX NAPIs. 3) Stops using a magic number in coalesce timer re-arm. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: stmmac: Always use TX coalesce timer value when reschedulingJose Abreu
When we have pending packets we re-arm the TX timer with a magic value. This changes the re-arm of the timer from 10us to the user-defined coalesce value. As we support different speeds, having a magic value of 10us can be either too short or to large depending on the speed so we let user configure it. The default value of the timer is 1ms but it can be reconfigured by ethtool. Changes from v1: - Reword commit message (Jakub) Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: stmmac: Let TX and RX interrupts be independently enabled/disabledJose Abreu
By using this mechanism we can get rid of the not so nice method of scheduling TX NAPI when the RX was scheduled. No bandwidth reduction was seen with this change. Changes from v1: - Remove useless comment (Jakub) - Do not bind the TX clean to NAPI budget (Jakub) Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: stmmac: Print more information in DebugFS DMA Capabilities fileJose Abreu
DMA Capabilites have grown but the DebugFS that shows this info has not been updated. Lets add the missing information. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17xen-netback: remove 'hotplug-status' once it has served its purposePaul Durrant
Removing the 'hotplug-status' node in netback_remove() is wrong; the script may not have completed. Only remove the node once the watch has fired and has been unregistered. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17xen-netback: switch state to InitWait at the end of netback_probe()...Paul Durrant
...as the comment above the function states. The switch to Initialising at the start of the function is somewhat bogus as the toolstack will have set that initial state anyway. To behave correctly, a backend should switch to InitWait once it has set up all xenstore values that may be required by a initialising frontend. This patch calls backend_switch_state() to make the transition at the appropriate point. NOTE: backend_switch_state() ignores errors from xenbus_switch_state() and so this patch removes an error path from netback_probe(). This means a failure to change state at this stage (in the absence of other failures) will leave the device instantiated. This is highly unlikley to happen as a failure to change state would indicate a failure to write to xenstore, and that will trigger other error paths. Also, a 'stuck' device can still be cleaned up using 'unbind' in any case. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17xen-netback: move netback_probe() and netback_remove() to the end...Paul Durrant
...of xenbus.c This is a cosmetic function re-ordering to reduce churn in a subsequent patch. Some style fix-up was done to make checkpatch.pl happier. No functional change. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17Merge branch 'cxgb4-chtls-fix-issues-related-to-high-priority-region'David S. Miller
Shahjada Abul Husain says: ==================== cxgb4/chtls: fix issues related to high priority region The high priority region introduced by: commit c21939998802 ("cxgb4: add support for high priority filters") had caused regression in some code paths, leading to connection failures for the ULDs. This series of patches attempt to fix the regressions. Patch 1 fixes some code paths that have been missed to consider the high priority region. Patch 2 fixes ULD connection failures due to wrong TID base that had been shifted after the high priority region. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17cxgb4/chtls: fix ULD connection failures due to wrong TID baseShahjada Abul Husain
Currently, the hardware TID index is assumed to start from index 0. However, with the following changeset, commit c21939998802 ("cxgb4: add support for high priority filters") hardware TID index can start after the high priority region, which has introduced a regression resulting in connection failures for ULDs. So, fix all related code to properly recalculate the TID start index based on whether high priority filters are enabled or not. Fixes: c21939998802 ("cxgb4: add support for high priority filters") Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17cxgb4: fix missed high priority region calculationShahjada Abul Husain
commit c21939998802 ("cxgb4: add support for high priority filters") has missed considering high priority region calculation in some code paths. This patch fixes them. Fixes: c21939998802 ("cxgb4: add support for high priority filters") Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17net: dsa: Make PHYLINK related function static againFlorian Fainelli
Commit 77373d49de22 ("net: dsa: Move the phylink driver calls into port.c") moved and exported a bunch of symbols, but they are not used outside of net/dsa/port.c at the moment, so no reason to export them. Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17tipc: don't send gap blocks in ACK messagesJon Maloy
In the commit referred to below we eliminated sending of the 'gap' indicator in regular ACK messages, reserving this to explicit NACK ditto. Unfortunately we missed to also eliminate building of the 'gap block' area in ACK messages. This area is meant to report gaps in the received packet sequence following the initial gap, so that lost packets can be retransmitted earlier and received out-of-sequence packets can be released earlier. However, the interpretation of those blocks is dependent on a complete and correct sequence of gaps and acks. Hence, when the initial gap indicator is missing a single gap block will be interpreted as an acknowledgment of all preceding packets. This may lead to packets being released prematurely from the sender's transmit queue, with easily predicatble consequences. We now fix this by not building any gap block area if there is no initial gap to report. Fixes: commit 02288248b051 ("tipc: eliminate gap indicator from ACK messages") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17Merge branch 'stmmac-dwc-qos-ACPI-device-support'David S. Miller
Ajay Gupta says: ==================== net: stmmac: dwc-qos: ACPI device support Version 3 of patches have fixes for comments from Jakub Kicinski. These two changes are needed to enable ACPI based devices to use stmmac driver. First patch is to use generic device api (device_*) instead of device tree based api (of_*). Second patch avoids clock and reset accesses for Tegra ACPI based devices. ACPI interface will be used to access clock and reset for Tegra ACPI devices in later patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17net: stmmac: dwc-qos: avoid clk and reset for acpi deviceAjay Gupta
There are no clocks, resets or gpios referenced by Tegra ACPI device so don't access clocks, resets or gpios interface with ACPI device. Clocks, resets and GPIOs for ACPI devices will be handled via ACPI interface. Signed-off-by: Ajay Gupta <ajayg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17net: stmmac: dwc-qos: use generic device apiAjay Gupta
Use generic device api so that driver can work both with DT or ACPI based devices. Signed-off-by: Ajay Gupta <ajayg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17Merge branch 'dwmac-mediatek-add-more-support-for-RMII'David S. Miller
Biao Huang says: ==================== net-next: stmmac: dwmac-mediatek: add more support for RMII changes in v2: PATCH 1/2 net-next: stmmac: mediatek: add more support for RMII As Andrew's comments, add the "rmii_internal" clock to the list of clocks. PATCH 2/2 net-next: dt-binding: dwmac-mediatek: add more description for RMII document the "rmii_internal" clock in dt-bindings rewrite the sample dts in dt-bindings. v1: This series is for support RMII when MT2712 SoC provides the reference clock. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17net-next: dt-binding: dwmac-mediatek: add more description for RMIIBiao Huang
MT2712 SoC can provide RMII reference clock, so add corresponding description in dt-binding. Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17net-next: stmmac: mediatek: add more support for RMIIBiao Huang
MT2712 SoC can provide the rmii reference clock, and the clock will output from TXC pin only, which means ref_clk pin of external PHY should connect to TXC pin in this case. Add corresponding clock and timing settings. Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17Merge branch 'improve-clause-45-support-in-phylink'David S. Miller
Russell King says: ==================== improve clause 45 support in phylink These three patches improve the clause 45 support in phylink, fixing some corner cases that have been noticed with the addition of SFP+ NBASE-T modules, but are actually a little more wisespread than I initially realised. The first issue was spotted with a NBASE-T PHY on a SFP+ module plugged into a mvneta platform. When these PHYs are not operating in USXGMII mode, but are in a single-lane Serdes mode, they will switch between one of several different PHY interface modes. If we call the MAC validate() function with the current PHY interface mode, we will restrict the supported and advertising masks to the link modes that the current PHY interface mode supports. For example, if we determine that we want to start the PHY with an interface mode of 2500BASE-X, then this setup will restrict the advertisement and supported masks to 2.5G speed link modes. What we actually want for these PHYs is to allow them to support any link modes that the PHY supports _and_ the MAC is also capable of supporting. Without knowing the details of the PHY interface modes that may be used, we can do this by using PHY_INTERFACE_MODE_NA to validate and restrict the link modes to any that the MAC supports. mvpp2 with the 88X3310 PHY avoids this problem, because the validate() implementation allows all MAC supported speeds not only for PHY_INTERFACE_MODE_NA, but also for XAUI and 10GKR modes. The first patch addresses this; current MAC drivers should continue to work as-is, but there will be a follow-on patch to fixup at least mvpp2. The second issue addresses a very similar problem that occurs when trying to use ethtool to alter the advertisement mask - we call the MAC validate() function with the current interface mode, the current support and requested advertisement masks. This immediately restricts the advertisement in the same way as the above. This patch series addresses both issues, although the patches are not in the above order. v2: fix patch 3 missing 1G link modes for SGMII and RGMII interface modes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17net: mvpp2: update mvpp2 validate() implementationRussell King
Fix up the mvpp2 validate implementation to adopt the same behaviour as mvneta: - only allow the link modes that the specified PHY interface mode supports with the exception of 1000base-X and 2500base-X. - use the basex helper to deal with SFP modules that can be switched between 1000base-X vs 2500base-X. This gives consistent behaviour between mvneta and mvpp2. This commit depends on "net: phylink: extend clause 45 PHY validation workaround" so is not marked for backporting to stable kernels. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17net: phylink: extend clause 45 PHY validation workaroundRussell King
Commit e45d1f5288b8 ("net: phylink: support Clause 45 PHYs on SFP+ modules") added a workaround to support clause 45 PHYs which dynamically switch their interface mode on SFP+ modules. This was implemented by validating the PHYs supported/advertising using PHY_INTERFACE_MODE_NA, rather than the specific interface mode that we attached the PHY with. However, we already have a situation where phylink is used to connect a Marvell 88X3310 PHY which also behaves in exactly the same way, but which seemingly doesn't need this. The reason seems to be that the mvpp2 driver sets a whole bunch of link modes for PHY_INTERFACE_MODE_10GKR down to 10Mb/s, despite 10GBASE-R not actually supporting anything but 10Gb/s speeds. When testing with drivers that (correctly) take the mvneta approach, where the validate() method only returns what can be supported / advertised for the specified link mode, we find that Clause 45 PHYs do not behave as we expect: their advertisement is restricted to what the current link will support, rather than what the PHY supports through its dynamic switching. Extend this workaround to all such cases; if we have a Clause 45 PHY attaching via any means, except in USXGMII, XAUI and RXAUI which are all unable to support this dynamic switching or have other solutions to it, then we need to validate using PHY_INTERFACE_MODE_NA. This should allow mvpp2 to switch to a more conformant validate() implementation. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17net: phylink: improve clause 45 PHY ksettings_set implementationRussell King
While testing ethtool with the Methode DM7052 module, it was noticed that attempting to set the advertising mask results in the mask being truncated to the support offered by the currently chosen PHY interface mode. When a PHY dynamically changes the PHY interface mode, limiting the advertising mask in this way is not correct - if the PHY happened to negotiate 10GBASE-T, and selected 10GBASE-R as the host interface, we don't want to restrict the advertisement to just 10GBASE-* modes. Rework setting the advertisement to take account of this; do not pass the requested advertisement through phylink_validate(), but rely on the advertisement restriction (supported mask) set when the PHY was initially setup. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16Merge branch 'WireGuard-CI-and-housekeeping'David S. Miller
Jason A. Donenfeld says: ==================== WireGuard CI and housekeeping This is a collection of commits gathered during the last 1.5 weeks since merging WireGuard. If you'd prefer, I can send tree pull requests instead, but I figure it might be best for now to just send things as full patch sets to netdev. The first part of this adds in the CI test harness that we've been using for quite some time with success. You can type `make` and get the selftests running in a fresh VM immediately. This has been an instrumental tool in developing WireGuard, and I think it'd benefit most from being in-tree alongside the selftests that are already there. Once this lands, I plan to get build.wireguard.com building wireguard- linux.git and net-next.git on every single commit pushed, and do so on a bunch of different architectures. As this migrates into Linus' tree eventually and then into net.git, I'll get net.git building there too on every commit. Future work with this involves generalizing it to include more networking subsystem tests beyond just WireGuard, but one step at a time. In the process of porting this to the tree, the builder uncovered a mistake in the config menu file, which the second commit fixes. The last three commits are small housekeeping things, fixing spelling mistakes, replacing call_rcu with kfree_rcu, and removing an unused include. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16wireguard: allowedips: use kfree_rcu() instead of call_rcu()Wei Yongjun
The callback function of call_rcu() just calls a kfree(), so we can use kfree_rcu() instead of call_rcu() + callback function. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16wireguard: main: remove unused include <linux/version.h>YueHaibing
Remove <linux/version.h> from the includes for main.c, which is unused. Signed-off-by: YueHaibing <yuehaibing@huawei.com> [Jason: reworded commit message] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16wireguard: global: fix spelling mistakes in commentsJosh Soref
This fixes two spelling errors in source code comments. Signed-off-by: Josh Soref <jsoref@gmail.com> [Jason: rewrote commit message] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16wireguard: Kconfig: select parent dependency for cryptoJason A. Donenfeld
This fixes the crypto selection submenu depenencies. Otherwise, we'd wind up issuing warnings in which certain dependencies we also select couldn't be satisfied. This condition was triggered by the addition of the test suite autobuilder in the previous commit. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16wireguard: selftests: import harness makefile for test suiteJason A. Donenfeld
WireGuard has been using this on build.wireguard.com for the last several years with considerable success. It allows for very quick and iterative development cycles, and supports several platforms. To run the test suite on your current platform in QEMU: $ make -C tools/testing/selftests/wireguard/qemu -j$(nproc) To run it with KASAN and such turned on: $ DEBUG_KERNEL=yes make -C tools/testing/selftests/wireguard/qemu -j$(nproc) To run it emulated for another platform in QEMU: $ ARCH=arm make -C tools/testing/selftests/wireguard/qemu -j$(nproc) At the moment, we support aarch64_be, aarch64, arm, armeb, i686, m68k, mips64, mips64el, mips, mipsel, powerpc64le, powerpc, and x86_64. The system supports incremental rebuilding, so it should be very fast to change a single file and then test it out and have immediate feedback. This requires for the right toolchain and qemu to be installed prior. I've had success with those from musl.cc. This is tailored for WireGuard at the moment, though later projects might generalize it for other network testing. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16net: caif: replace BUG_ON with recovery codeAditya Pakki
In caif_xmit, there is a crash if the ptr dev is NULL. However, by returning the error to the callers, the error can be handled. The patch fixes this issue. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16fore200e: Fix incorrect checks of NULL pointer dereferenceAditya Pakki
In fore200e_send and fore200e_close, the pointers from the arguments are dereferenced in the variable declaration block and then checked for NULL. The patch fixes these issues by avoiding NULL pointer dereferences. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16Merge branch 'Simplify-IPv4-route-offload-API'David S. Miller
Ido Schimmel says: ==================== Simplify IPv4 route offload API Motivation ========== The aim of this patch set is to simplify the IPv4 route offload API by making the stack a bit smarter about the notifications it is generating. This allows driver authors to focus on programming the underlying device instead of having to duplicate the IPv4 route insertion logic in their driver, which is error-prone. This is the first patch set out of a series of four. Subsequent patch sets will simplify the IPv6 API, add offload/trap indication to routes and add tests for all the code paths (including error paths). Available here [1]. Details ======= Today, whenever an IPv4 route is added or deleted a notification is sent in the FIB notification chain and it is up to offload drivers to decide if the route should be programmed to the hardware or not. This is not an easy task as in hardware routes are keyed by {prefix, prefix length, table id}, whereas the kernel can store multiple such routes that only differ in metric / TOS / nexthop info. This series makes sure that only routes that are actually used in the data path are notified to offload drivers. This greatly simplifies the work these drivers need to do, as they are now only concerned with programming the hardware and do not need to replicate the IPv4 route insertion logic and store multiple identical routes. The route that is notified is the first FIB alias in the FIB node with the given {prefix, prefix length, table ID}. In case the route is deleted and there is another route with the same key, a replace notification is emitted. Otherwise, a delete notification is emitted. The above means that in the case of multiple routes with the same key, but different TOS, only the route with the highest TOS is notified. While the kernel can route a packet based on its TOS, this is not supported by any hardware devices I am familiar with. Moreover, this is not supported by IPv6 nor by BIRD/FRR from what I could see. Offload drivers should therefore use the presence of a non-zero TOS as an indication to trap packets matching the route and let the kernel route them instead. mlxsw has been doing it for the past two years. Testing ======= To ensure there is no degradation in route insertion rates, I averaged the insertion rate of 512k routes (/24 and /32) over 50 runs. Did not observe any degradation. Functional tests are available here [1]. They rely on route trap indication, which is only added in the last patch set. In addition, I have been running syzkaller for the past week with all four patch sets and debug options enabled. Did not observe any problems. Patch set overview ================== Patches #1-#8 gradually introduce the new FIB notifications Patch #9 converts mlxsw to use the new notifications Patch #10 converts the remaining listeners and removes the old notifications v2: * Extend fib_find_alias() with another argument instead of introducing a new function (David Ahern) RFC: https://patchwork.ozlabs.org/cover/1170530/ [1] https://github.com/idosch/linux/tree/fib-notifier ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Remove old route notifications and convert listenersIdo Schimmel
Unlike mlxsw, the other listeners to the FIB notification chain do not require any special modifications as they never considered multiple identical routes. This patch removes the old route notifications and converts all the listeners to use the new replace / delete notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16mlxsw: spectrum_router: Start using new IPv4 route notificationsIdo Schimmel
With the new notifications mlxsw does not need to handle identical routes itself, as this is taken care of by the core IPv4 code. Instead, mlxsw only needs to take care of inserting and removing routes from the device. Convert mlxsw to use the new IPv4 route notifications and simplify the code. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Only Replay routes of interest to new listenersIdo Schimmel
When a new listener is registered to the FIB notification chain it receives a dump of all the available routes in the system. Instead, make sure to only replay the IPv4 routes that are actually used in the data path and are of any interest to the new listener. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Handle route deletion notification during flushIdo Schimmel
In a similar fashion to previous patch, when a route is deleted as part of table flushing, promote the next route in the list, if exists. Otherwise, simply emit a delete notification. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Handle route deletion notificationIdo Schimmel
When a route is deleted we potentially need to promote the next route in the FIB alias list (e.g., with an higher metric). In case we find such a route, a replace notification is emitted. Otherwise, a delete notification for the deleted route. v2: * Convert to use fib_find_alias() instead of fib_find_first_alias() Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Notify newly added route if should be offloadedIdo Schimmel
When a route is added, it should only be notified in case it is the first route in the FIB alias list with the given {prefix, prefix length, table ID}. Otherwise, it is not used in the data path and should not be considered by switch drivers. v2: * Convert to use fib_find_alias() instead of fib_find_first_alias() Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Notify route if replacing currently offloaded oneIdo Schimmel
When replacing a route, its replacement should only be notified in case the replaced route is of any interest to listeners. In other words, if the replaced route is currently used in the data path, which means it is the first route in the FIB alias list with the given {prefix, prefix length, table ID}. v2: * Convert to use fib_find_alias() instead of fib_find_first_alias() Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Extend FIB alias find functionIdo Schimmel
Extend the function with another argument, 'find_first'. When set, the function returns the first FIB alias with the matching {prefix, prefix length, table ID}. The TOS and priority parameters are ignored. Current callers are converted to pass 'false' in order to maintain existing behavior. This will be used by subsequent patches in the series. v2: * New patch Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Notify route after insertion to the routing tableIdo Schimmel
Currently, a new route is notified in the FIB notification chain before it is inserted to the FIB alias list. Subsequent patches will use the placement of the new route in the ordered FIB alias list in order to determine if the route should be notified or not. As a preparatory step, change the order so that the route is first inserted into the FIB alias list and only then notified. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>