summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-11-23net/tun: Call type change netdev notifiersMartin Schiller
Call netdev notifiers before and after changing the device type. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20201118063919.29485-1-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21octeontx2-af: Add support for RSS hashing based on Transport protocol fieldGeorge Cherian
Add support to choose RSS flow key algorithm with IPv4 transport protocol field included in hashing input data. This will be enabled by default. There-by enabling 3/5 tuple hash Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: George Cherian <george.cherian@marvell.com> Link: https://lore.kernel.org/r/20201120093906.2873616-1-george.cherian@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21Merge tag 'linux-can-next-for-5.11-20201120' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2020-11-20 The first patch is by Yegor Yefremov and he improves the j1939 documentaton by adding tables for the CAN identifier and its fields. Then there are 8 patches by Oliver Hartkopp targeting the CAN driver infrastructure and drivers. These add support for optional DLC element to the Classical CAN frame structure. See patch ea7800565a12 ("can: add optional DLC element to Classical CAN frame structure") for details. Oliver's last patch adds len8_dlc support to several drivers. Stefan Mätje provides a patch to add len8_dlc support to the esd_usb2 driver. The next patch is by Oliver Hartkopp, too and adds support for modification of Classical CAN DLCs to CAN GW sockets. The next 3 patches target the nxp,flexcan DT bindings. One patch by my adds the missing uint32 reference to the clock-frequency property. Joakim Zhang's patches fix the fsl,clk-source property and add the IMX_SC_R_CAN() macro to the imx firmware header file, which will be used in the flexcan driver later. Another patch by Joakim Zhang prepares the flexcan driver for SCU based stop-mode, by giving the existing, GPR based stop-mode, a _GPR postfix. The next 5 patches are by me, target the flexcan driver, and clean up the .ndo_open and .ndo_stop callbacks. These patches try to fix a sporadically hanging flexcan_close() during simultanious ifdown, sending of CAN messages and probably open CAN bus. I was never able to reproduce, but these seem to fix the problem at the reporting user. As these changes are rather big, I'd like to mainline them via net-next/master. The next patches are by Jimmy Assarsson and Christer Beskow, they add support for new USB devices to the existing kvaser_usb driver. The last patch is by Kaixu Xia and simplifies the return in the mcp251xfd_chip_softreset() function in the mcp251xfd driver. * tag 'linux-can-next-for-5.11-20201120' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (25 commits) can: mcp251xfd: remove useless code in mcp251xfd_chip_softreset can: kvaser_usb: Add new Kvaser hydra devices can: kvaser_usb: kvaser_usb_hydra: Add support for new device variant can: kvaser_usb: Add new Kvaser Leaf v2 devices can: kvaser_usb: Add USB_{LEAF,HYDRA}_PRODUCT_ID_END defines can: flexcan: flexcan_close(): change order if commands to properly shut down the controller can: flexcan: flexcan_open(): completely initialize controller before requesting IRQ can: flexcan: flexcan_rx_offload_setup(): factor out mailbox and rx-offload setup into separate function can: flexcan: move enabling/disabling of interrupts from flexcan_chip_{start,stop}() to callers can: flexcan: factor out enabling and disabling of interrupts into separate function can: flexcan: rename macro FLEXCAN_QUIRK_SETUP_STOP_MODE -> FLEXCAN_QUIRK_SETUP_STOP_MODE_GPR dt-bindings: firmware: add IMX_SC_R_CAN(x) macro for CAN dt-bindings: can: fsl,flexcan: fix fsl,clk-source property dt-bindings: can: fsl,flexcan: add uint32 reference to clock-frequency property can: gw: support modification of Classical CAN DLCs can: drivers: add len8_dlc support for esd_usb2 CAN adapter can: drivers: add len8_dlc support for various CAN adapters can: drivers: introduce helpers to access Classical CAN DLC values can: update documentation for DLC usage in Classical CAN can: rename CAN FD related can_len2dlc and can_dlc2len helpers ... ==================== Link: https://lore.kernel.org/r/20201120133318.3428231-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21net: bridge: switch to net core statistics counters handlingHeiner Kallweit
Use netdev->tstats instead of a member of net_bridge for storing a pointer to the per-cpu counters. This allows us to use core functionality for statistics handling. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/9bad2be2-fd84-7c6e-912f-cee433787018@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21Merge branch 'net-hns3-misc-updates-for-next'Jakub Kicinski
Huazhong Tan says: ==================== net: hns3: misc updates for -next This series includes some misc updates for the HNS3 ethernet driver. #1 adds support for 1280 queues #2 adds mapping for BAR45 which is needed by RoCE client. #3 extend the interrupt resources. #4&#5 add support to query firmware's calculated shaping parameters. ==================== Link: https://lore.kernel.org/r/1605863783-36995-1-git-send-email-tanhuazhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21net: hns3: adds debugfs to dump more info of shaping parametersYonglong Liu
Adds debugfs to dump new shaping parameters: rate and flag. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21net: hns3: add support to utilize the firmware calculated shaping parametersYonglong Liu
Since the calculation of the driver is fixed, if the number of queue or clock changed, the calculated result may be inaccurate. So for compatible and maintainable, add a new flag to tell the firmware to calculate the shaping parameters with the specified rate. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21net: hns3: add support for pf querying new interrupt resourcesYufeng Mo
For HNAE3_DEVICE_VERSION_V3, a maximum of 1281 interrupt resources are supported. To utilize these new resources, extend the corresponding field or variable to 16bit type, and remove the restriction of NIC client that only use a maximum of 65 interrupt vectors. In addition, the I/O address of the extended interrupt resources are different, so an extra handler is needed. Currently, the total number of interrupts is the sum of RoCE's number and RoCE's offset (RoCE is in front of NIC), since the number of both NIC and RoCE are same. For readability, rewrite the corresponding field of the command, rename the RoCE's offset field as the number of NIC interrupts, then the total number of interrupts is sum of the number of RoCE and NIC, and replace vport->back with hdev in hclge_init_roce_base_info() for simplifying the code. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21net: hns3: add support for mapping device memoryHuazhong Tan
For device who has device memory accessed through the PCI BAR4, IO descriptor push of NIC and direct WQE(Work Queue Element) of RoCE will use this device memory, so add support for mapping this device memory, and add this info to the RoCE client whose new feature needs. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21net: hns3: add support for 1280 queuesYonglong Liu
For DEVICE_VERSION_V1/2, there are total 1024 queues and queue sets. For DEVICE_VERSION_V3, it increases to 1280, and can be assigned to one pf, so remove the limitation of 1024. To keep compatible with DEVICE_VERSION_V1/2 and old driver version, the queue number is split into two part: tqp_num(range 0~1023) and ext_tqp_num(range 1024~1279). Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20Merge branch 'ibmvnic-performance-improvements-and-other-updates'Jakub Kicinski
Thomas Falcon says: ==================== ibmvnic: Performance improvements and other updates The first three patches utilize a hypervisor call allowing multiple TX and RX buffer replenishment descriptors to be sent in one operation, which significantly reduces hypervisor call overhead. The xmit_more and Byte Queue Limit API's are leveraged to provide this support for TX descriptors. The subsequent two patches remove superfluous code and members in TX completion handling function and TX buffer structure, respectively, and remove unused routines. Finally, four patches which ensure that device queue memory is cache-line aligned, resolving slowdowns observed in PCI traces, as well as optimize the driver's NAPI polling function and to RX buffer replenishment are provided by Dwip Banerjee. This series provides significant performance improvements, allowing the driver to fully utilize 100Gb NIC's. ==================== Link: https://lore.kernel.org/r/1605748345-32062-1-git-send-email-tlfalcon@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20ibmvnic: Do not replenish RX buffers after every polling loopDwip N. Banerjee
Reduce the amount of time spent replenishing RX buffers by only doing so once available buffers has fallen under a certain threshold, in this case half of the total number of buffers, or if the polling loop exits before the packets processed is less than its budget. Non-exhaustion of NAPI budget implies lower incoming packet pressure, allowing the leeway to refill the buffers in preparation for any impending burst. Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20ibmvnic: Use netdev_alloc_skb instead of alloc_skb to replenish RX buffersDwip N. Banerjee
Take advantage of the additional optimizations in netdev_alloc_skb when allocating socket buffers to be used for packet reception. Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com> Acked-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20ibmvnic: Correctly re-enable interrupts in NAPI polling routineDwip N. Banerjee
If the current NAPI polling loop exits without completing it's budget, only re-enable interrupts if there are no entries remaining in the queue and napi_complete_done is successful. If there are entries remaining on the queue that were missed, restart the polling loop. Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20ibmvnic: Ensure that device queue memory is cache-line alignedDwip N. Banerjee
PCI bus slowdowns were observed on IBM VNIC devices as a result of partial cache line writes and non-cache aligned full cache line writes. Ensure that packet data buffers are cache-line aligned to avoid these slowdowns. Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20ibmvnic: Remove send_subcrq functionThomas Falcon
It is not longer used, so remove it. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Acked-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20ibmvnic: Clean up TX code and TX buffer data structureThomas Falcon
Remove unused and superfluous code and members in existing TX implementation and data structures. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20ibmvnic: Introduce xmit_more support using batched subCRQ hcallsThomas Falcon
Include support for the xmit_more feature utilizing the H_SEND_SUB_CRQ_INDIRECT hypervisor call which allows the sending of multiple subordinate Command Response Queue descriptors in one hypervisor call via a DMA-mapped buffer. This update reduces hypervisor calls and thus hypervisor call overhead per TX descriptor. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20ibmvnic: Introduce batched RX buffer descriptor transmissionThomas Falcon
Utilize the H_SEND_SUB_CRQ_INDIRECT hypervisor call to send multiple RX buffer descriptors to the device in one hypervisor call operation. This change will reduce the number of hypervisor calls and thus hypervisor call overhead needed to transmit RX buffer descriptors to the device. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20ibmvnic: Introduce indirect subordinate Command Response Queue bufferThomas Falcon
This patch introduces the infrastructure to send batched subordinate Command Response Queue descriptors, which are used by the ibmvnic driver to send TX frame and RX buffer descriptors. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Acked-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20Merge branch 'net-ipa-add-a-driver-shutdown-callback'Jakub Kicinski
Alex Elder says: ==================== net: ipa: add a driver shutdown callback The final patch in this series adds a driver shutdown callback for the IPA driver. The patches leading up to that address some issues encountered while ensuring that callback worked as expected: - The first just reports a little more information when channels or event rings are in unexpected states - The second patch recognizes a condition where an as-yet-unused channel does not require a reset during teardown - The third patch explicitly ignores a certain error condition, because it can't be avoided, and is harmless if it occurs - The fourth properly handles requests to retry a channel HALT request - The fifth makes a second attempt to stop modem activity during shutdown if it's busy The shutdown callback is implemented by calling the existing remove callback function (reporting if that returns an error). ==================== Link: https://lore.kernel.org/r/20201119224929.23819-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20net: ipa: add driver shutdown callbackAlex Elder
A system shutdown can happen at essentially any time, and it's possible that the IPA driver is busy when a shutdown is underway. IPA hardware accesses IMEM and SMEM memory regions using an IOMMU, and at some point during shutdown, needed I/O mappings could become invalid. This could be disastrous for any "in flight" IPA activity. Avoid this by defining a new driver shutdown callback that stops all IPA activity and cleanly shuts down the driver. It merely calls the driver's existing remove callback, reporting the error if it returns one. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20net: ipa: retry modem stop if busyAlex Elder
The IPA driver remove callback, ipa_remove(), calls ipa_modem_stop() if the setup stage of initialization is complete. If a concurrent call to ipa_modem_start() or ipa_modem_stop() has begin but not completed, ipa_modem_stop() can return an error (-EBUSY). The next patch adds a driver shutdown callback, which will simply call ipa_remove(). We really want our shutdown callback to clean things up. So add a single retry to the ipa_modem_stop() call in ipa_remove() after a short (millisecond) delay. This offers no guarantee the shutdown will complete successfully, but we'll at least try a little harder before giving up. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20net: ipa: support retries on generic GSI commandsAlex Elder
When stopping an AP RX channel, there can be a transient period while the channel enters STOP_IN_PROC state before reaching the final STOPPED state. In that case we make another attempt to stop the channel. Similarly, when stopping a modem channel (using a GSI generic command issued from the AP), it's possible that multiple attempts will be required before the channel reaches STOPPED state. Add a field to the GSI structure to record an errno representing the result code provided when a generic command completes. If the result learned in gsi_isr_gp_int1() is RETRY, record -EAGAIN in the result code, otherwise record 0 for success, or -EIO for any other result. If we time out nf gsi_generic_command() waiting for the command to complete, return -ETIMEDOUT (as before). Otherwise return the result stashed by gsi_isr_gp_int1(). Add a loop in gsi_modem_channel_halt() to reissue the HALT command if the result code indicates -EAGAIN. Limit this to 10 retries (after the initial attempt). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20net: ipa: ignore CHANNEL_NOT_RUNNING errorsAlex Elder
IPA v4.2 has a hardware quirk that requires the AP to allocate GSI channels for the modem to use. It is recommended that these modem channels get stopped (with a HALT generic command) by the AP when its IPA driver gets removed. The AP has no way of knowing the current state of a modem channel. So when the IPA driver issues a HALT command it's possible the channel is not running, and in that case we get an error indication. This error simply means we didn't need to stop the channel, so we can ignore it. This patch adds an explanation for this situation, and arranges for this condition to *not* report an error message. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20net: ipa: don't reset an ALLOCATED channelAlex Elder
If the rmnet_ipa0 network device has not been opened at the time we remove or shut down the IPA driver, its underlying TX and RX GSI channels will not have been started, and they will still be in ALLOCATED state. The RESET command on a channel is meant to return a channel to ALLOCATED state after it's been stopped. But if it was never started, its state will still be ALLOCATED, the RESET command is not required. Quietly skip doing the reset without printing an error message if a channel is already in ALLOCATED state when we request it be reset. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20net: ipa: print channel/event ring number on errorAlex Elder
When a GSI command is used to change the state of a channel or event ring we check the state before and after the command to ensure it is as expected. If not, we print an error message, but it does not include the channel or event ring id, and it easily can. Add the channel or event ring id to these error messages. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20Merge branch 'net-ipa-platform-specific-clock-and-interconnect-rates'Jakub Kicinski
Alex Elder says: ==================== net: ipa: platform-specific clock and interconnect rates This series changes the way the IPA core clock rate and the bandwidth parameters for interconnects are specified. Previously these were specified with hard-wired constants, with the same values used for the SDM845 and SC7180 platforms. Now these parameters are recorded in platform-specific configuration data. For the SC7180 this means we use an all-new core clock rate and interconnect parameters. Additionally, while developing this I learned that the average bandwidth setting for two of the interconnects is ignored (on both platforms). Zero is now used explicitly as that unused bandwidth value. This means the SDM845 bandwidth settings are also changed by this series. ==================== Link: https://lore.kernel.org/r/20201119224041.16066-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20net: ipa: use config data for clockingAlex Elder
Stop assuming a fixed IPA core clock rate and interconnect bandwidths. Use the configuration data defined for these things instead. Get rid of the previously-used constants. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20net: ipa: populate clock and interconnect dataAlex Elder
Populate the core clock rate and interconnect average and peak bandwidth data for SDM845 and SC7180 in their configuration data files. At this point we still don't *use* this data. Note that SC7180 actually defines a new core clock rate (100 MHz instead of 75 MHz) and new interconnect bandwidth values. They will be activated in the next commit, which uses the configured values rather than the fixed constants. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20net: ipa: define clock and interconnect dataAlex Elder
Define a new type of configuration data, used to initialize the IPA core clock and interconnects. This is the first of three patches, and defines the data types and interface but doesn't yet use them. Switch the return value if there is no matching configuration data to ENODEV instead of ENOTSUPP (to avoid using the nonstandard errno). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20mdio_bus: suppress err message for reset gpio EPROBE_DEFERGrygorii Strashko
The mdio_bus may have dependencies from GPIO controller and so got deferred. Now it will print error message every time -EPROBE_DEFER is returned which from: __mdiobus_register() |-devm_gpiod_get_optional() without actually identifying error code. "mdio_bus 4a101000.mdio: mii_bus 4a101000.mdio couldn't get reset GPIO" Hence, suppress error message for devm_gpiod_get_optional() returning -EPROBE_DEFER case by using dev_err_probe(). Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Link: https://lore.kernel.org/r/20201119203446.20857-1-grygorii.strashko@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20r8169: use dev_err_probe in rtl_get_ether_clkHeiner Kallweit
Tiny improvement, let dev_err_probe() deal with EPROBE_DEFER. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/b0c4ebcf-2047-e933-b890-8a20e4bdb19f@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20r8169: reduce number of workaround doorbell ringsHeiner Kallweit
Some chip versions have a hw bug resulting in lost door bell rings. To work around this the doorbell is also rung whenever we still have tx descriptors in flight after having cleaned up tx descriptors. These PCI(e) writes come at a cost, therefore let's reduce the number of extra doorbell rings. If skb is NULL then this means: - last cleaned-up descriptor belongs to a skb with at least one fragment and last fragment isn't marked as sent yet - hw is in progress sending the skb, therefore no extra doorbell ring is needed for this skb - once last fragment is marked as transmitted hw will trigger a tx done interrupt and we come here again (with skb != NULL) and ring the doorbell if needed Therefore skip the workaround doorbell ring if skb is NULL. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/0a15a83c-aecf-ab51-8071-b29d9dcd529a@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20Merge branch 'mptcp-more-miscellaneous-mptcp-fixes'Jakub Kicinski
Mat Martineau says: ==================== mptcp: More miscellaneous MPTCP fixes Here's another batch of fixup and enhancement patches that we have collected in the MPTCP tree. Patch 1 removes an unnecessary flag and related code. Patch 2 fixes a bug encountered when closing fallback sockets. Patches 3 and 4 choose a better transmit subflow, with a self test. Patch 5 adjusts tracking of unaccepted subflows Patches 6-8 improve handling of long ADD_ADDR options, with a test. Patch 9 more reliably tracks the MPTCP-level window shared with peers. Patch 10 sends MPTCP-level acknowledgements more aggressively, so the peer can send more data without extra delay. ==================== Link: https://lore.kernel.org/r/20201119194603.103158-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20mptcp: refine MPTCP-level ack schedulingPaolo Abeni
Send timely MPTCP-level ack is somewhat difficult when the insertion into the msk receive level is performed by the worker. It needs TCP-level dup-ack to notify the MPTCP-level ack_seq increase, as both the TCP-level ack seq and the rcv window are unchanged. We can actually avoid processing incoming data with the worker, and let the subflow or recevmsg() send ack as needed. When recvmsg() moves the skbs inside the msk receive queue, the msk space is still unchanged, so tcp_cleanup_rbuf() could end-up skipping TCP-level ack generation. Anyway, when __mptcp_move_skbs() is invoked, a known amount of bytes is going to be consumed soon: we update rcv wnd computation taking them in account. Additionally we need to explicitly trigger tcp_cleanup_rbuf() when recvmsg() consumes a significant amount of the receive buffer. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20mptcp: track window announced to peerFlorian Westphal
OoO handling attempts to detect when packet is out-of-window by testing current ack sequence and remaining space vs. sequence number. This doesn't work reliably. Store the highest allowed sequence number that we've announced and use it to detect oow packets. Do this when mptcp options get written to the packet (wire format). For this to work we need to move the write_options call until after stack selected a new tcp window. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20selftests: mptcp: add ADD_ADDR IPv6 test casesGeliang Tang
This patch added IPv6 support for do_transfer, and the test cases for ADD_ADDR IPv6. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20mptcp: send out dedicated ADD_ADDR packetGeliang Tang
When ADD_ADDR suboption includes an IPv6 address, the size is 28 octets. It will not fit when other MPTCP suboptions are included in this packet, e.g. DSS. So here we send out an ADD_ADDR dedicated packet to carry only ADD_ADDR suboption, no other MPTCP suboptions. In mptcp_pm_announce_addr, we check whether this is an IPv6 ADD_ADDR. If it is, we set the flag MPTCP_ADD_ADDR_IPV6 to true. Then we call mptcp_pm_nl_add_addr_send_ack to sent out a new pure ACK packet. In mptcp_established_options_add_addr, we check whether this is a pure ACK packet for ADD_ADDR. If it is, we drop all other MPTCP suboptions in this packet, only put ADD_ADDR suboption in it. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20mptcp: change add_addr_signal typeGeliang Tang
This patch changed the 'add_addr_signal' type from bool to char, so that we could encode the addr type there. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20mptcp: keep unaccepted MPC subflow into join listPaolo Abeni
This will simplify all operation dealing with subflows before accept time (e.g. data fin processing, add_addr). The join list is already flushed by mptcp_stream_accept() before returning the newly created msk to the user space. This also fixes an potential bug present into the old code: conn_list was manipulated without helding the msk lock in mptcp_stream_accept(). Tested-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20selftests: mptcp: add link failure test caseFlorian Westphal
Add a test case where a link fails with multiple subflows. The expectation is that MPTCP will transmit any data that could not be delivered via the failed link on another subflow. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20mptcp: skip to next candidate if subflow has unacked dataFlorian Westphal
In case a subflow path is blocked, MPTCP-level retransmit may not take place anymore because such subflow is likely to have unacked data left in its write queue. Ignore subflows that have experienced loss and test next candidate. Fixes: 3b1d6210a95773691 ("mptcp: implement and use MPTCP-level retransmission") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20mptcp: fix state tracking for fallback socketPaolo Abeni
We need to cope with some more state transition for fallback sockets, or could still end-up moving to TCP_CLOSE too early and avoid spooling some pending data Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20mptcp: drop WORKER_RUNNING status bitPaolo Abeni
Only mptcp_close() can actually cancel the workqueue, no need to add and use this flag. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20Merge branch 'mlxsw-add-support-for-nexthop-objects'Jakub Kicinski
Ido Schimmel says: ==================== mlxsw: Add support for nexthop objects This patch set adds support for nexthop objects in mlxsw. Nexthop objects are treated as another front-end for programming nexthops, in addition to the existing IPv4 and IPv6 front-ends. Patch #1 registers a listener to the nexthop notification chain and parses the nexthop information into the existing mlxsw data structures that are already used by the IPv4 and IPv6 front-ends. Blackhole nexthops are currently rejected. Support will be added in a follow-up patch set. Patch #2 extends mlxsw to resolve its internal nexthop objects from the nexthop identifier encoded in the FIB info of the notified routes. Patch #3 finally removes the limitation of rejecting routes that use nexthop objects. Patch #4 adds a selftest. Patches #5-#8 add generic forwarding selftests that can be used with veth pairs or physical loopbacks. ==================== Link: https://lore.kernel.org/r/20201119130848.407918-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20selftests: forwarding: Add multipath tunneling nexthop testIdo Schimmel
Add a nexthop objects version of gre_multipath.sh. Unlike the original test, it also tests IPv6 overlay which is not possible with the legacy nexthop implementation. See commit 9a2ad3623868 ("selftests: forwarding: gre_multipath: Drop IPv6 tests") for more info. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20selftests: forwarding: Add device-only nexthop testIdo Schimmel
In a similar fashion to router_multipath.sh and its nexthop objects version router_mpath_nh.sh, create a nexthop objects version of router.sh. It reuses the same topology, but uses device-only nexthop objects instead of legacy ones. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20selftests: forwarding: Test IPv4 routes with IPv6 link-local nexthopsIdo Schimmel
In addition to IPv4 multipath tests with IPv4 nexthops, also test IPv4 multipath with nexthops that use IPv6 link-local addresses. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20selftests: forwarding: Do not configure nexthop objects twiceIdo Schimmel
routing_nh_obj() is used to configure the nexthop objects employed by the test, but it is called twice resulting in "RTNETLINK answers: File exists" messages. Remove the first call, so that the function is only called after setup_wait(), when all the interfaces are up and ready. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>