summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-04-19powerpc/kvm: Fix lockups when running KVM guests on Power8Michael Ellerman
When running KVM guests on Power8 we can see a lockup where one CPU stops responding. This often leads to a message such as: watchdog: CPU 136 detected hard LOCKUP on other CPUs 72 Task dump for CPU 72: qemu-system-ppc R running task 10560 20917 20908 0x00040004 And then backtraces on other CPUs, such as: Task dump for CPU 48: ksmd R running task 10032 1519 2 0x00000804 Call Trace: ... --- interrupt: 901 at smp_call_function_many+0x3c8/0x460 LR = smp_call_function_many+0x37c/0x460 pmdp_invalidate+0x100/0x1b0 __split_huge_pmd+0x52c/0xdb0 try_to_unmap_one+0x764/0x8b0 rmap_walk_anon+0x15c/0x370 try_to_unmap+0xb4/0x170 split_huge_page_to_list+0x148/0xa30 try_to_merge_one_page+0xc8/0x990 try_to_merge_with_ksm_page+0x74/0xf0 ksm_scan_thread+0x10ec/0x1ac0 kthread+0x160/0x1a0 ret_from_kernel_thread+0x5c/0x78 This is caused by commit 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle"), which added a check in pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and test of kvm_hstate.hwthread_req. The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when entering the guest, so it can then come out to cede with KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after setting hwthread_req to 1, but because hwthread_state is still KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we wake up from idle and won't go to kvm_start_guest. From there the thread will return somewhere garbage and crash. Fix it by skipping the store of hwthread_state, but not the test of hwthread_req, when coming out of idle. It's OK to skip the sync in that case because hwthread_req will have been set on the same thread, so there is no synchronisation required. Fixes: 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-18ipv6: frags: fix a lockdep false positiveEric Dumazet
lockdep does not know that the locks used by IPv4 defrag and IPv6 reassembly units are of different classes. It complains because of following chains : 1) sch_direct_xmit() (lock txq->_xmit_lock) dev_hard_start_xmit() xmit_one() dev_queue_xmit_nit() packet_rcv_fanout() ip_check_defrag() ip_defrag() spin_lock() (lock frag queue spinlock) 2) ip6_input_finish() ipv6_frag_rcv() (lock frag queue spinlock) ip6_frag_queue() icmpv6_param_prob() (lock txq->_xmit_lock at some point) We could add lockdep annotations, but we also can make sure IPv6 calls icmpv6_param_prob() only after the release of the frag queue spinlock, since this naturally makes frag queue spinlock a leaf in lock hierarchy. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19powerpc/eeh: Fix enabling bridge MMIO windowsMichael Neuling
On boot we save the configuration space of PCIe bridges. We do this so when we get an EEH event and everything gets reset that we can restore them. Unfortunately we save this state before we've enabled the MMIO space on the bridges. Hence if we have to reset the bridge when we come back MMIO is not enabled and we end up taking an PE freeze when the driver starts accessing again. This patch forces the memory/MMIO and bus mastering on when restoring bridges on EEH. Ideally we'd do this correctly by saving the configuration space writes later, but that will have to come later in a larger EEH rewrite. For now we have this simple fix. The original bug can be triggered on a boston machine by doing: echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound On boston, this PHB has a PCIe switch on it. Without this patch, you'll see two EEH events, 1 expected and 1 the failure we are fixing here. The second EEH event causes the anything under the PHB to disappear (i.e. the i40e eth). With this patch, only 1 EEH event occurs and devices properly recover. Fixes: 652defed4875 ("powerpc/eeh: Check PCIe link after reset") Cc: stable@vger.kernel.org # v3.11+ Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-18net: qualcomm: rmnet: Fix warning seen with fill_infoSubash Abhinov Kasiviswanathan
When the last rmnet device attached to a real device is removed, the real device is unregistered from rmnet. As a result, the real device lookup fails resulting in a warning when the fill_info handler is called as part of the rmnet device unregistration. Fix this by returning the rmnet flags as 0 when no real device is present. WARNING: CPU: 0 PID: 1779 at net/core/rtnetlink.c:3254 rtmsg_ifinfo_build_skb+0xca/0x10d Modules linked in: CPU: 0 PID: 1779 Comm: ip Not tainted 4.16.0-11872-g7ce2367 #1 Stack: 7fe655f0 60371ea3 00000000 00000000 60282bc6 6006b116 7fe65600 60371ee8 7fe65660 6003a68c 00000000 900000000 Call Trace: [<6006b116>] ? printk+0x0/0x94 [<6001f375>] show_stack+0xfe/0x158 [<60371ea3>] ? dump_stack_print_info+0xe8/0xf1 [<60282bc6>] ? rtmsg_ifinfo_build_skb+0xca/0x10d [<6006b116>] ? printk+0x0/0x94 [<60371ee8>] dump_stack+0x2a/0x2c [<6003a68c>] __warn+0x10e/0x13e [<6003a82c>] warn_slowpath_null+0x48/0x4f [<60282bc6>] rtmsg_ifinfo_build_skb+0xca/0x10d [<60282c4d>] rtmsg_ifinfo_event.part.37+0x1e/0x43 [<60282c2f>] ? rtmsg_ifinfo_event.part.37+0x0/0x43 [<60282d03>] rtmsg_ifinfo+0x24/0x28 [<60264e86>] dev_close_many+0xba/0x119 [<60282cdf>] ? rtmsg_ifinfo+0x0/0x28 [<6027c225>] ? rtnl_is_locked+0x0/0x1c [<6026ca67>] rollback_registered_many+0x1ae/0x4ae [<600314be>] ? unblock_signals+0x0/0xae [<6026cdc0>] ? unregister_netdevice_queue+0x19/0xec [<6026ceec>] unregister_netdevice_many+0x21/0xa1 [<6027c765>] rtnl_delete_link+0x3e/0x4e [<60280ecb>] rtnl_dellink+0x262/0x29c [<6027c241>] ? rtnl_get_link+0x0/0x3e [<6027f867>] rtnetlink_rcv_msg+0x235/0x274 Fixes: be81a85f5f87 ("net: qualcomm: rmnet: Implement fill_info") Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18hv_netvsc: Add NetVSP v6 and v6.1 into version negotiationHaiyang Zhang
This patch adds the NetVSP v6 and 6.1 message structures, and includes these versions into NetVSC/NetVSP version negotiation process. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18hv_netvsc: propogate Hyper-V friendly name into interface aliasStephen Hemminger
This patch implement the 'Device Naming' feature of the Hyper-V network device API. In Hyper-V on the host through the GUI or PowerShell it is possible to enable the device naming feature which causes the host to make available to the guest the name of the device. This shows up in the RNDIS protocol as the friendly name. The name has no particular meaning and is limited to 256 characters. The value can only be set via PowerShell on the host, but could be scripted for mass deployments. The default value is the string 'Network Adapter' and since that is the same for all devices and useless, the driver ignores it. In Windows, the value goes into a registry key for use in SNMP ifAlias. For Linux, this patch puts the value in the network device alias property; where it is visible in ip tools and SNMP. The host provided ifAlias is just a suggestion, and can be overridden by later ip commands. Also requires exporting dev_set_alias in netdev core. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18Merge branch 'r8169-series-with-further-smaller-improvements'David S. Miller
Heiner Kallweit says: ==================== r8169: series with further smaller improvements This series includes further smaller improvements. Then I think the basic cleanup has been done and next step would be preparing the switch to phylib. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: remove jumbo_tx_csum from chip config structHeiner Kallweit
According to the chip configuration entries only RTL8169 (ver <= 06) supports tx checksumming for jumbo packets. By the way: constant JUMBO_1K is a little misleading because it refers to the standard packet size and not to a jumbo packet size. By implementing this rule we can get rid of configuring tx checksumming support per chip type. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: improve pci region handlingHeiner Kallweit
The region to be used is always the first of type IORESOURCE_MEM. We can implement this rule directly w/o having to specify which region is the first one per configuration entry. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: drop member txd_version from struct rtl8169_privateHeiner Kallweit
txd_version is used in rtl_init_one() only, so we can drop member txd_version from struct rtl8169_private. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: improve rtl8169_get_mac_versionHeiner Kallweit
Certain entries in array mac_info[] are redundant, so remove them: 0x7cf, 0x2c200000 (VER 33): matched by entry 0x7c8, 0x2c000000 0x7cf, 0x28300000 (VER 26): matched by entry 0x7c8, 0x28000000 0x7cf, 0x3cb00000 (VER 24): matched by entry 0x7c8, 0x3c800000 0x7cf, 0x3c400000 (VER 22): matched by entry 0x7c8, 0x3c000000 0x7cf, 0x38500000 (VER 17): matched by entry 0x7c8, 0x38000000 0x7cf, 0x44900000 (VER 39): matched by entry 0x7c8, 0x44800000 0x7cf, 0x40b00000 (VER 30): matched by entry 0x7c8, 0x40800000 0x7cf, 0x40a00000 (VER 30): matched by entry 0x7c8, 0x40800000 0x7cf, 0x34a00000 (VER 09): matched by entry 0x7c8, 0x34800000 0x7cf, 0x24a00000 (VER 09): matched by entry 0x7c8, 0x24800000 In addition don't mask out bits 30 and 29 when printing the XID. Most likely this is a relict from the times when the driver covered RTL8169 chip version only. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: don't display tp->mmio_addr addressHeiner Kallweit
For security reasons since commit ad67b74d2469 "printk: hash addresses printed with %p" %p doesn't display the full address any longer. We could switch to %px, but I think the pointer address doesn't provide a real benefit, so remove printing the hashed address. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: drop member opts1_mask from struct rtl8169_privateHeiner Kallweit
We can get rid of member opts1_mask and in addition save a few cpu cycles in the hot path of rtl_rx(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: change interrupt handler argument typeHeiner Kallweit
Code can be a little simplified by switching the interrupt handler argument type to struct rtl8169_private *. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: change argument type of counters handling functionsHeiner Kallweit
The counter handling functions don't deal with the net_device, so code can be simplified by changing the argument type to struct rtl8169_private *. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: change hw_start argument typeHeiner Kallweit
Code can be simplified by changing the argument type of hw_start callbacks from struct net_device * to struct rtl8169_private *. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: remove rtl8169_map_to_asicHeiner Kallweit
This function is very simple and used only once, so we can inline the two statements. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: replace rx_buf_sz with a constantHeiner Kallweit
rx_buf_sz is constant, so we don't have to pass it as parameter and in general can replace it with a constant. When working on this I noticed that also before in rtl_set_rx_max_size() a value of 0x4000 is set, what is not in line with the chip spec. According to the spec only bits 0..13 are used and we set an effective value of zero therefore. However, the driver still seems to work and due to potential side effects I'm reluctant to make a change. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: remove unneeded check in rtl8169_rx_fillHeiner Kallweit
rtl8169_rx_fill() is called only once and directly before the call array tp->Rx_databuff[] is filled with zero's. Therefore we don't need this check. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: improve rtl8169_init_ringHeiner Kallweit
This function doesn't use the net_device, therefore change the parameter to type struct rtl8169_private * to simplify the code. In addition we don't need the calculations in the memset statements, we can use the size of the arrays directly. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: simplify rtl8169_alloc_rx_dataHeiner Kallweit
dev->dev.parent has the same value as tp_to_dev(tp) (set by SET_NETDEV_DEV() in rtl_init_one()) and we know it can't be NULL. This allows us to simplify the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: switch to napi_schedule_irqoffHeiner Kallweit
napi_schedule() is called from hard irq context, so we can switch to napi_schedule_irqoff() and avoid some overhead. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: use constant NAPI_POLL_WAITHeiner Kallweit
We can use generic constant NAPI_POLL_WAIT instead of defining an own constant for the same value. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: use skb_copy_to_linear_data in rtl8169_try_rx_copyHeiner Kallweit
Not a giant leap for mankind, but let's avoid the open-coded memcpy and use standard helper skb_copy_to_linear_data instead. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: remove member align from struct rtl_cfg_infoHeiner Kallweit
Since commit 6f0333b8fde4 "r8169: use 50% less ram for RX ring" member align isn't used any longer, so remove it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18r8169: remove unused member features from structHeiner Kallweit
Member features of struct rtl8169_private isn't used any longer since commit 6c6aa15fdea5 "r8169: improve interrupt handling", so remove it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18Merge branch 'netcp-K2G-SoC-support'David S. Miller
Murali Karicheri says: ==================== Add support for netcp driver on K2G SoC K2G SoC is another variant of Keystone family of SoCs. This patch series add support for NetCP driver on this SoC. The QMSS found on K2G SoC is a cut down version of the QMSS found on other keystone devices with less number of queues, internal link ram etc. The patch series has 2 patch sets that goes into the drivers/soc and the rest has to be applied to net sub system. Please review and merge if this looks good. K2G TRM is located at http://www.ti.com/lit/ug/spruhy8g/spruhy8g.pdf Thanks The boot logs on K2G ICE board (tftp boot over Ethernet and from mmc) https://pastebin.ubuntu.com/p/yvZ6drFhkW/ The boot logs on K2G GP board (tftp boot over Ethernet and from mmc) https://pastebin.ubuntu.com/p/QTr6K7s4Zp/ Also regressed boot on K2HK and K2L EVMs as we have modified GBE version detection logic (K2E uses same version of NetCP as in K2L. So regression on one of them is needed). Boot log on K2L and K2HK EVMs are at https://pastebin.ubuntu.com/p/N9DBdPjbvR/ This series applies to net-next master branch. Change history: v4 - ready for merge to net-next Folded the series "Add promiscous mode support in k2g network driver" into this. Fixed a typo in 5/11 (sgmii to rgmii) based on TI internal comment Reworked 4/11 and title changed to reflect additional changes to exclude sgmii configuration code for 2U cpsw. Use IS_SS_ID_2U() macro for customization. Added Reviewed-by from Rob Herring against 1/13 v3 - Addressed comments from Andrew Lunn and Grygorii Strashko against v2. v2 - Addressed following comments on initial version - split patch 3/5 to multiple patches from Andrew Lunn ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18net: netcp: ethss: k2g: add promiscuous mode supportWingMan Kwok
This patch adds support for promiscuous mode in k2g's network driver. When upper layer instructs to transition from non-promiscuous mode to promiscuous mode or vice versa K2G network driver needs to configure ALE accordingly so that in case of non-promiscuous mode, ALE will not flood all unicast packets to host port, while in promiscuous mode, it will pass all received unicast packets to host port. Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18net: netcp: add api to support set rx mode in netcp modulesWingMan Kwok
This patch adds an API to support setting rx mode in netcp modules. If a netcp module needs to be notified when upper layer transitions from one rx mode to another and react accordingly, such a module will implement the new API set_rx_mode added in this patch. Currently rx modes supported are PROMISCUOUS and NON_PROMISCUOUS modes. Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18net: netcp: support probe deferralMurali Karicheri
The netcp driver shouldn't proceed until the knav qmss and dma devices are ready. So return -EPROBE_DEFER if these devices are not ready. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18Revert "net: netcp: remove dead code from the driver"Murali Karicheri
As the probe sequence is not guaranteed contrary to the assumption of the commit 2d8e276a9030, same has to be reverted. commit 2d8e276a9030 ("net: netcp: remove dead code from the driver") Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18net: netcp: ethss: use of_get_phy_mode() to support different RGMII modesMurali Karicheri
The phy used for K2G allows for internal delays to be added optionally to the clock circuitry based on board desing. To add this support, enhance the driver to use of_get_phy_mode() to read the phy-mode from the phy device and pass the same to phy through of_phy_connect(). Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18net: netcp: ethss: re-use stats handling code for 2u hardwareMurali Karicheri
The stats block in 2u cpsw hardware is similar to the one on nu and hence handle it in a similar way by using a macro that includes 2u hardware as well. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18net: netcp: ethss: map vlan priorities to zero flowMurali Karicheri
The driver currently support only vlan priority zero. So map the vlan priorities to zero flow in hardware. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18net: netcp: ethss: use rgmii link status for 2u cpsw hardwareMurali Karicheri
Introduce rgmii link status to handle link state events for 2u cpsw hardware on K2G. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18net: netcp: ethss: add support for handling rgmii link interfaceMurali Karicheri
2u cpsw hardware on K2G uses rgmii link to interface with Phy. So add support for this interface in the code so that driver can be re-used for 2u hardware. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18net: netcp: ethss: make sgmii configuration conditionalMurali Karicheri
As a preparatory patch to add support for 2u cpsw hardware found on K2G SoC, make sgmii configuration conditional. This is required since 2u uses RGMII interface instead of SGMII and to allow for driver re-use. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18net: netcp: ethss: use macro for checking ss_version consistentlyMurali Karicheri
Driver currently uses macro for NU and XBE hardwrae, while other places for older hardware such as that on K2H/K SoC (version 1.4 of the cpsw hardware, it explicitly check for the ss_version inline. Add a new macro for version 1.4 and use it to customize code in the driver. While at it also fix similar issue with checking XBE version by re-using existing macro IS_SS_ID_XGBE(). Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18soc: ti: K2G: provide APIs to support driver probe deferralMurali Karicheri
This patch provide APIs to allow client drivers to support probe deferral. On K2G SoC, devices can be probed only after the ti_sci_pm_domains driver is probed and ready. As drivers may get probed at different order, any driver that depends on knav dma and qmss drivers, for example netcp network driver, needs to defer probe until knav devices are probed and ready to service. To do this, add an API to query the device ready status from the knav dma and qmss devices. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18soc: ti: K2G: enhancement to support QMSS in K2G NAVSSMurali Karicheri
Navigator Subsystem (NAVSS) available on K2G SoC has a cut down version of QMSS with less number of queues, internal linking ram with lesser number of buffers etc. It doesn't have status and explicit push register space as in QMSS available on other K2 SoCs. So define reg indices specific to QMSS on K2G. This patch introduces "ti,66ak2g-navss-qm" compatibility to identify QMSS on K2G NAVSS and to customize the dts handling code. Per Device manual, descriptors with index less than or equal to regions0_size is in region 0 in the case of K2 QMSS where as for QMSS on K2G, descriptors with index less than regions0_size is in region 0. So update the size accordingly in the regions0_size bits of the linking ram size 0 register. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18Merge branch 'bpf-xdp-adjust-tail'Daniel Borkmann
Nikita V. Shirokov says: ==================== In this patch series i'm add new bpf helper which allow to manupulate xdp's data_end pointer. right now only "shrinking" (reduce packet's size by moving pointer) is supported (and i see no use case for "growing"). Main use case for such helper is to be able to generate controll (ICMP) messages from XDP context. such messages usually contains first N bytes from original packets as a payload, and this is exactly what this helper would allow us to do (see patch 3 for sample program, where we generate ICMP "packet too big" message). This helper could be usefull for load balancing applications where after additional encapsulation, resulting packet could be bigger then interface MTU. Aside from new helper this patch series contains minor changes in device drivers (for ones which requires), so they would recal packet's length not only when head pointer was adjusted, but if tail's one as well. v2->v3: * adding missed "signed off by" in v2 v1->v2: * fixed kbuild warning * made offset eq 0 invalid for xdp_bpf_adjust_tail * splitted bpf_prog_test_run fix and selftests in sep commits * added SPDX licence where applicable * some reshuffling in patches order (tests now in the end) ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18bpf: add bpf_xdp_adjust_tail sample progNikita V. Shirokov
adding bpf's sample program which is using bpf_xdp_adjust_tail helper by generating ICMPv4 "packet to big" message if ingress packet's size is bigger then 600 bytes Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18bpf: adding tests for bpf_xdp_adjust_tailNikita V. Shirokov
adding selftests for bpf_xdp_adjust_tail helper. in this synthetic test we are testing that 1) if data_end < data helper will return EINVAL 2) for normal use case packet's length would be reduced. Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18bpf: making bpf_prog_test run aware of possible data_end ptr changeNikita V. Shirokov
after introduction of bpf_xdp_adjust_tail helper packet length could be changed not only if xdp->data pointer has been changed but xdp->data_end as well. making bpf_prog_test_run aware of this possibility Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18bpf: make virtio compatible w/ bpf_xdp_adjust_tailNikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as well (only "decrease" of pointer's location is going to be supported). changing of this pointer will change packet's size. for virtio driver we need to adjust XDP_PASS handling by recalculating length of the packet if it was passed to the TCP/IP stack Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18bpf: make tun compatible w/ bpf_xdp_adjust_tailNikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as well (only "decrease" of pointer's location is going to be supported). changing of this pointer will change packet's size. for tun driver we need to adjust XDP_PASS handling by recalculating length of the packet if it was passed to the TCP/IP stack (in case if after xdp's prog run data_end pointer was adjusted) Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18bpf: make netronome nfp compatible w/ bpf_xdp_adjust_tailNikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as well (only "decrease" of pointer's location is going to be supported). changing of this pointer will change packet's size. for nfp driver we will just calculate packet's length unconditionally Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18bpf: make cavium thunder compatible w/ bpf_xdp_adjust_tailNikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as well (only "decrease" of pointer's location is going to be supported). changing of this pointer will change packet's size. for cavium's thunder driver we will just calculate packet's length unconditionally Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18bpf: make bnxt compatible w/ bpf_xdp_adjust_tailNikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as well (only "decrease" of pointer's location is going to be supported). changing of this pointer will change packet's size. for bnxt driver we will just calculate packet's length unconditionally Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18bpf: make mlx4 compatible w/ bpf_xdp_adjust_tailNikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as well (only "decrease" of pointer's location is going to be supported). changing of this pointer will change packet's size. for mlx4 driver we will just calculate packet's length unconditionally (the same way as it's already being done in mlx5) Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>