summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-11-05nfp: bpf: remove unnecessary include of nfp_net.hJakub Kicinski
BPF offload's main header does not need to include nfp_net.h. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05nfp: bpf: remove the register renumbering leftoversJakub Kicinski
The register renumbering was removed and will not be coming back in its old, naive form, given that it would be fundamentally incompatible with calling functions. Remove the leftovers. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05nfp: bpf: drop support for cls_bpf with legacy actionsJakub Kicinski
Only support BPF_PROG_TYPE_SCHED_CLS programs in direct action mode. This simplifies preparing the offload since there will now be only one mode of operation for that type of program. We need to know the attachment mode type of cls_bpf programs, because exit codes are interpreted differently for legacy vs DA mode. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05cls_bpf: allow attaching programs loaded for specific deviceJakub Kicinski
If TC program is loaded with skip_sw flag, we should allow the device-specific programs to be accepted. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05xdp: allow attaching programs loaded for specific deviceJakub Kicinski
Pass the netdev pointer to bpf_prog_get_type(). This way BPF code can decide whether the device matches what the code was loaded/translated for. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpftool: print program device bound infoJakub Kicinski
If program is bound to a device, print the name of the relevant interface or unknown if the netdev has since been removed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf: report offload info to user spaceJakub Kicinski
Extend struct bpf_prog_info to contain information about program being bound to a device. Since the netdev may get destroyed while program still exists we need a flag to indicate the program is loaded for a device, even if the device is gone. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf: offload: add infrastructure for loading programs for a specific netdevJakub Kicinski
The fact that we don't know which device the program is going to be used on is quite limiting in current eBPF infrastructure. We have to reverse or limit the changes which kernel makes to the loaded bytecode if we want it to be offloaded to a networking device. We also have to invent new APIs for debugging and troubleshooting support. Make it possible to load programs for a specific netdev. This helps us to bring the debug information closer to the core eBPF infrastructure (e.g. we will be able to reuse the verifer log in device JIT). It allows device JITs to perform translation on the original bytecode. __bpf_prog_get() when called to get a reference for an attachment point will now refuse to give it if program has a device assigned. Following patches will add a version of that function which passes the expected netdev in. @type argument in __bpf_prog_get() is renamed to attach_type to make it clearer that it's only set on attachment. All calls to ndo_bpf are protected by rtnl, only verifier callbacks are not. We need a wait queue to make sure netdev doesn't get destroyed while verifier is still running and calling its driver. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: bpf: rename ndo_xdp to ndo_bpfJakub Kicinski
ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and associated structures and definitions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05mISDN: l1oip_core: replace _manual_ swap with swap macroGustavo A. R. Silva
Make use of the swap macro and remove unnecessary variables skb and cnt. This makes the code easier to read and maintain. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: plip: mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 114893 Addresses-Coverity-ID: 114894 Addresses-Coverity-ID: 114895 Addresses-Coverity-ID: 114896 Addresses-Coverity-ID: 114897 Addresses-Coverity-ID: 114898 Addresses-Coverity-ID: 114899 Addresses-Coverity-ID: 114900 Addresses-Coverity-ID: 114901 Addresses-Coverity-ID: 114902 Addresses-Coverity-ID: 114903 Addresses-Coverity-ID: 114904 Addresses-Coverity-ID: 114905 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05tcp: do not clear again skb->csum in tcp_init_nondata_skb()Eric Dumazet
tcp_init_nondata_skb() is fed with freshly allocated skbs. They already have a cleared csum field, no need to clear it again. This is based on Neal review on commit 3b11775033dc ("tcp: do not mangle skb->cb[] in tcp_make_synack()"), noticing I did not clear skb->csum. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05tcp: tcp_mtu_probing() cleanupEric Dumazet
Reduce one indentation level to make code more readable. tcp_sync_mss() can be factorized. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05dpaa_eth: avoid uninitialized variable false-positive warningArnd Bergmann
We can now build this driver on ARM, so I ran into a randconfig build warning that presumably had existed on powerpc already. drivers/net/ethernet/freescale/dpaa/dpaa_eth.c: In function 'sg_fd_to_skb': drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:1712:18: error: 'skb' may be used uninitialized in this function [-Werror=maybe-uninitialized] I'm slightly changing the logic here, to make it obvious to the compiler that 'skb' is always initialized. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05Merge branch 'openvswitch-netns'David S. Miller
Flavio Leitner says: ==================== Allow openvswitch to query ports in another netns. Today Open vSwitch users are moving internal ports to other namespaces and although packets are flowing OK, the userspace daemon can't find out basic information like if the port is UP or DOWN, for instance. This patchset extends openvswitch API to retrieve the current netnsid of a port. It will be used by the userspace daemon to find out in which netns the port is located. This patchset also extends the rtnetlink getlink call to accept and operate on a given netnsid. More details are available in each patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05rtnetlink: use netnsid to query interfaceJiri Benc
Currently, when an application gets netnsid from the kernel (for example as the result of RTM_GETLINK call on one end of the veth pair), it's not much useful. There's no reliable way to get to the netns fd from the netnsid, nor does any kernel API accept netnsid. Extend the RTM_GETLINK call to also accept netnsid. It will operate on the netns with the given netnsid in such case. Of course, the calling process needs to have enough capabilities in the target name space; for now, require CAP_NET_ADMIN. This can be relaxed in the future. To signal to the calling process that the kernel understood the new IFLA_IF_NETNSID attribute in the query, it will include it in the response. This is needed to detect older kernels, as they will just ignore IFLA_IF_NETNSID and query in the current name space. This patch implemetns IFLA_IF_NETNSID only for get and dump. For set operations, this can be extended later. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05openvswitch: reliable interface indentification in port dumpsJiri Benc
This patch allows reliable identification of netdevice interfaces connected to openvswitch bridges. In particular, user space queries the netdev interfaces belonging to the ports for statistics, up/down state, etc. Datapath dump needs to provide enough information for the user space to be able to do that. Currently, only interface names are returned. This is not sufficient, as openvswitch allows its ports to be in different name spaces and the interface name is valid only in its name space. What is needed and generally used in other netlink APIs, is the pair ifindex+netnsid. The solution is addition of the ifindex+netnsid pair (or only ifindex if in the same name space) to vport get/dump operation. On request side, ideally the ifindex+netnsid pair could be used to get/set/del the corresponding vport. This is not implemented by this patch and can be added later if needed. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: export peernet2id_allocJiri Benc
It will be used by openvswitch. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04net/mlx5e: Enable CQE based moderation on TX CQTal Gilboa
By using CQE based moderation on TX CQ we can reduce the number of TX interrupt rate. Besides the benefit of less interrupts, this also allows the kernel to better utilize TSO. Since TSO has some CPU overhead, it might not aggregate when CPU is under high stress. By reducing the interrupt rate and the CPU utilization, we can get better aggregation and better overall throughput. The feature is enabled by default and has a private flag in ethtool for control. Throughput, interrupt rate and TSO utilization improvements: (ConnectX-4Lx 40GbE, unidirectional, 1/16 TCP streams, 64B packets) --------------------------------------------------------- Metric | Streams | CQE Based | EQE Based | improvement --------------------------------------------------------- BW | 1 | 2.4Gb/s | 2.15Gb/s | +11.6% IR | 1 | 27Kips | 50.6Kips | -46.7% TSO Util | 1 | 74.6% | 71% | +5% BW | 16 | 29Gb/s | 25.85Gb/s | +12.2% IR | 16 | 482Kips | 745Kips | -35.3% TSO Util | 16 | 69.1% | 49% | +41.1% *BW = Bandwidth, IR = Interrupt rate, ips = interrupt per second. TSO Util = bytes in TSO sessions / all bytes transferred Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steeringFeras Daoud
For supported platforms, add inner TTC flow table to enhanced IPoIB flow steering. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5: Initialize destination_flow struct to 0Rabie Loulou
This is needed in order to enlarge it with more members that will get value of 0 when not set. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5: Enlarge the NIC TC offload table sizeOr Gerlitz
The NIC TC offload table size was hard coded to 1k. Change it to be min(max NIC RX table size, min(max flow counters, 64k) * num flow groups) where the max values are read from the firmware and the number of flow groups is hard-coded as before this change. We don't know upfront the division of flows to groups (== different masks). This setup allows each group to be of size up to the where we want to go (when supported, all offloaded flows use counters). Thus, we don't expect multiple occurences for a group which in turn would add steering hops. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5e: DCBNL, Add debug messages logInbar Karmy
Add debug print when changing the configuration of QoS through dcbnl. Use ethtool -s <devname> msglvl hw on/off to toggle debug messages. Signed-off-by: Inbar Karmy <inbark@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5e: Add support for ethtool msglvl supportGal Pressman
Use ethtool -s <devname> msglvl <type> on/off to toggle debug messages. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQHuy Nguyen
If the port is in DSCP trust state, packets are placed in the right priority queue based on the dscp value. This is done by selecting the transmit queue based on the dscp of the skb. Until now select_queue honors priority only from the vlan header. However that is not sufficient in cases where port trust state is DSCP mode as packet might not even contain vlan header. Therefore if the port is in dscp trust state and vport's min inline mode is not NONE, copy the IP header to the eseg's inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5e: Add dcbnl dscp to priority supportHuy Nguyen
This patch implements dcbnl hooks to set and delete DSCP to priority map as defined by the DCB subsystem. Device maintains internal trust state which needs to be set to DSCP state for performing DSCP to priority mapping. When the first dscp to priority APP entry is added by the user, the trust state is changed to dscp. When the last dscp to priority APP entry is deleted by the user, the trust state is changed to pcp. If user sends multiple dscp to priority APP entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. The dscp to priority APP entries are added and deleted in the net/dcb APP database using dcb_ieee_setapp/getapp. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5: QPTS and QPDPM register firmware command supportHuy Nguyen
The QPTS register allows changing the priority trust state between pcp and dscp. Add support to get/set trust state from device. When the port is in pcp/dscp trust state, packet is routed by hardware to matching priority based on its pcp/dscp value respectively. The QPDPM register allow channing the dscp to priority mapping. Add support to get/set dscp to priority mapping from device. Note that to change a dscp mapping, the "e" bit of this dscp structure must be set in the QPDPM firmware command. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5: Add MLX5_SET16 and MLX5_GET16Huy Nguyen
Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware command. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5: QCAM register firmware command supportHuy Nguyen
The QCAM register provides capability bit for all the QoS registers using ACCESS_REG command. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/dcb: Add dscp to priority selector typeHuy Nguyen
IEEE specification P802.1Qcd/D2.1 defines priority selector 5. This APP TLV selector defines DSCP to priority map. This patch defines such DSCP selector. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-05ipv6: remove IN6_ADDR_HSIZE from addrconf.hEric Dumazet
IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05pktgen: do not abuse IN6_ADDR_HSIZEEric Dumazet
pktgen accidentally used IN6_ADDR_HSIZE, instead of using the size of an IPv6 address. Since IN6_ADDR_HSIZE recently was increased from 16 to 256, this old bug is hitting us. Fixes: 3f27fb23219e ("ipv6: addrconf: add per netns perturbation in inet6_addr_hash()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04net: sched: cls_u32: use bitwise & rather than logical && on n->flagsColin Ian King
Currently n->flags is being operated on by a logical && operator rather than a bitwise & operator. This looks incorrect as these should be bit flag operations. Fix this. Detected by CoverityScan, CID#1460398 ("Logical vs. bitwise operator") Fixes: 245dc5121a9b ("net: sched: cls_u32: call block callbacks for offload") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04tcp_nv: use do_div() instead of expensive div64_u64()Konstantin Khlebnikov
Average RTT is 32-bit thus full 64-bit division is redundant. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04add support of IFF_XMIT_DST_RELEASE bit in vlanVadim Fedorenko
Some time ago Eric Dumazet suggested a "hack the IFF_XMIT_DST_RELEASE flag on the vlan netdev". But the last comment was "does not support properly bonding/team.(If the real_dev->privflags IFF_XMIT_DST_RELEASE bit changes, we want to update all the vlans at the same time )" I've extended that patch to support changes of IFF_XMIT_DST_RELEASE in bonding/team. Both bonding and team call netdev_change_features() after recalculation of features including priv_flags IFF_XMIT_DST_RELEASE bit. So the only thing needed to support is to recheck this bit in vlan_transfer_features(). Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Vadim Fedorenko <vfedorenko@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04phylink: make local function phylink_phy_change() staticWei Yongjun
Fixes the following sparse warnings: drivers/net/phy/phylink.c:570:6: warning: symbol 'phylink_phy_change' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04Merge tag 'wireless-drivers-next-for-davem-2017-11-03' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.15 Mostly fixes this time, but also few new features. Major changes: wil6210 * remove ssid debugfs file rsi * add WOWLAN support for suspend, hibernate and shutdown states ath10k * add support for CCMP-256, GCMP and GCMP-256 ciphers on hardware where it's supported (QCA99x0 and QCA4019) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Files removed in 'net-next' had their license header updated in 'net'. We take the remove from 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04liquidio: Fix an issue with multiple switchdev enable disablesVijaya Mohan Guvva
Return success if the same dispatch function is being registered for a given opcode and subcode, there by allow multiple switchdev enable and disables. Signed-off-by: Vijaya Mohan Guvva <vijaya.guvva@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04Merge branch 'mlxsw-Handle-changes-in-GRE-configuration'David S. Miller
Jiri Pirko says: ==================== mlxsw: Handle changes in GRE configuration Petr says: Until now, when an IP tunnel was offloaded by the mlxsw driver, the offload was pretty much static, and changes in Linux configuration were not reflected in the hardware. That led to discrepancies between traffic flows in slow path and fast path. The work-around used to be to remove all routes that forward to the netdevice and re-add them. This is clearly suboptimal, but actually, as of the decap-only patchset, it's not even enough anymore, and one needs to go all the way and simply drop the tunnel and recreate it correctly. With this patchset, the NETDEV_CHANGE events that are generated for changes of up'd tunnel netdevices are captured and interpreted to correctly reconfigure the HW in accordance with changes requested at the software layer. In addition, NETDEV_CHANGEUPPER, NETDEV_UP and NETDEV_DOWN are now handled not only for tunnel devices themselves, but also for their bound devices. Each change is then translated to one or more of the following updates to the HW configuration: - refresh of offload of local route that corresponds to tunnel's local address - refresh of the loopback RIF - refresh of offloads of routes that forward to the changed tunnel - removal of tunnel offloads These tools are used to implement the following configuration changes: - addition of a new offloadable tunnel with local address that conflicts with that of an already-offloaded tunnel (the existing tunnel is onloaded, the new one isn't offloaded) - changes to TTL, TOS that make tunnel unsuitable for offloading - changes to ikey, okey, remote - changes to local, which when they cause conflict with another tunnel, lead to onloading of both newly-conflicting tunnels - migration of a bound device of an offloaded tunnel device to a different VRF - changes to what device is bound to a tunnel device (i.e. like what "ip tunnel change name g dev another" does) - changes to up / down state of a bound device. A down bound device doesn't forward encapsulated traffic anymore, but decap still works. This patchset starts with a suite of patches that adapt the existing code base step by step to facilitate introduction of the offloading code. The five substantial patches at the end then implement the changes mentioned above. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum_router: Handle down of tunnel underlayPetr Machata
When the bound device of a tunnel device is down, encapsulated packets are not egressed anymore, but tunnel decap still works. Extend mlxsw_sp_nexthop_rif_update() to take IFF_UP into consideration when deciding whether a given next hop should be offloaded. Because the new logic was added to mlxsw_sp_nexthop_rif_update(), this fixes the case where a newly-added tunnel has a down bound device, which would previously be fully offloaded. Now the down state of the bound device is noted and next hops forwarding to such tunnel are not offloaded. In addition to that, notice NETDEV_UP and NETDEV_DOWN of a bound device to force refresh of tunnel encap route offloads. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum_ipip: Handle underlay device changePetr Machata
When a bound device of an IP-in-IP tunnel changes, such as through 'ip tunnel change name $name dev $dev', the loopback backing the tunnel needs to be recreated. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum: Handle NETDEV_CHANGE on L3 tunnelsPetr Machata
Changes to L3 tunnel netdevices (through `ip tunnel change' as well as `ip link set') lead to NETDEV_CHANGE being generated on the tunnel device. Because what is relevant for the tunnel in question depends on the tunnel type, handling of the event is dispatched to the IPIP module through a newly-added interface mlxsw_sp_ipip_ops.ol_netdev_change(). IPIP tunnels now remember the last set of tunnel parameters in struct mlxsw_sp_ipip_entry.parms, and use it to figure out what exactly has changed. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum: Support IPIP underlay VRF migrationPetr Machata
When a bound device of a tunnel netdevice changes VRF, the loopback RIF that backs the tunnel needs to be updated and existing encapsulating routes need to be refreshed. Note that several tunnels can share the same bound device, in which case all the impacted tunnels need to be updated. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum_router: Onload conflicting tunnelsPetr Machata
The approach for offloading IP tunnels implemented currently by mlxsw doesn't allow two tunnels that have the same local IP address in the same (underlay) VRF. Previously, offloads were introduced on demand as encap routes were formed. When such a route was created that would cause offload of a conflicting tunnel, mlxsw_sp_ipip_entry_create() would detect it and return -EEXIST, which would propagate up and cause FIB abort. Now however IPIP entries are created as soon as an offloadable netdevice is created, and the failure prevents creation of such device. Furthermore, if the driver is installed at the point where such conflicting tunnels exist, the failure actually prevents successful modprobe. Furthermore, follow-up patches implement handling of NETDEV_CHANGE due to the local address change. However, NETDEV_CHANGE can't be vetoed. The failure merely means that the offloads weren't updated, but the change in Linux configuration is not rolled back. It is thus desirable to have a robust way of handling these conflicts, which can later be reused for handling NETDEV_CHANGE as well. To fix this, when a conflicting tunnel is created, instead of failing, simply pull the old tunnel to slow path and reject offloading the new one. Introduce two functions: mlxsw_sp_ipip_entry_demote_tunnel() and mlxsw_sp_ipip_demote_tunnel_by_saddr() to handle this. Make them both public, because they will be useful later on in this patchset. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum_router: Fix saddr deduction in mlxsw_sp_ipip_entry_create()Petr Machata
When trying to determine whether there are other offloaded tunnels with the same local address, mlxsw_sp_ipip_entry_create() should look for a tunnel with matching UL protocol, matching saddr, in the same VRF. However instead of taking into account the UL protocol of the tunnel netdevice (which mlxsw_sp_ipip_entry_saddr_matches() then compares to the UL protocol of inspected IPIP entry), it deduces the UL protocol from the inspected IPIP entry (and that's compared to itself). This is currently immaterial, because only one tunnel type is offloaded, and therefore the UL protocol always matches, but introducing support for a tunnel with IPv6 underlay would uncover this error. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum_router: Generalize __mlxsw_sp_ipip_entry_update_tunnel()Petr Machata
The work that needs to be done to update HW configuration in response to changes is similar to what __mlxsw_sp_ipip_entry_update_tunnel() already does, but with a number of twists: each change requires a different subset of things to happen. Extend the function to support all these uses, and allow finely-grained configuration of what should happen at each call through a suite of function arguments. Publish the updated function to allow use from the spectrum_ipip module. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum_router: Extract __mlxsw_sp_ipip_entry_update_tunnel()Petr Machata
The work that's done by mlxsw_sp_netdevice_ipip_ol_vrf_event() is a good basis for a more versatile function that would take care of all sorts of tunnel updates requests: __mlxsw_sp_ipip_entry_update_tunnel(). Extract that function. Factor out a helper mlxsw_sp_ipip_entry_ol_lb_update() as well. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum: Propagate extack for tunnel eventsPetr Machata
The function mlxsw_sp_rif_create() takes an extack parameter. So far, for creation of loopback interfaces, NULL was passed. For some events however the extack can be extracted and passed along. So do that for NETDEV_CHANGEUPPER handler. Use the opportunity to update the type of info argument that mlxsw_sp_netdevice_ipip_ol_event() takes. Follow-up patches will introduce handling of more changes, and some of them carry an extack as well, but in an info structure of a different type. Though not strictly erroneous (the pointer could be cast whichever way), it makes no sense to pretend the value is always of a certain type, when in fact it isn't. So change the prototype of the above-mentioned function as well. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum_router: Extract mlxsw_sp_ipip_entry_ol_up_event()Petr Machata
The piece of logic to promote decap route, if any, is useful for generic tunnel updates, not just for handling of NETDEV_UP events on tunnel interfaces. Extract it to a separate function. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>