summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-03-01net/smc: respond to test link messagesKarsten Graul
Add TEST LINK message responses, which also serves as preparation for support of sockopt TCP_KEEPALIVE. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net/smc: remove unused fields from smc structuresKarsten Graul
The daddr field holds the destination IPv4 address. The field was set but never used and can be removed. The addr field was a left-over from an earlier version of non-blocking connects and can be removed. The result of the call to kernel_getpeername is not used, the call can be removed. Non-blocking connects are working, so remove restriction comment. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net/smc: move netinfo function to file smc_clc.cKarsten Graul
The function smc_netinfo_by_tcpsk() belongs to CLC handling. Move it to smc_clc.c and rename to smc_clc_netinfo_by_tcpsk. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net/smc: cleanup smc_llc.h and smc_clc.h headersStefan Raspl
Remove structures used internal only from headers. And remove an extra function parameter. Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01Merge branch 'ipv4-ipv6-mcast-align'David S. Miller
Yuval Mintz says: ==================== ipmr, ip6mr: Align multicast routing for IPv4 & IPv6 Historically ip6mr was based [cut-n-paste] on ipmr and the two have not diverged too much. Apparently as ipv4 multicast routing is more common than its ipv6 brethren modifications since then are mostly one-way, affecting ipmr while leaving ip6mr unchanged. This series is meant to re-factor both ipmr and ip6mr into having common structures [and some functionality], adding 2 new common files - mroute_base.h and ipmr_base.c. The series begins by bringing ip6mr up to speed to some of the changes applied in the past to ipmr [#2, #3]. It is then possible to re-factor a lot of the common structures - vif devices [#1], mr_table [#4] mfc_cache [#6], and use the common structures in both ipmr and ip6mr. The rest of the patches re-factor some choice flows used by both ipmr and ip6mr and eliminates duplicity. This series would later allow for easy extension of ipmr offloading to support ip6mr offloading as well, as almost all structures related to the offloading would be shared between the two protocols. Changes from previous versions ------------------------------ v2: - #6 Corrected reporting logic when hitting an unresolved cache - #7 Addressed kernel doc style [Thanks Nikolay] RFC -> v1: - Corrected support for CONFIG_IP{,V6}_MROUTE_MULTIPLE_TABLES - Addressed a couple of kbuild test robot issues ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite dumproute flowsYuval Mintz
The various MFC entries are being held in the same kind of mr_tables for both ipmr and ip6mr, and their traversal logic is identical. Also, with the exception of the addresses [and other small tidbits] the major bulk of the nla setting is identical. Unite as much of the dumping as possible between the two. Notice this requires creating an mr_table iterator for each, as the for-each preprocessor macro can't be used by the common logic. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ip6mr: Remove MFC_NOTIFY and refactor flagsYuval Mintz
MFC_NOTIFY exists in ip6mr, probably as some legacy code [was already removed for ipmr in commit 06bd6c0370bb ("net: ipmr: remove unused MFC_NOTIFY flag and make the flags enum"). Remove it from ip6mr as well, and move the enum into a common file; Notice MFC_OFFLOAD is currently only used by ipmr. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite vif seq functionsYuval Mintz
Same as previously done with the mfc seq, the logic for the vif seq is refactored to be shared between ipmr and ip6mr. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite mfc seq logicYuval Mintz
With the exception of the final dump, ipmr and ip6mr have the exact same seq logic for traversing a given mr_table. Refactor that code and make it common. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite logic for searching in MFC cacheYuval Mintz
ipmr and ip6mr utilize the exact same methods for searching the hashed resolved connections, difference being only in the construction of the hash comparison key. In order to unite the flow, introduce an mr_table operation set that would contain the protocol specific information required for common flows, in this case - the hash parameters and a comparison key representing a (*,*) route. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Make mfc_cache a common structureYuval Mintz
mfc_cache and mfc6_cache are almost identical - the main difference is in the origin/group addresses and comparison-key. Make a common structure encapsulating most of the multicast routing logic - mr_mfc and convert both ipmr and ip6mr into using it. For easy conversion [casting, in this case] mr_mfc has to be the first field inside every multicast routing abstraction utilizing it. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite creation of new mr_tableYuval Mintz
Now that both ipmr and ip6mr are using the same mr_table structure, we can have a common function to allocate & initialize a new instance. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01mroute*: Make mr_table a common structYuval Mintz
Following previous changes to ip6mr, mr_table and mr6_table are basically the same [up to mr6_table having additional '6' suffixes to its variable names]. Move the common structure definition into a common header; This requires renaming all references in ip6mr to variables that had the distinct suffix. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ip6mr: Align hash implementation to ipmrYuval Mintz
Since commit 8fb472c09b9d ("ipmr: improve hash scalability") ipmr has been using rhashtable as a basis for its mfc routes, but ip6mr is currently still using the old private MFC hash implementation. Align ip6mr to the current ipmr implementation. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ip6mr: Make mroute_sk rcu-basedYuval Mintz
In ipmr the mr_table socket is handled under RCU. Introduce the same for ip6mr. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr,ipmr6: Define a uniform vif_deviceYuval Mintz
The two implementations have almost identical structures - vif_device and mif_device. As a step toward uniforming the mr_tables, eliminate the mif_device and relocate the vif_device definition into a new common header file. Also, introduce a common initializing function for setting most of the vif_device fields in a new common source file. This requires modifying the ipv{4,6] Kconfig and ipv4 makefile as we're introducing a new common config option - CONFIG_IP_MROUTE_COMMON. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28Merge branch 'fib_rules-support-sport-dport-and-proto-match'David S. Miller
Roopa Prabhu says: ==================== fib_rules: support sport, dport and proto match This series extends fib rule match support to include sport, dport and ip proto match (to complete the 5-tuple match support). Common use-cases of Policy based routing in the data center require 5-tuple match. The last 2 patches in the series add a call to flow dissect in the fwd path if required by the installed fib rules (controlled by a flag). v1: - Fix errors reported by kbuild and feedback on RFC - extend port match uapi to accomodate port ranges v2: - address comments from Nikolay, David Ahern and Paolo (Thanks!) Pending things I will submit separate patches for: - extack for fib rules - fib rules test (as requested by david ahern) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipv6: route: dissect flow in input path if fib rules need itRoopa Prabhu
Dissect flow in fwd path if fib rules require it. Controlled by a flag to avoid penatly for the common case. Flag is set when fib rules with sport, dport and proto match that require flow dissect are installed. Also passes the dissected hash keys to the multipath hash function when applicable to avoid dissecting the flow again. icmp packets will continue to use inner header for hash calculations. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipv6: route: dissect flow in input path if fib rules need itRoopa Prabhu
Dissect flow in fwd path if fib rules require it. Controlled by a flag to avoid penatly for the common case. Flag is set when fib rules with sport, dport and proto match that require flow dissect are installed. Also passes the dissected hash keys to the multipath hash function when applicable to avoid dissecting the flow again. icmp packets will continue to use inner header for hash calculations (Thanks to Nikolay Aleksandrov for some review here). Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipv6: fib6_rules: support for match on sport, dport and ip protoRoopa Prabhu
support to match on src port, dst port and ip protocol. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipv4: fib_rules: support match on sport, dport and ip protoRoopa Prabhu
support to match on src port, dst port and ip protocol. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net: fib_rules: support for match on ip_proto, sport and dportRoopa Prabhu
uapi for ip_proto, sport and dport range match in fib rules. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28r8169: fix interrupt number after adding support for MSI-X interruptsHeiner Kallweit
In case of MSI-X the interrupt number may differ from pcidev->irq. Fix this by using pci_irq_vector(). Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2018-02-28 This series contains updates to fm10k only. Jake provides all the changes in this series, starting with making the function header comments consistent and to align with how the kernel documentation expects it. Also cleaned up code comment as well as bump the driver version. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28Merge branch 'selftests-forwarding-Add-VRF-based-tests'David S. Miller
Ido Schimmel says: ==================== selftests: forwarding: Add VRF-based tests One of the nice things about network namespaces is that they allow one to easily create and test complex environments. Unfortunately, these namespaces can not be used with actual switching ASICs, as their ports can not be migrated to other network namespaces (NETIF_F_NETNS_LOCAL) and most of them probably do not support the L1-separation provided by namespaces. However, a similar kind of flexibility can be achieved by using VRFs and by looping the switch ports together. For example: br0 + vrf-h1 | vrf-h2 + +---+----+ + | | | | 192.0.2.1/24 + + + + 192.0.2.2/24 swp1 swp2 swp3 swp4 + + + + | | | | +--------+ +--------+ The VRFs act as lightweight namespaces representing hosts connected to the switch. This approach for testing switch ASICs has several advantages over the traditional method that requires multiple physical machines, to name a few: 1. Only the device under test (DUT) is being tested without noise from other system. 2. Ability to easily provision complex topologies. Testing bridging between 4-ports LAGs or 8-way ECMP requires many physical links that are not always available. With the VRF-based approach one merely needs to loopback more ports. These tests are written with switch ASICs in mind, but they can be run on any Linux box using veth pairs to emulate physical loopbacks. v2: * Order local variables declaration according to function arguments order (Petr) v1: * Change location to net/forwarding instead of forwarding/ * Add ability to pause on failure * Add ability to pause on cleanup * Make configuration file optional * Make ping/ping6/mz configurable * Add more tc tests ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Introduce basic shared blocks testsJiri Pirko
Test shared block infrastructure. This is a basic test that shares TC block in between 2 clsact qdiscs. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Introduce basic tc chains testsJiri Pirko
Tests chains matching and goto chain action. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Introduce tc actions testsJiri Pirko
Add first part of actions tests. This patch only contains tests of gact ok/drop/trap and mirred redirect egress. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Introduce tc flower matching testsJiri Pirko
Add first part of flower tests. This patch only contains dst/src ip/mac matching. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Allow to get netdev interfaces names from commandlineJiri Pirko
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Add MAC get helperJiri Pirko
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Add tc offload check helperJiri Pirko
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Test IPv6 weighted nexthopsIdo Schimmel
Have one host generate 16K IPv6 echo requests with a random flow label and check that they are distributed between both multipath links according to the provided weights. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Test IPv4 weighted nexthopsIdo Schimmel
Use different weights for the multipath route configured on the first router and check that the different flows generated by the first host are distributed according to the provided weights. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Create test topology for multipath routingIdo Schimmel
Create a topology with two hosts, each directly connected to a different router. Both routers are connected using two links, enabling multipath routing. Test IPv4 and IPv6 ping using default MTU and large MTU. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Add a test for basic IPv4 and IPv6 routingIdo Schimmel
Configure two hosts which are directly connected to the same router and test IPv4 and IPv6 ping. Use a large MTU and check that ping is unaffected. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Add a test for flooded trafficIdo Schimmel
Add test cases for unknown unicast and unregistered multicast flooding. For each traffic type, turn off flooding on one bridged port and inject a packet of the specified type through the second bridged port. Make sure the packet was not received by checking the ACL counters on the other end. Later, turn on flooding and make sure the packet was received. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Add a test for FDB learningIdo Schimmel
Send a packet with a specific destination MAC, make sure it was learned on the ingress port and then aged-out. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28selftests: forwarding: Add initial testing frameworkIdo Schimmel
Add initial framework to test packet forwarding functionality. The tests can run on actual devices using loop-backed cables or using veth pairs. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipvlan: use per device spinlock to protect addrs list updatesPaolo Abeni
This changeset moves ipvlan address under RCU protection, using a per ipvlan device spinlock to protect list mutation and RCU read access to protect list traversal. Also explicitly use RCU read lock to traverse the per port ipvlans list, so that we can now perform a full address lookup without asserting the RTNL lock. Overall this allows the ipvlan driver to check fully for duplicate addresses - before this commit ipv6 addresses assigned by autoconf via prefix delegation where accepted without any check - and avoid the following rntl assertion failure still in the same code path: RTNL: assertion failed at drivers/net/ipvlan/ipvlan_core.c (124) WARNING: CPU: 15 PID: 0 at drivers/net/ipvlan/ipvlan_core.c:124 ipvlan_addr_busy+0x97/0xa0 [ipvlan] Modules linked in: ipvlan(E) ixgbe CPU: 15 PID: 0 Comm: swapper/15 Tainted: G E 4.16.0-rc2.ipvlan+ #1782 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 RIP: 0010:ipvlan_addr_busy+0x97/0xa0 [ipvlan] RSP: 0018:ffff881ff9e03768 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff881fdf2a9000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 00000000000000f6 RDI: 0000000000000300 RBP: ffff881fdf2a8000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: ffff881ff9e034c0 R12: ffff881fe07bcc00 R13: 0000000000000001 R14: ffffffffa02002b0 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff881ff9e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc5c1a4f248 CR3: 000000207e012005 CR4: 00000000001606e0 Call Trace: <IRQ> ipvlan_addr6_event+0x6c/0xd0 [ipvlan] notifier_call_chain+0x49/0x90 atomic_notifier_call_chain+0x6a/0x100 ipv6_add_addr+0x5f9/0x720 addrconf_prefix_rcv_add_addr+0x244/0x3c0 addrconf_prefix_rcv+0x2f3/0x790 ndisc_router_discovery+0x633/0xb70 ndisc_rcv+0x155/0x180 icmpv6_rcv+0x4ac/0x5f0 ip6_input_finish+0x138/0x6a0 ip6_input+0x41/0x1f0 ipv6_rcv+0x4db/0x8d0 __netif_receive_skb_core+0x3d5/0xe40 netif_receive_skb_internal+0x89/0x370 napi_gro_receive+0x14f/0x1e0 ixgbe_clean_rx_irq+0x4ce/0x1020 [ixgbe] ixgbe_poll+0x31a/0x7a0 [ixgbe] net_rx_action+0x296/0x4f0 __do_softirq+0xcf/0x4f5 irq_exit+0xf5/0x110 do_IRQ+0x62/0x110 common_interrupt+0x91/0x91 </IRQ> v1 -> v2: drop unneeded in_softirq check in ipvlan_addr6_validator_event() Fixes: e9997c2938b2 ("ipvlan: fix check for IP addresses in control path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipvlan: egress mcast packets are not exceptionalPaolo Abeni
Currently, if IPv6 is enabled on top of an ipvlan device in l3 mode, the following warning message: Dropped {multi|broad}cast of type= [86dd] is emitted every time that a RS is generated and dmseg is soon filled with irrelevant messages. Replace pr_warn with pr_debug, to preserve debuggability, without scaring the sysadmin. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28Merge branch 'mlxsw-mq-red-offload'David S. Miller
Jiri Pirko says: ==================== mlxsw: Offload multi-queue RED support Nogah says: Support a two level hierarchy of offloaded qdiscs in mlxsw, with sch_prio being the root qdisc and sch_red as the children. +----------+ | sch_prio | +----+-----+ | | +----------------------------------+ | | | | | | | | | +---v---+ +----v---+ +-----v--+ |sch_red| |sch_red | |sch_red | +-------+ +--------+ +--------+ When setting sch_prio as the root qdisc on a physical port, mlxsw will offload it. When adding it with sch_red as a child qdisc, it will offload it as well. Relocating child qdisc or connecting them to more then one child will result in unoffloading them. Relocating child qdisc more then once is highly unrecommended and might cause a miss match between the kernel configuration and the offloaded one. The offloaded configuration will be aligned with the one shown in the show command. Changing the priomap parameter of sch_prio might cause a band that its configuration was changed and it has offloaded sch_red set on it, to lose some stats data as if sch_red was unoffloaded and offloaded again. However, it won't affect the data on this band that will have sch_red continuously. Patch 1 adds support for setting RED as the child of root qdisc. Patches 2-4 add support for RED bstasts for offloaded child qdiscs. Patches 5-6 handle backlog related changes for offloaded child qdiscs. Patches 7-8 update PRIO in mlxsw to be able to have RED as child on its bands. Patch 9 adds offload handles for PRIO graft operations. In mlxsw it will cause the driver to stop offloading the child in question. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: prio: Handle graft commandNogah Frankel
Handle graft command for an offloaded sch_prio. Grafting a qdisc to any place other than under its original parent is not supported by mlxsw and will cause the grafted qdisc to stop being offloaded. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net: sch: prio: Add offload ability for grafting a childNogah Frankel
Offload sch_prio graft command for capable drivers. Warn in case of a failure, unless the graft was done as part of a destroy operation (the new qdisc is a noop) or if all the qdiscs (the parent, the old child, and the new one) are not offloaded. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: prio: Delete child qdiscs when removing bandsNogah Frankel
When the number the bands of sch_prio is decreased, child qdiscs on the deleted bands would get deleted as well. This change and deletions are being done under sch_tree_lock of the sch_prio qdisc. Part of the destruction of qdisc is unoffloading it, if it is offloaded. Un-offloading can't be done inside this lock. Move the offload command to be done before reducing the number of bands, so unoffloading of the qdiscs that are about to be deleted could be done outside of the lock. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: Update sch_prio stats to include sch_red related dropsNogah Frankel
sch_prio as root qdisc should count all the drops its children have. Since it is possible for it to have sch_red children, it needs to count RED early drops. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net: sch: Don't warn on missmatching qlen and backlog for offloaded qdiscsNogah Frankel
Offloaded qdiscs are allowed to expose only parts of their statistics. It means that if backlog is being exposed and qlen is not, it might trigger a warning in qdisc_tree_reduce_backlog. Do not warn in case the qdisc that was removed was an offloaded one. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: Update backlog handling of a child qdiscsNogah Frankel
When removing a child qdisc its backlog will be decreased from the parent backlog. The driver backlog count should do the same. When the parent changes its configuration, the child might need to clean its stats. However, the backlog can't be cleaned with the rest of the stats, because it reflects a momentary value that needs to be synced with the core, not the history of the qdisc. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: Collect stats for sch_red based on priomapNogah Frankel
Priority counters count packets according to their packet priority. Collect the stats for sch_red based on these counters, so the qdisc bstats will be the sum of counters matching the priorities marked in the qdisc priomap. Changing the mapping of the priorities to bands while traffic is running can result in losing the stats of the bands qdiscs from their last dump call to this change, as if the qdisc was unoffloaded and re-offloaded. It will not affect the traffic behaviour according to sch_red. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: Add priority map per qdiscNogah Frankel
Add priority map per qdisc, to indicate which priorities are being directed through this qdisc. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>