summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)Author
2018-03-06net/mlx5: FPGA and IPSec initialization to be before flow steeringMatan Barak
Some flow steering namespace initialization (i.e. egress namespace) might depend on FPGA capabilities. Changing the initialization order such that the FPGA will be initialized before flow steering. Flow steering fs cmds initialization might depend on IPSec capabilities. Changing the initialization order such that the IPSec will be initialized before flow steering as well. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06net/mlx5e: Removed not need synchronize_rcuAviad Yehezkel
This is already done by xfrm layer between state_dev_del callback to state_dev_free callback. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06net/mlx5e: Fixed sleeping inside atomic contextAviad Yehezkel
We can't allocate with GFP_KERNEL inside spinlock. Actually ida_simple doesn't require spinlock so remove it. Fixes: 547eede070eb ("net/mlx5e: IPSec, Innova IPSec offload infrastructure") Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06net/mlx5e: Wait for FPGA command responses with a timeoutAviad Yehezkel
Generally, FPGA IPSec commands must always complete. We want to wait for one minute for them to complete gracefully also when killing a process. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06net/mlx5: Fixed compilation issue when CONFIG_MLX5_ACCEL is disabledAviad Yehezkel
IPSec init and cleanup functions also depends on linux/mlx5/driver.h. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Reload IB interface when switching devlink modesMark Bloch
Up until this point it wasn't possible to activate IB representors when switching to switchdev mode, remove this limitation. We trigger reload of the PF IB interface in order to make sure that already allocated resources are invalid and new resources will be opened correctly with all the limitations of switchdev mode applied (only raw packet capabilities, without RoCE). We also move the remove/add to a place where the E-Switch mode is set/unset to better control when to trigger this action, this will allow the IB side to start in the correct mode. For better code reuse, create a function which reloads an interface and export it. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Optimize HW steering tables in switchdev modeMark Bloch
Under switchdev mode we insert an eswitch miss rule causing any unmatched traffic to be sent towards the PF vport. This miss rule can be optimized if we break it to two, one case is for multicast traffic and the other for unicast. Breaking the miss rule into two (unicast and multicast) allows the firmware to program the hardware in a more efficient way. Using ConncetX-5 Ex with IXIA and testpmd (which use IB representors): IXIA -> NIC -> PF -> IB representor -> NIC -> VF: - Without this optimization: 9.2 MPPS. - With this optimization: 18 MPPS. VF -> NIC -> IB representor-> PF -> NIC -> IXIA: - Without this optimization: 17 MPPS. - With this optimization: 23.4 MPPS. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Increase number of FTEs in FDB in switchdev modeMark Bloch
The max FTE number should be the max number of SQs that can be opened. Ethernet representors open one SQ each. Once we add IB representor this will increase (depends on the user). For now lets start with 31 per IB representor and if needed increase in the future. This increase only affects the number of FTEs in the slow path FDB, offloaded rules (done via TC on the fast path portion of the FDB) aren't affected. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Move representors definition to a global scopeMark Bloch
In preparation for IB representors, move representors structs to a global scope, also expose functions needed for registration, unregistration, eswitch mode and creating a flow rule to direct traffic from SQs to the right VF. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Add callback to get representor deviceMark Bloch
Add a callback interface to get a protocol device (per representor type). The Ethernet representors will expose their netdev via this interface. This functionality can be later used by IB representor in order to find the corresponding net device representor. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-15IB/mlx5: Implement fragmented completion queue (CQ)Yonatan Cohen
The current implementation of create CQ requires contiguous memory, such requirement is problematic once the memory is fragmented or the system is low in memory, it causes for failures in dma_zalloc_coherent(). This patch implements new scheme of fragmented CQ to overcome this issue by introducing new type: 'struct mlx5_frag_buf_ctrl' to allocate fragmented buffers, rather than contiguous ones. Base the Completion Queues (CQs) on this new fragmented buffer. It fixes following crashes: kworker/29:0: page allocation failure: order:6, mode:0x80d0 CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0 Workqueue: ib_cm cm_work_handler [ib_cm] Call Trace: [<>] dump_stack+0x19/0x1b [<>] warn_alloc_failed+0x110/0x180 [<>] __alloc_pages_slowpath+0x6b7/0x725 [<>] __alloc_pages_nodemask+0x405/0x420 [<>] dma_generic_alloc_coherent+0x8f/0x140 [<>] x86_swiotlb_alloc_coherent+0x21/0x50 [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core] [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core] [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core] [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core] [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib] [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib] Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-15net/mlx5: Remove redundant EQ API exportsSaeed Mahameed
EQ structure and API is private to mlx5_core driver only, external drivers should not have access or the means to manipulate EQ objects. Remove redundant exports and move API functions out of the linux/mlx5 include directory into the driver's mlx5_core.h private include file. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15net/mlx5: Move CQ completion and event forwarding logic to eq.cSaeed Mahameed
Since CQ tree is now per EQ, CQ completion and event forwarding became specific implementation of EQ logic, this patch moves that logic to eq.c and makes those functions static. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15net/mlx5: CQ hold/put APISaeed Mahameed
Now as the CQ table is per EQ, add an API to hold/put CQ to be used from eq.c in downstream patch. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15net/mlx5: EQ add/del CQ APISaeed Mahameed
Add API to add/del CQ to/from EQs CQ table to be used in cq.c upon CQ creation/destruction, as CQ table is now private to eq.c. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15net/mlx5: Add missing likely/unlikely hints to cq eventsSaeed Mahameed
If a hardware event is targeting a CQ, that CQ should exist. Add unlikely to error handling flows. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15net/mlx5: CQ Database per EQSaeed Mahameed
Before this patch the driver had one CQ database protected via one spinlock, this spinlock is meant to synchronize between CQ adding/removing and CQ IRQ interrupt handling. On a system with large number of CPUs and on a work load that requires lots of interrupts, this global spinlock becomes a very nasty hotspot and introduces a contention between the active cores, which will significantly hurt performance and becomes a bottleneck that prevents seamless cpu scaling. To solve this we simply move the CQ database and its spinlock to be per EQ (IRQ), thus per core. Tested with: system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores netperf command: ./super_netperf 200 -P 0 -t TCP_RR -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M WITHOUT THIS PATCH: Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle Average: all 4.32 0.00 36.15 0.09 0.00 34.02 0.00 0.00 0.00 25.41 Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271 Overhead Command Shared Object Symbol + 14.28% swapper [kernel.vmlinux] [k] intel_idle + 12.25% swapper [kernel.vmlinux] [k] queued_spin_lock_slowpath + 10.29% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath + 1.32% netserver [kernel.vmlinux] [k] mlx5e_xmit WITH THIS PATCH: Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle Average: all 4.27 0.00 34.31 0.01 0.00 18.71 0.00 0.00 0.00 42.69 Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483 Overhead Command Shared Object Symbol + 23.33% swapper [kernel.vmlinux] [k] intel_idle + 1.69% netserver [kernel.vmlinux] [k] mlx5e_xmit Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull more rdma updates from Doug Ledford: "Items of note: - two patches fix a regression in the 4.15 kernel. The 4.14 kernel worked fine with NVMe over Fabrics and mlx5 adapters. That broke in 4.15. The fix is here. - one of the patches (the endian notation patch from Lijun) looks like a lot of lines of change, but it's mostly mechanical in nature. It amounts to the biggest chunk of change in it (it's about 2/3rds of the overall pull request). Summary: - Clean up some function signatures in rxe for clarity - Tidy the RDMA netlink header to remove unimplemented constants - bnxt_re driver fixes, one is a regression this window. - Minor hns driver fixes - Various fixes from Dan Carpenter and his tool - Fix IRQ cleanup race in HFI1 - HF1 performance optimizations and a fix to report counters in the right units - Fix for an IPoIB startup sequence race with the external manager - Oops fix for the new kabi path - Endian cleanups for hns - Fix for mlx5 related to the new automatic affinity support" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (38 commits) net/mlx5: increase async EQ to avoid EQ overrun mlx5: fix mlx5_get_vector_affinity to start from completion vector 0 RDMA/hns: Fix the endian problem for hns IB/uverbs: Use the standard kConfig format for experimental IB: Update references to libibverbs IB/hfi1: Add 16B rcvhdr trace support IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node IB/core: Avoid a potential OOPs for an unused optional parameter IB/core: Map iWarp AH type to undefined in rdma_ah_find_type IB/ipoib: Fix for potential no-carrier state IB/hfi1: Show fault stats in both TX and RX directions IB/hfi1: Remove blind constants from 16B update IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times IB/hfi1: Do not override given pcie_pset value IB/hfi1: Optimize process_receive_ib() IB/hfi1: Remove unnecessary fecn and becn fields IB/hfi1: Look up ibport using a pointer in receive path IB/hfi1: Optimize packet type comparison using 9B and bypass code paths IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet IB/hfi1: Remove dependence on qp->s_hdrwords ...
2018-02-05net/mlx5: increase async EQ to avoid EQ overrunMax Gurtovoy
Currently the async EQ has 256 entries only. It might not be big enough for the SW to handle all the needed pending events. For example, in case of many QPs (let's say 1024) connected to a SRQ created using NVMeOF target and the target goes down, the FW will raise 1024 "last WQE reached" events and may cause EQ overrun. Increase the EQ to more reasonable size, that beyond it the FW should be able to delay the event and raise it later on using internal backpressure mechanism. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Significantly shrink the core networking routing structures. Result of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf 2) Add netdevsim driver for testing various offloads, from Jakub Kicinski. 3) Support cross-chip FDB operations in DSA, from Vivien Didelot. 4) Add a 2nd listener hash table for TCP, similar to what was done for UDP. From Martin KaFai Lau. 5) Add eBPF based queue selection to tun, from Jason Wang. 6) Lockless qdisc support, from John Fastabend. 7) SCTP stream interleave support, from Xin Long. 8) Smoother TCP receive autotuning, from Eric Dumazet. 9) Lots of erspan tunneling enhancements, from William Tu. 10) Add true function call support to BPF, from Alexei Starovoitov. 11) Add explicit support for GRO HW offloading, from Michael Chan. 12) Support extack generation in more netlink subsystems. From Alexander Aring, Quentin Monnet, and Jakub Kicinski. 13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From Russell King. 14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso. 15) Many improvements and simplifications to the NFP driver bpf JIT, from Jakub Kicinski. 16) Support for ipv6 non-equal cost multipath routing, from Ido Schimmel. 17) Add resource abstration to devlink, from Arkadi Sharshevsky. 18) Packet scheduler classifier shared filter block support, from Jiri Pirko. 19) Avoid locking in act_csum, from Davide Caratti. 20) devinet_ioctl() simplifications from Al viro. 21) More TCP bpf improvements from Lawrence Brakmo. 22) Add support for onlink ipv6 route flag, similar to ipv4, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits) tls: Add support for encryption using async offload accelerator ip6mr: fix stale iterator net/sched: kconfig: Remove blank help texts openvswitch: meter: Use 64-bit arithmetic instead of 32-bit tcp_nv: fix potential integer overflow in tcpnv_acked r8169: fix RTL8168EP take too long to complete driver initialization. qmi_wwan: Add support for Quectel EP06 rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK ipmr: Fix ptrdiff_t print formatting ibmvnic: Wait for device response when changing MAC qlcnic: fix deadlock bug tcp: release sk_frag.page in tcp_disconnect ipv4: Get the address of interface correctly. net_sched: gen_estimator: fix lockdep splat net: macb: Handle HRESP error net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring ipv6: addrconf: break critical section in addrconf_verify_rtnl() ipv6: change route cache aging logic i40e/i40evf: Update DESC_NEEDED value to reflect larger value bnxt_en: cleanup DIM work on device shutdown ...
2018-01-30Merge tag v4.15 of ↵Jason Gunthorpe
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git To resolve conflicts in: drivers/infiniband/hw/mlx5/main.c drivers/infiniband/hw/mlx5/qp.c From patches merged into the -rc cycle. The conflict resolution matches what linux-next has been carrying. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoringGal Pressman
On TTC table creation, the indirection TIRs should be used instead of the inner indirection TIRs. Fixes: 1ae1df3a1193 ("net/mlx5e: Refactor RSS related objects and code") Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Shalom Lagziel <shaloml@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25mlxsw: use tc_cls_can_offload_and_chain0()Jakub Kicinski
Make use of tc_cls_can_offload_and_chain0() to set extack msg in case ethtool tc offload flag is not set or chain unsupported. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25mlx5: use tc_cls_can_offload_and_chain0()Jakub Kicinski
Make use of tc_cls_can_offload_and_chain0() to set extack msg in case ethtool tc offload flag is not set or chain unsupported. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24mlxsw: spectrum_router: Don't log an error on missing neighborYuval Mintz
Driver periodically samples all neighbors configured in device in order to update the kernel regarding their state. When finding an entry configured in HW that doesn't show in neigh_lookup() driver logs an error message. This introduces a race when removing multiple neighbors - it's possible that a given entry would still be configured in HW as its removal is still being processed but is already removed from the kernel's neighbor tables. Simply remove the error message and gracefully accept such events. Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table") Fixes: 60f040ca11b9 ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours") Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-22mlxsw: spectrum_router: Remove unnecessary prefix lengths from LPM treeIdo Schimmel
In commit fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers") I tried to make sure only used prefix lengths are present in the LPM tree shared between all virtual routers. However, this optimization had to be removed in commit a69518cf0b4c ("mlxsw: spectrum_router: Avoid expensive lookup during route removal"), since determining the used prefix lengths required us to traverse all the active virtual routers, which could result in a hung task depending on the number of VRFs and whether routes were removed due to abort or not. Re-introduce the optimization by moving the prefix usage accounting from the virtual routers to the LPM tree, as this accounting is only used in order to determine the tree's structure. To make the sharing of the trees more explicit, the two trees (for IPv4 and IPv6) are stored in the shared router struct and upon the creation of a virtual router it is immediately bound to both. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-22mlxsw: spectrum_router: Pass FIB node to LPM tree unlink functionIdo Schimmel
Next patch will try to optimize the LPM tree and make sure only used prefix lengths are present, to avoid unnecessary look-ups. Pass the currently removed FIB node to the unlinking function as its associated prefix length is a potential candidate for removal from the tree. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-22mlxsw: spectrum_router: Use the nodes list as indication for empty FIBIdo Schimmel
Currently, each FIB (IPv4 / IPv6) in a virtual router holds a prefix usage that is used to choose a matching LPM tree, but also to check if the FIB is empty, so that the LPM tree could be unbound. Next patches will remove the reliance on the per-FIB prefix usage for LPM tree matching. Keeping it only to check if the FIB is empty is a waste, since we can use the nodes ({Prefix, Length}) list instead. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21mlxsw: spectrum_acl: Add support for mirror actionArkadi Sharshevsky
Add support for mirror action. Only one mirror action can be set per rule. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21mlxsw: spectrum: Extend mlxsw_afa_ops for counter index and implement for ↵Arkadi Sharshevsky
Spectrum Introduce extension of mlxsw_afa_ops in order to add/del mirroring and implement the ops for Spectrum. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21mlxsw: spectrum: Extend and export SPAN APIArkadi Sharshevsky
Extend SPAN API for ACL case. In case of ACL triggering the MPAR register shouldn't be configured. This patch also export those helpers for ACL usage. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21mlxsw: spectrum_acl: Add support for mirroring actionArkadi Sharshevsky
The patch extends the trap action for mirroring. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21mlxsw: core: Make counter index allocated inside the action appendJiri Pirko
So far, the caller of mlxsw_afa_block_append_counter needed to allocate counter index by hand. Benefit from the previously introduced resource infra and counter_index_get/put callbacks, and allocate the counter index in place where it is needed, inside the action append function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21mlxsw: core: Convert fwd_entry_ref list to be generic per-block resource listJiri Pirko
Since the resource list needs to be used also for other entries different to fwd_entry_ref, make the list generic. For that purpose, introduce a resource structure with couple of helpers that the code which need to store a per-block resource should use. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21mlxsw: spectrum: Extend mlxsw_afa_ops for counter index and implement for ↵Jiri Pirko
Spectrum Introduce extension of mlxsw_afa_ops in order to get/put counter indexes and implement the ops for Spectrum. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21Merge tag 'mlx5-updates-2018-01-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2018-01-19 From: Or Gerlitz <ogerlitz@mellanox.com> ======= First six patches of this series further enhances the mlx5 hairpin support. The first two patches deal with using different hairpin instances for flows whose packets have different priorities to align with the port TX QoS model. The next four patches allow us to do HW spreading of flows over a set of hairpin pairs using RSS. The last two patches change the driver to also set the size of the HW hairpin queues. ======== Next four patches from Eran Ben Elisha <eranbe@mellanox.com>: Add more debug data for TX timeout handling, and further enhance and optimize TX timeout handling upon lost interrupts, which adds a mechanism for explicitly polling EQ in case of a TX timeout in order to recover from a lost interrupt. If this is not the case (no pending EQEs), perform a channels full recovery as usual. From Kamal Heib <kamalh@mellanox.com>, Two patches to extend the stats group API to have an update_stats() callback which will be used to fetch the hardware or software counters data, this will improve the current API and reduce code duplication. From Gal Pressman <galp@mellanox.com>, Last patch, Add likely to the common RX checksum flow. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compareTalat Batheesh
Helmut reported a bug about division by zero while running traffic and doing physical cable pull test. When the cable unplugged the ppms become zero, so when dividing the current ppms by the previous ppms in the next dim iteration there is division by zero. This patch prevent this division for both ppms and epms. Fixes: c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism") Reported-by: Helmut Grauer <helmut.grauer@de.ibm.com> Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The BPF verifier conflict was some minor contextual issue. The TUN conflict was less trivial. Cong Wang fixed a memory leak of tfile->tx_array in 'net'. This is an skb_array. But meanwhile in net-next tun changed tfile->tx_arry into tfile->tx_ring which is a ptr_ring. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19mlxsw: spectrum: Upper-bound supported FW versionYuval Mintz
During initialization the driver checks whether the flashed FW image suits its requirements by checking that it's sufficiently new. However, there's only a weak backward compatibility scheme that is actually guaranteed by the FW, so driver must also upper bound the version to prevent compatibility issues between current driver and some possible future fw. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19net/mlx5e: Add likely to the common RX checksum flowGal Pressman
Most of the packets return true for is_last_ethertype_ip, surround it with likely compiler hint. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19net/mlx5e: Extend the stats group API to have update_stats()Kamal Heib
Extend the stats group API to have an update_stats() callback which will be used to fetch the hardware or software counters data. Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19net/mlx5e: Merge per priority stats groupsKamal Heib
Merge the per priority traffic and pfc groups into one group, because both groups share the same update_stats() callback which will be introduced in the upcoming patch. Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19net/mlx5e: Add per-channel counters infrastructure, use it upon TX timeoutEran Ben Elisha
Add per-channel counter ch#_eq_rearm to monitor how many lost interrupt recovery actions happened upon TX timeouts. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19net/mlx5e: Poll event queue upon TX timeout before performing full channels ↵Eran Ben Elisha
recovery Up until this patch, on every TX timeout we would try to do channels recovery. However, in case of a lost interrupt for an EQ, the channel associated to it cannot be recovered if reopened as it would never get another interrupt on sent/received traffic, and eventually ends up with another TX timeout (Restarting the EQ is not part of channel recovery). This patch adds a mechanism for explicitly polling EQ in case of a TX timeout in order to recover from a lost interrupt. If this is not the case (no pending EQEs), perform a channels full recovery as usual. Once a lost EQE is recovered, it triggers the NAPI to run and handle all pending completions. This will free some budget in the bql (via calling netdev_tx_completed_queue) or by clearing pending TXWQEs and waking up the queue. One of the above actions will move the queue to be ready for transmit again. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19net/mlx5e: Add Event Queue meta data info for TX timeout logsEran Ben Elisha
When TX timeout occurs, EQ consumer index and irqn can help in debug for understanding the SW state of EQ. Add them to the logger prints for the relevant EQ only. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19net/mlx5e: Print delta since last transmit per SQ upon TX timeoutEran Ben Elisha
When driver callback for TX timeout is being called, it handles all stopped xmit queues (not only the ones which their timeout expired). Add usecs since last transmit to TX timeout logs per send queue in order to monitor if the queue timeout expired. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19net/mlx5e: Set hairpin queue sizeOr Gerlitz
For a given hairpin packet buffer size, different queue sizes (values of log_hairpin_num_packets) determine how the data is broken to strides on the RQ. Currently the chosen value is set to 64B strides. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19net/mlx5: Enable setting hairpin queue sizeOr Gerlitz
Allow to specify the size of the hairpin queues along with the packet buffer data size from the core setup code. If the driver doesn't provide this, the FW applies proper value that matches the provided data size and a FW chosen RQ stride size. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19net/mlx5e: Add RSS support for hairpinOr Gerlitz
Support RSS for hairpin traffic. We create multiple hairpin RQ/SQ pairs and RSS TTC table per hairpin instance and steer the related flows through that table so they are spread between the pairs. We open one pair per 50Gbs link speed, for all speeds <= 50Gbs, there is one pair and no RSS while for 100Gbs ports two RSSed pairs. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>