summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)Author
2016-04-06mlxsw: spectrum: Initialize egress schedulingIdo Schimmel
Before introducing support for DCB ops we should first make sure we initialize the relevant parts in the device correctly. Specifically, the egress scheduling. The device supports a superset of the 802.1Qaz standard with 4 hierarchy levels that can be linked to each other in multiple ways and with different transmission selection algorithms (TSA) employed between them. However, since we only intend to support the 802.1Qaz standard we flatten the hierarchies and let the user configure via DCB ops the TSA and max rate shaper at the subgroup hierarchy (see figure below) and the mapping between switch priority to traffic class. By default, all switch priorities are mapped to traffic class 0, strict priority is employed and max shaper is disabled. Default configuration: switch priority 0 ... switch priority 7 + + | | +----------------------------------+ | +--v--+ +-----+ Traffic Class | | | | Hierarchy | TC0 | ... | TC7 | | | | | +--+--+ +--+--+ | | +--v--+ +--v--+ Subgroup | SG0 | | SG7 | Hierarchy | | | | +-----+ +-----+ | TSA | | TSA | +-----+ ... +-----+ | MAX | | MAX | +--+--+ +--+--+ | | +---------------+----------------+ | +--v--+ Group | | Hierarchy | GR0 | | | +--+--+ | +--v--+ Port | | Hierarchy | PR0 | | | +-----+ Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06mlxsw: reg: Add QoS Switch Traffic Class Table registerIdo Schimmel
As part of DCB ops we'll have to configure the priority to traffic class mapping of a port. Add the QoS Switch Traffic Class Table (QTCT) register, which configures the mapping between the packet switch priority and traffic class on the transmit port. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06mlxsw: reg: Add QoS ETS Element Configuration registerIdo Schimmel
We are going to introduce support for DCB, so we need to be able to configure the traffic selection algorithm (TSA) used by each traffic class (TC), as well as the bandwidth percentage allocated to each TC in case of ETS. Add the QoS ETS Element Configuration register, which controls the above parameters. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06mlxsw: spectrum: Set port's shared buffer size to 0Ido Schimmel
In addition to the priority group (PG) buffers in the headroom, the device enables the allocation of headroom shared buffer, which can be shared between different PGs. However, we are not going to use the headroom shared buffer and instead allow the user to use its size for PGs or the switch's shared buffer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06mlxsw: reg: Use correct PBMC register lengthIdo Schimmel
The last field of the PBMC register is at offset 0x64 and its size is 0x8, so the correct register's length is 0x6C bytes. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06mlxsw: spectrum: Correctly configure headroom sizeIdo Schimmel
When packets ingress the switch they are assigned a switch priority and directed to the corresponding priority group (PG) buffer in the port's headroom buffer. Since we now map all switch priorities to priority group 0 (PG0) by default, there is no need to allocate the other priority groups during initialization. The only exception is PG9, which is used for control traffic. At minimum, the PG should be able to store the currently classified packet (pipeline latency isn't 0) and also the packets arriving during the classification time. However, an incoming packet will not be buffered if there is no available MTU-sized buffer space for storing it. The buffer needed to accommodate for pipeline latency is variable and needs to take into account both the current link speed and current latency of the pipeline, which is time-dependent. Testing showed that setting the PG's size to twice the current MTU is optimal. Since PG9 is used strictly for control packets and not subject to flow control, we are not going to resize it according to user configuration, so we simply set it according to worst case scenario, which is twice the maximum MTU. In any case, later patches in the series will allow a user to direct lossless flows to other PGs than PG0 and set their size to accommodate for round-trip propagation delay. The above change also requires us to resize the PG buffer whenever the port's MTU is changed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06mlxsw: spectrum: Add bytes to cells helperIdo Schimmel
Buffers in the switch store packets in units called buffer cells. Add a helper to convert from bytes to cells, so that the actual number of cells required (result is round up) is returned. Also, drop the SB (shared buffer) acronym from the BYTES_PER_CELL macro, as this unit is also used in the ports' buffers and not only the switch's shared buffer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06mlxsw: spectrum: Map all switch priorities to priority group 0Ido Schimmel
During transmission, the skb's priority is used to map the skb to a traffic class, where the idea is to group priorities with similar characteristics (e.g. lossy, lossless) to the same traffic class. By default, all priorities are mapped to traffic class 0. In the device, we model the skb's priority as the switch priority, which is assigned to a packet according to its PCP value and ingress port (untagged packets are assigned the port's default switch priority - 0). At ingress, the packet is directed to a priority group (PG) buffer in the port's headroom buffer according to the packet's switch priority and switch priority to buffer mapping. While it's possible to configure the egress mapping between skb's priority (switch priority) and traffic class, there is no mechanism to configure the ingress mapping to a PG. In order to keep things simple and since grouping certain priorities into a traffic class at egress also implies they should be grouped the same at ingress, treat a PG as the ingress counterpart of an egress traffic class. Having established the above, during initialization map all the switch priorities to PG0 in accordance with the Linux defaults for traffic class mapping. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06mlxsw: reg: Add Port Prio To Buffer registerIdo Schimmel
When packets ingress the switch they are assigned a switch priority number that dictates the packet's priority group (PG) buffer in the port's headroom buffer. Add the Port Prio To Buffer (PPTB) register, which configures the switch priority to PG mapping. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05mlxsw: spectrum: Add support for physical port namesIdo Schimmel
Export to userspace the front panel name of the port, so that udev can rename the ports accordingly. The convention suggested by switchdev documentation is used: 1) Non-split: pX 2) Split: pXsY Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05mlxsw: spectrum: Reduce number of supported 802.1D bridgesIdo Schimmel
Resources allocated for these bridges at init time cannot be later used for other purposes. While current number is supported by the device, it's mostly theoretical with regards to any real use case, which leads to poor utilization of device's resources. Solve that by reducing the number. The long term plan is to make this value (along with others) user configurable via devlink and write it to NVRAM, so that it can be used during the next init. Until then we must hardcode such values. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking bugfixes from David Miller: "Several bug fixes rolling in, some for changes introduced in this merge window, and some for problems that have existed for some time: 1) Fix prepare_to_wait() handling in AF_VSOCK, from Claudio Imbrenda. 2) The new DST_CACHE should be a silent config option, from Dave Jones. 3) inet_current_timestamp() unintentionally truncates timestamps to 16-bit, from Deepa Dinamani. 4) Missing reference to netns in ppp, from Guillaume Nault. 5) Free memory reference in hv_netvsc driver, from Haiyang Zhang. 6) Missing kernel doc documentation for function arguments in various spots around the networking, from Luis de Bethencourt. 7) UDP stopped receiving broadcast packets properly, due to overzealous multicast checks, fix from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits) net: ping: make ping_v6_sendmsg static hv_netvsc: Fix the order of num_sc_offered decrement net: Fix typos and whitespace. hv_netvsc: Fix the array sizes to be max supported channels hv_netvsc: Fix accessing freed memory in netvsc_change_mtu() ppp: take reference on channels netns net: Reset encap_level to avoid resetting features on inner IP headers net: mediatek: fix checking for NULL instead of IS_ERR() in .probe net: phy: at803x: Request 'reset' GPIO only for AT8030 PHY at803x: fix reset handling AF_VSOCK: Shrink the area influenced by prepare_to_wait Revert "vsock: Fix blocking ops call in prepare_to_wait" macb: fix PHY reset ipv4: initialize flowi4_flags before calling fib_lookup() fsl/fman: Workaround for Errata A-007273 ipv4: fix broadcast packets reception net: hns: bug fix about the overflow of mss net: hns: adds limitation for debug port mtu net: hns: fix the bug about mtu setting net: hns: fixes a bug of RSS ...
2016-03-22Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "Round two of 4.6 merge window patches. This is a monster pull request. I held off on the hfi1 driver updates (the hfi1 driver is intimately tied to the qib driver and the new rdmavt software library that was created to help both of them) in my first pull request. The hfi1/qib/rdmavt update is probably 90% of this pull request. The hfi1 driver is being left in staging so that it can be fixed up in regards to the API that Al and yourself didn't like. Intel has agreed to do the work, but in the meantime, this clears out 300+ patches in the backlog queue and brings my tree and their tree closer to sync. This also includes about 10 patches to the core and a few to mlx5 to create an infrastructure for configuring SRIOV ports on IB devices. That series includes one patch to the net core that we sent to netdev@ and Dave Miller with each of the three revisions to the series. We didn't get any response to the patch, so we took that as implicit approval. Finally, this series includes Intel's new iWARP driver for their x722 cards. It's not nearly the beast as the hfi1 driver. It also has a linux-next merge issue, but that has been resolved and it now passes just fine. Summary: - A few minor core fixups needed for the next patch series - The IB SRIOV series. This has bounced around for several versions. Of note is the fact that the first patch in this series effects the net core. It was directed to netdev and DaveM for each iteration of the series (three versions total). Dave did not object, but did not respond either. I've taken this as permission to move forward with the series. - The new Intel X722 iWARP driver - A huge set of updates to the Intel hfi1 driver. Of particular interest here is that we have left the driver in staging since it still has an API that people object to. Intel is working on a fix, but getting these patches in now helps keep me sane as the upstream and Intel's trees were over 300 patches apart" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits) IB/ipoib: Allow mcast packets from other VFs IB/mlx5: Implement callbacks for manipulating VFs net/mlx5_core: Implement modify HCA vport command net/mlx5_core: Add VF param when querying vport counter IB/ipoib: Add ndo operations for configuring VFs IB/core: Add interfaces to control VF attributes IB/core: Support accessing SA in virtualized environment IB/core: Add subnet prefix to port info IB/mlx5: Fix decision on using MAD_IFC net/core: Add support for configuring VF GUIDs IB/{core, ulp} Support above 32 possible device capability flags IB/core: Replace setting the zero values in ib_uverbs_ex_query_device net/mlx5_core: Introduce offload arithmetic hardware capabilities net/mlx5_core: Refactor device capability function net/mlx5_core: Fix caching ATOMIC endian mode capability ib_srpt: fix a WARN_ON() message i40iw: Replace the obsolete crypto hash interface with shash IB/hfi1: Add SDMA cache eviction algorithm IB/hfi1: Switch to using the pin query function IB/hfi1: Specify mm when releasing pages ...
2016-03-21net/mlx5_core: Implement modify HCA vport commandEli Cohen
Implement the modify HCA vport commands used to modify the parameters of virtual HCA's ports. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21net/mlx5_core: Add VF param when querying vport counterEli Cohen
Add a vf parameter to mlx5_core_query_vport_counter so we can call it to query counters of virtual functions. Also update current users of the API. PFs may call mlx5_core_query_vport_counter with other_vport set to indicate that they are querying a virtual function. The virtual function to be queried is given by the vf parameter. Virtual function numbering is zero based so the first VF is 0 and so on. When a PF queries its own function, the other_vport parameter is cleared. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21net/mlx5_core: Introduce offload arithmetic hardware capabilitiesSagi Grimberg
Define the necessary hardware structures for the offload arithmetic capabilities and read/cache them on driver load. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21net/mlx5_core: Refactor device capability functionLeon Romanovsky
Device capability function was called similar in all places. It was called twice for every queried parameter, while the difference between calls was in HCA capability mode only. The change proposed unify these calls into one function. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21net/mlx5_core: Fix caching ATOMIC endian mode capabilityLeon Romanovsky
Add caching of maximum device capability of ATOMIC endian mode. Fixes: f91e6d8941bf ('net/mlx5_core: Add setting ATOMIC endian mode') Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-20net/mlx4: remove unused array zero_gid[]Colin Ian King
zero_gid is not used, so remove this redundant array. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support more Realtek wireless chips, from Jes Sorenson. 2) New BPF types for per-cpu hash and arrap maps, from Alexei Starovoitov. 3) Make several TCP sysctls per-namespace, from Nikolay Borisov. 4) Allow the use of SO_REUSEPORT in order to do per-thread processing of incoming TCP/UDP connections. The muxing can be done using a BPF program which hashes the incoming packet. From Craig Gallek. 5) Add a multiplexer for TCP streams, to provide a messaged based interface. BPF programs can be used to determine the message boundaries. From Tom Herbert. 6) Add 802.1AE MACSEC support, from Sabrina Dubroca. 7) Avoid factorial complexity when taking down an inetdev interface with lots of configured addresses. We were doing things like traversing the entire address less for each address removed, and flushing the entire netfilter conntrack table for every address as well. 8) Add and use SKB bulk free infrastructure, from Jesper Brouer. 9) Allow offloading u32 classifiers to hardware, and implement for ixgbe, from John Fastabend. 10) Allow configuring IRQ coalescing parameters on a per-queue basis, from Kan Liang. 11) Extend ethtool so that larger link mode masks can be supported. From David Decotigny. 12) Introduce devlink, which can be used to configure port link types (ethernet vs Infiniband, etc.), port splitting, and switch device level attributes as a whole. From Jiri Pirko. 13) Hardware offload support for flower classifiers, from Amir Vadai. 14) Add "Local Checksum Offload". Basically, for a tunneled packet the checksum of the outer header is 'constant' (because with the checksum field filled into the inner protocol header, the payload of the outer frame checksums to 'zero'), and we can take advantage of that in various ways. From Edward Cree" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits) bonding: fix bond_get_stats() net: bcmgenet: fix dma api length mismatch net/mlx4_core: Fix backward compatibility on VFs phy: mdio-thunder: Fix some Kconfig typos lan78xx: add ndo_get_stats64 lan78xx: handle statistics counter rollover RDS: TCP: Remove unused constant RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket net: smc911x: convert pxa dma to dmaengine team: remove duplicate set of flag IFF_MULTICAST bonding: remove duplicate set of flag IFF_MULTICAST net: fix a comment typo ethernet: micrel: fix some error codes ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it bpf, dst: add and use dst_tclassid helper bpf: make skb->tc_classid also readable net: mvneta: bm: clarify dependencies cls_bpf: reset class and reuse major in da ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c ldmvsw: Add ldmvsw.c driver code ...
2016-03-18net/mlx4_core: Fix backward compatibility on VFsEli Cohen
Commit 85743f1eb345 ("net/mlx4_core: Set UAR page size to 4KB regardless of system page size") introduced dependency where old VF drivers without this fix fail to load if the PF driver runs with this commit. To resolve this add a module parameter which disables that functionality by default. If both the PF and VFs are running with a driver with that commit the administrator may set the module param to true. The module parameter is called enable_4k_uar. Fixes: 85743f1eb345 ('net/mlx4_core: Set UAR page size to 4KB ...') Signed-off-by: Eli Cohen <eli@mellanox.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...
2016-03-18Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "Initial roundup of 4.6 merge window patches. This is the first of two pull requests. It is the smaller request, but touches for more different things (this is everything but what is in or going into staging). The pull request for the code in staging/rdma is on hold until after we decide what to do on the write/writev API issue and may be partially deferred until 4.7 as a result. Summary: - cxgb4 updates - nes updates - unification of iwarp portmapper code to core - add drain_cq API - various ib_core updates - minor ipoib updates - minor mlx4 updates - more significant mlx5 updates (including a minor merge conflict with net-next tree...merge is simple to resolve and Stephen's resolution was confirmed by Mellanox) - trivial net/9p rdma conversion - ocrdma RoCEv2 update - srpt updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (85 commits) iwpm: crash fix for large connections test iw_cxgb3: support for iWARP port mapping iw_cxgb4: remove port mapper related code iw_nes: remove port mapper related code iwcm: common code for port mapper net/9p: convert to new CQ API IB/mlx5: Add support for don't trap rules net/mlx5_core: Introduce forward to next priority action net/mlx5_core: Create anchor of last flow table iser: Accept arbitrary sg lists mapping if the device supports it mlx5: Add arbitrary sg list support IB/core: Add arbitrary sg_list support IB/mlx5: Expose correct max_fast_reg_page_list_len IB/mlx5: Make coding style more consistent IB/mlx5: Convert UMR CQ to new CQ API IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Delete unnecessary variable initialisations in 11 functions IB/core: Documentation fix in the MAD header file IB/core: trivial prink cleanup. ...
2016-03-17mm: introduce page reference manipulation functionsJoonsoo Kim
The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16Merge branches 'mlx4', 'mlx5' and 'ocrdma' into k.o/for-4.6Doug Ledford
2016-03-14mlx4: add missing braces in verify_qp_parametersArnd Bergmann
The implementation of QP paravirtualization back in linux-3.7 included some code that looks very dubious, and gcc-6 has grown smart enough to warn about it: drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'verify_qp_parameters': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3154:5: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation] if (optpar & MLX4_QP_OPTPAR_ALT_ADDR_PATH) { ^~ drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3144:4: note: ...this 'if' clause, but it is not if (slave != mlx4_master_func_num(dev)) >From looking at the context, I'm reasonably sure that the indentation is correct but that it should have contained curly braces from the start, as the update_gid() function in the same patch correctly does. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 54679e148287 ("mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop") Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13mlx5: use napi_consume_skb API to get bulk free operationsJesper Dangaard Brouer
Bulk free of SKBs happen transparently by the API call napi_consume_skb(). The napi budget parameter is needed by napi_consume_skb() to detect if called from netpoll. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13mlx4: use napi_consume_skb API to get bulk free operationsJesper Dangaard Brouer
Bulk free of SKBs happen transparently by the API call napi_consume_skb(). The napi budget parameter is usually needed by napi_consume_skb() to detect if called from netpoll. In this patch it has an extra meaning. For mlx4 driver, the mlx4_en_stop_port() call is done outside NAPI/softirq context, and cleanup the entire TX ring via mlx4_en_free_tx_buf(). The code mlx4_en_free_tx_desc() for freeing SKBs are shared with NAPI calls. To handle this shared use the zero budget indication is reused, and handled appropriately in napi_consume_skb(). To reflect this, variable is called napi_mode for the function call that needed this distinction. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13mlxsw: pci: Implement reset done checkJiri Pirko
Firmware now tells us that the reset is done by passing a magic value via register. Use it to shorten the wait in case this is supported. With old firmware, we still wait until the timeout is reached. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-11mlxsw: spectrum: Check requested ageing time is validIdo Schimmel
Commit c62987bbd8a1 ("bridge: push bridge setting ageing_time down to switchdev") added a check for minimum and maximum ageing time, but this breaks existing behaviour where one can set ageing time to 0 for a non-learning bridge. Push this check down to the driver and allow the check in the bridge layer to be removed. Currently ageing time 0 is refused by the driver, but we can later add support for this functionality. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-10net/mlx5e: Support offload cls_flower with skbedit mark actionAmir Vadai
Introduce offloading of skbedit mark action. For example, to mark with 0x1234, all TCP (ip_proto 6) packets arriving to interface ens9: # tc qdisc add dev ens9 ingress # tc filter add dev ens9 protocol ip parent ffff: \ flower ip_proto 6 \ indev ens9 \ action skbedit mark 0x1234 Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-10net/mlx5e: Support offload cls_flower with drop actionAmir Vadai
Parse tc_cls_flower_offload into device specific commands and program the hardware to classify and act accordingly. For example, to drop ICMP (ip_proto 1) packets from specific smac, dmac, src_ip, src_ip, arriving to interface ens9: # tc qdisc add dev ens9 ingress # tc filter add dev ens9 protocol ip parent ffff: \ flower ip_proto 1 \ dst_mac 7c:fe:90:69:81:62 src_mac 7c:fe:90:69:81:56 \ dst_ip 11.11.11.11 src_ip 11.11.11.12 indev ens9 \ action drop Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-10net/mlx5e: Introduce tc offload supportAmir Vadai
Extend ndo_setup_tc() to support ingress tc offloading. Will be used by later patches to offload tc flower filter. Feature is off by default and could be enabled by issuing: # ethtool -K eth0 hw-tc-offload on Offloads flow table is dynamically created when first filter is added. Rules are saved in a hash table that is maintained by the consumer (for example - the flower offload in the next patch). When last filter is removed and no filters exist in the hash table, the offload flow table is destroyed. Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-10net/mlx5e: Add a new priority for kernel flow tablesAmir Vadai
Move the vlan and main flow tables to use priority 1. This will allow the upcoming TC offload logic to use a higher priority (0) for the offload steering table. Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-10net/mlx5e: Relax ndo_setup_tc handle restrictionAmir Vadai
Restricting handle to TC_H_ROOT breaks the old instantiation of mqprio to setup a hardware qdisc. This patch relaxes the test, to only check the type. Fixes: 08fb1da ("net/mlx5e: Support DCBNL IEEE ETS") Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-10net/mlx5_core: Set flow steering dest only for forward rulesAmir Vadai
We need to handle flow table entry destinations only if the action associated with the rule is forwarding (MLX5_FLOW_CONTEXT_ACTION_FWD_DEST). Fixes: 26a8145390b3 ('net/mlx5_core: Introduce flow steering firmware commands') Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-10net/mlx5_core: Introduce forward to next priority actionMaor Gottlieb
Add support to create flow rule that forward packets to the first flow table in the next priority (next priority could be the first priority in the next namespace or the next priority in the same namespace). This feature could be used for DONT_TRAP rules or rules that only want to mark the packet with flow tag. In order to do it optimally, each flow table has list of all rules that point to this flow table, when a flow table is destroyed/created, we update the list head correspondingly. This kind of rule is created when destination is NULL and action is MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10net/mlx5_core: Create anchor of last flow tableMaor Gottlieb
Create an empty flow table in the end of NIC rx namesapce. Adding this flow table simplify the implementation of "forward to next prio" rules. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several cases of overlapping changes, as well as one instance (vxlan) of a bug fix in 'net' overlapping with code movement in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-07mlxsw: pci: Correctly determine if descriptor queue is fullIdo Schimmel
The descriptor queues for sending (SDQs) and receiving (RDQs) packets are managed by two counters - producer and consumer - which are both 16-bit in size. A queue is considered full when the difference between the two equals the queue's maximum number of descriptors. However, if the producer counter overflows, then it's possible for the full queue check to fail, as it doesn't take the overflow into account. In such a case, descriptors already passed to the device - but for which a completion has yet to be posted - will be overwritten, thereby causing undefined behavior. The above can be achieved under heavy load (~30 netperf instances). Fix that by casting the subtraction result to u16, preventing it from being treated as a signed integer. Fixes: eda6500a987a ("mlxsw: Add PCI bus implementation") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-07mlxsw: spectrum: Always decrement bridge's ref countIdo Schimmel
Since we only support one VLAN filtering bridge we need to associate a reference count with it, so that when the last port netdev leaves it, we would know that a different bridge can be offloaded to hardware. When a LAG device is memeber in a bridge and port netdevs are leaving the LAG, we should always decrement the bridge's reference count, as it's incremented for any port in the LAG. Fixes: 4dc236c31733 ("mlxsw: spectrum: Handle port leaving LAG while bridged") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-03net: mellanox: add DEVLINK dependenciesArnd Bergmann
The new NET_DEVLINK infrastructure can be a loadable module, but the drivers using it might be built-in, which causes link errors like: drivers/net/built-in.o: In function `mlx4_load_one': :(.text+0x2fbfda): undefined reference to `devlink_port_register' :(.text+0x2fc084): undefined reference to `devlink_port_unregister' drivers/net/built-in.o: In function `mlxsw_sx_port_remove': :(.text+0x33a03a): undefined reference to `devlink_port_type_clear' :(.text+0x33a04e): undefined reference to `devlink_port_unregister' There are multiple ways to avoid this: a) add 'depends on NET_DEVLINK || !NET_DEVLINK' dependencies for each user b) use 'select NET_DEVLINK' from each driver that uses it and hide the symbol in Kconfig. c) make NET_DEVLINK a 'bool' option so we don't have to list it as a dependency, and rely on the APIs to be stubbed out when it is disabled d) use IS_REACHABLE() rather than IS_ENABLED() to check for NET_DEVLINK in include/net/devlink.h This implements a variation of approach a) by adding an intermediate symbol that drivers can depend on, and changes the three drivers using it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 09d4d087cd48 ("mlx4: Implement devlink interface") Fixes: c4745500e988 ("mlxsw: Implement devlink interface") Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-03net: relax setup_tc ndo op handle restrictionJohn Fastabend
I added this check in setup_tc to multiple drivers, if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO) Unfortunately restricting to TC_H_ROOT like this breaks the old instantiation of mqprio to setup a hardware qdisc. This patch relaxes the test to only check the type to make it equivalent to the check before I broke it. With this the old instantiation continues to work. A good smoke test is to setup mqprio with, # tc qdisc add dev eth4 root mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 0@0 1@1 2@2 3@3 4@4 5@5 6@6 7@7 Fixes: e4c6734eaab9 ("net: rework ndo tc op to consume additional qdisc handle paramete") Reported-by: Singh Krishneil <krishneil.k.singh@intel.com> Reported-by: Jake Keller <jacob.e.keller@intel.com> CC: Murali Karicheri <m-karicheri2@ti.com> CC: Shradha Shah <sshah@solarflare.com> CC: Or Gerlitz <ogerlitz@mellanox.com> CC: Ariel Elior <ariel.elior@qlogic.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Bruce Allan <bruce.w.allan@intel.com> CC: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02net/mlx4_core: Allow resetting VF admin mac to zeroJack Morgenstein
The VF administrative mac addresses (stored in the PF driver) are initialized to zero when the PF driver starts up. These addresses may be modified in the PF driver through ndo calls initiated by iproute2 or libvirt. While we allow the PF/host to change the VF admin mac address from zero to a valid unicast mac, we do not allow restoring the VF admin mac to zero. We currently only allow changing this mac to a different unicast mac. This leads to problems when libvirt scripts are used to deal with VF mac addresses, and libvirt attempts to revoke the mac so this host will not use it anymore. Fix this by allowing resetting a VF administrative MAC back to zero. Fixes: 8f7ba3ca12f6 ('net/mlx4: Add set VF mac address support') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reported-by: Moshe Levi <moshele@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02net/mlx4_core: Check the correct limitation on VFs for HA modeMoni Shoua
The limit of 63 is only for virtual functions while the actual enforcement was for VFs plus physical functions, fix that. Fixes: e57968a10bc1 ('net/mlx4_core: Support the HA mode for SRIOV VFs too') Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02net/mlx4_core: Fix lockdep warning in handling of mac/vlan tablesJack Morgenstein
In the mac and vlan register/unregister/replace functions, the driver locks the mac table mutex (or vlan table mutex) on both ports. We move to use mutex_lock_nested() to prevent warnings, such as the one below. [ 101.828445] ============================================= [ 101.834820] [ INFO: possible recursive locking detected ] [ 101.841199] 4.5.0-rc2+ #49 Not tainted [ 101.850251] --------------------------------------------- [ 101.856621] modprobe/3054 is trying to acquire lock: [ 101.862514] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c10e>] __mlx4_register_mac+0x87e/0xa90 [mlx4_core] [ 101.874598] [ 101.874598] but task is already holding lock: [ 101.881703] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c0f0>] __mlx4_register_mac+0x860/0xa90 [mlx4_core] [ 101.893776] [ 101.893776] other info that might help us debug this: [ 101.901658] Possible unsafe locking scenario: [ 101.901658] [ 101.908859] CPU0 [ 101.911923] ---- [ 101.914985] lock(&table->mutex#2); [ 101.919595] lock(&table->mutex#2); [ 101.924199] [ 101.924199] * DEADLOCK * [ 101.924199] [ 101.931643] May be due to missing lock nesting notation Fixes: 5f61385d2ebc ('net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA mode') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Suggested-by: Doron Tsur <doront@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02net/mlx5e: Provide correct packet/bytes statisticsGal Pressman
Using the HW VPort counters for traffic (rx/tx packets/bytes) statistics is wrong. This is because frames dropped due to steering or out of buffer will be counted as received. To fix that, we move to use the packet/bytes accounting done by the driver for what the netdev reports out. Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support [...]') Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02net/mlx5e: Add rx/tx bytes software countersGal Pressman
Sum up rx/tx bytes in software as we do for rx/tx packets, to be reported in upcoming statistics fix. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02net/mlx5e: Correctly handle RSS indirection table when changing number of ↵Tariq Toukan
channels Upon changing num_channels, reset the RSS indirection table to match the new value. Fixes: 2d75b2bc8a8c ('net/mlx5e: Add ethtool RSS configuration options') Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02net/mlx5e: Fix ethtool RX hash func configuration changeTariq Toukan
We should modify TIRs explicitly to apply the new RSS configuration. The light ndo close/open calls do not "refresh" them. Fixes: 2d75b2bc8a8c ('net/mlx5e: Add ethtool RSS configuration options') Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>