Age | Commit message (Collapse) | Author |
|
LS1028a has one Ethernet management interface. On the QDS board, the
MDIO signals are multiplexed to either on-board AR8035 PHY device or
to 4 PCIe slots allowing for SGMII cards.
To enable the Ethernet ENETC Port 1, which can only be connected to a
RGMII PHY, the multiplexer needs to be configured to route the MDIO to
the AR8035 PHY. The MDIO/MDC routing is controlled by bits 7:4 of FPGA
board config register 0x54, and value 0 selects the on-board RGMII PHY.
The FPGA board config registers are accessible on the i2c bus, at address
0x66.
The PF3 MDIO PCIe integrated endpoint device allows for centralized access
to the MDIO bus. Add the corresponding devicetree node and set it to be
the MDIO bus parent.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The on-chip PCIe root complex that integrates the ENETC ethernet
controllers also integrates a PCIe endpoint for the MDIO controller
providing for centralized control of the ENETC mdio bus.
Add bindings for this "central" MDIO Integrated PCIe Endpoint.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ENETC ports can manage the MDIO bus via local register
interface. However there's also a centralized way
to manage the MDIO bus, via the MDIO PCIe endpoint
device integrated by the same root complex that also
integrates the ENETC ports (eth controllers).
Depending on board design and use case, centralized
access to MDIO may be better than using local ENETC
port registers. For instance, on the LS1028A QDS board
where MDIO muxing is required. Also, the LS1028A on-chip
switch doesn't have a local MDIO register interface.
The current patch registers the above PCIe endpoint as a
separate MDIO bus and provides a driver for it by re-using
the code used for local MDIO access. It also allows the
ENETC port PHYs to be managed by this driver if the local
"mdio" node is missing from the ENETC port node.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Clean up overcomplicated makefile to make it more maintainable.
Basically, there's a set of common objects shared between
the PF and VF driver modules. This can be implemented in a
simpler way, without conditionals, less repetition, allowing
also for easier updates in the future.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
What's needed is basically a pointer to the mdio registers.
This is one way to store it inside bus->priv allocated space,
without upsetting sparse.
Reworked accessors to avoid __iomem casting.
Used devm_* variant to further clean up the init error /
remove paths.
Fixes following sparse warning:
warning: incorrect type in assignment (different address spaces)
expected void *priv
got struct enetc_mdio_regs [noderef] <asn:2>*[assigned] regs
Fixes: ebfcb23d62ab ("enetc: Add ENETC PF level external MDIO support")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hubert Feurstein says:
====================
net: dsa: mv88e6xxx: add support for MV88E6220
This patch series adds support for the MV88E6220 chip to the mv88e6xxx driver.
The MV88E6220 is almost the same as MV88E6250 except that the ports 2-4 are
not routed to pins.
Furthermore, PTP support is added to the MV88E6250 family.
v2:
- insert all 6220 entries in correct numerical order
- introduce invalid_port_mask
- move ptp_cc_mult* to ptp_ops and restored original ptp_adjfine code
- added Andrews Reviewed-By to patch 2 and 4
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds PTP support for the MV88E6250 family.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As it is done for all the other structs within this driver.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MV88E6250 family doesn't support the MV88E6XXX_PORT_CTL1_MESSAGE_PORT
bit.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With this it is possible to mark certain chip ports as invalid. This is
required for example for the MV88E6220 (which is in general a MV88E6250
with 7 ports) but the ports 2-4 are not routed to pins.
If a user configures an invalid port, an error is returned.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MV88E6220 is part of the MV88E6250 family.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MV88E6220 is almost the same as MV88E6250 except that the ports 2-4 are
not routed to pins. So the usable ports are 0, 1, 5 and 6.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrew Jeffery says:
====================
net: phy: Add AST2600 MDIO support
v2 of the ASPEED MDIO series addresses comments from Rob on the devicetree
bindings and Andrew on the driver itself.
v1 of the series can be found here:
http://patchwork.ozlabs.org/cover/1138140/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ensures we can talk to a PHY via MDIO on the AST2600, as the MDIO
controller is now separate from the MAC.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
phy-handle is necessary for the AST2600 which separates the MDIO
controllers from the MAC.
I've tried to minimise the intrusion of supporting the AST2600 to the
FTGMAC100 by leaving in place the existing MDIO support for the embedded
MDIO interface. The AST2400 and AST2500 continue to be supported this
way, as it avoids breaking/reworking existing devicetrees.
The AST2600 support by contrast requires the presence of the phy-handle
property in the MAC devicetree node to specify the appropriate PHY to
associate with the MAC. In the event that someone wants to specify the
MDIO bus topology under the MAC node on an AST2400 or AST2500, the
current auto-probe approach is done conditional on the absence of an
"mdio" child node of the MAC.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The AST2600 design separates the MDIO controllers from the MAC, which is
where they were placed in the AST2400 and AST2500. Further, the register
interface is reworked again, so now we have three possible different
interface implementations, however this driver only supports the
interface provided by the AST2600. The AST2400 and AST2500 will continue
to be supported by the MDIO support embedded in the FTGMAC100 driver.
The hardware supports both C22 and C45 mode, but for the moment only C22
support is implemented.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The AST2600 splits out the MDIO bus controller from the MAC into its own
IP block and rearranges the register layout. Add a new binding to
describe the new hardware.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commit 365ad353c256 ("tipc: reduce risk of user starvation during
link congestion") we allowed senders to add exactly one list of extra
buffers to the link backlog queues during link congestion (aka
"oversubscription"). However, the criteria for when to stop adding
wakeup messages to the input queue when the overload abates is
inaccurate, and may cause starvation problems during very high load.
Currently, we stop adding wakeup messages after 10 total failed attempts
where we find that there is no space left in the backlog queue for a
certain importance level. The counter for this is accumulated across all
levels, which may lead the algorithm to leave the loop prematurely,
although there may still be plenty of space available at some levels.
The result is sometimes that messages near the wakeup queue tail are not
added to the input queue as they should be.
We now introduce a more exact algorithm, where we keep adding wakeup
messages to a level as long as the backlog queue has free slots for
the corresponding level, and stop at the moment there are no more such
slots or when there are no more wakeup messages to dequeue.
Fixes: 365ad35 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reduce the scope of the ring local variable in the fm10k_assign_l2_accel
function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[fm10k_netdev.c:1447]: (style) The scope of the variable 'ring' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Reduce the scope of the result local variable in the
fm10k_iov_msg_lport_state_pf function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[fm10k_pf.c:1435]: (style) The scope of the variable 'result' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The msg variable in the fm10k_mbx_validate_msg_size and
fm10k_sm_mbx_transmit functions is only used within the do {} loop
scope. Reduce its scope only to where it is used.
This was detected by cppcheck, and resolves the following warnings
produced by that tool:
[fm10k_mbx.c:299]: (style) The scope of the variable 'msg' can be reduced.
[fm10k_mbx.c:2004]: (style) The scope of the variable 'msg' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Reduce the scope of the local loop variable in the
fm10k_check_hang_subtask function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[driver/fm10k_pci.c:852]: (style) The scope of the variable 'i' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Reduce the scope of the local variable err in the fm10k_detach_subtask
function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[fm10k_pci.c:403]: (style) The scope of the variable 'err' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The tx_buffer local variable in the function fm10k_clean_tx_ring is not
used except inside a smaller block scope. Reduce the scope to its point
of use.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_netdev.c:179]: (style) The scope of the variable 'tx_buffer' can
be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Reduce the scope of the q_idx local variable in the fm10k_cache_ring_qos
function.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_main.c:2016]: (style) The scope of the variable 'q_idx' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Reduce the scope of the local err variable in the fm10k_iov_alloc_data
function.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_iov.c:426]: (style) The scope of the variable 'err' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Reduce the scope of the qv vector pointer local variable in the
fm10k_set_coalesce function.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_ethtool.c:658]: (style) The scope of the variable 'qv' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Reduce the scope of the char *p local variable to only the block where
it is used.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_ethtool.c:229]: (style) The scope of the variable 'p' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Reduce the scope of the err local variable in the fm10k_dcbnl_ieee_setets
function.
This was detected using cppcheck, and resolves the following style
warning:
[fm10k_dcbnl.c:37]: (style) The scope of the variable 'err' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: avoid some redundant VTU operations
The mv88e6xxx driver currently uses a mv88e6xxx_vtu_get wrapper to get a
single entry and uses a boolean to eventually initialize a fresh one.
However the fresh entry is only needed in one place and mv88e6xxx_vtu_getnext
is simple enough to call it directly. Doing so makes the code easier to read,
especially for the return code expected by switchdev to honor software VLANs.
In addition to not loading the VTU again when an entry is already correctly
programmed, this also allows to avoid programming the broadcast entries
again when updating a port's membership, from e.g. tagged to untagged.
This patch series removes the mv88e6xxx_vtu_get wrapper in favor of direct
calls to mv88e6xxx_vtu_getnext, and also renames the _mv88e6xxx_port_vlan_add
and _mv88e6xxx_port_vlan_del helpers using an old underscore prefix convention.
In case the port's membership is already correctly programmed in hardware,
the following debug message may be printed:
[ 745.989884] mv88e6085 2188000.ethernet-1:00: p4: already a member of VLAN 42
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read and
_mv88e6xxx_port_vlan_add is the only function requiring the preparation
of a new VLAN entry.
To simplify things up, remove the mv88e6xxx_vtu_get wrapper and
explicit the VLAN lookup in _mv88e6xxx_port_vlan_add. This rework
also avoids programming the broadcast entries again when changing a
port's membership, e.g. from tagged to untagged.
At the same time, rename the helper using an old underscore convention.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read.
Explicit the call to mv88e6xxx_vtu_getnext in _mv88e6xxx_port_vlan_del
and the return value expected by switchdev in case of software VLANs.
At the same time, rename the helper using an old underscore convention.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mv88e6xxx_vtu_getnext is simple enough to call it directly in the
mv88e6xxx_port_db_load_purge function and explicit the return code
expected by switchdev for software VLANs when an hardware VLAN does
not exist.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mv88e6xxx_vtu_getnext interprets two members from the input
mv88e6xxx_vtu_entry structure: the (excluded) vid member to start
the iteration from, and the valid argument specifying whether the VID
must be written or not (only required once at the start of a loop).
Explicit the assignation of these two fields right before calling
mv88e6xxx_vtu_getnext, as it is done in the mv88e6xxx_vtu_get wrapper.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lock the mutex in the mv88e6xxx_port_vlan_prepare function
called by the DSA stack, instead of doing it in the internal
mv88e6xxx_port_check_hw_vlan helper.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In some case, we don't want to allow specific tunnel packets
to host that can avoid to take up high CPU (e.g network attacks).
But other tunnel packets which not matched in hardware will be
sent to host too.
$ tc filter add dev vxlan_sys_4789 \
protocol ip chain 0 parent ffff: prio 1 handle 1 \
flower dst_ip 1.1.1.100 ip_proto tcp dst_port 80 \
enc_dst_ip 2.2.2.100 enc_key_id 100 enc_dst_port 4789 \
action tunnel_key unset pipe action drop
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Remove redundant include files.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
When failing to create tx reporter, don't set the reporter's pointer.
Creating a reporter is not mandatory for driver load, avoid
garbage/error pointer.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Return error when failing to create a reporter in devlink. Since
NET_DEVLINK mandatory to MLX5_CORE in Kconfig, returned pointer
can't be NULL and can only hold an error in bad path.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Move vlan checksum fixup flow into mlx5e_skb_padding_csum(), which is
supposed to fixup SKB checksum if needed. And rename
mlx5e_skb_padding_csum() to mlx5e_skb_csum_fixup().
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
If capable, use zero inline mode in TX WQE for non-VLAN packets.
For VLAN ones, keep the enforcement of at least L2 inline mode,
unless the WQE VLAN insertion offload cap is on.
Performance:
Tested single core packet rate of 64Bytes.
NIC: ConnectX-5
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
pktgen:
Before: 12.46 Mpps
After: 14.65 Mpps (+17.5%)
XDP_TX:
The MPWQE flow is not affected, as it already has this optimization.
So we test with priv-flag xdp_tx_mpwqe: off.
Before: 9.90 Mpps
After: 10.20 Mpps (+3%)
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Tested-by: Noam Stolero <noams@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Instead of passing an output param, let function return the
WQE pointer.
In addition, pass &pi so it gets its value in the function,
and save the redundant assignment that comes after it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
In MPWQE mode, when transmitting packets with XDP, a packet that is smaller
than a certain size (set to 256 bytes) would be sent inline within its WQE
TX descriptor (mem-copied), in case the hardware tx queue is congested
beyond a pre-defined water-mark.
If a MPWQE cannot contain an additional inline packet, we close this
MPWQE session, and send the packet inlined within the next MPWQE.
To save some MPWQE session close+open operations, we don't open MPWQE
sessions that are contiguously smaller than certain size (set to the
HW MPWQE maximum size). If there isn't enough contiguous room in the
send queue, we fill it with NOPs and wrap the send queue index around.
This way, qualified packets are always sent inline.
Perf tests:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_TX:
With 24 channels:
| ------ | bounced packets | inlined packets | inline ratio |
| before | 113.6Mpps | 96.3Mpps | 84% |
| after | 115Mpps | 99.5Mpps | 86% |
With one channel:
| ------ | bounced packets | inlined packets | inline ratio |
| before | 6.7Mpps | 0pps | 0% |
| after | 6.8Mpps | 0pps | 0% |
As we can see, there is improvement in both inline ratio and overall
packet rate for 24 channels. Also, we see no degradation for the
one-channel case.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
We use NOPs to populate the WQ fragment edge if the WQE does not fit
in frag, to avoid WQEs crossing a page boundary (or wrap-around the WQ).
The upper bound on the needed number of NOPs is one WQEBB less than
the largest possible WQE, for otherwise the WQE would certainly fit.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add a pool of flow counters, based on flow counter bulks, removing the
need to allocate a new counter via a costly FW command during the flow
creation process. The time it takes to acquire/release a flow counter
is cut from ~50 [us] to ~50 [ns].
The pool is part of the mlx5 driver instance, and provides flow
counters for aging flows. mlx5_fc_create() was modified to provide
counters for aging flows from the pool by default, and
mlx5_destroy_fc() was modified to release counters back to the pool
for later reuse. If bulk allocation is not supported or fails, and for
non-aging flows, the fallback behavior is to allocate and free
individual counters.
The pool is comprised of three lists of flow counter bulks, one of
fully used bulks, one of partially used bulks, and one of unused
bulks. Counters are provided from the partially used bulks first, to
help limit bulk fragmentation.
The pool maintains a threshold, and strives to maintain the amount of
available counters below it. The pool is increased in size when a
counter acquisition request is made and there are no available
counters, and it is decreased in size when the last counter in a bulk
is released and there are more available counters than the threshold.
All pool size changes are done in the context of the
acquiring/releasing process.
The value of the threshold is directly correlated to the amount of
used counters the pool is providing, while constrained by a hard
maximum, and is recalculated every time a bulk is allocated/freed.
This ensures that the pool only consumes large amounts of memory for
available counters if the pool is being used heavily. When fully
populated and at the hard maximum, the buffer of available counters
consumes ~40 [MB].
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add infrastructure to track bulks of flow counters, providing
the means to allocate and deallocate bulks, and to acquire and
release individual counters from the bulks.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Use the scheduling elements to implement ingress rate limiter on an
eswitch ports ingress traffic. Since the ingress of eswitch port is the
egress of VF port, we control eswitch ingress by controlling VF egress.
Configuration is done using the ports' representor net devices.
Please note that burst size configuration is not supported by devices
ConnectX-5 and earlier generations.
Configuration examples:
tc:
tc filter add dev enp59s0f0_0 root protocol ip matchall action police rate 1mbit burst 20k
ovs:
ovs-vsctl set interface eth0 ingress_policing_rate=1000
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Misc updates from mlx5-next branch.
1) Eli improves the handling of the support for QoS element type
2) Gavi refactors and prepares mlx5 flow counters for bulk allocation
support
3) Parav, refactors and improves E-Switch load/unload flows
4) Saeed, two misc cleanups
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently for PF and ECPF vports, representors are created before
their eswitch hardware ports are initialized in below flow.
mlx5_eswitch_enable()
esw_offloads_init()
esw_offloads_load_all_reps()
[..]
esw_enable_vport()
However for VFs, vports are initialized before creating their
respective netdev represnetors in event handling context.
Similarly while disabling eswitch, first hardware vports are disabled,
followed by destroying their representors.
Here while underlying vports gets destroyed but its respective user
facing netdevice can still exist on which user can continue to perform
more offload operations.
Instead, its more accurate to do
enable_eswitch switchdev mode:
1. perform FDB tables initialization
2. initialize hw vport
3. create and publish representor for this vport
disable_eswitch switchdev mode:
1. destroy user facing representor for the vport
2. disable hw vport
3. perform FDB tables cleanup
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
mc_promisc pointer points to an instance of struct esw_mc_addr allocated
as part of the esw structure.
Hence it cannot be NULL.
Removed such redundant check and assign where it is actually used.
While at it, add comment around legacy mode fields and move mc_promisc
close to other legacy mode structures to improve code redability.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|