2019-03-23net: aquantia: Introduce rx refill threshold valueIgor Russkikh
Before that, we refilled the ring even on a single descriptor move. Under high packet load that caused the page allocation logic to be triggered too often, which made overall ring processing slower. Moreover, with page buffer reuse implemented, we should give higher networking layers a chance to process received packets faster, release the pages they consumed, and therefore raise the chance of these pages being reused. The RX ring is now refilled only when AQ_CFG_RX_REFILL_THRES or more descriptors were processed (32 by default). Under regular traffic this gives quite enough time for a packet to be consumed and its page to be reused. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
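A minimal sketch of the threshold check described above; AQ_CFG_RX_REFILL_THRES comes from the commit, while the ring type and helper names are illustrative, not the exact driver code:

    #define AQ_CFG_RX_REFILL_THRES 32

    static void aq_ring_rx_maybe_refill(struct aq_ring_s *ring)
    {
            /* descriptors consumed since the last refill */
            unsigned int processed = aq_ring_avail_dx(ring);

            /* Skip the comparatively expensive page-allocation path
             * until enough descriptors have been processed.
             */
            if (processed < AQ_CFG_RX_REFILL_THRES)
                    return;

            aq_ring_rx_fill(ring);
    }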
2019-03-23net: aquantia: optimize rx performance by page reuse strategyIgor Russkikh
We introduce an internal aq_rxpage wrapper over the regular page where an extra field is tracked: the rxpage offset inside the allocated page. This offset allows reusing one page for multiple packets. When needed (for example with large frames processing), the allocated page order can be customized, giving even larger page reuse efficiency. page_ref_count is used to track page users. If during rx refill the underlying page has users, we increase pg_off by the rx frame size, so the upper part of the page is reused. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
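A hedged sketch of the reuse decision: the field names (pg_off, page_ref_count) follow the commit's description, while the wrapper layout and helper are illustrative:

    struct aq_rxpage {
            struct page *page;
            unsigned int pg_off;    /* current offset inside the page */
            unsigned int order;     /* configurable page order */
    };

    static bool aq_rxpage_try_reuse(struct aq_rxpage *rxpage,
                                    unsigned int frame_size)
    {
            /* sole remaining user: rewind and reuse from the start */
            if (page_ref_count(rxpage->page) == 1) {
                    rxpage->pg_off = 0;
                    return true;
            }

            /* page still has users: advance past the consumed frame
             * and keep the page if the next frame still fits
             */
            rxpage->pg_off += frame_size;
            if (rxpage->pg_off + frame_size <= (PAGE_SIZE << rxpage->order))
                    return true;

            return false;   /* no room left, allocate a fresh page */
    }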
2019-03-23net: aquantia: optimize rx path using larger preallocated skb lenIgor Russkikh
The Atlantic driver used a 14 byte preallocated skb size. That made L3 protocol processing inefficient because pskb_pull had to fetch all the L3/L4 headers from extra fragments. Especially on UDP flows this caused extra packet drops because the CPU was overloaded with pskb_pull. This patch uses eth_get_headlen for skb preallocation. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
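A sketch of the approach; eth_get_headlen() is the real helper (at the time of this commit it took the frame pointer and length), while the wrapper name is illustrative:

    #include <linux/etherdevice.h>

    /* Size the linear part of the preallocated skb from the actual
     * header length instead of a fixed 14 bytes.
     */
    static unsigned int aq_rx_linear_len(void *frame, unsigned int frame_len)
    {
            /* eth_get_headlen() runs the flow dissector over the frame
             * and returns how many bytes of L2..L4 headers to copy
             * into the skb's linear area
             */
            return eth_get_headlen(frame, frame_len);
    }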
2019-03-23Merge tag 'mlx5-updates-2019-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller
Saeed Mahameed says: ==================== mlx5-updates-2019-03-20 This series includes updates to the mlx5 driver: 1) Compiler warnings cleanup from Saeed Mahameed 2) Parav Pandit simplifies sriov enable/disable 3) Gustavo A. R. Silva removes a redundant assignment 4) Moshe Shemesh adds Geneve tunnel stateless offload support 5) Eli Britstein adds support for the VLAN modify action and replaces TC VLAN pop and push actions with VLAN modify Note: This series includes two simple non-mlx5 patches: 1) Declare the IANA_VXLAN_UDP_PORT definition in include/net/vxlan.h, and use it in some drivers. 2) Declare the GENEVE_UDP_PORT definition in include/net/geneve.h, and use it in the mlx5 and nfp drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller
Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2019-03-22 This series contains updates to the ice driver only. Akeem enables MAC anti-spoofing by default when a new VSI is being created. He fixes an issue when reclaiming VF resources back to the pool after reset, by freeing VF resources separately using the first VF vector index to traverse the list, instead of starting at the last assigned vectors list. He adds support for VF and PF promiscuous mode in the ice driver, and stops the PF driver from letting the VF know it is "not trusted" when it attempts to add more than its permitted additional MAC addresses. He also alters how the driver gets the VF VSI instances: instead of using mailbox messages to retrieve VSIs, get them directly via the VF object in the PF data structure. Bruce fixes return values to resolve static analysis warnings, and makes whitespace changes to increase readability and reduce code wrapping. Anirudh cleans up code by removing a function prototype that was never implemented and removing an unused field in the ice_sched_vsi_info structure. Kiran fixes a potential divide by zero issue by adding a check. Victor cleans up the transmit scheduler by adjusting stack variable usage and adding/modifying debug prints to make them more useful. Yashaswini updates the driver in VEB mode to ensure that the LAN_EN bit is set if all the right conditions are met. Christopher ensures the loopback enable bit is not set for prune switch rules, since all transmit traffic would be looped back to the internal switch and dropped. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23Merge branch 'tcp-rx-tx-cache'David S. Miller
Eric Dumazet says: ==================== tcp: add rx/tx cache to reduce lock contention On hosts with many cpus we can observe very serious contention on the spinlocks used in the mm slab layer. The following can happen quite often: 1) TX path sendmsg() allocates one (fclone) skb on CPU A, sends a clone. ACK is received on CPU B, and consumes the skb that was in the retransmit queue. 2) RX path network driver allocates skb on CPU C recvmsg() happens on CPU D, freeing the skb after it has been delivered to user space. In both cases, we are hitting the asymmetric alloc/free pattern for which slab has to drain alien caches. At 8 Mpps, this represents 16 M alloc/free operations per second and has a huge penalty. In an interesting experiment, I tried to use a single kmem_cache for all the skbs (in skb_init(): skbuff_fclone_cache = skbuff_head_cache = kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones),);) and most of the contention disappeared, since cpus could better use their local slab per-cpu cache. But we can actually do better, in the following patches. TX: at ACK time, no longer free the skb but put it back in a tcp socket cache, so that the next sendmsg() can reuse it immediately. RX: at recvmsg() time, do not free the skb but put it in a tcp socket cache so that it can be freed by the cpu feeding the incoming packets in BH. This increased the performance of a small RPC benchmark by about 10% on a host with 112 hyperthreads. v2: - Solved a race condition: sk_stream_alloc_skb() to make sure the prior clone has been freed. - Really test rps_needed in sk_eat_skb() as claimed. - Fixed rps_needed use in drivers/net/tun.c v3: Added a #ifdef CONFIG_RPS, to avoid compile error (kbuild robot) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23tcp: add one skb cache for rxEric Dumazet
Oftentimes, recvmsg() system calls and BH handling for a particular TCP socket are done on different cpus. This means the incoming skb had to be allocated on one cpu, but is freed on another. This incurs high spinlock contention in the slab layer for small RPCs, but also a high number of cache line ping-pongs for larger packets. A full size GRO packet might use 45 page fragments, meaning that up to 45 put_page() calls can be involved. Moreover, performing the __kfree_skb() in the recvmsg() context adds latency for user applications, and increases the probability of trapping them in backlog processing, since the BH handler might find the socket owned by the user. This patch, combined with the prior one, increases RPC performance by about 10% on servers with a large number of cores. (A tcp_rr workload with 10,000 flows and 112 threads reaches 9 Mpps instead of 8 Mpps.) This also increases single bulk flow performance on 40Gbit+ links, since in this case there are often two cpus working in tandem: - the CPU handling the NIC rx interrupts, feeding the receive queue, and (after this patch) freeing the skbs that were consumed; - the CPU in the recvmsg() system call, essentially 100% busy copying out data to user space. Having at most one skb in a per-socket cache has very little risk of memory exhaustion, and since it is protected by the socket lock, its management is essentially free. Note that if rps/rfs is used, we do not enable this feature, because there is a high chance that the same cpu is handling both the recvmsg() system call and the TCP rx path, but that another cpu did the skb allocations in the device driver right before the RPS/RFS logic. To properly handle this case, it seems we would need to record on which cpu the skb was allocated, and use a different channel to give skbs back to that cpu. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
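A hedged sketch of the rx-side cache: the sk_rx_skb_cache field name follows the commit, the helper body is simplified (the real patch also guards the rps_needed test with CONFIG_RPS, per the v3 note above):

    static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
    {
            __skb_unlink(skb, &sk->sk_receive_queue);

            /* With rps/rfs, recvmsg() and the rx BH likely share a
             * cpu already, so caching would not help.
             */
            if (!static_branch_unlikely(&rps_needed) &&
                !sk->sk_rx_skb_cache) {
                    sk->sk_rx_skb_cache = skb;  /* freed/reused from BH */
                    return;
            }
            __kfree_skb(skb);
    }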
2019-03-23tcp: add one skb cache for txEric Dumazet
On hosts with a lot of cores, RPC workloads suffer from heavy contention on slab spinlocks.

    20.69%  [kernel]  [k] queued_spin_lock_slowpath
     5.64%  [kernel]  [k] _raw_spin_lock
     3.83%  [kernel]  [k] syscall_return_via_sysret
     3.48%  [kernel]  [k] __entry_text_start
     1.76%  [kernel]  [k] __netif_receive_skb_core
     1.64%  [kernel]  [k] __fget

For each sendmsg(), we allocate one skb, and free it when the ACK packet comes. In many cases, ACK packets are handled by other cpus, and this unfortunately incurs heavy costs in the slab layer. This patch uses an extra pointer in the socket structure, so that we try to reuse the same skb and avoid these expensive costs. We cache at most one skb per socket, so this should be safe as far as memory pressure is concerned. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
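A hedged sketch of the tx-side idea; sk_tx_skb_cache follows the commit, and the memory accounting (sk_wmem_queued, sk_mem_uncharge) is deliberately elided:

    static void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
    {
            /* accounting elided for brevity */
            if (!sk->sk_tx_skb_cache) {
                    sk->sk_tx_skb_cache = skb; /* next sendmsg() reuses it */
                    return;
            }
            __kfree_skb(skb);
    }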
2019-03-23net: convert rps_needed and rfs_needed to new static branch apiEric Dumazet
We prefer static_branch_unlikely() over static_key_false() these days. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
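The conversion pattern, shown for rps_needed (the consumer function is hypothetical):

    #include <linux/jump_label.h>

    /* was: struct static_key rps_needed __read_mostly; */
    DEFINE_STATIC_KEY_FALSE(rps_needed);

    static void rx_maybe_steer(struct sk_buff *skb)
    {
            /* was: if (static_key_false(&rps_needed)) */
            if (static_branch_unlikely(&rps_needed))
                    do_rps_steering(skb);   /* hypothetical consumer */
    }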
2019-03-23Merge branch 'net-dev-BYPASS-for-lockless-qdisc'David S. Miller
Paolo Abeni says: ==================== net: dev: BYPASS for lockless qdisc This patch series is aimed at improving the xmit performance of lockless qdiscs in the uncontended scenario. After the lockless refactor, pfifo_fast can't leverage the BYPASS optimization. Due to retpolines, the overhead of the avoidable enqueue and dequeue operations has increased and we see measurable regressions. The first patch introduces the BYPASS code path for lockless qdiscs, and the second one optimizes that path further. Overall this avoids up to 3 indirect calls per xmit packet. Detailed performance figures are reported in the 2nd patch. v2 -> v3: - qdisc_is_empty() has a const argument (Eric) v1 -> v2: - really use an 'empty' flag instead of 'not_empty', as suggested by Eric ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23net: dev: introduce support for sch BYPASS for lockless qdiscPaolo Abeni
With commit c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") pfifo_fast no longer benefits from the TCQ_F_CAN_BYPASS optimization. Due to retpolines the cost of the enqueue()/dequeue() pair has become relevant and we observe measurable regressions in the uncontended scenario when the packet rate is below line rate. After commit 46b1c18f9deb ("net: sched: put back q.qlen into a single location") we can check for an empty qdisc with a reasonably fast operation even for nolock qdiscs. This change extends TCQ_F_CAN_BYPASS support to nolock qdiscs. The new chunk of code closely mirrors the existing one for traditional qdiscs, leveraging a newly introduced helper to read the qdisc length atomically. Tested with pktgen in queue xmit mode, with pfifo_fast, an MQ device, and an MQ root qdisc:

    threads   vanilla   patched
              kpps      kpps
    1         2465      2889
    2         4304      5188
    4         7898      9589

Same as above, but with a single queue device:

    threads   vanilla   patched
              kpps      kpps
    1         2556      2827
    2         2900      2900
    4         5000      5000
    8         4700      4700

No measurable changes in the contended scenarios, and more than 10% improvement in the uncontended ones. v1 -> v2: - rebased after flag name change Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
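A hedged sketch of the new bypass chunk inside __dev_xmit_skb(); the helper names match upstream APIs where known, but this is a simplified fragment, not the exact patch:

    if (q->flags & TCQ_F_NOLOCK) {
            if ((q->flags & TCQ_F_CAN_BYPASS) && qdisc_is_empty(q) &&
                qdisc_run_begin(q)) {
                    qdisc_bstats_cpu_update(q, skb);

                    /* direct transmit; drain anything that raced in */
                    if (sch_direct_xmit(skb, q, dev, txq, NULL, true))
                            __qdisc_run(q);

                    qdisc_run_end(q);
                    return NET_XMIT_SUCCESS;
            }
            /* otherwise: regular enqueue + qdisc_run() */
    }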
2019-03-23net: sched: add empty status flag for NOLOCK qdiscPaolo Abeni
The queue is marked not empty after acquiring the seqlock, and it's up to the NOLOCK qdisc to clear that flag on dequeue. Since the empty status lies on the same cache line as the seqlock, it's always hot in cache during updates. This makes the empty flag update a little bit loose. Given the lack of synchronization between enqueue and dequeue, this is unavoidable. v2 -> v3: - qdisc_is_empty() has a const argument (Eric) v1 -> v2: - really use an 'empty' flag instead of 'not_empty', as suggested by Eric Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
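A hedged sketch of the flag and its read helper (v3 gives qdisc_is_empty() a const argument, per the changelog; the set/clear points below follow the commit message, not necessarily the exact code):

    static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
    {
            return READ_ONCE(qdisc->empty);
    }

    /* set points, roughly:
     *  - qdisc_run_begin(), after spin_trylock(&qdisc->seqlock):
     *        WRITE_ONCE(qdisc->empty, false);
     *  - the NOLOCK dequeue path, once the queue drains:
     *        WRITE_ONCE(qdisc->empty, true);
     */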
2019-03-23tcp: add documentation for tcp_ca_stateSoheil Hassas Yeganeh
Add documentation to the tcp_ca_state enum, since this enum is exposed in uapi. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Sowmini Varadhan <sowmini05@gmail.com> Acked-by: Sowmini Varadhan <sowmini05@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23tcp: remove conditional branches from tcp_mstamp_refresh()Eric Dumazet
tcp_clock_ns() (aka ktime_get_ns()) uses a monotonic clock, so the checks we had in tcp_mstamp_refresh() are no longer relevant. This patch removes a cpu stall (when the cache line is not hot). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
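A sketch of the resulting helper under that assumption: with a monotonic clock there is nothing to compare against, so just store the fresh value:

    void tcp_mstamp_refresh(struct tcp_sock *tp)
    {
            u64 val = tcp_clock_ns();

            tp->tcp_clock_cache = val;
            tp->tcp_mstamp = div_u64(val, NSEC_PER_USEC);
    }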
2019-03-23net: phy: Correct Cygnus/Omega PHY driver promptFlorian Fainelli
The tristate prompt should have been replaced rather than redefined a few lines below; this was a rebase mistake. Fixes: 17cc9821766c ("net: phy: Move Omega PHY entry to Cygnus PHY driver") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-22net/mlx5e: Replace TC VLAN pop and push actions with VLAN modifyEli Britstein
Changing the VLAN header may be implemented by popping the existing header and pushing a new one. Translate those operations into a VLAN modify. This is applicable for use cases such as OVS, where the controller translates a vlan modify meta (OF) rule into DP pop+push action rules. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net/mlx5e: Support VLAN modify actionEli Britstein
Support VLAN modify action by emulating a rewrite action for the VLAN fields. Currently, the only supported field is the vid. The prio in the action must be set to 0 to indicate no change. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net/mlx5e: Add VLAN ID rewrite fieldsEli Britstein
Add VLAN ID rewrite fields as a pre-step to support this rewrite. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net: Add IANA_VXLAN_UDP_PORT definition to vxlan header fileMoshe Shemesh
Added an IANA_VXLAN_UDP_PORT (4789) definition to the vxlan header file so it can be used by drivers instead of a local definition. Updated the drivers which locally defined it as 4789 to use it. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: John Hurley <john.hurley@netronome.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Yunsheng Lin <linyunsheng@huawei.com> Cc: Peng Li <lipeng321@huawei.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
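The definition, plus a hypothetical driver check replacing a local "#define ... 4789":

    #include <net/vxlan.h>  /* provides IANA_VXLAN_UDP_PORT (4789) */

    static bool port_is_vxlan(__be16 dport)
    {
            return dport == htons(IANA_VXLAN_UDP_PORT);
    }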
2019-03-22net/mlx5e: TX, Add geneve tunnel stateless offload supportMoshe Shemesh
Currently only the default geneve udp port (6081) is supported. For the tx side, the HW is assisted by SW parsing, which sets the header offsets to offload tunneled LSO and csum. Note that for udp tunnels we don't use special rx offloads, as rss on the outer headers is enough; we support checksum complete, and GRO takes care of aggregation. Geneve TSO BW and CPU load results (tested using a single iperf tcp stream). This patch adds TSO support over Geneve, so the "before" result doesn't actually get to use the TSO HW offload even when turned on. Tested on ConnectX-5, Intel(R) Xeon(R) CPU E5-2660 v2 @2.20GHz.

     __________________________________
    |     Before     |      After      |
    |________________|_________________|
    | 12.6 Gbits/sec | 21.7 Gbits/sec  |
    | 100% CPU load  | 61.5% CPU load  |
    |________________|_________________|

Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net/mlx5e: Take SW parser code to a separate functionMoshe Shemesh
Refactor mlx5e_ipsec_set_swp() code, split the part which sets the eseg software parser (SWP) offsets and flags, so it can be used in a downstream patch by other mlx5e functionality which needs to set eseg SWP. The new function mlx5e_set_eseg_swp() is useful for setting swp for both outer and inner headers. It also handles the special ipsec case of xfrm mode transfer. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net: Move the definition of the default Geneve udp port to public header fileMoshe Shemesh
Move the definition of the default Geneve udp port from the geneve source to the header file, so we can re-use it from drivers. Modify existing drivers to use it. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: John Hurley <john.hurley@netronome.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net/mlx5e: Remove redundant assignmentGustavo A. R. Silva
Remove redundant assignment to tun_entropy->enabled. Addresses-Coverity-ID: 1477328 ("Unused value") Fixes: 97417f6182f8 ("net/mlx5e: Fix GRE key by controlling port tunnel entropy calculation") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net/mlx5e: Fix compilation warning in en_tc.cSaeed Mahameed
Amazingly, an mlx5e_tc function is being called from the eswitch layer, which is by itself very terrible! The function was declared locally in eswitch_offloads.c so it could be used there, which caused the following compilation warning; fix that. drivers/.../mlx5/core/en_tc.c:3242:6: error: no previous prototype for ‘mlx5e_tc_clean_fdb_peer_flows’ [-Werror=missing-prototypes] Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net/mlx5e: Fix port buffer function documentation formatSaeed Mahameed
This patch fixes compiler warnings: In drivers/.../mlx5/core/en/port_buffer.c:190: warning: Function parameter or member 'pfc_en' not described... ... warning: Function parameter or member 'change' not described... Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net/mlx5: Fix compilation warning in eq.cSaeed Mahameed
mlx5_eq_table_get_rmap is being used only when CONFIG_RFS_ACCEL is enabled, this patch fixes the below warning when CONFIG_RFS_ACCEL is disabled. drivers/.../mlx5/core/eq.c:903:18: [-Werror=missing-prototypes] error: no previous prototype for ‘mlx5_eq_table_get_rmap’ Fixes: f2f3df550139 ("net/mlx5: EQ, Privatize eq_table and friends") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net/mlx5: Simplify mlx5_sriov_is_enabled() by using pci core APIParav Pandit
It is desired to get rid of the num_vfs stored inside mlx5_core_sriov to safely support more vports than vfs. To reduce the dependency on mlx5_core_sriov num_vfs, start using pci_num_vf() from the pci core. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net/mlx5: Rename total_vfs to total_vportsParav Pandit
Macro MLX5_TOTAL_VPORTS() returns total number of vports. Therefore, rename variable total_vfs to total_vports to improve code readability. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net/mlx5: Simplify sriov enable/disable flowParav Pandit
Simplify the sriov enable/disable flow based on the below two checks. 1. The PCI core driver allows sriov configuration only on a PF. This is done in drivers/pci/pci-sysfs.c sriov_attrs_are_visible(). 2. The PCI core driver allows sriov enablement only if sriov is currently disabled for a PF. This is done in drivers/pci/pci-sysfs.c sriov_numvfs_store(). Hence there is no need for the mlx5 driver to duplicate such checks. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22ice: Get VF VSI instances directly via PFAkeem G Abodunrin
This patch changes how we get VF VSI instances. Instead of relying on mailbox virtual channel messages to retrieve VSIs, it is more reliable to get them directly via the VF object in the PF data structure. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: Don't let VF know that it is untrustedAkeem G Abodunrin
Don't let the VF know it's not trusted when it tries to add more than permitted additional MAC addresses. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: Set LAN_EN for all directional rulesYashaswini Raghuram Prathivadi Bhayankaram
The LAN_EN bit for a switch rule determines if the packet can go out on the wire or not. Set the LAN_EN flag in the switch action for all directional rules. Signed-off-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: Do not set LB_EN for prune switch rulesChristopher N Bednarz
LB_EN for prune switch rules was causing all TX traffic to loop back to the internal switch and be dropped. When running bi-directional stress workloads with RDMA, the RDPU would hang, blocking tx and rx traffic. Signed-off-by: Christopher N Bednarz <christopher.n.bednarz@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: Enable LAN_EN for the right recipesYashaswini Raghuram Prathivadi Bhayankaram
In VEB mode, enable LAN_EN bit in the action fields for filter rules corresponding to the right recipes. Signed-off-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: Add support for PF/VF promiscuous modeAkeem G Abodunrin
Implement support for VF promiscuous mode, MAC/VLAN/MAC_VLAN and PF multicast MAC/VLAN/MAC_VLAN promiscuous mode. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: code cleanup in ice_sched.cVictor Raj
This patch does some clean up in the Tx scheduler code: 1. Adjust the stack variable usage 2. Modify the debug prints to display the FW error 3. Add additional debug prints while adding/removing VSIs Signed-off-by: Victor Raj <victor.raj@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: Remove unused vsi_id fieldAnirudh Venkataramanan
Remove unused vsi_id field from struct ice_sched_vsi_info. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: fix some function prototype and signature style issuesBruce Allan
Put the return type on a separate line for function prototypes and signatures that would exceed the 80-character limit if both were on the same line. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: fix the divide by zero issueKiran Patil
Static analysis flagged a potential divide by zero error because vsi->num_rxq can become zero under certain conditions and it is used as a divisor; add a check to prevent that. Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
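A minimal sketch of the kind of guard such a fix adds (the function and variable names are illustrative, not the exact driver change):

    static int ice_qs_per_rxq(struct ice_vsi *vsi, u16 total_qs, u16 *out)
    {
            /* vsi->num_rxq can be zero in certain conditions; guard it
             * before using it as a divisor
             */
            if (!vsi->num_rxq)
                    return -EINVAL;

            *out = total_qs / vsi->num_rxq;
            return 0;
    }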
2019-03-22ice: Fix issue reconfiguring VF queuesAkeem G Abodunrin
When a VF requests a change in the number of queues, we need to update the LAN Tx queue with the correct number of VF queue pairs and re-allocate VF resources based on this newly requested number of queues, which is constrained within the maximum number of queues supported per VF. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: Remove unused function prototypeAnirudh Venkataramanan
Commit 7c710869d64e ("ice: Add handlers for VF netdevice operations") seems to have inadvertently introduced a function prototype for ice_set_vf_bw that isn't implemented. Remove it. Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: fix static analysis warningsBruce Allan
cppcheck warns "Identical condition '<var>', second condition is always false". Fix them. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: Fix issue reclaiming resources back to the pool after resetAkeem G Abodunrin
This patch fixes an issue reclaiming VF resources back to the pool after reset. Since we allocate HW vectors for all VFs together and track them along with PF resource allocation via ice_search_res, we need to free VF resources separately, using the first VF vector index to traverse the list. Otherwise the tracker starts from the last assigned vector list and exhausts the maximum supported number of HW vectors (1024), depending on the number of VFs enabled, which causes a lot of unwanted issues and failures to reassign vectors for VFs. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22ice: Enable MAC anti-spoof by defaultAkeem G Abodunrin
This patch enables MAC anti-spoof by default, with creation of VF VSIs or when the VF VSIs are being re-initialized. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-22genetlink: make policy common to familyJohannes Berg
Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands):

       text    data   bss     dec    hex  filename
     398745   14323  2240  415308  6564c  net/wireless/nl80211.o (before)
     397913   14331  2240  414484  65314  net/wireless/nl80211.o (after)
    --------------------------------
       -832      +8     0    -824

Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch:

    @ops@
    identifier OPS;
    expression POLICY;
    @@
    struct genl_ops OPS[] = {
    ...,
     {
    -	.policy = POLICY,
     },
    ...
    };

    @@
    identifier ops.OPS;
    expression ops.POLICY;
    identifier fam;
    expression M;
    @@
    struct genl_family fam = {
            .ops = OPS,
            .maxattr = M,
    +       .policy = POLICY,
            ...
    };

This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
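The resulting pattern, sketched with an illustrative family (the real conversions follow the spatch above):

    static const struct nla_policy my_policy[MY_ATTR_MAX + 1] = {
            [MY_ATTR_FOO] = { .type = NLA_U32 },
    };

    static const struct genl_ops my_ops[] = {
            {
                    .cmd  = MY_CMD_GET,
                    .doit = my_cmd_get_doit,
                    /* no per-op .policy anymore */
            },
    };

    static struct genl_family my_family __ro_after_init = {
            .name    = "my_family",
            .maxattr = MY_ATTR_MAX,
            .policy  = my_policy,   /* one policy for the whole family */
            .ops     = my_ops,
            .n_ops   = ARRAY_SIZE(my_ops),
    };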
2019-03-22r8169: use netif_start_queue instead of netif_wake_queue in rtl8169_start_xmitHeiner Kallweit
Replace the call to netif_wake_queue in rtl8169_start_xmit with netif_start_queue, as we don't need to actually wake up the queue: we are still in mid-transmit, so we just need to reset the bit so it doesn't prevent the next transmit. (Description shamelessly copied from a mail sent by Alex.) Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
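The distinction, simplified from the netdevice.h single-queue wrappers:

    static inline void netif_start_queue(struct net_device *dev)
    {
            /* only clears the XOFF bit; nothing gets rescheduled */
            netif_tx_start_queue(netdev_get_tx_queue(dev, 0));
    }

    static inline void netif_wake_queue(struct net_device *dev)
    {
            /* clears XOFF *and* reschedules the qdisc, which is
             * pointless while we are still inside ndo_start_xmit()
             */
            netif_tx_wake_queue(netdev_get_tx_queue(dev, 0));
    }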
2019-03-22net: phy: aquantia: add downshift supportHeiner Kallweit
Aquantia PHYs of the AQR107 family support the downshift feature. Add support for it as a standard PHY tunable so that it can be controlled via ethtool. The AQCS109 supports a proprietary 2-pair 1Gbps mode. If two such PHYs are connected to each other with a 2-pair cable, they may not be able to establish a link if both advertise modes > 1Gbps. v2: - add downshift event detection - warn if downshift occurred - read downshifted rate from vendor register - enable downshift per default on all AQR107 family members Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21Merge branch 'Refactor-flower-classifier-to-remove-dependency-on-rtnl-lock'David S. Miller
Vlad Buslov says: ==================== Refactor flower classifier to remove dependency on rtnl lock Currently, all netlink protocol handlers for updating rules, actions and qdiscs are protected with a single global rtnl lock, which removes any possibility for parallelism. This patch set is a third step to remove the rtnl lock dependency from the TC rules update path. Recently, a new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added. TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are already registered with this flag and only take the rtnl lock when a qdisc or classifier requires it. Classifiers can indicate that their ops callbacks don't require the caller to hold the rtnl lock by setting the TCF_PROTO_OPS_DOIT_UNLOCKED flag. The goal of this change is to refactor the flower classifier to support unlocked execution and register it with the unlocked flag. This patch set implements the following changes to make the flower classifier concurrency-safe: - Implement reference counting for individual filters. Change fl_get to take a reference to the filter. Implement the tp->ops->put callback that was introduced in the cls API patch set to release the reference to a flower filter. - Use the tp->lock spinlock to protect internal classifier data structures from concurrent modification. - Handle concurrent tcf proto deletion by returning EAGAIN, which will cause the cls API to retry and create a new proto instance or return an error to the user (depending on the message type). - Handle concurrent insertion of filters with the same priority and handle by returning EAGAIN, which will cause the cls API to look up the filter again and process it according to the netlink message flags. - Extend the flower mask with reference counting and protect the masks list with the masks_lock spinlock. - Prevent concurrent mask insertion by inserting a temporary value into the masks hash table. This is necessary because mask initialization is a sleeping operation and cannot be done while holding tp->lock. Both chain level and classifier level conflicts are resolved by returning -EAGAIN to the cls API, which results in a restart of the whole operation. This retry mechanism is a result of the fine-grained locking approach used in this and previous changes in the series and is necessary to allow concurrent updates on the same chain instance. An alternative approach would be to lock the whole chain while updating filters on any of its child tp's, adding and removing classifier instances from the chain. However, since the most CPU-intensive parts of the filter update code are specifically in classifier code and its dependencies (extensions and hw offloads), such an approach would negate most of the gains introduced by this change and previous changes in the series when updating the same chain instance. The tcf hw offloads API is not changed by this patch set and still requires the caller to hold the rtnl lock. The refactored flower classifier tracks rtnl lock state by means of the 'rtnl_held' flag provided by the cls API and obtains the lock before calling hw offloads. A following patch set will lift this restriction and refactor the cls hw offloads API to support unlocked execution. With these changes the flower classifier is safely registered with the TCF_PROTO_OPS_DOIT_UNLOCKED flag in the last patch. Changes from V2 to V3: - Rebase on latest net-next Changes from V1 to V2: - Extend cover letter with explanation about the retry mechanism. - Rebase on current net-next. - Patch 1: - Use rcu_dereference_raw() for tp->root dereference. - Update comment in fl_head_dereference(). - Patch 2: - Remove redundant check in fl_change error handling code. - Add empty line between error check and new handle assignment. - Patch 3: - Refactor loop in fl_get_next_filter() to improve readability. - Patch 4: - Refactor __fl_delete() to improve readability. - Patch 6: - Fix comment in fl_check_assign_mask(). - Patch 9: - Extend commit message. - Fix error code in comment. - Patch 11: - Fix fl_hw_replace_filter() to always release rtnl lock in error handlers. - Patch 12: - Don't take rtnl lock before calling __fl_destroy_filter() in workqueue context. - Extend commit message with explanation why flower still takes rtnl lock before calling hardware offloads API. Github: <https://github.com/vbuslov/linux/tree/unlocked-flower-cong3> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: set unlocked flag for flower proto opsVlad Buslov
Set TCF_PROTO_OPS_DOIT_UNLOCKED for the flower classifier to indicate that its ops callbacks don't require the caller to hold the rtnl lock. Don't take the rtnl lock in fl_destroy_filter_work(), which is executed on a workqueue rather than being called by the cls API, and so is not affected by setting TCF_PROTO_OPS_DOIT_UNLOCKED. The rtnl mutex is still taken manually by the flower classifier before calling the hardware offloads API, which has not been updated for unlocked execution. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: track rtnl lock stateVlad Buslov
Use 'rtnl_held' flag to track if caller holds rtnl lock. Propagate the flag to internal functions that need to know rtnl lock state. Take rtnl lock before calling tcf APIs that require it (hw offload, bind filter, etc.). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
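A hedged sketch of the conditional-locking pattern this describes (the function and callee names are illustrative):

    static int fl_do_hw_op(struct tcf_proto *tp, void *arg, bool rtnl_held)
    {
            int err;

            /* the hw offload API still requires rtnl; take it only
             * if the caller doesn't already hold it
             */
            if (!rtnl_held)
                    rtnl_lock();

            err = do_hw_offload(tp, arg);

            if (!rtnl_held)
                    rtnl_unlock();
            return err;
    }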