summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2020-05-16dpaa2-eth: add bulking to XDP_TXIoana Ciornei
Add driver level bulking to the XDP_TX action. An array of frame descriptors is held for each Tx frame queue and populated accordingly when the action returned by the XDP program is XDP_TX. The frames will be actually enqueued only when the array is filled. At the end of the NAPI cycle a flush on the queued frames is performed in order to enqueue the remaining FDs. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16net: phy: broadcom: fix checkpatch complains about tabsKevin Lo
This patch makes checkpatch happy for tabs Signed-off-by: Kevin Lo <kevlo@kevlo.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15Merge tag 'mlx5-updates-2020-05-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-05-15 mlx5 core and mlx5e (netdev) updates: 1) Two fixes for release all FW pages support. 2) Improvement in calculating the send queue stop room on tx 3) Flow steering auto-groups creation improvements 4) TC offload fix for Connection tracking with NAT action 5) IPoIB support for self looback to allow communication between ipoib pkey child interfaces on the same host. 6) DCBNL cleanup to avoid #ifdef DCBNL all over the main mlx5e code 7) Small and trivial code cleanup ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15ethernet: ti: am65-cpts: Add missing inline qualifier to stub functionsNathan Chancellor
When building with Clang: In file included from drivers/net/ethernet/ti/am65-cpsw-ethtool.c:15: drivers/net/ethernet/ti/am65-cpts.h:58:12: warning: unused function 'am65_cpts_ns_gettime' [-Wunused-function] static s64 am65_cpts_ns_gettime(struct am65_cpts *cpts) ^ drivers/net/ethernet/ti/am65-cpts.h:63:12: warning: unused function 'am65_cpts_estf_enable' [-Wunused-function] static int am65_cpts_estf_enable(struct am65_cpts *cpts, ^ drivers/net/ethernet/ti/am65-cpts.h:69:13: warning: unused function 'am65_cpts_estf_disable' [-Wunused-function] static void am65_cpts_estf_disable(struct am65_cpts *cpts, int idx) ^ 3 warnings generated. These functions need to be marked as inline, which adds __maybe_unused, to avoid these warnings, which is the pattern for stub functions. Fixes: ec008fa2a9e5 ("ethernet: ti: am65-cpts: add routines to support taprio offload") Link: https://github.com/ClangBuiltLinux/linux/issues/1026 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15net/mlx5e: Take DCBNL-related definitions into dedicated filesTariq Toukan
Take DCBNL-related definitions out of the common en.h header, Use a dedicated header file for exposing them. Some need not to be exposed, use them locally in the .c file. Use stubs to eliminate use of CONFIG_MLX5_CORE_EN_DCB in the generic control flows. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15net/mlx5e: Calculate SQ stop room in a robust wayMaxim Mikityanskiy
Currently, different formulas are used to estimate the space that may be taken by WQEs in the SQ during a single packet transmit. This space is called stop room, and it's checked in the end of packet transmit to find out if the next packet could overflow the SQ. If it could, the driver tells the kernel to stop sending next packets. Many factors affect the stop room: 1. Padding with NOPs to avoid WQEs spanning over page boundaries. 2. Enabled and disabled offloads (TLS, upcoming MPWQE). 3. The maximum size of a WQE. The padding is performed before every WQE if it doesn't fit the current page. The current formula assumes that only one padding will be required per packet, and it doesn't take into account that the WQEs posted during the transmission of a single packet might exceed the page size in very rare circumstances. For example, to hit this condition with 4096-byte pages, TLS offload will have to interrupt an almost-full MPWQE session, be in the resync flow and try to transmit a near to maximum amount of data. To avoid SQ overflows in such rare cases after MPWQE is added, this patch introduces a more robust formula to estimate the stop room. The new formula uses the fact that a WQE of size X will not require more than X-1 WQEBBs of padding. More exact estimations are possible, but they result in much more complex and error-prone code for little gain. Before this patch, the TLS stop room included space for both INNOVA and ConnectX TLS offloads that couldn't run at the same time anyway, so this patch accounts only for the active one. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15net/mlx5e: IPoIB, Drop multicast packets that this interface sentErez Shitrit
After enabled loopback packets for IPoIB, we need to drop these packets that this HCA has replicated and came back to the same interface that sent them. Fixes: 4c6c615e3f30 ("net/mlx5e: IPoIB, Add PKEY child interface nic profile") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15net/mlx5e: IPoIB, Enable loopback packets for IPoIB interfacesErez Shitrit
Enable loopback of unicast and multicast traffic for IPoIB enhanced mode. This will allow interfaces with the same pkey to communicate between them e.g cloned interfaces that located in different namespaces. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15net/mlx5e: CT: Fix offload with CT action after CT NAT actionRoi Dayan
It could be a chain of rules will do action CT again after CT NAT Before this fix matching will break as we get into the CT table after NAT changes and not CT NAT. Fix this by adding pre ct and pre ct nat tables to skip ct/ct_nat tables and go straight to post_ct table if ct/nat was already done. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15net/mlx5: Move internal timer read function to clock libraryEran Ben Elisha
Move mlx5_read_internal_timer() into lib/clock.c file as it is being used there. As such, make this function a static one. In addition, rearrange headers include to support function move. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15net/mlx5: Wait for inactive autogroupsPaul Blakey
Currently, if one thread tries to add an entry to an autogrouped table with no free matching group, while another thread is in the process of creating a new matching autogroup, it doesn't wait for the new group creation, and creates an unnecessary new autogroup. Instead of skipping inactive, wait on the write lock of those groups. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15net/mlx5: Drain wq first during PCI device removalParav Pandit
mlx5_unload_one() is done with cleanup = true only once. So instead of doing health wq drain inside the if(), directly do during PCI device removal. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15net/mlx5: Have single error unwinding pathParav Pandit
Having multiple error unwinding path are error prone. Lets have just one error unwinding path. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15net/mlx5: Fix a bug of releasing wrong chunks on > 4K page size systemsEran Ben Elisha
On systems with page size larger than 4K, a fwp object has few 4K chunks. Fix a bug in fwp free flow where the chunk address was dropped and fwp->addr was used instead (first chunk address). This caused a wrong update of fwp->bitmask which later can cause errors in re-alloc fwp chunk flow. In order to fix this it, re-factor the release flow: - Free 4k: Releases a specific 4k chunk inside the fwp, defined by starting address. - Free fwp: Unconditionally release the whole fwp and its resources. Free addr will call free fwp if all chunks were released, in order to do code sharing. In addition, fix npages to count for all released chunks correctly. Fixes: c6168161f693 ("net/mlx5: Add support for release all pages event") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15net/mlx5: Dedicate fw page to the requesting functionEran Ben Elisha
The cited patch assumes that all chuncks in a fw page belong to the same function, thus the driver must dedicate fw page to the requesting function, which is actually what was intedned in the original fw pages allocator design, hence the fwp->func_id ! Up until the cited patch everything worked ok, but now "relase all pages" is broken on systems with page_size > 4k. Fix this by dedicating fw page to the requesting function id via adding a func_id parameter to alloc_4k() function. Fixes: c6168161f693 ("net/mlx5: Add support for release all pages event") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Move the bpf verifier trace check into the new switch statement in HEAD. Resolve the overlapping changes in hinic, where bug fixes overlap the addition of VF support. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix sk_psock reference count leak on receive, from Xiyu Yang. 2) CONFIG_HNS should be invisible, from Geert Uytterhoeven. 3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this, from Maciej Żenczykowski. 4) ipv4 route redirect backoff wasn't actually enforced, from Paolo Abeni. 5) Fix netprio cgroup v2 leak, from Zefan Li. 6) Fix infinite loop on rmmod in conntrack, from Florian Westphal. 7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet. 8) Various bpf probe handling fixes, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits) selftests: mptcp: pm: rm the right tmp file dpaa2-eth: properly handle buffer size restrictions bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range bpf: Restrict bpf_probe_read{, str}() only to archs where they work MAINTAINERS: Mark networking drivers as Maintained. ipmr: Add lockdep expression to ipmr_for_each_table macro ipmr: Fix RCU list debugging warning drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810 tcp: fix error recovery in tcp_zerocopy_receive() MAINTAINERS: Add Jakub to networking drivers. MAINTAINERS: another add of Karsten Graul for S390 networking drivers: ipa: fix typos for ipa_smp2p structure doc pppoe: only process PADT targeted at local interfaces selftests/bpf: Enforce returning 0 for fentry/fexit programs bpf: Enforce returning 0 for fentry/fexit progs net: stmmac: fix num_por initialization security: Fix the default value of secid_to_secctx hook libbpf: Fix register naming in PT_REGS s390 macros ...
2020-05-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "A few minor bug fixes for user visible defects, and one regression: - Various bugs from static checkers and syzkaller - Add missing error checking in mlx4 - Prevent RTNL lock recursion in i40iw - Fix segfault in cxgb4 in peer abort cases - Fix a regression added in 5.7 where the IB_EVENT_DEVICE_FATAL could be lost, and wasn't delivered to all the FDs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/uverbs: Move IB_EVENT_DEVICE_FATAL to destroy_uobj RDMA/uverbs: Do not discard the IB_EVENT_DEVICE_FATAL event RDMA/iw_cxgb4: Fix incorrect function parameters RDMA/core: Fix double put of resource IB/core: Fix potential NULL pointer dereference in pkey cache IB/hfi1: Fix another case where pq is left on waitlist IB/i40iw: Remove bogus call to netdev_master_upper_dev_get() IB/mlx4: Test return value of calls to ib_get_cached_pkey RDMA/rxe: Always return ERR_PTR from rxe_create_mmap_info() i40iw: Fix error handling in i40iw_manage_arp_cache()
2020-05-15net: phy: tja11xx: execute cable test on link upOleksij Rempel
A typical 100Base-T1 link should be always connected. If the link is in a shot or open state, it is a failure. In most cases, we won't be able to automatically handle this issue, but we need to log it or notify user (if possible). With this patch, the cable will be tested on "ip l s dev .. up" attempt and send ethnl notification to the user space. This patch was tested with TJA1102 PHY and "ethtool --monitor" command. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15net: phy: broadcom: add support for BCM54811 PHYKevin Lo
The BCM54811 PHY shares many similarities with the already supported BCM54810 PHY but additionally requires some semi-unique configuration. Signed-off-by: Kevin Lo <kevlo@kevlo.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15cxgb4: add EOTID tracking and software context dumpRahul Lakkireddy
Rework and add support for dumping EOTID software context used by TC-MQPRIO. Also track number of EOTIDs in use. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15cxgb4: tune burst buffer size for TC-MQPRIO offloadRahul Lakkireddy
For each traffic class, firmware handles up to 4 * MTU amount of data per burst cycle. Under heavy load, this small buffer size is a bottleneck when buffering large TSO packets in <= 1500 MTU case. Increase the burst buffer size to 8 * MTU when supported. Also, keep the driver's traffic class configuration API similar to the firmware API counterpart. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15cxgb4: improve credits recovery in TC-MQPRIO Tx pathRahul Lakkireddy
Request credit update for every half credits consumed, including the current request. Also, avoid re-trying to post packets when there are no credits left. The credit update reply via interrupt will eventually restore the credits and will invoke the Tx path again. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-15 The following pull-request contains BPF updates for your *net-next* tree. We've added 37 non-merge commits during the last 1 day(s) which contain a total of 67 files changed, 741 insertions(+), 252 deletions(-). The main changes are: 1) bpf_xdp_adjust_tail() now allows to grow the tail as well, from Jesper. 2) bpftool can probe CONFIG_HZ, from Daniel. 3) CAP_BPF is introduced to isolate user processes that use BPF infra and to secure BPF networking services by dropping CAP_SYS_ADMIN requirement in certain cases, from Alexei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15net: dsa: mt7530: fix VLAN setupDENG Qingfang
Allow DSA to add VLAN entries even if VLAN filtering is disabled, so enabling it will not block the traffic of existent ports in the bridge Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15dpaa2-eth: properly handle buffer size restrictionsIoana Ciornei
Depending on the WRIOP version, the buffer size on the RX path must by a multiple of 64 or 256. Handle this restriction properly by aligning down the buffer size to the necessary value. Also, use the new buffer size dynamically computed instead of the compile time one. Fixes: 27c874867c4e ("dpaa2-eth: Use a single page per Rx buffer") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15Merge tag 'hwmon-for-v5.7-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Fix ADC access synchronization problem with da9052 driver - Fix temperature limit and status reporting in nct7904 driver - Fix drivetemp temperature reporting if SCT is supported but SCT data tables are not. * tag 'hwmon-for-v5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (da9052) Synchronize access with mfd hwmon: (nct7904) Fix incorrect range of temperature limit registers hwmon: (nct7904) Read all SMI status registers in probe function hwmon: (drivetemp) Fix SCT support if SCT data tables are not supported
2020-05-15Merge tag 'drm-fixes-2020-05-15' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "As mentioned last week an i915 PR came in late, but I left it, so the i915 bits of this cover 2 weeks, which is why it's likely a bit larger than usual. Otherwise it's mostly amdgpu fixes, one tegra fix, one meson fix. i915: - Handle idling during i915_gem_evict_something busy loops (Chris) - Mark current submissions with a weak-dependency (Chris) - Propagate error from completed fences (Chris) - Fixes on execlist to avoid GPU hang situation (Chris) - Fixes couple deadlocks (Chris) - Timeslice preemption fixes (Chris) - Fix Display Port interrupt handling on Tiger Lake (Imre) - Reduce debug noise around Frame Buffer Compression (Peter) - Fix logic around IPC W/a for Coffee Lake and Kaby Lake (Sultan) - Avoid dereferencing a dead context (Chris) tegra: - tegra120/4 smmu fixes amdgpu: - Clockgating fixes - Fix fbdev with scatter/gather display - S4 fix for navi - Soft recovery for gfx10 - Freesync fixes - Atomic check cursor fix - Add a gfxoff quirk - MST fix amdkfd: - Fix GEM reference counting meson: - error code propogation fix" * tag 'drm-fixes-2020-05-15' of git://anongit.freedesktop.org/drm/drm: (29 commits) drm/i915: Handle idling during i915_gem_evict_something busy loops drm/meson: pm resume add return errno branch drm/amd/amdgpu: Update update_config() logic drm/amd/amdgpu: add raven1 part to the gfxoff quirk list drm/i915: Mark concurrent submissions with a weak-dependency drm/i915: Propagate error from completed fences drm/i915/gvt: Fix kernel oops for 3-level ppgtt guest drm/i915/gvt: Init DPLL/DDI vreg for virtual display instead of inheritance. drm/amd/display: add basic atomic check for cursor plane drm/amd/display: Fix vblank and pageflip event handling for FreeSync drm/amdgpu: implement soft_recovery for gfx10 drm/amdgpu: enable hibernate support on Navi1X drm/amdgpu: Use GEM obj reference for KFD BOs drm/amdgpu: force fbdev into vram drm/amd/powerplay: perform PG ungate prior to CG ungate drm/amdgpu: drop unnecessary cancel_delayed_work_sync on PG ungate drm/amdgpu: disable MGCG/MGLS also on gfx CG ungate drm/i915/execlists: Track inflight CCID drm/i915/execlists: Avoid reusing the same logical CCID drm/i915/gem: Remove object_is_locked assertion from unpin_from_display_plane ...
2020-05-15bpf: Implement CAP_BPFAlexei Starovoitov
Implement permissions as stated in uapi/linux/capability.h In order to do that the verifier allow_ptr_leaks flag is split into four flags and they are set as: env->allow_ptr_leaks = bpf_allow_ptr_leaks(); env->bypass_spec_v1 = bpf_bypass_spec_v1(); env->bypass_spec_v4 = bpf_bypass_spec_v4(); env->bpf_capable = bpf_capable(); The first three currently equivalent to perfmon_capable(), since leaking kernel pointers and reading kernel memory via side channel attacks is roughly equivalent to reading kernel memory with cap_perfmon. 'bpf_capable' enables bounded loops, precision tracking, bpf to bpf calls and other verifier features. 'allow_ptr_leaks' enable ptr leaks, ptr conversions, subtraction of pointers. 'bypass_spec_v1' disables speculative analysis in the verifier, run time mitigations in bpf array, and enables indirect variable access in bpf programs. 'bypass_spec_v4' disables emission of sanitation code by the verifier. That means that the networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN will have speculative checks done by the verifier and other spectre mitigation applied. Such networking BPF program will not be able to leak kernel pointers and will not be able to access arbitrary kernel memory. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200513230355.7858-3-alexei.starovoitov@gmail.com
2020-05-15Merge tag 'drm-misc-fixes-2020-05-14' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Just one meson patch this time to propagate an error code Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200514073538.wvdtv5s2mt4wdrdj@gilmour.lan
2020-05-14mlx5: Rx queue setup time determine frame_sz for XDPJesper Dangaard Brouer
The mlx5 driver have multiple memory models, which are also changed according to whether a XDP bpf_prog is attached. The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.: # ethtool --set-priv-flags mlx5p2 rx_striding_rq off On the general case with 4K page_size and regular MTU packet, then the frame_sz is 2048 and 4096 when XDP is enabled, in both modes. The info on the given frame size is stored differently depending on the RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe. In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ). In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is what the XDP case cares about. To reduce effect on fast-path, this patch determine the frame_sz at setup time, to avoid determining the memory model runtime. Variable is named frame0_sz to make it clear that this is only the frame size of the first fragment. This mlx5 driver does a DMA-sync on XDP_TX action, but grow is safe as it have done a DMA-map on the entire PAGE_SIZE. The driver also already does a XDP length check against sq->hw_mtu on the possible XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe(). V3+4: Change variable name first_frame_sz to frame0_sz V2: Fix that frag_size need to be recalc before creating SKB. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Link: https://lore.kernel.org/bpf/158945348021.97035.12295039384250022883.stgit@firesoul
2020-05-14xdp: For Intel AF_XDP drivers add XDP frame_szJesper Dangaard Brouer
Intel drivers implement native AF_XDP zerocopy in separate C-files, that have its own invocation of bpf_prog_run_xdp(). The setup of xdp_buff is also handled in separately from normal code path. This patch update XDP frame_sz for AF_XDP zerocopy drivers i40e, ice and ixgbe, as the code changes needed are very similar. Introduce a helper function xsk_umem_xdp_frame_sz() for calculating frame size. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/158945347511.97035.8536753731329475655.stgit@firesoul
2020-05-14ice: Add XDP frame size to driverJesper Dangaard Brouer
This driver uses different memory models depending on PAGE_SIZE at compile time. For PAGE_SIZE 4K it uses page splitting, meaning for normal MTU frame size is 2048 bytes (and headroom 192 bytes). For larger MTUs the driver still use page splitting, by allocating order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than 4K, driver instead advance its rx_buffer->page_offset with the frame size "truesize". For XDP frame size calculations, this mean that in PAGE_SIZE larger than 4K mode the frame_sz change on a per packet basis. For the page split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be updated once outside the main NAPI loop. The default setting in the driver uses build_skb(), which provides the necessary headroom and tailroom for XDP-redirect in RX-frame (in both modes). There is one complication, which is legacy-rx mode (configurable via ethtool priv-flags). There are zero headroom in this mode, which is a requirement for XDP-redirect to work. The conversion to xdp_frame (convert_to_xdp_frame) will detect this insufficient space, and xdp_do_redirect() call will fail. This is deemed acceptable, as it allows other XDP actions to still work in legacy-mode. In legacy-mode + larger PAGE_SIZE due to lacking tailroom, we also accept that xdp_adjust_tail shrink doesn't work. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Link: https://lore.kernel.org/bpf/158945347002.97035.328088795813704587.stgit@firesoul
2020-05-14i40e: Add XDP frame size to driverJesper Dangaard Brouer
This driver uses different memory models depending on PAGE_SIZE at compile time. For PAGE_SIZE 4K it uses page splitting, meaning for normal MTU frame size is 2048 bytes (and headroom 192 bytes). For larger MTUs the driver still use page splitting, by allocating order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than 4K, driver instead advance its rx_buffer->page_offset with the frame size "truesize". For XDP frame size calculations, this mean that in PAGE_SIZE larger than 4K mode the frame_sz change on a per packet basis. For the page split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be updated once outside the main NAPI loop. The default setting in the driver uses build_skb(), which provides the necessary headroom and tailroom for XDP-redirect in RX-frame (in both modes). There is one complication, which is legacy-rx mode (configurable via ethtool priv-flags). There are zero headroom in this mode, which is a requirement for XDP-redirect to work. The conversion to xdp_frame (convert_to_xdp_frame) will detect this insufficient space, and xdp_do_redirect() call will fail. This is deemed acceptable, as it allows other XDP actions to still work in legacy-mode. In legacy-mode + larger PAGE_SIZE due to lacking tailroom, we also accept that xdp_adjust_tail shrink doesn't work. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Link: https://lore.kernel.org/bpf/158945346494.97035.12809400414566061815.stgit@firesoul
2020-05-14ixgbevf: Add XDP frame size to VF driverJesper Dangaard Brouer
This patch mirrors the changes to ixgbe in previous patch. This VF driver doesn't support XDP_REDIRECT, but correct tailroom is still necessary for BPF-helper xdp_adjust_tail. In legacy-mode + larger PAGE_SIZE, due to lacking tailroom, we accept that xdp_adjust_tail shrink doesn't work. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Link: https://lore.kernel.org/bpf/158945345984.97035.13518286183248025173.stgit@firesoul
2020-05-14ixgbe: Add XDP frame size to driverJesper Dangaard Brouer
This driver uses different memory models depending on PAGE_SIZE at compile time. For PAGE_SIZE 4K it uses page splitting, meaning for normal MTU frame size is 2048 bytes (and headroom 192 bytes). For larger MTUs the driver still use page splitting, by allocating order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than 4K, driver instead advance its rx_buffer->page_offset with the frame size "truesize". For XDP frame size calculations, this mean that in PAGE_SIZE larger than 4K mode the frame_sz change on a per packet basis. For the page split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be updated once outside the main NAPI loop. The default setting in the driver uses build_skb(), which provides the necessary headroom and tailroom for XDP-redirect in RX-frame (in both modes). There is one complication, which is legacy-rx mode (configurable via ethtool priv-flags). There are zero headroom in this mode, which is a requirement for XDP-redirect to work. The conversion to xdp_frame (convert_to_xdp_frame) will detect this insufficient space, and xdp_do_redirect() call will fail. This is deemed acceptable, as it allows other XDP actions to still work in legacy-mode. In legacy-mode + larger PAGE_SIZE due to lacking tailroom, we also accept that xdp_adjust_tail shrink doesn't work. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Link: https://lore.kernel.org/bpf/158945345455.97035.14334355929030628741.stgit@firesoul
2020-05-14ixgbe: Fix XDP redirect on archs with PAGE_SIZE above 4KJesper Dangaard Brouer
The ixgbe driver have another memory model when compiled on archs with PAGE_SIZE above 4096 bytes. In this mode it doesn't split the page in two halves, but instead increment rx_buffer->page_offset by truesize of packet (which include headroom and tailroom for skb_shared_info). This is done correctly in ixgbe_build_skb(), but in ixgbe_rx_buffer_flip which is currently only called on XDP_TX and XDP_REDIRECT, it forgets to add the tailroom for skb_shared_info. This breaks XDP_REDIRECT, for veth and cpumap. Fix by adding size of skb_shared_info tailroom. Maintainers notice: This fix have been queued to Jeff. Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Link: https://lore.kernel.org/bpf/158945344946.97035.17031588499266605743.stgit@firesoul
2020-05-14virtio_net: Add XDP frame size in two code pathsJesper Dangaard Brouer
The virtio_net driver is running inside the guest-OS. There are two XDP receive code-paths in virtio_net, namely receive_small() and receive_mergeable(). The receive_big() function does not support XDP. In receive_small() the frame size is available in buflen. The buffer backing these frames are allocated in add_recvbuf_small() with same size, except for the headroom, but tailroom have reserved room for skb_shared_info. The headroom is encoded in ctx pointer as a value. In receive_mergeable() the frame size is more dynamic. There are two basic cases: (1) buffer size is based on a exponentially weighted moving average (see DECLARE_EWMA) of packet length. Or (2) in case virtnet_get_headroom() have any headroom then buffer size is PAGE_SIZE. The ctx pointer is this time used for encoding two values; the buffer len "truesize" and headroom. In case (1) if the rx buffer size is underestimated, the packet will have been split over more buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in top of buffer area). If that happens the XDP path does a xdp_linearize_page operation. V3: Adjust frame_sz in receive_mergeable() case, spotted by Jason Wang. The code is really hard to follow, so some hints to reviewers. The receive_mergeable() case gets frames that were allocated in add_recvbuf_mergeable() which uses headroom=virtnet_get_headroom(), and 'buf' ptr is advanced this headroom. The headroom can only be 0 or VIRTIO_XDP_HEADROOM, as virtnet_get_headroom is really simple: static unsigned int virtnet_get_headroom(struct virtnet_info *vi) { return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0; } As frame_sz is an offset size from xdp.data_hard_start, reviewers should notice how this is calculated in receive_mergeable(): int offset = buf - page_address(page); [...] data = page_address(xdp_page) + offset; xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len; The calculated offset will always be VIRTIO_XDP_HEADROOM when reaching this code. Thus, xdp.data_hard_start will be page-start address plus vi->hdr_len. Given this xdp.frame_sz need to be reduced with vi->hdr_len size. IMHO a followup patch should cleanup this code to make it easier to maintain and understand, but it is outside the scope of this patchset. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945344436.97035.9445115070189151680.stgit@firesoul
2020-05-14vhost_net: Also populate XDP frame sizeJesper Dangaard Brouer
In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff have embedded a struct tun_xdp_hdr (located at xdp->data_hard_start) which contains the buffer length 'buflen' (with tailroom for skb_shared_info). Also storing this buflen in xdp->frame_sz, does not obsolete struct tun_xdp_hdr, as it also contains a struct virtio_net_hdr with other information. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945343928.97035.4620233649151726289.stgit@firesoul
2020-05-14tun: Add XDP frame sizeJesper Dangaard Brouer
The tun driver have two code paths for running XDP (bpf_prog_run_xdp). In both cases 'buflen' contains enough tailroom for skb_shared_info. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945343419.97035.9594485183958037621.stgit@firesoul
2020-05-14nfp: Add XDP frame size to netronome driverJesper Dangaard Brouer
The netronome nfp driver use PAGE_SIZE when xdp_prog is set, but xdp.data_hard_start begins at offset NFP_NET_RX_BUF_HEADROOM. Thus, adjust for this when setting xdp.frame_sz, as it counts from data_hard_start. When doing XDP_TX this driver is smart and instead of a full DMA-map does a DMA-sync on with packet length. As xdp_adjust_tail can now grow packet length, add checks to make sure that grow size is within the DMA-mapped size. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/158945342911.97035.11214251236208648808.stgit@firesoul
2020-05-14net: thunderx: Add XDP frame sizeJesper Dangaard Brouer
To help reviewers these are the defines related to RCV_FRAG_LEN #define DMA_BUFFER_LEN 1536 /* In multiples of 128bytes */ #define RCV_FRAG_LEN (SKB_DATA_ALIGN(DMA_BUFFER_LEN + NET_SKB_PAD) + \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Sunil Goutham <sgoutham@marvell.com> Cc: Robert Richter <rrichter@marvell.com> Link: https://lore.kernel.org/bpf/158945342402.97035.12649844447148990032.stgit@firesoul
2020-05-14mlx4: Add XDP frame size and adjust max XDP MTUJesper Dangaard Brouer
The mlx4 drivers size of memory backing the RX packet is stored in frag_stride. For XDP mode this will be PAGE_SIZE (normally 4096). For normal mode frag_stride is 2048. Also adjust MLX4_EN_MAX_XDP_MTU to take tailroom into account. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Link: https://lore.kernel.org/bpf/158945341893.97035.2688142527052329942.stgit@firesoul
2020-05-14ena: Add XDP frame size to amazon NIC driverJesper Dangaard Brouer
Frame size ENA_PAGE_SIZE is limited to 16K on systems with larger PAGE_SIZE than 16K. Change ENA_XDP_MAX_MTU to also take into account the reserved tailroom. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Sameeh Jubran <sameehj@amazon.com> Cc: Arthur Kiyanovski <akiyano@amazon.com> Link: https://lore.kernel.org/bpf/158945341384.97035.907403694833419456.stgit@firesoul
2020-05-14net: ethernet: ti: Add XDP frame size to driver cpswJesper Dangaard Brouer
The driver code cpsw.c and cpsw_new.c both use page_pool with default order-0 pages or their RX-pages. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/bpf/158945340875.97035.752144756428532878.stgit@firesoul
2020-05-14qlogic/qede: Add XDP frame size to driverJesper Dangaard Brouer
The driver qede uses a full page, when XDP is enabled. The drivers value in rx_buf_seg_size (struct qede_rx_queue) will be PAGE_SIZE when an XDP bpf_prog is attached. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Ariel Elior <aelior@marvell.com> Cc: GR-everest-linux-l2@marvell.com Link: https://lore.kernel.org/bpf/158945340366.97035.7764939691580349618.stgit@firesoul
2020-05-14hv_netvsc: Add XDP frame size to driverJesper Dangaard Brouer
The hyperv NIC driver does memory allocation and copy even without XDP. In XDP mode it will allocate a new page for each packet and copy over the payload, before invoking the XDP BPF-prog. The positive thing it that its easy to determine the xdp.frame_sz. The XDP implementation for hv_netvsc transparently passes xdp_prog to the associated VF NIC. Many of the Azure VMs are using SRIOV, so majority of the data are actually processed directly on the VF driver's XDP path. So the overhead of the synthetic data path (hv_netvsc) is minimal. Then XDP is enabled on this driver, XDP_PASS and XDP_TX will create the SKB via build_skb (based on the newly allocated page). Now using XDP frame_sz this will provide more skb_tailroom, which netstack can use for SKB coalescing (e.g tcp_try_coalesce -> skb_try_coalesce). V3: Adjust patch desc to be more positive. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Link: https://lore.kernel.org/bpf/158945339857.97035.10212138582505736163.stgit@firesoul
2020-05-14dpaa2-eth: Add XDP frame sizeJesper Dangaard Brouer
The dpaa2-eth driver reserve some headroom used for hardware and software annotation area in RX/TX buffers. Thus, xdp.data_hard_start doesn't start at page boundary. When XDP is configured the area reserved via dpaa2_fd_get_offset(fd) is 448 bytes of which XDP have reserved 256 bytes. As frame_sz is calculated as an offset from xdp_buff.data_hard_start, an adjust from the full PAGE_SIZE == DPAA2_ETH_RX_BUF_RAW_SIZE. When doing XDP_REDIRECT, the driver doesn't need this reserved headroom any-longer and allows xdp_do_redirect() to use it. This is an advantage for the drivers own ndo-xdp_xmit, as it uses part of this headroom for itself. Patch also adjust frame_sz in this case. The driver cannot support XDP data_meta, because it uses the headroom just before xdp.data for struct dpaa2_eth_swa (DPAA2_ETH_SWA_SIZE=64), when transmitting the packet. When transmitting a xdp_frame in dpaa2_eth_xdp_xmit_frame (call via ndo_xdp_xmit) is uses this area to store a pointer to xdp_frame and dma_size, which is used in TX completion (free_tx_fd) to return frame via xdp_return_frame(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com> Link: https://lore.kernel.org/bpf/158945339348.97035.8562488847066908856.stgit@firesoul
2020-05-14veth: Xdp using frame_sz in veth driverJesper Dangaard Brouer
The veth driver can run XDP in "native" mode in it's own NAPI handler, and since commit 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring") packets can come in two forms either xdp_frame or skb, calling respectively veth_xdp_rcv_one() or veth_xdp_rcv_skb(). For packets to arrive in xdp_frame format, they will have been redirected from an XDP native driver. In case of XDP_PASS or no XDP-prog attached, the veth driver will allocate and create an SKB. The current code in veth_xdp_rcv_one() xdp_frame case, had to guess the frame truesize of the incoming xdp_frame, when using veth_build_skb(). With xdp_frame->frame_sz this is not longer necessary. Calculating the frame_sz in veth_xdp_rcv_skb() skb case, is done similar to the XDP-generic handling code in net/core/dev.c. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Link: https://lore.kernel.org/bpf/158945338840.97035.935897116345700902.stgit@firesoul
2020-05-14veth: Adjust hard_start offset on redirect XDP framesJesper Dangaard Brouer
When native XDP redirect into a veth device, the frame arrives in the xdp_frame structure. It is then processed in veth_xdp_rcv_one(), which can run a new XDP bpf_prog on the packet. Doing so requires converting xdp_frame to xdp_buff, but the tricky part is that xdp_frame memory area is located in the top (data_hard_start) memory area that xdp_buff will point into. The current code tried to protect the xdp_frame area, by assigning xdp_buff.data_hard_start past this memory. This results in 32 bytes less headroom to expand into via BPF-helper bpf_xdp_adjust_head(). This protect step is actually not needed, because BPF-helper bpf_xdp_adjust_head() already reserve this area, and don't allow BPF-prog to expand into it. Thus, it is safe to point data_hard_start directly at xdp_frame memory area. Fixes: 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring") Reported-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945338331.97035.5923525383710752178.stgit@firesoul