linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2021-06-07	netfilter: conntrack: Introduce tcp offload timeout configuration	Oz Shlomo
	TCP connections may be offloaded from nf conntrack to nf flow table. Offloaded connections are aged after 30 seconds of inactivity. Once aged, ownership is returned to conntrack with a hard coded pickup time of 120 seconds, after which the connection may be deleted. eted. The current aging intervals may be too aggressive for some users. Provide users with the ability to control the nf flow table offload aging and pickup time intervals via sysctl parameter as a pre-step for configuring the nf flow table GC timeout intervals. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07	netfilter: nftables: add nf_ct_pernet() helper function	Pablo Neira Ayuso
	Consolidate call to net_generic(net, nf_conntrack_net_id) in this wrapper function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07	netfilter: nf_tables: remove nft_ctx_init_from_setattr()	Pablo Neira Ayuso
	Replace nft_ctx_init_from_setattr() by nft_table_lookup(). This patch also disentangles nf_tables_delset() where NFTA_SET_TABLE is required while nft_ctx_init_from_setattr() allows it to be optional. From the nf_tables_delset() path, this also allows to set up the context structure when it is needed. Removing this helper function saves us 14 LoC, so it is not helping to consolidate code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07	netfilter: nf_tables: remove nft_ctx_init_from_elemattr()	Pablo Neira Ayuso
	Replace nft_ctx_init_from_elemattr() by nft_table_lookup() and set up the context structure right before it is really needed. Moreover, nft_ctx_init_from_elemattr() is setting up the context structure for codepaths where this is not really needed at all. This helper function is also not helping to consolidate code, removing it saves us 4 LoC. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07	netfilter: nfnetlink: add struct nfgenmsg to struct nfnl_info and use it	Pablo Neira Ayuso
	Update the nfnl_info structure to add a pointer to the nfnetlink header. This simplifies the existing codebase since this header is usually accessed. Update existing clients to use this new field. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-04	net: dsa: xrs700x: allow HSR/PRP supervision dupes for node_table	George McCollister
	Add an inbound policy filter which matches the HSR/PRP supervision MAC range and forwards to the CPU port without discarding duplicates. This is required to correctly populate time_in[A] and time_in[B] in the HSR/PRP node_table. Leave the policy disabled by default and enable/disable it when joining/leaving hsr. Signed-off-by: George McCollister <george.mccollister@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	net:cxgb3: fix incorrect work cancellation	Íñigo Huguet
	In my last changes in commit 5e0b8928927f I introduced a copy-paste bug, leading to cancel twice qresume_task work for OFLD queue, and never the one for CTRL queue. This patch cancels correctly both works. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	net: bridge: mrp: Update ring transitions.	Horatiu Vultur
	According to the standard IEC 62439-2, the number of transitions needs to be counted for each transition 'between' ring state open and ring state closed and not from open state to closed state. Therefore fix this for both ring and interconnect ring. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	net: enetc: use get/put_unaligned helpers for MAC address handling	Michael Walle
	The supplied buffer for the MAC address might not be aligned. Thus doing a 32bit (or 16bit) access could be on an unaligned address. For now, enetc is only used on aarch64 which can do unaligned accesses, thus there is no error. In any case, be correct and use the get/put_unaligned helpers. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	Merge branch 'hdlc_x25-cleanups'	David S. Miller
	Peng Li says: ==================== net: hdlc_x25: clean up some code style issues This patchset clean up some code style issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	net: hdlc_x25: fix the alignment issue	Peng Li
	Alignment should match open parenthesis. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	net: hdlc_x25: fix the code issue about "if..else.."	Peng Li
	According to the chackpatch.pl, else should follow close brace '}'. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	net: hdlc_x25: add some required spaces	Peng Li
	Add spaces required around that '='. Add space required after that ','. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	net: hdlc_x25: move out assignment in if condition	Peng Li
	Should not use assignment in if condition. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	net: hdlc_x25: remove unnecessary out of memory message	Peng Li
	This patch removes unnecessary out of memory message, to fix the following checkpatch.pl warning: "WARNING: Possible unnecessary 'out of memory' message" Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	net: hdlc_x25: remove redundant blank lines	Peng Li
	This patch removes some redundant blank lines. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	Merge branch '1GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2021-06-04 This series contains updates to igc driver only. Sasha utilizes the newly introduced ethtool_sprintf() function, removes unused defines, and fixes indentation. Muhammad adds support for hardware VLAN insertion and stripping. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	tipc: Return the correct errno code	Zheng Yongjun
	When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	Merge branch 'mptcp-timestamps'	David S. Miller
	Mat Martineau says: ==================== mptcp: Add timestamp support Enable the SO_TIMESTAMP and SO_TIMESTAMPING socket options for MPTCP sockets and add receive path cmsg support for timestamps. Patches 1, 2, and 5 expose existing sock and tcp helpers for timestamps (no new EXPORT_SYMBOLS()s). Patch 3 propagates timestamp options to subflows. Patch 4 cleans up MPTCP handling of SOL_SOCKET options. Patch 6 adds timestamp csmg data when receiving on sockets that have been configured for timestamps. Patch 7 adds self test coverage for timestamps. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	selftests: mptcp_connect: add SO_TIMESTAMPNS cmsg support	Florian Westphal
	This extends the existing setsockopt test case to also check for cmsg timestamps. mptcp_connect will abort/fail if the setockopt was passed but the timestamp cmsg isn't present after successful recvmsg(). Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	mptcp: receive path cmsg support	Florian Westphal
	This adds support for SO_TIMESTAMP(NS). Timestamps are passed to userspace in the same way as for plain tcp sockets. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	tcp: export timestamp helpers for mptcp	Florian Westphal
	MPTCP is builtin, so no need to add EXPORT_SYMBOL()s. It will be used to support SO_TIMESTAMP(NS) ancillary messages in the mptcp receive path. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	mptcp: setsockopt: handle SOL_SOCKET in one place only	Florian Westphal
	Move the pre-check to the function that handles all SOL_SOCKET values. At this point there is complete coverage for all values that were accepted by the pre-check. BUSYPOLL functions are accepted but will not have any functionality yet until its clear how the expected mptcp behaviour should look like. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	mptcp: sockopt: propagate timestamp request to subflows	Florian Westphal
	This adds support for TIMESTAMP(NS) setsockopt. This doesn't make things work yet, because the mptcp receive path doesn't convert the skb timestamps to cmsgs for userspace consumption. receive path cmsg support is added ina followup patch. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	sock: expose so_timestamping options for mptcp	Florian Westphal
	Similar to previous patch: expose SO_TIMESTAMPING helper so we do not have to copy & paste this into the mptcp core. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	sock: expose so_timestamp options for mptcp	Florian Westphal
	This exports SO_TIMESTAMP_* function for re-use by MPTCP. Without this there is too much copy & paste needed to support this from mptcp setsockopt path. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04	igc: Enable HW VLAN Insertion and HW VLAN Stripping	Muhammad Husaini Zulkifli
	Add HW VLAN acceleration protocol handling. In case of HW VLAN tagging, we need that protocol available in the ndo_start_xmit(), so that it will be stored in a new fields in the skb. HW offloading is set to OFF by default. Users are allow to turn on/off Rx/Tx HW VLAN acceleration via ethtool. Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-04	igc: Indentation fixes	Sasha Neftin
	Minor fix of indentation in igc_defines.h Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-04	igc: Remove unused MDICNFG register	Sasha Neftin
	The MDICNFG register from igc registers is not used so remove it. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-04	igc: Remove unused asymmetric pause bit from igc defines	Sasha Neftin
	The CR_1000T_ASYM_PAUSE bit from igc defines is not used so remove it. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-04	igc: Update driver to use ethtool_sprintf	Sasha Neftin
	Complete to commit c8d4725e985d ("intel: Update drivers to use ethtool_sprintf") Update the igc driver to make use of ethtool_sprintf. The general idea is to reduce code size and overhead by replacing the repeated pattern of string printf statements and ETH_STRING_LEN counter increments. Suggested-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-03	netdevsim: Fix unsigned being compared to less than zero	Colin Ian King
	The comparison of len < 0 is always false because len is a size_t. Fix this by making len a ssize_t instead. Addresses-Coverity: ("Unsigned compared against 0") Fixes: d395381909a3 ("netdevsim: Add max_vfs to bus_dev") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	icmp: fix lib conflict with trinity	Andreas Roeseler
	Including <linux/in.h> and <netinet/in.h> in the dependencies breaks compilation of trinity due to multiple definitions. <linux/in.h> is only used in <linux/icmp.h> to provide the definition of the struct in_addr, but this can be substituted out by using the datatype __be32. Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	net: ethernet: rmnet: Restructure if checks to avoid uninitialized warning	Nathan Chancellor
	Clang warns that proto in rmnet_map_v5_checksum_uplink_packet() might be used uninitialized: drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:283:14: warning: variable 'proto' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (skb->protocol == htons(ETH_P_IPV6)) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:295:36: note: uninitialized use occurs here check = rmnet_map_get_csum_field(proto, trans); ^~~~~ drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:283:10: note: remove the 'if' if its condition is always true } else if (skb->protocol == htons(ETH_P_IPV6)) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:270:11: note: initialize the variable 'proto' to silence this warning u8 proto; ^ = '\0' 1 warning generated. This is technically a false positive because there is an if statement above this one that checks skb->protocol for not being either ETH_P_IP{,V6}. However, it is more obvious to sink that into the if statement as an else branch, which makes the code clearer and fixes the warning. At the same time, move the "IS_ENABLED(CONFIG_IPV6)" into the else if condition so that the else branch of the preprocessor conditional can be shared, since there is no build failure with CONFIG_IPV6 disabled. Fixes: b6e5d27e32ef ("net: ethernet: rmnet: Add support for MAPv5 egress packets") Link: https://github.com/ClangBuiltLinux/linux/issues/1390 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	net: ks8851: Make ks8851_read_selftest() return void	Nathan Chancellor
	clang points out that ret in ks8851_read_selftest() is set but unused: drivers/net/ethernet/micrel/ks8851_common.c:1028:6: warning: variable 'ret' set but not used [-Wunused-but-set-variable] int ret = 0; ^ 1 warning generated. The return code of this function has never been checked so just remove ret and make the function return void. Fixes: 3ba81f3ece3c ("net: Micrel KS8851 SPI network driver") Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	sch_htb: fix doc warning in htb_add_to_id_tree()	Yu Kuai
	Add description for parameters of htb_add_to_id_tree() to fix gcc W=1 warnings: net/sched/sch_htb.c:282: warning: Function parameter or member 'root' not described in 'htb_add_to_id_tree' net/sched/sch_htb.c:282: warning: Function parameter or member 'cl' not described in 'htb_add_to_id_tree' net/sched/sch_htb.c:282: warning: Function parameter or member 'prio' not described in 'htb_add_to_id_tree' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	bonding: remove redundant initialization of variable ret	Colin Ian King
	The variable ret is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	net: phy: marvell: use phy_modify_changed() for marvell_set_polarity()	Russell King
	Rather than open-coding the phy_modify_changed() sequence, use this helper in marvell_set_polarity(). Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	Merge branch 'ipa-inline-csum'	David S. Miller
	Alex Elder says: ==================== net: ipa: support inline checksum offload Inline offload--required for checksum offload support on IPA version 4.5 and above--is now supported by the RMNet driver: https://lore.kernel.org/netdev/162259440606.2786.10278242816453240434.git-patchwork-notify@kernel.org/ Add support for it in the IPA driver, and revert the commit that disabled it pending acceptance of the RMNet code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	Revert "net: ipa: disable checksum offload for IPA v4.5+"	Alex Elder
	This reverts commit c88c34fcf8f501d588c0a999aa7e51e18552c5f0. The RMNet driver now supports inline checksum offload. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	net: ipa: add support for inline checksum offload	Alex Elder
	Starting with IPA v4.5, IP payload checksum offload is implemented differently. Prior to v4.5, the IPA hardware appends an rmnet_map_dl_csum_trailer structure to each packet if checksum offload is enabled in the download direction (modem->AP). In the upload direction (AP->modem) a rmnet_map_ul_csum_header structure is prepended before each sent packet. Starting with IPA v4.5, checksum offload is implemented using a single new rmnet_map_v5_csum_header structure which sits between the QMAP header and the packet data. The same header structure is used in both directions. The new header contains a header type (CSUM_OFFLOAD); a checksum flag; and a flag indicating whether any other headers follow this one. The checksum flag indicates whether the hardware should compute (and insert) the checksum on a sent packet. On a received packet the checksum flag indicates whether the hardware confirms the checksum value in the payload is correct. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	Merge tag 'mlx5-updates-2021-06-03' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== This series provides misc updates for mlx5 drivers. For more information please see tag log below. Please pull and let me know if there is any problem. mlx5-updates-2021-06-03 This series contains misc updates for mlx5 driver 1) Alaa disables advanced features when kdump mode to save on memory 2) Jakub counts all link flap events 3) Meir adds support for IPoIB NDR speed 4) Various misc cleanup ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	net:cxgb3: fix code style issues	Íñigo Huguet
	Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	net:cxgb3: replace tasklets with works	Íñigo Huguet
	OFLD and CTRL TX queues can be stopped if there is no room in their DMA rings. If this happens, they're tried to be restarted later after having made some room in the corresponding ring. The tasks of restarting these queues were triggered using tasklets, but they can be replaced for workqueue works, getting them out of softirq context. This queues stop/restart probably doesn't happen often and they can be quite lengthy because they try to send all pending skbs. Moreover, given that probably the ring is not empty yet, so the DMA still has work to do, we don't need to be so fast to justify using tasklets/softirq instead of running in a thread. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	net: tcp better handling of reordering then loss cases	Yuchung Cheng
	This patch aims to improve the situation when reordering and loss are ocurring in the same flight of packets. Previously the reordering would first induce a spurious recovery, then the subsequent ACK may undo the cwnd (based on the timestamps e.g.). However the current loss recovery does not proceed to invoke RACK to install a reordering timer. If some packets are also lost, this may lead to a long RTO-based recovery. An example is https://groups.google.com/g/bbr-dev/c/OFHADvJbTEI The solution is to after reverting the recovery, always invoke RACK to either mount the RACK timer to fast retransmit after the reordering window, or restarts the recovery if new loss is identified. Hence it is possible the sender may go from Recovery to Disorder/Open to Recovery again in one ACK. Reported-by: mingkun bian <bianmingkun@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	net: bonding: Use strscpy_pad() instead of manually-truncated strncpy()	Kees Cook
	Silence this warning by using strscpy_pad() directly: drivers/net/bonding/bond_main.c:4877:3: warning: 'strncpy' specified bound 16 equals destination size [-Wstringop-truncation] 4877 \| strncpy(params->primary, primary, IFNAMSIZ); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Additionally replace other strncpy() uses, as it is considered deprecated: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202102150705.fdR6obB0-lkp@intel.com Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	net: vlan: Avoid using strncpy()	Kees Cook
	Use strscpy_pad() instead of strncpy() which is considered deprecated: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	Merge branch 'NVMeTCP-Offload-ULP'	David S. Miller
	Shai Malin says: ==================== NVMeTCP Offload ULP With the goal of enabling a generic infrastructure that allows NVMe/TCP offload devices like NICs to seamlessly plug into the NVMe-oF stack, this patch series introduces the nvme-tcp-offload ULP host layer, which will be a new transport type called "tcp-offload" and will serve as an abstraction layer to work with vendor specific nvme-tcp offload drivers. NVMeTCP offload is a full offload of the NVMeTCP protocol, this includes both the TCP level and the NVMeTCP level. The nvme-tcp-offload transport can co-exist with the existing tcp and other transports. The tcp offload was designed so that stack changes are kept to a bare minimum: only registering new transports. All other APIs, ops etc. are identical to the regular tcp transport. Representing the TCP offload as a new transport allows clear and manageable differentiation between the connections which should use the offload path and those that are not offloaded (even on the same device). The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma: * NVMe layer: * [ nvme/nvme-fabrics/blk-mq ] \| (nvme API and blk-mq API) \| \| * Vendor agnostic transport layer: * [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ] \| \| \| (Verbs) \| \| \| \| (Socket) \| \| \| \| \| (nvme-tcp-offload API) \| \| \| \| \| \| * Vendor Specific Driver: * \| \| \| [ qedr ] \| \| [ qede ] \| [ qedn ] Performance: ============ With this implementation on top of the Marvell qedn driver (using the Marvell FastLinQ NIC), we were able to demonstrate the following CPU utilization improvement: On AMD EPYC 7402, 2.80GHz, 28 cores: - For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate): Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with NVMeTCP offload. On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores: - For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate): Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with NVMeTCP offload. In addition, we were able to demonstrate the following latency improvement: - For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter): Improved the average latency from 105 usec with NVMeTCP SW to 39 usec with NVMeTCP offload. Improved the 99.99 tail latency from 570 usec with NVMeTCP SW to 91 usec with NVMeTCP offload. The end-to-end offload latency was measured from fio while running against back end of null device. Upstream plan: ============== The RFC series "NVMeTCP Offload ULP and QEDN Device Driver" https://lore.kernel.org/netdev/20210531225222.16992-1-smalin@marvell.com/ was designed in a modular way so that part 1 (nvme-tcp-offload) and part 2 (qed) are independent and part 3 (qedn) depends on both parts 1+2. - Part 1 (RFC patch 1-8): NVMeTCP Offload ULP The nvme-tcp-offload patches, will be sent to 'linux-nvme@lists.infradead.org'. - Part 2 (RFC patches 9-15): QED NVMeTCP Offload The qed infrastructure, will be sent to 'netdev@vger.kernel.org'. Once part 1 and 2 are accepted: - Part 3 (RFC patches 16-27): QEDN NVMeTCP Offload The qedn patches, will be sent to 'linux-nvme@lists.infradead.org'. Marvell is fully committed to maintain, test, and address issues with the new nvme-tcp-offload layer. Usage: ====== With the Marvell NVMeTCP offload design, the network-device (qede) and the offload-device (qedn) are paired on each port - Logically similar to the RDMA model. The user will interact with the network-device in order to configure the ip/vlan. The NVMeTCP configuration is populated as part of the nvme connect command. Example: Assign IP to the net-device (from any existing Linux tool): ip addr add 100.100.0.101/24 dev p1p1 This IP will be used by both net-device (qede) and offload-device (qedn). In order to connect from "sw" nvme-tcp through the net-device (qede): nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn In order to connect from "offload" nvme-tcp through the offload-device (qedn): nvme connect -t tcp_offload -s 4420 -a 100.100.0.100 -n testnqn An alternative approach, and as a future enhancement that will not impact this series will be to modify nvme-cli with a new flag that will determine if "-t tcp" should be the regular nvme-tcp (which will be the default) or nvme-tcp-offload. Exmaple: nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn -[new flag] Queue Initialization Design: ============================ The nvme-tcp-offload ULP module shall register with the existing nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops. The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP with the following ops: - claim_dev() - in order to resolve the route to the target according to the paired net_dev. - create_queue() - in order to create offloaded nvme-tcp queue. The nvme-tcp-offload ULP module shall manage all the controller level functionalities, call claim_dev and based on the return values shall call the relevant module create_queue in order to create the admin queue and the IO queues. IO-path Design: =============== The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor driver and later, the nvme-tcp-offload vendor driver returns the request completion (the IO completion). No additional handling is needed in between; this design will reduce the CPU utilization as we will describe below. The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP with the following IO-path ops: - send_req() - in order to pass the request to the handling of the offload driver that shall pass it to the vendor specific device. - poll_queue() Once the IO completes, the nvme-tcp-offload vendor driver shall call command.done() that will invoke the nvme-tcp-offload ULP layer to complete the request. TCP events: =========== The Marvell FastLinQ NIC HW engine handle all the TCP re-transmissions and OOO events. Teardown and errors: ==================== In case of NVMeTCP queue error the nvme-tcp-offload vendor driver shall call the nvme_tcp_ofld_report_queue_err. The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP with the following teardown ops: - drain_queue() - destroy_queue() The Marvell FastLinQ NIC HW engine: ==================================== The Marvell NIC HW engine is capable of offloading the entire TCP/IP stack and managing up to 64K connections per PF, already implemented and upstream use cases for this include iWARP (by the Marvell qedr driver) and iSCSI (by the Marvell qedi driver). In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer and is able to manage the IO level also in case of TCP re-transmissions and OOO events. The HW engine enables direct data placement (including the data digest CRC calculation and validation) and direct data transmission (including data digest CRC calculation). The Marvell qedn driver: ======================== The new driver will be added under "drivers/nvme/hw" and will be enabled by the Kconfig "Marvell NVM Express over Fabrics TCP offload". As part of the qedn init, the driver will register as a pci device driver and will work with the Marvell fastlinQ NIC. As part of the probe, the driver will register to the nvme_tcp_offload (ULP) and to the qed module (qed_nvmetcp_ops) - similar to other "qed_*_ops" which are used by the qede, qedr, qedf and qedi device drivers. nvme-tcp-offload Future work: ============================= - NVMF_OPT_HOST_IFACE Support. Changes since RFC v1: ===================== - nvme-tcp-offload: Fix nvme_tcp_ofld_ops return values. - nvme-tcp-offload: Remove NVMF_TRTYPE_TCP_OFFLOAD. - nvme-tcp-offload: Add nvme_tcp_ofld_poll() implementation. - nvme-tcp-offload: Fix nvme_tcp_ofld_queue_rq() to check map_sg() and send_req() return values. Changes since RFC v2: ===================== - nvme-tcp-offload: Fixes in controller and queue level (patches 3-6). - qedn: Add the Marvell's NVMeTCP HW offload vendor driver init and probe (patches 8-11). Changes since RFC v3: ===================== - nvme-tcp-offload: Add the full implementation of the nvme-tcp-offload layer including the new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new flows (ASYNC and timeout). - nvme-tcp-offload: Add device maximums: max_hw_sectors, max_segments. - nvme-tcp-offload: layer design and optimization changes. Changes since RFC v4: ===================== (Many thanks to Hannes Reinecke for his feedback) - nvme_tcp_offload: Add num_hw_vectors in order to limit the number of queues. - nvme_tcp_offload: Add per device private_data. - nvme_tcp_offload: Fix header digest, data digest and tos initialization. Changes since RFC v5: ===================== (Many thanks to Sagi Grimberg for his feedback) - nvme-fabrics: Expose nvmf_check_required_opts() globally (as a new patch). - nvme_tcp_offload: Remove io-queues BLK_MQ_F_BLOCKING. - nvme_tcp_offload: Fix the nvme_tcp_ofld_stop_queue (drain_queue) flow. - nvme_tcp_offload: Fix the nvme_tcp_ofld_free_queue (destroy_queue) flow. - nvme_tcp_offload: Change rwsem to mutex. - nvme_tcp_offload: remove redundant fields. - nvme_tcp_offload: Remove the "new" from setup_ctrl(). - nvme_tcp_offload: Remove the init_req() and commit_rqs() ops. - nvme_tcp_offload: Minor fixes in nvme_tcp_ofld_create_ctrl() ansd nvme_tcp_ofld_free_queue(). - nvme_tcp_offload: Patch 8 (timeout and async) was squeashed into patch 7 (io level). Changes since RFC v6: ===================== - No changes in nvme_tcp_offload (only in qedn). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	nvme-tcp-offload: Add IO level implementation	Dean Balandin
	In this patch, we present the IO level functionality. The nvme-tcp-offload shall work on the IO-level, meaning the nvme-tcp-offload ULP module shall pass the request to the nvme-tcp-offload vendor driver and shall expect for the request completion. No additional handling is needed in between, this design will reduce the CPU utilization as we will describe below. The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP with the following IO-path ops: - send_req - in order to pass the request to the handling of the offload driver that shall pass it to the vendor specific device - poll_queue The vendor driver will manage the context from which the request will be executed and the request aggregations. Once the IO completed, the nvme-tcp-offload vendor driver shall call command.done() that shall invoke the nvme-tcp-offload ULP layer for completing the request. This patch also add support for the nvme-tcp-offload timeout and nvme-tcp-offload ASYNC flow. Acked-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Dean Balandin <dbalandin@marvell.com> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com> Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03	nvme-tcp-offload: Add queue level implementation	Dean Balandin
	In this patch we implement queue level functionality. The implementation is similar to the nvme-tcp module, the main difference being that we call the vendor specific create_queue op which creates the TCP connection, and NVMeTPC connection including icreq+icresp negotiation. Once create_queue returns successfully, we can move on to the fabrics connect. Acked-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Dean Balandin <dbalandin@marvell.com> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com> Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>