summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-24ipv6: fix out-of-bound access in ip6_parse_tlv()Eric Dumazet
First problem is that optlen is fetched without checking there is more than one byte to parse. Fix this by taking care of IPV6_TLV_PAD1 before fetching optlen (under appropriate sanity checks against len) Second problem is that IPV6_TLV_PADN checks of zero padding are performed before the check of remaining length. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Fixes: c1412fce7ecc ("net/ipv6/exthdrs.c: Strict PadN option checking") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24Merge branch 'macsec-key-length'David S. Miller
Antoine Tenart says: ==================== net: macsec: fix key length when offloading The key length used to copy the key to offloading drivers and to store it is wrong and was working by chance as it matched the default key length. But using a different key length fails. Fix it by using instead the max length accepted in uAPI to store the key and the actual key length when copying it. This was tested on the MSCC PHY driver but not on the Atlantic MAC (looking at the code it looks ok, but testing would be appreciated). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24net: atlantic: fix the macsec key lengthAntoine Tenart
The key length used to store the macsec key was set to MACSEC_KEYID_LEN (16), which is an issue as: - This was never meant to be the key length. - The key length can be > 16. Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in uAPI). Fixes: 27736563ce32 ("net: atlantic: MACSec egress offload implementation") Fixes: 9ff40a751a6f ("net: atlantic: MACSec ingress offload implementation") Reported-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24net: phy: mscc: fix macsec key lengthAntoine Tenart
The key length used to store the macsec key was set to MACSEC_KEYID_LEN (16), which is an issue as: - This was never meant to be the key length. - The key length can be > 16. Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in uAPI). Fixes: 28c5107aa904 ("net: phy: mscc: macsec support") Reported-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24net: macsec: fix the length used to copy the key for offloadingAntoine Tenart
The key length used when offloading macsec to Ethernet or PHY drivers was set to MACSEC_KEYID_LEN (16), which is an issue as: - This was never meant to be the key length. - The key length can be > 16. Fix this by using MACSEC_MAX_KEY_LEN to store the key (the max length accepted in uAPI) and secy->key_len to copy it. Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Reported-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24Merge tag 'linux-can-fixes-for-5.13-20210624' of git://git.kernel.org/David S. Miller
pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-06-24 this is a pull request of 2 patches for net/master. The first patch is by Norbert Slusarek and prevent allocation of filter for optlen == 0 in the j1939 CAN protocol. The last patch is by Stephane Grosjean and fixes a potential starvation in the TX path of the peak_pciefd driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24Merge branch 'ibmvnic-fixes'David S. Miller
Sukadev Bhattiprolu says: ==================== ibmvnic: Assorted bug fixes Assorted bug fixes that we tested over the last several weeks. Thanks to Brian King, Cris Forno, Dany Madden and Rick Lindsley for reviews and help with testing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24ibmvnic: parenthesize a checkSukadev Bhattiprolu
Parenthesize a check to be more explicit and to fix a sparse warning seen on some distros. Fixes: 91dc5d2553fbf ("ibmvnic: fix miscellaneous checks") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24ibmvnic: free tx_pool if tso_pool alloc failsSukadev Bhattiprolu
Free tx_pool and clear it, if allocation of tso_pool fails. release_tx_pools() assumes we have both tx and tso_pools if ->tx_pool is non-NULL. If allocation of tso_pool fails in init_tx_pools(), the assumption will not be true and we would end up dereferencing ->tx_buff, ->free_map fields from a NULL pointer. Fixes: 3205306c6b8d ("ibmvnic: Update TX pool initialization routine") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24ibmvnic: set ltb->buff to NULL after freeingSukadev Bhattiprolu
free_long_term_buff() checks ltb->buff to decide whether we have a long term buffer to free. So set ltb->buff to NULL afer freeing. While here, also clear ->map_id, fix up some coding style and log an error. Fixes: 9c4eaabd1bb39 ("Check CRQ command return codes") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24ibmvnic: account for bufs already saved in indir_bufSukadev Bhattiprolu
This fixes a crash in replenish_rx_pool() when called from ibmvnic_poll() after a previous call to replenish_rx_pool() encountered an error when allocating a socket buffer. Thanks to Rick Lindsley and Dany Madden for helping debug the crash. Fixes: 4f0b6812e9b9 ("ibmvnic: Introduce batched RX buffer descriptor transmission") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24ibmvnic: clean pending indirect buffs during resetSukadev Bhattiprolu
We batch subordinate command response queue (scrq) descriptors that we need to send to the VIOS using an "indirect" buffer. If after we queue one or more scrqs in the indirect buffer encounter an error (say fail to allocate an skb), we leave the queued scrq descriptors in the indirect buffer until the next call to ibmvnic_xmit(). On the next call to ibmvnic_xmit(), it is possible that the adapter is going through a reset and it is possible that the long term buffers have been unmapped on the VIOS side. If we proceed to flush (send) the packets that are in the indirect buffer, we will end up using the old map ids and this can cause the VIOS to trigger an unnecessary FATAL error reset. Instead of flushing packets remaining on the indirect_buff, discard (clean) them instead. Fixes: 0d973388185d4 ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24Revert "ibmvnic: remove duplicate napi_schedule call in open function"Dany Madden
This reverts commit 7c451f3ef676c805a4b77a743a01a5c21a250a73. When a vnic interface is taken down and then up, connectivity is not restored. We bisected it to this commit. Reverting this commit until we can fully investigate the issue/benefit of the change. Fixes: 7c451f3ef676 ("ibmvnic: remove duplicate napi_schedule call in open function") Reported-by: Cristobal Forno <cforno12@linux.ibm.com> Reported-by: Abdul Haleem <abdhalee@in.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24Revert "ibmvnic: simplify reset_long_term_buff function"Sukadev Bhattiprolu
This reverts commit 1c7d45e7b2c29080bf6c8cd0e213cc3cbb62a054. We tried to optimize the number of hcalls we send and skipped sending the REQUEST_MAP calls for some maps. However during resets, we need to resend all the maps to the VIOS since the VIOS does not remember the old values. In fact we may have failed over to a new VIOS which will not have any of the mappings. When we send packets with map ids the VIOS does not know about, it triggers a FATAL reset. While the client does recover from the FATAL error reset, we are seeing a large number of such resets. Handling FATAL resets is lot more unnecessary work than issuing a few more hcalls so revert the commit and resend the maps to the VIOS. Fixes: 1c7d45e7b2c ("ibmvnic: simplify reset_long_term_buff function") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24can: peak_pciefd: pucan_handle_status(): fix a potential starvation issue in ↵Stephane Grosjean
TX path Rather than just indicating that transmission can start, this patch requires the explicit flushing of the network TX queue when the driver is informed by the device that it can transmit, next to its configuration. In this way, if frames have already been written by the application, they will actually be transmitted. Fixes: ffd137f7043c ("can: peak/pcie_fd: remove useless code when interface starts") Link: https://lore.kernel.org/r/20210623142600.149904-1-s.grosjean@peak-system.com Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-06-24can: j1939: j1939_sk_setsockopt(): prevent allocation of j1939 filter for ↵Norbert Slusarek
optlen == 0 If optval != NULL and optlen == 0 are specified for SO_J1939_FILTER in j1939_sk_setsockopt(), memdup_sockptr() will return ZERO_PTR for 0 size allocation. The new filter will be mistakenly assigned ZERO_PTR. This patch checks for optlen != 0 and filter will be assigned NULL in case of optlen == 0. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/r/20210620123842.117975-1-nslusarek@gmx.net Signed-off-by: Norbert Slusarek <nslusarek@gmx.net> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-06-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2021-06-23 The following pull-request contains BPF updates for your *net* tree. We've added 14 non-merge commits during the last 6 day(s) which contain a total of 13 files changed, 137 insertions(+), 64 deletions(-). Note that when you merge net into net-next, there is a small merge conflict between 9f2470fbc4cb ("skmsg: Improve udp_bpf_recvmsg() accuracy") from bpf with c49661aa6f70 ("skmsg: Remove unused parameters of sk_msg_wait_data()") from net-next. Resolution is to: i) net/ipv4/udp_bpf.c: take udp_msg_wait_data() and remove err parameter from the function, ii) net/ipv4/tcp_bpf.c: take tcp_msg_wait_data() and remove err parameter from the function, iii) for net/core/skmsg.c and include/linux/skmsg.h: remove the sk_msg_wait_data() implementation and its prototype in header. The main changes are: 1) Fix BPF poke descriptor adjustments after insn rewrite, from John Fastabend. 2) Fix regression when using BPF_OBJ_GET with non-O_RDWR flags, from Maciej Żenczykowski. 3) Various bug and error handling fixes for UDP-related sock_map, from Cong Wang. 4) Fix patching of vmlinux BTF IDs with correct endianness, from Tony Ambardar. 5) Two fixes for TX descriptor validation in AF_XDP, from Magnus Karlsson. 6) Fix overflow in size calculation for bpf_map_area_alloc(), from Bui Quang Minh. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23ipv6: exthdrs: do not blindly use init_netEric Dumazet
I see no reason why max_dst_opts_cnt and max_hbh_opts_cnt are fetched from the initial net namespace. The other sysctls (max_dst_opts_len & max_hbh_opts_len) are in fact already using the current ns. Note: it is not clear why ipv6_destopt_rcv() use two ways to get to the netns : 1) dev_net(dst->dev) Originally used to increment IPSTATS_MIB_INHDRERRORS 2) dev_net(skb->dev) Tom used this variant in his patch. Maybe this calls to use ipv6_skb_net() instead ? Fixes: 47d3d7ac656a ("ipv6: Implement limits on Hop-by-Hop and Destination options") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@quantonium.net> Cc: Coco Li <lixiaoyan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23net: bcmgenet: Fix attaching to PYH failed on RPi 4BJian-Hong Pan
The Broadcom UniMAC MDIO bus from mdio-bcm-unimac module comes too late. So, GENET cannot find the ethernet PHY on UniMAC MDIO bus. This leads GENET fail to attach the PHY as following log: bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000 ... could not attach to PHY bcmgenet fd580000.ethernet eth0: failed to connect to PHY uart-pl011 fe201000.serial: no DMA platform data libphy: bcmgenet MII bus: probed ... unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus This patch adds the soft dependency to load mdio-bcm-unimac module before genet module to avoid the issue. Fixes: 9a4e79697009 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver") Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=213485 Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23bonding: allow nesting of bonding deviceDi Zhu
The commit 3c9ef511b9fa ("bonding: avoid adding slave device with IFF_MASTER flag") fix a crash when add slave device with IFF_MASTER, but it rejects the scenario of nested bonding device. As Eric Dumazet described: since there indeed is a usage scenario about nesting bonding, we should not break it. So we add a new judgment condition to allow nesting of bonding device. Fixes: 3c9ef511b9fa ("bonding: avoid adding slave device with IFF_MASTER flag") Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Di Zhu <zhudi21@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-06-23 1) Don't return a mtu smaller than 1280 on IPv6 pmtu discovery. From Sabrina Dubroca 2) Fix seqcount rcu-read side in xfrm_policy_lookup_bytype for the PREEMPT_RT case. From Varad Gautam. 3) Remove a repeated declaration of xfrm_parse_spi. From Shaokun Zhang. 4) IPv4 beet mode can't handle fragments, but IPv6 does. commit 68dc022d04eb ("xfrm: BEET mode doesn't support fragments for inner packets") handled IPv4 and IPv6 the same way. Relax the check for IPv6 because fragments are possible here. From Xin Long. 5) Memory allocation failures are not reported for XFRMA_ENCAP and XFRMA_COADDR in xfrm_state_construct. Fix this by moving both cases in front of the function. 6) Fix a missing initialization in the xfrm offload fallback fail case for bonding devices. From Ayush Sawal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Nicolas Dichtel updates MAINTAINERS file to add Netfilter IRC channel. 2) Skip non-IPv6 packets in nft_exthdr. 3) Skip non-TCP packets in nft_osf. 4) Skip non-TCP/UDP packets in nft_tproxy. 5) Memleak in hardware offload infrastructure when counters are used for first time in a rule. 6) The VLAN transfer routine must use FLOW_DISSECTOR_KEY_BASIC instead of FLOW_DISSECTOR_KEY_CONTROL. Moreover, make a more robust check for 802.1q and 802.1ad to restore simple matching on transport protocols. 7) Fix bogus EPERM when listing a ruleset when table ownership flag is set on. 8) Honor table ownership flag when table is referenced by handle. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22bpf: Fix null ptr deref with mixed tail calls and subprogsJohn Fastabend
The sub-programs prog->aux->poke_tab[] is populated in jit_subprogs() and then used when emitting 'BPF_JMP|BPF_TAIL_CALL' insn->code from the individual JITs. The poke_tab[] to use is stored in the insn->imm by the code adding it to that array slot. The JIT then uses imm to find the right entry for an individual instruction. In the x86 bpf_jit_comp.c this is done by calling emit_bpf_tail_call_direct with the poke_tab[] of the imm value. However, we observed the below null-ptr-deref when mixing tail call programs with subprog programs. For this to happen we just need to mix bpf-2-bpf calls and tailcalls with some extra calls or instructions that would be patched later by one of the fixup routines. So whats happening? Before the fixup_call_args() -- where the jit op is done -- various code patching is done by do_misc_fixups(). This may increase the insn count, for example when we patch map_lookup_up using map_gen_lookup hook. This does two things. First, it means the instruction index, insn_idx field, of a tail call instruction will move by a 'delta'. In verifier code, struct bpf_jit_poke_descriptor desc = { .reason = BPF_POKE_REASON_TAIL_CALL, .tail_call.map = BPF_MAP_PTR(aux->map_ptr_state), .tail_call.key = bpf_map_key_immediate(aux), .insn_idx = i + delta, }; Then subprog start values subprog_info[i].start will be updated with the delta and any poke descriptor index will also be updated with the delta in adjust_poke_desc(). If we look at the adjust subprog starts though we see its only adjusted when the delta occurs before the new instructions, /* NOTE: fake 'exit' subprog should be updated as well. */ for (i = 0; i <= env->subprog_cnt; i++) { if (env->subprog_info[i].start <= off) continue; Earlier subprograms are not changed because their start values are not moved. But, adjust_poke_desc() does the offset + delta indiscriminately. The result is poke descriptors are potentially corrupted. Then in jit_subprogs() we only populate the poke_tab[] when the above insn_idx is less than the next subprogram start. From above we corrupted our insn_idx so we might incorrectly assume a poke descriptor is not used in a subprogram omitting it from the subprogram. And finally when the jit runs it does the deref of poke_tab when emitting the instruction and crashes with below. Because earlier step omitted the poke descriptor. The fix is straight forward with above context. Simply move same logic from adjust_subprog_starts() into adjust_poke_descs() and only adjust insn_idx when needed. [ 82.396354] bpf_testmod: version magic '5.12.0-rc2alu+ SMP preempt mod_unload ' should be '5.12.0+ SMP preempt mod_unload ' [ 82.623001] loop10: detected capacity change from 0 to 8 [ 88.487424] ================================================================== [ 88.487438] BUG: KASAN: null-ptr-deref in do_jit+0x184a/0x3290 [ 88.487455] Write of size 8 at addr 0000000000000008 by task test_progs/5295 [ 88.487471] CPU: 7 PID: 5295 Comm: test_progs Tainted: G I 5.12.0+ #386 [ 88.487483] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019 [ 88.487490] Call Trace: [ 88.487498] dump_stack+0x93/0xc2 [ 88.487515] kasan_report.cold+0x5f/0xd8 [ 88.487530] ? do_jit+0x184a/0x3290 [ 88.487542] do_jit+0x184a/0x3290 ... [ 88.487709] bpf_int_jit_compile+0x248/0x810 ... [ 88.487765] bpf_check+0x3718/0x5140 ... [ 88.487920] bpf_prog_load+0xa22/0xf10 Fixes: a748c6975dea3 ("bpf: propagate poke descriptors to subprograms") Reported-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
2021-06-22net: ti: am65-cpsw-nuss: Fix crash when changing number of TX queuesVignesh Raghavendra
When changing number of TX queues using ethtool: # ethtool -L eth0 tx 1 [ 135.301047] Unable to handle kernel paging request at virtual address 00000000af5d0000 [...] [ 135.525128] Call trace: [ 135.525142] dma_release_from_dev_coherent+0x2c/0xb0 [ 135.525148] dma_free_attrs+0x54/0xe0 [ 135.525156] k3_cppi_desc_pool_destroy+0x50/0xa0 [ 135.525164] am65_cpsw_nuss_remove_tx_chns+0x88/0xdc [ 135.525171] am65_cpsw_set_channels+0x3c/0x70 [...] This is because k3_cppi_desc_pool_destroy() which is called after k3_udma_glue_release_tx_chn() in am65_cpsw_nuss_remove_tx_chns() references struct device that is unregistered at the end of k3_udma_glue_release_tx_chn() Therefore the right order is to call k3_cppi_desc_pool_destroy() and destroy desc pool before calling k3_udma_glue_release_tx_chn(). Fix this throughout the driver. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22net: broadcom: bcm4908_enet: reset DMA rings sw indexes properlyRafał Miłecki
Resetting software indexes in bcm4908_dma_alloc_buf_descs() is not enough as it's called during device probe only. Driver resets DMA on every .ndo_open callback and it's required to reset indexes then. This fixes inconsistent rings state and stalled traffic after interface down & up sequence. Fixes: 4feffeadbcb2 ("net: broadcom: bcm4908enet: add BCM4908 controller driver") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22net/ipv4: swap flow ports when validating sourceMiao Wang
When doing source address validation, the flowi4 struct used for fib_lookup should be in the reverse direction to the given skb. fl4_dport and fl4_sport returned by fib4_rules_early_flow_dissect should thus be swapped. Fixes: 5a847a6e1477 ("net/ipv4: Initialize proto and ports in flow struct") Signed-off-by: Miao Wang <shankerwangmiao@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22bonding: avoid adding slave device with IFF_MASTER flagDi Zhu
The following steps will definitely cause the kernel to crash: ip link add vrf1 type vrf table 1 modprobe bonding.ko max_bonds=1 echo "+vrf1" >/sys/class/net/bond0/bonding/slaves rmmod bonding The root cause is that: When the VRF is added to the slave device, it will fail, and some cleaning work will be done. because VRF device has IFF_MASTER flag, cleanup process will not clear the IFF_BONDING flag. Then, when we unload the bonding module, unregister_netdevice_notifier() will treat the VRF device as a bond master device and treat netdev_priv() as struct bonding{} which actually is struct net_vrf{}. By analyzing the processing logic of bond_enslave(), it seems that it is not allowed to add the slave device with the IFF_MASTER flag, so we need to add a code check for this situation. Signed-off-by: Di Zhu <zhudi21@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22ip6_tunnel: fix GRE6 segmentationJakub Kicinski
Commit 6c11fbf97e69 ("ip6_tunnel: add MPLS transmit support") moved assiging inner_ipproto down from ipxip6_tnl_xmit() to its callee ip6_tnl_xmit(). The latter is also used by GRE. Since commit 38720352412a ("gre: Use inner_proto to obtain inner header protocol") GRE had been depending on skb->inner_protocol during segmentation. It sets it in gre_build_header() and reads it in gre_gso_segment(). Changes to ip6_tnl_xmit() overwrite the protocol, resulting in GSO skbs getting dropped. Note that inner_protocol is a union with inner_ipproto, GRE uses the former while the change switched it to the latter (always setting it to just IPPROTO_GRE). Restore the original location of skb_set_inner_ipproto(), it is unclear why it was moved in the first place. Fixes: 6c11fbf97e69 ("ip6_tunnel: add MPLS transmit support") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22Merge branch 'mptcp-fixes'David S. Miller
Mat Martineau says: ==================== mptcp: Fixes for v5.13 Here are two MPTCP fixes from Paolo. Patch 1 fixes some possible connect-time race conditions with MPTCP-level connection state changes. Patch 2 deletes a duplicate function declaration. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22mptcp: drop duplicate mptcp_setsockopt() declarationPaolo Abeni
commit 7896248983ef ("mptcp: add skeleton to sync msk socket options to subflows") introduced a duplicate declaration of mptcp_setsockopt(), just drop it. Reported-by: Florian Westphal <fw@strlen.de> Fixes: 7896248983ef ("mptcp: add skeleton to sync msk socket options to subflows") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22mptcp: avoid race on msk state changesPaolo Abeni
The msk socket state is currently updated in a few spots without owning the msk socket lock itself. Some of such operations are safe, as they happens before exposing the msk socket to user-space and can't race with other changes. A couple of them, at connect time, can actually race with close() or shutdown(), leaving breaking the socket state machine. This change addresses the issue moving such update under the msk socket lock with the usual: <acquire spinlock> <check sk lock onwers> <ev defer to release_cb> scheme. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/56 Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect") Fixes: c3c123d16c0e ("net: mptcp: don't hang in mptcp_sendmsg() after TCP fallback") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22bpf: Fix integer overflow in argument calculation for bpf_map_area_allocBui Quang Minh
In 32-bit architecture, the result of sizeof() is a 32-bit integer so the expression becomes the multiplication between 2 32-bit integer which can potentially leads to integer overflow. As a result, bpf_map_area_alloc() allocates less memory than needed. Fix this by casting 1 operand to u64. Fixes: 0d2c4f964050 ("bpf: Eliminate rlimit-based memory accounting for sockmap and sockhash maps") Fixes: 99c51064fb06 ("devmap: Use bpf_map_area_alloc() for allocating hash buckets") Fixes: 546ac1ffb70d ("bpf: add devmap, a map for storing net device references") Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210613143440.71975-1-minhquangbui99@gmail.com
2021-06-22sfc: avoid duplicated code in ef10_sriovÍñigo Huguet
The fail path of efx_ef10_sriov_alloc_vf_vswitching is identical to the full content of efx_ef10_sriov_free_vf_vswitching, so replace it for a single call to efx_ef10_sriov_free_vf_vswitching. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sfc: explain that "attached" VFs only refer to XenÍñigo Huguet
During SRIOV disabling it is checked wether any VF is currently attached to a guest, using pci_vfs_assigned function. However, this check only works with VFs attached with Xen, not with vfio/KVM. Added comments clarifying this point. Also, replaced manual check of PCI_DEV_FLAGS_ASSIGNED flag and used the helper function pci_is_dev_assigned instead. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sfc: error code if SRIOV cannot be disabledÍñigo Huguet
If SRIOV cannot be disabled during device removal or module unloading, return error code so it can be logged properly in the calling function. Note that this can only happen if any VF is currently attached to a guest using Xen, but not with vfio/KVM. Despite that in that case the VFs won't work properly with PF removed and/or the module unloaded, I have let it as is because I don't know what side effects may have changing it, and also it seems to be the same that other drivers are doing in this situation. In the case of being called during SRIOV reconfiguration, the behavior hasn't changed because the function is called with force=false. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sfc: avoid double pci_remove of VFsÍñigo Huguet
If pci_remove was called for a PF with VFs, the removal of the VFs was called twice from efx_ef10_sriov_fini: one directly with pci_driver->remove and another implicit by calling pci_disable_sriov, which also perform the VFs remove. This was leading to crashing the kernel on the second attempt. Given that pci_disable_sriov already calls to pci remove function, get rid of the direct call to pci_driver->remove from the driver. 2 different ways to trigger the bug: - Create one or more VFs, then attach the PF to a virtual machine (at least with qemu/KVM) - Create one or more VFs, then remove the PF with: echo 1 > /sys/bus/pci/devices/PF_PCI_ID/remove Removing sfc module does not trigger the error, at least for me, because it removes the VF first, and then the PF. Example of a log with the error: list_del corruption, ffff967fd20a8ad0->next is LIST_POISON1 (dead000000000100) ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:47! [...trimmed...] RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x4c [...trimmed...] Call Trace: efx_dissociate+0x1f/0x140 [sfc] efx_pci_remove+0x27/0x150 [sfc] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x103/0x1f0 pci_stop_bus_device+0x69/0x90 pci_stop_and_remove_bus_device+0xe/0x20 pci_iov_remove_virtfn+0xba/0x120 sriov_disable+0x2f/0xe0 efx_ef10_pci_sriov_disable+0x52/0x80 [sfc] ? pcie_aer_is_native+0x12/0x40 efx_ef10_sriov_fini+0x72/0x110 [sfc] efx_pci_remove+0x62/0x150 [sfc] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x103/0x1f0 unbind_store+0xf6/0x130 kernfs_fop_write+0x116/0x190 vfs_write+0xa5/0x1a0 ksys_write+0x4f/0xb0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22vxlan: add missing rcu_read_lock() in neigh_reduce()Eric Dumazet
syzbot complained in neigh_reduce(), because rcu_read_lock_bh() is treated differently than rcu_read_lock() WARNING: suspicious RCU usage 5.13.0-rc6-syzkaller #0 Not tainted ----------------------------- include/net/addrconf.h:313 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by kworker/0:0/5: #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline] #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline] #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline] #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline] #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x871/0x1600 kernel/workqueue.c:2247 #1: ffffc90000ca7da8 ((work_completion)(&port->wq)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1600 kernel/workqueue.c:2251 #2: ffffffff8bf795c0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1da/0x3130 net/core/dev.c:4180 stack backtrace: CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.13.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events ipvlan_process_multicast Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 __in6_dev_get include/net/addrconf.h:313 [inline] __in6_dev_get include/net/addrconf.h:311 [inline] neigh_reduce drivers/net/vxlan.c:2167 [inline] vxlan_xmit+0x34d5/0x4c30 drivers/net/vxlan.c:2919 __netdev_start_xmit include/linux/netdevice.h:4944 [inline] netdev_start_xmit include/linux/netdevice.h:4958 [inline] xmit_one net/core/dev.c:3654 [inline] dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3670 __dev_queue_xmit+0x2133/0x3130 net/core/dev.c:4246 ipvlan_process_multicast+0xa99/0xd70 drivers/net/ipvlan/ipvlan_core.c:287 process_one_work+0x98d/0x1600 kernel/workqueue.c:2276 worker_thread+0x64c/0x1120 kernel/workqueue.c:2422 kthread+0x3b1/0x4a0 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Fixes: f564f45c4518 ("vxlan: add ipv6 proxy support") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22bpf: Fix regression on BPF_OBJ_GET with non-O_RDWR flagsMaciej Żenczykowski
This reverts commit d37300ed1821 ("bpf: program: Refuse non-O_RDWR flags in BPF_OBJ_GET"). It breaks Android userspace which expects to be able to fetch programs with just read permissions. See: https://cs.android.com/android/platform/superproject/+/master:frameworks/libs/net/common/native/bpf_syscall_wrappers/include/BpfSyscallWrappers.h;drc=7005c764be23d31fa1d69e826b4a2f6689a8c81e;l=124 Side-note: another option to fix it would be to extend bpf_prog_new_fd() and to pass in used file mode flags in the same way as we do for maps via bpf_map_new_fd(). Meaning, they'd end up in anon_inode_getfd() and thus would be retained for prog fd operations with bpf() syscall. Right now these flags are not checked with progs since they are immutable for their lifetime (as opposed to maps which can be updated from user space). In future this could potentially change with new features, but at that point it's still fine to do the bpf_prog_new_fd() extension when needed. For a simple stable fix, a revert is less churn. Fixes: d37300ed1821 ("bpf: program: Refuse non-O_RDWR flags in BPF_OBJ_GET") Signed-off-by: Maciej Żenczykowski <maze@google.com> [ Daniel: added side-note to commit message ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Acked-by: Greg Kroah-Hartman <gregkh@google.com> Link: https://lore.kernel.org/bpf/20210618105526.265003-1-zenczykowski@gmail.com
2021-06-22netfilter: nf_tables: do not allow to delete table with owner by handlePablo Neira Ayuso
nft_table_lookup_byhandle() also needs to validate the netlink PortID owner when deleting a table by handle. Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-22netfilter: nf_tables: skip netlink portID validation if zeroPablo Neira Ayuso
nft_table_lookup() allows us to obtain the table object by the name and the family. The netlink portID validation needs to be skipped for the dump path, since the ownership only applies to commands to update the given table. Skip validation if the specified netlink PortID is zero when calling nft_table_lookup(). Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-22xfrm: Fix xfrm offload fallback fail caseAyush Sawal
In case of xfrm offload, if xdo_dev_state_add() of driver returns -EOPNOTSUPP, xfrm offload fallback is failed. In xfrm state_add() both xso->dev and xso->real_dev are initialized to dev and when err(-EOPNOTSUPP) is returned only xso->dev is set to null. So in this scenario the condition in func validate_xmit_xfrm(), if ((x->xso.dev != dev) && (x->xso.real_dev == dev)) return skb; returns true, due to which skb is returned without calling esp_xmit() below which has fallback code. Hence the CRYPTO_FALLBACK is failing. So fixing this with by keeping x->xso.real_dev as NULL when err is returned in func xfrm_dev_state_add(). Fixes: bdfd2d1fa79a ("bonding/xfrm: use real_dev instead of slave_dev") Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21pkt_sched: sch_qfq: fix qfq_change_class() error pathEric Dumazet
If qfq_change_class() is unable to allocate memory for qfq_aggregate, it frees the class that has been inserted in the class hash table, but does not unhash it. Defer the insertion after the problematic allocation. BUG: KASAN: use-after-free in hlist_add_head include/linux/list.h:884 [inline] BUG: KASAN: use-after-free in qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731 Write of size 8 at addr ffff88814a534f10 by task syz-executor.4/31478 CPU: 0 PID: 31478 Comm: syz-executor.4 Not tainted 5.13.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436 hlist_add_head include/linux/list.h:884 [inline] qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731 qfq_change_class+0x96c/0x1990 net/sched/sch_qfq.c:489 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665d9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdc7b5f0188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665d9 RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003 RBP: 00007fdc7b5f01d0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 00007ffcf7310b3f R14: 00007fdc7b5f0300 R15: 0000000000022000 Allocated by task 31445: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:428 [inline] ____kasan_kmalloc mm/kasan/common.c:507 [inline] ____kasan_kmalloc mm/kasan/common.c:466 [inline] __kasan_kmalloc+0x9b/0xd0 mm/kasan/common.c:516 kmalloc include/linux/slab.h:556 [inline] kzalloc include/linux/slab.h:686 [inline] qfq_change_class+0x705/0x1990 net/sched/sch_qfq.c:464 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 31445: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357 ____kasan_slab_free mm/kasan/common.c:360 [inline] ____kasan_slab_free mm/kasan/common.c:325 [inline] __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368 kasan_slab_free include/linux/kasan.h:212 [inline] slab_free_hook mm/slub.c:1583 [inline] slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1608 slab_free mm/slub.c:3168 [inline] kfree+0xe5/0x7f0 mm/slub.c:4212 qfq_change_class+0x10fb/0x1990 net/sched/sch_qfq.c:518 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88814a534f00 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 16 bytes inside of 128-byte region [ffff88814a534f00, ffff88814a534f80) The buggy address belongs to the page: page:ffffea0005294d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14a534 flags: 0x57ff00000000200(slab|node=1|zone=2|lastcpupid=0x7ff) raw: 057ff00000000200 ffffea00004fee00 0000000600000006 ffff8880110418c0 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 29797, ts 604817765317, free_ts 604810151744 prep_new_page mm/page_alloc.c:2358 [inline] get_page_from_freelist+0x1033/0x2b60 mm/page_alloc.c:3994 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5200 alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272 alloc_slab_page mm/slub.c:1646 [inline] allocate_slab+0x2c5/0x4c0 mm/slub.c:1786 new_slab mm/slub.c:1849 [inline] new_slab_objects mm/slub.c:2595 [inline] ___slab_alloc+0x4a1/0x810 mm/slub.c:2758 __slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2798 slab_alloc_node mm/slub.c:2880 [inline] slab_alloc mm/slub.c:2922 [inline] __kmalloc+0x315/0x330 mm/slub.c:4050 kmalloc include/linux/slab.h:561 [inline] kzalloc include/linux/slab.h:686 [inline] __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1318 mpls_dev_sysctl_register+0x1b7/0x2d0 net/mpls/af_mpls.c:1421 mpls_add_dev net/mpls/af_mpls.c:1472 [inline] mpls_dev_notify+0x214/0x8b0 net/mpls/af_mpls.c:1588 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2121 call_netdevice_notifiers_extack net/core/dev.c:2133 [inline] call_netdevice_notifiers net/core/dev.c:2147 [inline] register_netdevice+0x106b/0x1500 net/core/dev.c:10312 veth_newlink+0x585/0xac0 drivers/net/veth.c:1547 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3452 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1298 [inline] free_pcp_prepare+0x223/0x300 mm/page_alloc.c:1342 free_unref_page_prepare mm/page_alloc.c:3250 [inline] free_unref_page+0x12/0x1d0 mm/page_alloc.c:3298 __vunmap+0x783/0xb60 mm/vmalloc.c:2566 free_work+0x58/0x70 mm/vmalloc.c:80 process_one_work+0x98d/0x1600 kernel/workqueue.c:2276 worker_thread+0x64c/0x1120 kernel/workqueue.c:2422 kthread+0x3b1/0x4a0 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Memory state around the buggy address: ffff88814a534e00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88814a534e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88814a534f00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88814a534f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88814a535000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fixes: 462dbc9101acd ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21net: dsa: mv88e6xxx: Fix adding vlan 0Eldar Gasanov
8021q module adds vlan 0 to all interfaces when it starts. When 8021q module is loaded it isn't possible to create bond with mv88e6xxx interfaces, bonding module dipslay error "Couldn't add bond vlan ids", because it tries to add vlan 0 to slave interfaces. There is unexpected behavior in the switch. When a PVID is assigned to a port the switch changes VID to PVID in ingress frames with VID 0 on the port. Expected that the switch doesn't assign PVID to tagged frames with VID 0. But there isn't a way to change this behavior in the switch. Fixes: 57e661aae6a8 ("net: dsa: mv88e6xxx: Link aggregation support") Signed-off-by: Eldar Gasanov <eldargasanov2@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21vsock: notify server to shutdown when client has pending signalLongpeng(Mike)
The client's sk_state will be set to TCP_ESTABLISHED if the server replay the client's connect request. However, if the client has pending signal, its sk_state will be set to TCP_CLOSE without notify the server, so the server will hold the corrupt connection. client server 1. sk_state=TCP_SYN_SENT | 2. call ->connect() | 3. wait reply | | 4. sk_state=TCP_ESTABLISHED | 5. insert to connected list | 6. reply to the client 7. sk_state=TCP_ESTABLISHED | 8. insert to connected list | 9. *signal pending* <--------------------- the user kill client 10. sk_state=TCP_CLOSE | client is exiting... | 11. call ->release() | virtio_transport_close if (!(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_CLOSING)) return true; *return at here, the server cannot notice the connection is corrupt* So the client should notify the peer in this case. Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Norbert Slusarek <nslusarek@gmx.net> Cc: Andra Paraschiv <andraprs@amazon.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Brazdil <dbrazdil@google.com> Cc: Alexander Popov <alex.popov@linux.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lkml.org/lkml/2021/5/17/418 Signed-off-by: lixianming <lixianming5@huawei.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21net: mana: Fix a memory leak in an error handling path in 'mana_create_txq()'Christophe JAILLET
If this test fails we must free some resources as in all the other error handling paths of this function. Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21Merge branch 'nnicstar-fixes'David S. Miller
Zheyu Ma says: ==================== atm: nicstar: fix two bugs about error handling Zheyu Ma (2): atm: nicstar: use 'dma_free_coherent' instead of 'kfree' atm: nicstar: register the interrupt handler in the right place ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21atm: nicstar: register the interrupt handler in the right placeZheyu Ma
Because the error handling is sequential, the application of resources should be carried out in the order of error handling, so the operation of registering the interrupt handler should be put in front, so as not to free the unregistered interrupt handler during error handling. This log reveals it: [ 3.438724] Trying to free already-free IRQ 23 [ 3.439060] WARNING: CPU: 5 PID: 1 at kernel/irq/manage.c:1825 free_irq+0xfb/0x480 [ 3.440039] Modules linked in: [ 3.440257] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #142 [ 3.440793] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 3.441561] RIP: 0010:free_irq+0xfb/0x480 [ 3.441845] Code: 6e 08 74 6f 4d 89 f4 e8 c3 78 09 00 4d 8b 74 24 18 4d 85 f6 75 e3 e8 b4 78 09 00 8b 75 c8 48 c7 c7 a0 ac d5 85 e8 95 d7 f5 ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 87 c5 90 03 48 8b 43 40 4c 8b a0 80 [ 3.443121] RSP: 0000:ffffc90000017b50 EFLAGS: 00010086 [ 3.443483] RAX: 0000000000000000 RBX: ffff888107c6f000 RCX: 0000000000000000 [ 3.443972] RDX: 0000000000000000 RSI: ffffffff8123f301 RDI: 00000000ffffffff [ 3.444462] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000003 [ 3.444950] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 3.444994] R13: ffff888107dc0000 R14: ffff888104f6bf00 R15: ffff888107c6f0a8 [ 3.444994] FS: 0000000000000000(0000) GS:ffff88817bd40000(0000) knlGS:0000000000000000 [ 3.444994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3.444994] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0 [ 3.444994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3.444994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3.444994] Call Trace: [ 3.444994] ns_init_card_error+0x18e/0x250 [ 3.444994] nicstar_init_one+0x10d2/0x1130 [ 3.444994] local_pci_probe+0x4a/0xb0 [ 3.444994] pci_device_probe+0x126/0x1d0 [ 3.444994] ? pci_device_remove+0x100/0x100 [ 3.444994] really_probe+0x27e/0x650 [ 3.444994] driver_probe_device+0x84/0x1d0 [ 3.444994] ? mutex_lock_nested+0x16/0x20 [ 3.444994] device_driver_attach+0x63/0x70 [ 3.444994] __driver_attach+0x117/0x1a0 [ 3.444994] ? device_driver_attach+0x70/0x70 [ 3.444994] bus_for_each_dev+0xb6/0x110 [ 3.444994] ? rdinit_setup+0x40/0x40 [ 3.444994] driver_attach+0x22/0x30 [ 3.444994] bus_add_driver+0x1e6/0x2a0 [ 3.444994] driver_register+0xa4/0x180 [ 3.444994] __pci_register_driver+0x77/0x80 [ 3.444994] ? uPD98402_module_init+0xd/0xd [ 3.444994] nicstar_init+0x1f/0x75 [ 3.444994] do_one_initcall+0x7a/0x3d0 [ 3.444994] ? rdinit_setup+0x40/0x40 [ 3.444994] ? rcu_read_lock_sched_held+0x4a/0x70 [ 3.444994] kernel_init_freeable+0x2a7/0x2f9 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] kernel_init+0x13/0x180 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] ret_from_fork+0x1f/0x30 [ 3.444994] Kernel panic - not syncing: panic_on_warn set ... [ 3.444994] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #142 [ 3.444994] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 3.444994] Call Trace: [ 3.444994] dump_stack+0xba/0xf5 [ 3.444994] ? free_irq+0xfb/0x480 [ 3.444994] panic+0x155/0x3ed [ 3.444994] ? __warn+0xed/0x150 [ 3.444994] ? free_irq+0xfb/0x480 [ 3.444994] __warn+0x103/0x150 [ 3.444994] ? free_irq+0xfb/0x480 [ 3.444994] report_bug+0x119/0x1c0 [ 3.444994] handle_bug+0x3b/0x80 [ 3.444994] exc_invalid_op+0x18/0x70 [ 3.444994] asm_exc_invalid_op+0x12/0x20 [ 3.444994] RIP: 0010:free_irq+0xfb/0x480 [ 3.444994] Code: 6e 08 74 6f 4d 89 f4 e8 c3 78 09 00 4d 8b 74 24 18 4d 85 f6 75 e3 e8 b4 78 09 00 8b 75 c8 48 c7 c7 a0 ac d5 85 e8 95 d7 f5 ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 87 c5 90 03 48 8b 43 40 4c 8b a0 80 [ 3.444994] RSP: 0000:ffffc90000017b50 EFLAGS: 00010086 [ 3.444994] RAX: 0000000000000000 RBX: ffff888107c6f000 RCX: 0000000000000000 [ 3.444994] RDX: 0000000000000000 RSI: ffffffff8123f301 RDI: 00000000ffffffff [ 3.444994] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000003 [ 3.444994] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 3.444994] R13: ffff888107dc0000 R14: ffff888104f6bf00 R15: ffff888107c6f0a8 [ 3.444994] ? vprintk_func+0x71/0x110 [ 3.444994] ns_init_card_error+0x18e/0x250 [ 3.444994] nicstar_init_one+0x10d2/0x1130 [ 3.444994] local_pci_probe+0x4a/0xb0 [ 3.444994] pci_device_probe+0x126/0x1d0 [ 3.444994] ? pci_device_remove+0x100/0x100 [ 3.444994] really_probe+0x27e/0x650 [ 3.444994] driver_probe_device+0x84/0x1d0 [ 3.444994] ? mutex_lock_nested+0x16/0x20 [ 3.444994] device_driver_attach+0x63/0x70 [ 3.444994] __driver_attach+0x117/0x1a0 [ 3.444994] ? device_driver_attach+0x70/0x70 [ 3.444994] bus_for_each_dev+0xb6/0x110 [ 3.444994] ? rdinit_setup+0x40/0x40 [ 3.444994] driver_attach+0x22/0x30 [ 3.444994] bus_add_driver+0x1e6/0x2a0 [ 3.444994] driver_register+0xa4/0x180 [ 3.444994] __pci_register_driver+0x77/0x80 [ 3.444994] ? uPD98402_module_init+0xd/0xd [ 3.444994] nicstar_init+0x1f/0x75 [ 3.444994] do_one_initcall+0x7a/0x3d0 [ 3.444994] ? rdinit_setup+0x40/0x40 [ 3.444994] ? rcu_read_lock_sched_held+0x4a/0x70 [ 3.444994] kernel_init_freeable+0x2a7/0x2f9 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] kernel_init+0x13/0x180 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] ret_from_fork+0x1f/0x30 [ 3.444994] Dumping ftrace buffer: [ 3.444994] (ftrace buffer empty) [ 3.444994] Kernel Offset: disabled [ 3.444994] Rebooting in 1 seconds.. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21atm: nicstar: use 'dma_free_coherent' instead of 'kfree'Zheyu Ma
When 'nicstar_init_one' fails, 'ns_init_card_error' will be executed for error handling, but the correct memory free function should be used, otherwise it will cause an error. Since 'card->rsq.org' and 'card->tsq.org' are allocated using 'dma_alloc_coherent' function, they should be freed using 'dma_free_coherent'. Fix this by using 'dma_free_coherent' instead of 'kfree' This log reveals it: [ 3.440294] kernel BUG at mm/slub.c:4206! [ 3.441059] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 3.441430] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #141 [ 3.441986] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 3.442780] RIP: 0010:kfree+0x26a/0x300 [ 3.443065] Code: e8 3a c3 b9 ff e9 d6 fd ff ff 49 8b 45 00 31 db a9 00 00 01 00 75 4d 49 8b 45 00 a9 00 00 01 00 75 0a 49 8b 45 08 a8 01 75 02 <0f> 0b 89 d9 b8 00 10 00 00 be 06 00 00 00 48 d3 e0 f7 d8 48 63 d0 [ 3.443396] RSP: 0000:ffffc90000017b70 EFLAGS: 00010246 [ 3.443396] RAX: dead000000000100 RBX: 0000000000000000 RCX: 0000000000000000 [ 3.443396] RDX: 0000000000000000 RSI: ffffffff85d3df94 RDI: ffffffff85df38e6 [ 3.443396] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000001 [ 3.443396] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888107dc0000 [ 3.443396] R13: ffffea00001f0100 R14: ffff888101a8bf00 R15: ffff888107dc0160 [ 3.443396] FS: 0000000000000000(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000 [ 3.443396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3.443396] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0 [ 3.443396] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3.443396] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3.443396] Call Trace: [ 3.443396] ns_init_card_error+0x12c/0x220 [ 3.443396] nicstar_init_one+0x10d2/0x1130 [ 3.443396] local_pci_probe+0x4a/0xb0 [ 3.443396] pci_device_probe+0x126/0x1d0 [ 3.443396] ? pci_device_remove+0x100/0x100 [ 3.443396] really_probe+0x27e/0x650 [ 3.443396] driver_probe_device+0x84/0x1d0 [ 3.443396] ? mutex_lock_nested+0x16/0x20 [ 3.443396] device_driver_attach+0x63/0x70 [ 3.443396] __driver_attach+0x117/0x1a0 [ 3.443396] ? device_driver_attach+0x70/0x70 [ 3.443396] bus_for_each_dev+0xb6/0x110 [ 3.443396] ? rdinit_setup+0x40/0x40 [ 3.443396] driver_attach+0x22/0x30 [ 3.443396] bus_add_driver+0x1e6/0x2a0 [ 3.443396] driver_register+0xa4/0x180 [ 3.443396] __pci_register_driver+0x77/0x80 [ 3.443396] ? uPD98402_module_init+0xd/0xd [ 3.443396] nicstar_init+0x1f/0x75 [ 3.443396] do_one_initcall+0x7a/0x3d0 [ 3.443396] ? rdinit_setup+0x40/0x40 [ 3.443396] ? rcu_read_lock_sched_held+0x4a/0x70 [ 3.443396] kernel_init_freeable+0x2a7/0x2f9 [ 3.443396] ? rest_init+0x2c0/0x2c0 [ 3.443396] kernel_init+0x13/0x180 [ 3.443396] ? rest_init+0x2c0/0x2c0 [ 3.443396] ? rest_init+0x2c0/0x2c0 [ 3.443396] ret_from_fork+0x1f/0x30 [ 3.443396] Modules linked in: [ 3.443396] Dumping ftrace buffer: [ 3.443396] (ftrace buffer empty) [ 3.458593] ---[ end trace 3c6f8f0d8ef59bcd ]--- [ 3.458922] RIP: 0010:kfree+0x26a/0x300 [ 3.459198] Code: e8 3a c3 b9 ff e9 d6 fd ff ff 49 8b 45 00 31 db a9 00 00 01 00 75 4d 49 8b 45 00 a9 00 00 01 00 75 0a 49 8b 45 08 a8 01 75 02 <0f> 0b 89 d9 b8 00 10 00 00 be 06 00 00 00 48 d3 e0 f7 d8 48 63 d0 [ 3.460499] RSP: 0000:ffffc90000017b70 EFLAGS: 00010246 [ 3.460870] RAX: dead000000000100 RBX: 0000000000000000 RCX: 0000000000000000 [ 3.461371] RDX: 0000000000000000 RSI: ffffffff85d3df94 RDI: ffffffff85df38e6 [ 3.461873] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000001 [ 3.462372] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888107dc0000 [ 3.462871] R13: ffffea00001f0100 R14: ffff888101a8bf00 R15: ffff888107dc0160 [ 3.463368] FS: 0000000000000000(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000 [ 3.463949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3.464356] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0 [ 3.464856] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3.465356] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3.465860] Kernel panic - not syncing: Fatal exception [ 3.466370] Dumping ftrace buffer: [ 3.466616] (ftrace buffer empty) [ 3.466871] Kernel Offset: disabled [ 3.467122] Rebooting in 1 seconds.. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21Merge branch 'mptcp-sdeq-fixes'David S. Miller
Mat Martineau says: ==================== mptcp: 32-bit sequence number improvements MPTCP-level sequence numbers are 64 bits, but RFC 8684 allows use of 32-bit sequence numbers in the DSS option to save header space. Those 32-bit numbers are the least significant bits of the full 64-bit sequence number, so the receiver must infer the correct upper 32 bits. These two patches improve the logic for determining the full 64-bit sequence numbers when the 32-bit truncated version has wrapped around. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21mptcp: fix 32 bit DSN expansionPaolo Abeni
The current implementation of 32 bit DSN expansion is buggy. After the previous patch, we can simply reuse the newly introduced helper to do the expansion safely. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/120 Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>