Age | Commit message (Collapse) | Author |
|
Use l2tp_tunnel_get() instead of l2tp_tunnel_find() so that we get
a reference on the tunnel, preventing l2tp_tunnel_destruct() from
freeing it from under us.
Also move l2tp_tunnel_get() below nlmsg_new() so that we only take
the reference when needed.
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to make sure the tunnel is not going to be destroyed by
l2tp_tunnel_destruct() concurrently.
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
l2tp_nl_cmd_tunnel_delete() needs to take a reference on the tunnel, to
prevent it from being concurrently freed by l2tp_tunnel_destruct().
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
l2tp_tunnel_find() doesn't take a reference on the returned tunnel.
Therefore, it's unsafe to use it because the returned tunnel can go
away on us anytime.
Fix this by defining l2tp_tunnel_get(), which works like
l2tp_tunnel_find(), but takes a reference on the returned tunnel.
Caller then has to drop this reference using l2tp_tunnel_dec_refcount().
As l2tp_tunnel_dec_refcount() needs to be moved to l2tp_core.h, let's
simplify the patch and not move the L2TP_REFCNT_DEBUG part. This code
has been broken (not even compiling) in May 2012 by
commit a4ca44fa578c ("net: l2tp: Standardize logging styles")
and fixed more than two years later by
commit 29abe2fda54f ("l2tp: fix missing line continuation"). So it
doesn't appear to be used by anyone.
Same thing for l2tp_tunnel_free(); instead of moving it to l2tp_core.h,
let's just simplify things and call kfree_rcu() directly in
l2tp_tunnel_dec_refcount(). Extra assertions and debugging code
provided by l2tp_tunnel_free() didn't help catching any of the
reference counting and socket handling issues found while working on
this series.
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sessions must be fully initialised before calling
l2tp_session_add_to_tunnel(). Otherwise, there's a short time frame
where partially initialised sessions can be accessed by external users.
Fixes: dbdbc73b4478 ("l2tp: fix duplicate session creation")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mac address is only retrieved from h/w when using PPv2.1. Otherwise
the variable holding it is still checked and used if it contains a valid
value. As the variable isn't initialized to an invalid mac address
value, we end up with random mac addresses which can be the same for all
the ports handled by this PPv2 driver.
Fixes this by initializing the h/w mac address variable to {0}, which is
an invalid mac address value. This way the random assignation fallback
is called and all ports end up with their own addresses.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Fixes: 2697582144dd ("net: mvpp2: handle misc PPv2.1/PPv2.2 differences")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The u-blox TOBY-L4 is a LTE Advanced (Cat 6) module with HSPA+ and 2G
fallback.
Unlike the TOBY-L2, this module has one single USB layout and exposes
several TTYs for control and a NCM interface for data. Connecting this
module may be done just by activating the desired PDP context with
'AT+CGACT=1,<cid>' and then running DHCP on the NCM interface.
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Noticed that busy_poll_stop() also invoke the drivers napi->poll()
function pointer, but didn't have an associated call to trace_napi_poll()
like all other call sites.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.13
Only one iwlwifi patch this time.
iwlwifi
* fix multiple times reported lockdep warning found by new locking
annotation introduced in v4.13-rc1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, in the udp6 code, the dst cookie is not initialized/updated
concurrently with the RX dst used by early demux.
As a result, the dst_check() in the early_demux path always fails,
the rx dst cache is always invalidated, and we can't really
leverage significant gain from the demux lookup.
Fix it adding udp6 specific variant of sk_rx_dst_set() and use it
to set the dst cookie when the dst entry is really changed.
The issue is there since the introduction of early demux for ipv6.
Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Check memory allocation failure and return -ENOMEM in such a case, as
already done few lines below for another memory allocation.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Florian Fainelli says:
====================
r8169: Be drop monitor friendly
First patch may be questionable but no other driver appears to be
doing that and while it is defendable to account for left packets as
dropped during TX clean, this appears misleading. I picked Stanislaw
changes which brings us back to 2010, but this was present from
pre-git days as well.
Second patch fixes the two missing calls to dev_consume_skb_any().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rtl_tx() is the TX reclamation process whereas rtl8169_tx_clear_range() does
the TX ring cleaning during shutdown, both of these functions should call
dev_consume_skb_any() to be drop monitor friendly.
Fixes: cac4b22f3d6a ("r8169: do not account fragments as packets")
Fixes: eb781397904e ("r8169: Do not use dev_kfree_skb in xmit path")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rtl8169_tx_clear_range() is responsible for cleaning up the TX ring
during interface shutdown, incrementing tx_dropped for every SKB that we
left at the time in the ring is misleading.
Fixes: cac4b22f3d6a ("r8169: do not account fragments as packets")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are a few bugs around refcnt handling in the new BPF congestion
control setsockopt:
- The new ca is assigned to icsk->icsk_ca_ops even in the case where we
cannot get a reference on it. This would lead to a use after free,
since that ca is going away soon.
- Changing the congestion control case doesn't release the refcnt on
the previous ca.
- In the reinit case, we first leak a reference on the old ca, then we
call tcp_reinit_congestion_control on the ca that we have just
assigned, leading to deinitializing the wrong ca (->release of the
new ca on the old ca's data) and releasing the refcount on the ca
that we actually want to use.
This is visible by building (for example) BIC as a module and setting
net.ipv4.tcp_congestion_control=bic, and using tcp_cong_kern.c from
samples/bpf.
This patch fixes the refcount issues, and moves reinit back into tcp
core to avoid passing a ca pointer back to BPF.
Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rt_cookie might be used uninitialized, fix this by
initializing it.
Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a deadlock possible when canceling the link status
delayed work queue. The removal process is run with RTNL held,
and the link status callback is acquring RTNL.
Resolve the issue by using trylock and rescheduling.
If cancel is in process, that block it from happening.
Fixes: 122a5f6410f4 ("staging: hv: use delayed_work for netvsc_send_garp()")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Parthasarathy Bhuvaragan says:
====================
tipc: buffer reassignment fixes
This series contains fixes for buffer reassignments and a context imbalance.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If we fail to find a valid bearer in tipc_node_get_linkname(),
node_read_unlock() is called without holding the node read lock.
This commit fixes this error.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In tipc_msg_reverse(), we assign skb attributes to local pointers
in stack at startup. This is followed by skb_linearize() and for
cloned buffers we perform skb relocation using pskb_expand_head().
Both these methods may update the skb attributes and thus making
the pointers incorrect.
In this commit, we fix this error by ensuring that the pointers
are re-assigned after any of these skb operations.
Fixes: 29042e19f2c60 ("tipc: let function tipc_msg_reverse() expand header
when needed")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In tipc_rcv(), we linearize only the header and usually the packets
are consumed as the nodes permit direct reception. However, if the
skb contains tunnelled message due to fail over or synchronization
we parse it in tipc_node_check_state() without performing
linearization. This will cause link disturbances if the skb was
non linear.
In this commit, we perform linearization for the above messages.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzkaller reported a refcount_t warning [1]
Issue here is that noop_qdisc refcnt was never really considered as
a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set :
if (qdisc->flags & TCQ_F_BUILTIN ||
!refcount_dec_and_test(&qdisc->refcnt)))
return;
Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not
really needed, but harmless until refcount_t came.
To fix this problem, we simply need to not increment noop_qdisc.refcnt,
since we never decrement it.
[1]
refcount_t: increment on 0; use-after-free.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ #20
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
panic+0x1e4/0x417 kernel/panic.c:180
__warn+0x1c4/0x1d9 kernel/panic.c:541
report_bug+0x211/0x2d0 lib/bug.c:183
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152
RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282
RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000
RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8
RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0
R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000
attach_default_qdiscs net/sched/sch_generic.c:792 [inline]
dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833
__dev_open+0x227/0x330 net/core/dev.c:1380
__dev_change_flags+0x695/0x990 net/core/dev.c:6726
dev_change_flags+0x88/0x140 net/core/dev.c:6792
dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256
dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554
sock_do_ioctl+0x94/0xb0 net/socket.c:968
sock_ioctl+0x2c2/0x440 net/socket.c:1058
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
Fixes: 7b9364050246 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Reshetova, Elena <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case bcm_sysport_init_tx_ring() is not able to allocate ring->cbs, we
would return with an error, and call bcm_sysport_fini_tx_ring() and it
would see that ring->cbs is NULL and do nothing. This would leak the
coherent DMA descriptor area, so we need to free it on error before
returning.
Reported-by: Eric Dumazet <edumazet@gmail.com>
Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are 3 spots where we call dev_kfree_skb() but we are actually
just doing a normal SKB consumption: __bcmgenet_tx_reclaim() for normal
TX reclamation, bcmgenet_alloc_rx_buffers() during the initial RX ring
setup and bcmgenet_free_rx_buffers() during RX ring cleanup.
Fixes: d6707bec5986 ("net: bcmgenet: rewrite bcmgenet_rx_refill()")
Fixes: f48bed16a756 ("net: bcmgenet: Free skb after last Tx frag")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes a bug causing any sock operations to always return EINVAL.
Fixes: a5192c52377e ("bpf: fix to bpf_setsockops").
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Craig Gallek <kraig@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Utilize dev_consume_skb_any(cb->skb) in bcm_sysport_free_cb() which is
used when a TX packet is completed, as well as when the RX ring is
cleaned on shutdown. None of these two cases are packet drops, so be
drop monitor friendly.
Suggested-by: Eric Dumazet <edumazet@gmail.com>
Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In 9dbbfb0ab6680c6a85609041011484e6658e7d3c function tipc_sk_reinit
had additional logic added to loop in the event that function
rhashtable_walk_next() returned -EAGAIN. No worries.
However, if rhashtable_walk_start returns -EAGAIN, it does "continue",
and therefore skips the call to rhashtable_walk_stop(). That has
the effect of calling rcu_read_lock() without its paired call to
rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem
may not be apparent for a while, especially since resize events may
be rare. But the comments to rhashtable_walk_start() state:
* ...Note that we take the RCU lock in all
* cases including when we return an error. So you must always call
* rhashtable_walk_stop to clean up.
This patch replaces the continue with a goto and label to ensure a
matching call to rhashtable_walk_stop().
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
gcc-8.0.0 (snapshot) points out that we copy a variable-length string
into a fixed length field using memcpy() with the destination length,
and that ends up copying whatever follows the string:
inlined from 'ql_core_dump' at drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:1106:2:
drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:708:2: error: 'memcpy' reading 15 bytes from a region of size 14 [-Werror=stringop-overflow=]
memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);
Changing it to use strncpy() will instead zero-pad the destination,
which seems to be the right thing to do here.
The bug is probably harmless, but it seems like a good idea to address
it in stable kernels as well, if only for the purpose of building with
gcc-8 without warnings.
Fixes: a61f80261306 ("qlge: Add ethtool register dump function.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change is needed to not fool drop monitor.
(perf record ... -e skb:kfree_skb )
Packets were properly sent and are consumed after TX completion.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Fix use after free of struct proc_dir_entry in ipt_CLUSTERIP, patch
from Sabrina Dubroca.
2) Fix spurious EINVAL errors from iptables over nft compatibility layer.
3) Reload pointer to ip header only if there is non-terminal verdict,
ie. XT_CONTINUE, otherwise invalid memory access may happen, patch
from Taehee Yoo.
4) Fix interaction between SYNPROXY and NAT, SYNPROXY adds sequence
adjustment already, however from nf_nat_setup() assumes there's not.
Patch from Xin Long.
5) Fix burst arithmetics in nft_limit as Joe Stringer mentioned during
NFWS in Faro. Patch from Andy Zhou.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current implementation treats the burst configuration the same as
rate configuration. This can cause the per packet cost to be lower
than configured. In effect, this bug causes the token bucket to be
refilled at a higher rate than what user has specified.
This patch changes the implementation so that the token bucket size
is controlled by "rate + burst", while maintain the token bucket
refill rate the same as user specified.
Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Commit 4440a2ab3b9f ("netfilter: synproxy: Check oom when adding synproxy
and seqadj ct extensions") wanted to drop the packet when it fails to add
seqadj ext due to no memory by checking if nfct_seqadj_ext_add returns
NULL.
But that nfct_seqadj_ext_add returns NULL can also happen when seqadj ext
already exists in a nf_conn. It will cause that userspace protocol doesn't
work when both dnat and snat are configured.
Li Shuang found this issue in the case:
Topo:
ftp client router ftp server
10.167.131.2 <-> 10.167.131.254 10.167.141.254 <-> 10.167.141.1
Rules:
# iptables -t nat -A PREROUTING -i eth1 -p tcp -m tcp --dport 21 -j \
DNAT --to-destination 10.167.141.1
# iptables -t nat -A POSTROUTING -o eth2 -p tcp -m tcp --dport 21 -j \
SNAT --to-source 10.167.141.254
In router, when both dnat and snat are added, nf_nat_setup_info will be
called twice. The packet can be dropped at the 2nd time for DNAT due to
seqadj ext is already added at the 1st time for SNAT.
This patch is to fix it by checking for seqadj ext existence before adding
it, so that the packet will not be dropped if seqadj ext already exists.
Note that as Florian mentioned, as a long term, we should review ext_add()
behaviour, it's better to return a pointer to the existing ext instead.
Fixes: 4440a2ab3b9f ("netfilter: synproxy: Check oom when adding synproxy and seqadj ct extensions")
Reported-by: Li Shuang <shuali@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Work queues cannot be allocated when a mutex is held because the mutex
may be in use and that would make it sleep. Doing so generates the
following splat with 4.13+:
[ 19.513298] ======================================================
[ 19.513429] WARNING: possible circular locking dependency detected
[ 19.513557] 4.13.0-rc5+ #6 Not tainted
[ 19.513638] ------------------------------------------------------
[ 19.513767] cpuhp/0/12 is trying to acquire lock:
[ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0
[ 19.514047]
[ 19.514047] but task is already holding lock:
[ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210
[ 19.514338]
[ 19.514338] which lock already depends on the new lock.
This lock dependency already existed with previous kernel versions,
but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link
lock stacks for hotplug callbacks") was introduced.
Reported-by: David Weinehall <david.weinehall@intel.com>
Reported-by: Jiri Kosina <jikos@kernel.org>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Michael Chan says:
====================
bnxt_en: bug fixes.
3 bug fixes related to XDP ring accounting in bnxt_setup_tc(), freeing
MSIX vectors when bnxt_re unregisters, and preserving the user-administered
PF MAC address when disabling SRIOV.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bnxt_hwrm_func_qcaps() is called during probe to get all device
resources and it also sets up the factory MAC address. The same function
is called when SRIOV is disabled to reclaim all resources. If
the MAC address has been overridden by a user administered MAC
address, calling this function will overwrite it.
Separate the logic that sets up the default MAC address into a new
function bnxt_init_mac_addr() that is only called during probe time.
Fixes: 4a21b49b34c0 ("bnxt_en: Improve VF resource accounting.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Take back ownership of the MSIX vectors when unregistering the device
from bnxt_re.
Fixes: a588e4580a7e ("bnxt_en: Add interface to support RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the number of TX rings is changed in bnxt_setup_tc(), we need to
include the XDP rings in the total TX ring count.
Fixes: 38413406277f ("bnxt_en: Add support for XDP_TX action.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TX completion may happen any time after HW queue was kicked.
We can't access the skb afterwards. Move the time stamping
before ringing the doorbell.
Fixes: 4c3523623dc0 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
inet_diag_msg_sctp{,l}addr_fill() and sctp_get_sctp_info() copy
sizeof(sockaddr_storage) bytes to fill in sockaddr structs used
to export diagnostic information to userspace.
However, the memory allocated to store sockaddr information is
smaller than that and depends on the address family, so we leak
up to 100 uninitialized bytes to userspace. Just use the size of
the source structs instead, in all the three cases this is what
userspace expects. Zero out the remaining memory.
Unused bytes (i.e. when IPv4 addresses are used) in source
structs sctp_sockaddr_entry and sctp_transport are already
cleared by sctp_add_bind_addr() and sctp_transport_new(),
respectively.
Noticed while testing KASAN-enabled kernel with 'ss':
[ 2326.885243] BUG: KASAN: slab-out-of-bounds in inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] at addr ffff881be8779800
[ 2326.896800] Read of size 128 by task ss/9527
[ 2326.901564] CPU: 0 PID: 9527 Comm: ss Not tainted 4.11.0-22.el7a.x86_64 #1
[ 2326.909236] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
[ 2326.917585] Call Trace:
[ 2326.920312] dump_stack+0x63/0x8d
[ 2326.924014] kasan_object_err+0x21/0x70
[ 2326.928295] kasan_report+0x288/0x540
[ 2326.932380] ? inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
[ 2326.938500] ? skb_put+0x8b/0xd0
[ 2326.942098] ? memset+0x31/0x40
[ 2326.945599] check_memory_region+0x13c/0x1a0
[ 2326.950362] memcpy+0x23/0x50
[ 2326.953669] inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
[ 2326.959596] ? inet_diag_msg_sctpasoc_fill+0x460/0x460 [sctp_diag]
[ 2326.966495] ? __lock_sock+0x102/0x150
[ 2326.970671] ? sock_def_wakeup+0x60/0x60
[ 2326.975048] ? remove_wait_queue+0xc0/0xc0
[ 2326.979619] sctp_diag_dump+0x44a/0x760 [sctp_diag]
[ 2326.985063] ? sctp_ep_dump+0x280/0x280 [sctp_diag]
[ 2326.990504] ? memset+0x31/0x40
[ 2326.994007] ? mutex_lock+0x12/0x40
[ 2326.997900] __inet_diag_dump+0x57/0xb0 [inet_diag]
[ 2327.003340] ? __sys_sendmsg+0x150/0x150
[ 2327.007715] inet_diag_dump+0x4d/0x80 [inet_diag]
[ 2327.012979] netlink_dump+0x1e6/0x490
[ 2327.017064] __netlink_dump_start+0x28e/0x2c0
[ 2327.021924] inet_diag_handler_cmd+0x189/0x1a0 [inet_diag]
[ 2327.028045] ? inet_diag_rcv_msg_compat+0x1b0/0x1b0 [inet_diag]
[ 2327.034651] ? inet_diag_dump_compat+0x190/0x190 [inet_diag]
[ 2327.040965] ? __netlink_lookup+0x1b9/0x260
[ 2327.045631] sock_diag_rcv_msg+0x18b/0x1e0
[ 2327.050199] netlink_rcv_skb+0x14b/0x180
[ 2327.054574] ? sock_diag_bind+0x60/0x60
[ 2327.058850] sock_diag_rcv+0x28/0x40
[ 2327.062837] netlink_unicast+0x2e7/0x3b0
[ 2327.067212] ? netlink_attachskb+0x330/0x330
[ 2327.071975] ? kasan_check_write+0x14/0x20
[ 2327.076544] netlink_sendmsg+0x5be/0x730
[ 2327.080918] ? netlink_unicast+0x3b0/0x3b0
[ 2327.085486] ? kasan_check_write+0x14/0x20
[ 2327.090057] ? selinux_socket_sendmsg+0x24/0x30
[ 2327.095109] ? netlink_unicast+0x3b0/0x3b0
[ 2327.099678] sock_sendmsg+0x74/0x80
[ 2327.103567] ___sys_sendmsg+0x520/0x530
[ 2327.107844] ? __get_locked_pte+0x178/0x200
[ 2327.112510] ? copy_msghdr_from_user+0x270/0x270
[ 2327.117660] ? vm_insert_page+0x360/0x360
[ 2327.122133] ? vm_insert_pfn_prot+0xb4/0x150
[ 2327.126895] ? vm_insert_pfn+0x32/0x40
[ 2327.131077] ? vvar_fault+0x71/0xd0
[ 2327.134968] ? special_mapping_fault+0x69/0x110
[ 2327.140022] ? __do_fault+0x42/0x120
[ 2327.144008] ? __handle_mm_fault+0x1062/0x17a0
[ 2327.148965] ? __fget_light+0xa7/0xc0
[ 2327.153049] __sys_sendmsg+0xcb/0x150
[ 2327.157133] ? __sys_sendmsg+0xcb/0x150
[ 2327.161409] ? SyS_shutdown+0x140/0x140
[ 2327.165688] ? exit_to_usermode_loop+0xd0/0xd0
[ 2327.170646] ? __do_page_fault+0x55d/0x620
[ 2327.175216] ? __sys_sendmsg+0x150/0x150
[ 2327.179591] SyS_sendmsg+0x12/0x20
[ 2327.183384] do_syscall_64+0xe3/0x230
[ 2327.187471] entry_SYSCALL64_slow_path+0x25/0x25
[ 2327.192622] RIP: 0033:0x7f41d18fa3b0
[ 2327.196608] RSP: 002b:00007ffc3b731218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 2327.205055] RAX: ffffffffffffffda RBX: 00007ffc3b731380 RCX: 00007f41d18fa3b0
[ 2327.213017] RDX: 0000000000000000 RSI: 00007ffc3b731340 RDI: 0000000000000003
[ 2327.220978] RBP: 0000000000000002 R08: 0000000000000004 R09: 0000000000000040
[ 2327.228939] R10: 00007ffc3b730f30 R11: 0000000000000246 R12: 0000000000000003
[ 2327.236901] R13: 00007ffc3b731340 R14: 00007ffc3b7313d0 R15: 0000000000000084
[ 2327.244865] Object at ffff881be87797e0, in cache kmalloc-64 size: 64
[ 2327.251953] Allocated:
[ 2327.254581] PID = 9484
[ 2327.257215] save_stack_trace+0x1b/0x20
[ 2327.261485] save_stack+0x46/0xd0
[ 2327.265179] kasan_kmalloc+0xad/0xe0
[ 2327.269165] kmem_cache_alloc_trace+0xe6/0x1d0
[ 2327.274138] sctp_add_bind_addr+0x58/0x180 [sctp]
[ 2327.279400] sctp_do_bind+0x208/0x310 [sctp]
[ 2327.284176] sctp_bind+0x61/0xa0 [sctp]
[ 2327.288455] inet_bind+0x5f/0x3a0
[ 2327.292151] SYSC_bind+0x1a4/0x1e0
[ 2327.295944] SyS_bind+0xe/0x10
[ 2327.299349] do_syscall_64+0xe3/0x230
[ 2327.303433] return_from_SYSCALL_64+0x0/0x6a
[ 2327.308194] Freed:
[ 2327.310434] PID = 4131
[ 2327.313065] save_stack_trace+0x1b/0x20
[ 2327.317344] save_stack+0x46/0xd0
[ 2327.321040] kasan_slab_free+0x73/0xc0
[ 2327.325220] kfree+0x96/0x1a0
[ 2327.328530] dynamic_kobj_release+0x15/0x40
[ 2327.333195] kobject_release+0x99/0x1e0
[ 2327.337472] kobject_put+0x38/0x70
[ 2327.341266] free_notes_attrs+0x66/0x80
[ 2327.345545] mod_sysfs_teardown+0x1a5/0x270
[ 2327.350211] free_module+0x20/0x2a0
[ 2327.354099] SyS_delete_module+0x2cb/0x2f0
[ 2327.358667] do_syscall_64+0xe3/0x230
[ 2327.362750] return_from_SYSCALL_64+0x0/0x6a
[ 2327.367510] Memory state around the buggy address:
[ 2327.372855] ffff881be8779700: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc
[ 2327.380914] ffff881be8779780: fb fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00
[ 2327.388972] >ffff881be8779800: 00 00 00 00 fc fc fc fc fb fb fb fb fb fb fb fb
[ 2327.397031] ^
[ 2327.401792] ffff881be8779880: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
[ 2327.409850] ffff881be8779900: 00 00 00 00 00 04 fc fc fc fc fc fc 00 00 00 00
[ 2327.417907] ==================================================================
This fixes CVE-2017-7558.
References: https://bugzilla.redhat.com/show_bug.cgi?id=1480266
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Two kfree_skb() should be consume_skb(), to be friend with drop monitor
(perf record ... -e skb:kfree_skb)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jakub Kicinski says:
====================
nfp: fix SR-IOV deadlock and representor bugs
This series tackles the bug I've already tried to fix in commit
6d48ceb27af1 ("nfp: allocate a private workqueue for driver work").
I created a separate workqueue to avoid possible deadlock, and
the lockdep error disappeared, coincidentally. The way workqueues
are operating, separate workqueue doesn't necessarily mean separate
thread of execution. Luckily we can safely forego the lock.
Second fix changes the order in which vNIC netdevs and representors
are created/destroyed. The fix is kept small and should be sufficient
for net because of how flower uses representors, a more thorough fix
will be targeted at net-next.
Third fix avoids leaking mapped frame buffers if FW sent a frame with
unknown portid.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When driver receives a muxed frame, but it can't find the representor
netdev it is destined to it will try to "drop" that frame, i.e. reuse
the buffer. The issue is that the replacement buffer has already been
allocated at this point, and reusing the buffer from received frame
will leak it. Change the code to put the new buffer on the ring
earlier and not reuse the old buffer (make the buffer parameter
to nfp_net_rx_drop() a NULL).
Fixes: 91bf82ca9eed ("nfp: add support for tx/rx with metadata portid")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
App start/stop callbacks can perform application initialization.
Unfortunately, flower app started using them for creating and
destroying representors. This can lead to a situation where
lower vNIC netdev is destroyed while representors still try
to pass traffic. This will most likely lead to a NULL-dereference
on the lower netdev TX path.
Move the start/stop callbacks, so that representors are created/
destroyed when vNICs are fully initialized.
Fixes: 5de73ee46704 ("nfp: general representor implementation")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Enabling SR-IOV VFs will cause the PCI subsystem to schedule a
work and flush its workqueue. Since the nfp driver schedules its
own work we can't enable VFs while holding driver load. Commit
6d48ceb27af1 ("nfp: allocate a private workqueue for driver work")
tried to avoid this deadlock by creating a separate workqueue.
Unfortunately, due to the architecture of workqueue subsystem this
does not guarantee a separate thread of execution. Luckily
we can simply take pci_enable_sriov() from under the driver lock.
Take pci_disable_sriov() from under the lock too for symmetry.
Fixes: 6d48ceb27af1 ("nfp: allocate a private workqueue for driver work")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Florian Fainelli says:
====================
net: dsa: Fix tag_ksz.c
This implements David's suggestion of providing low-level functions
to control whether skb_pad() and skb_put_padto() should be freeing
the passed skb.
We make use of it to fix a double free in net/dsa/tag_ksz.c that would
occur if we kept using skb_put_padto() in both places.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The first call of skb_put_padto() will free up the SKB on error, but we
return NULL which tells dsa_slave_xmit() that the original SKB should be
freed so this would lead to a double free here.
The second skb_put_padto() already frees the passed sk_buff reference
upon error, so calling kfree_skb() on it again is not necessary.
Detected by CoverityScan, CID#1416687 ("USE_AFTER_FREE")
Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rename skb_pad() into __skb_pad() and make it take a third argument:
free_on_error which controls whether kfree_skb() should be called or
not, skb_pad() directly makes use of it and passes true to preserve its
existing behavior. Do exactly the same thing with __skb_put_padto() and
skb_put_padto().
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When using MII/GMII/SGMII in the Altera SoC, the phy needs to be
wired through the FPGA. To ensure correct behavior, the appropriate
bit in the System Manager FPGA Interface Group register needs to be
set.
Signed-off-by: Stephan Gatzka <stephan.gatzka@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Junote Cai reported that he was not able to get a DSA setup involving the
Freescale DPAA/FMAN driver to work and narrowed it down to
of_find_net_device_by_node(). This function requires the network device's
device reference to be correctly set which is the case here, though we have
lost any device_node association there.
The problem is that dpaa_eth_add_device() allocates a "dpaa-ethernet" platform
device, and later on dpaa_eth_probe() is called but SET_NETDEV_DEV() won't be
propagating &pdev->dev.of_node properly. Fix this by inherenting both the parent
device and the of_node when dpaa_eth_add_device() creates the platform device.
Fixes: 3933961682a3 ("fsl/fman: Add FMan MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, iproute2's BPF ELF loader works fine with array of maps
when retrieving the fd from a pinned node and doing a selfcheck
against the provided map attributes from the object file, but we
fail to do the same for hash of maps and thus refuse to get the
map from pinned node.
Reason is that when allocating hash of maps, fd_htab_map_alloc() will
set the value size to sizeof(void *), and any user space map creation
requests are forced to set 4 bytes as value size. Thus, selfcheck
will complain about exposed 8 bytes on 64 bit archs vs. 4 bytes from
object file as value size. Contract is that fdinfo or BPF_MAP_GET_FD_BY_ID
returns the value size used to create the map.
Fix it by handling it the same way as we do for array of maps, which
means that we leave value size at 4 bytes and in the allocation phase
round up value size to 8 bytes. alloc_htab_elem() needs an adjustment
in order to copy rounded up 8 bytes due to bpf_fd_htab_map_update_elem()
calling into htab_map_update_elem() with the pointer of the map
pointer as value. Unlike array of maps where we just xchg(), we're
using the generic htab_map_update_elem() callback also used from helper
calls, which published the key/value already on return, so we need
to ensure to memcpy() the right size.
Fixes: bcc6b1b7ebf8 ("bpf: Add hash of maps support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|