Age | Commit message (Collapse) | Author |
|
Add stricter checking for a few attributes.
Note that these changes don't fix any bug in the current code base.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We used to have several queueing backends, but nowadays only
nfnetlink_queue remains.
In light of this there doesn't seem to be a good reason to
support per-af registering -- just hook up nfnetlink_queue on module
load and remove it on unload.
This means that the userspace BIND/UNBIND_PF commands are now obsolete;
the kernel will ignore them.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch adds a new operation to dump the content of the dying and
unconfirmed lists.
Under some situations, the global conntrack counter can be inconsistent
with the number of entries that we can dump from the conntrack table.
The way to resolve this is to allow dumping the content of the unconfirmed
and dying lists, so far it was not possible to look at its content.
This provides some extra instrumentation to resolve problematic situations
in which anyone suspects memory leaks.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch modifies the conntrack subsystem so that all existing
allocated conntrack objects can be found in any of the following
places:
* the hash table, this is the typical place for alive conntrack objects.
* the unconfirmed list, this is the place for newly created conntrack objects
that are still traversing the stack.
* the dying list, this is where you can find conntrack objects that are dying
or that should die anytime soon (eg. once the destroy event is delivered to
the conntrackd daemon).
Thus, we make sure that we follow the track for all existing conntrack
objects. This patch, together with some extension of the ctnetlink interface
to dump the content of the dying and unconfirmed lists, will help in case
to debug suspected nf_conn object leaks.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The max number of sets was hardcoded at kernel cofiguration time and
could only be modified via a module parameter. The patch adds the support
of increasing the max number of sets automatically, as needed.
The array of sets is incremented by 64 new slots if we run out of
empty slots. The absolute limit for the maximal number of sets
is limited by 65534.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
TCP coalescing added a regression in splice(socket->pipe) performance,
for some workloads because of the way tcp_read_sock() is implemented.
The reason for this is the break when (offset + 1 != skb->len).
As we released the socket lock, this condition is possible if TCP stack
added a fragment to the skb, which can happen with TCP coalescing.
So let's go back to the beginning of the loop when this happens,
to give a chance to splice more frags per system call.
Doing so fixes the issue and makes GRO 10% faster than LRO
on CPU-bound splice() workloads instead of the opposite.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
David Woodhouse says:
====================
This is the result of pulling on the thread started by Krzysztof Mazur's
original patch 'pppoatm: don't send frames to destroyed vcc'.
Various problems in the pppoatm and br2684 code are solved, some of which
were easily triggered and would panic the kernel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We don't need to schedule the wakeup tasklet on *every* unlock; only if we
actually blocked the channel in the first place.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
|
|
The br2684 does not check if used vcc is in connected state,
causing potential Oops in pppoatm_send() when vcc->send() is called
on not fully connected socket.
Now br2684 can be assigned only on connected sockets; otherwise
-EINVAL error is returned.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
The br2684 code used module_put() during unassignment from vcc with
hope that we have BKL. This assumption is no longer true.
Now owner field in atmvcc is used to move this module_put()
to vcc_destroy_socket().
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
|
|
Now that we can return zero from pppoatm_send() for reasons *other* than
the queue being full, that means we can't depend on a subsequent call to
pppoatm_pop() waking the queue, and we might leave it stalled
indefinitely.
Use the ->release_cb() callback to wake the queue after the sock is
unlocked.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
|
|
Avoid submitting packets to a vcc which is being closed. Things go badly
wrong when the ->pop method gets later called after everything's been
torn down.
Use the ATM socket lock for synchronisation with vcc_destroy_socket(),
which clears the ATM_VF_READY bit under the same lock. Otherwise, we
could end up submitting a packet to the device driver even after its
->ops->close method has been called. And it could call the vcc's ->pop
method after the protocol has been shut down. Which leads to a panic.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
|
|
The immediate use case for this is that it will allow us to ensure that a
pppoatm queue is woken after it has to drop a packet due to the sock being
locked.
Note that 'release_cb' is called when the socket is *unlocked*. This is
not to be confused with vcc_release() — which probably ought to be called
vcc_close().
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
|
|
As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], the
logic determining whether to send Router Solicitations is identical
to the logic determining whether kernel accepts Router Advertisements.
However the condition itself is repeated in several code locations.
Unify it by introducing 'ipv6_accept_ra()' accessor.
Also, simplify the condition expression, making it more readable.
No semantic change.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As time passed, available memory increased faster than number of
concurrent tcp sockets.
As a result, a machine with 4GB of ram gets a hash table
with 524288 slots, using 8388608 bytes of memory.
Lets change that by a 16x factor (one slot for 128 KB of ram)
Even if a small machine needs a _lot_ of sockets, tcp lookups are now
very efficient, using one cache line per socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit 68835aba4d9b (net: optimize INET input path further)
moved some fields used for tcp/udp sockets lookup in the first cache
line of struct sock_common.
This patch moves inet_dport/inet_num as well, filling a 32bit hole
on 64 bit arches and reducing number of cache line misses in lookups.
Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
before addresses match, as this check is more discriminant.
Remove the hash check from MATCH() macros because we dont need to
re validate the hash value after taking a refcount on socket, and
use likely/unlikely compiler hints, as the sk_hash/hash check
makes the following conditional tests 100% predicted by cpu.
Introduce skc_addrpair/skc_portpair pair values to better
document the alignment requirements of the port/addr pairs
used in the various MATCH() macros, and remove some casts.
The namespace check can also be done at last.
This slightly improves TCP/UDP lookup times.
IP/TCP early demux needs inet->rx_dst_ifindex and
TCP needs inet->min_ttl, lets group them together in same cache line.
With help from Ben Hutchings & Joe Perches.
Idea of this patch came after Ling Ma proposal to move skc_hash
to the beginning of struct sock_common, and should allow him
to submit a final version of his patch. My tests show an improvement
doing so.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ling Ma <ling.ma.program@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the variable parameter length provided in the mandatory
heartbeat information parameter exceeds the calculated payload
length the packet has been corrupted. Reply with a parameter
length protocol violation message.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes an unused parameter (src_net) from rtnl_create_link()
method and from the method single invocation, in veth.
This parameter was used in the past when calling
ops->get_tx_queues(src_net, tb) in rtnl_create_link().
The get_tx_queues() member of rtnl_link_ops was replaced by two methods,
get_num_tx_queues() and get_num_rx_queues(), which do not get any
parameter. This was done in commit d40156aa5ecbd51fed932ed4813df82b56e5ff4d by
Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Included changes:
- Use the new ETH_P_BATMAN define instead of the private BATADV_ETH_P_BATMAN
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trinity (the syscall fuzzer) triggered the following BUG, reproducible
only when the kernel is configured with CONFIG_SCTP_DBG_MSG=y.
When CONFIG_SCTP_DBG_MSG is not set, the null pointer is never
dereferenced.
---[ end trace a4de0bfcb38a3642 ]---
BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
IP: [<ffffffff8136796e>] ip6_string+0x1e/0xa0
PGD 4eead067 PUD 4e472067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
CPU 3
Pid: 21324, comm: trinity-child11 Tainted: G W 3.7.0-rc7+ #61 ASUSTeK Computer INC. EB1012/EB1012
RIP: 0010:[<ffffffff8136796e>] [<ffffffff8136796e>] ip6_string+0x1e/0xa0
RSP: 0018:ffff88004e4637a0 EFLAGS: 00010046
RAX: ffff88004e4637da RBX: ffff88004e4637da RCX: 0000000000000000
RDX: ffffffff8246e92a RSI: 0000000000000100 RDI: ffff88004e4637da
RBP: ffff88004e4637a8 R08: 000000000000ffff R09: 000000000000ffff
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8289d600
R13: ffffffff8289d230 R14: ffffffff8246e928 R15: ffffffff8289d600
FS: 00007fed95153700(0000) GS:ffff88005fd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000100 CR3: 000000004eeac000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process trinity-child11 (pid: 21324, threadinfo ffff88004e462000, task ffff8800524b0000)
Stack:
ffff88004e4637da ffff88004e463828 ffffffff81368eee 000000004e4637d8
ffffffff0000ffff ffff88000000ffff 0000000000000000 000000004e4637f8
ffffffff826285d8 ffff88004e4637f8 0000000000000000 ffff8800524b06b0
Call Trace:
[<ffffffff81368eee>] ip6_addr_string.isra.11+0x3e/0xa0
[<ffffffff81369183>] pointer.isra.12+0x233/0x2d0
[<ffffffff810a413a>] ? vprintk_emit+0x1ba/0x450
[<ffffffff8110953d>] ? trace_hardirqs_on_caller+0x10d/0x1a0
[<ffffffff81369757>] vsnprintf+0x187/0x5d0
[<ffffffff81369c62>] vscnprintf+0x12/0x30
[<ffffffff810a4028>] vprintk_emit+0xa8/0x450
[<ffffffff81e5cb00>] printk+0x49/0x4b
[<ffffffff81d17221>] sctp_v6_get_dst+0x731/0x780
[<ffffffff81d16e15>] ? sctp_v6_get_dst+0x325/0x780
[<ffffffff81d00a96>] sctp_transport_route+0x46/0x120
[<ffffffff81cff0f1>] sctp_assoc_add_peer+0x161/0x350
[<ffffffff81d0fd8d>] sctp_sendmsg+0x6cd/0xcb0
[<ffffffff81b55bf0>] ? inet_create+0x670/0x670
[<ffffffff81b55cfb>] inet_sendmsg+0x10b/0x220
[<ffffffff81b55bf0>] ? inet_create+0x670/0x670
[<ffffffff81a72a64>] ? sock_update_classid+0xa4/0x2b0
[<ffffffff81a72ab0>] ? sock_update_classid+0xf0/0x2b0
[<ffffffff81a6ac1c>] sock_sendmsg+0xdc/0xf0
[<ffffffff8118e9e5>] ? might_fault+0x85/0x90
[<ffffffff8118e99c>] ? might_fault+0x3c/0x90
[<ffffffff81a6e12a>] sys_sendto+0xfa/0x130
[<ffffffff810a9887>] ? do_setitimer+0x197/0x380
[<ffffffff81e960d5>] ? sysret_check+0x22/0x5d
[<ffffffff81e960a9>] system_call_fastpath+0x16/0x1b
Code: 01 eb 89 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 f8 31 c9 48 89 e5 53 eb 12 0f 1f 40 00 48 83 c1 01 48 83 c0 04 48 83 f9 08 74 70 <0f> b6 3c 4e 89 fb 83 e7 0f c0 eb 04 41 89 d8 41 83 e0 0f 0f b6
RIP [<ffffffff8136796e>] ip6_string+0x1e/0xa0
RSP <ffff88004e4637a0>
CR2: 0000000000000100
---[ end trace a4de0bfcb38a3643 ]---
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
kfree_skb() indicates failure, which is where this is being used.
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
kfree_skb() was not getting called in the case of some failures.
This was pointed out by Eric Dumazet.
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change the threshold for framentation of a lowpan packet from
using the MTU size to now use the MTU size minus the checksum length,
which is added by the hardware. For IEEE 802.15.4, this effectively
changes it from 127 bytes to 125 bytes.
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This bug is observed on running FCoE over a VLAN device associated w/
a real device that has IFF_UNICAST_FLT set since FCoE would add unicast
address such as FLOGI MAC to the VLAN interface that FCoE is on. Since
currently, VLAN device is not inheriting the IFF_UNICAST_FLT flag from the
parent real device even though the real device is capable of doing unicast
filtering. This forces the VLAN device and its real device go to promiscuous
mode unnecessarily even the added address is actually being added to the
available unicast filter table in real device.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Cc: devel@open-fcoe.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Conflicts:
net/ipv6/exthdrs_core.c
Jesse Gross says:
====================
This series of improvements for 3.8/net-next contains four components:
* Support for modifying IPv6 headers
* Support for matching and setting skb->mark for better integration with
things like iptables
* Ability to recognize the EtherType for RARP packets
* Two small performance enhancements
The movement of ipv6_find_hdr() into exthdrs_core.c causes two small merge
conflicts. I left it as is but can do the merge if you want. The conflicts
are:
* ipv6_find_hdr() and ipv6_find_tlv() were both moved to the bottom of
exthdrs_core.c. Both should stay.
* A new use of ipv6_find_hdr() was added to net/netfilter/ipvs/ip_vs_core.c
after this patch. The IPVS user has two instances of the old constant
name IP6T_FH_F_FRAG which has been renamed to IP6_FH_F_FRAG.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The pppoatm_may_send() is quite heavy and it's called three times
in pppoatm_send() and inlining costs more than 200 bytes of code
(more than 10% of total pppoatm driver code size).
add/remove: 1/0 grow/shrink: 0/1 up/down: 132/-367 (-235)
function old new delta
pppoatm_may_send - 132 +132
pppoatm_send 900 533 -367
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
The vcc_destroy_socket() closes vcc before the protocol is detached
from vcc by calling vcc->push() with NULL skb. This leaves some time
window, where the protocol may call vcc->send() on closed vcc
and crash.
Now pppoatm_send(), like vcc_sendmsg(), checks for vcc flags that
indicate that vcc is not ready. If the vcc is not ready we just
drop frame. Queueing frames is much more complicated because we
don't have callbacks that inform us about vcc flags changes.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
The ETH_P_BATMAN ethertype is now defined kernel-wide. Use it instead
of the private BATADV_ETH_P_BATMAN define.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
|
|
This patch changes three methods to be static and removes their
EXPORT_SYMBOLs in core/dev.c and their external declaration in
netdevice.h. The methods, dev_gro_receive(), napi_frags_finish() and
napi_skb_finish(), which are in the GRO rx path, are not used
outside core/dev.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Two small openswitch fixes from Jesse Gross.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
This pull request is intended for the 3.8 stream. It is a bit large
-- I guess Thanksgiving got me off track! At least the code got to
spend some time in linux-next... :-)
This includes the usual batch of pulls for Bluetooth, NFC, and mac80211
as well as iwlwifi. Also here is an ath6kl pull, and a new driver
in the rtlwifi family. The brcmfmac, brcmsmac, ath9k, and mwl8k get
their usual levels of attention, and a handful of other updates tag
along as well.
For more detail on the pulls, please see below...
On Bluetooth, Gustavo says:
"Another set of patches for integration in wireless-next. There are two big set
of changes in it: Andrei Emeltchenko and Mat Martineau added more patches
towards a full Bluetooth High Speed support and Johan Hedberg improve the
single mode support for Bluetooth dongles. Apart from that we have small fixes
and improvements."
...and:
"A few patches to 3.8. The majority of the work here is from Andrei on the High
Speed support. Other than that Johan added support for setting LE advertising
data. The rest are fixes and clean ups and small improvements like support for
a new broadcom hardware."
On mac80211, Johannes says:
"This is for mac80211, for -next (3.8). Plenty of changes, as you can see
below. Some fixes for previous changes like the export.h include, the
beacon listener fix from Ben Greear, etc. Overall, no exciting new
features, though hwsim does gain channel context support for people to
try it out and look at."
...and...:
"This one contains the mac80211-next material. Apart from a few small new
features and cleanups I have two fixes for the channel context code. The
RX_END timestamp support will probably be reworked again as Simon Barber
noted the calculations weren't really valid, but the discussions there
are still going on and it's better than what we had before."
...and:
"Please pull (see below) to get the following changes:
* a fix & a debug aid in IBSS from Antonio,
* mesh cleanups from Marco,
* a few bugfixes for some of my previous patches from Arend and myself,
* and the big initial VHT support patchset"
And on iwlwifi, Johannes says:
"In addition to the previous four patches that I'm not resending,
we have a number of cleanups, message reduction, firmware error
handling improvements (yes yes... we need to fix them instead)
and various other small things all over."
...and:
"In his quest to try to understand the current iwlwifi problems (like
stuck queues etc.) Emmanuel has first cleaned up the PCIe code, I'm
including his changes in this pull request. Other than that I only have
a small cleanup from Sachin Kamat to remove a duplicate include and a
bugfix to turn off MFP if software crypto is enabled, but this isn't
really interesting as MFP isn't supported right now anyway."
On NFC, Samuel says:
"With this one we have:
- A few HCI improvements in preparation for an upcoming HCI chipset support.
- A pn544 code cleanup after the old driver was removed.
- An LLCP improvement for notifying user space when one peer stops ACKing I
frames."
On ath6kl, Kalle says:
"Major changes this time are firmware recover support to gracefully
handle if firmware crashes, support for changing regulatory domain and
support for new ar6004 hardware revision 1.4. Otherwise there are just
smaller fixes or cleanups from different people."
Thats about it... :-) Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, each time a device is detached from an OVS datapath
we call synchronize RCU before freeing associated data structures.
However, if a bridge is deleted (which detaches all ports) when
many devices are connected then there can be a long delay. This
switches to use call_rcu() to group the cost together.
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
|
|
This patch reports the change made by Stephen Hemminger in ipip and gre[6] in
commit eccc1bb8d4b4 (tunnel: drop packet if ECN present with not-ECT).
Goal is to handle RFC6040, Section 4.2:
Default Tunnel Egress Behaviour.
o If the inner ECN field is Not-ECT, the decapsulator MUST NOT
propagate any other ECN codepoint onwards. This is because the
inner Not-ECT marking is set by transports that rely on dropped
packets as an indication of congestion and would not understand or
respond to any other ECN codepoint [RFC4774]. Specifically:
* If the inner ECN field is Not-ECT and the outer ECN field is
CE, the decapsulator MUST drop the packet.
* If the inner ECN field is Not-ECT and the outer ECN field is
Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
outgoing packet with the ECN field cleared to Not-ECT.
The patch takes benefits from common function added in net/inet_ecn.h.
Like it was done for Xin4 tunnels, it adds logging to allow detecting broken
systems that set ECN bits incorrectly when tunneling (or an intermediate
router might be changing the header). Errors are also tracked via
rx_frame_error.
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
An interface name overflow fix in netfilter via Pablo Neira Ayuso.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cleanup the memory we allocated earlier in irttp_open_tsap() when we hit
this error path. The leak goes back to at least 1da177e4
("Linux-2.6.12-rc2").
Discovered with Trinity (the syscall fuzzer).
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch turns QFQ into QFQ+, a variant of QFQ that provides the
following two benefits: 1) QFQ+ is faster than QFQ, 2) differently
from QFQ, QFQ+ correctly schedules also non-leaves classes in a
hierarchical setting. A detailed description of QFQ+, plus a
performance comparison with DRR and QFQ, can be found in [1].
[1] P. Valente, "Reducing the Execution Time of Fair-Queueing Schedulers"
http://algo.ing.unimo.it/people/paolo/agg-sched/agg-sched.pdf
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The calculation of RTTVAR involves the subtraction of two unsigned
numbers which
may causes rollover and results in very high values of RTTVAR when RTT > SRTT.
With this patch it is possible to set RTOmin = 1 to get the minimum of RTO at
4 times the clock granularity.
Change Notes:
v2)
*Replaced abs() by abs64() and long by __s64, changed patch
description.
Signed-off-by: Christian Schoch <e0326715@student.tuwien.ac.at>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: Neil Horman <nhorman@tuxdriver.com>
CC: linux-sctp@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Consider the following program, that sets the second argument to the
sendto() syscall incorrectly:
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
int main(void)
{
int fd;
struct sockaddr_in sa;
fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
if (fd < 0)
return 1;
memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(11111);
sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
return 0;
}
We get -ENOMEM:
$ strace -e sendto ./demo
sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory)
Propagate the error code from sctp_user_addto_chunk(), so that we will
tell user space what actually went wrong:
$ strace -e sendto ./demo
sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address)
Noticed while running Trinity (the syscall fuzzer).
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
fails
Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing invalid
user space pointer in the second argument:
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
int main(void)
{
int fd;
struct sockaddr_in sa;
fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
if (fd < 0)
return 1;
memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(11111);
sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
return 0;
}
As far as I can tell, the leak has been around since ~2003.
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
drivers/net/wireless/iwlwifi/pcie/tx.c
|
|
The pppoatm_send() does not take any lock that will prevent concurrent
vcc_sendmsg(). This causes two problems:
- there is no locking between checking the send queue size
with atm_may_send() and incrementing sk_wmem_alloc,
and the real queue size can be a little higher than sk_sndbuf
- the vcc->sendmsg() can be called concurrently. I'm not sure
if it's allowed. Some drivers (eni, nicstar, ...) seem
to assume it will never happen.
Now pppoatm_send() takes ATM socket lock, the same that is used
in vcc_sendmsg() and other ATM socket functions. The pppoatm_send()
is called with BH disabled, so bh_lock_sock() is used instead
of lock_sock().
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
The pppoatm used module_put() during unassignment from vcc with
hope that we have BKL. This assumption is no longer true.
Now owner field in atmvcc is used to move this module_put()
to vcc_destroy_socket().
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
The pppoatm does not check if used vcc is in connected state,
causing an Oops in pppoatm_send() when vcc->send() is called
on not fully connected socket.
Now pppoatm can be assigned only on connected sockets; otherwise
-EINVAL error is returned.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
The atm is using atmvcc->push(vcc, NULL) callback to notify protocol
that vcc will be closed and protocol must detach from it. This callback
is usually used by protocol to decrement module usage count by module_put(),
but it leaves small window then module is still used after module_put().
Now the owner of push() callback is kept in atmvcc and
module_put(atmvcc->owner) is called after the protocol is detached from vcc.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
|
|
Name of pimreg devices are built from following format :
char name[IFNAMSIZ]; // IFNAMSIZ == 16
sprintf(name, "pimreg%u", mrt->id);
We must therefore limit mrt->id to 9 decimal digits
or risk a buffer overflow and a crash.
Restrict table identifiers in [0 ... 999999999] interval.
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Verify the length of the user-space arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
inet_getpeer_v4() can return NULL under OOM conditions, and while
inet_peer_xrlim_allow() is OK with a NULL peer, inet_putpeer() will
crash.
This code path now uses the same idiom as the others from:
1d861aa4b3fb08822055345f480850205ffe6170 ("inet: Minimize use of
cached route inetpeer.").
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
will then require another call like if_indextoname() to get the actual interface
name, have it return the name directly.
This also matches the existing man page description on socket(7) which mentions
the argument being an interface name.
If the value has not been set, zero is returned and optlen will be set to zero
to indicate there is no interface name present.
Added a seqlock to protect this code path, and dev_ifname(), from someone
changing the device name via dev_change_name().
v2: Added seqlock protection while copying device name.
v3: Fixed word wrap in patch.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There's really no excuse for an additional wmem_default of buffering
between the netdev queue and the ATM device. Two packets (one in-flight,
and one ready to send) ought to be fine. It's not as if it should take
long to get another from the netdev queue when we need it.
If necessary we can make the queue space configurable later, but I don't
think it's likely to be necessary.
cf. commit 9d02daf754238adac48fa075ee79e7edd3d79ed3 (pppoatm: Fix
excessive queue bloat) which did something very similar for PPPoATM.
Note that there is a tremendously unlikely race condition which may
result in qspace temporarily going negative. If a CPU running the
br2684_pop() function goes off into the weeds for a long period of time
after incrementing qspace to 1, but before calling netdev_wake_queue()...
and another CPU ends up calling br2684_start_xmit() and *stopping* the
queue again before the first CPU comes back, the netdev queue could
end up being woken when qspace has already reached zero.
An alternative approach to coping with this race would be to check in
br2684_start_xmit() for qspace==0 and return NETDEV_TX_BUSY, but just
using '> 0' and '< 1' for comparison instead of '== 0' and '!= 0' is
simpler. It just warranted a mention of *why* we do it that way...
Move the call to atmvcc->send() to happen *after* the accounting and
potentially stopping the netdev queue, in br2684_xmit_vcc(). This matters
if the ->send() call suffers an immediate failure, because it'll call
br2684_pop() with the offending skb before returning. We want that to
happen *after* we've done the initial accounting for the packet in
question. Also make it return an appropriate success/failure indication
while we're at it.
Tested by running 'ping -l 1000 bottomless.aaisp.net.uk' from within my
network, with only a single PPPoE-over-BR2684 link running. And after
setting txqueuelen on the nas0 interface to something low (5, in fact).
Before the patch, we'd see about 15 packets being queued and a resulting
latency of ~56ms being reached. After the patch, we see only about 8,
which is fairly much what we expect. And a max latency of ~36ms. On this
OpenWRT box, wmem_default is 163840.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Reviewed-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|