Age | Commit message (Collapse) | Author |
|
Namespace related cleaning
* make cred_to_ucred static
* remove unused sock_rmalloc function
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While we're at it and introduced CGROUP_NET_CLASSID, lets also make
NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
into CONFIG_CGROUP_NET_PRIO so that for networking, we now have
CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
option consistent among networking cgroups, but also among cgroups
CONFIG conventions in general as the vast majority has a prefix of
CONFIG_CGROUP_<SUBSYS>.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Zefan Li requested [1] to perform the following cleanup/refactoring:
- Split cgroupfs classid handling into net core to better express a
possible more generic use.
- Disable module support for cgroupfs bits as the majority of other
cgroupfs subsystems do not have that, and seems to be not wished
from cgroup side. Zefan probably might want to follow-up for netprio
later on.
- By this, code can be further reduced which previously took care of
functionality built when compiled as module.
cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so
that we are consistent with {netclassid,netprio}_cgroup naming that is
under net/core/ as suggested by Zefan.
No change in functionality, but only code refactoring that is being
done here.
[1] http://patchwork.ozlabs.org/patch/304825/
Suggested-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
If user run pktgen plus ipsec by using spi, show spi value
properly when cat /proc/net/pktgen/ethX
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Introduce xfrm_state_lookup_byspi to find user specified by custom
from "pgset spi xxx". Using this scheme, any flow regardless its
saddr/daddr could be transform by SA specified with configurable
spi.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
IPsec tunnel mode encapuslation needs to set outter ip header
with right protocol/ttl/id value with regard to skb->dst->child.
Looking up a rt in a standard way is absolutely wrong for every
packet transmission. In a simple way, construct a dst by setting
neccessary information to make tunnel mode encapuslation working.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
User could set specific SPI value to arm pktgen flow with IPsec
transformation, instead of looking up SA by sadr/daddr. The reaseon
to do so is because current state lookup scheme is both slow and, most
important of all, in fact pktgen doesn't need to match any SA state
addresses information, all it needs is the SA transfromation shell to
do the encapuslation.
And this option also provide user an alternative to using pktgen
test existing SA without creating new ones.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
so /proc/net/xfrm_stat could give user clue about what's
wrong in this process.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
xfrm_state lock protects its state, i.e., VALID/DEAD and statistics,
not the transforming procedure, as both mode/type output functions
are reentrant.
Another issue is state lock can be used in BH context when state timer
alarmed, after transformation in pktgen, update state statistics acquiring
state lock should disabled BH context for a moment. Otherwise LOCKDEP
critisize this:
[ 62.354339] pktgen: Packet Generator for packet performance testing. Version: 2.74
[ 62.655444]
[ 62.655448] =================================
[ 62.655451] [ INFO: inconsistent lock state ]
[ 62.655455] 3.13.0-rc2+ #70 Not tainted
[ 62.655457] ---------------------------------
[ 62.655459] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[ 62.655463] kpktgend_0/2764 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 62.655466] (&(&x->lock)->rlock){+.?...}, at: [<ffffffffa00886f6>] pktgen_thread_worker+0x1796/0x1860 [pktgen]
[ 62.655479] {IN-SOFTIRQ-W} state was registered at:
[ 62.655484] [<ffffffff8109a61d>] __lock_acquire+0x62d/0x1d70
[ 62.655492] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 62.655498] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 62.655505] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[ 62.655511] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[ 62.655519] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[ 62.655523] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[ 62.655526] [<ffffffff8105a026>] irq_exit+0x96/0xc0
[ 62.655530] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[ 62.655537] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[ 62.655541] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[ 62.655547] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[ 62.655552] [<ffffffff81761c3c>] rest_init+0xbc/0xd0
[ 62.655557] [<ffffffff81ea5e5e>] start_kernel+0x3c4/0x3d1
[ 62.655583] [<ffffffff81ea55a8>] x86_64_start_reservations+0x2a/0x2c
[ 62.655588] [<ffffffff81ea569f>] x86_64_start_kernel+0xf5/0xfc
[ 62.655592] irq event stamp: 77
[ 62.655594] hardirqs last enabled at (77): [<ffffffff810ab7f2>] vprintk_emit+0x1b2/0x520
[ 62.655597] hardirqs last disabled at (76): [<ffffffff810ab684>] vprintk_emit+0x44/0x520
[ 62.655601] softirqs last enabled at (22): [<ffffffff81059b57>] __do_softirq+0x177/0x2d0
[ 62.655605] softirqs last disabled at (15): [<ffffffff8105a026>] irq_exit+0x96/0xc0
[ 62.655609]
[ 62.655609] other info that might help us debug this:
[ 62.655613] Possible unsafe locking scenario:
[ 62.655613]
[ 62.655616] CPU0
[ 62.655617] ----
[ 62.655618] lock(&(&x->lock)->rlock);
[ 62.655622] <Interrupt>
[ 62.655623] lock(&(&x->lock)->rlock);
[ 62.655626]
[ 62.655626] *** DEADLOCK ***
[ 62.655626]
[ 62.655629] no locks held by kpktgend_0/2764.
[ 62.655631]
[ 62.655631] stack backtrace:
[ 62.655636] CPU: 0 PID: 2764 Comm: kpktgend_0 Not tainted 3.13.0-rc2+ #70
[ 62.655638] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[ 62.655642] ffffffff8216b7b0 ffff88001be43ab8 ffffffff8176af37 0000000000000007
[ 62.655652] ffff88001c8d4fc0 ffff88001be43b18 ffffffff81766d78 0000000000000000
[ 62.655663] ffff880000000001 ffff880000000001 ffffffff8101025f ffff88001be43b18
[ 62.655671] Call Trace:
[ 62.655680] [<ffffffff8176af37>] dump_stack+0x46/0x58
[ 62.655685] [<ffffffff81766d78>] print_usage_bug+0x1f1/0x202
[ 62.655691] [<ffffffff8101025f>] ? save_stack_trace+0x2f/0x50
[ 62.655696] [<ffffffff81099f8c>] mark_lock+0x28c/0x2f0
[ 62.655700] [<ffffffff810994b0>] ? check_usage_forwards+0x150/0x150
[ 62.655704] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[ 62.655712] [<ffffffff81115b09>] ? irq_work_queue+0x69/0xb0
[ 62.655717] [<ffffffff810ab7f2>] ? vprintk_emit+0x1b2/0x520
[ 62.655722] [<ffffffff8109cec5>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 62.655730] [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen]
[ 62.655734] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 62.655741] [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen]
[ 62.655745] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 62.655752] [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen]
[ 62.655758] [<ffffffffa00886f6>] pktgen_thread_worker+0x1796/0x1860 [pktgen]
[ 62.655766] [<ffffffffa0087a79>] ? pktgen_thread_worker+0xb19/0x1860 [pktgen]
[ 62.655771] [<ffffffff8109cf9d>] ? trace_hardirqs_on+0xd/0x10
[ 62.655777] [<ffffffff81775410>] ? _raw_spin_unlock_irq+0x30/0x40
[ 62.655785] [<ffffffff8151faa0>] ? e1000_clean+0x9d0/0x9d0
[ 62.655791] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[ 62.655795] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[ 62.655800] [<ffffffffa0086f60>] ? mod_cur_headers+0x7f0/0x7f0 [pktgen]
[ 62.655806] [<ffffffff81078f84>] kthread+0xe4/0x100
[ 62.655813] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170
[ 62.655819] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[ 62.655824] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
The VLAN tag handling code in netpoll_send_skb_on_dev() has two problems.
1) It exits without unlocking the TXQ.
2) It then tries to queue a NULL skb to npinfo->txq.
Reported-by: Ahmed Tamrawi <atamrawi@iastate.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The following functions are not used outside of net/core/dev.c
and should be declared static.
call_netdevice_notifiers_info
__dev_remove_offload
netdev_has_any_upper_dev
__netdev_adjacent_dev_remove
__netdev_adjacent_dev_link_lists
__netdev_adjacent_dev_unlink_lists
__netdev_adjacent_dev_unlink
__netdev_adjacent_dev_link_neighbour
__netdev_adjacent_dev_unlink_neighbour
And the following are never used and should be deleted
netdev_lower_dev_get_private_rcu
__netdev_find_adj_rcu
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function __rtnl_af_register is never called outside this
code, and the return value is always 0.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the vlan code detects that the real device can do TX VLAN offloads
in hardware, it tries to arrange for the real device's header_ops to
be invoked directly.
But it does so illegally, by simply hooking the real device's
header_ops up to the VLAN device.
This doesn't work because we will end up invoking a set of header_ops
routines which expect a device type which matches the real device, but
will see a VLAN device instead.
Fix this by providing a pass-thru set of header_ops which will arrange
to pass the proper real device instead.
To facilitate this add a dev_rebuild_header(). There are
implementations which provide a ->cache and ->create but not a
->rebuild (f.e. PLIP). So we need a helper function just like
dev_hard_header() to avoid crashes.
Use this helper in the one existing place where the
header_ops->rebuild was being invoked, the neighbour code.
With lots of help from Florian Westphal.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can use kfree_skb_list() instead of open coding it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Given we allocate memory for each cpu, we can do this
using NUMA affinities, instead of using NUMA policies
of the process changing flow_limit_cpu_bitmap value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sync with Linus' tree to be able to apply fixes on top of newer things
in tree (efi-stub).
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Conflicts:
drivers/net/ethernet/intel/i40e/i40e_main.c
drivers/net/macvtap.c
Both minor merge hassles, simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is useful to be able to walk all upper devices when bringing
a device online where the RTNL lock is held. In this case it
is safe to walk the all_adj_list because the RTNL lock is used
to protect the write side as well.
This patch adds a check to see if the rtnl lock is held before
throwing a warning in netdev_all_upper_get_next_dev_rcu().
Also because we now have a call site for lockdep_rtnl_is_held()
outside COFIG_LOCK_PROVING an inline definition returning 1 is
needed. Similar to the rcu_read_lock_is_held().
Fixes: 2a47fa45d4df ("ixgbe: enable l2 forwarding acceleration for macvlans")
CC: Veaceslav Falico <vfalico@redhat.com>
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Adds skb_copy_hash to copy rxhash and l4_rxhash from one skb to another.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Changing name of function as part of making the hash in skbuff to be
generic property, not just for receive path.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The neighbour code sends up an RTM_NEWNEIGH netlink notification if
the NUD state of a neighbour cache entry is changed by a timer (e.g.
from REACHABLE to STALE), even if the lladdr of the entry has not
changed.
But an administrative change to the the NUD state of a neighbour cache
entry that does not change the lladdr (e.g. via "ip -4 neigh change
... nud ...") does not trigger a netlink notification. This means
that netlink listeners will not hear about administrative NUD state
changes such as from a resolved state to PERMANENT.
This patch changes the neighbor code to generate an RTM_NEWNEIGH
message when the NUD state of an entry is changed administratively.
Signed-off-by: Bob Gilligan <gilligan@aristanetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These function to manipulate multiple addresses are not used anywhere
in current net-next tree. Some out of tree code maybe using these but
too bad; they should submit their code upstream..
Also, make __hw_addr_flush local since only used by dev_addr_lists.c
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bond_first_slave_rcu() will be used to instead of bond_first_slave()
in rcu_read_lock().
According to the Jay Vosburgh's suggestion, the struct netdev_adjacent
should hide from users who wanted to use it directly. so I package a
new function to get the first slave of the bond.
Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch modifies the GRO stack to avoid the use of "network_header"
and associated macros like ip_hdr() and ipv6_hdr() in order to allow
an arbitary number of IP hdrs (v4 or v6) to be used in the
encapsulation chain. This lays the foundation for various IP
tunneling support (IP-in-IP, GRE, VXLAN, SIT,...) to be added later.
With this patch, the GRO stack traversing now is mostly based on
skb_gro_offset rather than special hdr offsets saved in skb (e.g.,
skb->network_header). As a result all but the top layer (i.e., the
the transport layer) must have hdrs of the same length in order for
a pkt to be considered for aggregation. Therefore when adding a new
encap layer (e.g., for tunneling), one must check and skip flows
(e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a
different hdr length.
Note that unlike the network header, the transport header can and
will continue to be set by the GRO code since there will be at
most one "transport layer" in the encap chain.
Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RFC 4191 states in 3.5:
When a host avoids using any non-reachable router X and instead sends
a data packet to another router Y, and the host would have used
router X if router X were reachable, then the host SHOULD probe each
such router X's reachability by sending a single Neighbor
Solicitation to that router's address. A host MUST NOT probe a
router's reachability in the absence of useful traffic that the host
would have sent to the router if it were reachable. In any case,
these probes MUST be rate-limited to no more than one per minute per
router.
Currently, when the neighbour corresponding to a router falls into
NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
should be probed with a single NS. The probe is ratelimited by the existing
code. To better distinguish meanings of the failure values, rename
RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Various spelling fixes in networking stack
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
unix_dgram_recvmsg() will hold the readlock of the socket until recv
is complete.
In the same time, we may try to setsockopt(SO_PEEK_OFF) which will hang until
unix_dgram_recvmsg() will complete (which can take a while) without allowing
us to break out of it, triggering a hung task spew.
Instead, allow set_peek_off to fail, this way userspace will not hang.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This fixes compile error when CONFIG_NET_NS is not set.
Introduced by:
commit 1d4c8c29841b9991cdf3c7cc4ba7f96a94f104ca
"neigh: restore old behaviour of default parms values"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
maxattr in genl_family should be used to save the max attribute
type, but not the max command type. Drop monitor doesn't support
any attributes, so we should leave it as zero.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make the behaviour similar to ipv4. This will allow user to set sysctl
default neigh param values and these values will be respected even by
devices registered before (that ones what do not have address set yet).
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously inet devices were only constructed when addresses are added.
Therefore the default neigh parms values they get are the ones at the
time of these operations.
Now that we're creating inet devices earlier, this changes the behaviour
of default neigh parms values in an incompatible way (see bug #8519).
This patch creates a compromise by setting the default values at the
same point as before but only for those that have not been explicitly
set by the user since the inet device's creation.
Introduced by:
commit 8030f54499925d073a88c09f30d5d844fb1b3190
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu Feb 22 01:53:47 2007 +0900
[IPV4] devinet: Register inetdev earlier.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This will be needed later on to provide better management of default values.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch converts the neigh param members to an array. This allows easier
manipulation which will be needed later on to provide better management of
default values.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As we need it elsewhere, move the inline helper function of
skb_needs_linearize() over to skbuff.h include file. While
at it, also convert the return to 'bool' instead of 'int'
and add a proper kernel doc.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
Daniel's direct transmit changes depend upon.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some network drivers use dev_kfree_skb_any() and dev_kfree_skb_irq()
helpers to free skbs, both for dropped packets and TX completed ones.
We need to separate the two causes to get better diagnostics
given by dropwatch or "perf record -e skb:kfree_skb"
This patch provides two new helpers, dev_consume_skb_any() and
dev_consume_skb_irq() to be used for consumed skbs.
__dev_kfree_skb_irq() is slightly optimized to remove one
atomic_dec_and_test() in fast path, and use this_cpu_{r|w} accessors.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove one useless conditional branch :
napi->skb is NULL, so nothing bad can happen.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We must clear local_df when passing the skb between namespaces as the
packet is not local to the new namespace any more and thus may not get
fragmented by local rules. Fred Templin noticed that other namespaces
do fragment IPv6 packets while forwarding. Instead they should have send
back a PTB.
The same problem should be present when forwarding DF-IPv4 packets
between namespaces.
Reported-by: Templin, Fred L <Fred.L.Templin@boeing.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next
Ben Hutchings says:
====================
SIOCGHWTSTAMP ioctl
1. Add the SIOCGHWTSTAMP ioctl and update the timestamping
documentation.
2. Implement SIOCGHWTSTAMP in most drivers that support SIOCSHWTSTAMP.
3. Add a test program to exercise SIOC{G,S}HWTSTAMP.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation of conversion to kernfs, cgroup file handling is
updated so that it can be easily mapped to kernfs. This patch
replaces cftype->read_seq_string() with cftype->seq_show() which is
not limited to single_open() operation and will map directcly to
kernfs seq_file interface.
The conversions are mechanical. As ->seq_show() doesn't have @css and
@cft, the functions which make use of them are converted to use
seq_css() and seq_cft() respectively. In several occassions, e.f. if
it has seq_string in its name, the function name is updated to fit the
new method better.
This patch does not introduce any behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
|
|
In preparation of conversion to kernfs, cgroup file handling is being
consolidated so that it can be easily mapped to the seq_file based
interface of kernfs.
cftype->read_map() doesn't add any value and being replaced with
->read_seq_string(). Update read_priomap() to use ->read_seq_string()
instead.
This patch doesn't make any visible behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Li Zefan <lizefan@huawei.com>
|
|
it seems subquue should be subqueue here
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
commit a553e4a6317b2cfc7659542c10fe43184ffe53da ("[PKTGEN]: IPSEC support")
tried to support IPsec ESP transport transformation for pktgen, but acctually
this doesn't work at all for two reasons(The orignal transformed packet has
bad IPv4 checksum value, as well as wrong auth value, reported by wireshark)
- After transpormation, IPv4 header total length needs update,
because encrypted payload's length is NOT same as that of plain text.
- After transformation, IPv4 checksum needs re-caculate because of payload
has been changed.
With this patch, armmed pktgen with below cofiguration, Wireshark is able to
decrypted ESP packet generated by pktgen without any IPv4 checksum error or
auth value error.
pgset "flag IPSEC"
pgset "flows 1"
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recently GRO started generating packets with frag_lists of frags.
This was not handled by GSO, thus leading to a crash.
Thankfully these packets are of a regular form and are easy to
handle. This patch handles them in two ways. For completely
non-linear frag_list entries, we simply continue to iterate over
the frag_list frags once we exhaust the normal frags. For frag_list
entries with linear parts, we call pskb_trim on the first part
of the frag_list skb, and then process the rest of the frags in
the usual way.
This patch also kills a chunk of dead frag_list code that has
obviously never ever been run since it ends up generating a bogus
GSO-segmented packet with a frag_list entry.
Future work is planned to split super big packets into TSO
ones.
Fixes: 8a29111c7ca6 ("net: gro: allow to build full sized skb")
Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Reported-by: Jerry Chu <hkchu@google.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.
This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.
Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.
Also document these changes in include/linux/net.h as suggested by David
Miller.
Changes since RFC:
Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.
With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
msg->msg_name = NULL
".
This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.
Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.
Cc: David Miller <davem@davemloft.net>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The following commit:
b6c40d68ff6498b7f63ddf97cf0aa818d748dee7
net: only invoke dev->change_rx_flags when device is UP
tried to fix a problem with VLAN devices and promiscuouse flag setting.
The issue was that VLAN device was setting a flag on an interface that
was down, thus resulting in bad promiscuity count.
This commit blocked flag propagation to any device that is currently
down.
A later commit:
deede2fabe24e00bd7e246eb81cd5767dc6fcfc7
vlan: Don't propagate flag changes on down interfaces
fixed VLAN code to only propagate flags when the VLAN interface is up,
thus fixing the same issue as above, only localized to VLAN.
The problem we have now is that if we have create a complex stack
involving multiple software devices like bridges, bonds, and vlans,
then it is possible that the flags would not propagate properly to
the physical devices. A simple examle of the scenario is the
following:
eth0----> bond0 ----> bridge0 ---> vlan50
If bond0 or eth0 happen to be down at the time bond0 is added to
the bridge, then eth0 will never have promisc mode set which is
currently required for operation as part of the bridge. As a
result, packets with vlan50 will be dropped by the interface.
The only 2 devices that implement the special flag handling are
VLAN and DSA and they both have required code to prevent incorrect
flag propagation. As a result we can remove the generic solution
introduced in b6c40d68ff6498b7f63ddf97cf0aa818d748dee7 and leave
it to the individual devices to decide whether they will block
flag propagation or not.
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Register generic netlink multicast groups as an array with
the family and give them contiguous group IDs. Then instead
of passing the global group ID to the various functions that
send messages, pass the ID relative to the family - for most
families that's just 0 because the only have one group.
This avoids the list_head and ID in each group, adding a new
field for the mcast group ID offset to the family.
At the same time, this allows us to prevent abusing groups
again like the quota and dropmon code did, since we can now
check that a family only uses a group it owns.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This doesn't really change anything, but prepares for the
next patch that will change the APIs to pass the group ID
within the family, rather than the global group ID.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|