Age | Commit message (Collapse) | Author |
|
Right now the sessionid value in the kernel is a combination of u32,
int, and unsigned int. Just use unsigned int throughout.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Introduce xfrm_state_lookup_byspi to find user specified by custom
from "pgset spi xxx". Using this scheme, any flow regardless its
saddr/daddr could be transform by SA specified with configurable
spi.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Acquiring xfrm_state_lock in process context is expected to turn BH off,
as this lock is also used in BH context, namely xfrm state timer handler.
Otherwise it surprises LOCKDEP with below messages.
[ 81.422781] pktgen: Packet Generator for packet performance testing. Version: 2.74
[ 81.725194]
[ 81.725211] =========================================================
[ 81.725212] [ INFO: possible irq lock inversion dependency detected ]
[ 81.725215] 3.13.0-rc2+ #92 Not tainted
[ 81.725216] ---------------------------------------------------------
[ 81.725218] kpktgend_0/2780 just changed the state of lock:
[ 81.725220] (xfrm_state_lock){+.+...}, at: [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[ 81.725231] but this lock was taken by another, SOFTIRQ-safe lock in the past:
[ 81.725232] (&(&x->lock)->rlock){+.-...}
[ 81.725232]
[ 81.725232] and interrupts could create inverse lock ordering between them.
[ 81.725232]
[ 81.725235]
[ 81.725235] other info that might help us debug this:
[ 81.725237] Possible interrupt unsafe locking scenario:
[ 81.725237]
[ 81.725238] CPU0 CPU1
[ 81.725240] ---- ----
[ 81.725241] lock(xfrm_state_lock);
[ 81.725243] local_irq_disable();
[ 81.725244] lock(&(&x->lock)->rlock);
[ 81.725246] lock(xfrm_state_lock);
[ 81.725248] <Interrupt>
[ 81.725249] lock(&(&x->lock)->rlock);
[ 81.725251]
[ 81.725251] *** DEADLOCK ***
[ 81.725251]
[ 81.725254] no locks held by kpktgend_0/2780.
[ 81.725255]
[ 81.725255] the shortest dependencies between 2nd lock and 1st lock:
[ 81.725269] -> (&(&x->lock)->rlock){+.-...} ops: 8 {
[ 81.725274] HARDIRQ-ON-W at:
[ 81.725276] [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70
[ 81.725282] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725284] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725289] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[ 81.725292] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[ 81.725300] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[ 81.725303] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[ 81.725305] [<ffffffff8105a026>] irq_exit+0x96/0xc0
[ 81.725308] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[ 81.725313] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[ 81.725316] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[ 81.725329] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[ 81.725333] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[ 81.725338] IN-SOFTIRQ-W at:
[ 81.725340] [<ffffffff8109a61d>] __lock_acquire+0x62d/0x1d70
[ 81.725342] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725344] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725347] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[ 81.725349] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[ 81.725352] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[ 81.725355] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[ 81.725358] [<ffffffff8105a026>] irq_exit+0x96/0xc0
[ 81.725360] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[ 81.725363] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[ 81.725365] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[ 81.725368] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[ 81.725370] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[ 81.725373] INITIAL USE at:
[ 81.725375] [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70
[ 81.725385] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725388] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725390] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[ 81.725394] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[ 81.725398] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[ 81.725401] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[ 81.725404] [<ffffffff8105a026>] irq_exit+0x96/0xc0
[ 81.725407] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[ 81.725409] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[ 81.725412] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[ 81.725415] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[ 81.725417] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[ 81.725420] }
[ 81.725421] ... key at: [<ffffffff8295b9c8>] __key.46349+0x0/0x8
[ 81.725445] ... acquired at:
[ 81.725446] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725449] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725452] [<ffffffff816dc057>] __xfrm_state_delete+0x37/0x140
[ 81.725454] [<ffffffff816dc18c>] xfrm_state_delete+0x2c/0x50
[ 81.725456] [<ffffffff816dc277>] xfrm_state_flush+0xc7/0x1b0
[ 81.725458] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[ 81.725465] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[ 81.725468] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[ 81.725471] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[ 81.725476] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[ 81.725479] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[ 81.725482] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[ 81.725484]
[ 81.725486] -> (xfrm_state_lock){+.+...} ops: 11 {
[ 81.725490] HARDIRQ-ON-W at:
[ 81.725493] [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70
[ 81.725504] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725507] [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70
[ 81.725510] [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0
[ 81.725513] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[ 81.725516] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[ 81.725519] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[ 81.725522] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[ 81.725525] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[ 81.725527] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[ 81.725530] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[ 81.725533] SOFTIRQ-ON-W at:
[ 81.725534] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[ 81.725537] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725539] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725541] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[ 81.725544] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[ 81.725547] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[ 81.725550] [<ffffffff81078f84>] kthread+0xe4/0x100
[ 81.725555] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[ 81.725565] INITIAL USE at:
[ 81.725567] [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70
[ 81.725569] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725572] [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70
[ 81.725574] [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0
[ 81.725576] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[ 81.725580] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[ 81.725583] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[ 81.725586] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[ 81.725589] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[ 81.725594] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[ 81.725597] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[ 81.725599] }
[ 81.725600] ... key at: [<ffffffff81cadef8>] xfrm_state_lock+0x18/0x50
[ 81.725606] ... acquired at:
[ 81.725607] [<ffffffff810995c0>] check_usage_backwards+0x110/0x150
[ 81.725609] [<ffffffff81099e96>] mark_lock+0x196/0x2f0
[ 81.725611] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[ 81.725614] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725616] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725627] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[ 81.725629] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[ 81.725632] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[ 81.725635] [<ffffffff81078f84>] kthread+0xe4/0x100
[ 81.725637] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[ 81.725640]
[ 81.725641]
[ 81.725641] stack backtrace:
[ 81.725645] CPU: 0 PID: 2780 Comm: kpktgend_0 Not tainted 3.13.0-rc2+ #92
[ 81.725647] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[ 81.725649] ffffffff82537b80 ffff880018199988 ffffffff8176af37 0000000000000007
[ 81.725652] ffff8800181999f0 ffff8800181999d8 ffffffff81099358 ffffffff82537b80
[ 81.725655] ffffffff81a32def ffff8800181999f4 0000000000000000 ffff880002cbeaa8
[ 81.725659] Call Trace:
[ 81.725664] [<ffffffff8176af37>] dump_stack+0x46/0x58
[ 81.725667] [<ffffffff81099358>] print_irq_inversion_bug.part.42+0x1e8/0x1f0
[ 81.725670] [<ffffffff810995c0>] check_usage_backwards+0x110/0x150
[ 81.725672] [<ffffffff81099e96>] mark_lock+0x196/0x2f0
[ 81.725675] [<ffffffff810994b0>] ? check_usage_forwards+0x150/0x150
[ 81.725685] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[ 81.725691] [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90
[ 81.725694] [<ffffffff81089b38>] ? sched_clock_cpu+0xa8/0x120
[ 81.725697] [<ffffffff8109a31a>] ? __lock_acquire+0x32a/0x1d70
[ 81.725699] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[ 81.725702] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725704] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[ 81.725707] [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90
[ 81.725710] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725712] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[ 81.725715] [<ffffffff810971ec>] ? lock_release_holdtime.part.26+0x1c/0x1a0
[ 81.725717] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[ 81.725721] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[ 81.725724] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[ 81.725727] [<ffffffffa008ba71>] ? pktgen_thread_worker+0xb11/0x1880 [pktgen]
[ 81.725729] [<ffffffff8109cf9d>] ? trace_hardirqs_on+0xd/0x10
[ 81.725733] [<ffffffff81775410>] ? _raw_spin_unlock_irq+0x30/0x40
[ 81.725745] [<ffffffff8151faa0>] ? e1000_clean+0x9d0/0x9d0
[ 81.725751] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[ 81.725753] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[ 81.725757] [<ffffffffa008af60>] ? mod_cur_headers+0x7f0/0x7f0 [pktgen]
[ 81.725759] [<ffffffff81078f84>] kthread+0xe4/0x100
[ 81.725762] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170
[ 81.725765] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[ 81.725768] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
This patch clean up some checkpatch errors like this:
ERROR: "foo * bar" should be "foo *bar"
ERROR: "(foo*)" should be "(foo *)"
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
This patch cleanup some space errors.
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
In order to check against valid IPcomp spi range, export verify_userspi_info
for both pfkey and netlink interface.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
We now queue packets to the policy if the states are not yet resolved,
this replaces the ancient sleeping code. Also the sleeping can cause
indefinite task hangs if the needed state does not get resolved.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
By semantics, xfrm layer is fully name space aware,
so will the locks, e.g. xfrm_state/pocliy_lock.
Ensure exclusive access into state/policy link list
for different name space with one global lock is not
right in terms of semantics aspect at first place,
as they are indeed mutually independent with each
other, but also more seriously causes scalability
problem.
One practical scenario is on a Open Network Stack,
more than hundreds of lxc tenants acts as routers
within one host, a global xfrm_state/policy_lock
becomes the bottleneck. But onces those locks are
decoupled in a per-namespace fashion, locks contend
is just with in specific name space scope, without
causing additional SPD/SAD access delay for other
name space.
Also this patch improve scalability while as without
changing original xfrm behavior.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
__xfrm4/6_state_addr_check is a four steps check, all we need to do
is checking whether the destination address match when looking SA
using wildcard source address. Passing saddr from flow is worst option,
as the checking needs to reach the fourth step while actually only
one time checking will do the work.
So, simplify this process by only checking destination address when
using wildcard source address for looking up SAs.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
If SA is in the process of acquiring, which indicates this SA is more
promising and precise than the fall back option, i.e. using wild card
source address for searching less suitable SA.
So, here bail out, and try again.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
include/net/xfrm.h
Simple conflict between Joe Perches "extern" removal for function
declarations in header files and the changes in Steffen's tree.
Steffen Klassert says:
====================
Two patches that are left from the last development cycle.
Manual merging of include/net/xfrm.h is needed. The conflict
can be solved as it is currently done in linux-next.
1) We announce the creation of temporary acquire state via an asyc event,
so the deletion should be annunced too. From Nicolas Dichtel.
2) The VTI tunnels do not real tunning, they just provide a routable
IPsec tunnel interface. So introduce and use xfrm_tunnel_notifier
instead of xfrm_tunnel for xfrm tunnel mode callback. From Fan Du.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
net/bridge/br_multicast.c
net/ipv6/sit.c
The conflicts were minor:
1) sit.c changes overlap with change to ip_tunnel_xmit() signature.
2) br_multicast.c had an overlap between computing max_delay using
msecs_to_jiffies and turning MLDV2_MRC() into an inline function
with a name using lowercase instead of uppercase letters.
3) stmmac had two overlapping changes, one which conditionally allocated
and hooked up a dma_cfg based upon the presence of the pbl OF property,
and another one handling store-and-forward DMA made. The latter of
which should not go into the new of_find_property() basic block.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Creation of temporary SA are announced by netlink, but there is no notification
for the deletion.
This patch fix this asymmetric situation.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
xfrm_state timer should be independent of system clock change,
so switch to CLOCK_BOOTTIME base which is not only monotonic but
also counting suspend time.
Thus issue reported in commit: 9e0d57fd6dad37d72a3ca6db00ca8c76f2215454
("xfrm: SAD entries do not expire correctly after suspend-resume")
could ALSO be avoided.
v2: Use CLOCK_BOOTTIME to count suspend time, but still monotonic.
Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
In xfrm4 and xfrm6 we need to take care about sockets of the other
address family. This could happen because a 6in4 or 4in6 tunnel could
get protected by ipsec.
Because we don't want to have a run-time dependency on ipv6 when only
using ipv4 xfrm we have to embed a pointer to the correct local_error
function in xfrm_state_afinet and look it up when returning an error
depending on the socket address family.
Thanks to vi0ss for the great bug report:
<https://bugzilla.kernel.org/show_bug.cgi?id=58691>
v2:
a) fix two more unsafe interpretations of skb->sk as ipv6 socket
(xfrm6_local_dontfrag and __xfrm6_output)
v3:
a) add an EXPORT_SYMBOL_GPL(xfrm_local_error) to fix a link error when
building ipv6 as a module (thanks to Steffen Klassert)
Reported-by: <vi0oss@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
The mark argument is read only, so constify it. Also make dummy_mark in
af_key const -- only used as dummy argument for this very function.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
By default, DSCP is copying during encapsulation.
Copying the DSCP in IPsec tunneling may be a bit dangerous because packets with
different DSCP may get reordered relative to each other in the network and then
dropped by the remote IPsec GW if the reordering becomes too big compared to the
replay window.
It is possible to avoid this copy with netfilter rules, but it's very convenient
to be able to configure it for each SA directly.
This patch adds a toogle for this purpose. By default, it's not set to maintain
backward compatibility.
Field flags in struct xfrm_usersa_info is full, hence I add a new attribute.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
All users of xfrm_addr_cmp() use its result as boolean.
Introduce xfrm_addr_equal() (which is equal to !xfrm_addr_cmp())
and convert all users.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
afinfo->type_map and afinfo->mode_map deserve separated locks,
they are different things.
We should just take RCU read lock to protect afinfo itself,
but not for the inner pointers.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Similar to commit 418a99ac6ad487dc9c42e6b0e85f941af56330f2
(Replace rwlock on xfrm_policy_afinfo with rcu), the rwlock
on xfrm_state_afinfo can be replaced by RCU too.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Remove the check if x->km.state equal to XFRM_STATE_VALID in
xfrm_state_check_expire(), which will be done before call
xfrm_state_check_expire().
add a LINUX_MIB_XFRMOUTSTATEINVALID statistic to record the
outbound error due to invalid xfrm state.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Pull networking changes from David Miller:
1) GRE now works over ipv6, from Dmitry Kozlov.
2) Make SCTP more network namespace aware, from Eric Biederman.
3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.
4) Make openvswitch network namespace aware, from Pravin B Shelar.
5) IPV6 NAT implementation, from Patrick McHardy.
6) Server side support for TCP Fast Open, from Jerry Chu and others.
7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
Borkmann.
8) Increate the loopback default MTU to 64K, from Eric Dumazet.
9) Use a per-task rather than per-socket page fragment allocator for
outgoing networking traffic. This benefits processes that have very
many mostly idle sockets, which is quite common.
From Eric Dumazet.
10) Use up to 32K for page fragment allocations, with fallbacks to
smaller sizes when higher order page allocations fail. Benefits are
a) less segments for driver to process b) less calls to page
allocator c) less waste of space.
From Eric Dumazet.
11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.
12) VXLAN device driver, one way to handle VLAN issues such as the
limitation of 4096 VLAN IDs yet still have some level of isolation.
From Stephen Hemminger.
13) As usual there is a large boatload of driver changes, with the scale
perhaps tilted towards the wireless side this time around.
Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
hyperv: Add buffer for extended info after the RNDIS response message.
hyperv: Report actual status in receive completion packet
hyperv: Remove extra allocated space for recv_pkt_list elements
hyperv: Fix page buffer handling in rndis_filter_send_request()
hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
hyperv: Fix the max_xfer_size in RNDIS initialization
vxlan: put UDP socket in correct namespace
vxlan: Depend on CONFIG_INET
sfc: Fix the reported priorities of different filter types
sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
sfc: Fix loopback self-test with separate_tx_channels=1
sfc: Fix MCDI structure field lookup
sfc: Add parentheses around use of bitfield macro arguments
sfc: Fix null function pointer in efx_sriov_channel_type
vxlan: virtual extensible lan
igmp: export symbol ip_mc_leave_group
netlink: add attributes to fdb interface
tg3: unconditionally select HWMON support when tg3 is enabled.
Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
gre: fix sparse warning
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.
The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.
The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.
Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.
Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.
While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.
Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.
After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."
Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...
|
|
Always store audit loginuids in type kuid_t.
Print loginuids by converting them into uids in the appropriate user
namespace, and then printing the resulting uid.
Modify audit_get_loginuid to return a kuid_t.
Modify audit_set_loginuid to take a kuid_t.
Modify /proc/<pid>/loginuid on read to convert the loginuid into the
user namespace of the opener of the file.
Modify /proc/<pid>/loginud on write to convert the loginuid
rom the user namespace of the opener of the file.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Paul Moore <paul@paul-moore.com> ?
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Conflicts:
net/netfilter/nfnetlink_log.c
net/netfilter/xt_LOG.c
Rather easy conflict resolution, the 'net' tree had bug fixes to make
sure we checked if a socket is a time-wait one or not and elide the
logging code if so.
Whereas on the 'net-next' side we are calculating the UID and GID from
the creds using different interfaces due to the user namespace changes
from Eric Biederman.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.
I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.
I have successfully built an allyesconfig kernel with this change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Initialize return variable before exiting on an error path.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
when != &ret
*if(...)
{
... when != ret = e2
when forall
return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sematically speaking, xfrm_mgr.acquire is called when kernel intends to ask
user space IKE daemon to negotiate SAs with peers. IOW the direction will
*always* be XFRM_POLICY_OUT, so remove int dir for clarity.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After SA is setup, one timer is armed to detect soft/hard expiration,
however the timer handler uses xtime to do the math. This makes hard
expiration occurs first before soft expiration after setting new date
with big interval. As a result new child SA is deleted before rekeying
the new one.
Signed-off-by: Fan Du <fdu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
C assignment can handle struct in6_addr copying.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Upon "ip xfrm state update ..", xfrm_add_sa() takes an extra reference on
the user-supplied SA and forgets to drop the reference when
xfrm_state_update() returns 0. This leads to a memory leak as the
parameter SA is never freed. This change attempts to fix the leak by
calling __xfrm_state_put() when xfrm_state_update() updates a valid SA
(err = 0). The parameter SA is added to the gc list when the final
reference is dropped by xfrm_add_sa() upon completion.
Signed-off-by: Tushar Gohad <tgohad@mvista.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we clone a xfrm state we have to assign the replay_esn
and the preplay_esn pointers to the state if we use the
new replay detection method. To this end, we add a
xfrm_replay_clone() function that allocates memory for
the replay detection and takes over the necessary values
from the original state.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 'xfrm: Move IPsec replay detection functions to a separate file'
(9fdc4883d92d20842c5acea77a4a21bb1574b495)
introduce repl field to struct xfrm_state, and only initialize it
under SA's netlink create path, the other path, such as pf_key,
ipcomp/ipcomp6 etc, the repl field remaining uninitialize. So if
the SA is created by pf_key, any input packet with SA's encryption
algorithm will cause panic.
int xfrm_input()
{
...
x->repl->advance(x, seq);
...
}
This patch fixed it by introduce new function __xfrm_init_state().
Pid: 0, comm: swapper Not tainted 2.6.38-next+ #14 Bochs Bochs
EIP: 0060:[<c078e5d5>] EFLAGS: 00010206 CPU: 0
EIP is at xfrm_input+0x31c/0x4cc
EAX: dd839c00 EBX: 00000084 ECX: 00000000 EDX: 01000000
ESI: dd839c00 EDI: de3a0780 EBP: dec1de88 ESP: dec1de64
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=dec1c000 task=c09c0f20 task.ti=c0992000)
Stack:
00000000 00000000 00000002 c0ba27c0 00100000 01000000 de3a0798 c0ba27c0
00000033 dec1de98 c0786848 00000000 de3a0780 dec1dea4 c0786868 00000000
dec1debc c074ee56 e1da6b8c de3a0780 c074ed44 de3a07a8 dec1decc c074ef32
Call Trace:
[<c0786848>] xfrm4_rcv_encap+0x22/0x27
[<c0786868>] xfrm4_rcv+0x1b/0x1d
[<c074ee56>] ip_local_deliver_finish+0x112/0x1b1
[<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1
[<c074ef32>] NF_HOOK.clone.1+0x3d/0x44
[<c074ef77>] ip_local_deliver+0x3e/0x44
[<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1
[<c074ec03>] ip_rcv_finish+0x30a/0x332
[<c074e8f9>] ? ip_rcv_finish+0x0/0x332
[<c074ef32>] NF_HOOK.clone.1+0x3d/0x44
[<c074f188>] ip_rcv+0x20b/0x247
[<c074e8f9>] ? ip_rcv_finish+0x0/0x332
[<c072797d>] __netif_receive_skb+0x373/0x399
[<c0727bc1>] netif_receive_skb+0x4b/0x51
[<e0817e2a>] cp_rx_poll+0x210/0x2c4 [8139cp]
[<c072818f>] net_rx_action+0x9a/0x17d
[<c0445b5c>] __do_softirq+0xa1/0x149
[<c0445abb>] ? __do_softirq+0x0/0x149
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds a netlink based user interface to configure
esn and big anti-replay windows. The new netlink attribute
XFRMA_REPLAY_ESN_VAL is used to configure the new implementation.
If the XFRM_STATE_ESN flag is set, we use esn and support for big
anti-replay windows for the configured state. If this flag is not
set we use the new implementation with 32 bit sequence numbers.
A big anti-replay window can be configured in this case anyway.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To support multiple versions of replay detection, we move the replay
detection functions to a separate file and make them accessible
via function pointers contained in the struct xfrm_replay.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.
This is the first step to move in that direction.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This required a const'ification in xfrm_init_tempstate() too.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|