summaryrefslogtreecommitdiff
path: root/net/xfrm/xfrm_policy.c
AgeCommit message (Collapse)Author
2017-02-09xfrm: policy: remove garbage_collect callbackFlorian Westphal
Just call xfrm_garbage_collect_deferred() directly. This gets rid of a write to afinfo in register/unregister and allows to constify afinfo later on. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09xfrm: policy: xfrm_policy_unregister_afinfo can return voidFlorian Westphal
Nothing checks the return value. Also, the errors returned on unregister are impossible (we only support INET and INET6, so no way xfrm_policy_afinfo[afinfo->family] can be anything other than 'afinfo' itself). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09xfrm: policy: xfrm_get_tos cannot failFlorian Westphal
The comment makes it look like get_tos() is used to validate something, but it turns out the comment was about xfrm_find_bundle() which got removed years ago. xfrm_get_tos will return either the tos (ipv4) or 0 (ipv6). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09xfrm: policy: init locks earlyFlorian Westphal
Dmitry reports following splat: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 0 PID: 13059 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170207 #1 [..] spin_lock_bh include/linux/spinlock.h:304 [inline] xfrm_policy_flush+0x32/0x470 net/xfrm/xfrm_policy.c:963 xfrm_policy_fini+0xbf/0x560 net/xfrm/xfrm_policy.c:3041 xfrm_net_init+0x79f/0x9e0 net/xfrm/xfrm_policy.c:3091 ops_init+0x10a/0x530 net/core/net_namespace.c:115 setup_net+0x2ed/0x690 net/core/net_namespace.c:291 copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396 create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205 SYSC_unshare kernel/fork.c:2281 [inline] Problem is that when we get error during xfrm_net_init we will call xfrm_policy_fini which will acquire xfrm_policy_lock before it was initialized. Just move it around so locks get set up first. Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: 283bc9f35bbbcb0e9 ("xfrm: Namespacify xfrm state/policy locks") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-07net: add confirm_neigh method to dst_opsJulian Anastasov
Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6 to lookup and confirm the neighbour. Its usage via the new helper dst_confirm_neigh() should be restricted to MSG_PROBE users for performance reasons. For XFRM prefer the last tunnel address, if present. With help from Steffen Klassert. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04xfrm: trivial typosAlexander Alemayhu
o s/descentant/descendant o s/workarbound/workaround Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-12-12Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp hotplug updates from Thomas Gleixner: "This is the final round of converting the notifier mess to the state machine. The removal of the notifiers and the related infrastructure will happen around rc1, as there are conversions outstanding in other trees. The whole exercise removed about 2000 lines of code in total and in course of the conversion several dozen bugs got fixed. The new mechanism allows to test almost every hotplug step standalone, so usage sites can exercise all transitions extensively. There is more room for improvement, like integrating all the pointlessly different architecture mechanisms of synchronizing, setting cpus online etc into the core code" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) tracing/rb: Init the CPU mask on allocation soc/fsl/qbman: Convert to hotplug state machine soc/fsl/qbman: Convert to hotplug state machine zram: Convert to hotplug state machine KVM/PPC/Book3S HV: Convert to hotplug state machine arm64/cpuinfo: Convert to hotplug state machine arm64/cpuinfo: Make hotplug notifier symmetric mm/compaction: Convert to hotplug state machine iommu/vt-d: Convert to hotplug state machine mm/zswap: Convert pool to hotplug state machine mm/zswap: Convert dst-mem to hotplug state machine mm/zsmalloc: Convert to hotplug state machine mm/vmstat: Convert to hotplug state machine mm/vmstat: Avoid on each online CPU loops mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead() tracing/rb: Convert to hotplug state machine oprofile/nmi timer: Convert to hotplug state machine net/iucv: Use explicit clean up labels in iucv_init() x86/pci/amd-bus: Convert to hotplug state machine x86/oprofile/nmi: Convert to hotplug state machine ...
2016-11-18xfrm: unbreak xfrm_sk_policy_lookupFlorian Westphal
if we succeed grabbing the refcount, then if (err && !xfrm_pol_hold_rcu) will evaluate to false so this hits last else branch which then sets policy to ERR_PTR(0). Fixes: ae33786f73a7ce ("xfrm: policy: only use rcu in xfrm_sk_policy_lookup") Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-11-09net/flowcache: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. Use multi state support to avoid custom list handling for the multiple instances. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: netdev@vger.kernel.org Cc: rt@linutronix.de Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20161103145021.28528-10-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/mediatek/mtk_eth_soc.c drivers/net/ethernet/qlogic/qed/qed_dcbx.c drivers/net/phy/Kconfig All conflicts were cases of overlapping commits. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24xfrm: Fix xfrm_policy_lock imbalanceSteffen Klassert
An earlier patch accidentally replaced a write_lock_bh with a spin_unlock_bh. Fix this by using spin_lock_bh instead. Fixes: 9d0380df6217 ("xfrm: policy: convert policy_lock to spinlock") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: convert policy_lock to spinlockFlorian Westphal
After earlier patches conversions all spots acquire the writer lock and we can now convert this to a normal spinlock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: don't acquire policy lock in xfrm_spd_getinfoFlorian Westphal
It doesn't seem that important. We now get inconsistent view of the counters, but those are stale anyway right after we drop the lock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: only use rcu in xfrm_sk_policy_lookupFlorian Westphal
Don't acquire the readlock anymore and rely on rcu alone. In case writer on other CPU changed policy at the wrong moment (after we obtained sk policy pointer but before we could obtain the reference) just repeat the lookup. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: make xfrm_policy_lookup_bytype locklessFlorian Westphal
side effect: no longer disables BH (should be fine). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: use atomic_inc_not_zero in rcu sectionFlorian Westphal
If we don't hold the policy lock anymore the refcnt might already be 0, i.e. policy struct is about to be free'd. Switch to atomic_inc_not_zero to avoid this. On removal policies are already unlinked from the tables (lists) before the last _put occurs so we are not supposed to find the same 'dead' entry on the next loop, so its safe to just repeat the lookup. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: add sequence count to sync with hash resizeFlorian Westphal
Once xfrm_policy_lookup_bytype doesn't grab xfrm_policy_lock anymore its possible for a hash resize to occur in parallel. Use sequence counter to block lookup in case a resize is in progress and to also re-lookup in case hash table was altered in the mean time (might cause use to not find the best-match). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: prepare policy_bydst hash for rcu lookupsFlorian Westphal
Since commit 56f047305dd4b6b617 ("xfrm: add rcu grace period in xfrm_policy_destroy()") xfrm policy objects are already free'd via rcu. In order to make more places lockless (i.e. use rcu_read_lock instead of grabbing read-side of policy rwlock) we only need to: - use rcu_assign_pointer to store address of new hash table backend memory - add rcu barrier so that freeing of old memory is delayed (expansion and free happens from system workqueue, so synchronize_rcu is fine) - use rcu_dereference to fetch current address of the hash table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: use rcu versions for iteration and list add/delFlorian Westphal
This is required once we allow lockless readers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-07-29xfrm: Ignore socket policies when rebuilding hash tablesTobias Brunner
Whenever thresholds are changed the hash tables are rebuilt. This is done by enumerating all policies and hashing and inserting them into the right table according to the thresholds and direction. Because socket policies are also contained in net->xfrm.policy_all but no hash tables are defined for their direction (dir + XFRM_POLICY_MAX) this causes a NULL or invalid pointer dereference after returning from policy_hash_bysel() if the rebuild is done while any socket policies are installed. Since the rebuild after changing thresholds is scheduled this crash could even occur if the userland sets thresholds seemingly before installing any socket policies. Fixes: 53c2e285f970 ("xfrm: Do not hash socket policies") Signed-off-by: Tobias Brunner <tobias@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-12-22Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2015-12-22 Just one patch to fix dst_entries_init with multiple namespaces. From Dan Streetman. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-11xfrm: add rcu protection to sk->sk_policy[]Eric Dumazet
XFRM can deal with SYNACK messages, sent while listener socket is not locked. We add proper rcu protection to __xfrm_sk_clone_policy() and xfrm_sk_policy_lookup() This might serve as the first step to remove xfrm.xfrm_policy_lock use in fast path. Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-11xfrm: add rcu grace period in xfrm_policy_destroy()Eric Dumazet
We will soon switch sk->sk_policy[] to RCU protection, as SYNACK packets are sent while listener socket is not locked. This patch simply adds RCU grace period before struct xfrm_policy freeing, and the corresponding rcu_head in struct xfrm_policy. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07xfrm: take care of request socketsEric Dumazet
TCP SYNACK messages might now be attached to request sockets. XFRM needs to get back to a listener socket. Adds new helpers that might be used elsewhere : sk_to_full_sk() and sk_const_to_full_sk() Note: We also need to add RCU protection for xfrm lookups, now TCP/DCCP have lockless listener processing. This will be addressed in separate patches. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03xfrm: dst_entries_init() per-net dst_opsDan Streetman
Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops templates; their dst_entries counters will never be used. Move the xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init and dst_entries_destroy for each net namespace. The ipv4 and ipv6 xfrms each create dst_ops template, and perform dst_entries_init on the templates. The template values are copied to each net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops pcpuc_entries field is a percpu counter and cannot be used correctly by simply copying it to another object. The result of this is a very subtle bug; changes to the dst entries counter from one net namespace may sometimes get applied to a different net namespace dst entries counter. This is because of how the percpu counter works; it has a main count field as well as a pointer to the percpu variables. Each net namespace maintains its own main count variable, but all point to one set of percpu variables. When any net namespace happens to change one of the percpu variables to outside its small batch range, its count is moved to the net namespace's main count variable. So with multiple net namespaces operating concurrently, the dst_ops entries counter can stray from the actual value that it should be; if counts are consistently moved from one net namespace to another (which my testing showed is likely), then one net namespace winds up with a negative dst_ops count while another winds up with a continually increasing count, eventually reaching its gc_thresh limit, which causes all new traffic on the net namespace to fail with -ENOBUFS. Signed-off-by: Dan Streetman <dan.streetman@canonical.com> Signed-off-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-10-08dst: Pass net into dst->outputEric W. Biederman
The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08net: Pass net into dst_output and remove dst_output_okfnEric W. Biederman
Replace dst_output_okfn with dst_output Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08xfrm: Only compute net once in xfrm_policy_queue_processEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25inet: constify ip_route_output_flow() socket argumentEric Dumazet
Very soon, TCP stack might call inet_csk_route_req(), which calls inet_csk_route_req() with an unlocked listener socket, so we need to make sure ip_route_output_flow() is not trying to change any field from its socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17net: Merge dst_output and dst_output_skEric W. Biederman
Add a sock paramter to dst_output making dst_output_sk superfluous. Add a skb->sk parameter to all of the callers of dst_output Have the callers of dst_output_sk call dst_output. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17xfrm: Remove unused afinfo method init_dstEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11xfrm: Add oif to dst lookupsDavid Ahern
Rules can be installed that direct route lookups to specific tables based on oif. Plumb the oif through the xfrm lookups so it gets set in the flow struct and passed to the resolver routines. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-05-18xfrm: optimise to search the inexact policy listLi RongQing
The policies are organized into list by priority ascent of policy, so it is unnecessary to continue to loop the policy if the priority of current looped police is larger than or equal priority which is from the policy_bydst list. This allows to match policy with ~0U priority in inexact list too. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-05-05xfrm: move the checking for old xfrm_policy hold_queue to beginningLi RongQing
if hold_queue of old xfrm_policy is NULL, return directly, then not need to run other codes, especially take the spin lock Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-05-05xfrm: remove the unnecessary checking before call xfrm_pol_holdLi RongQing
xfrm_pol_hold will check its input with NULL Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-04-23xfrm: fix the return code when xfrm_*_register_afinfo failedLi RongQing
If xfrm_*_register_afinfo failed since xfrm_*_afinfo[afinfo->family] had the value, return the -EEXIST, not -ENOBUFS Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-04-23xfrm: optimise the use of walk list header in xfrm_policy/state_walkLi RongQing
The walk from input is the list header, and marked as dead, and will be skipped in loop. list_first_entry() can be used to return the true usable value from walk if walk is not empty Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-04-23xfrm: remove the xfrm_queue_purge definitionLi RongQing
The task of xfrm_queue_purge is same as skb_queue_purge, so remove it Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-02-12xfrm: release dst_orig in case of error in xfrm_lookup()huaibin Wang
dst_orig should be released on error. Function like __xfrm_route_forward() expects that behavior. Since a recent commit, xfrm_lookup() may also be called by xfrm_lookup_route(), which expects the opposite. Let's introduce a new flag (XFRM_LOOKUP_KEEP_DST_REF) to tell what should be done in case of error. Fixes: f92ee61982d("xfrm: Generate blackhole routes only from route lookup functions") Signed-off-by: huaibin Wang <huaibin.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-12-08Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2014-12-03 1) Fix a set but not used warning. From Fabian Frederick. 2) Currently we make sequence number values available to userspace only if we use ESN. Make the sequence number values also available for non ESN states. From Zhi Ding. 3) Remove socket policy hashing. We don't need it because socket policies are always looked up via a linked list. From Herbert Xu. 4) After removing socket policy hashing, we can use __xfrm_policy_link in xfrm_policy_insert. From Herbert Xu. 5) Add a lookup method for vti6 tunnels with wildcard endpoints. I forgot this when I initially implemented vti6. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13xfrm: Use __xfrm_policy_link in xfrm_policy_insertHerbert Xu
For a long time we couldn't actually use __xfrm_policy_link in xfrm_policy_insert because the latter wanted to do hashing at a specific position. Now that __xfrm_policy_link no longer does hashing it can now be safely used in xfrm_policy_insert to kill some duplicate code, finally reuniting general policies with socket policies. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-11-13xfrm: Do not hash socket policiesHerbert Xu
Back in 2003 when I added policy expiration, I half-heartedly did a clean-up and renamed xfrm_sk_policy_link/xfrm_sk_policy_unlink to __xfrm_policy_link/__xfrm_policy_unlink, because the latter could be reused for all policies. I never actually got around to using __xfrm_policy_link for non-socket policies. Later on hashing was added to all xfrm policies, including socket policies. In fact, we don't need hashing on socket policies at all since they're always looked up via a linked list. This patch restores xfrm_sk_policy_link/xfrm_sk_policy_unlink as wrappers around __xfrm_policy_link/__xfrm_policy_unlink so that it's obvious we're dealing with socket policies. This patch also removes hashing from __xfrm_policy_link as for now it's only used by socket policies which do not need to be hashed. Ironically this will in fact allow us to use this helper for non-socket policies which I shall do later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-10-30net: skb_fclone_busy() needs to detect orphaned skbEric Dumazet
Some drivers are unable to perform TX completions in a bound time. They instead call skb_orphan() Problem is skb_fclone_busy() has to detect this case, otherwise we block TCP retransmits and can freeze unlucky tcp sessions on mostly idle hosts. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27xfrm: fix set but not used warning in xfrm_policy_queue_process()Fabian Frederick
err was set but unused. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-10-01net: cleanup and document skb fclone layoutEric Dumazet
Lets use a proper structure to clearly document and implement skb fast clones. Then, we might experiment more easily alternative layouts. This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm, to stop leaking of implementation details. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2014-09-25 1) Remove useless hash_resize_mutex in xfrm_hash_resize(). This mutex is used only there, but xfrm_hash_resize() can't be called concurrently at all. From Ying Xue. 2) Extend policy hashing to prefixed policies based on prefix lenght thresholds. From Christophe Gouault. 3) Make the policy hash table thresholds configurable via netlink. From Christophe Gouault. 4) Remove the maximum authentication length for AH. This was needed to limit stack usage. We switched already to allocate space, so no need to keep the limit. From Herbert Xu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16xfrm: Generate queueing routes only from route lookup functionsSteffen Klassert
Currently we genarate a queueing route if we have matching policies but can not resolve the states and the sysctl xfrm_larval_drop is disabled. Here we assume that dst_output() is called to kill the queued packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system unwanted. We fix this by generating queueing routes only from the route lookup functions, here we can guarantee a call to dst_output() afterwards. Fixes: a0073fe18e71 ("xfrm: Add a state resolution packet queue") Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-16xfrm: Generate blackhole routes only from route lookup functionsSteffen Klassert
Currently we genarate a blackhole route route whenever we have matching policies but can not resolve the states. Here we assume that dst_output() is called to kill the balckholed packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system unwanted. We fix this by generating blackhole routes only from the route lookup functions, here we can guarantee a call to dst_output() afterwards. Fixes: 2774c131b1d ("xfrm: Handle blackhole route creation via afinfo.") Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-02xfrm: configure policy hash table thresholds by netlinkChristophe Gouault
Enable to specify local and remote prefix length thresholds for the policy hash table via a netlink XFRM_MSG_NEWSPDINFO message. prefix length thresholds are specified by XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH optional attributes (struct xfrmu_spdhthresh). example: struct xfrmu_spdhthresh thresh4 = { .lbits = 0; .rbits = 24; }; struct xfrmu_spdhthresh thresh6 = { .lbits = 0; .rbits = 56; }; struct nlmsghdr *hdr; struct nl_msg *msg; msg = nlmsg_alloc(); hdr = nlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, XFRMA_SPD_IPV4_HTHRESH, sizeof(__u32), NLM_F_REQUEST); nla_put(msg, XFRMA_SPD_IPV4_HTHRESH, sizeof(thresh4), &thresh4); nla_put(msg, XFRMA_SPD_IPV6_HTHRESH, sizeof(thresh6), &thresh6); nla_send_auto(sk, msg); The numbers are the policy selector minimum prefix lengths to put a policy in the hash table. - lbits is the local threshold (source address for out policies, destination address for in and fwd policies). - rbits is the remote threshold (destination address for out policies, source address for in and fwd policies). The default values are: XFRMA_SPD_IPV4_HTHRESH: 32 32 XFRMA_SPD_IPV6_HTHRESH: 128 128 Dynamic re-building of the SPD is performed when the thresholds values are changed. The current thresholds can be read via a XFRM_MSG_GETSPDINFO request: the kernel replies to XFRM_MSG_GETSPDINFO requests by an XFRM_MSG_NEWSPDINFO message, with both attributes XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH. Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-02xfrm: hash prefixed policies based on preflen thresholdsChristophe Gouault
The idea is an extension of the current policy hashing. Today only non-prefixed policies are stored in a hash table. This patch relaxes the constraints, and hashes policies whose prefix lengths are greater or equal to a configurable threshold. Each hash table (one per direction) maintains its own set of IPv4 and IPv6 thresholds (dbits4, sbits4, dbits6, sbits6), by default (32, 32, 128, 128). Example, if the output hash table is configured with values (16, 24, 56, 64): ip xfrm policy add dir out src 10.22.0.0/20 dst 10.24.1.0/24 ... => hashed ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.1.1/32 ... => hashed ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.0.0/16 ... => unhashed ip xfrm policy add dir out \ src 3ffe:304:124:2200::/60 dst 3ffe:304:124:2401::/64 ... => hashed ip xfrm policy add dir out \ src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2401::2/128 ... => hashed ip xfrm policy add dir out \ src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2400::/56 ... => unhashed The high order bits of the addresses (up to the threshold) are used to compute the hash key. Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>