summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-11-10netfilter: ipset: Optimize hash creation routineJozsef Kadlecsik
Exit as easly as possible on error and use RCU_INIT_POINTER() as set is not seen at creation time. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10netfilter: ipset: Make sure element data size is a multiple of u32Jozsef Kadlecsik
Data for hashing required to be array of u32. Make sure that element data always multiple of u32. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10netfilter: ipset: Make NLEN compile time constant for hash typesJozsef Kadlecsik
Hash types define HOST_MASK before inclusion of ip_set_hash_gen.h and the only place where NLEN needed to be calculated at runtime is *_create() method. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10netfilter: ipset: Simplify mtype_expire() for hash typesJozsef Kadlecsik
Remove one leve of intendation by using continue while iterating over elements in bucket. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10netfilter: ipset: Remove redundant mtype_expire() argumentsJozsef Kadlecsik
Remove redundant parameters nets_length and dsize, because they can be get from other parameters. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10netfilter: ipset: Count non-static extension memory for userspaceJozsef Kadlecsik
Non-static (i.e. comment) extension was not counted into the memory size. A new internal counter is introduced for this. In the case of the hash types the sizes of the arrays are counted there as well so that we can avoid to scan the whole set when just the header data is requested. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10netfilter: ipset: Add element count to all set types headerJozsef Kadlecsik
It is better to list the set elements for all set types, thus the header information is uniform. Element counts are therefore added to the bitmap and list types. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10netfilter: ipset: Add element count to hash headersEric B Munson
It would be useful for userspace to query the size of an ipset hash, however, this data is not exposed to userspace outside of counting the number of member entries. This patch uses the attribute IPSET_ATTR_ELEMENTS to indicate the size in the the header that is exported to userspace. This field is then printed by the userspace tool for hashes. Signed-off-by: Eric B Munson <emunson@akamai.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Josh Hunt <johunt@akamai.com> Cc: netfilter-devel@vger.kernel.org Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10netfilter: ipset: Separate memsize calculation code into dedicated functionJozsef Kadlecsik
Hash types already has it's memsize calculation code in separate functions. Clean up and do the same for *bitmap* and *list* sets. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>. Suggested-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10netfilter: ipset: Improve skbinfo get/init helpersJozsef Kadlecsik
Use struct ip_set_skbinfo in struct ip_set_ext instead of open coded fields and assign structure members in get/init helpers instead of copying members one by one. Explicitly note that struct ip_set_skbinfo must be padded to prevent non-aligned access in the extension blob. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>. Suggested-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10netfilter: conntrack: fix NF_REPEAT handlingArnd Bergmann
gcc correctly identified a theoretical uninitialized variable use: net/netfilter/nf_conntrack_core.c: In function 'nf_conntrack_in': net/netfilter/nf_conntrack_core.c:1125:14: error: 'l4proto' may be used uninitialized in this function [-Werror=maybe-uninitialized] This could only happen when we 'goto out' before looking up l4proto, and then enter the retry, implying that l3proto->get_l4proto() returned NF_REPEAT. This does not currently get returned in any code path and probably won't ever happen, but is not good to rely on. Moving the repeat handling up a little should have the same behavior as today but avoids the warning by making that case impossible to enter. [ I have mangled this original patch to remove the check for tmpl, we should inconditionally jump back to the repeat label in case we hit NF_REPEAT instead. I have also moved the comment that explains this where it belongs. --pablo ] Fixes: 08733a0cb7de ("netfilter: handle NF_REPEAT from nf_conntrack_in()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-10udp: provide udp{4,6}_lib_lookup for nf_socket_ipv{4,6}Arnd Bergmann
Since commit ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU") the udp6_lib_lookup and udp4_lib_lookup functions are only provided when it is actually possible to call them. However, moving the callers now caused a link error: net/built-in.o: In function `nf_sk_lookup_slow_v6': (.text+0x131a39): undefined reference to `udp6_lib_lookup' net/ipv4/netfilter/nf_socket_ipv4.o: In function `nf_sk_lookup_slow_v4': nf_socket_ipv4.c:(.text.nf_sk_lookup_slow_v4+0x114): undefined reference to `udp4_lib_lookup' This extends the #ifdef so we also provide the functions when CONFIG_NF_SOCKET_IPV4 or CONFIG_NF_SOCKET_IPV6, respectively are set. Fixes: 8db4c5be88f6 ("netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-09netfilter: conntrack: simplify init/uninit of L4 protocol trackersDavide Caratti
modify registration and deregistration of layer-4 protocol trackers to facilitate inclusion of new elements into the current list of builtin protocols. Both builtin (TCP, UDP, ICMP) and non-builtin (DCCP, GRE, SCTP, UDPlite) layer-4 protocol trackers usually register/deregister themselves using consecutive calls to nf_ct_l4proto_{,pernet}_{,un}register(...). This sequence is interrupted and rolled back in case of error; in order to simplify addition of builtin protocols, the input of the above functions has been modified to allow registering/unregistering multiple protocols. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-09netfilter: nf_tables: simplify the basic expressions' init routineLiping Zhang
Some basic expressions are built into nf_tables.ko, such as nft_cmp, nft_lookup, nft_range and so on. But these basic expressions' init routine is a little ugly, too many goto errX labels, and we forget to call nft_range_module_exit in the exit routine, although it is harmless. Acctually, the init and exit routines of these basic expressions are same, i.e. do nft_register_expr in the init routine and do nft_unregister_expr in the exit routine. So it's better to arrange them into an array and deal with them together. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-09netfilter: nft_hash: get random bytes if seed is not specifiedPablo Neira Ayuso
If the user doesn't specify a seed, generate one at configuration time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: handle NF_REPEAT from nf_conntrack_in()Pablo Neira Ayuso
NF_REPEAT is only needed from nf_conntrack_in() under a very specific case required by the TCP protocol tracker, we can handle this case without returning to the core hook path. Handling of NF_REPEAT from the nf_reinject() is left untouched. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: merge nf_iterate() into nf_hook_slow()Pablo Neira Ayuso
nf_iterate() has become rather simple, we can integrate this code into nf_hook_slow() to reduce the amount of LOC in the core path. However, we still need nf_iterate() around for nf_queue packet handling, so move this function there where we only need it. I think it should be possible to refactor nf_queue code to get rid of it definitely, but given this is slow path anyway, let's have a look this later. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: remove hook_entries field from nf_hook_statePablo Neira Ayuso
This field is only useful for nf_queue, so store it in the nf_queue_entry structure instead, away from the core path. Pass hook_head to nf_hook_slow(). Since we always have a valid entry on the first iteration in nf_iterate(), we can use 'do { ... } while (entry)' loop instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: use switch() to handle verdict cases from nf_hook_slow()Pablo Neira Ayuso
Use switch() for verdict handling and add explicit handling for NF_STOLEN and other non-conventional verdicts. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: nf_tables: use hook state from xt_action_param structurePablo Neira Ayuso
Don't copy relevant fields from hook state structure, instead use the one that is already available in struct xt_action_param. This patch also adds a set of new wrapper functions to fetch relevant hook state structure fields. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: x_tables: move hook state into xt_action_param structurePablo Neira Ayuso
Place pointer to hook state in xt_action_param structure instead of copying the fields that we need. After this change xt_action_param fits into one cacheline. This patch also adds a set of new wrapper functions to fetch relevant hook state structure fields. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: deprecate NF_STOPPablo Neira Ayuso
NF_STOP is only used by br_netfilter these days, and it can be emulated with a combination of NF_STOLEN plus explicit call to the ->okfn() function as Florian suggests. To retain binary compatibility with userspace nf_queue application, we have to keep NF_STOP around, so libnetfilter_queue userspace userspace applications still work if they use NF_STOP for some exotic reason. Out of tree modules using NF_STOP would break, but we don't care about those. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: kill NF_HOOK_THRESH() and state->treshPablo Neira Ayuso
Patch c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh") introduced br_nf_hook_thresh(). Replace NF_HOOK_THRESH() by br_nf_hook_thresh from br_nf_forward_finish(), so we have no more callers for this macro. As a result, state->thresh and explicit thresh parameter in the hook state structure is not required anymore. And we can get rid of skip-hook-under-thresh loop in nf_iterate() in the core path that is only used by br_netfilter to search for the filter hook. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: remove comments that predate rcu daysPablo Neira Ayuso
We cannot block/sleep on nf_iterate because netfilter runs under rcu read lock these days, where blocking is well-known to be illegal. So let's remove these old comments. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: get rid of useless debugging from corePablo Neira Ayuso
This patch remove compile time code to catch inconventional verdicts. We have better ways to handle this case these days, eg. pr_debug() but even though I don't think this is useful at all, so let's remove this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-02ila: Fix crash caused by rhashtable changesTom Herbert
commit ca26893f05e86 ("rhashtable: Add rhlist interface") added a field to rhashtable_iter so that length became 56 bytes and would exceed the size of args in netlink_callback (which is 48 bytes). The netlink diag dump function already has been allocating a iter structure and storing the pointed to that in the args of netlink_callback. ila_xlat also uses rhahstable_iter but is still putting that directly in the arg block. Now since rhashtable_iter size is increased we are overwriting beyond the structure. The next field happens to be cb_mutex pointer in netlink_sock and hence the crash. Fix is to alloc the rhashtable_iter and save it as pointer in arg. Tested: modprobe ila ./ip ila add loc 3333:0:0:0 loc_match 2222:0:0:1, ./ip ila list # NO crash now Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02net: ip, diag -- Adjust raw_abort to use unlocked __udp_disconnectCyrill Gorcunov
While being preparing patches for killing raw sockets via diag netlink interface I noticed that my runs are stuck: | [root@pcs7 ~]# cat /proc/`pidof ss`/stack | [<ffffffff816d1a76>] __lock_sock+0x80/0xc4 | [<ffffffff816d206a>] lock_sock_nested+0x47/0x95 | [<ffffffff8179ded6>] udp_disconnect+0x19/0x33 | [<ffffffff8179b517>] raw_abort+0x33/0x42 | [<ffffffff81702322>] sock_diag_destroy+0x4d/0x52 which has not been the case before. I narrowed it down to the commit | commit 286c72deabaa240b7eebbd99496ed3324d69f3c0 | Author: Eric Dumazet <edumazet@google.com> | Date: Thu Oct 20 09:39:40 2016 -0700 | | udp: must lock the socket in udp_disconnect() where we start locking the socket for different reason. So the raw_abort escaped the renaming and we have to fix this typo using __udp_disconnect instead. Fixes: 286c72deabaa ("udp: must lock the socket in udp_disconnect()") CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Ahern <dsa@cumulusnetworks.com> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> CC: Andrey Vagin <avagin@openvz.org> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02tcp: enhance tcp collapsingEric Dumazet
As Ilya Lesokhin suggested, we can collapse two skbs at retransmit time even if the skb at the right has fragments. We simply have to use more generic skb_copy_bits() instead of skb_copy_from_linear_data() in tcp_collapse_retrans() Also need to guard this skb_copy_bits() in case there is nothing to copy, otherwise skb_put() could panic if left skb has frags. Tested: Used following packetdrill test // Establish a connection. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> +.100 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 write(4, ..., 200) = 200 +0 > P. 1:201(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 201:401(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 401:601(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 601:801(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 801:1001(200) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1001:1101(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1101:1201(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1201:1301(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1301:1401(100) ack 1 +.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401> // Check that TCP collapse works : +0 > P. 1:1001(1000) ack 1 Reported-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02sctp: clean up sctp_packet_transmitXin Long
After adding sctp gso, sctp_packet_transmit is a quite big function now. This patch is to extract the codes for packing packet to sctp_packet_pack from sctp_packet_transmit, and add some comments, simplify the err path by freeing auth chunk when freeing packet chunk_list in out path and freeing head skb early if it fails to pack packet. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02net/sched: cls_flower: merge filter delete/destroy common codeRoi Dayan
Move common code from fl_delete and fl_detroy to __fl_delete. Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02net/sched: cls_flower: add missing unbind call when destroying flowsRoi Dayan
tcf_unbind was called in fl_delete but was missing in fl_destroy when force deleting flows. Fixes: 77b9900ef53a ('tc: introduce Flower classifier') Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. This includes better integration with the routing subsystem for nf_tables, explicit notrack support and smaller updates. More specifically, they are: 1) Add fib lookup expression for nf_tables, from Florian Westphal. This new expression provides a native replacement for iptables addrtype and rp_filter matches. This is more flexible though, since we can populate the kernel flowi representation to inquire fib to accomodate new usecases, such as RTBH through skb mark. 2) Introduce rt expression for nf_tables, from Anders K. Pedersen. This new expression allow you to access skbuff route metadata, more specifically nexthop and classid fields. 3) Add notrack support for nf_tables, to skip conntracking, requested by many users already. 4) Add boilerplate code to allow to use nf_log infrastructure from nf_tables ingress. 5) Allow to mangle pkttype from nf_tables prerouting chain, to emulate the xtables cluster match, from Liping Zhang. 6) Move socket lookup code into generic nf_socket_* infrastructure so we can provide a native replacement for the xtables socket match. 7) Make sure nfnetlink_queue data that is updated on every packets is placed in a different cache from read-only data, from Florian Westphal. 8) Handle NF_STOLEN from nf_tables core, also from Florian Westphal. 9) Start round robin number generation in nft_numgen from zero, instead of n-1, for consistency with xtables statistics match, patch from Liping Zhang. 10) Set GFP_NOWARN flag in skbuff netlink allocations in nfnetlink_log, given we retry with a smaller allocation on failure, from Calvin Owens. 11) Cleanup xt_multiport to use switch(), from Gao feng. 12) Remove superfluous check in nft_immediate and nft_cmp, from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01netfilter: nf_queue: place volatile data in own cachelineFlorian Westphal
As the comment indicates, the data at the end of nfqnl_instance struct is written on every queue/dequeue, so it should reside in its own cacheline. Before this change, 'lock' was in first cacheline so we dirtied both. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_tables: remove useless U8_MAX validationLiping Zhang
After call nft_data_init, size is already validated and desc.len will not exceed the sizeof(struct nft_data), i.e. 16 bytes. So it will never exceed U8_MAX. Furthermore, in nft_immediate_init, we forget to call nft_data_uninit when desc.len exceeds U8_MAX, although this will not happen, but it's a logical mistake. Now remove these redundant validation introduced by commit 36b701fae12a ("netfilter: nf_tables: validate maximum value of u32 netlink attributes") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_tables: introduce routing expressionAnders K. Pedersen
Introduces an nftables rt expression for routing related data with support for nexthop (i.e. the directly connected IP address that an outgoing packet is sent to), which can be used either for matching or accounting, eg. # nft add rule filter postrouting \ ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop This will drop any traffic to 192.168.1.0/24 that is not routed via 192.168.0.1. # nft add rule filter postrouting \ flow table acct { rt nexthop timeout 600s counter } # nft add rule ip6 filter postrouting \ flow table acct { rt nexthop timeout 600s counter } These rules count outgoing traffic per nexthop. Note that the timeout releases an entry if no traffic is seen for this nexthop within 10 minutes. # nft add rule inet filter postrouting \ ether type ip \ flow table acct { rt nexthop timeout 600s counter } # nft add rule inet filter postrouting \ ether type ip6 \ flow table acct { rt nexthop timeout 600s counter } Same as above, but via the inet family, where the ether type must be specified explicitly. "rt classid" is also implemented identical to "meta rtclassid", since it is more logical to have this match in the routing expression going forward. Signed-off-by: Anders K. Pedersen <akp@cohaesio.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.cPablo Neira Ayuso
We need this split to reuse existing codebase for the upcoming nf_tables socket expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_log: add packet logging for netdev familyPablo Neira Ayuso
Move layer 2 packet logging into nf_log_l2packet() that resides in nf_log_common.c, so this can be shared by both bridge and netdev families. This patch adds the boiler plate code to register the netdev logging family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_tables: add fib expressionFlorian Westphal
Add FIB expression, supported for ipv4, ipv6 and inet family (the latter just dispatches to ipv4 or ipv6 one based on nfproto). Currently supports fetching output interface index/name and the rtm_type associated with an address. This can be used for adding path filtering. rtm_type is useful to e.g. enforce a strong-end host model where packets are only accepted if daddr is configured on the interface the packet arrived on. The fib expression is a native nftables alternative to the xtables addrtype and rp_filter matches. FIB result order for oif/oifname retrieval is as follows: - if packet is local (skb has rtable, RTF_LOCAL set, this will also catch looped-back multicast packets), set oif to the loopback interface. - if fib lookup returns an error, or result points to local, store zero result. This means '--local' option of -m rpfilter is not supported. It is possible to use 'fib type local' or add explicit saddr/daddr matching rules to create exceptions if this is really needed. - store result in the destination register. In case of multiple routes, search set for desired oif in case strict matching is requested. ipv4 and ipv6 behave fib expressions are supposed to behave the same. [ I have collapsed Arnd Bergmann's ("netfilter: nf_tables: fib warnings") http://patchwork.ozlabs.org/patch/688615/ to address fallout from this patch after rebasing nf-next, that was posted to address compilation warnings. --pablo ] Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01genetlink: fix error return code in genl_register_family()Wei Yongjun
Fix to return a negative error code from the idr_alloc() error handling case instead of 0, as done elsewhere in this function. Also fix the return value check of idr_alloc() since idr_alloc return negative errors on failure, not zero. Fixes: 2ae0f17df1cd ("genetlink: use idr to track families") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01net: Enable support for VRF with ipv4 multicastDavid Ahern
Enable support for IPv4 multicast: - similar to unicast the flow struct is updated to L3 master device if relevant prior to calling fib_rules_lookup. The table id is saved to the lookup arg so the rule action for ipmr can return the table associated with the device. - ip_mr_forward needs to check for master device mismatch as well since the skb->dev is set to it - allow multicast address on VRF device for Rx by checking for the daddr in the VRF device as well as the original ingress device - on Tx need to drop to __mkroute_output when FIB lookup fails for multicast destination address. - if CONFIG_IP_MROUTE_MULTIPLE_TABLES is enabled VRF driver creates IPMR FIB rules on first device create similar to FIB rules. In addition the VRF driver does not divert IPv4 multicast packets: it breaks on Tx since the fib lookup fails on the mcast address. With this patch, ipmr forwarding and local rx/tx work. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove SS_CONNECTED sock stateParthasarathy Bhuvaragan
In this commit, we replace references to sock->state SS_CONNECTE with sk_state TIPC_ESTABLISHED. Finally, the sock->state is no longer explicitly used by tipc. The FSM below is for various types of connection oriented sockets. Stream Server Listening Socket: +-----------+ +-------------+ | TIPC_OPEN |------>| TIPC_LISTEN | +-----------+ +-------------+ Stream Server Data Socket: +-----------+ +------------------+ | TIPC_OPEN |------>| TIPC_ESTABLISHED | +-----------+ +------------------+ ^ | | | | v +--------------------+ | TIPC_DISCONNECTING | +--------------------+ Stream Socket Client: +-----------+ +-----------------+ | TIPC_OPEN |------>| TIPC_CONNECTING |------+ +-----------+ +-----------------+ | | | | | v | +------------------+ | | TIPC_ESTABLISHED | | +------------------+ | ^ | | | | | | v | +--------------------+ | | TIPC_DISCONNECTING |<--+ +--------------------+ Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_CONNECTING as a new sk_stateParthasarathy Bhuvaragan
In this commit, we create a new tipc socket state TIPC_CONNECTING by primarily replacing the SS_CONNECTING with TIPC_CONNECTING. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove SS_DISCONNECTING stateParthasarathy Bhuvaragan
In this commit, we replace the references to SS_DISCONNECTING with the combination of sk_state TIPC_DISCONNECTING and flags set in sk_shutdown. We introduce a new function _tipc_shutdown(), which provides the common code required by tipc_release() and tipc_shutdown(). Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_DISCONNECTING as a new sk_stateParthasarathy Bhuvaragan
In this commit, we create a new tipc socket state TIPC_DISCONNECTING in sk_state. TIPC_DISCONNECTING is replacing the socket connection status update using SS_DISCONNECTING. TIPC_DISCONNECTING is set for connection oriented sockets at: - tipc_shutdown() - connection probe timeout - when we receive an error message on the connection. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_OPEN as a new sk_stateParthasarathy Bhuvaragan
In this commit, we create a new tipc socket state TIPC_OPEN in sk_state. We primarily replace the SS_UNCONNECTED sock->state with TIPC_OPEN. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_ESTABLISHED as a new sk_stateParthasarathy Bhuvaragan
Until now, tipc maintains probing state for connected sockets in tsk->probing_state variable. In this commit, we express this information as socket states and this remove the variable. We set probe_unacked flag when a probe is sent out and reset it if we receive a reply. Instead of the probing state TIPC_CONN_OK, we create a new state TIPC_ESTABLISHED. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_LISTEN as a new sk_stateParthasarathy Bhuvaragan
Until now, tipc maintains the socket state in sock->state variable. This is used to maintain generic socket states, but in tipc we overload it and save tipc socket states like TIPC_LISTEN. Other protocols like TCP, UDP store protocol specific states in sk->sk_state instead. In this commit, we : - declare a new tipc state TIPC_LISTEN, that replaces SS_LISTEN - Create a new function tipc_set_state(), to update sk->sk_state. - TIPC_LISTEN state is maintained in sk->sk_state. - replace references to SS_LISTEN with TIPC_LISTEN. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove socket state SS_READYParthasarathy Bhuvaragan
Until now, tipc socket state SS_READY declares that the socket is a connectionless socket. In this commit, we remove the state SS_READY and replace it with a condition which returns true for datagram / connectionless sockets. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove probing_intv from tipc_sockParthasarathy Bhuvaragan
Until now, probing_intv is a variable in struct tipc_sock but is always set to a constant CONN_PROBING_INTERVAL. The socket connection is probed based on this value. In this commit, we remove this variable and setup the socket timer based on the constant CONN_PROBING_INTERVAL. There is no functional change in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove tsk->connected from tipc_sockParthasarathy Bhuvaragan
Until now, we determine if a socket is connected or not based on tsk->connected, which is set once when the probing state is set to TIPC_CONN_OK. It is unset when the sock->state is updated from SS_CONNECTED to any other state. In this commit, we remove connected variable from tipc_sock and derive socket connection status from the following condition: sock->state == SS_CONNECTED => tsk->connected There is no functional change in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>