summaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2014-01-15ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTEThomas Haller
Refactor the deletion/update of prefix routes when removing an address. Now also consider IFA_F_NOPREFIXROUTE and if there is an address present with this flag, to not cleanup the route. Instead, assume that userspace is taking care of this route. Also perform the same cleanup, when userspace changes an existing address to add NOPREFIXROUTE (to an address that didn't have this flag). This is done because when the address was added, a prefix route was created for it. Since the user now wants to handle this route by himself, we cleanup this route. This cleanup of the route is not totally robust. There is no guarantee, that the route we are about to delete was really the one added by the kernel. This behavior does not change by the patch, and in practice it should work just fine. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routesThomas Haller
When adding/modifying an IPv6 address, the userspace application needs a way to suppress adding a prefix route. This is for example relevant together with IFA_F_MANAGERTEMPADDR, where userspace creates autoconf generated addresses, but depending on on-link, no route for the prefix should be added. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15ipv6: move IPV6_TCLASS_SHIFT into ipv6.h and define a helperLi RongQing
Two places defined IPV6_TCLASS_SHIFT, so we should move it into ipv6.h, and use this macro as possible. And define ip6_tclass helper to return tclass Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14IPv6: move the anycast_src_echo_reply sysctl to netns_sysctl_ipv6FX Le Bail
This change move anycast_src_echo_reply sysctl with other ipv6 sysctls. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14ipv6: addrconf spelling fixesstephen hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14net: replace macros net_random and net_srandom with direct calls to prandomAruna-Hewapathirane
This patch removes the net_random and net_srandom macros and replaces them with direct calls to the prandom ones. As new commits only seem to use prandom_u32 there is no use to keep them around. This change makes it easier to grep for users of prandom_u32. Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14ipv6: copy traffic class from ping request to replyHannes Frederic Sowa
Suggested-by: Simon Schneider <simon-schneider@gmx.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2014-01-13ipv6: introduce ip6_dst_mtu_forward and protect forwarding path with itHannes Frederic Sowa
In the IPv6 forwarding path we are only concerend about the outgoing interface MTU, but also respect locked MTUs on routes. Tunnel provider or IPSEC already have to recheck and if needed send PtB notifications to the sending host in case the data does not fit into the packet with added headers (we only know the final header sizes there, while also using path MTU information). The reason for this change is, that path MTU information can be injected into the kernel via e.g. icmp_err protocol handler without verification of local sockets. As such, this could cause the IPv6 forwarding path to wrongfully emit Packet-too-Big errors and drop IPv6 packets. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: net/ieee802154/6lowpan.c
2014-01-09ipv6: add link-local, sit and loopback address with INFINITY_LIFE_TIMEHannes Frederic Sowa
In the past the IFA_PERMANENT flag indicated, that the valid and preferred lifetime where ignored. Since change fad8da3e085ddf ("ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity") we honour at least the preferred lifetime on those addresses. As such the valid lifetime gets recalculated and updated to 0. If loopback address is added manually this problem does not occur. Also if NetworkManager manages IPv6, those addresses will get added via inet6_rtm_newaddr and thus will have a correct lifetime, too. Reported-by: François-Xavier Le Bail <fx.lebail@yahoo.com> Reported-by: Damien Wyart <damien.wyart@gmail.com> Fixes: fad8da3e085ddf ("ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity") Cc: Yasushi Asano <yasushi.asano@jp.fujitsu.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: ==================== nf_tables updates for net-next The following patchset contains the following nf_tables updates, mostly updates from Patrick McHardy, they are: * Add the "inet" table and filter chain type for this new netfilter family: NFPROTO_INET. This special table/chain allows IPv4 and IPv6 rules, this should help to simplify the burden in the administration of dual stack firewalls. This also includes several patches to prepare the infrastructure for this new table and a new meta extension to match the layer 3 and 4 protocol numbers, from Patrick McHardy. * Load both IPv4 and IPv6 conntrack modules in nft_ct if the rule is used in NFPROTO_INET, as we don't certainly know which one would be used, also from Patrick McHardy. * Do not allow to delete a table that contains sets, otherwise these sets become orphan, from Patrick McHardy. * Hold a reference to the corresponding nf_tables family module when creating a table of that family type, to avoid the module deletion when in use, from Patrick McHardy. * Update chain counters before setting the chain policy to ensure that we don't leave the chain in inconsistent state in case of errors (aka. restore chain atomicity). This also fixes a possible leak if it fails to allocate the chain counters if no counters are passed to be restored, from Patrick McHardy. * Don't check for overflows in the table counter if we are just renaming a chain, from Patrick McHardy. * Replay the netlink request after dropping the nfnl lock to load the module that supports provides a chain type, from Patrick. * Fix chain type module references, from Patrick. * Several cleanups, function renames, constification and code refactorizations also from Patrick McHardy. * Add support to set the connmark, this can be used to set it based on the meta mark (similar feature to -j CONNMARK --restore), from Kristian Evensen. * A couple of fixes to the recently added meta/set support and nft_reject, and fix missing chain type unregistration if we fail to register our the family table/filter chain type, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09netfilter: nf_tables: fix error path in the init functionsPablo Neira Ayuso
We have to unregister chain type if this fails to register netns. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09netfilter: nf_tables: rename nft_do_chain_pktinfo() to nft_do_chain()Patrick McHardy
We don't encode argument types into function names and since besides nft_do_chain() there are only AF-specific versions, there is no risk of confusion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09netfilter: nf_tables: minor nf_chain_type cleanupsPatrick McHardy
Minor nf_chain_type cleanups: - reorder struct to plug a hoe - rename struct module member to "owner" for consistency - rename nf_hookfn array to "hooks" for consistency - reorder initializers for better readability Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09netfilter: nf_tables: constify chain type definitions and pointersPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09netfilter: nf_tables: add missing module references to chain typesPatrick McHardy
In some cases we neither take a reference to the AF info nor to the chain type, allowing the module to be unloaded while in use. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07netfilter: nf_tables: add "inet" table for IPv4/IPv6Patrick McHardy
This patch adds a new table family and a new filter chain that you can use to attach IPv4 and IPv6 rules. This should help to simplify rule-set maintainance in dual-stack setups. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07netfilter: nf_tables: add support for multi family tablesPatrick McHardy
Add support to register chains to multiple hooks for different address families for mixed IPv4/IPv6 tables. Signed-off-by: Patrick McHardy <kaber@trash.net>
2014-01-07netfilter: nf_tables: make chain types override the default AF functionsPatrick McHardy
Currently the AF-specific hook functions override the chain-type specific hook functions. That doesn't make too much sense since the chain types are a special case of the AF-specific hooks. Make the AF-specific hook functions the default and make the optional chain type hooks override them. As a side effect, the necessary code restructuring reduces the code size, f.i. in case of nf_tables_ipv4.o: nf_tables_ipv4_init_net | -24 nft_do_chain_ipv4 | -113 2 functions changed, 137 bytes removed, diff: -137 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07net-gre-gro: Add GRE support to the GRO stackJerry Chu
This patch built on top of Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") to add the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO stack. It also serves as an example for supporting other encapsulation protocols in the GRO stack in the future. The patch supports version 0 and all the flags (key, csum, seq#) but will flush any pkt with the S (seq#) flag. This is because the S flag is not support by GSO, and a GRO pkt may end up in the forwarding path, thus requiring GSO support to break it up correctly. Currently the "packet_offload" structure only contains L3 (ETH_P_IP/ ETH_P_IPV6) GRO offload support so the encapped pkts are limited to IP pkts (i.e., w/o L2 hdr). But support for other protocol type can be easily added, so is the support for GRE variations like NVGRE. The patch also support csum offload. Specifically if the csum flag is on and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE), the code will take advantage of the csum computed by the h/w when validating the GRE csum. Note that commit 60769a5dcd8755715c7143b4571d5c44f01796f1 "ipv4: gre: add GRO capability" already introduces GRO capability to IPv4 GRE tunnels, using the gro_cells infrastructure. But GRO is done after GRE hdr has been removed (i.e., decapped). The following patch applies GRO when pkts first come in (before hitting the GRE tunnel code). There is some performance advantage for applying GRO as early as possible. Also this approach is transparent to other subsystem like Open vSwitch where GRE decap is handled outside of the IP stack hence making it harder for the gro_cells stuff to apply. On the other hand, some NICs are still not capable of hashing on the inner hdr of a GRE pkt (RSS). In that case the GRO processing of pkts from the same remote host will all happen on the same CPU and the performance may be suboptimal. I'm including some rough preliminary performance numbers below. Note that the performance will be highly dependent on traffic load, mix as usual. Moreover it also depends on NIC offload features hence the following is by no means a comprehesive study. Local testing and tuning will be needed to decide the best setting. All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs. (super_netperf 50 -H 192.168.1.18 -l 30) An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1 mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123) is configured. The GRO support for pkts AFTER decap are controlled through the device feature of the GRE device (e.g., ethtool -K gre1 gro on/off). 1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off thruput: 9.16Gbps CPU utilization: 19% 1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off thruput: 5.9Gbps CPU utilization: 15% 1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on thruput: 9.26Gbps CPU utilization: 12-13% 1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on thruput: 9.26Gbps CPU utilization: 10% The following tests were performed on a different NIC that is capable of csum offload. I.e., the h/w is capable of computing IP payload csum (CHECKSUM_COMPLETE). 2.1 ethtool -K gre1 gro on (hence will use gro_cells) 2.1.1 ethtool -K eth0 gro off; csum offload disabled thruput: 8.53Gbps CPU utilization: 9% 2.1.2 ethtool -K eth0 gro off; csum offload enabled thruput: 8.97Gbps CPU utilization: 7-8% 2.1.3 ethtool -K eth0 gro on; csum offload disabled thruput: 8.83Gbps CPU utilization: 5-6% 2.1.4 ethtool -K eth0 gro on; csum offload enabled thruput: 8.98Gbps CPU utilization: 5% 2.2 ethtool -K gre1 gro off 2.2.1 ethtool -K eth0 gro off; csum offload disabled thruput: 5.93Gbps CPU utilization: 9% 2.2.2 ethtool -K eth0 gro off; csum offload enabled thruput: 5.62Gbps CPU utilization: 8% 2.2.3 ethtool -K eth0 gro on; csum offload disabled thruput: 7.69Gbps CPU utilization: 8% 2.2.4 ethtool -K eth0 gro on; csum offload enabled thruput: 8.96Gbps CPU utilization: 5-6% Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07IPv6: add the option to use anycast addresses as source addresses in echo replyFX Le Bail
This change allows to follow a recommandation of RFC4942. - Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses as source addresses for ICMPv6 echo reply. This sysctl is false by default to preserve existing behavior. - Add inline check ipv6_anycast_destination(). - Use them in icmpv6_echo_reply(). Reference: RFC4942 - IPv6 Transition/Coexistence Security Considerations (http://tools.ietf.org/html/rfc4942#section-2.1.6) 2.1.6. Anycast Traffic Identification and Security [...] To avoid exposing knowledge about the internal structure of the network, it is recommended that anycast servers now take advantage of the ability to return responses with the anycast address as the source address if possible. Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07ipv6: pcpu_tstats.syncp should be initialised in ip6_vti.cLi RongQing
initialise pcpu_tstats.syncp to kill the calltrace [ 11.973950] Call Trace: [ 11.973950] [<819bbaff>] dump_stack+0x48/0x60 [ 11.973950] [<819bbaff>] dump_stack+0x48/0x60 [ 11.973950] [<81078dcf>] __lock_acquire.isra.22+0x1bf/0xc10 [ 11.973950] [<81078dcf>] __lock_acquire.isra.22+0x1bf/0xc10 [ 11.973950] [<81079fa7>] lock_acquire+0x77/0xa0 [ 11.973950] [<81079fa7>] lock_acquire+0x77/0xa0 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<8183862d>] ip_tunnel_get_stats64+0x6d/0x230 [ 11.973950] [<8183862d>] ip_tunnel_get_stats64+0x6d/0x230 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<811cf8c1>] ? __nla_reserve+0x21/0xd0 [ 11.973950] [<811cf8c1>] ? __nla_reserve+0x21/0xd0 [ 11.973950] [<817ca7ab>] dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] dev_get_stats+0xcb/0x130 [ 11.973950] [<817d5409>] rtnl_fill_ifinfo+0x569/0xe20 [ 11.973950] [<817d5409>] rtnl_fill_ifinfo+0x569/0xe20 [ 11.973950] [<810352e0>] ? kvm_clock_read+0x20/0x30 [ 11.973950] [<810352e0>] ? kvm_clock_read+0x20/0x30 [ 11.973950] [<81008e38>] ? sched_clock+0x8/0x10 [ 11.973950] [<81008e38>] ? sched_clock+0x8/0x10 [ 11.973950] [<8106ba45>] ? sched_clock_local+0x25/0x170 [ 11.973950] [<8106ba45>] ? sched_clock_local+0x25/0x170 [ 11.973950] [<810da6bd>] ? __kmalloc+0x3d/0x90 [ 11.973950] [<810da6bd>] ? __kmalloc+0x3d/0x90 [ 11.973950] [<817b8c10>] ? __kmalloc_reserve.isra.41+0x20/0x70 [ 11.973950] [<817b8c10>] ? __kmalloc_reserve.isra.41+0x20/0x70 [ 11.973950] [<810da81a>] ? slob_alloc_node+0x2a/0x60 [ 11.973950] [<810da81a>] ? slob_alloc_node+0x2a/0x60 [ 11.973950] [<817b919a>] ? __alloc_skb+0x6a/0x2b0 [ 11.973950] [<817b919a>] ? __alloc_skb+0x6a/0x2b0 [ 11.973950] [<817d8795>] rtmsg_ifinfo+0x65/0xe0 [ 11.973950] [<817d8795>] rtmsg_ifinfo+0x65/0xe0 [ 11.973950] [<817cbd31>] register_netdevice+0x531/0x5a0 [ 11.973950] [<817cbd31>] register_netdevice+0x531/0x5a0 [ 11.973950] [<81892b87>] ? ip6_tnl_get_cap+0x27/0x90 [ 11.973950] [<81892b87>] ? ip6_tnl_get_cap+0x27/0x90 [ 11.973950] [<817cbdb6>] register_netdev+0x16/0x30 [ 11.973950] [<817cbdb6>] register_netdev+0x16/0x30 [ 11.973950] [<81f574a6>] vti6_init_net+0x1c4/0x1d4 [ 11.973950] [<81f574a6>] vti6_init_net+0x1c4/0x1d4 [ 11.973950] [<81f573af>] ? vti6_init_net+0xcd/0x1d4 [ 11.973950] [<81f573af>] ? vti6_init_net+0xcd/0x1d4 [ 11.973950] [<817c16df>] ops_init.constprop.11+0x17f/0x1c0 [ 11.973950] [<817c16df>] ops_init.constprop.11+0x17f/0x1c0 [ 11.973950] [<817c1779>] register_pernet_operations.isra.9+0x59/0x90 [ 11.973950] [<817c1779>] register_pernet_operations.isra.9+0x59/0x90 [ 11.973950] [<817c18d1>] register_pernet_device+0x21/0x60 [ 11.973950] [<817c18d1>] register_pernet_device+0x21/0x60 [ 11.973950] [<81f574b6>] ? vti6_init_net+0x1d4/0x1d4 [ 11.973950] [<81f574b6>] ? vti6_init_net+0x1d4/0x1d4 [ 11.973950] [<81f574c7>] vti6_tunnel_init+0x11/0x68 [ 11.973950] [<81f574c7>] vti6_tunnel_init+0x11/0x68 [ 11.973950] [<81f572a1>] ? mip6_init+0x73/0xb4 [ 11.973950] [<81f572a1>] ? mip6_init+0x73/0xb4 [ 11.973950] [<81f0cba4>] do_one_initcall+0xbb/0x15b [ 11.973950] [<81f0cba4>] do_one_initcall+0xbb/0x15b [ 11.973950] [<811a00d8>] ? sha_transform+0x528/0x1150 [ 11.973950] [<811a00d8>] ? sha_transform+0x528/0x1150 [ 11.973950] [<81f0c544>] ? repair_env_string+0x12/0x51 [ 11.973950] [<81f0c544>] ? repair_env_string+0x12/0x51 [ 11.973950] [<8105c30d>] ? parse_args+0x2ad/0x440 [ 11.973950] [<8105c30d>] ? parse_args+0x2ad/0x440 [ 11.973950] [<810546be>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 11.973950] [<810546be>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 11.973950] [<81f0cd27>] kernel_init_freeable+0xe3/0x182 [ 11.973950] [<81f0cd27>] kernel_init_freeable+0xe3/0x182 [ 11.973950] [<81f0c532>] ? do_early_param+0x7a/0x7a [ 11.973950] [<81f0c532>] ? do_early_param+0x7a/0x7a [ 11.973950] [<819b5b1b>] kernel_init+0xb/0x100 [ 11.973950] [<819b5b1b>] kernel_init+0xb/0x100 [ 11.973950] [<819cebf7>] ret_from_kernel_thread+0x1b/0x28 [ 11.973950] [<819cebf7>] ret_from_kernel_thread+0x1b/0x28 [ 11.973950] [<819b5b10>] ? rest_init+0xc0/0xc0 [ 11.973950] [<819b5b10>] ? rest_init+0xc0/0xc0 Before 469bdcefdc ("ipv6: fix the use of pcpu_tstats in ip6_vti.c"), the pcpu_tstats.syncp is not used to pretect the 64bit elements of pcpu_tstats, so not appear this calltrace. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06ipv6: don't install anycast address for /128 addresses on routersHannes Frederic Sowa
It does not make sense to create an anycast address for an /128-prefix. Suppress it. As 32019e651c6fce ("ipv6: Do not leave router anycast address for /127 prefixes.") shows we also may not leave them, because we could accidentally remove an anycast address the user has allocated or got added via another prefix. Cc: François-Xavier Le Bail <fx.lebail@yahoo.com> Cc: Thomas Haller <thaller@redhat.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: <pablo@netfilter.org> ==================== nftables updates for net-next The following patchset contains nftables updates for your net-next tree, they are: * Add set operation to the meta expression by means of the select_ops() infrastructure, this allows us to set the packet mark among other things. From Arturo Borrero Gonzalez. * Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel Borkmann. * Add new queue expression to nf_tables. These comes with two previous patches to prepare this new feature, one to add mask in nf_tables_core to evaluate the queue verdict appropriately and another to refactor common code with xt_NFQUEUE, from Eric Leblond. * Do not hide nftables from Kconfig if nfnetlink is not enabled, also from Eric Leblond. * Add the reject expression to nf_tables, this adds the missing TCP RST support. It comes with an initial patch to refactor common code with xt_NFQUEUE, again from Eric Leblond. * Remove an unused variable assignment in nf_tables_dump_set(), from Michal Nazarewicz. * Remove the nft_meta_target code, now that Arturo added the set operation to the meta expression, from me. * Add help information for nf_tables to Kconfig, also from me. * Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is available to other nf_tables objects, requested by Arturo, from me. * Expose the table usage counter, so we can know how many chains are using this table without dumping the list of chains, from Tomasz Bursztyka. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04net: unify the pcpu_tstats and br_cpu_netstats as oneLi RongQing
They are same, so unify them as one, pcpu_sw_netstats. Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats from if_tunnel and remove br_cpu_netstats from br_private.h Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02ipv6: fix the use of pcpu_tstats in ip6_vti.cLi RongQing
when read/write the 64bit data, the correct lock should be hold. and we can use the generic vti6_get_stats to return stats, and not define a new one in ip6_vti.c Fixes: 87b6d218f3adb ("tunnel: implement 64 bits statistics") Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02ipv6: fix the use of pcpu_tstats in ip6_tunnelLi RongQing
when read/write the 64bit data, the correct lock should be hold. Fixes: 87b6d218f3adb ("tunnel: implement 64 bits statistics") Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02ipv6 addrconf: fix preferred lifetime state-changing behavior while ↵Yasushi Asano
valid_lft is infinity Fixed a problem with setting the lifetime of an IPv6 address. When setting preferred_lft to a value not zero or infinity, while valid_lft is infinity(0xffffffff) preferred lifetime is set to forever and does not update. Therefore preferred lifetime never becomes deprecated. valid lifetime and preferred lifetime should be set independently, even if valid lifetime is infinity, preferred lifetime must expire correctly (meaning it must eventually become deprecated) Signed-off-by: Yasushi Asano <yasushi.asano@jp.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02ipv6: remove prune parameter for fib6_clean_allLi RongQing
since the prune parameter for fib6_clean_all always is 0, remove it. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01ipv6: namespace cleanupsstephen hemminger
Running 'make namespacecheck' shows: net/ipv6/route.o ipv6_route_table_template rt6_bind_peer net/ipv6/icmp.o icmpv6_route_lookup ipv6_icmp_table_template This addresses some of those warnings by: * make icmpv6_route_lookup static * move inline's out of ip6_route.h since only used into route.c * move rt6_bind_peer into route.c Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01netlink: cleanup rntl_af_registerstephen hemminger
The function __rtnl_af_register is never called outside this code, and the return value is always 0. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01ipv6: fix the use of pcpu_tstats in sitLi RongQing
when read/write the 64bit data, the correct lock should be hold. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01netfilter: add help information to new nf_tables Kconfig optionsPablo Neira Ayuso
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-31ipv6: unneccessary to get address prefix in addrconf_get_prefix_routeLi RongQing
Since addrconf_get_prefix_route inputs the address prefix to fib6_locate, which does not uses the data which is out of the prefix_len length, so do not need to use ipv6_addr_prefix to get address prefix. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-30netfilter: REJECT: separate reusable codeEric Leblond
This patch prepares the addition of TCP reset support in the nft_reject module by moving reusable code into a header file. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-27ipv6: release dst properly in ipip6_tunnel_xmitLi RongQing
if a dst is not attached to anywhere, it should be released before exit ipip6_tunnel_xmit, otherwise cause dst memory leakage. Fixes: 61c1db7fae21 ("ipv6: sit: add GSO/TSO support") Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-26ipv6: cleanup for tcp_ipv6.cWeilong Chen
Fix some checkpatch errors for tcp_ipv6.c Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-22netconf: add support for IPv6 proxy_ndpstephen hemminger
Need to be able to see changes to proxy NDP status on a per interface basis via netlink (analog to proxy_arp). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2013-12-19 1) Use the user supplied policy index instead of a generated one if present. From Fan Du. 2) Make xfrm migration namespace aware. From Fan Du. 3) Make the xfrm state and policy locks namespace aware. From Fan Du. 4) Remove ancient sleeping when the SA is in acquire state, we now queue packets to the policy instead. This replaces the sleeping code. 5) Remove FLOWI_FLAG_CAN_SLEEP. This was used to notify xfrm about the posibility to sleep. The sleeping code is gone, so remove it. 6) Check user specified spi for IPComp. Thr spi for IPcomp is only 16 bit wide, so check for a valid value. From Fan Du. 7) Export verify_userspi_info to check for valid user supplied spi ranges with pfkey and netlink. From Fan Du. 8) RFC3173 states that if the total size of a compressed payload and the IPComp header is not smaller than the size of the original payload, the IP datagram must be sent in the original non-compressed form. These packets are dropped by the inbound policy check because they are not transformed. Document the need to set 'level use' for IPcomp to receive such packets anyway. From Fan Du. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19ipv6: always set the new created dst's from in ip6_rt_copyLi RongQing
ip6_rt_copy only sets dst.from if ort has flag RTF_ADDRCONF and RTF_DEFAULT. but the prefix routes which did get installed by hand locally can have an expiration, and no any flag combination which can ensure a potential from does never expire, so we should always set the new created dst's from. This also fixes the new created dst is always expired since the ort, which is created by RA, maybe has RTF_EXPIRES and RTF_ADDRCONF, but no RTF_DEFAULT. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18ipv6: sit: update mtu check to take care of gso packetsEric Dumazet
While testing my changes for TSO support in SIT devices, I was using sit0 tunnel which appears to include nopmtudisc flag. But using : ip tun add sittun mode sit remote $REMOTE_IPV4 local $LOCAL_IPV4 \ dev $IFACE We get a tunnel which rejects too long packets because of the mtu check which is not yet GSO aware. erd:~# ip tunnel sittun: ipv6/ip remote 10.246.17.84 local 10.246.17.83 ttl inherit 6rd-prefix 2002::/16 sit0: ipv6/ip remote any local any ttl 64 nopmtudisc 6rd-prefix 2002::/16 This patch is based on an excellent report from Michal Shmidt. In the future, we probably want to extend the MTU check to do the right thing for GSO packets... Fixes: ("61c1db7fae21 ipv6: sit: add GSO/TSO support") Reported-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18ipv6: pmtudisc setting not respected with UFO/CORKHannes Frederic Sowa
Sockets marked with IPV6_PMTUDISC_PROBE (or later IPV6_PMTUDISC_INTERFACE) don't respect this setting when the outgoing interface supports UFO. We had the same problem in IPv4, which was fixed in commit daba287b299ec7a2c61ae3a714920e90e8396ad5 ("ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK"). Also IPV6_DONTFRAG mode did not care about already corked data, thus it may generate a fragmented frame even if this socket option was specified. It also did not care about the length of the ipv6 header and possible options. In the error path allow the user to receive the pmtu notifications via both, rxpmtu method or error queue. The user may opted in for both, so deliver the notification to both error handlers (the handlers check if the error needs to be enqueued). Also report back consistent pmtu values when sending on an already cork-appended socket. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18ipv6: support IPV6_PMTU_INTERFACE on socketsHannes Frederic Sowa
IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it nontheless for symmetry with IPv4 sockets. Also drop incoming MTU information if this mode is enabled. The additional bit in ipv6_pinfo just eats in the padding behind the bitfield. There are no changes to the layout of the struct at all. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18inet: make no_pmtu_disc per namespace and kill ipv4_configHannes Frederic Sowa
The other field in ipv4_config, log_martians, was converted to a per-interface setting, so we can just remove the whole structure. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/intel/i40e/i40e_main.c drivers/net/macvtap.c Both minor merge hassles, simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-15net-ipv6: Fix alleged compiler warning in ipv6_exthdrs_len()Jerry Chu
It was reported that Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") triggered a compiler warning in ipv6_exthdrs_len(): net/ipv6/ip6_offload.c: In function ‘ipv6_gro_complete’: net/ipv6/ip6_offload.c:178:24: warning: ‘optlen’ may be used uninitialized in this function [-Wmaybe-u opth = (void *)opth + optlen; ^ net/ipv6/ip6_offload.c:164:22: note: ‘optlen’ was declared here int len = 0, proto, optlen; ^ Note that there was no real bug here - optlen was never uninitialized before use. (Was the version of gcc I used smarter to not complain?) Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14ipv6: fix compiler warning in ipv6_exthdrs_lenHannes Frederic Sowa
Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") used an uninitialized variable which leads to the following compiler warning: net/ipv6/ip6_offload.c: In function ‘ipv6_gro_complete’: net/ipv6/ip6_offload.c:178:24: warning: ‘optlen’ may be used uninitialized in this function [-Wmaybe-uninitialized] opth = (void *)opth + optlen; ^ net/ipv6/ip6_offload.c:164:22: note: ‘optlen’ was declared here int len = 0, proto, optlen; ^ Fix it up. Cc: Jerry Chu <hkchu@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12ipv6: fix incorrect type in declarationFlorent Fourcot
Introduced by 1397ed35f22d7c30d0b89ba74b6b7829220dfcfd "ipv6: add flowinfo for tcp6 pkt_options for all cases" Reported-by: kbuild test robot <fengguang.wu@intel.com> V2: fix the title, add empty line after the declaration (Sergei Shtylyov feedbacks) Signed-off-by: David S. Miller <davem@davemloft.net>