summaryrefslogtreecommitdiff
path: root/net/bridge
AgeCommit message (Collapse)Author
2015-06-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "As usual, mostly comment, kerneldoc and printk() fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: lpfc: Grammar s/an negative/a negative/ ARM: lib/lib1funcs.S: fix typo s/substractions/subtractions/ cx25821: cx25821-medusa-reg.h: fix 0x0x prefix lib: crc-itu-t.[ch] fix 0x0x prefix in integer constants rapidio: Fix kerneldoc and comment qla4xxx: Fix printk() in qla4_83xx_read_reset_template() and qla4_83xx_pre_loopback_config() treewide: Kconfig: fix wording / spelling usb/serial: fix grammar in Kconfig help text for FTDI_SIO megaraid_sas: fix kerneldoc netfilter: ebtables: fix comment grammar drm/radeon: fix comment isdn: fix grammar in comment ARM: KVM: fix comment
2015-06-10bridge: fix multicast router rlist endless loopNikolay Aleksandrov
Since the addition of sysfs multicast router support if one set multicast_router to "2" more than once, then the port would be added to the hlist every time and could end up linking to itself and thus causing an endless loop for rlist walkers. So to reproduce just do: echo 2 > multicast_router; echo 2 > multicast_router; in a bridge port and let some igmp traffic flow, for me it hangs up in br_multicast_flood(). Fix this by adding a check in br_multicast_add_router() if the port is already linked. The reason this didn't happen before the addition of multicast_router sysfs entries is because there's a !hlist_unhashed check that prevents it. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries") Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07bridge: disable softirqs around br_fdb_update to avoid lockupNikolay Aleksandrov
br_fdb_update() can be called in process context in the following way: br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set) so we need to disable softirqs because there are softirq users of the hash_lock. One easy way to reproduce this is to modify the bridge utility to set NTF_USE, enable stp and then set maxageing to a low value so br_fdb_cleanup() is called frequently and then just add new entries in a loop. This happens because br_fdb_cleanup() is called from timer/softirq context. The spin locks in br_fdb_update were _bh before commit f8ae737deea1 ("[BRIDGE]: forwarding remove unneeded preempt and bh diasables") and at the time that commit was correct because br_fdb_update() couldn't be called from process context, but that changed after commit: 292d1398983f ("bridge: add NTF_USE support") Using local_bh_disable/enable around br_fdb_update() allows us to keep using the spin_lock/unlock in br_fdb_update for the fast-path. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: 292d1398983f ("bridge: add NTF_USE support") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07Revert "bridge: use _bh spinlock variant for br_fdb_update to avoid lockup"David S. Miller
This reverts commit 1d7c49037b12016e7056b9f2c990380e2187e766. Nikolay Aleksandrov has a better version of this fix. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07bridge: use _bh spinlock variant for br_fdb_update to avoid lockupWilson Kok
br_fdb_update() can be called in process context in the following way: br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set) so we need to use spin_lock_bh because there are softirq users of the hash_lock. One easy way to reproduce this is to modify the bridge utility to set NTF_USE, enable stp and then set maxageing to a low value so br_fdb_cleanup() is called frequently and then just add new entries in a loop. This happens because br_fdb_cleanup() is called from timer/softirq context. These locks were _bh before commit f8ae737deea1 ("[BRIDGE]: forwarding remove unneeded preempt and bh diasables") and at the time that commit was correct because br_fdb_update() couldn't be called from process context, but that changed after commit: 292d1398983f ("bridge: add NTF_USE support") Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: 292d1398983f ("bridge: add NTF_USE support") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fix for net The following patch reverts the ebtables chunk that enforces counters that was introduced in the recently applied d26e2c9ffa38 ('Revert "netfilter: ensure number of counters is >0 in do_replace()"') since this breaks ebtables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01Revert "netfilter: ensure number of counters is >0 in do_replace()"Bernhard Thaler
This partially reverts commit 1086bbe97a07 ("netfilter: ensure number of counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c. Setting rules with ebtables does not work any more with 1086bbe97a07 place. There is an error message and no rules set in the end. e.g. ~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP Unable to update the kernel. Two possible causes: 1. Multiple ebtables programs were executing simultaneously. The ebtables userspace tool doesn't by default support multiple ebtables programs running Reverting the ebtables part of 1086bbe97a07 makes this work again. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-30bridge: fix br_multicast_query_expired() bugEric Dumazet
br_multicast_query_expired() querier argument is a pointer to a struct bridge_mcast_querier : struct bridge_mcast_querier { struct br_ip addr; struct net_bridge_port __rcu *port; }; Intent of the code was to clear port field, not the pointer to querier. Fixes: 2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier port") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> Cc: Linus Lüssing <linus.luessing@web.de> Cc: Steinar H. Gunderson <sesse@samfundet.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-26netfilter: ebtables: fix comment grammarGeert Uytterhoeven
s/stongly inspired on/strongly inspired by/ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-05-22bridge: fix lockdep splatEric Dumazet
Following lockdep splat was reported : [ 29.382286] =============================== [ 29.382315] [ INFO: suspicious RCU usage. ] [ 29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted [ 29.382380] ------------------------------- [ 29.382409] net/bridge/br_private.h:626 suspicious rcu_dereference_check() usage! [ 29.382455] other info that might help us debug this: [ 29.382507] rcu_scheduler_active = 1, debug_locks = 0 [ 29.382549] 2 locks held by swapper/0/0: [ 29.382576] #0: (((&p->forward_delay_timer))){+.-...}, at: [<ffffffff81139f75>] call_timer_fn+0x5/0x4f0 [ 29.382660] #1: (&(&br->lock)->rlock){+.-...}, at: [<ffffffffa0450dc1>] br_forward_delay_timer_expired+0x31/0x140 [bridge] [ 29.382754] stack backtrace: [ 29.382787] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 [ 29.382838] Hardware name: LENOVO 422916G/LENOVO, BIOS A1KT53AUS 04/07/2015 [ 29.382882] 0000000000000000 3ebfc20364115825 ffff880666603c48 ffffffff81892d4b [ 29.382943] 0000000000000000 ffffffff81e124e0 ffff880666603c78 ffffffff8110bcd7 [ 29.383004] ffff8800785c9d00 ffff88065485ac58 ffff880c62002800 ffff880c5fc88ac0 [ 29.383065] Call Trace: [ 29.383084] <IRQ> [<ffffffff81892d4b>] dump_stack+0x4c/0x65 [ 29.383130] [<ffffffff8110bcd7>] lockdep_rcu_suspicious+0xe7/0x120 [ 29.383178] [<ffffffffa04520f9>] br_fill_ifinfo+0x4a9/0x6a0 [bridge] [ 29.383225] [<ffffffffa045266b>] br_ifinfo_notify+0x11b/0x4b0 [bridge] [ 29.383271] [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge] [ 29.383320] [<ffffffffa0450de8>] br_forward_delay_timer_expired+0x58/0x140 [bridge] [ 29.383371] [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge] [ 29.383416] [<ffffffff8113a033>] call_timer_fn+0xc3/0x4f0 [ 29.383454] [<ffffffff81139f75>] ? call_timer_fn+0x5/0x4f0 [ 29.383493] [<ffffffff8110a90f>] ? lock_release_holdtime.part.29+0xf/0x200 [ 29.383541] [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge] [ 29.383587] [<ffffffff8113a6a4>] run_timer_softirq+0x244/0x490 [ 29.383629] [<ffffffff810b68cc>] __do_softirq+0xec/0x670 [ 29.383666] [<ffffffff810b70d5>] irq_exit+0x145/0x150 [ 29.383703] [<ffffffff8189f506>] smp_apic_timer_interrupt+0x46/0x60 [ 29.383744] [<ffffffff8189d523>] apic_timer_interrupt+0x73/0x80 [ 29.383782] <EOI> [<ffffffff816f131f>] ? cpuidle_enter_state+0x5f/0x2f0 [ 29.383832] [<ffffffff816f131b>] ? cpuidle_enter_state+0x5b/0x2f0 Problem here is that br_forward_delay_timer_expired() is a timer handler, calling br_ifinfo_notify() which assumes either rcu_read_lock() or RTNL are held. Simplest fix seems to add rcu read lock section. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Reported-by: Dominick Grift <dac.override@gmail.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22bridge: fix parsing of MLDv2 reportsThadeu Lima de Souza Cascardo
When more than a multicast address is present in a MLDv2 report, all but the first address is ignored, because the code breaks out of the loop if there has not been an error adding that address. This has caused failures when two guests connected through the bridge tried to communicate using IPv6. Neighbor discoveries would not be transmitted to the other guest when both used a link-local address and a static address. This only happens when there is a MLDv2 querier in the network. The fix will only break out of the loop when there is a failure adding a multicast address. The mdb before the patch: dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp dev ovirtmgmt port bond0.86 grp ff02::2 temp After the patch: dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp dev ovirtmgmt port bond0.86 grp ff02::fb temp dev ovirtmgmt port bond0.86 grp ff02::2 temp dev ovirtmgmt port bond0.86 grp ff02::d temp dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp dev ovirtmgmt port bond0.86 grp ff02::16 temp dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp Fixes: 08b202b67264 ("bridge br_multicast: IPv6 MLD support.") Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-20Revert "netfilter: bridge: query conntrack about skb dnat"Florian Westphal
This reverts commit c055d5b03bb4cb69d349d787c9787c0383abd8b2. There are two issues: 'dnat_took_place' made me think that this is related to -j DNAT/MASQUERADE. But thats only one part of the story. This is also relevant for SNAT when we undo snat translation in reverse/reply direction. Furthermore, I originally wanted to do this mainly to avoid storing ipv6 addresses once we make DNAT/REDIRECT work for ipv6 on bridges. However, I forgot about SNPT/DNPT which is stateless. So we can't escape storing address for ipv6 anyway. Might as well do it for ipv4 too. Reported-and-tested-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-20netfilter: ensure number of counters is >0 in do_replace()Dave Jones
After improving setsockopt() coverage in trinity, I started triggering vmalloc failures pretty reliably from this code path: warn_alloc_failed+0xe9/0x140 __vmalloc_node_range+0x1be/0x270 vzalloc+0x4b/0x50 __do_replace+0x52/0x260 [ip_tables] do_ipt_set_ctl+0x15d/0x1d0 [ip_tables] nf_setsockopt+0x65/0x90 ip_setsockopt+0x61/0xa0 raw_setsockopt+0x16/0x60 sock_common_setsockopt+0x14/0x20 SyS_setsockopt+0x71/0xd0 It turns out we don't validate that the num_counters field in the struct we pass in from userspace is initialized. The same problem also exists in ebtables, arptables, ipv6, and the compat variants. Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-29bridge/nl: remove wrong use of NLM_F_MULTINicolas Dichtel
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact, it is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: e5a55a898720 ("net: create generic bridge ops") Fixes: 815cccbf10b2 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf") CC: John Fastabend <john.r.fastabend@intel.com> CC: Sathya Perla <sathya.perla@emulex.com> CC: Subbu Seetharaman <subbu.seetharaman@emulex.com> CC: Ajit Khaparde <ajit.khaparde@emulex.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: intel-wired-lan@lists.osuosl.org CC: Jiri Pirko <jiri@resnulli.us> CC: Scott Feldman <sfeldma@gmail.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: bridge@lists.linux-foundation.org Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29bridge/mdb: remove wrong use of NLM_F_MULTINicolas Dichtel
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact, it is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: 37a393bc4932 ("bridge: notify mdb changes via netlink") CC: Cong Wang <amwang@redhat.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: bridge@lists.linux-foundation.org Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13netfilter: nf_tables: switch registers to 32 bit addressingPatrick McHardy
Switch the nf_tables registers from 128 bit addressing to 32 bit addressing to support so called concatenations, where multiple values can be concatenated over multiple registers for O(1) exact matches of multiple dimensions using sets. The old register values are mapped to areas of 128 bits for compatibility. When dumping register numbers, values are expressed using the old values if they refer to the beginning of a 128 bit area for compatibility. To support concatenations, register loads of less than a full 32 bit value need to be padded. This mainly affects the payload and exthdr expressions, which both unconditionally zero the last word before copying the data. Userspace fully passes the testsuite using both old and new register addressing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: add register parsing/dumping helpersPatrick McHardy
Add helper functions to parse and dump register values in netlink attributes. These helpers will later be changed to take care of translation between the old 128 bit and the new 32 bit register numbers. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: convert expressions to u32 register pointersPatrick McHardy
Simple conversion to use u32 pointers to the beginning of the registers to keep follow up patches smaller. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: get rid of NFT_REG_VERDICT usagePatrick McHardy
Replace the array of registers passed to expressions by a struct nft_regs, containing the verdict as a seperate member, which aliases to the NFT_REG_VERDICT register. This is needed to seperate the verdict from the data registers completely, so their size can be changed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: kill nft_validate_output_register()Patrick McHardy
All users of nft_validate_register_store() first invoke nft_validate_output_register(). There is in fact no use for using it on its own, so simplify the code by folding the functionality into nft_validate_register_store() and kill it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: rename nft_validate_data_load()Patrick McHardy
The existing name is ambiguous, data is loaded as well when we read from a register. Rename to nft_validate_register_store() for clarity and consistency with the upcoming patch to introduce its counterpart, nft_validate_register_load(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: validate len in nft_validate_data_load()Patrick McHardy
For values spanning multiple registers, we need to validate that enough space is available from the destination register onwards. Add a len argument to nft_validate_data_load() and consolidate the existing length validations in preparation of that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. They are: * nf_tables set timeout infrastructure from Patrick Mchardy. 1) Add support for set timeout support. 2) Add support for set element timeouts using the new set extension infrastructure. 4) Add garbage collection helper functions to get rid of stale elements. Elements are accumulated in a batch that are asynchronously released via RCU when the batch is full. 5) Add garbage collection synchronization helpers. This introduces a new element busy bit to address concurrent access from the netlink API and the garbage collector. 5) Add timeout support for the nft_hash set implementation. The garbage collector peridically checks for stale elements from the workqueue. * iptables/nftables cgroup fixes: 6) Ignore non full-socket objects from the input path, otherwise cgroup match may crash, from Daniel Borkmann. 7) Fix cgroup in nf_tables. 8) Save some cycles from xt_socket by skipping packet header parsing when skb->sk is already set because of early demux. Also from Daniel. * br_netfilter updates from Florian Westphal. 9) Save frag_max_size and restore it from the forward path too. 10) Use a per-cpu area to restore the original source MAC address when traffic is DNAT'ed. 11) Add helper functions to access physical devices. 12) Use these new physdev helper function from xt_physdev. 13) Add another nf_bridge_info_get() helper function to fetch the br_netfilter state information. 14) Annotate original layer 2 protocol number in nf_bridge info, instead of using kludgy flags. 15) Also annotate the pkttype mangling when the packet travels back and forth from the IP to the bridge layer, instead of using a flag. * More nf_tables set enhancement from Patrick: 16) Fix possible usage of set variant that doesn't support timeouts. 17) Avoid spurious "set is full" errors from Netlink API when there are pending stale elements scheduled to be released. 18) Restrict loop checks to set maps. 19) Add support for dynamic set updates from the packet path. 20) Add support to store optional user data (eg. comments) per set element. BTW, I have also pulled net-next into nf-next to anticipate the conflict resolution between your okfn() signature changes and Florian's br_netfilter updates. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-08netfilter: Fix switch statement warnings with recent gcc.David Miller
More recent GCC warns about two kinds of switch statement uses: 1) Switching on an enumeration, but not having an explicit case statement for all members of the enumeration. To show the compiler this is intentional, we simply add a default case with nothing more than a break statement. 2) Switching on a boolean value. I think this warning is dumb but nevertheless you get it wholesale with -Wswitch. This patch cures all such warnings in netfilter. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextPablo Neira Ayuso
Resolve conflicts between 5888b93 ("Merge branch 'nf-hook-compress'") and Florian Westphal br_netfilter works. Conflicts: net/bridge/br_netfilter.c Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08netfilter: bridge: make BRNF_PKT_TYPE flag a boolFlorian Westphal
nf_bridge_info->mask is used for several things, for example to remember if skb->pkt_type was set to OTHER_HOST. For a bridge, OTHER_HOST is expected case. For ip forward its a non-starter though -- routing expects PACKET_HOST. Bridge netfilter thus changes OTHER_HOST to PACKET_HOST before hook invocation and then un-does it after hook traversal. This information is irrelevant outside of br_netfilter. After this change, ->mask now only contains flags that need to be known outside of br_netfilter in fast-path. Future patch changes mask into a 2bit state field in sk_buff, so that we can remove skb->nf_bridge pointer for good and consider all remaining places that access nf_bridge info content a not-so fastpath. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08netfilter: bridge: start splitting mask into public/private chunksFlorian Westphal
->mask is a bit info field that mixes various use cases. In particular, we have flags that are mutually exlusive, and flags that are only used within br_netfilter while others need to be exposed to other parts of the kernel. Remove BRNF_8021Q/PPPoE flags. They're mutually exclusive and only needed within br_netfilter context. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08netfilter: bridge: add and use nf_bridge_info_get helperFlorian Westphal
Don't access skb->nf_bridge directly, this pointer will be removed soon. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08netfilter: bridge: don't use nf_bridge_info data to store mac headerFlorian Westphal
br_netfilter maintains an extra state, nf_bridge_info, which is attached to skb via skb->nf_bridge pointer. Amongst other things we use skb->nf_bridge->data to store the original mac header for every processed skb. This is required for ip refragmentation when using conntrack on top of bridge, because ip_fragment doesn't copy it from original skb. However there is no need anymore to do this unconditionally. Move this to the one place where its needed -- when br_netfilter calls ip_fragment(). Also switch to percpu storage for this so we can handle fragmenting without accessing nf_bridge meta data. Only user left is neigh resolution when DNAT is detected, to hold the original source mac address (neigh resolution builds new mac header using bridge mac), so rename ->data and reduce its size to whats needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-07netfilter: Pass socket pointer down through okfn().David Miller
On the output paths in particular, we have to sometimes deal with two socket contexts. First, and usually skb->sk, is the local socket that generated the frame. And second, is potentially the socket used to control a tunneling socket, such as one the encapsulates using UDP. We do not want to disassociate skb->sk when encapsulating in order to fix this, because that would break socket memory accounting. The most extreme case where this can cause huge problems is an AF_PACKET socket transmitting over a vxlan device. We hit code paths doing checks that assume they are dealing with an ipv4 socket, but are actually operating upon the AF_PACKET one. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04netfilter: Pass nf_hook_state through nft_set_pktinfo*().David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04netfilter: Make nf_hookfn use nf_hook_state.David S. Miller
Pass the nf_hook_state all the way down into the hook functions themselves. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02dev: introduce dev_get_iflink()Nicolas Dichtel
The goal of this patch is to prepare the removal of the iflink field. It introduces a new ndo function, which will be implemented by virtual interfaces. There is no functional change into this patch. All readers of iflink field now call dev_get_iflink(). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02netfilter: bridge: really save frag_max_size between PRE and POST_ROUTINGFlorian Westphal
We also need to save/store in forward, else br_parse_ip_options call will zero frag_max_size as well. Fixes: 93fdd47e5 ('bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING') Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next. Basically, more incremental updates for br_netfilter from Florian Westphal, small nf_tables updates (including one fix for rb-tree locking) and small two-liner to add extra validation for the REJECT6 target. More specifically, they are: 1) Use the conntrack status flags from br_netfilter to know that DNAT is happening. Patch for Florian Westphal. 2) nf_bridge->physoutdev == NULL already indicates that the traffic is bridged, so let's get rid of the BRNF_BRIDGED flag. Also from Florian. 3) Another patch to prepare voidization of seq_printf/seq_puts/seq_putc, from Joe Perches. 4) Consolidation of nf_tables_newtable() error path. 5) Kill nf_bridge_pad used by br_netfilter from ip_fragment(), from Florian Westphal. 6) Access rb-tree root node inside the lock and remove unnecessary locking from the get path (we already hold nfnl_lock there), from Patrick McHardy. 7) You cannot use a NFT_SET_ELEM_INTERVAL_END when the set doesn't support interval, also from Patrick. 8) Enforce IP6T_F_PROTO from ip6t_REJECT to make sure the core is actually restricting matches to TCP. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-22netfilter: bridge: kill nf_bridge_padFlorian Westphal
The br_netfilter frag output function calls skb_cow_head() so in case it needs a larger headroom to e.g. re-add a previously stripped PPPOE or VLAN header things will still work (at cost of reallocation). We can then move nf_bridge_encap_header_len to br_netfilter. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/emulex/benet/be_main.c net/core/sysctl_net_core.c net/ipv4/inet_diag.c The be_main.c conflict resolution was really tricky. The conflict hunks generated by GIT were very unhelpful, to say the least. It split functions in half and moved them around, when the real actual conflict only existed solely inside of one function, that being be_map_pci_bars(). So instead, to resolve this, I checked out be_main.c from the top of net-next, then I applied the be_main.c changes from 'net' since the last time I merged. And this worked beautifully. The inet_diag.c and sysctl_net_core.c conflicts were simple overlapping changes, and were easily to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18bridge: add ageing_time, stp_state, priority over netlinkJörg Thalheim
Signed-off-by: Jörg Thalheim <joerg@higgsboson.tk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16netfilter: bridge: remove BRNF_STATE_BRIDGED flagFlorian Westphal
Its not needed anymore since 2bf540b73ed5b ([NETFILTER]: bridge-netfilter: remove deferred hooks). Before this it was possible to have physoutdev set for locally generated packets -- this isn't the case anymore: BRNF_STATE_BRIDGED flag is set when we assign nf_bridge->physoutdev, so physoutdev != NULL means BRNF_STATE_BRIDGED is set. If physoutdev is NULL, then we are looking at locally-delivered and routed packet. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-16netfilter: bridge: query conntrack about skb dnatFlorian Westphal
ask conntrack instead of storing ipv4 address in nf_bridge_info->data. Ths avoids the need to use ->data during NF_PRE_ROUTING. Only two functions that need ->data remain. These will be addressed in followup patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-14bridge: reset bridge mtu after deleting an interfaceVenkat Venkatsubra
On adding an interface br_add_if() sets the MTU to the min of all the interfaces. Do the same thing on removing an interface too in br_del_if. Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10netfilter: bridge: use rcu hook to resolve br_netfilter dependencyPablo Neira Ayuso
e5de75b ("netfilter: bridge: move DNAT helper to br_netfilter") results in the following link problem: net/bridge/br_device.c:29: undefined reference to `br_nf_prerouting_finish_bridge` Moreover it creates a hard dependency between br_netfilter and the bridge core, which is what we've been trying to avoid so far. Resolve this problem by using a hook structure so we reduce #ifdef pollution and keep bridge netfilter specific code under br_netfilter.c which was the original intention. Reported-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-10netfilter: fix sparse warnings in reject handlingFlorian Westphal
make C=1 CF=-D__CHECK_ENDIAN__ shows following: net/bridge/netfilter/nft_reject_bridge.c:65:50: warning: incorrect type in argument 3 (different base types) net/bridge/netfilter/nft_reject_bridge.c:65:50: expected restricted __be16 [usertype] protocol [..] net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: cast from restricted __be16 net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: incorrect type in argument 1 (different base types) [..] net/bridge/netfilter/nft_reject_bridge.c:121:50: warning: incorrect type in argument 3 (different base types) [..] net/bridge/netfilter/nft_reject_bridge.c:168:52: warning: incorrect type in argument 3 (different base types) [..] net/bridge/netfilter/nft_reject_bridge.c:233:52: warning: incorrect type in argument 3 (different base types) [..] Caused by two (harmless) errors: 1. htons() instead of ntohs() 2. __be16 for protocol in nf_reject_ipXhdr_put API, use u8 instead. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-09net: Remove protocol from struct dst_opsEric W. Biederman
After my change to neigh_hh_init to obtain the protocol from the neigh_table there are no more users of protocol in struct dst_ops. Remove the protocol field from dst_ops and all of it's initializers. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. Basically, improvements for the packet rejection infrastructure, deprecation of CLUSTERIP, cleanups for nf_tables and some untangling for br_netfilter. More specifically they are: 1) Send packet to reset flow if checksum is valid, from Florian Westphal. 2) Fix nf_tables reject bridge from the input chain, also from Florian. 3) Deprecate the CLUSTERIP target, the cluster match supersedes it in functionality and it's known to have problems. 4) A couple of cleanups for nf_tables rule tracing infrastructure, from Patrick McHardy. 5) Another cleanup to place transaction declarations at the bottom of nf_tables.h, also from Patrick. 6) Consolidate Kconfig dependencies wrt. NF_TABLES. 7) Limit table names to 32 bytes in nf_tables. 8) mac header copying in bridge netfilter is already required when calling ip_fragment(), from Florian Westphal. 9) move nf_bridge_update_protocol() to br_netfilter.c, also from Florian. 10) Small refactor in br_netfilter in the transmission path, again from Florian. 11) Move br_nf_pre_routing_finish_bridge_slow() to br_netfilter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09netfilter: bridge: move DNAT helper to br_netfilterPablo Neira Ayuso
Only one caller, there is no need to keep this in a header. Move it to br_netfilter.c where this belongs to. Based on patch from Florian Westphal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-09netfilter: bridge: refactor conditional in br_nf_dev_queue_xmitFlorian Westphal
simpilifies followup patch that re-works brnf ip_fragment handling. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-09netfilter: bridge: move nf_bridge_update_protocol to where its usedFlorian Westphal
no need to keep it in a header file. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-09bridge: move mac header copying into br_netfilterFlorian Westphal
The mac header only has to be copied back into the skb for fragments generated by ip_fragment(), which only happens for bridge forwarded packets with nf-call-iptables=1 && active nf_defrag. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-05bridge: Extend Proxy ARP design to allow optional rules for Wi-FiJouni Malinen
This extends the design in commit 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP") with optional set of rules that are needed to meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The previously added BR_PROXYARP behavior is left as-is and a new BR_PROXYARP_WIFI alternative is added so that this behavior can be configured from user space when required. In addition, this enables proxyarp functionality for unicast ARP requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible to use unicast as well as broadcast for these frames. The key differences in functionality: BR_PROXYARP: - uses the flag on the bridge port on which the request frame was received to determine whether to reply - block bridge port flooding completely on ports that enable proxy ARP BR_PROXYARP_WIFI: - uses the flag on the bridge port to which the target device of the request belongs - block bridge port flooding selectively based on whether the proxyarp functionality replied Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>