summaryrefslogtreecommitdiff
path: root/drivers/net/vrf.c
AgeCommit message (Collapse)Author
2016-05-16net: vrf: protect changes to private data with rcuDavid Ahern
One cpu can be processing packets which includes using the cached route entries in the vrf device's private data and on another cpu the device gets deleted which releases the routes and sets the pointers in net_vrf to NULL. This results in datapath dereferencing a NULL pointer. Fix by protecting access to dst's with rcu. Fixes: 193125dbd8eb ("net: Introduce VRF device driver") Fixes: 35402e313663 ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11net: l3mdev: Add hook in ip and ipv6David Ahern
Currently the VRF driver uses the rx_handler to switch the skb device to the VRF device. Switching the dev prior to the ip / ipv6 layer means the VRF driver has to duplicate IP/IPv6 processing which adds overhead and makes features such as retaining the ingress device index more complicated than necessary. This patch moves the hook to the L3 layer just after the first NF_HOOK for PRE_ROUTING. This location makes exposing the original ingress device trivial (next patch) and allows adding other NF_HOOKs to the VRF driver in the future. dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb with the switched device through the packet taps to maintain current behavior (tcpdump can be used on either the vrf device or the enslaved devices). Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09net: l3mdev: Allow send on enslaved interfaceDavid Ahern
Allow udp and raw sockets to send by oif that is an enslaved interface versus the l3mdev/VRF device. For example, this allows BFD to use ifindex from IP_PKTINFO on a receive to send a response without the need to convert to the VRF index. It also allows ping and ping6 to work when specifying an enslaved interface (e.g., ping -I swp1 <ip>) which is a natural use case. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06net: vrf: Create FIB tables on link createDavid Ahern
Tables have to exist for VRFs to function. Ensure they exist when VRF device is created. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11net: vrf: Fix dst reference countingDavid Ahern
Vivek reported a kernel exception deleting a VRF with an active connection through it. The root cause is that the socket has a cached reference to a dst that is destroyed. Converting the dst_destroy to dst_release and letting proper reference counting kick in does not work as the dst has a reference to the device which needs to be released as well. I talked to Hannes about this at netdev and he pointed out the ipv4 and ipv6 dst handling has dst_ifdown for just this scenario. Rather than continuing with the reinvented dst wheel in VRF just remove it and leverage the ipv4 and ipv6 versions. Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver") Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several cases of overlapping changes, as well as one instance (vxlan) of a bug fix in 'net' overlapping with code movement in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25net: vrf: Remove direct access to skb->dataDavid Ahern
Nik pointed that the VRF driver should be using skb_header_pointer instead of accessing skb->data and bits beyond directly which can be garbage. Fixes: 35402e313663 ("net: Add IPv6 support to VRF device") Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11vrf: duplicate include of rtnetlink.hstephen hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07net: Add support for fill_slave_info to VRF deviceDavid Ahern
Allows userspace to have direct access to VRF table association versus looking up master device and its table. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-01-04net: Propagate lookup failure in l3mdev_get_saddr to callerDavid Ahern
Commands run in a vrf context are not failing as expected on a route lookup: root@kenny:~# ip ro ls table vrf-red unreachable default root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254 ping: Warning: source address might be selected on device other than vrf-red. PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data. --- 10.100.1.254 ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 999ms Since the vrf table does not have a route for 10.100.1.254 the ping should have failed. The saddr lookup causes a full VRF table lookup. Propogating a lookup failure to the user allows the command to fail as expected: root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254 connect: No route to host Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/renesas/ravb_main.c kernel/bpf/syscall.c net/ipv4/ipmr.c All three conflicts were cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03net: add possibility to pass information about upper device via notifierJiri Pirko
Sometimes the drivers and other code would find it handy to know some internal information about upper device being changed. So allow upper-code to pass information down to notifier listeners during linking. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03net: propagate upper priv via netdev_master_upper_dev_linkJiri Pirko
Eliminate netdev_master_upper_dev_link_private and pass priv directly as a parameter of netdev_master_upper_dev_link. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24vrf: remove slave queue and private slave structNikolay Aleksandrov
The private slave queue and slave struct haven't been used for anything and aren't needed, this allows to reduce memory usage and simplify enslave/release. We can use netdev_for_each_lower_dev() to free the vrf ports when deleting a vrf device. Also if in the future a private struct is needed for each slave, it can be implemented via lower devices' private member (similar to how bonding does it). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23vrf: fix double free and memory corruption on register_netdevice failureNikolay Aleksandrov
When vrf's ->newlink is called, if register_netdevice() fails then it does free_netdev(), but that's also done by rtnl_newlink() so a second free happens and memory gets corrupted, to reproduce execute the following line a couple of times (1 - 5 usually is enough): $ for i in `seq 1 5`; do ip link add vrf: type vrf table 1; done; This works because we fail in register_netdevice() because of the wrong name "vrf:". And here's a trace of one crash: [ 28.792157] ------------[ cut here ]------------ [ 28.792407] kernel BUG at fs/namei.c:246! [ 28.792608] invalid opcode: 0000 [#1] SMP [ 28.793240] Modules linked in: vrf nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel qxl drm_kms_helper ttm drm aesni_intel aes_x86_64 psmouse glue_helper lrw evdev gf128mul i2c_piix4 ablk_helper cryptd ppdev parport_pc parport serio_raw pcspkr virtio_balloon virtio_console i2c_core acpi_cpufreq button 9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 virtio_blk virtio_net sg sr_mod cdrom ata_generic ehci_pci uhci_hcd ehci_hcd e1000 usbcore usb_common ata_piix libata virtio_pci virtio_ring virtio scsi_mod floppy [ 28.796016] CPU: 0 PID: 1148 Comm: ld-linux-x86-64 Not tainted 4.4.0-rc1+ #24 [ 28.796016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [ 28.796016] task: ffff8800352561c0 ti: ffff88003592c000 task.ti: ffff88003592c000 [ 28.796016] RIP: 0010:[<ffffffff812187b3>] [<ffffffff812187b3>] putname+0x43/0x60 [ 28.796016] RSP: 0018:ffff88003592fe88 EFLAGS: 00010246 [ 28.796016] RAX: 0000000000000000 RBX: ffff8800352561c0 RCX: 0000000000000001 [ 28.796016] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003784f000 [ 28.796016] RBP: ffff88003592ff08 R08: 0000000000000001 R09: 0000000000000000 [ 28.796016] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 28.796016] R13: 000000000000047c R14: ffff88003784f000 R15: ffff8800358c4a00 [ 28.796016] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 [ 28.796016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 28.796016] CR2: 00007ffd583bc2d9 CR3: 0000000035a99000 CR4: 00000000000406f0 [ 28.796016] Stack: [ 28.796016] ffffffff8121045d ffffffff812102d3 ffff8800352561c0 ffff880035a91660 [ 28.796016] ffff8800008a9880 0000000000000000 ffffffff81a49940 00ffffff81218684 [ 28.796016] ffff8800352561c0 000000000000047c 0000000000000000 ffff880035b36d80 [ 28.796016] Call Trace: [ 28.796016] [<ffffffff8121045d>] ? do_execveat_common.isra.34+0x74d/0x930 [ 28.796016] [<ffffffff812102d3>] ? do_execveat_common.isra.34+0x5c3/0x930 [ 28.796016] [<ffffffff8121066c>] do_execve+0x2c/0x30 [ 28.796016] [<ffffffff810939a0>] call_usermodehelper_exec_async+0xf0/0x140 [ 28.796016] [<ffffffff810938b0>] ? umh_complete+0x40/0x40 [ 28.796016] [<ffffffff815cb1af>] ret_from_fork+0x3f/0x70 [ 28.796016] Code: 48 8d 47 1c 48 89 e5 53 48 8b 37 48 89 fb 48 39 c6 74 1a 48 8b 3d 7e e9 8f 00 e8 49 fa fc ff 48 89 df e8 f1 01 fd ff 5b 5d f3 c3 <0f> 0b 48 89 fe 48 8b 3d 61 e9 8f 00 e8 2c fa fc ff 5b 5d eb e9 [ 28.796016] RIP [<ffffffff812187b3>] putname+0x43/0x60 [ 28.796016] RSP <ffff88003592fe88> Fixes: 193125dbd8eb ("net: Introduce VRF device driver") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-13net: Add IPv6 support to VRF deviceDavid Ahern
Add support for IPv6 to VRF device driver. Implemenation parallels what has been done for IPv4. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08dst: Pass net into dst->outputEric W. Biederman
The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08ipv4, ipv6: Pass net into ip_local_out and ip6_local_outEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_outEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08ipv4: Merge ip_local_out and ip_local_out_skEric W. Biederman
It is confusing and silly hiding a parameter so modify all of the callers to pass in the appropriate socket or skb->sk if no socket is known. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08dst: Pass a sk into .local_outEric W. Biederman
For consistency with the other similar methods in the kernel pass a struct sock into the dst_ops .local_out method. Simplifying the socket passing case is needed a prequel to passing a struct net reference into .local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Add source address lookup op for VRFDavid Ahern
Add operation to l3mdev to lookup source address for a given flow. Add support for the operation to VRF driver and convert existing IPv4 hooks to use the new lookup. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Add netif_is_l3_slaveDavid Ahern
IPv6 addrconf keys off of IFF_SLAVE so can not use it for L3 slave. Add a new private flag and add netif_is_l3_slave function for checking it. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Rename FLOWI_FLAG_VRFSRC to FLOWI_FLAG_L3MDEV_SRCDavid Ahern
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05vrf: fix a kernel warningWANG Cong
This fixes: tried to remove device ip6gre0 from (null) ------------[ cut here ]------------ kernel BUG at net/core/dev.c:5219! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 3 PID: 161 Comm: kworker/u8:2 Not tainted 4.3.0-rc2+ #1142 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: netns cleanup_net task: ffff8800d784a9c0 ti: ffff8800d74a4000 task.ti: ffff8800d74a4000 RIP: 0010:[<ffffffff817f0797>] [<ffffffff817f0797>] __netdev_adjacent_dev_remove+0x40/0xec RSP: 0018:ffff8800d74a7a98 EFLAGS: 00010282 RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88011adcf701 RSI: ffff88011adccbf8 RDI: ffff88011adccbf8 RBP: ffff8800d74a7ab8 R08: 0000000000000001 R09: 0000000000000000 R10: ffffffff81d190ff R11: 00000000ffffffff R12: ffff8800d599e7c0 R13: 0000000000000000 R14: ffff8800d599e890 R15: ffffffff82385e00 FS: 0000000000000000(0000) GS:ffff88011ac00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007ffd6f003000 CR3: 000000000220c000 CR4: 00000000000006e0 Stack: 0000000000000000 ffff8800d599e7c0 0000000000000b00 ffff8800d599e8a0 ffff8800d74a7ad8 ffffffff817f0861 0000000000000000 ffff8800d599e7c0 ffff8800d74a7af8 ffffffff817f088f 0000000000000000 ffff8800d599e7c0 Call Trace: [<ffffffff817f0861>] __netdev_adjacent_dev_unlink+0x1e/0x35 [<ffffffff817f088f>] __netdev_adjacent_dev_unlink_neighbour+0x17/0x41 [<ffffffff817f56e6>] netdev_upper_dev_unlink+0x6c/0x13d [<ffffffff81674a3d>] vrf_del_slave+0x26/0x7d [<ffffffff81674ac3>] vrf_device_event+0x2f/0x34 [<ffffffff81098c40>] notifier_call_chain+0x75/0x9c [<ffffffff81098fa2>] raw_notifier_call_chain+0x14/0x16 [<ffffffff817ee129>] call_netdevice_notifiers_info+0x52/0x59 [<ffffffff817f179d>] call_netdevice_notifiers+0x13/0x15 [<ffffffff817f6f18>] rollback_registered_many+0x14f/0x24f [<ffffffff817f70f2>] unregister_netdevice_many+0x19/0x64 [<ffffffff819a2455>] ip6gre_exit_net+0x163/0x177 [<ffffffff817eb019>] ops_exit_list+0x44/0x55 [<ffffffff817ebcb7>] cleanup_net+0x193/0x226 [<ffffffff81091e1c>] process_one_work+0x26c/0x4d8 [<ffffffff81091d20>] ? process_one_work+0x170/0x4d8 [<ffffffff81092296>] worker_thread+0x1df/0x2c2 [<ffffffff810920b7>] ? process_scheduled_works+0x2f/0x2f [<ffffffff810920b7>] ? process_scheduled_works+0x2f/0x2f [<ffffffff81097a20>] kthread+0xd4/0xdc [<ffffffff810bc523>] ? trace_hardirqs_on_caller+0x17d/0x199 [<ffffffff8109794c>] ? __kthread_parkme+0x83/0x83 [<ffffffff81a5240f>] ret_from_fork+0x3f/0x70 [<ffffffff8109794c>] ? __kthread_parkme+0x83/0x83 Fixes: 93a7e7e837af ("net: Remove the now unused vrf_ptr") Cc: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Remove vrf header fileDavid Ahern
Move remaining structs to VRF driver and delete the vrf header file. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Remove the now unused vrf_ptrDavid Ahern
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Add support for l3mdev ops to VRF driverDavid Ahern
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTERDavid Ahern
Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER and update the name of the netif_is_vrf and netif_index_is_vrf macros. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/ipv4/arp.c The net/ipv4/arp.c conflict was one commit adding a new local variable while another commit was deleting one. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17netfilter: Pass net into okfnEric W. Biederman
This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17netfilter: Pass struct net into the netfilter hooksEric W. Biederman
Pass a network namespace parameter into the netfilter hooks. At the call site of the netfilter hooks the path a packet is taking through the network stack is well known which allows the network namespace to be easily and reliabily. This allows the replacement of magic code like "dev_net(state->in?:state->out)" that appears at the start of most netfilter hooks with "state->net". In almost all cases the network namespace passed in is derived from the first network device passed in, guaranteeing those paths will not see any changes in practice. The exceptions are: xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm) ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp) ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp) ipv4/raw.c:raw_send_hdrinc() sock_net(sk) ipv6/ip6_output.c:ip6_xmit() sock_net(sk) ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev) ipv6/raw.c:raw6_send_hdrinc() sock_net(sk) br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev In all cases these exceptions seem to be a better expression for the network namespace the packet is being processed in then the historic "dev_net(in?in:out)". I am documenting them in case something odd pops up and someone starts trying to track down what happened. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17net: Fix vti use case with oif in dst lookupsDavid Ahern
Steffen reported that the recent change to add oif to dst lookups breaks the VTI use case. The problem is that with the oif set in the flow struct the comparison to the nh_oif is triggered. Fix by splitting the FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare nh oif (FLOWI_FLAG_SKIP_NH_OIF). Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-15net: Add FIB table id to rtableDavid Ahern
Add the FIB table id to rtable to make the information available for IPv4 as it is for IPv6. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28net: Add ethernet header for pass through VRF deviceDavid Ahern
The change to use a custom dst broke tcpdump captures on the VRF device: $ tcpdump -n -i vrf10 ... 05:32:29.009362 IP 10.2.1.254 > 10.2.1.2: ICMP echo request, id 21989, seq 1, length 64 05:32:29.009855 00:00:40:01:8d:36 > 45:00:00:54:d6:6f, ethertype Unknown (0x0a02), length 84: 0x0000: 0102 0a02 01fe 0000 9181 55e5 0001 bd11 ..........U..... 0x0010: da55 0000 0000 bb5d 0700 0000 0000 1011 .U.....]........ 0x0020: 1213 1415 1617 1819 1a1b 1c1d 1e1f 2021 ...............! 0x0030: 2223 2425 2627 2829 2a2b 2c2d 2e2f 3031 "#$%&'()*+,-./01 0x0040: 3233 3435 3637 234567 Local packets going through the VRF device are missing an ethernet header. Fix by adding one and then stripping it off before pushing back to the IP stack. With this patch you get the expected dumps: ... 05:36:15.713944 IP 10.2.1.254 > 10.2.1.2: ICMP echo request, id 23795, seq 1, length 64 05:36:15.714160 IP 10.2.1.2 > 10.2.1.254: ICMP echo reply, id 23795, seq 1, length 64 ... Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-20route: move lwtunnel state to dst_entryJiri Benc
Currently, the lwtunnel state resides in per-protocol data. This is a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa). The xmit function of the tunnel does not know whether the packet has been routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the lwtstate data to dst_entry makes such inter-protocol tunneling possible. As a bonus, this brings a nice diffstat. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-20vrf: ndo_add|del_slave drop unnecessary checksNikolay Aleksandrov
When ndo_add|del_slave ops are used, they're taken from the respective master device's netdev ops, so if the master device is a VRF only then the VRF ops will get called thus no need to check the type of the master. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-20vrf: move vrf_insert_slave so we can drop a goto labelNikolay Aleksandrov
We can simplify do_vrf_add_slave by moving vrf_insert_slave in the end of the enslaving and thus eliminate an error goto label. It always succeeds and isn't needed before that anyway. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-20vrf: remove unnecessary duplicate checkNikolay Aleksandrov
The upper/lower functions already check for duplicate slaves so no need to do it again. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-20vrf: don't panic on cache create failureNikolay Aleksandrov
It's pointless to panic on cache create failure when that case is handled and even more so since it's not a kernel-wide fatal problem so don't panic. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-20vrf: plug skb leaksNikolay Aleksandrov
Currently whenever a packet different from ETH_P_IP is sent through the VRF device it is leaked so plug the leaks and properly drop these packets. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18vrf: simplify the netdev notifier functionNikolay Aleksandrov
We can drop the check because if vrf_ptr is present then we must have the vrf device as a master and since we're running with rtnl it can't go away. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18vrf: don't check for dstats and rth in uninit pathNikolay Aleksandrov
dstats and rth are always present because we fail the device registration if they can't be allocated in vrf_init() (ndo_init) so drop the unnecessary checks. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18vrf: drop unused num_slaves memberNikolay Aleksandrov
slave_queue has a num_slaves member which is unused, drop it. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18vrf: drop unnecessary dev refcnt changesNikolay Aleksandrov
netdev_master_upper_dev_link/unlink already do a dev_hold/put on the devices being linked, so no need to take another reference. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13net: Introduce VRF device driverDavid Ahern
This driver borrows heavily from IPvlan and teaming drivers. Routing domains (VRF-lite) are created by instantiating a VRF master device with an associated table and enslaving all routed interfaces that participate in the domain. As part of the enslavement, all connected routes for the enslaved devices are moved to the table associated with the VRF device. Outgoing sockets must bind to the VRF device to function. Standard FIB rules bind the VRF device to tables and regular fib rule processing is followed. Routed traffic through the box, is forwarded by using the VRF device as the IIF and following the IIF rule to a table that is mated with the VRF. Example: Create vrf 1: ip link add vrf1 type vrf table 5 ip rule add iif vrf1 table 5 ip rule add oif vrf1 table 5 ip route add table 5 prohibit default ip link set vrf1 up Add interface to vrf 1: ip link set eth1 master vrf1 Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>