summaryrefslogtreecommitdiff
path: root/net/ipv4/route.c
AgeCommit message (Collapse)Author
2012-04-20net: Convert all sysctl registrations to register_net_sysctlEric W. Biederman
This results in code with less boiler plate that is a bit easier to read. Additionally stops us from using compatibility code in the sysctl core, hastening the day when the compatibility code can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20net ipv4: Remove the unneeded registration of an empty net/ipv4/neighEric W. Biederman
sysctl no longer requires explicit creation of directories. The neigh directory is always populated with at least a default entry so this won't cause any user visible changes. Delete the ipv4_path and the ipv4_skeleton these are no longer needed. Directly register the ipv4_route_table. And since I am an idiot remove the header definitions that I should have removed in the previous patch. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20net: Move all of the network sysctls without a namespace into init_net.Eric W. Biederman
This makes it clearer which sysctls are relative to your current network namespace. This makes it a little less error prone by not exposing sysctls for the initial network namespace in other namespaces. This is the same way we handle all of our other network interfaces to userspace and I can't honestly remember why we didn't do this for sysctls right from the start. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19net: fix compile error of leaking kmemleak.h headerShan Wei
net/core/sysctl_net_core.c: In function ‘sysctl_core_init’: net/core/sysctl_net_core.c:259: error: implicit declaration of function ‘kmemleak_not_leak’ with same error in net/ipv4/route.c Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-18net/ipv4:Remove two memleak reports by kmemleak_not_leak.majianpeng
Signed-off-by: majianpeng <majianpeng@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15net: cleanup unsigned to unsigned intEric Dumazet
Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15ipv4: fix checkpatch errorsDaniel Baluta
Fix checkpatch errors of the following type: * ERROR: "foo * bar" should be "foo *bar" * ERROR: "(foo*)" should be "(foo *)" Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2012-04-05net/route: export symbol ip_tos2prioAmir Vadai
Need to export this to enable drivers use rt_tos2priority() Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Provide device string properly for USB i2400m wimax devices, also don't OOPS when providing firmware string. From Phil Sutter. 2) Add support for sh_eth SH7734 chips, from Nobuhiro Iwamatsu. 3) Add another device ID to USB zaurus driver, from Guan Xin. 4) Loop index start in pool vector iterator is wrong causing MAC to not get configured in bnx2x driver, fix from Dmitry Kravkov. 5) EQL driver assumes HZ=100, fix from Eric Dumazet. 6) Now that skb_add_rx_frag() can specify the truesize increment separately, do so in f_phonet and cdc_phonet, also from Eric Dumazet. 7) virtio_net accidently uses net_ratelimit() not only on the kernel warning but also the statistic bump, fix from Rick Jones. 8) ip_route_input_mc() uses fixed init_net namespace, oops, use dev_net(dev) instead. Fix from Benjamin LaHaise. 9) dev_forward_skb() needs to clear the incoming interface index of the SKB so that it looks like a new incoming packet, also from Benjamin LaHaise. 10) iwlwifi mistakenly initializes a channel entry as 2GHZ instead of 5GHZ, fix from Stanislav Yakovlev. 11) Missing kmalloc() return value checks in orinoco, from Santosh Nayak. 12) ath9k doesn't check for HT capabilities in the right way, it is checking ht_supported instead of the ATH9K_HW_CAP_HT flag. Fix from Sujith Manoharan. 13) Fix x86 BPF JIT emission of 16-bit immediate field of AND instructions, from Feiran Zhuang. 14) Avoid infinite loop in GARP code when registering sysfs entries. From David Ward. 15) rose protocol uses memcpy instead of memcmp in a device address comparison, oops. Fix from Daniel Borkmann. 16) Fix build of lpc_eth due to dev_hw_addr_rancom() interface being renamed to eth_hw_addr_random(). From Roland Stigge. 17) Make ipv6 RTM_GETROUTE interpret RTA_IIF attribute the same way that ipv4 does. Fix from Shmulik Ladkani. 18) via-rhine has an inverted bit test, causing suspend/resume regressions. Fix from Andreas Mohr. 19) RIONET assumes 4K page size, fix from Akinobu Mita. 20) Initialization of imask register in sky2 is buggy, because bits are "or'd" into an uninitialized local variable. Fix from Lino Sanfilippo. 21) Fix FCOE checksum offload handling, from Yi Zou. 22) Fix VLAN processing regression in e1000, from Jiri Pirko. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits) sky2: dont overwrite settings for PHY Quick link tg3: Fix 5717 serdes powerdown problem net: usb: cdc_eem: fix mtu net: sh_eth: fix endian check for architecture independent usb/rtl8150 : Remove duplicated definitions rionet: fix page allocation order of rionet_active via-rhine: fix wait-bit inversion. ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4 net: lpc_eth: Fix rename of dev_hw_addr_random net/netfilter/nfnetlink_acct.c: use linux/atomic.h rose_dev: fix memcpy-bug in rose_set_mac_address Fix non TBI PHY access; a bad merge undid bug fix in a previous commit. net/garp: avoid infinite loop if attribute already exists x86 bpf_jit: fix a bug in emitting the 16-bit immediate operand of AND bonding: emit event when bonding changes MAC mac80211: fix oper channel timestamp updation ath9k: Use HW HT capabilites properly MAINTAINERS: adding maintainer for ipw2x00 net: orinoco: add error handling for failed kmalloc(). net/wireless: ipw2x00: fix a typo in wiphy struct initilization ...
2012-04-02ipv4: Stop using NLA_PUT*().David S. Miller
These macros contain a hidden goto, and are thus extremely error prone and make code hard to audit. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-28Remove all #inclusions of asm/system.hDavid Howells
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28net/ipv4: fix IPv4 multicast over network namespacesBenjamin LaHaise
When using multicast over a local bridge feeding a number of LXC guests using veth, the LXC guests are unable to get a response from other guests when pinging 224.0.0.1. Multicast packets did not appear to be getting delivered to the network namespaces of the guest hosts, and further inspection showed that the incoming route was pointing to the loopback device of the host, not the guest. This lead to the wrong network namespace being picked up by sockets (like ICMP). Fix this by using the correct network namespace when creating the inbound route entry. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-12net: ipv4: Standardize prefixes for message loggingJoe Perches
Add #define pr_fmt(fmt) as appropriate. Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files. Standardize on "UDPLite: " for appropriate uses. Some prefixes were previously "UDPLITE: " and "UDP-Lite: ". Add KBUILD_MODNAME ": " to icmp and gre. Remove embedded prefixes as appropriate. Add missing "\n" to pr_info in gre.c. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-11net: Convert printks to pr_<level>Joe Perches
Use a more current kernel messaging style. Convert a printk block to print_hex_dump. Coalesce formats, align arguments. Use %s, __func__ instead of embedding function names. Some messages that were prefixed with <foo>_close are now prefixed with <foo>_fini. Some ah4 and esp messages are now not prefixed with "ip ". The intent of this patch is to later add something like #define pr_fmt(fmt) "IPv4: " fmt. to standardize the output messages. Text size is trivially reduced. (x86-32 allyesconfig) $ size net/ipv4/built-in.o* text data bss dec hex filename 887888 31558 249696 1169142 11d6f6 net/ipv4/built-in.o.new 887934 31558 249800 1169292 11d78c net/ipv4/built-in.o.old Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2012-03-08route: Remove redirect_genidSteffen Klassert
As we invalidate the inetpeer tree along with the routing cache now, we don't need a genid to reset the redirect handling when the routing cache is flushed. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08inetpeer: Invalidate the inetpeer tree along with the routing cacheSteffen Klassert
We initialize the routing metrics with the values cached on the inetpeer in rt_init_metrics(). So if we have the metrics cached on the inetpeer, we ignore the user configured fib_metrics. To fix this issue, we replace the old tree with a fresh initialized inet_peer_base. The old tree is removed later with a delayed work queue. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-15ipv4: Eliminate spurious argument to __ipv4_neigh_lookupDavid S. Miller
'tbl' is always arp_tbl, so specifying it is pointless. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-26ipv4/ipv6: Prepare for new route gateway semantics.David S. Miller
In the future the ipv4/ipv6 route gateway will take on two types of values: 1) INADDR_ANY/IN6ADDR_ANY, for local network routes, and in this case the neighbour must be obtained using the destination address in ipv4/ipv6 header as the lookup key. 2) Everything else, the actual nexthop route address. So if the gateway is not inaddr-any we use it, otherwise we must use the packet's destination address. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/bluetooth/l2cap_core.c Just two overlapping changes, one added an initialization of a local variable, and another change added a new local variable. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22net: introduce DST_NOPEER dst flagEric Dumazet
Chris Boot reported crashes occurring in ipv6_select_ident(). [ 461.457562] RIP: 0010:[<ffffffff812dde61>] [<ffffffff812dde61>] ipv6_select_ident+0x31/0xa7 [ 461.578229] Call Trace: [ 461.580742] <IRQ> [ 461.582870] [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2 [ 461.589054] [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155 [ 461.595140] [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b [ 461.601198] [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6] [ 461.608786] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.614227] [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543 [ 461.620659] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.626440] [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a [bridge] [ 461.633581] [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459 [ 461.639577] [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76 [bridge] [ 461.646887] [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f [bridge] [ 461.653997] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.659473] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.665485] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.671234] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.677299] [<ffffffffa0379215>] ? nf_bridge_update_protocol+0x20/0x20 [bridge] [ 461.684891] [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack] [ 461.691520] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.697572] [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge] [ 461.704616] [<ffffffffa0379031>] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge] [ 461.712329] [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95 [bridge] [ 461.719490] [<ffffffffa037900a>] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge] [ 461.727223] [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge] [ 461.734292] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.739758] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.746203] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.751950] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.758378] [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56 [bridge] This is caused by bridge netfilter special dst_entry (fake_rtable), a special shared entry, where attaching an inetpeer makes no sense. Problem is present since commit 87c48fa3b46 (ipv6: make fragment identifications less predictable) Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and __ip_select_ident() fallback to the 'no peer attached' handling. Reported-by: Chris Boot <bootc@bootc.net> Tested-by: Chris Boot <bootc@bootc.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22ipv4: using prefetch requires including prefetch.hStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-21ipv4: reintroduce route cache garbage collectorEric Dumazet
Commit 2c8cec5c10b (ipv4: Cache learned PMTU information in inetpeer) removed IP route cache garbage collector a bit too soon, as this gc was responsible for expired routes cleanup, releasing their neighbour reference. As pointed out by Robert Gladewitz, recent kernels can fill and exhaust their neighbour cache. Reintroduce the garbage collection, since we'll have to wait our neighbour lookups become refcount-less to not depend on this stuff. Reported-by: Robert Gladewitz <gladewitz@gmx.de> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-12-05net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.David Miller
To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-05ipv4: Fix peer validation on cached lookup.David S. Miller
If ipv4_valdiate_peer() fails during a cached entry lookup, we'll NULL derer since the loop iterator assumes rth is not NULL. Letting this be handled as a failure is just bogus, so just make it not fail. If we have trouble getting a non-NULL neighbour for the redirected gateway, just restore the original gateway and continue. The very next use of this cached route will try again. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-03ipv4: make sure RTO_ONLINK is saved in routing cacheJulian Anastasov
__mkroute_output fails to work with the original tos and uses value with stripped RTO_ONLINK bit. Make sure we put the original TOS bits into rt_key_tos because it used to match cached route. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-12-01ipv4: Perform peer validation on cached route lookup.David S. Miller
Otherwise we won't notice the peer GENID change. Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30atm: clip: Use device neigh support on top of "arp_tbl".David Miller
Instead of instantiating an entire new neigh_table instance just for ATM handling, use the neigh device private facility. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30ipv4: fix lockdep splat in rt_cache_seq_showEric Dumazet
After commit f2c31e32b378 (fix NULL dereferences in check_peer_redir()), dst_get_neighbour() should be guarded by rcu_read_lock() / rcu_read_unlock() section. Reported-by: Miles Lane <miles.lane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26inet: add a redirect generation id in inetpeerEric Dumazet
Now inetpeer is the place where we cache redirect information for ipv4 destinations, we must be able to invalidate informations when a route is added/removed on host. As inetpeer is not yet namespace aware, this patch adds a shared redirect_genid, and a per inetpeer redirect_genid. This might be changed later if inetpeer becomes ns aware. Cache information for one inerpeer is valid as long as its redirect_genid has the same value than global redirect_genid. Reported-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Tested-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26ipv4: Don't use the cached pmtu informations for input routesSteffen Klassert
The pmtu informations on the inetpeer are visible for output and input routes. On packet forwarding, we might propagate a learned pmtu to the sender. As we update the pmtu informations of the inetpeer on demand, the original sender of the forwarded packets might never notice when the pmtu to that inetpeer increases. So use the mtu of the outgoing device on packet forwarding instead of the pmtu to the final destination. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26net: Move mtu handling down to the protocol depended handlersSteffen Klassert
We move all mtu handling from dst_mtu() down to the protocol layer. So each protocol can implement the mtu handling in a different manner. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26net: Rename the dst_opt default_mtu method to mtuSteffen Klassert
We plan to invoke the dst_opt->default_mtu() method unconditioally from dst_mtu(). So rename the method to dst_opt->mtu() to match the name with the new meaning. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26route: Use the device mtu as the default for blackhole routesSteffen Klassert
As it is, we return null as the default mtu of blackhole routes. This may lead to a propagation of a bogus pmtu if the default_mtu method of a blackhole route is invoked. So return dst->dev->mtu as the default mtu instead. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-18ipv4: fix redirect handlingEric Dumazet
commit f39925dbde77 (ipv4: Cache learned redirect information in inetpeer.) introduced a regression in ICMP redirect handling. It assumed ipv4_dst_check() would be called because all possible routes were attached to the inetpeer we modify in ip_rt_redirect(), but thats not true. commit 7cc9150ebe (route: fix ICMP redirect validation) tried to fix this but solution was not complete. (It fixed only one route) So we must lookup existing routes (including different TOS values) and call check_peer_redir() on them. Reported-by: Ivan Zahariev <famzah@icdsoft.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-08ipv4: Fix inetpeer expire time informationSteffen Klassert
As we update the learned pmtu informations on demand, we might report a nagative expiration time value to userspace if the pmtu informations are already expired and we have not send a packet to that inetpeer after expiration. With this patch we send a expire time of null to userspace after expiration until the next packet is send to that inetpeer. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24ipv4: avoid useless call of the function check_peer_pmtuGao feng
In func ipv4_dst_check,check_peer_pmtu should be called only when peer is updated. So,if the peer is not updated in ip_rt_frag_needed,we can not inc __rt_peer_genid. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-10-24route: fix ICMP redirect validationFlavio Leitner
The commit f39925dbde7788cfb96419c0f092b086aa325c0f (ipv4: Cache learned redirect information in inetpeer.) removed some ICMP packet validations which are required by RFC 1122, section 3.2.2.2: ... A Redirect message SHOULD be silently discarded if the new gateway address it specifies is not on the same connected (sub-) net through which the Redirect arrived [INTRO:2, Appendix A], or if the source of the Redirect is not the current first-hop gateway for the specified destination (see Section 3.3.1). Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-03ipv4: NET_IPV4_ROUTE_GC_INTERVAL removalVasily Averin
removing obsoleted sysctl, ip_rt_gc_interval variable no longer used since 2.6.38 Signed-off-by: Vasily Averin <vvs@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-08-12net: cleanup some rcu_dereference_rawEric Dumazet
RCU api had been completed and rcu_access_pointer() or rcu_dereference_protected() are better than generic rcu_dereference_raw() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-11ipv4: some rt_iif -> rt_route_iif conversionsJulian Anastasov
As rt_iif represents input device even for packets coming from loopback with output route, it is not an unique key specific to input routes. Now rt_route_iif has such role, it was fl.iif in 2.6.38, so better to change the checks at some places to save CPU cycles and to restore 2.6.38 semantics. compare_keys: - input routes: only rt_route_iif matters, rt_iif is same - output routes: only rt_oif matters, rt_iif is not used for matching in __ip_route_output_key - now we are back to 2.6.38 state ip_route_input_common: - matching rt_route_iif implies input route - compared to 2.6.38 we eliminated one rth->fl.oif check because it was not needed even for 2.6.38 compare_hash_inputs: Only the change here is not an optimization, it has effect only for output routes. I assume I'm restoring the original intention to ignore oif, it was using fl.iif - now we are back to 2.6.38 state Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07ipv4: fix the reusing of routing cache entriesJulian Anastasov
compare_keys and ip_route_input_common rely on rt_oif for distinguishing of input and output routes with same keys values. But sometimes the input route has also same hash chain (keyed by iif != 0) with the output routes (keyed by orig_oif=0). Problem visible if running with small number of rhash_entries. Fix them to use rt_route_iif instead. By this way input route can not be returned to users that request output route. The patch fixes the ip_rt_bug errors that were reported in ip_local_out context, mostly for 255.255.255.255 destinations. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-06net: Compute protocol sequence numbers and fragment IDs using MD5.David S. Miller
Computers have become a lot faster since we compromised on the partial MD4 hash which we use currently for performance reasons. MD5 is a much safer choice, and is inline with both RFC1948 and other ISS generators (OpenBSD, Solaris, etc.) Furthermore, only having 24-bits of the sequence number be truly unpredictable is a very serious limitation. So the periodic regeneration and 8-bit counter have been removed. We compute and use a full 32-bit sequence number. For ipv6, DCCP was found to use a 32-bit truncated initial sequence number (it needs 43-bits) and that is fixed here as well. Reported-by: Dan Kaminsky <dan@doxpara.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-03net: fix NULL dereferences in check_peer_redir()Eric Dumazet
Gergely Kalman reported crashes in check_peer_redir(). It appears commit f39925dbde778 (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23ipv4: use RT_TOS after some rt_tos conversionsJulian Anastasov
rt_tos was changed to iph->tos but it must be filtered by RT_TOS Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>