path: root/net
Age, Commit message, Author
2013-12-17net: ovs: use CRC32 accelerated flow hash if availableFrancesco Fusco
Currently OVS uses jhash2() for calculating flow hashes in its internal flow_hash() function. The performance of the flow_hash() function is critical, as the input data can be hundreds of bytes long. OVS is largely deployed in x86_64 based datacenters. Therefore, we argue that the performance critical fast path of OVS should exploit underlying CPU features in order to reduce the per packet processing costs. We replace jhash2 with the hash implementation provided by the kernel hash lib, which exploits the crc32l instruction to achieve high performance. Our patch greatly reduces the hash footprint from ~200 cycles of jhash2() to ~90 cycles in case of ovs_flow_hash_crc() (measured with rdtsc over maximum length flow keys on an i7 Intel CPU). Additionally, we wrote a microbenchmark to stress the flow table performance. The benchmark inserts random flows into the flow hash and then performs lookups. Our hash deployed on a CRC32 capable CPU reduces the lookup time for 1000 flows, 100 masks from ~10,100us to ~6,700us, for example. Thus, simply use the newly introduced arch_fast_hash2() as a drop-in replacement. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Thomas Graf <tgraf@redhat.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
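To make the idea of a CRC32-based flow hash concrete, here is a minimal user-space sketch in C. It is not the kernel's arch_fast_hash2(); the names crc32c_update() and flow_hash_sketch() are hypothetical, and the loop is a portable bit-at-a-time CRC-32C (Castagnoli polynomial), which is the computation a hardware crc32 instruction would accelerate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the CRC-32C (Castagnoli) polynomial. */
#define CRC32C_POLY 0x82F63B78u

/*
 * Portable bit-at-a-time CRC-32C update over a byte buffer.
 * No initial or final XOR is applied here; the caller chooses the seed.
 */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
	}
	return crc;
}

/*
 * Hypothetical stand-in for a flow hash: hashes n_words 32-bit words
 * of a flow key, starting from a caller-supplied seed.
 */
static uint32_t flow_hash_sketch(const uint32_t *key, size_t n_words,
				 uint32_t seed)
{
	assert(key != NULL || n_words == 0);
	return crc32c_update(seed, key, n_words * sizeof(*key));
}
```

A caller would pass the flow key as an array of 32-bit words, for example flow_hash_sketch(key, key_len / 4, seed). A real fast path would replace the inner loop with the CPU instruction and fall back to a software hash on CPUs without it.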
2013-12-16tipc: change lock_sock order in connect()wangweidong
Instead of reacquiring the socket lock and taking the normal exit path when a connection times out, we bail out early with a return -ETIMEDOUT. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-16tipc: Use <linux/uaccess.h> instead of <asm/uaccess.h>wangweidong
As warned by checkpatch.pl, use #include <linux/uaccess.h> instead of <asm/uaccess.h> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-16tipc: kill unnecessary goto'swangweidong
Remove a number of needless 'goto exit' in send_stream when the socket is in an unconnected state. This patch is cosmetic and does not alter the operation of TIPC in any way. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-16tipc: remove unnecessary variables and conditionswangweidong
We remove a number of unnecessary variables and branches in TIPC. This patch is cosmetic and does not change the operation of TIPC in any way. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-15net-ipv6: Fix alleged compiler warning in ipv6_exthdrs_len()Jerry Chu
It was reported that Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") triggered a compiler warning in ipv6_exthdrs_len(): net/ipv6/ip6_offload.c: In function ‘ipv6_gro_complete’: net/ipv6/ip6_offload.c:178:24: warning: ‘optlen’ may be used uninitialized in this function [-Wmaybe-u opth = (void *)opth + optlen; ^ net/ipv6/ip6_offload.c:164:22: note: ‘optlen’ was declared here int len = 0, proto, optlen; ^ Note that there was no real bug here - optlen was never uninitialized before use. (Was the version of gcc I used smarter to not complain?) Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14ipv6: fix compiler warning in ipv6_exthdrs_lenHannes Frederic Sowa
Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") used an uninitialized variable which leads to the following compiler warning: net/ipv6/ip6_offload.c: In function ‘ipv6_gro_complete’: net/ipv6/ip6_offload.c:178:24: warning: ‘optlen’ may be used uninitialized in this function [-Wmaybe-uninitialized] opth = (void *)opth + optlen; ^ net/ipv6/ip6_offload.c:164:22: note: ‘optlen’ was declared here int len = 0, proto, optlen; ^ Fix it up. Cc: Jerry Chu <hkchu@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14bonding: create bond_first_slave_rcu()dingtianhong
bond_first_slave_rcu() will be used instead of bond_first_slave() inside rcu_read_lock() sections. Following Jay Vosburgh's suggestion, struct netdev_adjacent should be hidden from users who want to use it directly, so I wrap it in a new function that gets the first slave of the bond. Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14pkt_sched: set root qdisc before change() in attach_default_qdiscs()Eric Dumazet
After commit 95dc19299f74 ("pkt_sched: give visibility to mq slave qdiscs") we call qdisc_list_add() while the device qdisc might be the noop_qdisc one. This shows up as duplicates in "tc qdisc show", as all inactive devices point to noop_qdisc. Fix this by setting dev->qdisc to the new qdisc before calling ops->change() in attach_default_qdiscs(). Add a WARN_ON_ONCE() to catch any future similar problem. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14packet: fix using smp_processor_id() in preemptible codeLi Zhong
This patch fixes the following warning by replacing smp_processor_id() with raw_smp_processor_id():
[ 11.120893] BUG: using smp_processor_id() in preemptible [00000000] code: arping/3510
[ 11.120913] caller is .packet_sendmsg+0xc14/0xe68
[ 11.120920] CPU: 13 PID: 3510 Comm: arping Not tainted 3.13.0-rc3-next-20131211-dirty #1
[ 11.120926] Call Trace:
[ 11.120932] [c0000001f803f6f0] [c0000000000138dc] .show_stack+0x110/0x25c (unreliable)
[ 11.120942] [c0000001f803f7e0] [c00000000083dd24] .dump_stack+0xa0/0x37c
[ 11.120951] [c0000001f803f870] [c000000000493fd4] .debug_smp_processor_id+0xfc/0x12c
[ 11.120959] [c0000001f803f900] [c0000000007eba78] .packet_sendmsg+0xc14/0xe68
[ 11.120968] [c0000001f803fa80] [c000000000700968] .sock_sendmsg+0xa0/0xe0
[ 11.120975] [c0000001f803fbf0] [c0000000007014d8] .SyS_sendto+0x100/0x148
[ 11.120983] [c0000001f803fd60] [c0000000006fff10] .SyS_socketcall+0x1c4/0x2e8
[ 11.120990] [c0000001f803fe30] [c00000000000a1e4] syscall_exit+0x0/0x9c
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14netconf: add proxy-arp supportstephen hemminger
Add support to netconf to show changes to proxy-arp status on a per interface basis via netlink in a manner similar to forwarding and reverse path state. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12ipv6: fix incorrect type in declarationFlorent Fourcot
Introduced by 1397ed35f22d7c30d0b89ba74b6b7829220dfcfd "ipv6: add flowinfo for tcp6 pkt_options for all cases" Reported-by: kbuild test robot <fengguang.wu@intel.com> V2: fix the title, add empty line after the declaration (Sergei Shtylyov feedbacks) Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12net-gro: Prepare GRO stack for the upcoming tunneling supportJerry Chu
This patch modifies the GRO stack to avoid the use of "network_header" and associated macros like ip_hdr() and ipv6_hdr() in order to allow an arbitrary number of IP hdrs (v4 or v6) to be used in the encapsulation chain. This lays the foundation for various IP tunneling support (IP-in-IP, GRE, VXLAN, SIT,...) to be added later. With this patch, the GRO stack traversing now is mostly based on skb_gro_offset rather than special hdr offsets saved in skb (e.g., skb->network_header). As a result all but the top layer (i.e., the transport layer) must have hdrs of the same length in order for a pkt to be considered for aggregation. Therefore when adding a new encap layer (e.g., for tunneling), one must check and skip flows (e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a different hdr length. Note that unlike the network header, the transport header can and will continue to be set by the GRO code since there will be at most one "transport layer" in the encap chain. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11ipv6: router reachability probingJiri Benc
RFC 4191 states in 3.5: When a host avoids using any non-reachable router X and instead sends a data packet to another router Y, and the host would have used router X if router X were reachable, then the host SHOULD probe each such router X's reachability by sending a single Neighbor Solicitation to that router's address. A host MUST NOT probe a router's reachability in the absence of useful traffic that the host would have sent to the router if it were reachable. In any case, these probes MUST be rate-limited to no more than one per minute per router. Currently, when the neighbour corresponding to a router falls into NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but should be probed with a single NS. The probe is ratelimited by the existing code. To better distinguish meanings of the failure values, rename RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11sctp: remove redundant null check on asocwangweidong
In sctp_err_lookup, the code jumps to out only when asoc is not NULL, so remove the NULL check. Likewise, sctp_v4_err and sctp_v6_err only pass asoc to sctp_err_finish when it is not NULL, so remove the check there as well. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11sch_htb: remove unnecessary NULL pointer judgmentYang Yingliang
qdisc_put_rtab() already checks rtab for NULL internally, so remove the NULL check made outside of qdisc_put_rtab(). Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11ipv4: fix wildcard search with inet_confirm_addr()Nicolas Dichtel
The help text of this function says: "in_dev: only on this interface, 0=any interface", but since commit 39a6d0630012 ("[NETNS]: Process inet_confirm_addr in the correct namespace."), the code supposes that it will never be NULL. This function is never called with in_dev == NULL, but it's exported and may be used by an external module. Because this patch restores the ability to call inet_confirm_addr() with in_dev == NULL, I partially revert the above commit, as suggested by Julian. CC: Julian Anastasov <ja@ssi.bg> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11net_sched: expand control flow of macro SKIP_NONLOCALYang Yingliang
SKIP_NONLOCAL hides the control flow. The control flow should be inlined and expanded explicitly in code so that someone who reads it can tell the control flow can be changed by the statement. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
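The following C sketch illustrates the kind of problem described above. It does not reproduce the kernel's SKIP_NONLOCAL definition; struct pkt, SKIP_NONLOCAL_SKETCH and both collect functions are hypothetical. The first function relies on a macro that silently returns from its caller, the second spells the same check out at the call site.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet type with an optional owning socket. */
struct pkt {
	const void *sk;
};

/* Before: the macro contains a return that exits the *calling* function. */
#define SKIP_NONLOCAL_SKETCH(p, err)	\
	if ((p)->sk == NULL) {		\
		*(err) = -1;		\
		return;			\
	}

static void collect_with_macro(const struct pkt *p, int *value, int *err)
{
	assert(p && value && err);
	SKIP_NONLOCAL_SKETCH(p, err)	/* hidden early return */
	*value = 1;
}

/* After: identical behaviour, but the early return is visible to the reader. */
static void collect_expanded(const struct pkt *p, int *value, int *err)
{
	assert(p && value && err);
	if (p->sk == NULL) {
		*err = -1;
		return;
	}
	*value = 1;
}
```

Both functions behave the same for every input; the difference is only that a reader of collect_expanded() can see every exit path without looking up a macro.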
2013-12-11tipc: remove unused 'blocked' flag from tipc_link structYing Xue
In early versions of TIPC it was possible to administratively block individual links through the use of the member flag 'blocked'. This functionality was deemed redundant, and since commit 7368dd ("tipc: clean out all instances of #if 0'd unused code"), this flag has been unused. In the current code, a link only needs to be blocked for sending and reception if it is subject to an ongoing link failover. In that case, it is sufficient to check if the number of expected failover packets is non-zero, something which is done via the function 'link_blocked()'. This commit finally removes the redundant 'blocked' flag completely. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11tipc: eliminate code duplication in media layerYing Xue
Currently TIPC supports two L2 media types, Ethernet and Infiniband. Because both these media are accessed through the common net_device API, several functions in the two media adaptation files turn out to be fully or almost identical, leading to unnecessary code duplication. In this commit we extract this common code from the two media files and move them to the generic bearer.c. Additionally, we change the function names to reflect their real role: to access L2 media, irrespective of type. Signed-off-by: Ying Xue <ying.xue@windriver.com> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11tipc: relocate common functions from media to bearerYing Xue
Currently, registering a TIPC stack handler in the network device layer is done twice, once each for Ethernet (eth_media) and Infiniband (ib_media). But, as this registration is not media specific, we can avoid some code duplication by moving the registering function to the generic bearer layer, to the file bearer.c, and call it only once. The same is true for the network device event notifier. As a side effect, the two workqueues we are using for setting up/cleaning up media can now be eliminated. Furthermore, the array for storing the specific media type structs, media_array[], can be entirely deleted. Note that the eth_started and ib_started flags were removed during the code relocation. There is now only one call to bearer_setup and bearer_cleanup, and these can logically not race against each other. Despite its size, this cleanup work incurs no functional changes in TIPC. In particular, it should be noted that the sequence ordering of received packets is unaffected by this change, since packet reception never was subject to any work queue handling in the first place. Signed-off-by: Ying Xue <ying.xue@windriver.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11tipc: remove TIPC usage of field af_packet_priv in struct net_deviceYing Xue
TIPC is currently using the field 'af_packet_priv' in struct net_device as a handle to find the bearer instance associated to the given network device. But, by doing so it is blocking other networking cleanups, such as the one discussed here: http://patchwork.ozlabs.org/patch/178044/ This commit removes this usage from TIPC. Instead, we introduce a new field, 'tipc_ptr', to the net_device structure, to serve this purpose. When TIPC bearer is enabled, the bearer object is associated to 'tipc_ptr'. When a TIPC packet arrives in the recv_msg() upcall from a networking device, the bearer object can now be obtained from 'tipc_ptr'. When a bearer is disabled, the bearer object is detached from its underlying network device by setting 'tipc_ptr' to NULL. Additionally, an RCU lock is used to protect the new pointer. Henceforth, the existing tipc_net_lock is used in write mode to serialize write accesses to this pointer, while the new RCU lock is applied on the read side to ensure that the pointer is 100% valid within its wrapped area for all readers. Signed-off-by: Ying Xue <ying.xue@windriver.com> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11tipc: improve naming and comment consistency in media layerJon Paul Maloy
struct 'tipc_media' represents the specific info that the media layer adaptors (eth_media and ib_media) expose to the generic bearer layer. We clarify this by improved commenting, and by giving the 'media_list' array the more appropriate name 'media_info_array'. There are no functional changes in this commit. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11tipc: initiate media type array at compile timeJon Paul Maloy
Communication media types are abstracted through the struct 'tipc_media', one per media type. These structs are allocated statically inside their respective media file. Furthermore, in order to be able to reach all instances from a central location, we keep a static array with pointers to these structs. This array is currently initialized at runtime, under protection of tipc_net_lock. However, since the contents of the array itself never changes after initialization, we can just as well initialize it at compile time and make it 'const', at the same time making it obvious that no lock protection is needed here. This commit makes the array constant and removes the redundant lock protection. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
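As a rough illustration of the pattern described above, the C sketch below builds a constant table of pointers at compile time and looks entries up without any locking. The struct layout, the type ids and the names media_table and media_find() are assumptions made for this example, not TIPC's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down description of one media type. */
struct media_info {
	const char *name;
	unsigned int type_id;
};

/* One statically allocated instance per media type. */
static const struct media_info eth_media_info = { "eth", 1 };
static const struct media_info ib_media_info = { "ib", 2 };

/*
 * The table is filled in by the compiler and is const twice over:
 * neither the pointers nor the pointed-to structs can change at
 * runtime, so readers need no lock.
 */
static const struct media_info *const media_table[] = {
	&eth_media_info,
	&ib_media_info,
	NULL			/* terminator */
};

/* Look up a media type by name; returns NULL if it is unknown. */
static const struct media_info *media_find(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; media_table[i] != NULL; i++)
		if (strcmp(media_table[i]->name, name) == 0)
			return media_table[i];
	return NULL;
}
```

Because nothing ever writes to the table after the program is built, an attempt to modify it is rejected at compile time rather than guarded by a lock at runtime.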
2013-12-11tipc: eliminate redundant code with kfree_skb_list routineYing Xue
sk_buff lists are currently released by looping over the list and explicitly releasing each buffer. We replace all occurrences of this loop with a call to kfree_skb_list(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
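The refactoring can be pictured with a small user-space analogue in C. struct buf, buf_alloc(), buf_free() and buf_list_free() are hypothetical stand-ins for sk_buff and its release helpers; the point is only that one list-freeing helper replaces every open-coded loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked buffer, standing in for a chained sk_buff. */
struct buf {
	struct buf *next;
};

static int live_bufs;	/* number of buffers currently allocated */

/* Allocate one buffer and link it in front of 'next'. */
static struct buf *buf_alloc(struct buf *next)
{
	struct buf *b = malloc(sizeof(*b));

	assert(b != NULL);
	b->next = next;
	live_bufs++;
	return b;
}

/* Release a single buffer. */
static void buf_free(struct buf *b)
{
	free(b);
	live_bufs--;
}

/*
 * Release a whole chain. The next pointer is read before the
 * current buffer is freed, so the walk never touches freed memory.
 */
static void buf_list_free(struct buf *head)
{
	while (head != NULL) {
		struct buf *next = head->next;

		buf_free(head);
		head = next;
	}
}
```

Call sites that previously carried their own loop shrink to a single buf_list_free(head) call, which also accepts an empty (NULL) list.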
2013-12-10net_sched: sfq: put sfq_unlink in a do - while loopYang Yingliang
Macros with multiple statements should be enclosed in a do - while loop Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
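A short C sketch shows why the rule matters. The macros RESET_BOTH_UNSAFE and RESET_BOTH are hypothetical examples, not the sfq_unlink macro itself. Without the do - while wrapper, only the first statement of the macro is governed by an unbraced if.

```c
#include <assert.h>

/* Two bare statements: only the first one is controlled by an 'if'. */
#define RESET_BOTH_UNSAFE(a, b)	(a) = 0; (b) = 0

/* Wrapped form: expands to exactly one statement. */
#define RESET_BOTH(a, b)	do { (a) = 0; (b) = 0; } while (0)

/* Returns x + y after a conditional reset using the unsafe macro. */
static int demo_unsafe(int cond)
{
	int x = 1, y = 1;

	if (cond)
		RESET_BOTH_UNSAFE(x, y);	/* y = 0 always executes */
	return x + y;
}

/* Returns x + y after a conditional reset using the wrapped macro. */
static int demo_safe(int cond)
{
	int x = 1, y = 1;

	if (cond)
		RESET_BOTH(x, y);	/* both resets are conditional */
	return x + y;
}
```

With cond equal to 0, demo_safe() leaves both variables untouched, while demo_unsafe() still clears y because the second statement falls outside the if. The wrapped form also keeps a following else attached to the intended if.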
2013-12-10net_sched: add space around '>' and before '('Yang Yingliang
Spaces required around that '>' (ctx:VxV) and before the open parenthesis '('. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10net_sched: change "foo* bar" to "foo *bar"Yang Yingliang
"foo* bar" or "foo * bar" should be "foo *bar". Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10net_sched: cls_bpf: use tabs to do indentYang Yingliang
Code indent should use tabs where possible Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10net_sched: remove unnecessary parentheses while returnYang Yingliang
return is not a function, parentheses are not required. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10net: handle error more gracefully in socketpair()Yann Droneaud
This patch makes socketpair() use error paths which do not rely on a heavy-weight call to sys_close(): it's better to try to push the file descriptor to userspace before installing the socket file to the file descriptor, so that errors are caught earlier and are easier to handle. Using sys_close() seems to be the exception, while writing the file descriptor before installing it looks like it's more or less the norm: e.g. except for code used in init/, error handling involves fput() and put_unused_fd(), but not sys_close(). This makes socketpair()'s usage of sys_close() quite unusual. So it deserves to be replaced by the common pattern relying on fput() and put_unused_fd() just like, for example, the one used in pipe(2) or recvmsg(2). Three distinct error paths are still needed since calling fput() on the file structure returned by sock_alloc_file() will implicitly call sock_release() on the associated socket structure. Cc: David S. Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=1385979146-13825-1-git-send-email-ydroneaud@opteya.com Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10net: more spelling fixesstephen hemminger
Various spelling fixes in networking stack Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10ipv4: add support for IFA_FLAGS nl attributeJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10dn_dev: add support for IFA_FLAGS nl attributeJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10neigh: use neigh_parms_net() to get struct neigh_parms->net pointerJiri Pirko
This fixes compile error when CONFIG_NET_NS is not set. Introduced by: commit 1d4c8c29841b9991cdf3c7cc4ba7f96a94f104ca "neigh: restore old behaviour of default parms values" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10ipv6 addrconf: revert /proc/net/if_inet6 ifa_flag formatJiri Pirko
It turned out that applications like ifconfig do not handle the change. So revert ifa_flag format back to 2-letter hex value. Introduced by: commit 479840ffdbe4242e8a25349218c8e0859223aa35 "ipv6 addrconf: extend ifa_flags to u32" Reported-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Tested-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09ipv6: use ip6_flowinfo helperFlorent Fourcot
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09ipv6: add ip6_flowlabel helperFlorent Fourcot
And use it if possible. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09ipv6: remove rcv_tclass of ipv6_pinfoFlorent Fourcot
tclass information is now already stored in rcv_flowinfo. We do not need to store the same information twice. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09ipv6: move IPV6_TCLASS_MASK definition in ipv6.hFlorent Fourcot
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09ipv6: add flowinfo for tcp6 pkt_options for all casesFlorent Fourcot
The current implementation of IPV6_FLOWINFO only gives a result if pktoptions is available (thanks to the ip6_datagram_recv_ctl function). It gives inconsistent results to user space: sometimes there is a result for getsockopt(IPV6_FLOWINFO), sometimes not. This patch adds rcv_flowinfo to store it, and returns it to userspace in the same way as other pkt_options. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09batadv: Slight optimization of batadv_compare_ethJoe Perches
Use the newly added generic routine ether_addr_equal_unaligned to test if possibly unaligned to u16 Ethernet addresses are equal. This slightly improves comparison time for systems with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
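For readers unfamiliar with the alignment concern, the C sketch below compares two 6-byte Ethernet addresses that may sit at any byte offset. It is a user-space illustration, not the kernel's ether_addr_equal_unaligned(); the function name eth_addr_equal_sketch() is hypothetical. The memcpy() calls give the compiler an alignment-safe way to load the words, which it can typically turn into plain loads on CPUs with efficient unaligned access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two 6-byte Ethernet addresses that may be unaligned.
 * Each address is loaded as one 32-bit and one 16-bit word via
 * memcpy(), then all differences are folded with XOR and OR so a
 * single test decides equality.
 */
static bool eth_addr_equal_sketch(const uint8_t *a, const uint8_t *b)
{
	uint32_t a_hi, b_hi;
	uint16_t a_lo, b_lo;

	assert(a != NULL && b != NULL);

	memcpy(&a_hi, a, sizeof(a_hi));
	memcpy(&b_hi, b, sizeof(b_hi));
	memcpy(&a_lo, a + 4, sizeof(a_lo));
	memcpy(&b_lo, b + 4, sizeof(b_lo));

	return ((a_hi ^ b_hi) | (uint32_t)(a_lo ^ b_lo)) == 0;
}
```

The result is the same as memcmp(a, b, 6) == 0; the word-wise form only aims at fewer instructions on the comparison path.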
2013-12-09neigh: ipv6: respect default values set before an address is assigned to deviceJiri Pirko
Make the behaviour similar to ipv4. This will allow the user to set sysctl default neigh param values, and these values will be respected even by devices registered earlier (those that do not have an address set yet). Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09neigh: restore old behaviour of default parms valuesJiri Pirko
Previously inet devices were only constructed when addresses are added. Therefore the default neigh parms values they get are the ones at the time of these operations. Now that we're creating inet devices earlier, this changes the behaviour of default neigh parms values in an incompatible way (see bug #8519). This patch creates a compromise by setting the default values at the same point as before but only for those that have not been explicitly set by the user since the inet device's creation. Introduced by: commit 8030f54499925d073a88c09f30d5d844fb1b3190 Author: Herbert Xu <herbert@gondor.apana.org.au> Date: Thu Feb 22 01:53:47 2007 +0900 [IPV4] devinet: Register inetdev earlier. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09neigh: use tbl->family to distinguish ipv4 from ipv6Jiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09neigh: wrap proc dointvec functionsJiri Pirko
This will be needed later on to provide better management of default values. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09neigh: convert parms to an arrayJiri Pirko
This patch converts the neigh param members to an array. This allows easier manipulation which will be needed later on to provide better management of default values. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09tipc: remove interface state mirroring in bearerErik Hugne
struct 'tipc_bearer' is a generic representation of the underlying media type, and exists in a one-to-one relationship to each interface TIPC is using. The struct contains a 'blocked' flag that mirrors the operational and execution state of the represented interface, and is updated through notification calls from the latter. The users of tipc_bearer are checking this flag before each attempt to send a packet via the interface. This state mirroring serves no purpose in the current code base. TIPC links will not discover a media failure any faster through this mechanism, and in reality the flag only adds overhead at packet sending and reception. Furthermore, the fact that the flag needs to be protected by a spinlock aggregated into tipc_bearer has turned out to cause a serious and completely unnecessary deadlock problem.

        CPU0                                  CPU1
        ----                                  ----
Time 0: bearer_disable()                      link_timeout()
Time 1: spin_lock_bh(&b_ptr->lock)            tipc_link_push_queue()
Time 2: tipc_link_delete()                    tipc_bearer_blocked(b_ptr)
Time 3: k_cancel_timer(&req->timer)           spin_lock_bh(&b_ptr->lock)
Time 4: del_timer_sync(&req->timer)

I.e., del_timer_sync() on CPU0 never returns, because the timer handler on CPU1 is waiting for the bearer lock. We eliminate the 'blocked' flag from struct tipc_bearer, along with all tests on this flag. This not only resolves the deadlock, but also simplifies and speeds up the data path execution of TIPC. It also fits well into our ongoing effort to make the locking policy simpler and more manageable. An effect of this change is that we can get rid of functions such as tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer(). We replace the latter with a new function, tipc_reset_bearer(), which resets all links associated to the bearer immediately after an interface goes down. A user might notice one slight change in link behaviour after this change. When an interface goes down (e.g. through a NETDEV_DOWN event), all attached links will be reset immediately, instead of leaving it to each link to detect the failure through a timer-driven mechanism. We consider this an improvement, and see no obvious risks with the new behavior. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <Paul.Gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09x25: convert printks to pr_<level>wangweidong
use pr_<level> instead of printk(LEVEL) Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09packet: introduce PACKET_QDISC_BYPASS socket optionDaniel Borkmann
This patch introduces a PACKET_QDISC_BYPASS socket option, that allows for using a similar xmit() function as in pktgen instead of taking the dev_queue_xmit() path. This can be very useful when PF_PACKET applications are required to be used in a similar scenario as pktgen, but with full, flexible packet payload that needs to be provided, for example. By default, nothing changes in behaviour for normal PF_PACKET TX users, so everything stays as is for applications. New users, however, can now set PACKET_QDISC_BYPASS if needed to i) prevent own packets from reentering packet_rcv() and ii) directly push the frame to the driver. In doing so we can increase pps (here 64 byte packets) for PF_PACKET a bit:

 # CPUs -- QDISC_BYPASS   -- qdisc path -- qdisc path[**]
 1 CPU  ==  1,509,628 pps --  1,208,708 -- 1,247,436
 2 CPUs ==  3,198,659 pps --  2,536,012 -- 1,605,779
 3 CPUs ==  4,787,992 pps --  3,788,740 -- 1,735,610
 4 CPUs ==  6,173,956 pps --  4,907,799 -- 1,909,114
 5 CPUs ==  7,495,676 pps --  5,956,499 -- 2,014,422
 6 CPUs ==  9,001,496 pps --  7,145,064 -- 2,155,261
 7 CPUs == 10,229,776 pps --  8,190,596 -- 2,220,619
 8 CPUs == 11,040,732 pps --  9,188,544 -- 2,241,879
 9 CPUs == 12,009,076 pps -- 10,275,936 -- 2,068,447
10 CPUs == 11,380,052 pps -- 11,265,337 -- 1,578,689
11 CPUs == 11,672,676 pps -- 11,845,344 -- 1,297,412
[...]
20 CPUs == 11,363,192 pps -- 11,014,933 -- 1,245,081

[**]: qdisc path with packet_rcv(), how probably most people seem to use it (hopefully not anymore if not needed)

The test was done using a modified trafgen, sending a simple static 64 bytes packet, on all CPUs. The trick in the fast "qdisc path" case is to avoid reentering packet_rcv() by setting the RAW socket protocol to zero, like: socket(PF_PACKET, SOCK_RAW, 0); Tradeoffs are documented as well in this patch: clearly, if queues are busy, we will drop more packets, tc disciplines are ignored, and these packets are not visible to taps anymore. For a pktgen like scenario, we argue that this is acceptable. The pointer to the xmit function has been placed in packet socket structure hole between cached_dev and prot_hook that is hot anyway as we're working on cached_dev in each send path. Done in joint work together with Jesper Dangaard Brouer. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
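From user space, enabling the option might look like the C sketch below. It assumes a Linux system whose kernel supports PACKET_QDISC_BYPASS and a caller with the privilege to open packet sockets; the helper name open_bypass_socket() and the fallback defines for older headers are assumptions of this example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if_packet.h>

/* Fallbacks for older userspace headers (assumed values from the Linux UAPI). */
#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif
#ifndef PACKET_QDISC_BYPASS
#define PACKET_QDISC_BYPASS 20
#endif

/*
 * Open a TX-only packet socket and set the qdisc bypass flag to
 * 'enable' (0 or 1). Returns the file descriptor on success, or -1
 * with errno set on failure.
 */
static int open_bypass_socket(int enable)
{
	int fd;

	assert(enable == 0 || enable == 1);

	/* Protocol 0: the socket receives nothing, so own frames do not
	 * come back through packet_rcv(). */
	fd = socket(PF_PACKET, SOCK_RAW, 0);
	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS,
		       &enable, sizeof(enable)) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;	/* report the setsockopt() error */
		return -1;
	}
	return fd;
}
```

The socket still has to be bound to an interface (or given a destination in sendto()) before frames can be sent. On kernels without the option, or without sufficient privilege, the helper simply returns -1 and the caller can fall back to the normal qdisc path.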