summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2016-04-08net: dsa: make the VLAN add function return voidVivien Didelot
The switchdev design implies that a software error should not happen in the commit phase since it must have been previously reported in the prepare phase. If an hardware error occurs during the commit phase, there is nothing switchdev can do about it. The DSA layer separates port_vlan_prepare and port_vlan_add for simplicity and convenience. If an hardware error occurs during the commit phase, there is no need to report it outside the driver itself. Make the DSA port_vlan_add routine return void for explicitness. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08net: dsa: make the FDB add function return voidVivien Didelot
The switchdev design implies that a software error should not happen in the commit phase since it must have been previously reported in the prepare phase. If an hardware error occurs during the commit phase, there is nothing switchdev can do about it. The DSA layer separates port_fdb_prepare and port_fdb_add for simplicity and convenience. If an hardware error occurs during the commit phase, there is no need to report it outside the DSA driver itself. Make the DSA port_fdb_add routine return void for explicitness. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08net: dsa: make the STP state function return voidVivien Didelot
The DSA layer doesn't care about the return code of the port_stp_update routine, so make it void in the layer and the DSA drivers. Replace the useless dsa_slave_stp_update function with a dsa_slave_stp_state function used to reply to the switchdev SWITCHDEV_ATTR_ID_PORT_STP_STATE attribute. In the meantime, rename port_stp_update to port_stp_state_set to explicit the state change. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08Merge tag 'mac80211-next-for-davem-2016-04-06' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== For the 4.7 cycle, we have a number of changes: * Bob's mesh mode rhashtable conversion, this includes the rhashtable API change for allocation flags * BSSID scan, connect() command reassoc support (Jouni) * fast (optimised data only) and support for RSS in mac80211 (myself) * various smaller changes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07bpf: sanitize bpf tracepoint accessAlexei Starovoitov
during bpf program loading remember the last byte of ctx access and at the time of attaching the program to tracepoint check that the program doesn't access bytes beyond defined in tracepoint fields This also disallows access to __dynamic_array fields, but can be relaxed in the future. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07bpf: support bpf_get_stackid() and bpf_perf_event_output() in tracepoint ↵Alexei Starovoitov
programs needs two wrapper functions to fetch 'struct pt_regs *' to convert tracepoint bpf context into kprobe bpf context to reuse existing helper functions Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07perf, bpf: allow bpf programs attach to tracepointsAlexei Starovoitov
introduce BPF_PROG_TYPE_TRACEPOINT program type and allow it to be attached to the perf tracepoint handler, which will copy the arguments into the per-cpu buffer and pass it to the bpf program as its first argument. The layout of the fields can be discovered by doing 'cat /sys/kernel/debug/tracing/events/sched/sched_switch/format' prior to the compilation of the program with exception that first 8 bytes are reserved and not accessible to the program. This area is used to store the pointer to 'struct pt_regs' which some of the bpf helpers will use: +---------+ | 8 bytes | hidden 'struct pt_regs *' (inaccessible to bpf program) +---------+ | N bytes | static tracepoint fields defined in tracepoint/format (bpf readonly) +---------+ | dynamic | __dynamic_array bytes of tracepoint (inaccessible to bpf yet) +---------+ Not that all of the fields are already dumped to user space via perf ring buffer and broken application access it directly without consulting tracepoint/format. Same rule applies here: static tracepoint fields should only be accessed in a format defined in tracepoint/format. The order of fields and field sizes are not an ABI. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07perf: split perf_trace_buf_prepare into alloc and update partsAlexei Starovoitov
split allows to move expensive update of 'struct trace_entry' to later phase. Repurpose unused 1st argument of perf_tp_event() to indicate event type. While splitting use temp variable 'rctx' instead of '*rctx' to avoid unnecessary loads done by the compiler due to -fno-strict-aliasing Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07perf: remove unused __addr variableAlexei Starovoitov
now all calls to perf_trace_buf_submit() pass 0 as 4th argument which will be repurposed in the next patch which will change the meaning of 1st arg of perf_tp_event() to event_type Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07perf: optimize perf_fetch_caller_regsAlexei Starovoitov
avoid memset in perf_fetch_caller_regs, since it's the critical path of all tracepoints. It's called from perf_sw_event_sched, perf_event_task_sched_in and all of perf_trace_##call with this_cpu_ptr(&__perf_regs[..]) which are zero initialized by perpcu init logic and subsequent call to perf_arch_fetch_caller_regs initializes the same fields on all archs, so we can safely drop memset from all of the above cases and move it into perf_ftrace_function_call that calls it with stack allocated pt_regs. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07net: Fix build failure due to lockdep_sock_is_held().David S. Miller
Needs to be protected with CONFIG_LOCKDEP. Based upon a patch by Hannes Frederic Sowa. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07sock: make lockdep_sock_is_held static inlineHannes Frederic Sowa
I forgot to add inline to lockdep_sock_is_held, so it generated all kinds of build warnings if not build with lockdep support. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07udp: Remove udp_offloadsTom Herbert
Now that the UDP encapsulation GRO functions have been moved to the UDP socket we not longer need the udp_offload insfrastructure so removing it. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07vxlan: change vxlan to use UDP socket GROTom Herbert
Adapt vxlan_gro_receive, vxlan_gro_complete to take a socket argument. Set these functions in tunnel_config. Don't set udp_offloads any more. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07udp: Add socket based GRO and configTom Herbert
Add gro_receive and gro_complete to struct udp_tunnel_sock_cfg. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07udp: Add GRO functions to UDP socketTom Herbert
This patch adds GRO functions (gro_receive and gro_complete) to UDP sockets. udp_gro_receive is changed to perform socket lookup on a packet. If a socket is found the related GRO functions are called. This features obsoletes using UDP offload infrastructure for GRO (udp_offload). This has the advantage of not being limited to provide offload on a per port basis, GRO is now applied to whatever individual UDP sockets are bound to. This also allows the possbility of "application defined GRO"-- that is we can attach something like a BPF program to a UDP socket to perfrom GRO on an application layer protocol. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07udp: Add udp6_lib_lookup_skb and udp4_lib_lookup_skbTom Herbert
Add externally visible functions to lookup a UDP socket by skb. This will be used for GRO in UDP sockets. These functions also check if skb->dst is set, and if it is not skb->dev is used to get dev_net. This allows calling lookup functions before dst has been set on the skbuff. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07net: Checks skb_dst to be NULL in inet_iifTom Herbert
In inet_iif check if skb_rtable is NULL for the skb and return skb->skb_iif if it is. This change allows inet_iif to be called before the dst information has been set in the skb (e.g. when doing socket based UDP GRO). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07tun: use socket locks for sk_{attach,detatch}_filterHannes Frederic Sowa
This reverts commit 5a5abb1fa3b05dd ("tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter") and replaces it to use lock_sock around sk_{attach,detach}_filter. The checks inside filter.c are updated with lockdep_sock_is_held to check for proper socket locks. It keeps the code cleaner by ensuring that only one lock governs the socket filter instead of two independent locks. Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07net: introduce lockdep_is_held and update various places to use itHannes Frederic Sowa
The socket is either locked if we hold the slock spin_lock for lock_sock_fast and unlock_sock_fast or we own the lock (sk_lock.owned != 0). Check for this and at the same time improve that the current thread/cpu is really holding the lock. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07sock: fix lockdep annotation in release_sockHannes Frederic Sowa
During release_sock we use callbacks to finish the processing of outstanding skbs on the socket. We actually are still locked, sk_locked.owned == 1, but we already told lockdep that the mutex is released. This could lead to false positives in lockdep for lockdep_sock_is_held (we don't hold the slock spinlock during processing the outstanding skbs). I took over this patch from Eric Dumazet and tested it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06vxlan: implement GPEJiri Benc
Implement VXLAN-GPE. Only COLLECT_METADATA is supported for now (it is possible to support static configuration, too, if there is demand for it). The GPE header parsing has to be moved before iptunnel_pull_header, as we need to know the protocol. v2: Removed what was called "L2 mode" in v1 of the patchset. Only "L3 mode" (now called "raw mode") is added by this patch. This mode does not allow Ethernet header to be encapsulated in VXLAN-GPE when using ip route to specify the encapsulation, IP header is encapsulated instead. The patch does support Ethernet to be encapsulated, though, using ETH_P_TEB in skb->protocol. This will be utilized by other COLLECT_METADATA users (openvswitch in particular). If there is ever demand for Ethernet encapsulation with VXLAN-GPE using ip route, it's easy to add a new flag switching the interface to "Ethernet mode" (called "L2 mode" in v1 of this patchset). For now, leave this out, it seems we don't need it. Disallowed more flag combinations, especially RCO with GPE. Added comment explaining that GBP and GPE cannot be set together. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06ip_tunnel: implement __iptunnel_pull_headerJiri Benc
Allow calling of iptunnel_pull_header without special casing ETH_P_TEB inner protocol. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06cfg80211: Add option to specify previous BSSID for Connect commandJouni Malinen
This extends NL80211_CMD_CONNECT to allow the NL80211_ATTR_PREV_BSSID attribute to be used similarly to way this was already allowed with NL80211_CMD_ASSOCIATE. This allows user space to request reassociation (instead of association) when already connected to an AP. This provides an option to reassociate within an ESS without having to disconnect and associate with the AP. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-06mac80211: add A-MSDU tx supportFelix Fietkau
Requires software tx queueing and fast-xmit support. For good performance, drivers need frag_list support as well. This avoids the need for copying data of aggregated frames. Running without it is only supported for debugging purposes. To avoid performance and packet size issues, the rate control module or driver needs to limit the maximum A-MSDU size by setting max_rc_amsdu_len in struct ieee80211_sta. Signed-off-by: Felix Fietkau <nbd@openwrt.org> [fix locking issue] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-06mac80211: enable collecting station statistics per-CPUJohannes Berg
If the driver advertises the new HW flag USE_RSS, make the station statistics on the fast-rx path per-CPU. This will enable calling the RX in parallel, only hitting locking or shared cachelines when the fast-RX path isn't available. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-06mac80211: add fast-rx pathJohannes Berg
The regular RX path has a lot of code, but with a few assumptions on the hardware it's possible to reduce the amount of code significantly. Currently the assumptions on the driver are the following: * hardware/driver reordering buffer (if supporting aggregation) * hardware/driver decryption & PN checking (if using encryption) * hardware/driver did de-duplication * hardware/driver did A-MSDU deaggregation * AP_LINK_PS is used (in AP mode) * no client powersave handling in mac80211 (in client mode) of which some are actually checked per packet: * de-duplication * PN checking * decryption and additionally packets must * not be A-MSDU (have been deaggregated by driver/device) * be data packets * not be fragmented * be unicast * have RFC 1042 header Additionally dynamically we assume: * no encryption or CCMP/GCMP, TKIP/WEP/other not allowed * station must be authorized * 4-addr format not enabled Some data needed for the RX path is cached in a new per-station "fast_rx" structure, so that we only need to look at this and the packet, no other memory when processing packets on the fast RX path. After doing the above per-packet checks, the data path collapses down to a pretty simple conversion function taking advantage of the data cached in the small fast_rx struct. This should speed up the RX processing, and will make it easier to reason about parallelizing RX (for which statistics will need to be per-CPU still.) Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-06mac80211: allow passing transmitter station on RXJohannes Berg
Sometimes drivers already looked up, or know out-of-band from their device, which station transmitted a given RX frame. Allow them to pass the station pointer to mac80211 to save the extra lookup. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05udp: enable MSG_PEEK at non-zero offsetsamanthakumar
Enable peeking at UDP datagrams at the offset specified with socket option SOL_SOCKET/SO_PEEK_OFF. Peek at any datagram in the queue, up to the end of the given datagram. Implement the SO_PEEK_OFF semantics introduced in commit ef64a54f6e55 ("sock: Introduce the SO_PEEK_OFF sock option"). Increase the offset on peek, decrease it on regular reads. When peeking, always checksum the packet immediately, to avoid recomputation on subsequent peeks and final read. The socket lock is not held for the duration of udp_recvmsg, so peek and read operations can run concurrently. Only the last store to sk_peek_off is preserved. Signed-off-by: Sam Kumar <samanthakumar@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05udp: remove headers from UDP packets before queueingsamanthakumar
Remove UDP transport headers before queueing packets for reception. This change simplifies a follow-up patch to add MSG_PEEK support. Signed-off-by: Sam Kumar <samanthakumar@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05sock: convert sk_peek_offset functions to WRITE_ONCEWillem de Bruijn
Make the peek offset interface safe to use in lockless environments. Use READ_ONCE and WRITE_ONCE to avoid race conditions between testing and updating the peek offset. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05mac80211: track and tell driver about GO client P2P PS abilitiesAyala Beker
Legacy clients don't support P2P power save mechanism, and thus if a P2P GO has a legacy client connected to it, it should disable P2P PS mechanisms. Let the driver know about this with a new bss_conf parameter. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05cfg80211: allow userspace to specify client P2P PS supportAyala Beker
Legacy clients don't support P2P power save mechanisms, and thus if a P2P GO has a legacy client connected to it, it has to make some changes in the PS behavior. To handle this, add an attribute to specify whether a station supports P2P PS or not. If the attribute was not specified cfg80211 will assume that station supports it for P2P GO interface, and does NOT support it for AP interface, matching the current assumptions in the code. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05ieee80211: support parsing Fine Timing Measurement action frameAvraham Stern
Add definition for Fine Timing Measurement (FTM) frame format as defined in IEEE802.11-REVmcD5.0 section 9.6.8.33 Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05cfg80211: fix kernel-doc struct nameAkira Moroo
This patch fix a structure name mismatch in cfg80211.h. Signed-off-by: Moroo Akira <retrage01@gmail.com> Reviewed-by: Julian Calaby <julian.calaby@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05nl80211: add feature for BSS selection supportArend van Spriel
Introducing a new feature that the driver can use to indicate the driver/firmware supports configuration of BSS selection criteria upon CONNECT command. This can be useful when multiple BSS-es are found belonging to the same ESS, ie. Infra-BSS with same SSID. The criteria can then be used to offload selection of a preferred BSS. Reviewed-by: Hante Meuleman <meuleman@broadcom.com> Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com> Reviewed-by: Lei Zhang <leizh@broadcom.com> Signed-off-by: Arend van Spriel <arend@broadcom.com> [move wiphy support check into parse_bss_select()] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: synchronize driver rx queues before removing a stationSara Sharon
Some devices, like iwlwifi, have RSS queues. This may cause a situation where a disassociation is handled in control path and results in station removal while there are prior RX frames that were still not processed in other queues. When they will be processed the station will be gone, and the frames will be dropped. Add a synchronization interface to avoid that. When driver returns from the synchronization mac80211 may remove the station. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05rhashtable: accept GFP flags in rhashtable_walk_initBob Copeland
In certain cases, the 802.11 mesh pathtable code wants to iterate over all of the entries in the forwarding table from the receive path, which is inside an RCU read-side critical section. Enable walks inside atomic sections by allowing GFP_ATOMIC allocations for the walker state. Change all existing callsites to pass in GFP_KERNEL. Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Bob Copeland <me@bobcopeland.com> [also adjust gfs2/glock.c and rhashtable tests] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05cfg80211: Allow a scan request for a specific BSSIDJouni Malinen
This allows scans for a specific BSSID to be optimized by the user space application by requesting the driver to set the Probe Request frame BSSID field (Address 3) to the specified BSSID instead of the wildcard BSSID. This prevents other APs from replying which reduces airtime need and latency in getting the response from the target AP through. This is an optimization and as such, it is acceptable for some of the drivers not to support the mechanism. If not supported, the wildcard BSSID will be used and more responses may be received. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: allow not sending MIC up from driver for HW cryptoSara Sharon
When HW crypto is used, there's no need for the CCMP/GCMP MIC to be available to mac80211, and the hardware might have removed it already after checking. The MIC is also useless to have when the frame is already decrypted, so allow indicating that it's not present. Since we are running out of bits in mac80211_rx_flags, make the flags field a u64. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: allow drivers to report CLOCK_BOOTTIME for scan resultsJohannes Berg
This was requested by Android, and the appropriate cfg80211 API had been added by Dmitry. Support it in mac80211, allowing drivers to provide the timestamp. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-04tcp: increment sk_drops for listenersEric Dumazet
Goal: packets dropped by a listener are accounted for. This adds tcp_listendrop() helper, and clears sk_drops in sk_clone_lock() so that children do not inherit their parent drop count. Note that we no longer increment LINUX_MIB_LISTENDROPS counter when sending a SYNCOOKIE, since the SYN packet generated a SYNACK. We already have a separate LINUX_MIB_SYNCOOKIESSENT Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04tcp: increment sk_drops for dropped rx packetsEric Dumazet
Now ss can report sk_drops, we can instruct TCP to increment this per socket counter when it drops an incoming frame, to refine monitoring and debugging. Following patch takes care of listeners drops. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04sock_diag: add SK_MEMINFO_DROPSEric Dumazet
Reporting sk_drops to user space was available for UDP sockets using /proc interface. Add this to sock_diag, so that we can have the same information available to ss users, and we'll be able to add sk_drops indications for TCP sockets as well. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04tcp/dccp: do not touch listener sk_refcnt under synfloodEric Dumazet
When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes. By letting listeners use SOCK_RCU_FREE infrastructure, we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets, only listeners are impacted by this change. Peak performance under SYNFLOOD is increased by ~33% : On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps Most consuming functions are now skb_set_owner_w() and sock_wfree() contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04inet: reqsk_alloc() needs to take care of dead listenersEric Dumazet
We'll soon no longer take a refcount on listeners, so reqsk_alloc() can not assume a listener refcount is not zero. We need to use atomic_inc_not_zero() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04tcp/dccp: remove BH disable/enable in lookupEric Dumazet
Since linux 2.6.29, lookups only use rcu locking. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04udp: no longer use SLAB_DESTROY_BY_RCUEric Dumazet
Tom Herbert would like not touching UDP socket refcnt for encapsulated traffic. For this to happen, we need to use normal RCU rules, with a grace period before freeing a socket. UDP sockets are not short lived in the high usage case, so the added cost of call_rcu() should not be a concern. This actually removes a lot of complexity in UDP stack. Multicast receives no longer need to hold a bucket spinlock. Note that ip early demux still needs to take a reference on the socket. Same remark for functions used by xt_socket and xt_PROXY netfilter modules, but this might be changed later. Performance for a single UDP socket receiving flood traffic from many RX queues/cpus. Simple udp_rx using simple recvfrom() loop : 438 kpps instead of 374 kpps : 17 % increase of the peak rate. v2: Addressed Willem de Bruijn feedback in multicast handling - keep early demux break in __udp4_lib_demux_lookup() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Willem de Bruijn <willemb@google.com> Tested-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04net: add SOCK_RCU_FREE socket flagEric Dumazet
We want a generic way to insert an RCU grace period before socket freeing for cases where RCU_SLAB_DESTROY_BY_RCU is adding too much overhead. SLAB_DESTROY_BY_RCU strict rules force us to take a reference on the socket sk_refcnt, and it is a performance problem for UDP encapsulation, or TCP synflood behavior, as many CPUs might attempt the atomic operations on a shared sk_refcnt UDP sockets and TCP listeners can set SOCK_RCU_FREE so that their lookup can use traditional RCU rules, without refcount changes. They can set the flag only once hashed and visible by other cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Tested-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04sock: enable timestamping using control messagesSoheil Hassas Yeganeh
Currently, SOL_TIMESTAMPING can only be enabled using setsockopt. This is very costly when users want to sample writes to gather tx timestamps. Add support for enabling SO_TIMESTAMPING via control messages by using tsflags added in `struct sockcm_cookie` (added in the previous patches in this series) to set the tx_flags of the last skb created in a sendmsg. With this patch, the timestamp recording bits in tx_flags of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg. Please note that this is only effective for overriding the recording timestamps flags. Users should enable timestamp reporting (e.g., SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using socket options and then should ask for SOF_TIMESTAMPING_TX_* using control messages per sendmsg to sample timestamps for each write. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>