summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-05-09netlink: allow to listen "all" netnsNicolas Dichtel
More accurately, listen all netns that have a nsid assigned into the netns where the netlink socket is opened. For this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this socket will receive netlink notifications from all netns that have a nsid assigned into the netns where the socket has been opened. The nsid is sent to userland via an anscillary data. With this patch, a daemon needs only one socket to listen many netns. This is useful when the number of netns is high. Because 0 is a valid value for a nsid, the field nsid_is_set indicates if the field nsid is valid or not. skb->cb is initialized to 0 on skb allocation, thus we are sure that we will never send a nsid 0 by error to the userland. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09netlink: rename private flags and statesNicolas Dichtel
These flags and states have the same prefix (NETLINK_) that netlink socket options. To avoid confusion and to be able to name a flag like a socket option, let's use an other prefix: NETLINK_[S|F]_. Note: a comment has been fixed, it was talking about NETLINK_RECV_NO_ENOBUFS socket option instead of NETLINK_NO_ENOBUFS. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09netns: use a spin_lock to protect nsid managementNicolas Dichtel
Before this patch, nsid were protected by the rtnl lock. The goal of this patch is to be able to find a nsid without needing to hold the rtnl lock. The next patch will introduce a netlink socket option to listen to all netns that have a nsid assigned into the netns where the socket is opened. Thus, it's important to call rtnl_net_notifyid() outside the spinlock, to avoid a recursive lock (nsid are notified via rtnl). This was the main reason of the previous patch. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09netns: notify new nsid outside __peernet2id()Nicolas Dichtel
There is no functional change with this patch. It will ease the refactoring of the locking system that protects nsids and the support of the netlink socket option NETLINK_LISTEN_ALL_NSID. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09netns: rename peernet2id() to peernet2id_alloc()Nicolas Dichtel
In a following commit, a new function will be introduced to only lookup for a nsid (no allocation if the nsid doesn't exist). To avoid confusion, the existing function is renamed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09netns: always provide the id to rtnl_net_fill()Nicolas Dichtel
The goal of this commit is to prepare the rework of the locking of nsnid protection. After this patch, rtnl_net_notifyid() will not call anymore __peernet2id(), ie no idr_* operation into this function. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09netns: returns always an id in __peernet2id()Nicolas Dichtel
All callers of this function expect a nsid, not an error. Thus, returns NETNSA_NSID_NOT_ASSIGNED in case of error so that callers don't have to convert the error to NETNSA_NSID_NOT_ASSIGNED. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09tcp: set SOCK_NOSPACE under memory pressureJason Baron
Under tcp memory pressure, calling epoll_wait() in edge triggered mode after -EAGAIN, can result in an indefinite hang in epoll_wait(), even when there is sufficient memory available to continue making progress. The problem is that when __sk_mem_schedule() returns 0 under memory pressure, we do not set the SOCK_NOSPACE flag in the tcp write paths (tcp_sendmsg() or do_tcp_sendpages()). Then, since SOCK_NOSPACE is used to trigger wakeups when incoming acks create sufficient new space in the write queue, all outstanding packets are acked, but we never wake up with the the EPOLLOUT that we are expecting from epoll_wait(). This issue is currently limited to epoll() when used in edge trigger mode, since 'tcp_poll()', does in fact currently set SOCK_NOSPACE. This is sufficient for poll()/select() and epoll() in level trigger mode. However, in edge trigger mode, epoll() is relying on the write path to set SOCK_NOSPACE. EPOLL(7) says that in edge-trigger mode we can only call epoll_wait() after read/write return -EAGAIN. Thus, in the case of the socket write, we are relying on the fact that tcp_sendmsg()/network write paths are going to issue a wakeup for us at some point in the future when we get -EAGAIN. Normally, epoll() edge trigger works fine when we've exceeded the sk->sndbuf because in that case we do set SOCK_NOSPACE. However, when we return -EAGAIN from the write path b/c we are over the tcp memory limits and not b/c we are over the sndbuf, we are never going to get another wakeup. I can reproduce this issue, using SO_SNDBUF, since __sk_mem_schedule() will return 0, or failure more readily with SO_SNDBUF: 1) create socket and set SO_SNDBUF to N 2) add socket as edge trigger 3) write to socket and block in epoll on -EAGAIN 4) cause tcp mem pressure via: echo "<small val>" > net.ipv4.tcp_mem The fix here is simply to set SOCK_NOSPACE in sk_stream_wait_memory() when the socket is non-blocking. Note that SOCK_NOSPACE, in addition to waking up outstanding waiters is also used to expand the size of the sk->sndbuf. However, we will not expand it by setting it in this case because tcp_should_expand_sndbuf(), ensures that no expansion occurs when we are under tcp memory pressure. Note that we could still hang if sk->sk_wmem_queue is 0, when we get the -EAGAIN. In this case the SOCK_NOSPACE bit will not help, since we are waiting for and event that will never happen. I believe that this case is harder to hit (and did not hit in my testing), in that over the tcp 'soft' memory limits, we continue to guarantee a minimum write buffer size. Perhaps, we could return -ENOSPC in this case, or maybe we simply issue a wakeup in this case, such that we keep retrying the write. Note that this case is not specific to epoll() ET, but rather would affect blocking sockets as well. So I view this patch as bringing epoll() edge-trigger into sync with the current poll()/select()/epoll() level trigger and blocking sockets behavior. Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09seccomp, filter: add and use bpf_prog_create_from_user from seccompDaniel Borkmann
Seccomp has always been a special candidate when it comes to preparation of its filters in seccomp_prepare_filter(). Due to the extra checks and filter rewrite it partially duplicates code and has BPF internals exposed. This patch adds a generic API inside the BPF code code that seccomp can use and thus keep it's filter preparation code minimal and better maintainable. The other side-effect is that now classic JITs can add seccomp support as well by only providing a BPF_LDX | BPF_W | BPF_ABS translation. Tested with seccomp and BPF test suites. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nicolas Schichan <nschichan@freebox.fr> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09net: filter: add __GFP_NOWARN flag for larger kmem allocsDaniel Borkmann
When seccomp BPF was added, it was discussed to add __GFP_NOWARN flag for their configuration path as f.e. up to 32K allocations are more prone to fail under stress. As we're going to reuse BPF API, add __GFP_NOWARN flags where larger kmalloc() and friends allocations could fail. It doesn't make much sense to pass around __GFP_NOWARN everywhere as an extra argument only for seccomp while we just as well could run into similar issues for socket filters, where it's not desired to have a user application throw a WARN() due to allocation failure. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nicolas Schichan <nschichan@freebox.fr> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09seccomp: simplify seccomp_prepare_filter and reuse bpf_prepare_filterNicolas Schichan
Remove the calls to bpf_check_classic(), bpf_convert_filter() and bpf_migrate_runtime() and let bpf_prepare_filter() take care of that instead. seccomp_check_filter() is passed to bpf_prepare_filter() so that it gets called from there, after bpf_check_classic(). We can now remove exposure of two internal classic BPF functions previously used by seccomp. The export of bpf_check_classic() symbol, previously known as sk_chk_filter(), was there since pre git times, and no in-tree module was using it, therefore remove it. Joint work with Daniel Borkmann. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09net: filter: add a callback to allow classic post-verifier transformationsNicolas Schichan
This is in preparation for use by the seccomp code, the rationale is not to duplicate additional code within the seccomp layer, but instead, have it abstracted and hidden within the classic BPF API. As an interim step, this now also makes bpf_prepare_filter() visible (not as exported symbol though), so that seccomp can reuse that code path instead of reimplementing it. Joint work with Daniel Borkmann. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09Merge tag 'mac80211-next-for-davem-2015-05-06' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Lots of updates for net-next for this cycle. As usual, we have a lot of small fixes and cleanups, the bigger items are: * proper mac80211 rate control locking, to fix some random crashes (this required changing other locking as well) * mac80211 "fast-xmit", a mechanism to reduce, in most cases, the amount of code we execute while going from ndo_start_xmit() to the driver * this also clears the way for properly supporting S/G and checksum and segmentation offloads ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09tcp: add TCPWinProbe and TCPKeepAlive SNMP countersEric Dumazet
Diagnosing problems related to Window Probes has been hard because we lack a counter. TCPWinProbe counts the number of ACK packets a sender has to send at regular intervals to make sure a reverse ACK packet opening back a window had not been lost. TCPKeepAlive counts the number of ACK packets sent to keep TCP flows alive (SO_KEEPALIVE) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09tcp: adjust window probe timers to safer valuesEric Dumazet
With the advent of small rto timers in datacenter TCP, (ip route ... rto_min x), the following can happen : 1) Qdisc is full, transmit fails. TCP sets a timer based on icsk_rto to retry the transmit, without exponential backoff. With low icsk_rto, and lot of sockets, all cpus are servicing timer interrupts like crazy. Intent of the code was to retry with a timer between 200 (TCP_RTO_MIN) and 500ms (TCP_RESOURCE_PROBE_INTERVAL) 2) Receivers can send zero windows if they don't drain their receive queue. TCP sends zero window probes, based on icsk_rto current value, with exponential backoff. With /proc/sys/net/ipv4/tcp_retries2 being 15 (or even smaller in some cases), sender can abort in less than one or two minutes ! If receiver stops the sender, it obviously doesn't care of very tight rto. Probability of dropping the ACK reopening the window is not worth the risk. Lets change the base timer to be at least 200ms (TCP_RTO_MIN) for these events (but not normal RTO based retransmits) A followup patch adds a new SNMP counter, as it would have helped a lot diagnosing this issue. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09tipc: send explicit not supported error in nl compatRichard Alpe
The legacy netlink API treated EPERM (permission denied) as "operation not supported". Reported-by: Tomi Ollila <tomi.ollila@iki.fi> Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09tipc: add broadcast link window set/get to nl apiRichard Alpe
Add the ability to get or set the broadcast link window through the new netlink API. The functionality was unintentionally missing from the new netlink API. Adding this means that we also fix the breakage in the old API when coming through the compat layer. Fixes: 37e2d4843f9e (tipc: convert legacy nl link prop set to nl compat) Reported-by: Tomi Ollila <tomi.ollila@iki.fi> Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09tipc: fix default link prop regression in nl compatRichard Alpe
Default link properties can be set for media or bearer. This functionality was missed when introducing the NL compatibility layer. This patch implements this functionality in the compat netlink layer. It works the same way as it did in the old API. We search for media and bearers matching the "link name". If we find a matching media or bearer the link tolerance, priority or window is used as default for new links on that media or bearer. Fixes: 37e2d4843f9e (tipc: convert legacy nl link prop set to nl compat) Reported-by: Tomi Ollila <tomi.ollila@iki.fi> Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09net_sched: fix a use-after-free in tc_ctl_tfilter()WANG Cong
When tcf_destroy() returns true, tp could be already destroyed, we should not use tp->next after that. For long term, we probably should move tp list to list_head. Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09net: dsa: Add lockdep class to tx queues to avoid lockdep splatAndrew Lunn
DSA stacks an Ethernet device on top of an Ethernet device. This can cause false positive lockdep splats for the transmit queue: Acked-by: Florian Fainelli <f.fainelli@gmail.com> ============================================= [ INFO: possible recursive locking detected ] 4.0.0-rc7-01838-g70621a215fc7 #386 Not tainted --------------------------------------------- kworker/0:0/4 is trying to acquire lock: (_xmit_ETHER#2){+.-...}, at: [<c040e95c>] sch_direct_xmit+0xa8/0x1fc but task is already holding lock: (_xmit_ETHER#2){+.-...}, at: [<c03f4208>] __dev_queue_xmit+0x4d4/0x56c other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(_xmit_ETHER#2); lock(_xmit_ETHER#2); To avoid this, walk the tq queues of the dsa slaves and set a lockdep class. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09net/rds: RDS-TCP: only initiate reconnect attempt on outgoing TCP socket.Sowmini Varadhan
When the peer of an RDS-TCP connection restarts, a reconnect attempt should only be made from the active side of the TCP connection, i.e. the side that has a transient TCP port number. Do not add the passive side of the TCP connection to the c_hash_node and thus avoid triggering rds_queue_reconnect() for passive rds connections. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09net/rds: RDS-TCP: Always create a new rds_sock for an incoming connection.Sowmini Varadhan
When running RDS over TCP, the active (client) side connects to the listening ("passive") side at the RDS_TCP_PORT. After the connection is established, if the client side reboots (potentially without even sending a FIN) the server still has a TCP socket in the esablished state. If the server-side now gets a new SYN comes from the client with a different client port, TCP will create a new socket-pair, but the RDS layer will incorrectly pull up the old rds_connection (which is still associated with the stale t_sock and RDS socket state). This patch corrects this behavior by having rds_tcp_accept_one() always create a new connection for an incoming TCP SYN. The rds and tcp state associated with the old socket-pair is cleaned up via the rds_tcp_state_change() callback which would typically be invoked in most cases when the client-TCP sends a FIN on TCP restart, triggering a transition to CLOSE_WAIT state. In the rarer event of client death without a FIN, TCP_KEEPALIVE probes on the socket will detect the stale socket, and the TCP transition to CLOSE state will trigger the RDS state cleanup. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09ipv6: Fixed source specific default route handling.Markus Stenberg
If there are only IPv6 source specific default routes present, the host gets -ENETUNREACH on e.g. connect() because ip6_dst_lookup_tail calls ip6_route_output first, and given source address any, it fails, and ip6_route_get_saddr is never called. The change is to use the ip6_route_get_saddr, even if the initial ip6_route_output fails, and then doing ip6_route_output _again_ after we have appropriate source address available. Note that this is '99% fix' to the problem; a correct fix would be to do route lookups only within addrconf.c when picking a source address, and never call ip6_route_output before source address has been populated. Signed-off-by: Markus Stenberg <markus.stenberg@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== Here are a couple of important Bluetooth & mac802154 fixes for 4.1: - mac802154 fix for crypto algorithm allocation failure checking - mac802154 wpan phy leak fix for error code path - Fix for not calling Bluetooth shutdown() if interface is not up Let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-06mac80211: add missing documentation for rate_ctrl_lockJohannes Berg
This was missed in the previous patch, add some documentation for rate_ctrl_lock to avoid docbook warnings. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-06cfg80211: change GO_CONCURRENT to IR_CONCURRENT for STAArik Nemtsov
The GO_CONCURRENT regulatory definition can be extended to station interfaces requesting to IR as part of TDLS off-channel operations. Rename the GO_CONCURRENT flag to IR_CONCURRENT and allow the added use-case. Change internal users of GO_CONCURRENT to use the new definition. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-06cfg80211: Allow GO concurrent relaxation after BSS disconnectionAvraham Stern
If a P2P GO was allowed on a channel because of the GO concurrent relaxation, i.e., another station interface was associated to an AP on the same channel or the same UNII band, and the station interface disconnected from the AP, allow the following use cases unless the channel is marked as indoor only and the device is not operating in an indoor environment: 1. Allow the P2P GO to stay on its current channel. The rationale behind this is that if the channel or UNII band were allowed by the AP they could still be used to continue the P2P GO operation, and avoid connection breakage. 2. Allow another P2P GO to start on the same channel or another channel that is in the same UNII band as the previous instantiated P2P GO. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-06mac80211: validate cipher scheme PN length betterJohannes Berg
Currently, a cipher scheme can advertise an arbitrarily long sequence counter, but mac80211 only supports up to 16 bytes and the initial value from userspace will be truncated. Fix two things: * don't allow the driver to register anything longer than the 16 bytes that mac80211 reserves space for * require userspace to specify a starting value with the correct length (or none at all) Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-06mac80211: extend get_key() to return PN for all ciphersJohannes Berg
For ciphers not supported by mac80211, the function currently doesn't return any PN data. Fix this by extending the driver's get_key_seq() a little more to allow moving arbitrary PN data. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-06mac80211: extend get_tkip_seq to all keysJohannes Berg
Extend the function to read the TKIP IV32/IV16 to read the IV/PN for all ciphers in order to allow drivers with full hardware crypto to properly support this. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-05tcp_westwood: fix tcp_westwood_info()Eric Dumazet
I forgot to update tcp_westwood when changing get_info() behavior, this patch should fix this. Fixes: 64f40ff5bbdb ("tcp: prepare CC get_info() access from getsockopt()") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05mpls: Move reserved label definitionsTom Herbert
Move to include/uapi/linux/mpls.h to be externally visibile. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05openvswitch: Use eth_proto_is_802_3Alexander Duyck
Replace "ntohs(proto) >= ETH_P_802_3_MIN" w/ eth_proto_is_802_3(proto). Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05ipv4/ip_tunnel_core: Use eth_proto_is_802_3Alexander Duyck
Replace "ntohs(proto) >= ETH_P_802_3_MIN" w/ eth_proto_is_802_3(proto). Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05ebtables: Use eth_proto_is_802_3Alexander Duyck
Replace "ntohs(proto) >= ETH_P_802_3_MIN" w/ eth_proto_is_802_3(proto). Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05etherdev: Fix sparse error, make test usable by other functionsAlexander Duyck
This change does two things. First it fixes a sparse error for the fact that the __be16 degrades to an integer. Since that is actually what I am kind of doing I am simply working around that by forcing both sides of the comparison to u16. Also I realized on some compilers I was generating another instruction for big endian systems such as PowerPC since it was masking the value before doing the comparison. So to resolve that I have simply pulled the mask out and wrapped it in an #ifndef __BIG_ENDIAN. Lastly I pulled this all out into its own function. I notices there are similar checks in a number of other places so this function can be reused there to help reduce overhead in these paths as well. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05bridge: change BR_GROUPFWD_RESTRICTED to allow forwarding of LLDP framesBernhard Thaler
BR_GROUPFWD_RESTRICTED bitmask restricts users from setting values to /sys/class/net/brX/bridge/group_fwd_mask that allow forwarding of some IEEE 802.1D Table 7-10 Reserved addresses: (MAC Control) 802.3 01-80-C2-00-00-01 (Link Aggregation) 802.3 01-80-C2-00-00-02 802.1AB LLDP 01-80-C2-00-00-0E Change BR_GROUPFWD_RESTRICTED to allow to forward LLDP frames and document group_fwd_mask. e.g. echo 16384 > /sys/class/net/brX/bridge/group_fwd_mask allows to forward LLDP frames. This may be needed for bridge setups used for network troubleshooting or any other scenario where forwarding of LLDP frames is desired (e.g. bridge connecting a virtual machine to real switch transmitting LLDP frames that virtual machine needs to receive). Tested on a simple bridge setup with two interfaces and host transmitting LLDP frames on one side of this bridge (used lldpd). Setting group_fwd_mask as described above lets LLDP frames traverse bridge. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05tcp: provide SYN headers for passive connectionsEric Dumazet
This patch allows a server application to get the TCP SYN headers for its passive connections. This is useful if the server is doing fingerprinting of clients based on SYN packet contents. Two socket options are added: TCP_SAVE_SYN and TCP_SAVED_SYN. The first is used on a socket to enable saving the SYN headers for child connections. This can be set before or after the listen() call. The latter is used to retrieve the SYN headers for passive connections, if the parent listener has enabled TCP_SAVE_SYN. TCP_SAVED_SYN is read once, it frees the saved SYN headers. The data returned in TCP_SAVED_SYN are network (IPv4/IPv6) and TCP headers. Original patch was written by Tom Herbert, I changed it to not hold a full skb (and associated dst and conntracking reference). We have used such patch for about 3 years at Google. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05mac80211: remove useless skb->encapsulation checkJohannes Berg
No current (and planned, as far as I know) wifi devices support encapsulation checksum offload, so remove the useless test here. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-05mac80211: make LED triggering depend on activationJohannes Berg
When LED triggers are compiled in, but not used, mac80211 will still call them to update the status. This isn't really a problem for the assoc and radio ones, but the TX/RX (and to a certain extend TPT) ones can be called very frequently (for every packet.) In order to avoid that when they're not used, track their activation and call the corresponding trigger (and in the TPT case, account for throughput) only when the trigger is actually used by an LED. Additionally, make those trigger functions inlines since theyre only used once in the remaining code. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-05mac80211: make LED trigger names constJohannes Berg
This is just a code cleanup, make the LED trigger names const as they're not expected to be modified by drivers. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-05mac80211: clean up station debugfsJohannes Berg
Remove items that can be retrieved through nl80211. This also removes two items (tx_packets and tx_bytes) where only the VO counter was exposed since they are split up per AC but in the debugfs file only the first AC was shown. Also remove the useless "dev" file - the stations have long been in a sub-directory of the netdev so there's no need for that any more. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-05mac80211: remove sta->tx_fragments counterJohannes Berg
This counter is unsafe with concurrent TX and is only exposed through debugfs and ethtool. Instead of trying to fix it just remove it for now, if it's really needed then it should be exposed through nl80211 and in a way that drivers that do the fragmentation in the device could support it as well. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-05mac80211: move dot11 counters under MAC80211_DEBUG_COUNTERSJohannes Berg
Since these counters can only be read through debugfs, there's very little point in maintaining them all the time. However, even just making them depend on debugfs is pointless - they're not normally used. Additionally a number of them aren't even concurrency safe. Move them under MAC80211_DEBUG_COUNTERS so they're normally not even compiled in. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-05mac80211: clean up global debugfs statisticsJohannes Berg
The debugfs statistics macros are pointlessly verbose, so change that macro to just have a single argument. While at it, remove the unused counters and rename rx_expand_skb_head2 to the better rx_expand_skb_head_defrag. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-04net: fix two sparse warnings introduced by IGMP/MLD parsing exportsLinus Lüssing
> net/core/skbuff.c:4108:13: sparse: incorrect type in assignment (different base types) > net/ipv6/mcast_snoop.c:63 ipv6_mc_check_exthdrs() warn: unsigned 'offset' is never less than zero. Introduced by 9afd85c9e4552b276e2f4cfefd622bdeeffbbf26 ("net: Export IGMP/MLD message validation code") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04Merge tag 'mac80211-for-davem-2015-05-04' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== We have only a few fixes right now: * a fix for an issue with hash collision handling in the rhashtable conversion * a merge issue - rhashtable removed default shrinking just before mac80211 was converted, so enable it now * remove an invalid WARN that can trigger with legitimate userspace behaviour * add a struct member missing from kernel-doc that caused a lot of warnings ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-05-04 Here's the first bluetooth-next pull request for 4.2: - Various fixes for at86rf230 driver - ieee802154: trace events support for rdev->ops - HCI UART driver refactoring - New Realtek IDs added to btusb driver - Off-by-one fix for rtl8723b in btusb driver - Refactoring of btbcm driver for both UART & USB use Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04net/rds: Fix new sparse warningDavid Ahern
c0adf54a109 introduced new sparse warnings: CHECK /home/dahern/kernels/linux.git/net/rds/ib_cm.c net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types) net/rds/ib_cm.c:191:34: expected unsigned long long [unsigned] [usertype] dp_ack_seq net/rds/ib_cm.c:191:34: got restricted __be64 <noident> net/rds/ib_cm.c:194:51: warning: cast to restricted __be64 The temporary variable for sequence number should have been declared as __be64 rather than u64. Make it so. Signed-off-by: David Ahern <david.ahern@oracle.com> Cc: shamir rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04tipc: deal with return value of tipc_conn_new callbackYing Xue
Once tipc_conn_new() returns NULL, the connection should be shut down immediately, otherwise, oops may happen due to the NULL pointer. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Signed-off-by: David S. Miller <davem@davemloft.net>