summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2013-10-02batman-adv: set up network coding packet handlers during module initMatthias Schiffer
batman-adv saves its table of packet handlers as a global state, so handlers must be set up only once (and setting them up a second time will fail). The recently-added network coding support tries to set up its handler each time a new softif is registered, which obviously fails when more that one softif is used (and in consequence, the softif creation fails). Fix this by splitting up batadv_nc_init into batadv_nc_init (which is called only once) and batadv_nc_mesh_init (which is called for each softif); in addition batadv_nc_free is renamed to batadv_nc_mesh_free to keep naming consistent. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking changes from David Miller: 1) Multiply in netfilter IPVS can overflow when calculating destination weight. From Simon Kirby. 2) Use after free fixes in IPVS from Julian Anastasov. 3) SFC driver bug fixes from Daniel Pieczko. 4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov. 5) Locking and encapsulation fixes to serial line CAN driver, from Andrew Naujoks. 6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner, Eilon Greenstein, and Ariel Elior. 7) In lapb, if no other packets are outstanding, T1 timeouts actually stall things and no packet gets sent. Fix from Josselin Costanzi. 8) ICMP redirects should not make it to the socket error queues, from Duan Jiong. 9) Fix bugs in skge DMA mapping error handling, from Nikulas Patocka. 10) Fix setting of VLAN priority field on via-rhine driver, from Roget Luethi. 11) Fix TX stalls and VLAN promisc programming in be2net driver from Ajit Khaparde. 12) Packet padding doesn't get handled correctly in new usbnet SG support code, from Ming Lei. 13) Fix races in netdevice teardown wrt. network namespace closing. From Eric W. Biederman. 14) Fix potential missed initialization of net_secret if not TCP connections are openned. From Eric Dumazet. 15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from Aleksander Morgado. 16) skb_cow_head() can change skb->data and thus packet header pointers, don't use stale ip_hdr reference in ip_tunnel code. 17) Backend state transition handling fixes in xen-netback, from Paul Durrant. 18) Packet offset for AH protocol is handled wrong in flow dissector, from Eric Dumazet. 19) Taking down an fq packet scheduler instance can leave stale packets in the queues, fix from Eric Dumazet. 20) Fix performance regressions introduced by TCP Small Queues. From Eric Dumazet. 21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from Hannes Frederic Sowa. 22) Multicast timer handlers in ipv4 and ipv6 can be the last and final reference to the ipv4/ipv6 specific network device state, so use the reference put that will check and release the object if the reference hits zero. From Salam Noureddine. 23) Fix memory corruption in ip_tunnel driver, and use skb_push() instead of __skb_push() so that similar bugs are less hard to find. From Steffen Klassert. 24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from Nicolas Dichtel. 25) fq scheduler doesn't accurately rate limit in certain circumstances, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits) pkt_sched: fq: rate limiting improvements ip6tnl: allow to use rtnl ops on fb tunnel sit: allow to use rtnl ops on fb tunnel ip_tunnel: Remove double unregister of the fallback device ip_tunnel_core: Change __skb_push back to skb_push ip_tunnel: Add fallback tunnels to the hash lists ip_tunnel: Fix a memory corruption in ip_tunnel_xmit qlcnic: Fix SR-IOV configuration ll_temac: Reset dma descriptors indexes on ndo_open skbuff: size of hole is wrong in a comment ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put ethernet: moxa: fix incorrect placement of __initdata tag ipv6: gre: correct calculation of max_headroom powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file" bonding: Fix broken promiscuity reference counting issue tcp: TSQ can use a dynamic limit dm9601: fix IFF_ALLMULTI handling pkt_sched: fq: qdisc dismantle fixes ...
2013-10-01pkt_sched: fq: rate limiting improvementsEric Dumazet
FQ rate limiting suffers from two problems, reported by Steinar : 1) FQ enforces a delay when flow quantum is exhausted in order to reduce cpu overhead. But if packets are small, current delay computation is slightly wrong, and observed rates can be too high. Steinar had this problem because he disabled TSO and GSO, and default FQ quantum is 2*1514. (Of course, I wish recent TSO auto sizing changes will help to not having to disable TSO in the first place) 2) maxrate was not used for forwarded flows (skbs not attached to a socket) Tested: tc qdisc add dev eth0 root est 1sec 4sec fq maxrate 8Mbit netperf -H lpq84 -l 1000 & sleep 10 ; tc -s qdisc show dev eth0 qdisc fq 8003: root refcnt 32 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140 maxrate 8000Kbit Sent 16819357 bytes 11258 pkt (dropped 0, overlimits 0 requeues 0) rate 7831Kbit 653pps backlog 7570b 5p requeues 0 44 flows (43 inactive, 1 throttled), next packet delay 2977352 ns 0 gc, 0 highprio, 5545 throttled lpq83:~# tcpdump -p -i eth0 host lpq84 -c 12 09:02:52.079484 IP lpq83 > lpq84: . 1389536928:1389538376(1448) ack 3808678021 win 457 <nop,nop,timestamp 961812 572609068> 09:02:52.079499 IP lpq83 > lpq84: . 1448:2896(1448) ack 1 win 457 <nop,nop,timestamp 961812 572609068> 09:02:52.079906 IP lpq84 > lpq83: . ack 2896 win 16384 <nop,nop,timestamp 572609080 961812> 09:02:52.082568 IP lpq83 > lpq84: . 2896:4344(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071> 09:02:52.082581 IP lpq83 > lpq84: . 4344:5792(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071> 09:02:52.083017 IP lpq84 > lpq83: . ack 5792 win 16384 <nop,nop,timestamp 572609083 961815> 09:02:52.085678 IP lpq83 > lpq84: . 5792:7240(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074> 09:02:52.085693 IP lpq83 > lpq84: . 7240:8688(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074> 09:02:52.086117 IP lpq84 > lpq83: . ack 8688 win 16384 <nop,nop,timestamp 572609086 961818> 09:02:52.088792 IP lpq83 > lpq84: . 8688:10136(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077> 09:02:52.088806 IP lpq83 > lpq84: . 10136:11584(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077> 09:02:52.089217 IP lpq84 > lpq83: . ack 11584 win 16384 <nop,nop,timestamp 572609090 961821> Reported-by: Steinar H. Gunderson <sesse@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01ip6tnl: allow to use rtnl ops on fb tunnelNicolas Dichtel
rtnl ops where introduced by c075b13098b3 ("ip6tnl: advertise tunnel param via rtnl"), but I forget to assign rtnl ops to fb tunnels. Now that it is done, we must remove the explicit call to unregister_netdevice_queue(), because the fallback tunnel is added to the queue in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this is valid since commit 0bd8762824e7 ("ip6tnl: add x-netns support")). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01sit: allow to use rtnl ops on fb tunnelNicolas Dichtel
rtnl ops where introduced by ba3e3f50a0e5 ("sit: advertise tunnel param via rtnl"), but I forget to assign rtnl ops to fb tunnels. Now that it is done, we must remove the explicit call to unregister_netdevice_queue(), because the fallback tunnel is added to the queue in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this is valid since commit 5e6700b3bf98 ("sit: add support of x-netns")). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01ip_tunnel: Remove double unregister of the fallback deviceSteffen Klassert
When queueing the netdevices for removal, we queue the fallback device twice in ip_tunnel_destroy(). The first time when we queue all netdevices in the namespace and then again explicitly. Fix this by removing the explicit queueing of the fallback device. Bug was introduced when network namespace support was added with commit 6c742e714d8 ("ipip: add x-netns support"). Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01ip_tunnel_core: Change __skb_push back to skb_pushSteffen Klassert
Git commit 0e6fbc5b ("ip_tunnels: extend iptunnel_xmit()") moved the IP header installation to iptunnel_xmit() and changed skb_push() to __skb_push(). This makes possible bugs hard to track down, so change it back to skb_push(). Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01ip_tunnel: Add fallback tunnels to the hash listsSteffen Klassert
Currently we can not update the tunnel parameters of the fallback tunnels because we don't find them in the hash lists. Fix this by adding them on initialization. Bug was introduced with commit c544193214 ("GRE: Refactor GRE tunneling code.") Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01ip_tunnel: Fix a memory corruption in ip_tunnel_xmitSteffen Klassert
We might extend the used aera of a skb beyond the total headroom when we install the ipip header. Fix this by calling skb_cow_head() unconditionally. Bug was introduced with commit c544193214 ("GRE: Refactor GRE tunneling code.") Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter/IPVS fixes for your net tree, they are: * Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from Patrick McHardy. * Fix possible weight overflow in lblc and lblcr schedulers due to 32-bits arithmetics, from Simon Kirby. * Fix possible memory access race in the lblc and lblcr schedulers, introduced when it was converted to use RCU, two patches from Julian Anastasov. * Fix hard dependency on CPU 0 when reading per-cpu stats in the rate estimator, from Julian Anastasov. * Fix race that may lead to object use after release, when invoking ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_putSalam Noureddine
It is possible for the timer handlers to run after the call to ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the inet6_dev being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_putSalam Noureddine
It is possible for the timer handlers to run after the call to ip_mc_down so use in_dev_put instead of __in_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the in_device being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30ipv6: gre: correct calculation of max_headroomHannes Frederic Sowa
gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header, so initialize max_headroom to zero. Otherwise the if (encap_limit >= 0) { max_headroom += 8; mtu -= 8; } increments an uninitialized variable before max_headroom was reset. Found with coverity: 728539 Cc: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30tcp: TSQ can use a dynamic limitEric Dumazet
When TCP Small Queues was added, we used a sysctl to limit amount of packets queues on Qdisc/device queues for a given TCP flow. Problem is this limit is either too big for low rates, or too small for high rates. Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO auto sizing, it can better control number of packets in Qdisc/device queues. New limit is two packets or at least 1 to 2 ms worth of packets. Low rates flows benefit from this patch by having even smaller number of packets in queues, allowing for faster recovery, better RTT estimations. High rates flows benefit from this patch by allowing more than 2 packets in flight as we had reports this was a limiting factor to reach line rate. [ In particular if TX completion is delayed because of coalescing parameters ] Example for a single flow on 10Gbp link controlled by FQ/pacing 14 packets in flight instead of 2 $ tc -s -d qd qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140 Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0 requeues 6822476) rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476 2047 flow, 2046 inactive, 1 throttled, delay 15673 ns 2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit Note that sk_pacing_rate is currently set to twice the actual rate, but this might be refined in the future when a flow is in congestion avoidance. Additional change : skb->destructor should be set to tcp_wfree(). A future patch (for linux 3.13+) might remove tcp_limit_output_bytes Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30Merge branch 'for-john' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
2013-09-30pkt_sched: fq: qdisc dismantle fixesEric Dumazet
fq_reset() should drops all packets in queue, including throttled flows. This patch moves code from fq_destroy() to fq_reset() to do the cleaning. fq_change() must stop calling fq_dequeue() if all remaining packets are from throttled flows. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30net: flow_dissector: fix thoff for IPPROTO_AHEric Dumazet
In commit 8ed781668dd49 ("flow_keys: include thoff into flow_keys for later usage"), we missed that existing code was using nhoff as a temporary variable that could not always contain transport header offset. This is not a problem for TCP/UDP because port offset (@poff) is 0 for these protocols. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30ipv6: Fix preferred_lft not updating in some casesPaul Marks
Consider the scenario where an IPv6 router is advertising a fixed preferred_lft of 1800 seconds, while the valid_lft begins at 3600 seconds and counts down in realtime. A client should reset its preferred_lft to 1800 every time the RA is received, but a bug is causing Linux to ignore the update. The core problem is here: if (prefered_lft != ifp->prefered_lft) { Note that ifp->prefered_lft is an offset, so it doesn't decrease over time. Thus, the comparison is always (1800 != 1800), which fails to trigger an update. The most direct solution would be to compute a "stored_prefered_lft", and use that value in the comparison. But I think that trying to filter out unnecessary updates here is a premature optimization. In order for the filter to apply, both of these would need to hold: - The advertised valid_lft and preferred_lft are both declining in real time. - No clock skew exists between the router & client. So in this patch, I've set "update_lft = 1" unconditionally, which allows the surrounding code to be greatly simplified. Signed-off-by: Paul Marks <pmarks@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30ip_tunnel: Do not use stale inner_iph pointer.Pravin B Shelar
While sending packet skb_cow_head() can change skb header which invalidates inner_iph pointer to skb header. Following patch avoid using it. Found by code inspection. This bug was introduced by commit 0e6fbc5b6c6218 (ip_tunnels: extend iptunnel_xmit()). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30netfilter: synproxy: fix BUG_ON triggered by corrupt TCP packetsPatrick McHardy
TCP packets hitting the SYN proxy through the SYNPROXY target are not validated by TCP conntrack. When th->doff is below 5, an underflow happens when calculating the options length, causing skb_header_pointer() to return NULL and triggering the BUG_ON(). Handle this case gracefully by checking for NULL instead of using BUG_ON(). Reported-by: Martin Topholm <mph@one.com> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-30mac80211: Run deferred scan if last roc_list item is not startedJouni Malinen
mac80211 scan processing could get stuck if roc work for pending, but not started when a scan request was deferred due to such roc item. Normally the deferred scan would be started from ieee80211_start_next_roc(), but ieee80211_sw_roc_work() calls that only if the finished ROC was started. Fix this by calling ieee80211_run_deferred_scan() in the case the last ROC was not actually started. This issue was hit relatively easily in P2P find operations where Listen state (remain-on-channel) and Search state (scan) are repeated in a loop. Signed-off-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-09-30mac80211: update sta->last_rx on acked tx framesFelix Fietkau
When clients are idle for too long, hostapd sends nullfunc frames for probing. When those are acked by the client, the idle time needs to be updated. To make this work (and to avoid unnecessary probing), update sta->last_rx whenever an ACK was received for a tx packet. Only do this if the flag IEEE80211_HW_REPORTS_TX_ACK_STATUS is set. Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-09-30mac80211: use sta_info_get_bss() for nl80211 tx and client probingFelix Fietkau
This allows calls for clients in AP_VLANs (e.g. for 4-addr) to succeed Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-09-28net: net_secret should not depend on TCPEric Dumazet
A host might need net_secret[] and never open a single socket. Problem added in commit aebda156a570782 ("net: defer net_secret[] initialization") Based on prior patch from Hannes Frederic Sowa. Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@strressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28net: Delay default_device_exit_batch until no devices are unregistering v2Eric W. Biederman
There is currently serialization network namespaces exiting and network devices exiting as the final part of netdev_run_todo does not happen under the rtnl_lock. This is compounded by the fact that the only list of devices unregistering in netdev_run_todo is local to the netdev_run_todo. This lack of serialization in extreme cases results in network devices unregistering in netdev_run_todo after the loopback device of their network namespace has been freed (making dst_ifdown unsafe), and after the their network namespace has exited (making the NETDEV_UNREGISTER, and NETDEV_UNREGISTER_FINAL callbacks unsafe). Add the missing serialization by a per network namespace count of how many network devices are unregistering and having a wait queue that is woken up whenever the count is decreased. The count and wait queue allow default_device_exit_batch to wait until all of the unregistration activity for a network namespace has finished before proceeding to unregister the loopback device and then allowing the network namespace to exit. Only a single global wait queue is used because there is a single global lock, and there is a single waiter, per network namespace wait queues would be a waste of resources. The per network namespace count of unregistering devices gives a progress guarantee because the number of network devices unregistering in an exiting network namespace must ultimately drop to zero (assuming network device unregistration completes). The basic logic remains the same as in v1. This patch is now half comment and half rtnl_lock_unregistering an expanded version of wait_event performs no extra work in the common case where no network devices are unregistering when we get to default_device_exit_batch. Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28IPv6 NAT: Do not drop DNATed 6to4/6rd packetsCatalin\(ux\) M. BOIE
When a router is doing DNAT for 6to4/6rd packets the latest anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for 6to4 and 6rd") will drop them because the IPv6 address embedded does not match the IPv4 destination. This patch will allow them to pass by testing if we have an address that matches on 6to4/6rd interface. I have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR. Also, log the dropped packets (with rate limit). Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-27Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem Also fixed-up a badly indented closing brace... Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-09-26cfg80211: fix sysfs registration raceJohannes Berg
My locking rework/race fixes caused a regression in the registration, causing uevent notifications for wireless devices before the device is really fully registered and available in nl80211. Fix this by moving the device_add() under rtnl and move the rfkill to afterwards (it can't be under rtnl.) Reported-and-tested-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-09-26mac80211: fix the setting of extended supported rate IEChun-Yeow Yeoh
The patch "mac80211: select and adjust bitrates according to channel mode" causes regression and breaks the extended supported rate IE setting. Since "i" is starting with 8, so this is not necessary to introduce "skip" here. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com> Signed-off-by: Colleen Twitty <colleen@cozybit.com> Reviewed-by: Jason Abele <jason@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-09-26mac80211: drop spoofed packets in ad-hoc modeFelix Fietkau
If an Ad-Hoc node receives packets with the Cell ID or its own MAC address as source address, it hits a WARN_ON in sta_info_insert_check() With many packets, this can massively spam the logs. One way that this can easily happen is through having Cisco APs in the area with rouge AP detection and countermeasures enabled. Such Cisco APs will regularly send fake beacons, disassoc and deauth packets that trigger these warnings. To fix this issue, drop such spoofed packets early in the rx path. Cc: stable@vger.kernel.org Reported-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-09-26Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
2013-09-26cfg80211: fix warning when using WEXT for IBSSBruno Randolf
Fix kernel warning when using WEXT for configuring ad-hoc mode, e.g. "iwconfig wlan0 essid test channel 1" WARNING: at net/wireless/chan.c:373 cfg80211_chandef_usable+0x50/0x21c [cfg80211]() The warning is caused by an uninitialized variable center_freq1. Cc: stable@vger.kernel.org Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-09-26cfg80211: use the correct macro to check for active monitor supportLuciano Coelho
Use MONITOR_FLAG_ACTIVE, which is a flag mask, instead of NL80211_MNTR_FLAG_ACTIVE, which is a flag index, when checking if the hardware supports active monitoring. Cc: stable@vger.kernel.org Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-09-25xfrm: Fix aevent generation for each received packetThomas Egerer
If asynchronous events are enabled for a particular netlink socket, the notify function is called by the advance function. The notify function creates and dispatches a km_event if a replay timeout occurred, or at least replay_maxdiff packets have been received since the last asynchronous event has been sent. The function is supposed to return if neither of the two events were detected for a state, or replay_maxdiff is equal to zero. Replay_maxdiff is initialized in xfrm_state_construct to the value of the xfrm.sysctl_aevent_rseqth (2 by default), and updated if for a state if the netlink attribute XFRMA_REPLAY_THRESH is set. If, however, replay_maxdiff is set to zero, then all of the three notify implementations perform a break from the switch statement instead of checking whether a timeout occurred, and -- if not -- return. As a result an asynchronous event is generated for every replay update of a state that has a zero replay_maxdiff value. This patch modifies the notify functions such that they immediately return if replay_maxdiff has the value zero, unless a timeout occurred. Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-09-24ipv6: udp packets following an UFO enqueued packet need also be handled by UFOHannes Frederic Sowa
In the following scenario the socket is corked: If the first UDP packet is larger then the mtu we try to append it to the write queue via ip6_ufo_append_data. A following packet, which is smaller than the mtu would be appended to the already queued up gso-skb via plain ip6_append_data. This causes random memory corruptions. In ip6_ufo_append_data we also have to be careful to not queue up the same skb multiple times. So setup the gso frame only when no first skb is available. This also fixes a shortcoming where we add the current packet's length to cork->length but return early because of a packet > mtu with dontfrag set (instead of sutracting it again). Found with trinity. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-24net: raw: do not report ICMP redirects to user spaceDuan Jiong
Redirect isn't an error condition, it should leave the error handler without touching the socket. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-24net: udp: do not report ICMP redirects to user spaceDuan Jiong
Redirect isn't an error condition, it should leave the error handler without touching the socket. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23mrp: add periodictimer to allow retries when packets get lostNoel Burton-Krahn
MRP doesn't implement the periodictimer in 802.1Q, so it never retries if packets get lost. I ran into this problem when MRP sent a MVRP JoinIn before the interface was fully up. The JoinIn was lost, MRP didn't retry, and MVRP registration failed. Tested against Juniper QFabric switches Signed-off-by: Noel Burton-Krahn <noel@burton-krahn.com> Acked-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23net/lapb: re-send packets on timeoutjosselin.costanzi@mobile-devices.fr
Actually re-send packets when the T1 timer runs out. This fixes a bug where packets are waiting on the write queue until disconnection when no other traffic is outstanding. Signed-off-by: Josselin Costanzi <josselin.costanzi@mobile-devices.fr> Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21Merge tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfix from Trond Myklebust: "Fix a regression due to incorrect sharing of gss auth caches" * tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: RPCSEC_GSS: fix crash on destroying gss auth
2013-09-20Bluetooth: don't release the port in rfcomm_dev_state_change()Gianluca Anzolin
When the dlc is closed, rfcomm_dev_state_change() tries to release the port in the case it cannot get a reference to the tty. However this is racy and not even needed. Infact as Peter Hurley points out: 1. Only consider dlcs that are 'stolen' from a connected socket, ie. reused. Allocated dlcs cannot have been closed prior to port activate and so for these dlcs a tty reference will always be avail in rfcomm_dev_state_change() -- except for the conditions covered by #2b below. 2. If a tty was at some point previously created for this rfcomm, then either (a) the tty reference is still avail, so rfcomm_dev_state_change() will perform a hangup. So nothing to do, or, (b) the tty reference is no longer avail, and the tty_port will be destroyed by the last tty_port_put() in rfcomm_tty_cleanup. Again, no action required. 3. Prior to obtaining the dlc lock in rfcomm_dev_add(), rfcomm_dev_state_change() will not 'see' a rfcomm_dev so nothing to do here. 4. After releasing the dlc lock in rfcomm_dev_add(), rfcomm_dev_state_change() will 'see' an incomplete rfcomm_dev if a tty reference could not be obtained. Again, the best thing to do here is nothing. Any future attempted open() will block on rfcomm_dev_carrier_raised(). The unconnected device will exist until released by ioctl(RFCOMMRELEASEDEV). The patch removes the aforementioned code and uses the tty_port_tty_hangup() helper to hangup the tty. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) If the local_df boolean is set on an SKB we have to allocate a unique ID even if IP_DF is set in the ipv4 headers, from Ansis Atteka. 2) Some fixups for the new chipset support that went into the sfc driver, from Ben Hutchings. 3) Because SCTP bypasses a good chunk of, and actually duplicates, the logic of the ipv6 output path, some IPSEC things don't get done properly. Integrate SCTP better into the ipv6 output path so that these problems are fixed and such issues don't get missed in the future either. From Daniel Borkmann. 4) Fix skge regressions added by the DMA mapping error return checking added in v3.10, from Mikulas Patocka. 5) Kill some more IRQF_DISABLED references, from Michael Opdenacker. 6) Fix races and deadlocks in the bridging code, from Hong Zhiguo. 7) Fix error handling in tun_set_iff(), in particular don't leak resources. From Jason Wang. 8) Prevent format-string injection into xen-netback driver, from Kees Cook. 9) Fix regression added to netpoll ARP packet handling, in particular check for the right ETH_P_ARP protocol code. From Sonic Zhang. 10) Try to deal with AMD IOMMU errors when using r8169 chips, from Francois Romieu. 11) Cure freezes due to recent changes in the rt2x00 wireless driver, from Stanislaw Gruszka. 12) Don't do SPI transfers (which can sleep) in interrupt context in cw1200 driver, from Solomon Peachy. 13) Fix LEDs handling bug in 5720 tg3 chips already handled for 5719. From Nithin Sujir. 14) Make xen_netbk_count_skb_slots() count the actual number of slots that will be used, taking into consideration packing and other issues that the transmit path will run into. From David Vrabel. 15) Use the correct maximum age when calculating the bridge message_age_timer, from Chris Healy. 16) Get rid of memory leaks in mcs7780 IRDA driver, from Alexey Khoroshilov. 17) Netfilter conntrack extensions were converted to RCU but are not always freed properly using kfree_rcu(). Fix from Michal Kubecek. 18) VF reset recovery not being done correctly in qlcnic driver, from Manish Chopra. 19) Fix inverted test in ATM nicstar driver, from Andy Shevchenko. 20) Missing workqueue destroy in cxgb4 error handling, from Wei Yang. 21) Internal switch not initialized properly in bgmac driver, from Rafał Miłecki. 22) Netlink messages report wrong local and remote addresses in IPv6 tunneling, from Ding Zhi. 23) ICMP redirects should not generate socket errors in DCCP and SCTP. We're still working out how this should be handled for RAW and UDP sockets. From Daniel Borkmann and Duan Jiong. 24) We've had several bugs wherein the network namespace's loopback device gets accessed after it is free'd, NULL it out so that we can catch these problems more readily. From Eric W Biederman. 25) Fix regression in TCP RTO calculations, from Neal Cardwell. 26) Fix too early free of xen-netback network device when VIFs still exist. From Paul Durrant. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits) netconsole: fix a deadlock with rtnl and netconsole's mutex netpoll: fix NULL pointer dereference in netpoll_cleanup skge: fix broken driver ip: generate unique IP identificator if local fragmentation is allowed ip: use ip_hdr() in __ip_make_skb() to retrieve IP header xen-netback: Don't destroy the netdev until the vif is shut down net:dccp: do not report ICMP redirects to user space cnic: Fix crash in cnic_bnx2x_service_kcq() bnx2x, cnic, bnx2i, bnx2fc: Fix bnx2i and bnx2fc regressions. vxlan: Avoid creating fdb entry with NULL destination tcp: fix RTO calculated from cached RTT drivers: net: phy: cicada.c: clears warning Use #include <linux/io.h> instead of <asm/io.h> net loopback: Set loopback_dev to NULL when freed batman-adv: set the TAG flag for the vid passed to BLA netfilter: nfnetlink_queue: use network skb for sequence adjustment net: sctp: rfc4443: do not report ICMP redirects to user space net: usb: cdc_ether: use usb.h macros whenever possible net: usb: cdc_ether: fix checkpatch errors and warnings net: usb: cdc_ether: Use wwan interface for Telit modules ip6_tunnels: raddr and laddr are inverted in nl msg ...
2013-09-19netpoll: fix NULL pointer dereference in netpoll_cleanupNikolay Aleksandrov
I've been hitting a NULL ptr deref while using netconsole because the np->dev check and the pointer manipulation in netpoll_cleanup are done without rtnl and the following sequence happens when having a netconsole over a vlan and we remove the vlan while disabling the netconsole: CPU 1 CPU2 removes vlan and calls the notifier enters store_enabled(), calls netdev_cleanup which checks np->dev and then waits for rtnl executes the netconsole netdev release notifier making np->dev == NULL and releases rtnl continues to dereference a member of np->dev which at this point is == NULL Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-19ip: generate unique IP identificator if local fragmentation is allowedAnsis Atteka
If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-19ip: use ip_hdr() in __ip_make_skb() to retrieve IP headerAnsis Atteka
skb->data already points to IP header, but for the sake of consistency we can also use ip_hdr() to retrieve it. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "These fix several bugs with RBD from 3.11 that didn't get tested in time for the merge window: some error handling, a use-after-free, and a sequencing issue when unmapping and image races with a notify operation. There is also a patch fixing a problem with the new ceph + fscache code that just went in" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: fscache: check consistency does not decrement refcount rbd: fix error handling from rbd_snap_name() rbd: ignore unmapped snapshots that no longer exist rbd: fix use-after free of rbd_dev->disk rbd: make rbd_obj_notify_ack() synchronous rbd: complete notifies before cleaning up osd_client and rbd_dev libceph: add function to ensure notifies are complete
2013-09-18ipvs: stats should not depend on CPU 0Julian Anastasov
When reading percpu stats we need to properly reset the sum when CPU 0 is not present in the possible mask. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-09-18ipvs: do not use dest after ip_vs_dest_put in LBLCRJulian Anastasov
commit c5549571f975ab ("ipvs: convert lblcr scheduler to rcu") allows RCU readers to use dest after calling ip_vs_dest_put(). In the corner case it can race with ip_vs_dest_trash_expire() which can release the dest while it is being returned to the RCU readers as scheduling result. To fix the problem do not allow e->dest to be replaced and defer the ip_vs_dest_put() call by using RCU callback. Now e->dest does not need to be RCU pointer. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-09-18ipvs: do not use dest after ip_vs_dest_put in LBLCJulian Anastasov
commit c2a4ffb70eef39 ("ipvs: convert lblc scheduler to rcu") allows RCU readers to use dest after calling ip_vs_dest_put(). In the corner case it can race with ip_vs_dest_trash_expire() which can release the dest while it is being returned to the RCU readers as scheduling result. To fix the problem do not allow en->dest to be replaced and defer the ip_vs_dest_put() call by using RCU callback. Now en->dest does not need to be RCU pointer. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-09-18ipvs: make the service replacement more robustJulian Anastasov
commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added IP_VS_DEST_STATE_REMOVING flag and RCU callback named ip_vs_dest_wait_readers() to keep dests and services after removal for at least a RCU grace period. But we have the following corner cases: - we can not reuse the same dest if its service is removed while IP_VS_DEST_STATE_REMOVING is still set because another dest removal in the first grace period can not extend this period. It can happen when ipvsadm -C && ipvsadm -R is used. - dest->svc can be replaced but ip_vs_in_stats() and ip_vs_out_stats() have no explicit read memory barriers when accessing dest->svc. It can happen that dest->svc was just freed (replaced) while we use it to update the stats. We solve the problems as follows: - IP_VS_DEST_STATE_REMOVING is removed and we ensure a fixed idle period for the dest (IP_VS_DEST_TRASH_PERIOD). idle_start will remember when for first time after deletion we noticed dest->refcnt=0. Later, the connections can grab a reference while in RCU grace period but if refcnt becomes 0 we can safely free the dest and its svc. - dest->svc becomes RCU pointer. As result, we add explicit RCU locking in ip_vs_in_stats() and ip_vs_out_stats(). - __ip_vs_unbind_svc is renamed to __ip_vs_svc_put(), it now can free the service immediately or after a RCU grace period. dest->svc is not set to NULL anymore. As result, unlinked dests and their services are freed always after IP_VS_DEST_TRASH_PERIOD period, unused services are freed after a RCU grace period. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>