summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-04-14ipv6: datagram: Refactor dst lookup and update codes to a new functionMartin KaFai Lau
This patch moves the route lookup and update codes for connected datagram sk to a newly created function ip6_datagram_dst_update() It will be reused during the pmtu update in the later patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14ipv6: datagram: Refactor flowi6 init codes to a new functionMartin KaFai Lau
Move flowi6 init codes for connected datagram sk to a newly created function ip6_datagram_flow_key_init(). Notes: 1. fl6_flowlabel is used instead of fl6.flowlabel in __ip6_datagram_connect 2. ipv6_addr_is_multicast(&fl6->daddr) is used instead of (addr_type & IPV6_ADDR_MULTICAST) in ip6_datagram_flow_key_init() This new function will be reused during pmtu update in the later patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14Merge tag 'mac80211-for-davem-2016-04-14' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== This has just the single fix from Dmitry Ivanov, adding the missing netlink notifier family check to avoid the socket close DoS problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14net: sched: do not requeue a NULL skbLars Persson
A failure in validate_xmit_skb_list() triggered an unconditional call to dev_requeue_skb with skb=NULL. This slowly grows the queue discipline's qlen count until all traffic through the queue stops. We take the optimistic approach and continue running the queue after a failure since it is unknown if later packets also will fail in the validate path. Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock") Signed-off-by: Lars Persson <larper@axis.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interfaceMathias Krause
Because we miss to wipe the remainder of i->addr[] in packet_mc_add(), pdiag_put_mclist() leaks uninitialized heap bytes via the PACKET_DIAG_MCLIST netlink attribute. Fix this by explicitly memset(0)ing the remaining bytes in i->addr[]. Fixes: eea68e2f1a00 ("packet: Report socket mclist info via diag module") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-13route: do not cache fib route info on local routes with oifChris Friesen
For local routes that require a particular output interface we do not want to cache the result. Caching the result causes incorrect behaviour when there are multiple source addresses on the interface. The end result being that if the intended recipient is waiting on that interface for the packet he won't receive it because it will be delivered on the loopback interface and the IP_PKTINFO ipi_ifindex will be set to the loopback interface as well. This can be tested by running a program such as "dhcp_release" which attempts to inject a packet on a particular interface so that it is received by another program on the same board. The receiving process should see an IP_PKTINFO ipi_ifndex value of the source interface (e.g., eth1) instead of the loopback interface (e.g., lo). The packet will still appear on the loopback interface in tcpdump but the important aspect is that the CMSG info is correct. Sample dhcp_release command line: dhcp_release eth1 192.168.204.222 02:11:33:22:44:66 Signed-off-by: Allain Legacy <allain.legacy@windriver.com> Signed off-by: Chris Friesen <chris.friesen@windriver.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-13net: ipv6: Do not keep linklocal and loopback addressesDavid Ahern
f1705ec197e7 added the option to retain user configured addresses on an admin down. A comment to one of the later revisions suggested using the IFA_F_PERMANENT flag rather than adding a user_managed boolean to the ifaddr struct. A side effect of this change is that link local and loopback addresses are also retained which is not part of the objective of f1705ec197e7. Add check to drop those addresses. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree. More specifically, they are: 1) Fix missing filter table per-netns registration in arptables, from Florian Westphal. 2) Resolve out of bound access when parsing TCP options in nf_conntrack_tcp, patch from Jozsef Kadlecsik. 3) Prefer NFPROTO_BRIDGE extensions over NFPROTO_UNSPEC in ebtables, this resolves conflict between xt_limit and ebt_limit, from Phil Sutter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-13netfilter: ebtables: Fix extension lookup with identical namePhil Sutter
If a requested extension exists as module and is not loaded, ebt_check_match() might accidentally use an NFPROTO_UNSPEC one with same name and fail. Reproduced with limit match: Given xt_limit and ebt_limit both built as module, the following would fail: modprobe xt_limit ebtables -I INPUT --limit 1/s -j ACCEPT The fix is to make ebt_check_match() distrust a found NFPROTO_UNSPEC extension and retry after requesting an appropriate module. Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-04-12nl80211: check netlink protocol in socket release notificationDmitry Ivanov
A non-privileged user can create a netlink socket with the same port_id as used by an existing open nl80211 netlink socket (e.g. as used by a hostapd process) with a different protocol number. Closing this socket will then lead to the notification going to nl80211's socket release notification handler, and possibly cause an action such as removing a virtual interface. Fix this issue by checking that the netlink protocol is NETLINK_GENERIC. Since generic netlink has no notifier chain of its own, we can't fix the problem more generically. Fixes: 026331c4d9b5 ("cfg80211/mac80211: allow registering for and sending action frames") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Ivanov <dima@ubnt.com> [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-11net: vrf: Fix dev refcnt leak due to IPv6 prefix routeDavid Ahern
ifupdown2 found a kernel bug with IPv6 routes and movement from the main table to the VRF table. Sequence of events: Create the interface and add addresses: ip link add dev eth4.105 link eth4 type vlan id 105 ip addr add dev eth4.105 8.105.105.10/24 ip -6 addr add dev eth4.105 2008:105:105::10/64 At this point IPv6 has inserted a prefix route in the main table even though the interface is 'down'. From there the VRF device is created: ip link add dev vrf105 type vrf table 105 ip addr add dev vrf105 9.9.105.10/32 ip -6 addr add dev vrf105 2000:9:105::10/128 ip link set vrf105 up Then the interface is enslaved, while still in the 'down' state: ip link set dev eth4.105 master vrf105 Since the device is down the VRF driver cycling the device does not send the NETDEV_UP and NETDEV_DOWN but rather the NETDEV_CHANGE event which does not flush the routes inserted prior. When the link is brought up ip link set dev eth4.105 up the prefix route is added in the VRF table, but does not remove the route from the main table. Fix by handling the NETDEV_CHANGEUPPER event similar what was implemented for IPv4 in 7f49e7a38b77 ("net: Flush local routes when device changes vrf association") Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11net: vrf: Fix dst reference countingDavid Ahern
Vivek reported a kernel exception deleting a VRF with an active connection through it. The root cause is that the socket has a cached reference to a dst that is destroyed. Converting the dst_destroy to dst_release and letting proper reference counting kick in does not work as the dst has a reference to the device which needs to be released as well. I talked to Hannes about this at netdev and he pointed out the ipv4 and ipv6 dst handling has dst_ifdown for just this scenario. Rather than continuing with the reinvented dst wheel in VRF just remove it and leverage the ipv4 and ipv6 versions. Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver") Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11tipc: purge deferred updates from dead nodesErik Hugne
If a peer node becomes unavailable, in addition to removing the nametable entries from this node we also need to purge all deferred updates associated with this node. Signed-off-by: Erik Hugne <erik.hugne@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11tipc: make dist queue pernetErik Hugne
Nametable updates received from the network that cannot be applied immediately are placed on a defer queue. This queue is global to the TIPC module, which might cause problems when using TIPC in containers. To prevent nametable updates from escaping into the wrong namespace, we make the queue pernet instead. Signed-off-by: Erik Hugne <erik.hugne@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-10netlink: don't send NETLINK_URELEASE for unbound socketsDmitry Ivanov
All existing users of NETLINK_URELEASE use it to clean up resources that were previously allocated to a socket via some command. As a result, no users require getting this notification for unbound sockets. Sending it for unbound sockets, however, is a problem because any user (including unprivileged users) can create a socket that uses the same ID as an existing socket. Binding this new socket will fail, but if the NETLINK_URELEASE notification is generated for such sockets, the users thereof will be tricked into thinking the socket that they allocated the resources for is closed. In the nl80211 case, this will cause destruction of virtual interfaces that still belong to an existing hostapd process; this is the case that Dmitry noticed. In the NFC case, it will cause a poll abort. In the case of netlink log/queue it will cause them to stop reporting events, as if NFULNL_CFG_CMD_UNBIND/NFQNL_CFG_CMD_UNBIND had been called. Fix this problem by checking that the socket is bound before generating the NETLINK_URELEASE notification. Cc: stable@vger.kernel.org Signed-off-by: Dmitry Ivanov <dima@ubnt.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-10decnet: Do not build routes to devices without decnet private data.David S. Miller
In particular, make sure we check for decnet private presence for loopback devices. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-10sctp: avoid refreshing heartbeat timer too oftenMarcelo Ricardo Leitner
Currently on high rate SCTP streams the heartbeat timer refresh can consume quite a lot of resources as timer updates are costly and it contains a random factor, which a) is also costly and b) invalidates mod_timer() optimization for not editing a timer to the same value. It may even cause the timer to be slightly advanced, for no good reason. As suggested by David Laight this patch now removes this timer update from hot path by leaving the timer on and re-evaluating upon its expiration if the heartbeat is still needed or not, similarly to what is done for TCP. If it's not needed anymore the timer is re-scheduled to the new timeout, considering the time already elapsed. For this, we now record the last tx timestamp per transport, updated in the same spots as hb timer was restarted on tx. Also split up sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the heartbeat one. On loopback with MTU of 65535 and data chunks with 1636, so that we have a considerable amount of chunks without stressing system calls, netperf -t SCTP_STREAM -l 30, perf looked like this before: Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000 Overhead Command Shared Object Symbol + 6,15% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string - 5,43% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore - _raw_write_unlock_irqrestore - 96,54% _raw_spin_unlock_irqrestore - 36,14% mod_timer + 97,24% sctp_transport_reset_timers + 2,76% sctp_do_sm + 33,65% __wake_up_sync_key + 28,77% sctp_ulpq_tail_event + 1,40% del_timer - 1,84% mod_timer + 99,03% sctp_transport_reset_timers + 0,97% sctp_do_sm + 1,50% sctp_ulpq_tail_event And after this patch, now with netperf -l 60: Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000 Overhead Command Shared Object Symbol + 5,65% netperf [kernel.vmlinux] [k] memcpy_erms + 5,59% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string - 5,05% netperf [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore - _raw_spin_unlock_irqrestore + 49,89% __wake_up_sync_key + 45,68% sctp_ulpq_tail_event - 2,85% mod_timer + 76,51% sctp_transport_reset_t3_rtx + 23,49% sctp_do_sm + 1,55% del_timer + 2,50% netperf [sctp] [k] sctp_datamsg_from_user + 2,26% netperf [sctp] [k] sctp_sendmsg Throughput-wise, from 6800mbps without the patch to 7050mbps with it, ~3.7%. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Stale SKB data pointer access across pskb_may_pull() calls in L2TP, from Haishuang Yan. 2) Fix multicast frame handling in mac80211 AP code, from Felix Fietkau. 3) mac80211 station hashtable insert errors not handled properly, fix from Johannes Berg. 4) Fix TX descriptor count limit handling in e1000, from Alexander Duyck. 5) Revert a buggy netdev refcount fix in netpoll, from Bjorn Helgaas. 6) Must assign rtnl_link_ops of the device before registering it, fix in ip6_tunnel from Thadeu Lima de Souza Cascardo. 7) Memory leak fix in tc action net exit, from WANG Cong. 8) Add missing AF_KCM entries to name tables, from Dexuan Cui. 9) Fix regression in GRE handling of csums wrt. FOU, from Alexander Duyck. 10) Fix memory allocation alignment and congestion map corruption in RDS, from Shamir Rabinovitch. 11) Fix default qdisc regression in tuntap driver, from Jason Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) bridge, netem: mark mailing lists as moderated tuntap: restore default qdisc mpls: find_outdev: check for err ptr in addition to NULL check ipv6: Count in extension headers in skb->network_header RDS: fix congestion map corruption for PAGE_SIZE > 4k RDS: memory allocated must be align to 8 GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU net: add the AF_KCM entries to family name tables MAINTAINERS: intel-wired-lan list is moderated lib/test_bpf: Add additional BPF_ADD tests lib/test_bpf: Add test to check for result of 32-bit add that overflows lib/test_bpf: Add tests for unsigned BPF_JGT lib/test_bpf: Fix JMP_JSET tests VSOCK: Detach QP check should filter out non matching QPs. stmmac: fix adjust link call in case of a switch is attached af_packet: tone down the Tx-ring unsupported spew. net_sched: fix a memory leak in tc action samples/bpf: Enable powerpc support samples/bpf: Use llc in PATH, rather than a hardcoded value samples/bpf: Fix build breakage with map_perf_test_user.c ...
2016-04-08Merge tag 'mac80211-for-davem-2016-04-06' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== For the current RC series, we have the following fixes: * TDLS fixes from Arik and Ilan * rhashtable fixes from Ben and myself * documentation fixes from Luis * U-APSD fixes from Emmanuel * a TXQ fix from Felix * and a compiler warning suppression from Jeff ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08mpls: find_outdev: check for err ptr in addition to NULL checkRoopa Prabhu
find_outdev calls inet{,6}_fib_lookup_dev() or dev_get_by_index() to find the output device. In case of an error, inet{,6}_fib_lookup_dev() returns error pointer and dev_get_by_index() returns NULL. But the function only checks for NULL and thus can end up calling dev_put on an ERR_PTR. This patch adds an additional check for err ptr after the NULL check. Before: Trying to add an mpls route with no oif from user, no available path to 10.1.1.8 and no default route: $ip -f mpls route add 100 as 200 via inet 10.1.1.8 [ 822.337195] BUG: unable to handle kernel NULL pointer dereference at 00000000000003a3 [ 822.340033] IP: [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182 [ 822.340033] PGD 1db38067 PUD 1de9e067 PMD 0 [ 822.340033] Oops: 0000 [#1] SMP [ 822.340033] Modules linked in: [ 822.340033] CPU: 0 PID: 11148 Comm: ip Not tainted 4.5.0-rc7+ #54 [ 822.340033] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 822.340033] task: ffff88001db82580 ti: ffff88001dad4000 task.ti: ffff88001dad4000 [ 822.340033] RIP: 0010:[<ffffffff8148781e>] [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182 [ 822.340033] RSP: 0018:ffff88001dad7a88 EFLAGS: 00010282 [ 822.340033] RAX: ffffffffffffff9b RBX: ffffffffffffff9b RCX: 0000000000000002 [ 822.340033] RDX: 00000000ffffff9b RSI: 0000000000000008 RDI: 0000000000000000 [ 822.340033] RBP: ffff88001ddc9ea0 R08: ffff88001e9f1768 R09: 0000000000000000 [ 822.340033] R10: ffff88001d9c1100 R11: ffff88001e3c89f0 R12: ffffffff8187e0c0 [ 822.340033] R13: ffffffff8187e0c0 R14: ffff88001ddc9e80 R15: 0000000000000004 [ 822.340033] FS: 00007ff9ed798700(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 [ 822.340033] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 822.340033] CR2: 00000000000003a3 CR3: 000000001de89000 CR4: 00000000000006f0 [ 822.340033] Stack: [ 822.340033] 0000000000000000 0000000100000000 0000000000000000 0000000000000000 [ 822.340033] 0000000000000000 0801010a00000000 0000000000000000 0000000000000000 [ 822.340033] 0000000000000004 ffffffff8148749b ffffffff8187e0c0 000000000000001c [ 822.340033] Call Trace: [ 822.340033] [<ffffffff8148749b>] ? mpls_rt_alloc+0x2b/0x3e [ 822.340033] [<ffffffff81488e66>] ? mpls_rtm_newroute+0x358/0x3e2 [ 822.340033] [<ffffffff810e7bbc>] ? get_page+0x5/0xa [ 822.340033] [<ffffffff813b7d94>] ? rtnetlink_rcv_msg+0x17e/0x191 [ 822.340033] [<ffffffff8111794e>] ? __kmalloc_track_caller+0x8c/0x9e [ 822.340033] [<ffffffff813c9393>] ? rht_key_hashfn.isra.20.constprop.57+0x14/0x1f [ 822.340033] [<ffffffff813b7c16>] ? __rtnl_unlock+0xc/0xc [ 822.340033] [<ffffffff813cb794>] ? netlink_rcv_skb+0x36/0x82 [ 822.340033] [<ffffffff813b4507>] ? rtnetlink_rcv+0x1f/0x28 [ 822.340033] [<ffffffff813cb2b1>] ? netlink_unicast+0x106/0x189 [ 822.340033] [<ffffffff813cb5b3>] ? netlink_sendmsg+0x27f/0x2c8 [ 822.340033] [<ffffffff81392ede>] ? sock_sendmsg_nosec+0x10/0x1b [ 822.340033] [<ffffffff81393df1>] ? ___sys_sendmsg+0x182/0x1e3 [ 822.340033] [<ffffffff810e4f35>] ? __alloc_pages_nodemask+0x11c/0x1e4 [ 822.340033] [<ffffffff8110619c>] ? PageAnon+0x5/0xd [ 822.340033] [<ffffffff811062fe>] ? __page_set_anon_rmap+0x45/0x52 [ 822.340033] [<ffffffff810e7bbc>] ? get_page+0x5/0xa [ 822.340033] [<ffffffff810e85ab>] ? __lru_cache_add+0x1a/0x3a [ 822.340033] [<ffffffff81087ea9>] ? current_kernel_time64+0x9/0x30 [ 822.340033] [<ffffffff813940c4>] ? __sys_sendmsg+0x3c/0x5a [ 822.340033] [<ffffffff8148f597>] ? entry_SYSCALL_64_fastpath+0x12/0x6a [ 822.340033] Code: 83 08 04 00 00 65 ff 00 48 8b 3c 24 e8 40 7c f2 ff eb 13 48 c7 c3 9f ff ff ff eb 0f 89 ce e8 f1 ae f1 ff 48 89 c3 48 85 db 74 15 <48> 8b 83 08 04 00 00 65 ff 08 48 81 fb 00 f0 ff ff 76 0d eb 07 [ 822.340033] RIP [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182 [ 822.340033] RSP <ffff88001dad7a88> [ 822.340033] CR2: 00000000000003a3 [ 822.435363] ---[ end trace 98cc65e6f6b8bf11 ]--- After patch: $ip -f mpls route add 100 as 200 via inet 10.1.1.8 RTNETLINK answers: Network is unreachable Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reported-by: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07ipv6: Count in extension headers in skb->network_headerJakub Sitnicki
When sending a UDPv6 message longer than MTU, account for the length of fragmentable IPv6 extension headers in skb->network_header offset. Same as we do in alloc_new_skb path in __ip6_append_data(). This ensures that later on __ip6_make_skb() will make space in headroom for fragmentable extension headers: /* move skb->data to ip header from ext header */ if (skb->data < skb_network_header(skb)) __skb_pull(skb, skb_network_offset(skb)); Prevents a splat due to skb_under_panic: skbuff: skb_under_panic: text:ffffffff8143397b len:2126 put:14 \ head:ffff880005bacf50 data:ffff880005bacf4a tail:0x48 end:0xc0 dev:lo ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:104! invalid opcode: 0000 [#1] KASAN CPU: 0 PID: 160 Comm: reproducer Not tainted 4.6.0-rc2 #65 [...] Call Trace: [<ffffffff813eb7b9>] skb_push+0x79/0x80 [<ffffffff8143397b>] eth_header+0x2b/0x100 [<ffffffff8141e0d0>] neigh_resolve_output+0x210/0x310 [<ffffffff814eab77>] ip6_finish_output2+0x4a7/0x7c0 [<ffffffff814efe3a>] ip6_output+0x16a/0x280 [<ffffffff815440c1>] ip6_local_out+0xb1/0xf0 [<ffffffff814f1115>] ip6_send_skb+0x45/0xd0 [<ffffffff81518836>] udp_v6_send_skb+0x246/0x5d0 [<ffffffff8151985e>] udpv6_sendmsg+0xa6e/0x1090 [...] Reported-by: Ji Jianwen <jiji@redhat.com> Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07RDS: fix congestion map corruption for PAGE_SIZE > 4kshamir rabinovitch
When PAGE_SIZE > 4k single page can contain 2 RDS fragments. If 'rds_ib_cong_recv' ignore the RDS fragment offset in to the page it then read the data fragment as far congestion map update and lead to corruption of the RDS connection far congestion map. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07RDS: memory allocated must be align to 8shamir rabinovitch
Fix issue in 'rds_ib_cong_recv' when accessing unaligned memory allocated by 'rds_page_remainder_alloc' using uint64_t pointer. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOUAlexander Duyck
This patch fixes an issue I found in which we were dropping frames if we had enabled checksums on GRE headers that were encapsulated by either FOU or GUE. Without this patch I was barely able to get 1 Gb/s of throughput. With this patch applied I am now at least getting around 6 Gb/s. The issue is due to the fact that with FOU or GUE applied we do not provide a transport offset pointing to the GRE header, nor do we offload it in software as the GRE header is completely skipped by GSO and treated like a VXLAN or GENEVE type header. As such we need to prevent the stack from generating it and also prevent GRE from generating it via any interface we create. Fixes: c3483384ee511 ("gro: Allow tunnel stacking in the case of FOU/GUE") Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07netfilter: nf_conntrack_tcp: Fix stack out of bounds when parsing TCP optionsJozsef Kadlecsik
Baozeng Ding reported a KASAN stack out of bounds issue - it uncovered that the TCP option parsing routines in netfilter TCP connection tracking could read one byte out of the buffer of the TCP options. Therefore in the patch we check that the available data length is large enough to parse both TCP option code and size. Reported-by: Baozeng Ding <sploving1@gmail.com> Tested-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-04-07netfilter: arp_tables: register table in initnsFlorian Westphal
arptables is broken since we didn't register the table anymore -- even 'arptables -L' fails. Fixes: b9e69e127397187b ("netfilter: xtables: don't hook tables by default") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-04-06net: add the AF_KCM entries to family name tablesDexuan Cui
This is for the recent kcm driver, which introduces AF_KCM(41) in b7ac4eb(kcm: Kernel Connection Multiplexor module). Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06VSOCK: Detach QP check should filter out non matching QPs.Jorgen Hansen
The check in vmci_transport_peer_detach_cb should only allow a detach when the qp handle of the transport matches the one in the detach message. Testing: Before this change, a detach from a peer on a different socket would cause an active stream socket to register a detach. Reviewed-by: George Zhang <georgezhang@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06af_packet: tone down the Tx-ring unsupported spew.Dave Jones
Trinity and other fuzzers can hit this WARN on far too easily, resulting in a tainted kernel that hinders automated fuzzing. Replace it with a rate-limited printk. Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06Revert "bridge: Fix incorrect variable assignment on error path in ↵David S. Miller
br_sysfs_addbr" This reverts commit c862cc9b70526a71d07e7bd86d9b61d1c792cead. Patch lacks a real-name Signed-off-by. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06mac80211: fix "warning: ‘target_metric’ may be used uninitialized"Jeff Mahoney
This fixes: net/mac80211/mesh_hwmp.c:603:26: warning: ‘target_metric’ may be used uninitialized in this function target_metric is only consumed when reply = true so no bug exists here, but not all versions of gcc realize it. Initialize to 0 to remove the warning. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05ip6_tunnel: set rtnl_link_ops before calling register_netdeviceThadeu Lima de Souza Cascardo
When creating an ip6tnl tunnel with ip tunnel, rtnl_link_ops is not set before ip6_tnl_create2 is called. When register_netdevice is called, there is no linkinfo attribute in the NEWLINK message because of that. Setting rtnl_link_ops before calling register_netdevice fixes that. Fixes: 0b112457229d ("ip6tnl: add support of link creation via rtnl") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05Revert "netpoll: Fix extra refcount release in netpoll_cleanup()"Bjorn Helgaas
This reverts commit 543e3a8da5a4c453e992d5351ef405d5e32f27d7. Direct callers of __netpoll_setup() depend on it to set np->dev, so we can't simply move that assignment up to netpoll_stup(). Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05sctp: flush if we can't fit another DATA chunkMarcelo Ricardo Leitner
There is no point on delaying the packet if we can't fit a single byte of data on it anymore. So lets just reduce the threshold by the amount that a data chunk with 4 bytes (rounding) would use. v2: based on the right tree Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05mac80211: close the SP when we enqueue frames during the SPEmmanuel Grumbach
Since we enqueued the frame that was supposed to be sent during the SP, and that frame may very well cary the IEEE80211_TX_STATUS_EOSP bit, we may never close the SP (WLAN_STA_SP will never be cleared). If that happens, we will not open any new SP and will never respond to any poll frame from the client. Clear WLAN_STA_SP manually if a frame that was polled during the SP is queued because of a starting A-MPDU session. The client may not see the EOSP bit, but it will at least be able to poll new frames in another SP. Reported-by: Alesya Shapira <alesya.shapira@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> [remove erroneous comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: Fix BW upgrade for TDLS peersIlan Peer
It is possible that the station is connected to an AP with bandwidth of 80+80MHz or 160MHz. In such cases there is no need to perform an upgrade as the maximal supported bandwidth is 80MHz. In addition, when upgrading and setting center_freq1 and bandwidth to 80MHz also set center_freq2 to 0. Fixes: 0fabfaafec3a ("mac80211: upgrade BW of TDLS peers when possible" Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: don't send deferred frames outside the SPEmmanuel Grumbach
Frames that are sent between ampdu_action(IEEE80211_AMPDU_TX_START) and the move to the HT_AGG_STATE_OPERATIONAL state are buffered. If we try to start an A-MPDU session while the peer is sleeping and polling frames with U-APSD, we may have frames that will be buffered by ieee80211_tx_prep_agg. These frames have IEEE80211_TX_CTL_NO_PS_BUFFER set since they are sent to a sleeping client and possibly IEEE80211_TX_STATUS_EOSP. If the frame is buffered, we need clear these two flags since they will be re-sent after the move to HT_AGG_STATE_OPERATIONAL state which is very likely to happen after the SP ends. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: remove description of dropped memberLuis de Bethencourt
Commit 976bd9efdae6 ("mac80211: move beacon_loss_count into ifmgd") removed the member from the sta_info struct but the description stayed lingering. Remove it. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: ensure no limits on station rhashtableBen Greear
By default, the rhashtable logic will fail to insert objects if the key-chains are too long and un-balanced. In the degenerate case where mac80211 is creating many virtual interfaces connected to the same peer(s), this case can happen. St insecure_elasticity to true to allow chains to grow as long as needed. Signed-off-by: Ben Greear <greearb@candelatech.com> [remove message, change commit message slightly] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: properly deal with station hashtable insert errorsJohannes Berg
The original hand-implemented hash-table in mac80211 couldn't result in insertion errors, and while converting to rhashtable I evidently forgot to check the errors. This surfaced now only because Ben is adding many identical keys and that resulted in hidden insertion errors. Cc: stable@vger.kernel.org Fixes: 7bedd0cfad4e1 ("mac80211: use rhashtable for station table") Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: recalc min_def chanctx even when chandef is identicalArik Nemtsov
The min_def chanctx is affected not only by the current chandef, but sometimes also by other stations on the vif. There's a valid scenario where a TDLS peer can widen its BW, thereby causing the min_def to increase. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: TDLS: change BW calculation for WIDER_BW peersArik Nemtsov
The previous approach simply ignored chandef restrictions when calculating the appropriate peer BW for a WIDER_BW peer. This could result in a regulatory violation if both peers indicated 80MHz support, but the regdomain forbade it. Change the approach to setting a WIDER_BW peer's BW. Don't exempt it from the chandef width at first. If during TDLS negotiation the chandef width is upgraded, update the peer's BW to match. Fixes: 0fabfaafec3a ("mac80211: upgrade BW of TDLS peers when possible") Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: TDLS: always downgrade invalid chandefsArik Nemtsov
Even if the current chandef width is equal to the station's max-BW, it doesn't mean it's a valid width for TDLS. Make sure to always check regulatory constraints in these cases. Fixes: 0fabfaafec3a ("mac80211: upgrade BW of TDLS peers when possible") Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05mac80211: fix AP buffered multicast frames with queue control and txqFelix Fietkau
Buffered multicast frames must be passed to the driver directly via drv_tx instead of going through the txq, otherwise they cannot easily be scheduled to be sent after DTIM. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-04bridge: Fix incorrect variable assignment on error path in br_sysfs_addbrBastien Philbert
This fixes the incorrect variable assignment on error path in br_sysfs_addbr for when the call to kobject_create_and_add fails to assign the value of -EINVAL to the returned variable of err rather then incorrectly return zero making callers think this function has succeededed due to the previous assignment being assigned zero when assigning it the successful return value of the call to sysfs_create_group which is zero. Signed-off-by: Bastien Philbert <bastienphilbert@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04ipv6: l2tp: fix a potential issue in l2tp_ip6_recvHaishuang Yan
pskb_may_pull() can change skb->data, so we have to load ptr/optr at the right place. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04ipv4: l2tp: fix a potential issue in l2tp_ip_recvHaishuang Yan
pskb_may_pull() can change skb->data, so we have to load ptr/optr at the right place. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04Merge branch 'PAGE_CACHE_SIZE-removal'Linus Torvalds
Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov: "PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. Let's stop pretending that pages in page cache are special. They are not. The first patch with most changes has been done with coccinelle. The second is manual fixups on top. The third patch removes macros definition" [ I was planning to apply this just before rc2, but then I spaced out, so here it is right _after_ rc2 instead. As Kirill suggested as a possibility, I could have decided to only merge the first two patches, and leave the old interfaces for compatibility, but I'd rather get it all done and any out-of-tree modules and patches can trivially do the converstion while still also working with older kernels, so there is little reason to try to maintain the redundant legacy model. - Linus ] * PAGE_CACHE_SIZE-removal: mm: drop PAGE_CACHE_* and page_cache_{get,release} definition mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
2016-04-04mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usageKirill A. Shutemov
Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>