summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-01-24tipc: fix connection refcount errorParthasarathy Bhuvaragan
Until now, the generic server framework maintains the connection id's per subscriber in server's conn_idr. At tipc_close_conn, we remove the connection id from the server list, but the connection is valid until we call the refcount cleanup. Hence we have a window where the server allocates the same connection to an new subscriber leading to inconsistent reference count. We have another refcount warning we grab the refcount in tipc_conn_lookup() for connections with flag with CF_CONNECTED not set. This usually occurs at shutdown when the we stop the topology server and withdraw TIPC_CFG_SRV publication thereby triggering a withdraw message to subscribers. In this commit, we: 1. remove the connection from the server list at recount cleanup. 2. grab the refcount for a connection only if CF_CONNECTED is set. Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24tipc: add subscription refcount to avoid invalid deleteParthasarathy Bhuvaragan
Until now, the subscribers keep track of the subscriptions using reference count at subscriber level. At subscription cancel or subscriber delete, we delete the subscription only if the timer was pending for the subscription. This approach is incorrect as: 1. del_timer() is not SMP safe, if on CPU0 the check for pending timer returns true but CPU1 might schedule the timer callback thereby deleting the subscription. Thus when CPU0 is scheduled, it deletes an invalid subscription. 2. We export tipc_subscrp_report_overlap(), which accesses the subscription pointer multiple times. Meanwhile the subscription timer can expire thereby freeing the subscription and we might continue to access the subscription pointer leading to memory violations. In this commit, we introduce subscription refcount to avoid deleting an invalid subscription. Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24tipc: fix nametbl_lock soft lockup at node/link eventsParthasarathy Bhuvaragan
We trigger a soft lockup as we grab nametbl_lock twice if the node has a pending node up/down or link up/down event while: - we process an incoming named message in tipc_named_rcv() and perform an tipc_update_nametbl(). - we have pending backlog items in the name distributor queue during a nametable update using tipc_nametbl_publish() or tipc_nametbl_withdraw(). The following are the call chain associated: tipc_named_rcv() Grabs nametbl_lock tipc_update_nametbl() (publish/withdraw) tipc_node_subscribe()/unsubscribe() tipc_node_write_unlock() << lockup occurs if an outstanding node/link event exits, as we grabs nametbl_lock again >> tipc_nametbl_withdraw() Grab nametbl_lock tipc_named_process_backlog() tipc_update_nametbl() << rest as above >> The function tipc_node_write_unlock(), in addition to releasing the lock processes the outstanding node/link up/down events. To do this, we need to grab the nametbl_lock again leading to the lockup. In this commit we fix the soft lockup by introducing a fast variant of node_unlock(), where we just release the lock. We adapt the node_subscribe()/node_unsubscribe() to use the fast variants. Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24ipv6: fix ip6_tnl_parse_tlv_enc_lim()Eric Dumazet
This function suffers from multiple issues. First one is that pskb_may_pull() may reallocate skb->head, so the 'raw' pointer needs either to be reloaded or not used at all. Second issue is that NEXTHDR_DEST handling does not validate that the options are present in skb->data, so we might read garbage or access non existent memory. With help from Willem de Bruijn. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24ip6_tunnel: must reload ipv6h in ip6ip6_tnl_xmit()Eric Dumazet
Since ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull(), we must reload any pointer that was related to skb->head (or skb->data), or risk use after free. Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24af_unix: move unix_mknod() out of bindlockWANG Cong
Dmitry reported a deadlock scenario: unix_bind() path: u->bindlock ==> sb_writer do_splice() path: sb_writer ==> pipe->mutex ==> u->bindlock In the unix_bind() code path, unix_mknod() does not have to be done with u->bindlock held, since it is a pure fs operation, so we can just move unix_mknod() out. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24mac80211: don't try to sleep in rate_control_rate_init()Johannes Berg
In my previous patch, I missed that rate_control_rate_init() is called from some places that cannot sleep, so it cannot call ieee80211_recalc_min_chandef(). Remove that call for now to fix the context bug, we'll have to find a different way to fix the minimum channel width issue. Fixes: 96aa2e7cf126 ("mac80211: calculate min channel width correctly") Reported-by: Xiaolong Ye (via lkp-robot) <xiaolong.ye@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-23net: dsa: Check return value of phy_connect_direct()Florian Fainelli
We need to check the return value of phy_connect_direct() in dsa_slave_phy_connect() otherwise we may be continuing the initialization of a slave network device with a PHY that already attached somewhere else and which will soon be in error because the PHY device is in error. The conditions for such an error to occur are that we have a port of our switch that is not disabled, and has the same port number as a PHY address (say both 5) that can be probed using the DSA slave MII bus. We end-up having this slave network device find a PHY at the same address as our port number, and we try to attach to it. A slave network (e.g: port 0) has already attached to our PHY device, and we try to re-attach it with a different network device, but since we ignore the error we would end-up initializating incorrect device references by the time the slave network interface is opened. The code has been (re)organized several times, making it hard to provide an exact Fixes tag, this is a bugfix nonetheless. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-23net: mpls: Fix multipath selection for LSR use caseDavid Ahern
MPLS multipath for LSR is broken -- always selecting the first nexthop in the one label case. For example: $ ip -f mpls ro ls 100 nexthop as to 200 via inet 172.16.2.2 dev virt12 nexthop as to 300 via inet 172.16.3.2 dev virt13 101 nexthop as to 201 via inet6 2000:2::2 dev virt12 nexthop as to 301 via inet6 2000:3::2 dev virt13 In this example incoming packets have a single MPLS labels which means BOS bit is set. The BOS bit is passed from mpls_forward down to mpls_multipath_hash which never processes the hash loop because BOS is 1. Update mpls_multipath_hash to process the entire label stack. mpls_hdr_len tracks the total mpls header length on each pass (on pass N mpls_hdr_len is N * sizeof(mpls_shim_hdr)). When the label is found with the BOS set it verifies the skb has sufficient header for ipv4 or ipv6, and find the IPv4 and IPv6 header by using the last mpls_hdr pointer and adding 1 to advance past it. With these changes I have verified the code correctly sees the label, BOS, IPv4 and IPv6 addresses in the network header and icmp/tcp/udp traffic for ipv4 and ipv6 are distributed across the nexthops. Fixes: 1c78efa8319ca ("mpls: flow-based multipath selection") Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20bridge: netlink: call br_changelink() during br_dev_newlink()Ivan Vecera
Any bridge options specified during link creation (e.g. ip link add) are ignored as br_dev_newlink() does not process them. Use br_changelink() to do it. Fixes: 133235161721 ("bridge: implement rtnl_link_ops->changelink") Signed-off-by: Ivan Vecera <cera@cera.cz> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20Revert "net: sctp: fix array overrun read on sctp_timer_tbl"David S. Miller
This reverts commit 0e73fc9a56f22f2eec4d2b2910c649f7af67b74d. This fix wasn't correct, a better one is coming right up. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20net: sctp: fix array overrun read on sctp_timer_tblColin Ian King
The comparison on the timeout can lead to an array overrun read on sctp_timer_tbl because of an off-by-one error. Fix this by using < instead of <= and also compare to the array size rather than SCTP_EVENT_TIMEOUT_MAX. Fixes CoverityScan CID#1397639 ("Out-of-bounds read") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20ipv6: seg6_genl_set_tunsrc() must check kmemdup() return valueEric Dumazet
seg6_genl_get_tunsrc() and set_tun_src() do not handle tun_src being possibly NULL, so we must check kmemdup() return value and abort if it is NULL Fixes: 915d7e5e5930 ("ipv6: sr: add code base for control plane support of SR-IPv6") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Lebrun <david.lebrun@uclouvain.be> Acked-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receivingJason Wang
Commit 501db511397f ("virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit") in fact disables VIRTIO_HDR_F_DATA_VALID on receiving path too, fixing this by adding a hint (has_data_valid) and set it only on the receiving path. Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-19tcp: initialize max window for a new fastopen socketAlexey Kodanev
Found that if we run LTP netstress test with large MSS (65K), the first attempt from server to send data comparable to this MSS on fastopen connection will be delayed by the probe timer. Here is an example: < S seq 0:0 win 43690 options [mss 65495 wscale 7 tfo cookie] length 32 > S. seq 0:0 ack 1 win 43690 options [mss 65495 wscale 7] length 0 < . ack 1 win 342 length 0 Inside tcp_sendmsg(), tcp_send_mss() returns max MSS in 'mss_now', as well as in 'size_goal'. This results the segment not queued for transmition until all the data copied from user buffer. Then, inside __tcp_push_pending_frames(), it breaks on send window test and continues with the check probe timer. Fragmentation occurs in tcp_write_wakeup()... +0.2 > P. seq 1:43777 ack 1 win 342 length 43776 < . ack 43777, win 1365 length 0 > P. seq 43777:65001 ack 1 win 342 options [...] length 21224 ... This also contradicts with the fact that we should bound to the half of the window if it is large. Fix this flaw by correctly initializing max_window. Before that, it could have large values that affect further calculations of 'size_goal'. Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-19ipv6: addrconf: Avoid addrconf_disable_change() using RCU read-side lockKefeng Wang
Just like commit 4acd4945cd1e ("ipv6: addrconf: Avoid calling netdevice notifiers with RCU read-side lock"), it is unnecessary to make addrconf_disable_change() use RCU iteration over the netdev list, since it already holds the RTNL lock, or we may meet Illegal context switch in RCU read-side critical section. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18lwtunnel: fix autoload of lwt modulesDavid Ahern
Trying to add an mpls encap route when the MPLS modules are not loaded hangs. For example: CONFIG_MPLS=y CONFIG_NET_MPLS_GSO=m CONFIG_MPLS_ROUTING=m CONFIG_MPLS_IPTUNNEL=m $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2 The ip command hangs: root 880 826 0 21:25 pts/0 00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2 $ cat /proc/880/stack [<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134 [<ffffffff81065efc>] __request_module+0x27b/0x30a [<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178 [<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4 [<ffffffff814ae451>] fib_table_insert+0x90/0x41f [<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52 ... modprobe is trying to load rtnl-lwt-MPLS: root 881 5 0 21:25 ? 00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS and it hangs after loading mpls_router: $ cat /proc/881/stack [<ffffffff81441537>] rtnl_lock+0x12/0x14 [<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179 [<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router] [<ffffffff81000471>] do_one_initcall+0x8e/0x13f [<ffffffff81119961>] do_init_module+0x5a/0x1e5 [<ffffffff810bd070>] load_module+0x13bd/0x17d6 ... The problem is that lwtunnel_build_state is called with rtnl lock held preventing mpls_init from registering. Given the potential references held by the time lwtunnel_build_state it can not drop the rtnl lock to the load module. So, extract the module loading code from lwtunnel_build_state into a new function to validate the encap type. The new function is called while converting the user request into a fib_config which is well before any table, device or fib entries are examined. Fixes: 745041e2aaf1 ("lwtunnel: autoload of lwt modules") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18net: fix harmonize_features() vs NETIF_F_HIGHDMAEric Dumazet
Ashizuka reported a highmem oddity and sent a patch for freescale fec driver. But the problem root cause is that core networking stack must ensure no skb with highmem fragment is ever sent through a device that does not assert NETIF_F_HIGHDMA in its features. We need to call illegal_highdma() from harmonize_features() regardless of CSUM checks. Fixes: ec5f06156423 ("net: Kill link between CSUM and SG features.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pravin Shelar <pshelar@ovn.org> Reported-by: "Ashizuka, Yuusuke" <ashiduka@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18net: ethtool: Initialize buffer when querying device channel settingsEran Ben Elisha
Ethtool channels respond struct was uninitialized when querying device channel boundaries settings. As a result, unreported fields by the driver hold garbage. This may cause sending unsupported params to driver. Fixes: 8bf368620486 ('ethtool: ensure channel counts are within bounds ...') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> CC: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Handle multicast packets properly in fast-RX path of mac80211, from Johannes Berg. 2) Because of a logic bug, the user can't actually force SW checksumming on r8152 devices. This makes diagnosis of hw checksumming bugs really annoying. Fix from Hayes Wang. 3) VXLAN route lookup does not take the source and destination ports into account, which means IPSEC policies cannot be matched properly. Fix from Martynas Pumputis. 4) Do proper RCU locking in netvsc callbacks, from Stephen Hemminger. 5) Fix SKB leaks in mlxsw driver, from Arkadi Sharshevsky. 6) If lwtunnel_fill_encap() fails, we do not abort the netlink message construction properly in fib_dump_info(), from David Ahern. 7) Do not use kernel stack for DMA buffers in atusb driver, from Stefan Schmidt. 8) Openvswitch conntack actions need to maintain a correct checksum, fix from Lance Richardson. 9) ax25_disconnect() is missing a check for ax25->sk being NULL, in fact it already checks this, but not in all of the necessary spots. Fix from Basil Gunn. 10) Action GET operations in the packet scheduler can erroneously bump the reference count of the entry, making it unreleasable. Fix from Jamal Hadi Salim. Jamal gives a great set of example command lines that trigger this in the commit message. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) net sched actions: fix refcnt when GETing of action after bind net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions net/mlx4_core: Fix racy CQ (Completion Queue) free net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered net/mlx5e: Fix a -Wmaybe-uninitialized warning ax25: Fix segfault after sock connection timeout bpf: rework prog_digest into prog_tag tipc: allocate user memory with GFP_KERNEL flag net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types ip6_tunnel: Account for tunnel header in tunnel MTU mld: do not remove mld souce list info when set link down be2net: fix MAC addr setting on privileged BE3 VFs be2net: don't delete MAC on close on unprivileged BE3 VFs be2net: fix status check in be_cmd_pmac_add() cpmac: remove hopeless #warning ravb: do not use zero-length alignment DMA descriptor mlx4: do not call napi_schedule() without care openvswitch: maintain correct checksum state in conntrack actions tcp: fix tcp_fastopen unaligned access complaints on sparc ...
2017-01-16net sched actions: fix refcnt when GETing of action after bindJamal Hadi Salim
Demonstrating the issue: .. add a drop action $sudo $TC actions add action drop index 10 .. retrieve it $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 2 bind 0 installed 29 sec used 29 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 ... bug 1 above: reference is two. Reference is actually 1 but we forget to subtract 1. ... do a GET again and we see the same issue try a few times and nothing changes ~$ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 2 bind 0 installed 31 sec used 31 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 ... lets try to bind the action to a filter.. $ sudo $TC qdisc add dev lo ingress $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \ u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10 ... and now a few GETs: $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 3 bind 1 installed 204 sec used 204 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 4 bind 1 installed 206 sec used 206 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 5 bind 1 installed 235 sec used 235 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 .... as can be observed the reference count keeps going up. After the fix $ sudo $TC actions add action drop index 10 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 1 bind 0 installed 4 sec used 4 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 1 bind 0 installed 6 sec used 6 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 $ sudo $TC qdisc add dev lo ingress $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \ u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 2 bind 1 installed 32 sec used 32 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 2 bind 1 installed 33 sec used 33 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Fixes: aecc5cefc389 ("net sched actions: fix GETing actions") Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16ax25: Fix segfault after sock connection timeoutBasil Gunn
The ax.25 socket connection timed out & the sock struct has been previously taken down ie. sock struct is now a NULL pointer. Checking the sock_flag causes the segfault. Check if the socket struct pointer is NULL before checking sock_flag. This segfault is seen in timed out netrom connections. Please submit to -stable. Signed-off-by: Basil Gunn <basil@pacabunga.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16bpf: rework prog_digest into prog_tagDaniel Borkmann
Commit 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink") was recently discussed, partially due to admittedly suboptimal name of "prog_digest" in combination with sha1 hash usage, thus inevitably and rightfully concerns about its security in terms of collision resistance were raised with regards to use-cases. The intended use cases are for debugging resp. introspection only for providing a stable "tag" over the instruction sequence that both kernel and user space can calculate independently. It's not usable at all for making a security relevant decision. So collisions where two different instruction sequences generate the same tag can happen, but ideally at a rather low rate. The "tag" will be dumped in hex and is short enough to introspect in tracepoints or kallsyms output along with other data such as stack trace, etc. Thus, this patch performs a rename into prog_tag and truncates the tag to a short output (64 bits) to make it obvious it's not collision-free. Should in future a hash or facility be needed with a security relevant focus, then we can think about requirements, constraints, etc that would fit to that situation. For now, rework the exposed parts for the current use cases as long as nothing has been released yet. Tested on x86_64 and s390x. Fixes: 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16tipc: allocate user memory with GFP_KERNEL flagParthasarathy Bhuvaragan
Until now, we allocate memory always with GFP_ATOMIC flag. When the system is under memory pressure and a user tries to send, the send fails due to low memory. However, the user application can wait for free memory if we allocate it using GFP_KERNEL flag. In this commit, we use allocate memory with GFP_KERNEL for all user allocation. Reported-by: Rune Torgersen <runet@innovsys.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16ip6_tunnel: Account for tunnel header in tunnel MTUJakub Sitnicki
With ip6gre we have a tunnel header which also makes the tunnel MTU smaller. We need to reserve room for it. Previously we were using up space reserved for the Tunnel Encapsulation Limit option header (RFC 2473). Also, after commit b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions") our contract with the caller has changed. Now we check if the packet length exceeds the tunnel MTU after the tunnel header has been pushed, unlike before. This is reflected in the check where we look at the packet length minus the size of the tunnel header, which is already accounted for in tunnel MTU. Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions") Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16mld: do not remove mld souce list info when set link downHangbin Liu
This is an IPv6 version of commit 24803f38a5c0 ("igmp: do not remove igmp souce list..."). In mld_del_delrec(), we will restore back all source filter info instead of flush them. Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since we should not remove source list info when set link down. Remove igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in ipv6_mc_down(). Also clear all source info after igmp6_group_dropped() instead of in it because ipv6_mc_down() will call igmp6_group_dropped(). Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16Merge tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "Miscellaneous nfsd bugfixes, one for a 4.10 regression, three for older bugs" * tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linux: svcrdma: avoid duplicate dma unmapping during error recovery sunrpc: don't call sleeping functions from the notifier block callbacks svcrpc: don't leak contexts on PROC_DESTROY nfsd: fix supported attributes for acl & labels
2017-01-15Merge tag 'mac80211-for-davem-2017-01-13' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== We have a number of fixes, in part because I was late in actually sending them out - will try to do better in the future: * handle VHT opmode properly when hostapd is controlling full station state * two fixes for minimum channel width in mac80211 * don't leave SMPS set to junk in HT capabilities * fix headroom when forwarding mesh packets, recently broken by another fix that failed to take into account frame encryption * fix the TID in null-data packets indicating EOSP (end of service period) in U-APSD * prevent attempting to use (and then failing which results in crashes) TXQs on stations that aren't added to the driver yet ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-15openvswitch: maintain correct checksum state in conntrack actionsLance Richardson
When executing conntrack actions on skbuffs with checksum mode CHECKSUM_COMPLETE, the checksum must be updated to account for header pushes and pulls. Otherwise we get "hw csum failure" logs similar to this (ICMP packet received on geneve tunnel via ixgbe NIC): [ 405.740065] genev_sys_6081: hw csum failure [ 405.740106] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G I 4.10.0-rc3+ #1 [ 405.740108] Call Trace: [ 405.740110] <IRQ> [ 405.740113] dump_stack+0x63/0x87 [ 405.740116] netdev_rx_csum_fault+0x3a/0x40 [ 405.740118] __skb_checksum_complete+0xcf/0xe0 [ 405.740120] nf_ip_checksum+0xc8/0xf0 [ 405.740124] icmp_error+0x1de/0x351 [nf_conntrack_ipv4] [ 405.740132] nf_conntrack_in+0xe1/0x550 [nf_conntrack] [ 405.740137] ? find_bucket.isra.2+0x62/0x70 [openvswitch] [ 405.740143] __ovs_ct_lookup+0x95/0x980 [openvswitch] [ 405.740145] ? netif_rx_internal+0x44/0x110 [ 405.740149] ovs_ct_execute+0x147/0x4b0 [openvswitch] [ 405.740153] do_execute_actions+0x22e/0xa70 [openvswitch] [ 405.740157] ovs_execute_actions+0x40/0x120 [openvswitch] [ 405.740161] ovs_dp_process_packet+0x84/0x120 [openvswitch] [ 405.740166] ovs_vport_receive+0x73/0xd0 [openvswitch] [ 405.740168] ? udp_rcv+0x1a/0x20 [ 405.740170] ? ip_local_deliver_finish+0x93/0x1e0 [ 405.740172] ? ip_local_deliver+0x6f/0xe0 [ 405.740174] ? ip_rcv_finish+0x3a0/0x3a0 [ 405.740176] ? ip_rcv_finish+0xdb/0x3a0 [ 405.740177] ? ip_rcv+0x2a7/0x400 [ 405.740180] ? __netif_receive_skb_core+0x970/0xa00 [ 405.740185] netdev_frame_hook+0xd3/0x160 [openvswitch] [ 405.740187] __netif_receive_skb_core+0x1dc/0xa00 [ 405.740194] ? ixgbe_clean_rx_irq+0x46d/0xa20 [ixgbe] [ 405.740197] __netif_receive_skb+0x18/0x60 [ 405.740199] netif_receive_skb_internal+0x40/0xb0 [ 405.740201] napi_gro_receive+0xcd/0x120 [ 405.740204] gro_cell_poll+0x57/0x80 [geneve] [ 405.740206] net_rx_action+0x260/0x3c0 [ 405.740209] __do_softirq+0xc9/0x28c [ 405.740211] irq_exit+0xd9/0xf0 [ 405.740213] do_IRQ+0x51/0xd0 [ 405.740215] common_interrupt+0x93/0x93 Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13tcp: fix tcp_fastopen unaligned access complaints on sparcShannon Nelson
Fix up a data alignment issue on sparc by swapping the order of the cookie byte array field with the length field in struct tcp_fastopen_cookie, and making it a proper union to clean up the typecasting. This addresses log complaints like these: log_unaligned: 113 callbacks suppressed Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360 Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360 Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360 Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360 Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360 Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13ipv6: sr: fix several BUGs when preemption is enabledDavid Lebrun
When CONFIG_PREEMPT=y, CONFIG_IPV6=m and CONFIG_SEG6_HMAC=y, seg6_hmac_init() is called during the initialization of the ipv6 module. This causes a subsequent call to smp_processor_id() with preemption enabled, resulting in the following trace. [ 20.451460] BUG: using smp_processor_id() in preemptible [00000000] code: systemd/1 [ 20.452556] caller is debug_smp_processor_id+0x17/0x19 [ 20.453304] CPU: 0 PID: 1 Comm: systemd Not tainted 4.9.0-rc5-00973-g46738b1 #1 [ 20.454406] ffffc9000062fc18 ffffffff813607b2 0000000000000000 ffffffff81a7f782 [ 20.455528] ffffc9000062fc48 ffffffff813778dc 0000000000000000 00000000001dcf98 [ 20.456539] ffffffffa003bd08 ffffffff81af93e0 ffffc9000062fc58 ffffffff81377905 [ 20.456539] Call Trace: [ 20.456539] [<ffffffff813607b2>] dump_stack+0x63/0x7f [ 20.456539] [<ffffffff813778dc>] check_preemption_disabled+0xd1/0xe3 [ 20.456539] [<ffffffff81377905>] debug_smp_processor_id+0x17/0x19 [ 20.460260] [<ffffffffa0061f3b>] seg6_hmac_init+0xfa/0x192 [ipv6] [ 20.460260] [<ffffffffa0061ccc>] seg6_init+0x39/0x6f [ipv6] [ 20.460260] [<ffffffffa006121a>] inet6_init+0x21a/0x321 [ipv6] [ 20.460260] [<ffffffffa0061000>] ? 0xffffffffa0061000 [ 20.460260] [<ffffffff81000457>] do_one_initcall+0x8b/0x115 [ 20.460260] [<ffffffff811328a3>] do_init_module+0x53/0x1c4 [ 20.460260] [<ffffffff8110650a>] load_module+0x1153/0x14ec [ 20.460260] [<ffffffff81106a7b>] SYSC_finit_module+0x8c/0xb9 [ 20.460260] [<ffffffff81106a7b>] ? SYSC_finit_module+0x8c/0xb9 [ 20.460260] [<ffffffff81106abc>] SyS_finit_module+0x9/0xb [ 20.460260] [<ffffffff810014d1>] do_syscall_64+0x62/0x75 [ 20.460260] [<ffffffff816834f0>] entry_SYSCALL64_slow_path+0x25/0x25 Moreover, dst_cache_* functions also call smp_processor_id(), generating a similar trace. This patch uses raw_cpu_ptr() in seg6_hmac_init() rather than this_cpu_ptr() and disable preemption when using dst_cache_* functions. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13mac80211: prevent skb/txq mismatchMichal Kazior
Station structure is considered as not uploaded (to driver) until drv_sta_state() finishes. This call is however done after the structure is attached to mac80211 internal lists and hashes. This means mac80211 can lookup (and use) station structure before it is uploaded to a driver. If this happens (structure exists, but sta->uploaded is false) fast_tx path can still be taken. Deep in the fastpath call the sta->uploaded is checked against to derive "pubsta" argument for ieee80211_get_txq(). If sta->uploaded is false (and sta is actually non-NULL) ieee80211_get_txq() effectively downgraded to vif->txq. At first glance this may look innocent but coerces mac80211 into a state that is almost guaranteed (codel may drop offending skb) to crash because a station-oriented skb gets queued up on vif-oriented txq. The ieee80211_tx_dequeue() ends up looking at info->control.flags and tries to use txq->sta which in the fail case is NULL. It's probably pointless to pretend one can downgrade skb from sta-txq to vif-txq. Since downgrading unicast traffic to vif->txq must not be done there's no txq to put a frame on if sta->uploaded is false. Therefore the code is made to fall back to regular tx() op path if the described condition is hit. Only drivers using wake_tx_queue were affected. Example crash dump before fix: Unable to handle kernel paging request at virtual address ffffe26c PC is at ieee80211_tx_dequeue+0x204/0x690 [mac80211] [<bf4252a4>] (ieee80211_tx_dequeue [mac80211]) from [<bf4b1388>] (ath10k_mac_tx_push_txq+0x54/0x1c0 [ath10k_core]) [<bf4b1388>] (ath10k_mac_tx_push_txq [ath10k_core]) from [<bf4bdfbc>] (ath10k_htt_txrx_compl_task+0xd78/0x11d0 [ath10k_core]) [<bf4bdfbc>] (ath10k_htt_txrx_compl_task [ath10k_core]) [<bf51c5a4>] (ath10k_pci_napi_poll+0x54/0xe8 [ath10k_pci]) [<bf51c5a4>] (ath10k_pci_napi_poll [ath10k_pci]) from [<c0572e90>] (net_rx_action+0xac/0x160) Reported-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com> Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13mac80211: initialize SMPS field in HT capabilitiesFelix Fietkau
ibss and mesh modes copy the ht capabilites from the band without overriding the SMPS state. Unfortunately the default value 0 for the SMPS field means static SMPS instead of disabled. This results in HT ibss and mesh setups using only single-stream rates, even though SMPS is not supposed to be active. Initialize SMPS to disabled for all bands on ieee80211_hw_register to ensure that the value is sane where it is not overriden with the real SMPS state. Reported-by: Elektra Wagenrad <onelektra@gmx.net> Signed-off-by: Felix Fietkau <nbd@nbd.name> [move VHT TODO comment to a better place] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-12svcrdma: avoid duplicate dma unmapping during error recoverySriharsha Basavapatna
In rdma_read_chunk_frmr() when ib_post_send() fails, the error code path invokes ib_dma_unmap_sg() to unmap the sg list. It then invokes svc_rdma_put_frmr() which in turn tries to unmap the same sg list through ib_dma_unmap_sg() again. This second unmap is invalid and could lead to problems when the iova being unmapped is subsequently reused. Remove the call to unmap in rdma_read_chunk_frmr() and let svc_rdma_put_frmr() handle it. Fixes: 412a15c0fe53 ("svcrdma: Port to new memory registration API") Cc: stable@vger.kernel.org Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-12sunrpc: don't call sleeping functions from the notifier block callbacksScott Mayhew
The inet6addr_chain is an atomic notifier chain, so we can't call anything that might sleep (like lock_sock)... instead of closing the socket from svc_age_temp_xprts_now (which is called by the notifier function), just have the rpc service threads do it instead. Cc: stable@vger.kernel.org Fixes: c3d4879e01be "sunrpc: Add a function to close..." Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-12svcrpc: don't leak contexts on PROC_DESTROYJ. Bruce Fields
Context expiry times are in units of seconds since boot, not unix time. The use of get_seconds() here therefore sets the expiry time decades in the future. This prevents timely freeing of contexts destroyed by client RPC_GSS_PROC_DESTROY requests. We'd still free them eventually (when the module is unloaded or the container shut down), but a lot of contexts could pile up before then. Cc: stable@vger.kernel.org Fixes: c5b29f885afe "sunrpc: use seconds since boot in expiry cache" Reported-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-12net: ipv4: fix table id in getroute responseDavid Ahern
rtm_table is an 8-bit field while table ids are allowed up to u32. Commit 709772e6e065 ("net: Fix routing tables with id > 255 for legacy software") added the preference to set rtm_table in dumps to RT_TABLE_COMPAT if the table id is > 255. The table id returned on get route requests should do the same. Fixes: c36ba6603a11 ("net: Allow user to get table id from route lookup") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12net: lwtunnel: Handle lwtunnel_fill_encap failureDavid Ahern
Handle failure in lwtunnel_fill_encap adding attributes to skb. Fixes: 571e722676fe ("ipv4: support for fib route lwtunnel encap attributes") Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "27 fixes. There are three patches that aren't actually fixes. They're simple function renamings which are nice-to-have in mainline as ongoing net development depends on them." * akpm: (27 commits) timerfd: export defines to userspace mm/hugetlb.c: fix reservation race when freeing surplus pages mm/slab.c: fix SLAB freelist randomization duplicate entries zram: support BDI_CAP_STABLE_WRITES zram: revalidate disk under init_lock mm: support anonymous stable page mm: add documentation for page fragment APIs mm: rename __page_frag functions to __page_frag_cache, drop order from drain mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free mm, memcg: fix the active list aging for lowmem requests when memcg is enabled mm: don't dereference struct page fields of invalid pages mailmap: add codeaurora.org names for nameless email commits signal: protect SIGNAL_UNKILLABLE from unintentional clearing. mm: pmd dirty emulation in page fault handler ipc/sem.c: fix incorrect sem_lock pairing lib/Kconfig.debug: fix frv build failure mm: get rid of __GFP_OTHER_NODE mm: fix remote numa hits statistics mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} ocfs2: fix crash caused by stale lvb with fsdlm plugin ...
2017-01-11mac80211: recalculate min channel width on VHT opmode changesJohannes Berg
When an associated station changes its VHT operating mode this can/will affect the bandwidth it's using, and consequently we must recalculate the minimum bandwidth we need to use. Failure to do so can lead to one of two scenarios: 1) we use a too high bandwidth, this is benign 2) we use a too narrow bandwidth, causing rate control and actual PHY configuration to be out of sync, which can in turn cause problems/crashes Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11mac80211: calculate min channel width correctlyJohannes Berg
In the current minimum chandef code there's an issue in that the recalculation can happen after rate control is initialized for a station that has a wider bandwidth than the current chanctx, and then rate control can immediately start using those higher rates which could cause problems. Observe that first of all that this problem is because we don't take non-associated and non-uploaded stations into account. The restriction to non-associated is quite pointless and is one of the causes for the problem described above, since the rate init will happen before the station is set to associated; no frames could actually be sent until associated, but the rate table can already contain higher rates and that might cause problems. Also, rejecting non-uploaded stations is wrong, since the rate control can select higher rates for those as well. Secondly, it's then necessary to recalculate the minimal config before initializing rate control, so that when rate control is initialized, the higher rates are already available. This can be done easily by adding the necessary function call in rate init. Change-Id: Ib9bc02d34797078db55459d196993f39dcd43070 Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11cfg80211: consider VHT opmode on station updateBeni Lev
Currently, this attribute is only fetched on station addition, but not on station change. Since this info is only present in the assoc request, with full station state support in the driver it cannot be present when the station is added. Thus, add support for changing the VHT opmode on station update if done before (or while) the station is marked as associated. After this, ignore it, since it used to be ignored. Signed-off-by: Beni Lev <beni.lev@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11mac80211: fix the TID on NDPs sent as EOSP carrierEmmanuel Grumbach
In the commit below, I forgot to translate the mac80211's AC to QoS IE order. Moreover, the condition in the if was wrong. Fix both issues. This bug would hit only with clients that didn't set all the ACs as delivery enabled. Fixes: f438ceb81d4 ("mac80211: uapsd_queues is in QoS IE order") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11mac80211: Fix headroom allocation when forwarding mesh pktCedric Izoard
This patch fix issue introduced by my previous commit that tried to ensure enough headroom was present, and instead broke it. When forwarding mesh pkt, mac80211 may also add security header, and it must therefore be taken into account in the needed headroom. Fixes: d8da0b5d64d5 ("mac80211: Ensure enough headroom when forwarding mesh pkt") Signed-off-by: Cedric Izoard <cedric.izoard@ceva-dsp.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11sctp: Fix spelling mistake: "Atempt" -> "Attempt"Colin Ian King
Trivial fix to spelling mistake in WARN_ONCE message Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11net: ipv4: Fix multipath selection with vrfDavid Ahern
fib_select_path does not call fib_select_multipath if oif is set in the flow struct. For VRF use cases oif is always set, so multipath route selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif check similar to what is done in fib_table_lookup. Add saddr and proto to the flow struct for the fib lookup done by the VRF driver to better match hash computation for a flow. Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11cgroup: move CONFIG_SOCK_CGROUP_DATA to init/KconfigArnd Bergmann
We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is not right when CONFIG_NET is disabled and there is no socket interface: warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET) I don't know what the correct solution for this is, but simply removing the dependency on NET from SOCK_CGROUP_DATA by moving it out of the 'if NET' section avoids the warning and does not produce other build errors. Fixes: 483c4933ea09 ("cgroup: Fix CGROUP_BPF config") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11gro: use min_t() in skb_gro_reset_offset()Eric Dumazet
On 32bit arches, (skb->end - skb->data) is not 'unsigned int', so we shall use min_t() instead of min() to avoid a compiler error. Fixes: 1272ce87fa01 ("gro: Enter slow-path if there is no tailroom") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to ↵Alexander Duyck
page_frag_free Patch series "Page fragment updates", v4. This patch series takes care of a few cleanups for the page fragments API. First we do some renames so that things are much more consistent. First we move the page_frag_ portion of the name to the front of the functions names. Secondly we split out the cache specific functions from the other page fragment functions by adding the word "cache" to the name. Finally I added a bit of documentation that will hopefully help to explain some of this. I plan to revisit this later as we get things more ironed out in the near future with the changes planned for the DMA setup to support eXpress Data Path. This patch (of 3): This patch renames the page frag functions to be more consistent with other APIs. Specifically we place the name page_frag first in the name and then have either an alloc or free call name that we append as the suffix. This makes it a bit clearer in terms of naming. In addition we drop the leading double underscores since we are technically no longer a backing interface and instead the front end that is called from the networking APIs. Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomain Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10gro: Disable frag0 optimization on IPv6 ext headersHerbert Xu
The GRO fast path caches the frag0 address. This address becomes invalid if frag0 is modified by pskb_may_pull or its variants. So whenever that happens we must disable the frag0 optimization. This is usually done through the combination of gro_header_hard and gro_header_slow, however, the IPv6 extension header path did the pulling directly and would continue to use the GRO fast path incorrectly. This patch fixes it by disabling the fast path when we enter the IPv6 extension header path. Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address") Reported-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>