summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-10-29Merge branch '1GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 1GbE Intel Wired LAN Driver Updates 2017-10-27 This patchset is a proposal of how the Traffic Control subsystem can be used to offload the configuration of the Credit Based Shaper (defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported network devices. As part of this work, we've assessed previous public discussions related to TSN enabling: patches from Henrik Austad (Cisco), the presentation from Eric Mann at Linux Plumbers 2012, patches from Gangfeng Huang (National Instruments) and the current state of the OpenAVNU project (https://github.com/AVnu/OpenAvnu/). Overview ======== Time-sensitive Networking (TSN) is a set of standards that aim to address resources availability for providing bandwidth reservation and bounded latency on Ethernet based LANs. The proposal described here aims to cover mainly what is needed to enable the following standards: 802.1Qat and 802.1Qav. The initial target of this work is the Intel i210 NIC, but other controllers' datasheet were also taken into account, like the Renesas RZ/A1H RZ/A1M group and the Synopsis DesignWare Ethernet QoS controller. Proposal ======== Feature-wise, what is covered here is the configuration interfaces for HW implementations of the Credit-Based shaper (CBS, 802.1Qav). CBS is a per-queue shaper. Given that this feature is related to traffic shaping, and that the traffic control subsystem already provides a queueing discipline that offloads config into the device driver (i.e. mqprio), designing a new qdisc for the specific purpose of offloading the config for the CBS shaper seemed like a good fit. For steering traffic into the correct queues, we use the socket option SO_PRIORITY and then a mechanism to map priority to traffic classes / Tx queues. The qdisc mqprio is currently used in our tests. As for the CBS config interface, this patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is: $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \ idleslope I Note that the parameters for this qdisc are the ones defined by the 802.1Q-2014 spec, so no hardware specific functionality is exposed here. Per-stream shaping, as defined by IEEE 802.1Q-2014 Section 34.6.1, is not yet covered by this proposal. v2: Merged patch 6 of the original series into patch 4 based on feedback from David Miller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29Merge branch 'l2tp-register-sessions-atomically'David S. Miller
Guillaume Nault says: ==================== l2tp: register sessions atomically Currently l2tp_session_create() allocates a session, partially initialises it and finally registers it. It therefore exposes sessions that aren't fully initialised to the rest of the system, because pseudo-wire specific initialisation can only happen after l2tp_session_create() returns. This leads to several crashes when these sessions are used or deleted. This series starts by splitting session registration out of l2tp_session_create() (patch #1). Thus allowing pseudo-wires code to terminate the initialisation phase before registration. Then patch #2 fixes the eth pseudo-wire code. This requires protecting the session's netdevice pointer with RCU, because it still needs to be updated concurrently after the session got registered. Remaining patches take care of ppp pseudo-wires. RCU protection is needed there too, for the same reasons. This time it's the pppol2tp socket pointer that gets protected. For clarity, and since the conversion requires more modifications, introducing RCU is done in its own patch (#3). Then patch #4 only has to take care of fixing sessions initialisation and registration (and adapting part of the deletion process). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29l2tp: initialise PPP sessions before registering themGuillaume Nault
pppol2tp_connect() initialises L2TP sessions after they've been exposed to the rest of the system by l2tp_session_register(). This puts sessions into transient states that are the source of several races, in particular with session's deletion path. This patch centralises the initialisation code into pppol2tp_session_init(), which is called before the registration phase. The only field that can't be set before session registration is the pppol2tp socket pointer, which has already been converted to RCU. So pppol2tp_connect() should now be race-free. The session's .session_close() callback is now set before registration. Therefore, it's always called when l2tp_core deletes the session, even if it was created by pppol2tp_session_create() and hasn't been plugged to a pppol2tp socket yet. That'd prevent session free because the extra reference taken by pppol2tp_session_close() wouldn't be dropped by the socket's ->sk_destruct() callback (pppol2tp_session_destruct()). We could set .session_close() only while connecting a session to its pppol2tp socket, or teach pppol2tp_session_close() to avoid grabbing a reference when the session isn't connected, but that'd require adding some form of synchronisation to be race free. Instead of that, we can just let the pppol2tp socket hold a reference on the session as soon as it starts depending on it (that is, in pppol2tp_connect()). Then we don't need to utilise pppol2tp_session_close() to hold a reference at the last moment to prevent l2tp_core from dropping it. When releasing the socket, pppol2tp_release() now deletes the session using the standard l2tp_session_delete() function, instead of merely removing it from hash tables. l2tp_session_delete() drops the reference the sessions holds on itself, but also makes sure it doesn't remove a session twice. So it can safely be called, even if l2tp_core already tried, or is concurrently trying, to remove the session. Finally, pppol2tp_session_destruct() drops the reference held by the socket. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29l2tp: protect sock pointer of struct pppol2tp_session with RCUGuillaume Nault
pppol2tp_session_create() registers sessions that can't have their corresponding socket initialised. This socket has to be created by userspace, then connected to the session by pppol2tp_connect(). Therefore, we need to protect the pppol2tp socket pointer of L2TP sessions, so that it can safely be updated when userspace is connecting or closing the socket. This will eventually allow pppol2tp_connect() to avoid generating transient states while initialising its parts of the session. To this end, this patch protects the pppol2tp socket pointer using RCU. The pppol2tp socket pointer is still set in pppol2tp_connect(), but only once we know the function isn't going to fail. It's eventually reset by pppol2tp_release(), which now has to wait for a grace period to elapse before it can drop the last reference on the socket. This ensures that pppol2tp_session_get_sock() can safely grab a reference on the socket, even after ps->sk is reset to NULL but before this operation actually gets visible from pppol2tp_session_get_sock(). The rest is standard RCU conversion: pppol2tp_recv(), which already runs in atomic context, is simply enclosed by rcu_read_lock() and rcu_read_unlock(), while other functions are converted to use pppol2tp_session_get_sock() followed by sock_put(). pppol2tp_session_setsockopt() is a special case. It used to retrieve the pppol2tp socket from the L2TP session, which itself was retrieved from the pppol2tp socket. Therefore we can just avoid dereferencing ps->sk and directly use the original socket pointer instead. With all users of ps->sk now handling NULL and concurrent updates, the L2TP ->ref() and ->deref() callbacks aren't needed anymore. Therefore, rather than converting pppol2tp_session_sock_hold() and pppol2tp_session_sock_put(), we can just drop them. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29l2tp: initialise l2tp_eth sessions before registering themGuillaume Nault
Sessions must be initialised before being made externally visible by l2tp_session_register(). Otherwise the session may be concurrently deleted before being initialised, which can confuse the deletion path and eventually lead to kernel oops. Therefore, we need to move l2tp_session_register() down in l2tp_eth_create(), but also handle the intermediate step where only the session or the netdevice has been registered. We can't just call l2tp_session_register() in ->ndo_init() because we'd have no way to properly undo this operation in ->ndo_uninit(). Instead, let's register the session and the netdevice in two different steps and protect the session's device pointer with RCU. And now that we allow the session's .dev field to be NULL, we don't need to prevent the netdevice from being removed anymore. So we can drop the dev_hold() and dev_put() calls in l2tp_eth_create() and l2tp_eth_dev_uninit(). Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29l2tp: don't register sessions in l2tp_session_create()Guillaume Nault
Sessions created by l2tp_session_create() aren't fully initialised: some pseudo-wire specific operations need to be done before making the session usable. Therefore the PPP and Ethernet pseudo-wires continue working on the returned l2tp session while it's already been exposed to the rest of the system. This can lead to various issues. In particular, the session may enter the deletion process before having been fully initialised, which will confuse the session removal code. This patch moves session registration out of l2tp_session_create(), so that callers can control when the session is exposed to the rest of the system. This is done by the new l2tp_session_register() function. Only pppol2tp_session_create() can be easily converted to avoid modifying its session after registration (the debug message is dropped in order to avoid the need for holding a reference on the session). For pppol2tp_connect() and l2tp_eth_create()), more work is needed. That'll be done in followup patches. For now, let's just register the session right after its creation, like it was done before. The only difference is that we can easily take a reference on the session before registering it, so, at least, we're sure it's not going to be freed while we're working on it. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29tcp: Remove "linux/unaligned/access_ok.h" include.David S. Miller
This causes build failures: In file included from net/ipv4/tcp_input.c:79:0: ./include/linux/unaligned/access_ok.h:7:28: error: redefinition of 'get_unaligned_le16' In file included from ./include/asm-generic/unaligned.h:17:0, from ./arch/arm/include/generated/asm/unaligned.h:1, from net/ipv4/tcp_input.c:76: ./include/linux/unaligned/le_struct.h:6:19: note: previous definition of 'get_unaligned_le16' was here In file included from net/ipv4/tcp_input.c:79:0: ./include/linux/unaligned/access_ok.h:12:28: error: redefinition of 'get_unaligned_le32' Plain "asm/access_ok.h", which is already included, is sufficient. Fixes: 60e2a7780793 ("tcp: TCP experimental option for SMC") Reported-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29cxgb3: Check and handle the dma mapping errorsArjun Vynipadath
This patch adds checks at approprate places whether *dma_map*() call has succeeded or not. Original Work by: Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29r8169: Add support for interrupt coalesce tuning (ethtool -C)Francois Romieu
Kirr: In particular with ethtool -C <ifname> rx-usecs 0 rx-frames 0 now it is possible to disable RX delays when NIC usage requires low-latency. See this thread for context: https://www.spinics.net/lists/netdev/msg217665.html My specific case is that: We have many computers with gigabit Realtek NICs. For 2 such computers connected to a gigabit store-and-forward switch the minimum round-trip time for small pings (`ping -i 0 -w 3 -s 56 -q peer`) is ~ 30μs. However it turned out that when Ethernet frame length transitions 127 -> 128 bytes (`ping -i 0 -w 3 -s {81 -> 82} -q peer`) the lowest RTT transitions step-wise to ~ 270μs. As David Light said this is RX interrupt mitigation done by NIC which creates the latency. For workloads when low-latency is required with e.g. Intel, BCM etc NIC drivers one just uses `ethtool -C rx-usecs ...` to reduce the time NIC delays before interrupting CPU, but it turned out `ethtool -C` is not supported by r8169 driver. Like Stéphane ANCELOT I've traced the problem down to IntrMitigate being hardcoded to != 0 for our chips (we have 8168 based NICs): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n5460 static void rtl_hw_start_8169(struct net_device *dev) { ... /* * Undocumented corner. Supposedly: * (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets */ RTL_W16(IntrMitigate, 0x0000); https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n6346 static void rtl_hw_start_8168(struct net_device *dev) { ... RTL_W16(IntrMitigate, 0x5151); and then I've also found https://www.spinics.net/lists/netdev/msg217665.html and original Francois' patch: https://www.spinics.net/lists/netdev/msg217984.html https://www.spinics.net/lists/netdev/msg218207.html So could we please finally get support for tuning r8169 interrupt coalescing in tree? (so that next poor soul who hits the problem does not need to go all the way to dig into driver sources and internet wildly and finally patch locally -RTL_W16(IntrMitigate, 0x5151); +RTL_W16(IntrMitigate, 0x5100); guessing whether it is right or not and also having to care to deploy the patch everywhere it needs to be used, etc...). To do so I've took original Francois's patch from 2012 and reworked it a bit: - updated to latest net-next.git; - adjusted scaling setup based on feedback from Hayes to pick up scaling vector depending not only on link speed but also on CPlusCmd[0:1] and to adjust CPlusCmd[0:1] correspondingly when setting timings; - improved a bit (I think so) error handling. I've tested the patch on "RTL8168d/8111d" (XID 083000c0) and with it and `ethtool -C rx-usecs 0 rx-frames 0` on both ends it improves: - minimum RTT latency: ~270μs -> ~30μs (small packet), ~330μs -> ~110μs (full 1.5K ethernet frame) - average RTT latency: ~480μs -> ~50μs (small packet), ~560μs -> ~125μs (full 1.5K ethernet frame) ( before: root@neo1:# ping -i 0 -w 3 -s 82 -q neo2 PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data. --- neo2.kirr.nexedi.com ping statistics --- 5906 packets transmitted, 5905 received, 0% packet loss, time 2999ms rtt min/avg/max/mdev = 0.274/0.485/0.607/0.026 ms, ipg/ewma 0.508/0.489 ms root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2 PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data. --- neo2.kirr.nexedi.com ping statistics --- 5073 packets transmitted, 5073 received, 0% packet loss, time 2999ms rtt min/avg/max/mdev = 0.330/0.566/0.710/0.028 ms, ipg/ewma 0.591/0.544 ms after: root@neo1# ping -i 0 -w 3 -s 82 -q neo2 PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data. --- neo2.kirr.nexedi.com ping statistics --- 45815 packets transmitted, 45815 received, 0% packet loss, time 3000ms rtt min/avg/max/mdev = 0.036/0.051/0.368/0.010 ms, ipg/ewma 0.065/0.053 ms root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2 PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data. --- neo2.kirr.nexedi.com ping statistics --- 21250 packets transmitted, 21250 received, 0% packet loss, time 3000ms rtt min/avg/max/mdev = 0.112/0.125/0.390/0.007 ms, ipg/ewma 0.141/0.125 ms the small -> 1.5K latency growth is understandable as it takes ~15μs to transmit 1.5K on 1Gbps on the wire and with 2 hosts and 1 switch and ICMP ECHO + ECHO reply the packet has to travel 4 ethernet segments which is already 60μs; probably something a bit else is also there as e.g. on Linux, even with `cpupower frequency-set -g performance`, on some computers I've noticed the kernel can be spending more time in software-only mode when incoming packets go in less frequently. E.g. this program can demonstrate the effect for ICMP ECHO processing: https://lab.nexedi.com/kirr/bcc/blob/43cfc13b/tools/pinglat.py (later this was found to be partly due to C-states exit latencies) ) We have this patch running in our testing setup for 1 months already without any issues observed. It remains to be clarified whether RX and TX timers use the same base. For now I've set them equally, but Francois's original patch version suggests it could be not the same. I've got no feedback at all to my original posting of this patch and questions https://www.spinics.net/lists/netdev/msg457173.html neither from Francois, nor from any people from Realtek during one month. So I suggest we simply apply it to net-next.git now. Cc: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Cc: Realtek linux nic maintainers <nic_swsd@realtek.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Stéphane ANCELOT <sancelot@free.fr> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29Merge branch 'bridge-make-setlink-dellink-notifications-more-accurate'David S. Miller
Nikolay Aleksandrov says: ==================== bridge: make setlink/dellink notifications more accurate Before this set the bridge would generate a notification on vlan add or del even if they didn't actually do any changes, which confuses listeners and is generally not preferred. We could also lose notifications on actual changes if one adds a range of vlans and there's an error in the middle. The problem with just breaking and returning an error is that we could break existing user-space scripts which rely on the vlan delete to clear all existing entries in the specified range and ignore the non-existing errors (typically used to clear the current vlan config). So in order to make the notifications more accurate while keeping backwards compatibility we add a boolean that tracks if anything actually changed during the config calls. The vlan add is more difficult to fix because it always returns 0 even if nothing changed, but we cannot use a specific error because the drivers can return anything and we may mask it, also we'd need to update all places that directly return the add result, thus to signal that a vlan was created or updated and in order not to break overlapping vlan range add we pass down the new boolean that tracks changes to the add functions to check if anything was actually updated. v6: moved "changed" in else branch in br|nbp_vlan_add, thanks to Toshiaki Makita and retested everything again v5: fix br_vlan_add return (v1 leftover) spotted by Toshiaki Makita v4: set changed always to false in the non-vlan config case and retested v3: rebased to latest net-next and fixed non-vlan config functions reported by kbuild test bot v2: pass changed down to vlan add instead of masking errors ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29bridge: vlan: signal if anything changed on vlan addNikolay Aleksandrov
Before this patch there was no way to tell if the vlan add operation actually changed anything, thus we would always generate a notification on adds. Let's make the notifications more precise and generate them only if anything changed, so use the new bool parameter to signal that the vlan was updated. We cannot return an error because there are valid use cases that will be broken (e.g. overlapping range add) and also we can't risk masking errors due to calls into drivers for vlan add which can potentially return anything. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29bridge: netlink: make setlink/dellink notifications more accurateNikolay Aleksandrov
Before this patch we had cases that either sent notifications when there were in fact no changes (e.g. non-existent vlan delete) or didn't send notifications when there were changes (e.g. vlan add range with an error in the middle, port flags change + vlan update error). This patch sends down a boolean to the functions setlink/dellink use and if there is even a single configuration change (port flag, vlan add/del, port state) then we always send a notification. This is all done to keep backwards compatibility with the opportunistic vlan delete, where one could specify a vlan range that has missing vlans inside and still everything in that range will be cleared, this is mostly used to clear the whole vlan config with a single call, i.e. range 1-4094. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: remove unnecessary includeAlexei Starovoitov
two extra #include are not necessary in tcp.h Remove them. Fixes: 40304b2a1567 ("bpf: BPF support for sock_ops") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28Merge branch 'tcp-more-perns-sysctls'David S. Miller
Eric Dumazet says: ==================== tcp: move 12 sysctls to namespaces Ideally all TCP sysctls should be per netns. This patch series takes care of 12 sysctls. Remains the ones that need discussion : sysctl_tcp_mem, sysctl_tcp_rmem, sysctl_tcp_wmem, and sysctl_tcp_max_orphans ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_pacing_ca_ratioEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_pacing_ss_ratioEric Dumazet
Also remove an obsolete comment about TCP pacing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_invalid_ratelimitEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_autocorkingEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_min_rtt_wlenEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_min_tso_segsEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_challenge_ack_limitEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_limit_output_bytesEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_workaround_signed_windowsEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_tso_win_divisorEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_moderate_rcvbufEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_nometrics_saveEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: smsc: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: "yuval.shaia@oracle.com" <yuval.shaia@oracle.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Philippe Reynes <tremyfr@gmail.com> Cc: Allen Pais <allen.lkml@gmail.com> Cc: Tobias Klauser <tklauser@distanz.ch> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: packetengines: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: Allen Pais <allen.lkml@gmail.com> Cc: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Cc: Philippe Reynes <tremyfr@gmail.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: natsemi: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: Allen Pais <allen.lkml@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Philippe Reynes <tremyfr@gmail.com> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: mellanox: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Matan Barak <matanb@mellanox.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: netdev@vger.kernel.org Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: korina: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: Roman Yeryomin <leroi.lists@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: fealnx: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: "yuval.shaia@oracle.com" <yuval.shaia@oracle.com> Cc: Allen Pais <allen.lkml@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Philippe Reynes <tremyfr@gmail.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: dlink: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Denis Kirjanov <kda@linux-powerpc.org> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: chelsio/cxgb*: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Santosh Raspatur <santosh@chelsio.com> Cc: Ganesh Goudar <ganeshgr@chelsio.com> Cc: Casey Leedom <leedom@chelsio.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: appletalk/cops: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Allen Pais <allen.lkml@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Howells <dhowells@redhat.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: amd: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Allen Pais <allen.lkml@gmail.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: 8390: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28ipv6: exthdrs: use swap macro in ipv6_dest_haoGustavo A. R. Silva
make use of the swap macro and remove unnecessary variable tmp_addr. This makes the code easier to read and maintain. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28stmmac: copy unicast mac address to MAC registersBhadram Varka
Currently stmmac driver not copying the valid ethernet MAC address to MAC registers. This patch takes care of updating the MAC register with MAC address. Signed-off-by: Bhadram Varka <vbhadram@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28nfp: inform the VF driver needs to be restarted after changing the MACPablo Cascón
Add message to inform the VF MAC was changed and the need to restart the VF driver for the changes to be effective. Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28liquidio: fix kernel panic in VF driverFelix Manlunas
Doing ifconfig down on VF driver in the middle of receiving line rate traffic causes a kernel panic: LiquidIO_VF 0000:02:00.3: should not come here should not get rx when poll mode = 0 for vf BUG: unable to handle kernel NULL pointer dereference at (null) . . . Call Trace: <IRQ> ? tasklet_action+0x102/0x120 __do_softirq+0x91/0x292 irq_exit+0xb6/0xc0 do_IRQ+0x4f/0xd0 common_interrupt+0x93/0x93 </IRQ> RIP: 0010:cpuidle_enter_state+0x142/0x2f0 RSP: 0018:ffffffffa6403e20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff59 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 000000000000001f RDX: 0000000000000000 RSI: 000000002ab7519f RDI: 0000000000000000 RBP: ffffffffa6403e58 R08: 0000000000000084 R09: 0000000000000018 R10: ffffffffa6403df0 R11: 00000000000003c7 R12: 0000000000000003 R13: ffffd27ebd806800 R14: ffffffffa64d40d8 R15: 0000007be072823f cpuidle_enter+0x17/0x20 call_cpuidle+0x23/0x40 do_idle+0x18c/0x1f0 cpu_startup_entry+0x64/0x70 rest_init+0xa5/0xb0 start_kernel+0x45e/0x46b x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x6f/0x72 secondary_startup_64+0xa5/0xa5 Code: Bad RIP value. RIP: (null) RSP: ffff9246ed003f28 CR2: 0000000000000000 ---[ end trace 92731e80f31b7d7d ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x24000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt Reason is: in the function assigned to net_device_ops->ndo_stop, the steps for bringing down the interface are done in the wrong order. The step that notifies the NIC firmware to stop forwarding packets to host is done too late. Fix it by moving that step to the beginning. Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28net: dsa: move fixed link registration helpersVivien Didelot
The new bindings (dsa2.c) and the old bindings (legacy.c) share two helpers dsa_cpu_dsa_setup and dsa_cpu_dsa_destroy, used to register or deregister a fixed PHY if a given port has a corresponding device node. Unclutter the code by moving them into two new port.c helpers, dsa_port_fixed_link_register_of and dsa_port_fixed_link_(un)register_of. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28bnxt_en: Fix randconfig build errors.Michael Chan
Fix undefined symbols when CONFIG_VLAN_8021Q or CONFIG_INET is not set. Fixes: 8c95f773b4a3 ("bnxt_en: add support for Flower based vxlan encap/decap offload") Reported-by: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-27igb: Add support for CBS offloadAndre Guedes
This patch adds support for Credit-Based Shaper (CBS) qdisc offload from Traffic Control system. This support enable us to leverage the Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features from Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav standard which was merged into 802.1Q in 2014. It enables traffic prioritization and bandwidth reservation via the Credit-Based Shaper which is implemented in hardware by i210 controller. The patch introduces the igb_setup_tc() function which implements the support for CBS qdisc hardware offload in the IGB driver. CBS offload is the only traffic control offload supported by the driver at the moment. FQTSS transmission mode from i210 controller is automatically enabled by the IGB driver when the CBS is enabled for the first hardware queue. Likewise, FQTSS mode is automatically disabled when CBS is disabled for the last hardware queue. Changing FQTSS mode requires NIC reset. FQTSS feature is supported by i210 controller only. Signed-off-by: Andre Guedes <andre.guedes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Tested-by: Henrik Austad <henrik@austad.us> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-27net/sched: Add support for HW offloading for CBSVinicius Costa Gomes
This adds support for offloading the CBS algorithm to the controller, if supported. Drivers wanting to support CBS offload must implement the .ndo_setup_tc callback and handle the TC_SETUP_CBS (introduced here) type. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Henrik Austad <henrik@austad.us> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-27net/sched: Introduce Credit Based Shaper (CBS) qdiscVinicius Costa Gomes
This queueing discipline implements the shaper algorithm defined by the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L. It's primary usage is to apply some bandwidth reservation to user defined traffic classes, which are mapped to different queues via the mqprio qdisc. Only a simple software implementation is added for now. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Tested-by: Henrik Austad <henrik@austad.us> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-27net/sched: Add select_queue() class_ops for mqprioJesus Sanchez-Palencia
When replacing a child qdisc from mqprio, tc_modify_qdisc() must fetch the netdev_queue pointer that the current child qdisc is associated with before creating the new qdisc. Currently, when using mqprio as root qdisc, the kernel will end up getting the queue #0 pointer from the mqprio (root qdisc), which leaves any new child qdisc with a possibly wrong netdev_queue pointer. Implementing the Qdisc_class_ops select_queue() on mqprio fixes this issue and avoid an inconsistent state when child qdiscs are replaced. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Tested-by: Henrik Austad <henrik@austad.us> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-27net/sched: Change behavior of mq select_queue()Jesus Sanchez-Palencia
Currently, the class_ops select_queue() implementation on sch_mq returns a pointer to netdev_queue #0 when it receives and invalid qdisc id. That can be misleading since all of mq's inner qdiscs are attached to a valid netdev_queue. Here we fix that by returning NULL when a qdisc id is invalid. This is aligned with how select_queue() is implemented for sch_mqprio in the next patch on this series, keeping a consistent behavior between these two qdiscs. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Tested-by: Henrik Austad <henrik@austad.us> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-27net/sched: Check for null dev_queue on create flowJesus Sanchez-Palencia
In qdisc_alloc() the dev_queue pointer was used without any checks being performed. If qdisc_create() gets a null dev_queue pointer, it just passes it along to qdisc_alloc(), leading to a crash. That happens if a root qdisc implements select_queue() and returns a null dev_queue pointer for an "invalid handle", for example, or if the dev_queue associated with the parent qdisc is null. This patch is in preparation for the next in this series, where select_queue() is being added to mqprio and as it may return a null dev_queue. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Tested-by: Henrik Austad <henrik@austad.us> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-28liquidio: xmit_more supportIntiyaz Basha
Defer ringing the Tx doorbell if skb->xmit_more is set unless the Tx queue is full or stopped. To keep latency low, use a deferral limit of 8 packets. We chose 8 because Octeon can fetch at most 8 packets in a single PCI read, and our tests show that 8 results in low latency. Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>