summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-10-29l2tp: protect sock pointer of struct pppol2tp_session with RCUGuillaume Nault
pppol2tp_session_create() registers sessions that can't have their corresponding socket initialised. This socket has to be created by userspace, then connected to the session by pppol2tp_connect(). Therefore, we need to protect the pppol2tp socket pointer of L2TP sessions, so that it can safely be updated when userspace is connecting or closing the socket. This will eventually allow pppol2tp_connect() to avoid generating transient states while initialising its parts of the session. To this end, this patch protects the pppol2tp socket pointer using RCU. The pppol2tp socket pointer is still set in pppol2tp_connect(), but only once we know the function isn't going to fail. It's eventually reset by pppol2tp_release(), which now has to wait for a grace period to elapse before it can drop the last reference on the socket. This ensures that pppol2tp_session_get_sock() can safely grab a reference on the socket, even after ps->sk is reset to NULL but before this operation actually gets visible from pppol2tp_session_get_sock(). The rest is standard RCU conversion: pppol2tp_recv(), which already runs in atomic context, is simply enclosed by rcu_read_lock() and rcu_read_unlock(), while other functions are converted to use pppol2tp_session_get_sock() followed by sock_put(). pppol2tp_session_setsockopt() is a special case. It used to retrieve the pppol2tp socket from the L2TP session, which itself was retrieved from the pppol2tp socket. Therefore we can just avoid dereferencing ps->sk and directly use the original socket pointer instead. With all users of ps->sk now handling NULL and concurrent updates, the L2TP ->ref() and ->deref() callbacks aren't needed anymore. Therefore, rather than converting pppol2tp_session_sock_hold() and pppol2tp_session_sock_put(), we can just drop them. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29l2tp: initialise l2tp_eth sessions before registering themGuillaume Nault
Sessions must be initialised before being made externally visible by l2tp_session_register(). Otherwise the session may be concurrently deleted before being initialised, which can confuse the deletion path and eventually lead to kernel oops. Therefore, we need to move l2tp_session_register() down in l2tp_eth_create(), but also handle the intermediate step where only the session or the netdevice has been registered. We can't just call l2tp_session_register() in ->ndo_init() because we'd have no way to properly undo this operation in ->ndo_uninit(). Instead, let's register the session and the netdevice in two different steps and protect the session's device pointer with RCU. And now that we allow the session's .dev field to be NULL, we don't need to prevent the netdevice from being removed anymore. So we can drop the dev_hold() and dev_put() calls in l2tp_eth_create() and l2tp_eth_dev_uninit(). Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29l2tp: don't register sessions in l2tp_session_create()Guillaume Nault
Sessions created by l2tp_session_create() aren't fully initialised: some pseudo-wire specific operations need to be done before making the session usable. Therefore the PPP and Ethernet pseudo-wires continue working on the returned l2tp session while it's already been exposed to the rest of the system. This can lead to various issues. In particular, the session may enter the deletion process before having been fully initialised, which will confuse the session removal code. This patch moves session registration out of l2tp_session_create(), so that callers can control when the session is exposed to the rest of the system. This is done by the new l2tp_session_register() function. Only pppol2tp_session_create() can be easily converted to avoid modifying its session after registration (the debug message is dropped in order to avoid the need for holding a reference on the session). For pppol2tp_connect() and l2tp_eth_create()), more work is needed. That'll be done in followup patches. For now, let's just register the session right after its creation, like it was done before. The only difference is that we can easily take a reference on the session before registering it, so, at least, we're sure it's not going to be freed while we're working on it. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29tcp: Remove "linux/unaligned/access_ok.h" include.David S. Miller
This causes build failures: In file included from net/ipv4/tcp_input.c:79:0: ./include/linux/unaligned/access_ok.h:7:28: error: redefinition of 'get_unaligned_le16' In file included from ./include/asm-generic/unaligned.h:17:0, from ./arch/arm/include/generated/asm/unaligned.h:1, from net/ipv4/tcp_input.c:76: ./include/linux/unaligned/le_struct.h:6:19: note: previous definition of 'get_unaligned_le16' was here In file included from net/ipv4/tcp_input.c:79:0: ./include/linux/unaligned/access_ok.h:12:28: error: redefinition of 'get_unaligned_le32' Plain "asm/access_ok.h", which is already included, is sufficient. Fixes: 60e2a7780793 ("tcp: TCP experimental option for SMC") Reported-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29cxgb3: Check and handle the dma mapping errorsArjun Vynipadath
This patch adds checks at approprate places whether *dma_map*() call has succeeded or not. Original Work by: Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29r8169: Add support for interrupt coalesce tuning (ethtool -C)Francois Romieu
Kirr: In particular with ethtool -C <ifname> rx-usecs 0 rx-frames 0 now it is possible to disable RX delays when NIC usage requires low-latency. See this thread for context: https://www.spinics.net/lists/netdev/msg217665.html My specific case is that: We have many computers with gigabit Realtek NICs. For 2 such computers connected to a gigabit store-and-forward switch the minimum round-trip time for small pings (`ping -i 0 -w 3 -s 56 -q peer`) is ~ 30μs. However it turned out that when Ethernet frame length transitions 127 -> 128 bytes (`ping -i 0 -w 3 -s {81 -> 82} -q peer`) the lowest RTT transitions step-wise to ~ 270μs. As David Light said this is RX interrupt mitigation done by NIC which creates the latency. For workloads when low-latency is required with e.g. Intel, BCM etc NIC drivers one just uses `ethtool -C rx-usecs ...` to reduce the time NIC delays before interrupting CPU, but it turned out `ethtool -C` is not supported by r8169 driver. Like Stéphane ANCELOT I've traced the problem down to IntrMitigate being hardcoded to != 0 for our chips (we have 8168 based NICs): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n5460 static void rtl_hw_start_8169(struct net_device *dev) { ... /* * Undocumented corner. Supposedly: * (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets */ RTL_W16(IntrMitigate, 0x0000); https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n6346 static void rtl_hw_start_8168(struct net_device *dev) { ... RTL_W16(IntrMitigate, 0x5151); and then I've also found https://www.spinics.net/lists/netdev/msg217665.html and original Francois' patch: https://www.spinics.net/lists/netdev/msg217984.html https://www.spinics.net/lists/netdev/msg218207.html So could we please finally get support for tuning r8169 interrupt coalescing in tree? (so that next poor soul who hits the problem does not need to go all the way to dig into driver sources and internet wildly and finally patch locally -RTL_W16(IntrMitigate, 0x5151); +RTL_W16(IntrMitigate, 0x5100); guessing whether it is right or not and also having to care to deploy the patch everywhere it needs to be used, etc...). To do so I've took original Francois's patch from 2012 and reworked it a bit: - updated to latest net-next.git; - adjusted scaling setup based on feedback from Hayes to pick up scaling vector depending not only on link speed but also on CPlusCmd[0:1] and to adjust CPlusCmd[0:1] correspondingly when setting timings; - improved a bit (I think so) error handling. I've tested the patch on "RTL8168d/8111d" (XID 083000c0) and with it and `ethtool -C rx-usecs 0 rx-frames 0` on both ends it improves: - minimum RTT latency: ~270μs -> ~30μs (small packet), ~330μs -> ~110μs (full 1.5K ethernet frame) - average RTT latency: ~480μs -> ~50μs (small packet), ~560μs -> ~125μs (full 1.5K ethernet frame) ( before: root@neo1:# ping -i 0 -w 3 -s 82 -q neo2 PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data. --- neo2.kirr.nexedi.com ping statistics --- 5906 packets transmitted, 5905 received, 0% packet loss, time 2999ms rtt min/avg/max/mdev = 0.274/0.485/0.607/0.026 ms, ipg/ewma 0.508/0.489 ms root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2 PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data. --- neo2.kirr.nexedi.com ping statistics --- 5073 packets transmitted, 5073 received, 0% packet loss, time 2999ms rtt min/avg/max/mdev = 0.330/0.566/0.710/0.028 ms, ipg/ewma 0.591/0.544 ms after: root@neo1# ping -i 0 -w 3 -s 82 -q neo2 PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data. --- neo2.kirr.nexedi.com ping statistics --- 45815 packets transmitted, 45815 received, 0% packet loss, time 3000ms rtt min/avg/max/mdev = 0.036/0.051/0.368/0.010 ms, ipg/ewma 0.065/0.053 ms root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2 PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data. --- neo2.kirr.nexedi.com ping statistics --- 21250 packets transmitted, 21250 received, 0% packet loss, time 3000ms rtt min/avg/max/mdev = 0.112/0.125/0.390/0.007 ms, ipg/ewma 0.141/0.125 ms the small -> 1.5K latency growth is understandable as it takes ~15μs to transmit 1.5K on 1Gbps on the wire and with 2 hosts and 1 switch and ICMP ECHO + ECHO reply the packet has to travel 4 ethernet segments which is already 60μs; probably something a bit else is also there as e.g. on Linux, even with `cpupower frequency-set -g performance`, on some computers I've noticed the kernel can be spending more time in software-only mode when incoming packets go in less frequently. E.g. this program can demonstrate the effect for ICMP ECHO processing: https://lab.nexedi.com/kirr/bcc/blob/43cfc13b/tools/pinglat.py (later this was found to be partly due to C-states exit latencies) ) We have this patch running in our testing setup for 1 months already without any issues observed. It remains to be clarified whether RX and TX timers use the same base. For now I've set them equally, but Francois's original patch version suggests it could be not the same. I've got no feedback at all to my original posting of this patch and questions https://www.spinics.net/lists/netdev/msg457173.html neither from Francois, nor from any people from Realtek during one month. So I suggest we simply apply it to net-next.git now. Cc: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Cc: Realtek linux nic maintainers <nic_swsd@realtek.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Stéphane ANCELOT <sancelot@free.fr> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29Merge branch 'bridge-make-setlink-dellink-notifications-more-accurate'David S. Miller
Nikolay Aleksandrov says: ==================== bridge: make setlink/dellink notifications more accurate Before this set the bridge would generate a notification on vlan add or del even if they didn't actually do any changes, which confuses listeners and is generally not preferred. We could also lose notifications on actual changes if one adds a range of vlans and there's an error in the middle. The problem with just breaking and returning an error is that we could break existing user-space scripts which rely on the vlan delete to clear all existing entries in the specified range and ignore the non-existing errors (typically used to clear the current vlan config). So in order to make the notifications more accurate while keeping backwards compatibility we add a boolean that tracks if anything actually changed during the config calls. The vlan add is more difficult to fix because it always returns 0 even if nothing changed, but we cannot use a specific error because the drivers can return anything and we may mask it, also we'd need to update all places that directly return the add result, thus to signal that a vlan was created or updated and in order not to break overlapping vlan range add we pass down the new boolean that tracks changes to the add functions to check if anything was actually updated. v6: moved "changed" in else branch in br|nbp_vlan_add, thanks to Toshiaki Makita and retested everything again v5: fix br_vlan_add return (v1 leftover) spotted by Toshiaki Makita v4: set changed always to false in the non-vlan config case and retested v3: rebased to latest net-next and fixed non-vlan config functions reported by kbuild test bot v2: pass changed down to vlan add instead of masking errors ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29bridge: vlan: signal if anything changed on vlan addNikolay Aleksandrov
Before this patch there was no way to tell if the vlan add operation actually changed anything, thus we would always generate a notification on adds. Let's make the notifications more precise and generate them only if anything changed, so use the new bool parameter to signal that the vlan was updated. We cannot return an error because there are valid use cases that will be broken (e.g. overlapping range add) and also we can't risk masking errors due to calls into drivers for vlan add which can potentially return anything. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29bridge: netlink: make setlink/dellink notifications more accurateNikolay Aleksandrov
Before this patch we had cases that either sent notifications when there were in fact no changes (e.g. non-existent vlan delete) or didn't send notifications when there were changes (e.g. vlan add range with an error in the middle, port flags change + vlan update error). This patch sends down a boolean to the functions setlink/dellink use and if there is even a single configuration change (port flag, vlan add/del, port state) then we always send a notification. This is all done to keep backwards compatibility with the opportunistic vlan delete, where one could specify a vlan range that has missing vlans inside and still everything in that range will be cleared, this is mostly used to clear the whole vlan config with a single call, i.e. range 1-4094. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28Merge tag 'kbuild-fixes-v4.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - fix O= building on dash - remove unused dependency in Makefile - fix default of a choice in Kconfig - fix typos and documentation style - fix command options unrecognized by sparse * tag 'kbuild-fixes-v4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: clang: fix build failures with sparse check kbuild doc: a bundle of fixes on makefiles.txt Makefile: kselftest: fix grammar typo kbuild: Fix optimization level choice default kbuild: drop unused symverfile in Makefile.modpost kbuild: revert $(realpath ...) to $(shell cd ... && /bin/pwd)
2017-10-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - fix gtco tablet driver, tightening parsing of HID descriptors - add ACPI ID added to Elan driver to be able to handle touchpads found in Lenovo Ideapad 320/520 - fix the Symaptics RMI4 driver to adjust handling of buttons * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: synaptics-rmi4 - limit the range of what GPIOs are buttons Input: gtco - fix potential out-of-bound access Input: elan_i2c - add ELAN0611 to the ACPI table
2017-10-28Merge tag 'pci-v4.14-fixes-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Move alpha PCI IRQ map/swizzle functions out of initdata to fix regression from PCI core IRQ mapping changes (Lorenzo Pieralisi)" * tag 'pci-v4.14-fixes-6' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: alpha/PCI: Move pci_map_irq()/pci_swizzle() out of initdata
2017-10-28Merge tag 'drm-fixes-for-v4.14-rc7' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Two amd fixes, one i915 core and a few i915 GVT fixes, things seem fairly quiet" * tag 'drm-fixes-for-v4.14-rc7' of git://people.freedesktop.org/~airlied/linux: drm/i915/gvt: Adding ACTHD mmio read handler drm/i915/gvt: Extract mmio_read_from_hw() common function drm/i915/gvt: Refine MMIO_RING_F() drm/i915/gvt: properly check per_ctx bb valid state drm/i915/perf: fix perf enable/disable ioctls with 32bits userspace drm/amd/amdgpu: Remove workaround check for UVD6 on APUs drm/amd/powerplay: fix uninitialized variable
2017-10-28Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Six fixes for mostly minor issues, most of which have small race windows for occurring" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: Suppress a kernel warning in case the prep function returns BLKPREP_DEFER scsi: sg: Re-fix off by one in sg_fill_request_table() scsi: aacraid: Fix controller initialization failure scsi: hpsa: Fix configured_logical_drive_count·check scsi: qla2xxx: Initialize Work element before requesting IRQs scsi: zfcp: fix erp_action use-before-initialize in REC action trace
2017-10-28assoc_array: Fix a buggy node-splitting caseDavid Howells
This fixes CVE-2017-12193. Fix a case in the assoc_array implementation in which a new leaf is added that needs to go into a node that happens to be full, where the existing leaves in that node cluster together at that level to the exclusion of new leaf. What needs to happen is that the existing leaves get moved out to a new node, N1, at level + 1 and the existing node needs replacing with one, N0, that has pointers to the new leaf and to N1. The code that tries to do this gets this wrong in two ways: (1) The pointer that should've pointed from N0 to N1 is set to point recursively to N0 instead. (2) The backpointer from N0 needs to be set correctly in the case N0 is either the root node or reached through a shortcut. Fix this by removing this path and using the split_node path instead, which achieves the same end, but in a more general way (thanks to Eric Biggers for spotting the redundancy). The problem manifests itself as: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: assoc_array_apply_edit+0x59/0xe5 Fixes: 3cb989501c26 ("Add a generic associative array implementation.") Reported-and-tested-by: WU Fan <u3536072@connect.hku.hk> Signed-off-by: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org [v3.13-rc1+] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-28Merge tag '4.14-smb3-fixes-for-stable' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Various SMB3 fixes for 4.14 and stable" * tag '4.14-smb3-fixes-for-stable' of git://git.samba.org/sfrench/cifs-2.6: SMB3: Validate negotiate request must always be signed SMB: fix validate negotiate info uninitialised memory use SMB: fix leak of validate negotiate info response buffer CIFS: Fix NULL pointer deref on SMB2_tcon() failure CIFS: do not send invalid input buffer on QUERY_INFO requests cifs: Select all required crypto modules CIFS: SMBD: Fix the definition for SMB2_CHANNEL_RDMA_V1_INVALIDATE cifs: handle large EA requests more gracefully in smb2+ Fix encryption labels and lengths for SMB3.1.1
2017-10-28Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Fix several issues, most of them introduced in the last release" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: do not cleanup unsupported index entries ovl: handle ENOENT on index lookup ovl: fix EIO from lookup of non-indexed upper ovl: Return -ENOMEM if an allocation fails ovl_lookup() ovl: add NULL check in ovl_alloc_inode
2017-10-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fix from Miklos Szeredi: "This fixes a longstanding bug, which can be triggered by interrupting a directory reading syscall" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix READDIRPLUS skipping an entry
2017-10-28tcp: remove unnecessary includeAlexei Starovoitov
two extra #include are not necessary in tcp.h Remove them. Fixes: 40304b2a1567 ("bpf: BPF support for sock_ops") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28Merge branch 'tcp-more-perns-sysctls'David S. Miller
Eric Dumazet says: ==================== tcp: move 12 sysctls to namespaces Ideally all TCP sysctls should be per netns. This patch series takes care of 12 sysctls. Remains the ones that need discussion : sysctl_tcp_mem, sysctl_tcp_rmem, sysctl_tcp_wmem, and sysctl_tcp_max_orphans ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_pacing_ca_ratioEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_pacing_ss_ratioEric Dumazet
Also remove an obsolete comment about TCP pacing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_invalid_ratelimitEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_autocorkingEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_min_rtt_wlenEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_min_tso_segsEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_challenge_ack_limitEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_limit_output_bytesEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_workaround_signed_windowsEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_tso_win_divisorEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_moderate_rcvbufEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: Namespace-ify sysctl_tcp_nometrics_saveEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tap: reference to KVA of an unloaded module causes kernel panicGirish Moodalbail
The commit 9a393b5d5988 ("tap: tap as an independent module") created a separate tap module that implements tap functionality and exports interfaces that will be used by macvtap and ipvtap modules to create create respective tap devices. However, that patch introduced a regression wherein the modules macvtap and ipvtap can be removed (through modprobe -r) while there are applications using the respective /dev/tapX devices. These applications cause kernel to hold reference to /dev/tapX through 'struct cdev macvtap_cdev' and 'struct cdev ipvtap_dev' defined in macvtap and ipvtap modules respectively. So, when the application is later closed the kernel panics because we are referencing KVA that is present in the unloaded modules. ----------8<------- Example ----------8<---------- $ sudo ip li add name mv0 link enp7s0 type macvtap $ sudo ip li show mv0 |grep mv0| awk -e '{print $1 $2}' 14:mv0@enp7s0: $ cat /dev/tap14 & $ lsmod |egrep -i 'tap|vlan' macvtap 16384 0 macvlan 24576 1 macvtap tap 24576 3 macvtap $ sudo modprobe -r macvtap $ fg cat /dev/tap14 ^C <...system panics...> BUG: unable to handle kernel paging request at ffffffffa038c500 IP: cdev_put+0xf/0x30 ----------8<-----------------8<---------- The fix is to set cdev.owner to the module that creates the tap device (either macvtap or ipvtap). With this set, the operations (in fs/char_dev.c) on char device holds and releases the module through cdev_get() and cdev_put() and will not allow the module to unload prematurely. Fixes: 9a393b5d5988ea4e (tap: tap as an independent module) Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: smsc: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: "yuval.shaia@oracle.com" <yuval.shaia@oracle.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Philippe Reynes <tremyfr@gmail.com> Cc: Allen Pais <allen.lkml@gmail.com> Cc: Tobias Klauser <tklauser@distanz.ch> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: packetengines: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: Allen Pais <allen.lkml@gmail.com> Cc: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Cc: Philippe Reynes <tremyfr@gmail.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: natsemi: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: Allen Pais <allen.lkml@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Philippe Reynes <tremyfr@gmail.com> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: mellanox: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Matan Barak <matanb@mellanox.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: netdev@vger.kernel.org Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: korina: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: Roman Yeryomin <leroi.lists@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: fealnx: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: "yuval.shaia@oracle.com" <yuval.shaia@oracle.com> Cc: Allen Pais <allen.lkml@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Philippe Reynes <tremyfr@gmail.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: dlink: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Denis Kirjanov <kda@linux-powerpc.org> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: chelsio/cxgb*: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Santosh Raspatur <santosh@chelsio.com> Cc: Ganesh Goudar <ganeshgr@chelsio.com> Cc: Casey Leedom <leedom@chelsio.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: appletalk/cops: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Allen Pais <allen.lkml@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Howells <dhowells@redhat.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: amd: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Allen Pais <allen.lkml@gmail.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28drivers/net: 8390: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tcp: refresh tp timestamp before tcp_mtu_probe()Eric Dumazet
In the unlikely event tcp_mtu_probe() is sending a packet, we want tp->tcp_mstamp being as accurate as possible. This means we need to call tcp_mstamp_refresh() a bit earlier in tcp_write_xmit(). Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28ipv6: exthdrs: use swap macro in ipv6_dest_haoGustavo A. R. Silva
make use of the swap macro and remove unnecessary variable tmp_addr. This makes the code easier to read and maintain. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28tuntap: properly align skb->head before building skbJason Wang
An unaligned alloc_frag->offset caused by previous allocation will result an unaligned skb->head. This will lead unaligned skb_shared_info and then unaligned dataref which requires to be aligned for accessing on some architecture. Fix this by aligning alloc_frag->offset before the frag refilling. Fixes: 0bbd7dad34f8 ("tun: make tun_build_skb() thread safe") Cc: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Wei Wei <dotweiba@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Reported-by: Wei Wei <dotweiba@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28stmmac: copy unicast mac address to MAC registersBhadram Varka
Currently stmmac driver not copying the valid ethernet MAC address to MAC registers. This patch takes care of updating the MAC register with MAC address. Signed-off-by: Bhadram Varka <vbhadram@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28nfp: inform the VF driver needs to be restarted after changing the MACPablo Cascón
Add message to inform the VF MAC was changed and the need to restart the VF driver for the changes to be effective. Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28liquidio: fix kernel panic in VF driverFelix Manlunas
Doing ifconfig down on VF driver in the middle of receiving line rate traffic causes a kernel panic: LiquidIO_VF 0000:02:00.3: should not come here should not get rx when poll mode = 0 for vf BUG: unable to handle kernel NULL pointer dereference at (null) . . . Call Trace: <IRQ> ? tasklet_action+0x102/0x120 __do_softirq+0x91/0x292 irq_exit+0xb6/0xc0 do_IRQ+0x4f/0xd0 common_interrupt+0x93/0x93 </IRQ> RIP: 0010:cpuidle_enter_state+0x142/0x2f0 RSP: 0018:ffffffffa6403e20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff59 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 000000000000001f RDX: 0000000000000000 RSI: 000000002ab7519f RDI: 0000000000000000 RBP: ffffffffa6403e58 R08: 0000000000000084 R09: 0000000000000018 R10: ffffffffa6403df0 R11: 00000000000003c7 R12: 0000000000000003 R13: ffffd27ebd806800 R14: ffffffffa64d40d8 R15: 0000007be072823f cpuidle_enter+0x17/0x20 call_cpuidle+0x23/0x40 do_idle+0x18c/0x1f0 cpu_startup_entry+0x64/0x70 rest_init+0xa5/0xb0 start_kernel+0x45e/0x46b x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x6f/0x72 secondary_startup_64+0xa5/0xa5 Code: Bad RIP value. RIP: (null) RSP: ffff9246ed003f28 CR2: 0000000000000000 ---[ end trace 92731e80f31b7d7d ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x24000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt Reason is: in the function assigned to net_device_ops->ndo_stop, the steps for bringing down the interface are done in the wrong order. The step that notifies the NIC firmware to stop forwarding packets to host is done too late. Fix it by moving that step to the beginning. Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>