summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-06-03net/ncsi: Avoid GFP_KERNEL in response handlerSamuel Mendoza-Jonas
ncsi_rsp_handler_gc() allocates the filter arrays using GFP_KERNEL in softirq context, causing the below backtrace. This allocation is only a few dozen bytes during probing so allocate with GFP_ATOMIC instead. [ 42.813372] BUG: sleeping function called from invalid context at mm/slab.h:416 [ 42.820900] in_atomic(): 1, irqs_disabled(): 0, pid: 213, name: kworker/0:1 [ 42.827893] INFO: lockdep is turned off. [ 42.832023] CPU: 0 PID: 213 Comm: kworker/0:1 Tainted: G W 4.13.16-01441-gad99b38 #65 [ 42.841007] Hardware name: Generic DT based system [ 42.845966] Workqueue: events ncsi_dev_work [ 42.850251] [<8010a494>] (unwind_backtrace) from [<80107510>] (show_stack+0x20/0x24) [ 42.858046] [<80107510>] (show_stack) from [<80612770>] (dump_stack+0x20/0x28) [ 42.865309] [<80612770>] (dump_stack) from [<80148248>] (___might_sleep+0x230/0x2b0) [ 42.873241] [<80148248>] (___might_sleep) from [<80148334>] (__might_sleep+0x6c/0xac) [ 42.881129] [<80148334>] (__might_sleep) from [<80240d6c>] (__kmalloc+0x210/0x2fc) [ 42.888737] [<80240d6c>] (__kmalloc) from [<8060ad54>] (ncsi_rsp_handler_gc+0xd0/0x170) [ 42.896770] [<8060ad54>] (ncsi_rsp_handler_gc) from [<8060b454>] (ncsi_rcv_rsp+0x16c/0x1d4) [ 42.905314] [<8060b454>] (ncsi_rcv_rsp) from [<804d86c8>] (__netif_receive_skb_core+0x3c8/0xb50) [ 42.914158] [<804d86c8>] (__netif_receive_skb_core) from [<804d96cc>] (__netif_receive_skb+0x20/0x7c) [ 42.923420] [<804d96cc>] (__netif_receive_skb) from [<804de4b0>] (netif_receive_skb_internal+0x78/0x6a4) [ 42.932931] [<804de4b0>] (netif_receive_skb_internal) from [<804df980>] (netif_receive_skb+0x78/0x158) [ 42.942292] [<804df980>] (netif_receive_skb) from [<8042f204>] (ftgmac100_poll+0x43c/0x4e8) [ 42.950855] [<8042f204>] (ftgmac100_poll) from [<804e094c>] (net_rx_action+0x278/0x4c4) [ 42.958918] [<804e094c>] (net_rx_action) from [<801016a8>] (__do_softirq+0xe0/0x4c4) [ 42.966716] [<801016a8>] (__do_softirq) from [<8011cd9c>] (do_softirq.part.4+0x50/0x78) [ 42.974756] [<8011cd9c>] (do_softirq.part.4) from [<8011cebc>] (__local_bh_enable_ip+0xf8/0x11c) [ 42.983579] [<8011cebc>] (__local_bh_enable_ip) from [<804dde08>] (__dev_queue_xmit+0x260/0x890) [ 42.992392] [<804dde08>] (__dev_queue_xmit) from [<804df1f0>] (dev_queue_xmit+0x1c/0x20) [ 43.000689] [<804df1f0>] (dev_queue_xmit) from [<806099c0>] (ncsi_xmit_cmd+0x1c0/0x244) [ 43.008763] [<806099c0>] (ncsi_xmit_cmd) from [<8060dc14>] (ncsi_dev_work+0x2e0/0x4c8) [ 43.016725] [<8060dc14>] (ncsi_dev_work) from [<80133dfc>] (process_one_work+0x214/0x6f8) [ 43.024940] [<80133dfc>] (process_one_work) from [<80134328>] (worker_thread+0x48/0x558) [ 43.033070] [<80134328>] (worker_thread) from [<8013ba80>] (kthread+0x130/0x174) [ 43.040506] [<8013ba80>] (kthread) from [<80102950>] (ret_from_fork+0x14/0x24) Fixes: 062b3e1b6d4f ("net/ncsi: Refactor MAC, VLAN filters") Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03net: netcp: ethss: remove unnecessary pointer set to NULLYueHaibing
If statement has make sure the 'slave->phy' is NULL Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03net/smc: fix error return code in smc_setsockopt()Wei Yongjun
Fix to return error code -EINVAL instead of 0 if optlen is invalid. Fixes: 01d2f7e2cdd3 ("net/smc: sockopts TCP_NODELAY and TCP_CORK") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03net/mlx5: Make function mlx5_fpga_tls_send_teardown_cmd() staticWei Yongjun
Fixes the following sparse warning: drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c:199:6: warning: symbol 'mlx5_fpga_tls_send_teardown_cmd' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03hv_netvsc: fix error return code in netvsc_probe()Wei Yongjun
Fix to return a negative error code from the failover register fail error handling case instead of 0, as done elsewhere in this function. Fixes: 1ff78076d8dd ("netvsc: refactor notifier/event handling code to use the failover framework") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Filling in the padding slot in the bpf structure as a bug fix in 'ne' overlapped with actually using that padding area for something in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03net: phy: consider PHY_IGNORE_INTERRUPT in state machine PHY_NOLINK handlingHeiner Kallweit
We can bail out immediately also in case of PHY_IGNORE_INTERRUPT because phy_mac_interupt() informs us once the link is up. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree: 1) Get rid of nf_sk_is_transparent(), use inet_sk_transparent() instead. From Máté Eckl. 2) Move shared tproxy infrastructure to nf_tproxy_ipv4 and nf_tproxy_ipv6. Also from Máté. 3) Add hashtable to speed up chain lookups by name, from Florian Westphal. 4) Patch series to add connlimit support reusing part of the nf_conncount infrastructure. This includes preparation changes such passing context to the object and expression destroy interface; garbage collection for expressions embedded into set elements, and the introduction of the clone_destroy interface for expressions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Infinite loop in _decode_session6(), from Eric Dumazet. 2) Pass correct argument to nla_strlcpy() in netfilter, also from Eric Dumazet. 3) Out of bounds memory access in ipv6 srh code, from Mathieu Xhonneux. 4) NULL deref in XDP_REDIRECT handling of tun driver, from Toshiaki Makita. 5) Incorrect idr release in cls_flower, from Paul Blakey. 6) Probe error handling fix in davinci_emac, from Dan Carpenter. 7) Memory leak in XPS configuration, from Alexander Duyck. 8) Use after free with cloned sockets in kcm, from Kirill Tkhai. 9) MTU handling fixes fo ip_tunnel and ip6_tunnel, from Nicolas Dichtel. 10) Fix UAPI hole in bpf data structure for 32-bit compat applications, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits) bpf: fix uapi hole for 32 bit compat applications net: usb: cdc_mbim: add flag FLAG_SEND_ZLP ip6_tunnel: remove magic mtu value 0xFFF8 ip_tunnel: restore binding to ifaces with a large mtu net: dsa: b53: Add BCM5389 support kcm: Fix use-after-free caused by clonned sockets net-sysfs: Fix memory leak in XPS configuration ixgbe: fix parsing of TC actions for HW offload net: ethernet: davinci_emac: fix error handling in probe() net/ncsi: Fix array size in dumpit handler cls_flower: Fix incorrect idr release when failing to modify rule net/sonic: Use dma_mapping_error() xfrm Fix potential error pointer dereference in xfrm_bundle_create. vhost_net: flush batched heads before trying to busy polling tun: Fix NULL pointer dereference in XDP redirect be2net: Fix error detection logic for BE3 net: qmi_wwan: Add Netgear Aircard 779S mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG atm: zatm: fix memcmp casting iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs ...
2018-06-03netfilter: nf_tables: handle chain name lookups via rhltableFlorian Westphal
If there is a significant amount of chains list search is too slow, so add an rhlist table for this. This speeds up ruleset loading: for every new rule we have to check if the name already exists in current generation. We need to be able to cope with duplicate chain names in case a transaction drops the nfnl mutex (for request_module) and the abort of this old transaction is still pending. The list is kept -- we need a way to iterate chains even if hash resize is in progress without missing an entry. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: nf_tables: add connlimit supportPablo Neira Ayuso
This features which allows you to limit the maximum number of connections per arbitrary key. The connlimit expression is stateful, therefore it can be used from meters to dynamically populate a set, this provides a mapping to the iptables' connlimit match. This patch also comes that allows you define static connlimit policies. This extension depends on the nf_conncount infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "Eve of merge window fix: The original code was so bogus as to be casting the wrong generic device to an rport and proceeding to take actions based on the bogus values it found. Fortunately it seems the location that is dereferenced always exists, so the code hasn't oopsed yet, but it certainly annoys the memory checkers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: scsi_transport_srp: Fix shost to rport translation
2018-06-02Merge tag 'drm-fixes-for-v4.17-rc8' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "A few final fixes: i915: - fix for potential Spectre vector in the new query uAPI - fix NULL pointer deref (FDO #106559) - DMI fix to hide LVDS for Radiant P845 (FDO #105468) amdgpu: - suspend/resume DC regression fix - underscan flicker fix on fiji - gamma setting fix after dpms omap: - fix oops regression core: - fix PSR timing dw-hdmi: - fix oops regression" * tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux: drm/amd/display: Update color props when modeset is required drm/amd/display: Make atomic-check validate underscan changes drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_sense drm/amd/display: Fix BUG_ON during CRTC atomic check update drm/i915/query: nospec expects no more than an unsigned long drm/i915/query: Protect tainted function pointer lookup drm/i915/lvds: Move acpi lid notification registration to registration phase drm/i915: Disable LVDS on Radiant P845 drm/omap: fix NULL deref crash with SDI displays drm/psr: Fix missed entry in PSR setup time table.
2018-06-03netfilter: nf_tables: add destroy_clone expressionPablo Neira Ayuso
Before this patch, cloned expressions are released via ->destroy. This is a problem for the new connlimit expression since the ->destroy path drop a reference on the conntrack modules and it unregisters hooks. The new ->destroy_clone provides context that this expression is being released from the packet path, so it is mirroring ->clone(), where neither module reference is dropped nor hooks need to be unregistered - because this done from the control plane path from the ->init() path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: nf_tables: garbage collection for stateful expressionsPablo Neira Ayuso
Use garbage collector to schedule removal of elements based of feedback from expression that this element comes with. Therefore, the garbage collector is not guided by timeout expirations in this new mode. The new connlimit expression sets on the NFT_EXPR_GC flag to enable this behaviour, the dynset expression needs to explicitly enable the garbage collector via set->ops->gc_init call. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: nf_tables: pass ctx to nf_tables_expr_destroy()Pablo Neira Ayuso
nft_set_elem_destroy() can be called from call_rcu context. Annotate netns and table in set object so we can populate the context object. Moreover, pass context object to nf_tables_set_elem_destroy() from the commit phase, since it is already available from there. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: nf_conncount: expose connection list interfacePablo Neira Ayuso
This patch provides an interface to maintain the list of connections and the lookup function to obtain the number of connections in the list. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: nf_tables: pass context to object destroy indirectionPablo Neira Ayuso
The new connlimit object needs this to properly deal with conntrack dependencies. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: Libify xt_TPROXYMáté Eckl
The extracted functions will likely be usefull to implement tproxy support in nf_tables. Extrancted functions: - nf_tproxy_sk_is_transparent - nf_tproxy_laddr4 - nf_tproxy_handle_time_wait4 - nf_tproxy_get_sock_v4 - nf_tproxy_laddr6 - nf_tproxy_handle_time_wait6 - nf_tproxy_get_sock_v6 (nf_)tproxy_handle_time_wait6 also needed some refactor as its current implementation was xtables-specific. Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: Decrease code duplication regarding transparent socket optionMáté Eckl
There is a function in include/net/netfilter/nf_socket.h to decide if a socket has IP(V6)_TRANSPARENT socket option set or not. However this does the same as inet_sk_transparent() in include/net/tcp.h include/net/tcp.h:1733 /* This helper checks if socket has IP_TRANSPARENT set */ static inline bool inet_sk_transparent(const struct sock *sk) { switch (sk->sk_state) { case TCP_TIME_WAIT: return inet_twsk(sk)->tw_transparent; case TCP_NEW_SYN_RECV: return inet_rsk(inet_reqsk(sk))->no_srccheck; } return inet_sk(sk)->transparent; } tproxy_sk_is_transparent has also been refactored to use this function instead of reimplementing it. Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03Merge branch 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes Two last minute DC fixes for 4.17. A fix for underscan on fiji and a fix for gamma settings getting after dpms. * 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux: drm/amd/display: Update color props when modeset is required drm/amd/display: Make atomic-check validate underscan changes
2018-06-02Merge tag 'mips_fixes_4.17_3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from James Hogan: "A final few MIPS fixes for 4.17: - drop Lantiq gphy reboot/remove reset (4.14) - prctl(PR_SET_FP_MODE): Disallow PRE without FR (4.0) - ptrace(PTRACE_PEEKUSR): Fix 64-bit FGRs (3.15)" * tag 'mips_fixes_4.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: ptrace: Fix PTRACE_PEEKUSR requests for 64-bit FGRs MIPS: prctl: Disallow FRE without FR with PR_SET_FP_MODE requests MIPS: lantiq: gphy: Drop reboot/remove reset asserts
2018-06-02Merge tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO fix from Alex Williamson: "Revert a pfn page mapping optimization identified as introducing a bad page state regression (Alex Williamson)" * tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfio: Revert "vfio/type1: Improve memory pinning process for raw PFN mapping"
2018-06-02Merge tag 'char-misc-4.17-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are four small bugfixes for some char/misc drivers. Well, really three fixes and one fix for one of those fixes due to problems found by 0-day. This resolves some reported issues with the hwtracing drivers, and a reported regression for the thunderbolt subsystem. All of these have been in linux-next for a while now with no reported problems" * tag 'char-misc-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: hwtracing: stm: fix build error on some arches intel_th: Use correct device when freeing buffers stm class: Use vmalloc for the master map thunderbolt: Handle NULL boot ACL entries properly
2018-06-02Merge tag 'staging-4.17-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull IIO driver fixes from Greg KH: "Here are some old IIO driver fixes that were sitting in my tree for a few weeks. Sorry about not getting them to you sooner. They fix a number of small IIO driver issues that have been reported. All of these have been in linux-next for a while with no reported problems" * tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: adc: select buffer for at91-sama5d2_adc iio: hid-sensor-trigger: Fix sometimes not powering up the sensor after resume iio: adc: at91-sama5d2_adc: fix channel configuration for differential channels iio:kfifo_buf: check for uint overflow iio:buffer: make length types match kfifo types iio: adc: stm32-dfsdm: fix sample rate for div2 spi clock iio: adc: stm32-dfsdm: fix successive oversampling settings iio: ad7793: implement IIO_CHAN_INFO_SAMP_FREQ
2018-06-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Just three small last minute regressions that were found in the last week. The Broadcom fix is a bit big for rc7, but since it is fixing driver crash regressions that were merged via netdev into rc1, I am sending it. - bnxt netdev changes merged this cycle caused the bnxt RDMA driver to crash under certain situations - Arnd found (several, unfortunately) kconfig problems with the patches adding INFINIBAND_ADDR_TRANS. Reverting this last part, will fix it more fully outside -rc. - Subtle change in error code for a uapi function caused breakage in userspace. This was bug was subtly introduced cycle" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/core: Fix error code for invalid GID entry IB: Revert "remove redundant INFINIBAND kconfig dependencies" RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes
2018-06-02Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "A documentation bugfix and a MAINTAINERS addition" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: ocores: update HDL sources URL i2c: xlp9xx: Add MAINTAINERS entry
2018-06-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge two fixes from Andrew Morton. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: fix the NULL mapping case in __isolate_lru_page() mm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty()
2018-06-02mm: fix the NULL mapping case in __isolate_lru_page()Hugh Dickins
George Boole would have noticed a slight error in 4.16 commit 69d763fc6d3a ("mm: pin address_space before dereferencing it while isolating an LRU page"). Fix it, to match both the comment above it, and the original behaviour. Although anonymous pages are not marked PageDirty at first, we have an old habit of calling SetPageDirty when a page is removed from swap cache: so there's a category of ex-swap pages that are easily migratable, but were inadvertently excluded from compaction's async migration in 4.16. Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805302014001.12558@eggly.anvils Fixes: 69d763fc6d3a ("mm: pin address_space before dereferencing it while isolating an LRU page") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Ivan Kalvachev <ikalvachev@gmail.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-02mm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty()Hugh Dickins
Swapping load on huge=always tmpfs (with khugepaged tuned up to be very eager, but I'm not sure that is relevant) soon hung uninterruptibly, waiting for page lock in shmem_getpage_gfp()'s find_lock_entry(), most often when "cp -a" was trying to write to a smallish file. Debug showed that the page in question was not locked, and page->mapping NULL by now, but page->index consistent with having been in a huge page before. Reproduced in minutes on a 4.15 kernel, even with 4.17's 605ca5ede764 ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()") added in; but took hours to reproduce on a 4.17 kernel (no idea why). The culprit proved to be the __ClearPageDirty() on tails beyond i_size in __split_huge_page(): the non-atomic __bitoperation may have been safe when 4.8's baa355fd3314 ("thp: file pages support for split_huge_page()") introduced it, but liable to erase PageWaiters after 4.10's 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit"). Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805291841070.3197@eggly.anvils Fixes: 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-02Revert "vfio/type1: Improve memory pinning process for raw PFN mapping"Alex Williamson
Bisection by Amadeusz Sławiński implicates this commit leading to bad page state issues after VM shutdown, likely due to unbalanced page references. The original commit was intended only as a performance improvement, therefore revert for offline rework. Link: https://lkml.org/lkml/2018/6/2/97 Fixes: 356e88ebe447 ("vfio/type1: Improve memory pinning process for raw PFN mapping") Cc: Jason Cai (Xiang Feng) <jason.cai@linux.alibaba.com> Reported-by: Amadeusz Sławiński <amade@asmblr.net> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree, the most relevant things in this batch are: 1) Compile masquerade infrastructure into NAT module, from Florian Westphal. Same thing with the redirection support. 2) Abort transaction if early initialization of the commit phase fails. Also from Florian. 3) Get rid of synchronize_rcu() by using rule array in nf_tables, from Florian. 4) Abort nf_tables batch if fatal signal is pending, from Florian. 5) Use .call_rcu nfnetlink from nf_tables to make dumps fully lockless. From Florian Westphal. 6) Support to match transparent sockets from nf_tables, from Máté Eckl. 7) Audit support for nf_tables, from Phil Sutter. 8) Validate chain dependencies from commit phase, fall back to fine grain validation only in case of errors. 9) Attach dst to skbuff from netfilter flowtable packet path, from Jason A. Donenfeld. 10) Use artificial maximum attribute cap to remove VLA from nfnetlink. Patch from Kees Cook. 11) Add extension to allow to forward packets through neighbour layer. 12) Add IPv6 conntrack helper support to IPVS, from Julian Anastasov. 13) Add IPv6 FTP conntrack support to IPVS, from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-02Merge tag 'mlx5e-updates-2018-06-01' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-06-01 1) From Tariq, Two patches to Fix IPoIB issues introduced in "net/mlx5e: TX, Use actual WQE size for SQ edge fill" 2) From Eran, Additional improvements to mlx5e statistics reporting 3) From Maor, Increase aRFS flow tables size 4) From Adi, Support MTU change for ethernet representors 5) From Ilan and Adi, Handle QP error events in FPGA 6) From Tariq, last 10 patches mainly deals with RX buffer scheme improvements for legacy RQ to use only order-0 pages and fragmented SKBs for large MTUs. - Tariq starts with some refactoring and removing HW LRO support from traditional (legacy) RQ, since it complicates the buffer scheme and removing it makes it smoother to move to cyclic descriptor buffer for traditional RQ. - Use cyclic WQ in legacy RQ, which has many benefits and paves the way for fragmented SKBs for large MTUs. - Enhance legacy Receive Queue memory scheme, such that only order-0 pages are used. Whenever possible, prefer using a linear SKB, and build it wrapping the WQE buffer. Otherwise (for example, jumbo frames on x86), use non-linear SKB, with as many frags as needed. In this case, multiple WQE scatter entries are used, up to a maximum of 4 frags and 10KB of MTU. - TX statistics access improvements. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-06-02 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) BPF uapi fix in struct bpf_prog_info and struct bpf_map_info in order to fix offsets on 32 bit archs. This will have a minor merge conflict with net-next which has the __u32 gpl_compatible:1 bitfield in struct bpf_prog_info at this location. Resolution is to use the gpl_compatible member. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01bpf: fix uapi hole for 32 bit compat applicationsDaniel Borkmann
In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the case of struct bpf_map_info but also struct bpf_prog_info. In net-next commit b85fab0e67b ("bpf: Add gpl_compatible flag to struct bpf_prog_info") added a bitfield into it to expose some flags related to programs. Thus, add an unnamed __u32 bitfield for both so that alignment keeps the same in both 32 and 64 bit cases, and can be naturally extended from there as in b85fab0e67b. Before: # file test.o test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped # pahole test.o struct bpf_map_info { __u32 type; /* 0 4 */ __u32 id; /* 4 4 */ __u32 key_size; /* 8 4 */ __u32 value_size; /* 12 4 */ __u32 max_entries; /* 16 4 */ __u32 map_flags; /* 20 4 */ char name[16]; /* 24 16 */ __u32 ifindex; /* 40 4 */ __u64 netns_dev; /* 44 8 */ __u64 netns_ino; /* 52 8 */ /* size: 64, cachelines: 1, members: 10 */ /* padding: 4 */ }; After (same as on 64 bit): # file test.o test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped # pahole test.o struct bpf_map_info { __u32 type; /* 0 4 */ __u32 id; /* 4 4 */ __u32 key_size; /* 8 4 */ __u32 value_size; /* 12 4 */ __u32 max_entries; /* 16 4 */ __u32 map_flags; /* 20 4 */ char name[16]; /* 24 16 */ __u32 ifindex; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ __u64 netns_dev; /* 48 8 */ __u64 netns_ino; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ /* size: 64, cachelines: 1, members: 10 */ /* sum members: 60, holes: 1, sum holes: 4 */ }; Reported-by: Dmitry V. Levin <ldv@altlinux.org> Reported-by: Eugene Syromiatnikov <esyr@redhat.com> Fixes: 52775b33bb507 ("bpf: offload: report device information about offloaded maps") Fixes: 675fc275a3a2d ("bpf: offload: report device information for offloaded programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-01net/mlx5e: TX, Separate cachelines of xmit and completion statsTariq Toukan
Avoid false sharing of cachelines by separating the cachelines of TX stats that are dertied in xmit flow and in completion flow. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Always prefer Linear SKB configurationTariq Toukan
Prefer the linear SKB configuration of Legacy RQ over the non-linear one of Striding RQ. This implies that ConnectX-4 LX now uses legacy RQ by default, as it does not support the linear configuration of Striding RQ. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Enhance legacy Receive Queue memory schemeTariq Toukan
Enhance the memory scheme of the legacy RQ, such that only order-0 pages are used. Whenever possible, prefer using a linear SKB, and build it wrapping the WQE buffer. Otherwise (for example, jumbo frames on x86), use non-linear SKB, with as many frags as needed. In this case, multiple WQE scatter entries are used, up to a maximum of 4 frags and 10KB of MTU. This implied to remove support of HW LRO in legacy RQ, as it would require large number of page allocations and scatter entries per WQE on archs with PAGE_SIZE = 4KB, yielding bad performance. In earlier patches, we guaranteed that all completions are in-order, and that we use a cyclic WQ. This creates an oppurtunity for a performance optimization: The mapping between a "struct mlx5e_dma_info", and the WQEs (struct mlx5e_wqe_frag_info) pointing to it, is constant across different cycles of a WQ. This allows initializing the mapping in the time of RQ creation, and not handle it in datapath. A struct mlx5e_dma_info that is shared between different WQEs is allocated by the first WQE, and freed by the last one. This implies an important requirement: WQEs that share the same struct mlx5e_dma_info must be posted within the same NAPI. Otherwise, upon completion, struct mlx5e_wqe_frag_info would mistakenly point to the new struct mlx5e_dma_info, not the one that was posted (and the HW wrote to). This bulking requirement is actually good also for performance reasons, hence we extend the bulk beyong the minimal requirement above. With this memory scheme, the RQs memory footprint is reduce by a factor of 2 on x86, and by a factor of 32 on PowerPC. Same factors apply for the number of pages in a GRO session. Performance tests: ConnectX-4, single core, single RX ring, default MTU. x86: CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Packet rate (early drop in TC): no degradation TCP streams: ~5% improvement PowerPC: CPU: POWER8 (raw), altivec supported Packet rate (early drop in TC): 20% gain TCP streams: 25% gain Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Use cyclic WQ in legacy RQTariq Toukan
Now that LRO is not supported for Legacy RQ, there is no source of out-of-order completions in the WQ, and we can use a cyclic one. This has multiple advantages: - reduces the WQE size (smaller PCI transactions). - lower overhead in datapath (no handling of 'next' pointers). - no reserved WQE for the WQ head (was need in linked-list). - allows using a constant map between frag and dma_info struct, in downstream patch. Performance tests: ConnectX-4, single core, single RX ring. Major gain in packet rate of single ring XDP drop. Bottleneck is shifted form HW (at 16Mpps) to SW (at 20Mpps). Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Split WQ objects for different RQ typesTariq Toukan
Replace the common RQ WQ object with two separate ones for the different RQ types. This is in preparation for switching to using a cyclic WQ type in Legacy RQ. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Remove HW LRO support in legacy RQTariq Toukan
Current LRO implementation in Legacy RQ uses high-order pages. In downstream patches of this series we complete the transition to using only order-0 pages in RX datapath (which was already done in Striding RQ). Unlike the more advanced Striding RQ, Legacy RQ does not make reuse of any non-consumed buffers of non-full LRO sessions, and combining it with order-0 pages has many performance drawbacks. Hence, here we totally remove LRO support in Legacy RQ. This guarantees having no out-of-order completions, which allows using a cyclic work queue (instead of a linked-list) in a downstream patch. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Dedicate a function for copying SKB headerTariq Toukan
Get the logic of copying the packet header into the SKB linear part into a generic function. Function does copy length alignment and dma buffer sync. It is currently called only within the MPWQE flow. In a downstream patch, it will be called within the legacy RQ flow as well. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Generalise function of SKB frag additionTariq Toukan
Rename it and pass truesize as an extra argument, as it will be used also in Legacy RQ in a downstream patch. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Generalise name of non-linear SKB head sizeTariq Toukan
Make name more generic by dropping MPWRQ from it, as it will be used also in Legacy RQ in a downstream patch. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: TX, Obsolete maintaining local copies of skb->len/dataTariq Toukan
Instead of maintaining a local copy of skb->len/data and updating it upon every copy to the WQE inline part, just calculate it once when needed, using the ihs. This obsoletes the function mlx5e_tx_skb_pull_inline. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5: FPGA, Handle QP error eventIlan Tayari
Add handlers for this event to perform graceful teardown of the device. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Adi Nissim <adin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: Support configurable MTU for vport representorsAdi Nissim
The representor MTU was hard coded to 1500 bytes. Allow setting arbitrary MTU values up to the max supported by the FW. Signed-off-by: Adi Nissim <adin@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: Increase aRFS flow tables sizeMaor Gottlieb
Increase the aRFS flow table size to 64k so it could contain up to 64k different streams. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: Remove redundant active_channels indicationEran Ben Elisha
Now, when all channels stats are saved regardless of the channel's state {open, closed}, we can safely remove this indication and the stats spin lock which protects it. Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: Present SW stats when state is not openedEran Ben Elisha
The driver can present all SW stats even when the state not opened. Fixed get strings, count and stats to support it. In addition, fix tc2txq to hold a static mapping which doesn't depend on the amount of open channels, and cannot have the same value on two different cells while moving between configurations. Example: - OOB 16 channels - Change to 2 channels, 8 TCs - tc2txq[15][0] == tc2txq[1][7] == 15 This will cause multiple appearances of the same TX index in statistics output. Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>