summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-10-11net: mscc: allow extracting the FCS into the skbAntoine Tenart
This patch adds support for the NETIF_F_RXFCS feature in the Mscc Ethernet driver. This feature is disabled by default and allow a user to request the driver not to drop the FCS and to extract it into the skb for debugging purposes. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11mac80211: Extend SAE authentication in infra BSS STA modeJouni Malinen
Previous implementation of SAE authentication in infrastructure BSS was somewhat restricting and not exactly clean way of handling the two auth() operations. This ended up removing and re-adding the STA entry for the AP in the middle of authentication and also messing up authentication state tracking through the sequence of four Authentication frames. Furthermore, this did not work if the AP ended up sending out SAE Confirm (auth trans #2) immediately after SAE Commit (auth trans #1) before the station had time to transmit its SAE Confirm. Clean up authentication state handling for the SAE case to allow two rounds of auth() calls without dropping all state between those operations. Track peer Confirmed status and mark authentication completed only once both ends have confirmed. ieee80211_mgd_auth() check for EBUSY cases is now handling only the pending association (ifmgd->assoc_data) while all pending authentication (ifmgd->auth_data) cases are allowed to proceed to allow user space to start a new connection attempt from scratch even if the previously requested authentication is still waiting completion. This is needed to avoid making SAE error cases with retries take excessive amount of time with no means for the user space to stop that (apart from setting the netdev down). As an extra bonus, the end of ieee80211_rx_mgmt_auth() can be cleaned up to avoid the extra copy of the cfg80211_rx_mlme_mgmt() call for ongoing SAE authentication since the new ieee80211_mark_sta_auth() helper function can handle both completion of authentication and updates to the STA entry under the same condition and there is no need to return from the function between those operations. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: Move ieee80211_mgd_auth() EBUSY check to be before allocationJouni Malinen
This makes it easier to conditionally replace full allocation of auth_data to use reallocation for the case of continuing SAE authentication. Furthermore, there was not really any point in having this check done so late in the function after having already completed number of steps that cannot be used anyway in the error case. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: Helper function for marking STA authenticatedJouni Malinen
Authentication exchange can be completed in both TX and RX paths for SAE, so move this common functionality into a helper function to avoid having to implement practically the same operations in two places when extending SAE implementation in the following commits. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: rc80211_minstrel: remove variance / stddev calculationFelix Fietkau
When there are few packets (e.g. for sampling attempts), the exponentially weighted variance is usually vastly overestimated, making the resulting data essentially useless. As far as I know, there has not been any practical use for this, so let's not waste any cycles on it. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: do not sample rates 3 times slower than max_prob_rateFelix Fietkau
These rates are highly unlikely to be used quickly, even if the link deteriorates rapidly. This improves throughput in cases where CCK rates are not reliable enough to be skipped entirely during sampling. Sampling these rates regularly can cost a lot of airtime. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: fix sampling/reporting of CCK rates in HT modeFelix Fietkau
Long/short preamble selection cannot be sampled separately, since it depends on the BSS state. Because of that, sampling attempts to currently not used preamble modes are not counted in the statistics, which leads to CCK rates being sampled too often. Fix statistics accounting for long/short preamble by increasing the index where necessary. Fix excessive CCK rate sampling by dropping unsupported sample attempts. This improves throughput on 2.4 GHz channels Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: fix CCK rate group streams valueFelix Fietkau
Fixes a harmless underflow issue when CCK rates are actively being used Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: fix using short preamble CCK rates on HT clientsFelix Fietkau
mi->supported[MINSTREL_CCK_GROUP] needs to be updated short preamble rates need to be marked as supported regardless of whether it's currently enabled. Its state can change at any time without a rate_update call. Fixes: 782dda00ab8e ("mac80211: minstrel_ht: move short preamble check out of get_rate") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: reduce minstrel_mcs_groups sizeFelix Fietkau
By storing a shift value for all duration values of a group, we can reduce precision by a neglegible amount to make it fit into a u16 value. This improves cache footprint and reduces size: Before: text data bss dec hex filename 10024 116 0 10140 279c rc80211_minstrel_ht.o After: text data bss dec hex filename 9368 116 0 9484 250c rc80211_minstrel_ht.o Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: merge with minstrel_ht, always enable VHT supportFelix Fietkau
Legacy-only devices are not very common and the overhead of the extra code for HT and VHT rates is not big enough to justify all those extra lines of code to make it optional. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: remove unnecessary debugfs cleanup codeFelix Fietkau
debugfs entries are cleaned up by debugfs_remove_recursive already. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: Enable STBC and LDPC for VHT RatesChaitanya T K
If peer support reception of STBC and LDPC, enable them for better performance. Signed-off-by: Chaitanya TK <chaitanya.mgit@gmail.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: avoid reflecting frames back to the clientJohannes Berg
I'm not really sure exactly _why_ I've been carrying a note for what's probably _years_ to check that we don't do this, but we clearly do reflect frames back to the station itself if it sends such. One way or the other, it's useless since the station doesn't really need the AP to talk to itself, so suppress it. While at it, clarify some of the logic by removing skb->data references in favour of the destination address (pointer) we already have separately. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11nl80211: use netlink policy validation function for elementsJohannes Berg
Instead of open-coding a lot of calls to is_valid_ie_attr(), add this validation directly to the policy, now that we can. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11nl80211: use policy range validation where applicableJohannes Berg
Many range checks can be done in the policy, move them there. A few in mesh are added in the code (taken out of the macros) because they don't fit into the s16 range in the policy validation. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11Merge branch 'x86-urgent-for-linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "x86 fixes An intel_rdt memory access fix and a VLA fix in pgd_alloc()." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Avoid VLA in pgd_alloc() x86/intel_rdt: Fix out-of-bounds memory access in CBM tests
2018-10-11Merge branch 'sched-urgent-for-linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "scheduler fix: Cleanup of dead code left over from the recent sched/numa fixes." * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mm, sched/numa: Remove remaining traces of NUMA rate-limiting
2018-10-11Merge branch 'perf-urgent-for-linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo, a man of few words, writes: "perf fixes: misc perf tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf record: Use unmapped IP for inline callchain cursors perf python: Use -Wno-redundant-decls to build with PYTHON=python3 perf report: Don't try to map ip to invalid map perf script python: Fix export-to-sqlite.py sample columns perf script python: Fix export-to-postgresql.py occasional failure
2018-10-10Merge branch 'hns3-next'David S. Miller
Salil Mehta says: ==================== Adds support of RSS to HNS3 Driver for Rev 2(=0x21) H/W This patch-set mainly adds new additions related to RSS for the new hardware Revision 0x21. It also adds support to use RSS hash value provided by the hardware along with descriptor. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: hns3: Add HW RSS hash information to RX skbPeng Li
Drivers should call skb_set_hash to set the hash and its type in an skbuff. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: hns3: Add RSS tuples support for VFJian Shen
This patch adds RSS tuple support for VF in revision 0x21. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: hns3: Add RSS general configuration support for VFJian Shen
This patch adds RSS key, hash algorithm configuration support for VF in revision 0x21. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: hns3: Add new RSS hash algorithm support for PFJian Shen
This patch adds ETH_RSS_HASH_XOR hash algorithm supports, which is supported by hw revision 0x21. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interfaceGiacinto Cifelli
Added support for Gemalto's Cinterion ALASxx WWAN interfaces by adding QMI_FIXED_INTF with Cinterion's VID and PID. Signed-off-by: Giacinto Cifelli <gciofono@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10tipc: queue socket protocol error messages into socket receive bufferParthasarathy Bhuvaragan
In tipc_sk_filter_rcv(), when we detect protocol messages with error we call tipc_sk_conn_proto_rcv() and let it reset the connection and notify the socket by calling sk->sk_state_change(). However, tipc_sk_filter_rcv() may have been called from the function tipc_backlog_rcv(), in which case the socket lock is held and the socket already awake. This means that the sk_state_change() call is ignored and the error notification lost. Now the receive queue will remain empty and the socket sleeps forever. In this commit, we convert the protocol message into a connection abort message and enqueue it into the socket's receive queue. By this addition to the above state change we cover all conditions. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10tipc: set link tolerance correctly in broadcast linkJon Maloy
In the patch referred to below we added link tolerance as an additional criteria for declaring broadcast transmission "stale" and resetting the affected links. However, the 'tolerance' field of the broadcast link is never set, and remains at zero. This renders the whole commit without the intended improving effect, but luckily also with no negative effect. In this commit we add the missing initialization. Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet retransmission") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10phy: phy-ocelot-serdes: fix return value check in serdes_probe()Wei Yongjun
In case of error, the function syscon_node_to_regmap() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: 51f6b410fc22 ("phy: add driver for Microsemi Ocelot SerDes muxing") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10Merge branch 'net-dsa-bcm_sf2-Couple-of-fixes'David S. Miller
Florian Fainelli says: ==================== net: dsa: bcm_sf2: Couple of fixes Here are two fixes for the bcm_sf2 driver that were found during testing unbind and analysing another issue during system suspend/resume. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: dsa: bcm_sf2: Call setup during switch resumeFlorian Fainelli
There is no reason to open code what the switch setup function does, in fact, because we just issued a switch reset, we would make all the register get their default values, including for instance, having unused port be enabled again and wasting power and leading to an inappropriate switch core clock being selected. Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: dsa: bcm_sf2: Fix unbind orderingFlorian Fainelli
The order in which we release resources is unfortunately leading to bus errors while dismantling the port. This is because we set priv->wol_ports_mask to 0 to tell bcm_sf2_sw_suspend() that it is now permissible to clock gate the switch. Later on, when dsa_slave_destroy() comes in from dsa_unregister_switch() and calls dsa_switch_ops::port_disable, we perform the same dismantling again, and this time we hit registers that are clock gated. Make sure that dsa_unregister_switch() is the first thing that happens, which takes care of releasing all user visible resources, then proceed with clock gating hardware. We still need to set priv->wol_ports_mask to 0 to make sure that an enabled port properly gets disabled in case it was previously used as part of Wake-on-LAN. Fixes: d9338023fb8e ("net: dsa: bcm_sf2: Make it a real platform device driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: sched: avoid writing on noop_qdiscEric Dumazet
While noop_qdisc.gso_skb and noop_qdisc.skb_bad_txq are not used in other places, it seems not correct to overwrite their fields in dev_init_scheduler_queue(). noop_qdisc is essentially a shared and read-only object, even if it is not marked as const because of some implementation detail. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net/mpls: Implement handler for strict data checking on dumpsDavid Ahern
Without CONFIG_INET enabled compiles fail with: net/mpls/af_mpls.o: In function `mpls_dump_routes': af_mpls.c:(.text+0xed0): undefined reference to `ip_valid_fib_dump_req' The preference is for MPLS to use the same handler as ipv4 and ipv6 to allow consistency when doing a dump for AF_UNSPEC which walks all address families invoking the route dump handler. If INET is disabled then fallback to an MPLS version which can be tighter on the data checks. Fixes: e8ba330ac0c5 ("rtnetlink: Update fib dumps for strict data checking") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10Merge branch 'net-ipv4-fixes-for-PMTU-when-link-MTU-changes'David S. Miller
Sabrina Dubroca says: ==================== net: ipv4: fixes for PMTU when link MTU changes The first patch adapts the changes that commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes") did in IPv6 to IPv4: lower PMTU when the first hop's MTU drops below it, and raise PMTU when the first hop was limiting PMTU discovery and its MTU is increased. The second patch fixes bugs introduced in commit d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") that only appear once the first patch is applied. Selftests for these cases were introduced in net-next commit e44e428f59e4 ("selftests: pmtu: add basic IPv4 and IPv6 PMTU tests") v2: add cover letter, and fix a few small things in patch 1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: ipv4: don't let PMTU updates increase route MTUSabrina Dubroca
When an MTU update with PMTU smaller than net.ipv4.route.min_pmtu is received, we must clamp its value. However, we can receive a PMTU exception with PMTU < old_mtu < ip_rt_min_pmtu, which would lead to an increase in PMTU. To fix this, take the smallest of the old MTU and ip_rt_min_pmtu. Before this patch, in case of an update, the exception's MTU would always change. Now, an exception can have only its lock flag updated, but not the MTU, so we need to add a check on locking to the following "is this exception getting updated, or close to expiring?" test. Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: ipv4: update fnhe_pmtu when first hop's MTU changesSabrina Dubroca
Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions"), exceptions get deprecated separately from cached routes. In particular, administrative changes don't clear PMTU anymore. As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes"), the PMTU discovered before the local MTU change can become stale: - if the local MTU is now lower than the PMTU, that PMTU is now incorrect - if the local MTU was the lowest value in the path, and is increased, we might discover a higher PMTU Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those cases. If the exception was locked, the discovered PMTU was smaller than the minimal accepted PMTU. In that case, if the new local MTU is smaller than the current PMTU, let PMTU discovery figure out if locking of the exception is still needed. To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU notifier. By the time the notifier is called, dev->mtu has been changed. This patch adds the old MTU as additional information in the notifier structure, and a new call_netdevice_notifiers_u32() function. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net/ipv6: stop leaking percpu memory in fib6 infoMike Rapoport
The fib6_info_alloc() function allocates percpu memory to hold per CPU pointers to rt6_info, but this memory is never freed. Fix it. Fixes: a64efe142f5e ("net/ipv6: introduce fib6_info struct and helpers") Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10Merge branch 'fore200e-DMA-cleanups-and-fixes'David S. Miller
Christoph Hellwig says: ==================== fore200e DMA cleanups and fixes The fore200e driver came up during some dma-related audits, so here is the fallout. Compile tested (x86 & sparc) only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10fore200e: check for dma mapping failuresChristoph Hellwig
The driver was lacking any handling for failures from the DMA mapping routines. With an iommu or swiotlb this can be fatal. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10fore200e: don't use GFP_DMAChristoph Hellwig
The driver properly uses the DMA mapping API, so it should not pointlessly dip into the GFP_DMA pool, which is only 16MB on x86. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10fore200e: devirtualize dma alloc callsChristoph Hellwig
There is no need for an indirection before calling the dma alloc routines now that we store a struct device in struct fore200e. Also remove the pointless GFP_ATOMIC for the sbus case, and fix the up the error handling by removing the 0 dma_addr test - some iommus can return 0 as a perfectly valid bus address. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10fore200e: devirtualize dma mapping callsChristoph Hellwig
There is no need for an indirection before calling the dma mapping routines now that we store a struct device in struct fore200e. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10fore200e: remove the align_size field of struct chunkChristoph Hellwig
There is no need for this field, as the only user of it can just use the local size variable instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10fore200e: store a struct device in struct fore200eChristoph Hellwig
This can be used much better than the untyped void pointer containing either a PCI or platform device. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10fore200e: simplify fore200e_bus usageChristoph Hellwig
There is no need to have a global array of the ops, instead PCI and sbus can have their own instances assigned in *_probe. Also switch to C99 initializers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: tun: remove useless codes of tun_automq_select_queueWang Li
Because the function __skb_get_hash_symmetric always returns non-zero. Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Wang Li <wangli39@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10virtio_net: ethtool tx napi configurationJason Wang
Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers. Interrupt moderation is currently not supported, so these accept and display the default settings of 0 usec and 1 frame. Toggle tx napi through setting tx-frames. So as to not interfere with possible future interrupt moderation, value 1 means tx napi while value 0 means not. Only allow the switching when device is down for simplicity. Link: https://patchwork.ozlabs.org/patch/948149/ Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10Merge branch 'nfp-flower-speed-up-stats-update-loop'David S. Miller
Jakub Kicinski says: ==================== nfp: flower: speed up stats update loop This set from Pieter improves performance of processing FW stats update notifications. The FW seems to send those at relatively high rate (roughly ten per second per flow), therefore if we want to approach the million flows mark we have to be very careful about our data structures. We tried rhashtable for stat updates, but according to our experiments rhashtable lookup on a u32 takes roughly 60ns on an Xeon E5-2670 v3. Which translate to a hard limit of 16M lookups per second on this CPU, and, according to perf record jhash and memcmp account for 60% of CPU usage on the core handling the updates. Given that our statistic IDs are already array indices, and considering each statistic is only 24B in size, we decided to forego the use of hashtables and use a directly indexed array. The CPU savings are considerable. With the recent improvements in TC core and with our own bottlenecks out of the way Pieter removes the artificial limit of 128 flows, and allows the driver to install as many flows as FW supports. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10nfp: flower: use host context count provided by firmwarePieter Jansen van Vuuren
Read the host context count symbols provided by firmware and use it to determine the number of allocated stats ids. Previously it won't be possible to offload more than 2^17 filter even if FW was able to do so. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10nfp: flower: use stats array instead of storing stats per flowPieter Jansen van Vuuren
Make use of an array stats instead of storing stats per flow which would require a hash lookup at critical times. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>