summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-12-08net: hns: optimize XGE capability by reducing cpu usageyankejian
here is the patch raising the performance of XGE by: 1)changes the way page management method for enet momery, and 2)reduces the count of rmb, and 3)adds Memory prefetching Signed-off-by: Kejian Yan <yankejian@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08sock, cgroup: add sock->sk_cgroupTejun Heo
In cgroup v1, dealing with cgroup membership was difficult because the number of membership associations was unbound. As a result, cgroup v1 grew several controllers whose primary purpose is either tagging membership or pull in configuration knobs from other subsystems so that cgroup membership test can be avoided. net_cls and net_prio controllers are examples of the latter. They allow configuring network-specific attributes from cgroup side so that network subsystem can avoid testing cgroup membership; unfortunately, these are not only cumbersome but also problematic. Both net_cls and net_prio aren't properly hierarchical. Both inherit configuration from the parent on creation but there's no interaction afterwards. An ancestor doesn't restrict the behavior in its subtree in anyway and configuration changes aren't propagated downwards. Especially when combined with cgroup delegation, this is problematic because delegatees can mess up whatever network configuration implemented at the system level. net_prio would allow the delegatees to set whatever priority value regardless of CAP_NET_ADMIN and net_cls the same for classid. While it is possible to solve these issues from controller side by implementing hierarchical allowable ranges in both controllers, it would involve quite a bit of complexity in the controllers and further obfuscate network configuration as it becomes even more difficult to tell what's actually being configured looking from the network side. While not much can be done for v1 at this point, as membership handling is sane on cgroup v2, it'd be better to make cgroup matching behave like other network matches and classifiers than introducing further complications. In preparation, this patch updates sock->sk_cgrp_data handling so that it points to the v2 cgroup that sock was created in until either net_prio or net_cls is used. Once either of the two is used, sock->sk_cgrp_data reverts to its previous role of carrying prioidx and classid. This is to avoid adding yet another cgroup related field to struct sock. As the mode switching can happen at most once per boot, the switching mechanism is aimed at lowering hot path overhead. It may leak a finite, likely small, number of cgroup refs and report spurious prioidx or classid on switching; however, dynamic updates of prioidx and classid have always been racy and lossy - socks between creation and fd installation are never updated, config changes don't update existing sockets at all, and prioidx may index with dead and recycled cgroup IDs. Non-critical inaccuracies from small race windows won't make any noticeable difference. This patch doesn't make use of the pointer yet. The following patch will implement netfilter match for cgroup2 membership. v2: Use sock_cgroup_data to avoid inflating struct sock w/ another cgroup specific field. v3: Add comments explaining why sock_data_prioidx() and sock_data_classid() use different fallback values. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Daniel Wagner <daniel.wagner@bmw-carit.de> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a structTejun Heo
Introduce sock->sk_cgrp_data which is a struct sock_cgroup_data. ->sk_cgroup_prioidx and ->sk_classid are moved into it. The struct and its accessors are defined in cgroup-defs.h. This is to prepare for overloading the fields with a cgroup pointer. This patch mostly performs equivalent conversions but the followings are noteworthy. * Equality test before updating classid is removed from sock_update_classid(). This shouldn't make any noticeable difference and a similar test will be implemented on the helper side later. * sock_update_netprioidx() now takes struct sock_cgroup_data and can be moved to netprio_cgroup.h without causing include dependency loop. Moved. * The dummy version of sock_update_netprioidx() converted to a static inline function while at it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08netprio_cgroup: limit the maximum css->id to USHRT_MAXTejun Heo
netprio builds per-netdev contiguous priomap array which is indexed by css->id. The array is allocated using kzalloc() effectively limiting the maximum ID supported to some thousand range. This patch caps the maximum supported css->id to USHRT_MAX which should be way above what is actually useable. This allows reducing sock->sk_cgrp_prioidx to u16 from u32. The freed up part will be used to overload the cgroup related fields. sock->sk_cgrp_prioidx's position is swapped with sk_mark so that the two cgroup related fields are adjacent. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Daniel Borkmann <daniel@iogearbox.net> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08Merge branch 'for-4.5-ancestor-test' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Preparatory changes for some new socket cgroup infrastructure and netfilter targets. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08Revert "Merge branch 'vsock-virtio'"Stefan Hajnoczi
This reverts commit 0d76d6e8b2507983a2cae4c09880798079007421 and merge commit c402293bd76fbc93e52ef8c0947ab81eea3ae019, reversing changes made to c89359a42e2a49656451569c382eed63e781153c. The virtio-vsock device specification is not finalized yet. Michael Tsirkin voiced concerned about merging this code when the hardware interface (and possibly the userspace interface) could still change. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08Merge branch 'sh_eth-optimize-mdio'David S. Miller
Sergei Shtylyov says: ==================== sh_eth: optimize MDIO code Here's a set of 3 patches against DaveM's 'net-next.git' repo which gets rid of ~35 LoCs in the MDIO bitbang methods. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08sh_eth: get rid of bb_{set|clr|read}()Sergei Shtylyov
After the MDIO bitbang code consolidation, there's no need anymore for bb_{set|clr}() as well as bb_read() -- just expand them inline, thus saving more LoCs... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08sh_eth: factor out common code from MDIO bitbang methodsSergei Shtylyov
sh_mm[cd]_ctrl() and sh_set_mdio() all look mostly the same -- factor out their common code and put it into sh_mdio_ctrl(). Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08sh_eth: remove mask fields from 'struct bb_info'Sergei Shtylyov
The MDIO control bits are always mapped to the same bits of the same register (PIR), so there's no need to store their masks in the 'struct bb_info'... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08drivers: net: xgene: constify xgene_mac_ops and xgene_port_ops structuresJulia Lawall
The xgene_mac_ops and xgene_port_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08Merge tag 'wireless-drivers-next-for-davem-2015-12-07' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Vallo says: ==================== brcfmac * support bcm4359 which can operate in two bands concurrently * disable runtime pm for USB avoiding issues * use generic pm callback in PCIe driver * support wowlan wake indication reporting * add beamforming support * unified handling of firmware files ath10k * support Manegement Frame Protection (MFP) * add thermal throttling support for 10.4 firmware * add support for pktlog in QCA99X0 * add debugfs file to enable Bluetooth coexistence feature * use firmware's native mesh interface type instead of raw mode iwlwifi * BT coex improvements * D3 operation bugfixes * rate control improvements * firmware debugging infra improvements * ground work for multi Rx * various security fixes ==================== Conflicts: drivers/net/wireless/ath/ath10k/pci.c The conflict resolution at: http://article.gmane.org/gmane.linux.kernel.next/37391 by Stephen Rothwell was used. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08net: Fix inverted test in __skb_recv_datagramRainer Weikusat
As the kernel generally uses negated error numbers, *err needs to be compared with -EAGAIN (d'oh). Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: ea3793ee29d3 ("core: enable more fine-grained datagram reception control") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08ath9k: remove ath9k_mod_tsf64_tuJanusz Dziedzic
Remove ath9k_mod_tsf64_tu() function while we could use div_u64_rem() function. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath9k: MCC, print time elapsed between eventsJanusz Dziedzic
This is useful for MCC debugging and bug fixing. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath9k: MCC, add NOA also in case of an APJanusz Dziedzic
In case of MCC and AP interface, add also NOA attr that will inform stations about absence of an AP. There is a chance that some stations will handle this NOA attr correctly and will know exactly when AP is present/absent. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath9k: P2P_CLIENT, get/set NOA correctlyJanusz Dziedzic
In case we get BSS_CHANGED_P2P_PS early, from mac80211, we didn't set NOA timer correctly, while p2p_ps_vif was NULL. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath9k: MCC enable Opportunistic Power SaveJanusz Dziedzic
When adding NOA attr enable Opportunistic Power Save. Before we calculate ctwindow but didn't enable oppps. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath9k: setup correct skb priority for nullfuncJanusz Dziedzic
After queue nullfunc for MCC case, we hit WARN_ON in xmit.c:2398 while skb priority wasn't set. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath9k: use u32 when calculate tsfJanusz Dziedzic
Use u32 while ath9k_hw_gettsf32() and ath9k_hw_gen_timer_start() require u32. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath9k: P2P_CLIENT, send frames after 1ms AP/GO will aprearJanusz Dziedzic
AP/GO will aprear after NOA, wait 1ms to be sure AP could receive/answer this frames. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath9k: queue null frames in case of MCCJanusz Dziedzic
While mac80211 using null frames when connection polling, we should queue this frames while NOA could be there, and AP, P2P_GO could be not present. Without this patch, with no traffic we often saw disconnections while we try to send nullfunc when AP/GO wasn't present. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath9k: print real timer valueJanusz Dziedzic
In case of low HZ before this patch we saw wrong values in debug message. Print real timeout value. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath9k: add debug messages to aggr/chanctx funcsJanusz Dziedzic
Add/extend debug messages when chanctx used. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08wil6210: prevent external wmi commands during suspend flowMaya Erez
In __wmi_send we check if fw is ready at the beginning of the function. While we wait for the completion of the previous command, system suspend can be invoked and reset the HW, causing __wmi_send to read from HW registers while it is not ready. Taking the wmi_mutex in the reset flow when setting the FW ready bit to zero will prevent the above race condition. Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath5k: fix RTS/CTS by using proper rate flagsBob Copeland
The rates in the tx control rateset do not have the protection flags applied, so RTS/CTS would never get enabled if requested. Fix by using the rate flags in the rates returned by ieee80211_get_tx_rates(). Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath6kl: Don't print error message when recv is canceledSteve deRosier
An error message ath6kl_htc_rxmsg_pending_handler isn't appropate for when the error is ECANCELED. This could be the result of a perfectly appropriate RX cancel due to shutdown or suspend. This allows the right cleanup to continue, but without an alarming error message in this particular case. Signed-off-by: Steve deRosier <steve.derosier@lairdtech.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath9k: Simplify and fix eeprom endianness swappingMartin Blumenstingl
The three eeprom implementations had quite some duplicate code when it came to endianness swapping. Additionally there was a bug in eeprom_4k and eeprom_9287 which prevented the endianness swapping from working correctly, because the swapping code was guarded within an "if (!ath9k_hw_use_flash(ah))". In eeprom_def this check did not exist, so it seems that eeprom_def was the only implementation where endianness swapping worked. This patch takes the duplicate code and moves it from eeprom_* to eeprom.c. The new code is derived from eeprom_def, while taking into account the specifics from the other implementations. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath10k: do not use coherent memory for allocated device memory chunksFelix Fietkau
Coherent memory is more expensive to allocate (and constrained on some architectures where it has to be pre-allocated). It is also completely unnecessary, since the host has no reason to even access these allocated memory spaces Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-08ath10k: remove unnecessary amsdu/ampdu assignment in debugfsMohammed Shafi Shajakhan
The default values of max_num_amsdu / max_num_amdpu is assigned a default value as part of 'ath10k_core_init_firmware_features' Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-12-07cxgb3: Convert simple_strtoul to kstrtoxLABBE Corentin
the simple_strtoul function is obsolete. This patch replace it by kstrtox. Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07Merge branch 'more-dsa-unbinding-fixes'David S. Miller
Neil Armstrong says: ==================== Further fix for dsa unbinding This series fixes further issues for DSA dynamic unbinding. The first patch completely removes the PHY link state polling. The two following cleans up the dsa state upon removal. The last patch moves slave destroy code as slave function and adds missing netdev and phy cleanup calls. v1: http://lkml.kernel.org/r/562F8ECB.6050709@baylibre.com v2: http://lkml.kernel.org/r/56321D9A.8010109@baylibre.com remove phy fix and add missing calls in dsa_switch_destroy then add dedicated dsa_slave_destroy v3: remove polling instead of fixing it, make single patch for dsa slave destroy ==================== Acked-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07net: dsa: move dsa slave destroy code to slave.cNeil Armstrong
Move dsa slave dedicated code from dsa_switch_destroy to a new dsa_slave_destroy function in slave.c. Add the netif_carrier_off and phy_disconnect calls in order to correctly cleanup the netdev state and PHY state machine. Signed-off-by: Frode Isaksen <fisaksen@baylibre.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07net: dsa: Add missing master netdev dev_put() callsNeil Armstrong
Upon probe failure or unbinding, add missing dev_put() calls. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07net: dsa: cleanup resources upon module removalNeil Armstrong
Make sure that we unassign the master_netdev dsa_ptr to make the packet processing go through the regular Ethernet receive path. Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07net: dsa: remove DSA link pollingNeil Armstrong
Since no more DSA driver uses the polling callback, and since the phylib handles the link detection, remove the link polling work and timer code. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07Merge tag 'mac80211-next-for-davem-2015-12-07' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== This pull request got a bit bigger than I wanted, due to needing to reshuffle and fix some bugs. I merged mac80211 to get the right base for some of these changes. * new mac80211 API for upcoming driver changes: EOSP handling, key iteration * scan abort changes allowing to cancel an ongoing scan * VHT IBSS 80+80 MHz support * re-enable full AP client state tracking after fixes * various small fixes (that weren't relevant for mac80211) * various cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07Merge branch 'thunderx-cleanups'David S. Miller
Sunil Goutham says: ==================== net: thunderx: Miscellaneous cleanups This patch series contains contains couple of cleanup patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07net, thunderx: Remove unnecessary rcv buffer start address managementSunil Goutham
Since we have moved on to using allocated pages to carve receive buffers instead of netdev_alloc_skb() there is no need to store any pointers for later retrieval. Earlier we had to store skb and skb->data pointers which later are used to handover received packet to network stack. This will avoid an unnecessary cache miss as well. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07net: thunderx: nicvf_queues: nivc_*_intr: remove duplicationYury Norov
The same switch-case repeates for nivc_*_intr functions. In this patch it is moved to a helper nicvf_int_type_to_mask(). By the way: - Unneeded write to NICVF register dropped if int_type is unknown. - netdev_dbg() is used instead of netdev_err(). Signed-off-by: Yury Norov <yury.norov@auriga.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Acked-by: Vadim Lomovtsev <Vadim.Lomovtsev@caiumnetworks.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07mac80211: handle HW ROC expired properlyIlan Peer
In case of HW ROC, when the driver reports that the ROC expired, it is not sufficient to purge the ROCs based on the remaining time, as it possible that the device finished the ROC session before the actual requested duration. To handle such cases, in case of ROC expired notification from the driver, complete all the ROCs which are marked with hw_begun, regardless of the remaining duration. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-06af_unix: fix unix_dgram_recvmsg entry lockingRainer Weikusat
The current unix_dgram_recvsmg code acquires the u->readlock mutex in order to protect access to the peek offset prior to calling __skb_recv_datagram for actually receiving data. This implies that a blocking reader will go to sleep with this mutex held if there's presently no data to return to userspace. Two non-desirable side effects of this are that a later non-blocking read call on the same socket will block on the ->readlock mutex until the earlier blocking call releases it (or the readers is interrupted) and that later blocking read calls will wait longer than the effective socket read timeout says they should: The timeout will only start 'ticking' once such a reader hits the schedule_timeout in wait_for_more_packets (core.c) while the time it already had to wait until it could acquire the mutex is unaccounted for. The patch avoids both by using the __skb_try_recv_datagram and __skb_wait_for_more packets functions created by the first patch to implement a unix_dgram_recvmsg read loop which releases the readlock mutex prior to going to sleep and reacquires it as needed afterwards. Non-blocking readers will thus immediately return with -EAGAIN if there's no data available regardless of any concurrent blocking readers and all blocking readers will end up sleeping via schedule_timeout, thus honouring the configured socket receive timeout. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-06core: enable more fine-grained datagram reception controlRainer Weikusat
The __skb_recv_datagram routine in core/ datagram.c provides a general skb reception factility supposed to be utilized by protocol modules providing datagram sockets. It encompasses both the actual recvmsg code and a surrounding 'sleep until data is available' loop. This is inconvenient if a protocol module has to use additional locking in order to maintain some per-socket state the generic datagram socket code is unaware of (as the af_unix code does). The patch below moves the recvmsg proper code into a new __skb_try_recv_datagram routine which doesn't sleep and renames wait_for_more_packets to __skb_wait_for_more_packets, both routines being exported interfaces. The original __skb_recv_datagram routine is reimplemented on top of these two functions such that its user-visible behaviour remains unchanged. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-06PHY: DP83867: Remove looking in parent device for OF propertiesAndrew Lunn
Device tree properties for a phy device are expected to be in the phy node. The current code for the DP83867 also tries to look in the parent node. The devices binding documentation does not mention this, no current device tree file makes use of this, and it is not behaviour we want. So remove looking in the parent device. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-06net: cdc_ncm: add "ndp_to_end" sysfs attributeBjørn Mork
Adding a writable sysfs attribute for the "NDP to end" quirk flag. This makes it easier for end users to test new devices for this firmware bug. We've been lucky so far, but we should not depend on reporters capable of rebuilding the driver. Cc: Enrico Mioso <mrkiko.rs@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-06Merge branch 'mlx4-HA-LAG-SRIOV-VF'David S. Miller
Or Gerlitz says: ==================== Add HA and LAG support for mlx4 SRIOV VFs This series is built upon the code added in commit ce388ff "Merge branch 'mlx4-next'" which added HA and LAG support to mlx4 RoCE and SRIOV services. We add HA and Link Aggregation support to single ported mlx4 Ethernet VFs. In this case, the PF Ethernet interfaces are bonded, the VFs see single port HW devices (already supported) -- however, now this port is highly available. This means that all VF HW QPs (both VF Ethernet driver and VF RoCE / RAW QPs) are subject to the V2P (Virtual-To-Physical) mapping which is managed by the PF driver, and hence resilient across link failures and such events. When bonding operates in Dynamic link aggregation (802.3ad) mode, traffic from each VF will go over the VF "base port" (the port this VF is assigned to by the admin) as long as this port is up. When the port fails, traffic from all VFs that are defined on this port will move to the other port, and be back to their base-port when it recovers. Moni and Or. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-06net/mlx4_core: Support the HA mode for SRIOV VFs tooMoni Shoua
When the mlx4 driver runs in HA mode, and all VFs are single ported ones, we make their single port Highly-Available. This is done by taking advantage of the HA mode properties (following bonding changes with programming the port V2P map, etc) and adding the missing parts which are unique to SRIOV such as mirroring VF steering rules on both ports. Due to limits on the MAC and VLAN table this mode is enabled only when number of total VFs is under 64. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-06IB/mlx4: Use the VF base-port when demuxing mad from wireOr Gerlitz
Under HA mode, it's possible that the VF registered its GID (and expects to get mads through the PV scheme) on a port which is different from the one this mad arrived on, due to HA fail over. Therefore, if the gid is not matched on the port that the packet arrived on, check for a match on the other port if HA mode is active -- and if a match is found on the other port, continue processing the mad using that other port. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-06net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA modeMoni Shoua
Due to HW limitations, indexes to MAC and VLAN tables are always taken from the table of the actual port. So, if a resource holds an index to a table, it may refer to different values during the lifetime of the resource, unless the tables are mirrored. Also, even when driver is not in HA mode the policy of allocating an index to these tables is such to make sure, as much as possible, that when the time comes the mirroring will be successful. This means that in multifunction mode the allocation of a free index in a port's table tries to make sure that the same index in the other's port table is also free. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-06net/mlx4_core: Support mirroring VF DMFS rules on both portsMoni Shoua
Under HA mode, steering rules set by VFs should be mirrored on both ports of the device so packets will be accepted no matter on which port they arrived. Since getting into HA mode is done dynamically when the user bonds mlx4 Ethernet netdevs, we keep hold of the VF DMFS rule mbox with the port value flipped (1->2,2->1) and execute the mirroring when getting into HA mode. Later, when going out of HA mode, we unset the mirrored rules. In that context note that mirrored rules cannot be removed explicitly. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>