Age Commit message (Author)
2016-09-17 tcp: prepare skbs for better sack shifting (Eric Dumazet)
With large BDP TCP flows and lossy networks, it is very important to keep a low number of skbs in the write queue. RACK and SACK processing can perform a linear scan of it. We should avoid putting any payload in skb->head, so that SACK shifting can be done if needed. With this patch, we allow packing ~0.5 MB per skb instead of the 64KB initially cooked at tcp_sendmsg() time. This reduces the number of skbs in the write queue by a factor of eight. tcp_rack_detect_loss() likes this. We still allow payload in skb->head for the first skb put in the queue, to not impact RPC workloads. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 Merge tag 'wireless-drivers-next-for-davem-2016-09-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next (David S. Miller)
Kalle Valo says:
====================
wireless-drivers-next patches for 4.9

Major changes:

iwlwifi
* preparation for new a000 HW continues
* some DQA improvements
* add support for GMAC
* add support for 9460, 9270 and 9170 series

mwifiex
* support random MAC address for scanning
* add HT aggregation support for adhoc mode
* add custom regulatory domain support
* add manufacturing mode support via nl80211 testmode interface

bcma
* support BCM53573 series of wireless SoCs

bitfield.h
* add FIELD_PREP() and FIELD_GET() macros

mt7601u
* convert to use the new bitfield.h macros

brcmfmac
* add support for bcm4339 chip with modalias sdio:c00v02D0d4339

ath10k
* add nl80211 testmode support for 10.4 firmware
* hide kernel addresses from logs using %pK format specifier
* implement NAPI support
* enable peer stats by default

ath9k
* use ieee80211_tx_status_noskb where possible

wil6210
* extract firmware capabilities from the firmware file

ath6kl
* enable firmware crash dumps on the AR6004

ath-current is also merged to fix a conflict in ath10k.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
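The FIELD_PREP() and FIELD_GET() entry above refers to mask-based helpers for packing a value into a register field and extracting it again. The following is only a userspace sketch of that idea, assuming a contiguous non-zero 32-bit mask; `field_shift`, `field_prep` and `field_get` are hypothetical names for illustration, not the kernel macros, which do their checking at compile time.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the lowest set bit in mask; the mask must be non-zero. */
static unsigned int field_shift(uint32_t mask)
{
    unsigned int shift = 0;

    assert(mask != 0);
    while (!(mask & 1u)) {
        mask >>= 1;
        shift++;
    }
    return shift;
}

/* Move val into the field described by mask, discarding excess bits. */
static uint32_t field_prep(uint32_t mask, uint32_t val)
{
    return (val << field_shift(mask)) & mask;
}

/* Extract the field described by mask from reg, shifted down to bit 0. */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
    return (reg & mask) >> field_shift(mask);
}
```

With a mask of 0x00f0, `field_prep(0x00f0, 0x5)` yields 0x50 and `field_get(0x00f0, 0x1a5c)` yields 0x5, so a driver only has to name the mask once instead of carrying a separate shift constant.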
2016-09-17 Merge branch 'mlx5e-order-0' (David S. Miller)
Tariq Toukan says: ==================== mlx5e Order-0 pages for Striding RQ In this series, we refactor our Striding RQ receive-flow to always use fragmented WQEs (Work Queue Elements) using order-0 pages, omitting the flow that allocates and splits high-order pages which would fragment and deplete high-order pages in the system. The first patch gives a slight degradation, but opens the opportunity to use a simple page-cache mechanism of a fair size. The page-cache, implemented in patch 3, not only closes the performance gap but even gives a gain. In patch 2 we re-organize the code to better manage the calls for alloc/de-alloc pages in the RX flow. Series generated against net-next commit: bed806cb266e "Merge branch 'mlxsw-ethtool'" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 net/mlx5e: Implement RX mapped page cache for page recycle (Tariq Toukan)
Instead of reallocating and mapping pages for RX data-path, recycle already used pages in a per ring cache.

Performance tests: The following results were measured on a freshly booted system, giving optimal baseline performance, as high-order pages are yet to be fragmented and depleted. We ran pktgen single-stream benchmarks, with iptables-raw-drop:

Single stride, 64 bytes:
* 4,739,057 - baseline
* 4,749,550 - order0 no cache
* 4,786,899 - order0 with cache
1% gain

Larger packets, no page cross, 1024 bytes:
* 3,982,361 - baseline
* 3,845,682 - order0 no cache
* 4,127,852 - order0 with cache
3.7% gain

Larger packets, every 3rd packet crosses a page, 1500 bytes:
* 3,731,189 - baseline
* 3,579,414 - order0 no cache
* 3,931,708 - order0 with cache
5.4% gain

Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
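The per-ring recycle cache described above can be pictured as a small fixed-size ring of page pointers: released pages are put at the tail, and the allocation path first tries to take one from the head before falling back to the page allocator. This is a minimal sketch under assumptions, not the driver's code: `struct page_cache`, `cache_put`, `cache_get` and the size of 8 are hypothetical, and a real driver would also have to check that nobody else still holds a reference to a page before reusing it.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_CACHE_SIZE 8 /* hypothetical size; must be a power of two */

struct page_cache {
    void *pages[PAGE_CACHE_SIZE];
    unsigned int head; /* next slot to take a page from */
    unsigned int tail; /* next slot to store a page into */
};

/* Store a released page for later reuse; returns false if the cache is full. */
static bool cache_put(struct page_cache *c, void *page)
{
    unsigned int next = (c->tail + 1) & (PAGE_CACHE_SIZE - 1);

    if (next == c->head)
        return false; /* full: the caller frees the page normally */
    c->pages[c->tail] = page;
    c->tail = next;
    return true;
}

/* Take a recycled page, or NULL if the caller must allocate a new one. */
static void *cache_get(struct page_cache *c)
{
    void *page;

    if (c->head == c->tail)
        return NULL; /* empty */
    page = c->pages[c->head];
    c->head = (c->head + 1) & (PAGE_CACHE_SIZE - 1);
    return page;
}
```

One slot is deliberately left unused so that a full ring and an empty ring can be told apart without a separate counter.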
2016-09-17 net/mlx5e: Introduce API for RX mapped pages (Tariq Toukan)
Manage the allocation and deallocation of mapped RX pages only through dedicated API functions. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 net/mlx5e: Single flow order-0 pages for Striding RQ (Tariq Toukan)
To improve the memory consumption scheme, we omit the flow that demands and splits high-order pages in Striding RQ, and stay with a single Striding RQ flow that uses order-0 pages.

Moving to fragmented memory allows the use of larger MPWQEs, which reduces the number of UMR posts and filler CQEs.

Moving to a single flow allows several optimizations that improve performance, especially in production servers where we would fall back to order-0 allocations anyway:
- inline functions that were called via function pointers.
- improve the UMR post process.

This patch alone is expected to give a slight performance reduction. However, the new memory scheme makes it possible to use a page-cache of a fair size that doesn't inflate the memory footprint, which will more than make up for the reduction and even give a performance gain.

Performance tests: The following results were measured on a freshly booted system, giving optimal baseline performance, as high-order pages are yet to be fragmented and depleted. We ran pktgen single-stream benchmarks, with iptables-raw-drop:

Single stride, 64 bytes:
* 4,739,057 - baseline
* 4,749,550 - this patch
no reduction

Larger packets, no page cross, 1024 bytes:
* 3,982,361 - baseline
* 3,845,682 - this patch
3.5% reduction

Larger packets, every 3rd packet crosses a page, 1500 bytes:
* 3,731,189 - baseline
* 3,579,414 - this patch
4% reduction

Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)") Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 rxrpc: Make IPv6 support conditional on CONFIG_IPV6 (David Howells)
Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it. This is then made conditional on CONFIG_IPV6. Without this, the following can be seen: net/built-in.o: In function `rxrpc_init_peer': >> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags' Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 Merge branch 'QCA8K' (David S. Miller)
John Crispin says:
====================
net-next: dsa: add QCA8K support

This series is based on the AR8xxx series posted by Matthieu Olivari in May 2015. The following changes were made since then:
* fixed the nitpicks from the previous review
* updated to latest API
* turned it into an mdio device
* added callbacks for fdb, bridge offloading, stp, eee, port status
* fixed several minor issues in the port setup and arp learning
* changed the namespacing of this driver to qca8k

The driver has so far only been tested on qca8337/N. It should work on other QCA switches such as the qca8327 with minor changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net-next: dsa: add new driver for qca8xxx family (John Crispin)
This patch contains initial support for the QCA8337 switch. It will detect a QCA8337 switch, if present and declared in the DT. Each port will be represented through a standalone net_device interface, as for other DSA switches. CPU can communicate with any of the ports by setting an IP@ on ethN interface. Most of the extra callbacks of the DSA subsystem are already supported, such as bridge offloading, stp, fdb. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net-next: dsa: add Qualcomm tag RX/TX handler (John Crispin)
Add support for the 2-byte Qualcomm tag that gigabit switches such as the QCA8337/N might insert when receiving packets, or that we need to insert while targeting specific switch ports. The tag is inserted directly behind the ethernet header. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 Documentation: devicetree: add qca8k binding (John Crispin)
Add device-tree binding for ar8xxx switch families. Cc: devicetree@vger.kernel.org Signed-off-by: John Crispin <john@phrozen.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net: emac: remove .owner field for driver (Wei Yongjun)
Remove .owner field if calls are used which set it automatically. Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net: emac: remove unnecessary dev_set_drvdata() (Wei Yongjun)
The driver core clears the driver data to NULL after device_release or on probe failure. Thus, it is not needed to manually clear the device driver data to NULL. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net: dsa: b53: Remove unused including <linux/version.h> (Wei Yongjun)
Remove the include of <linux/version.h>, which is not needed. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net: dsa: bcm_sf2: Fix non static symbol warning (Wei Yongjun)
Fixes the following sparse warning: drivers/net/dsa/bcm_sf2.c:963:19: warning: symbol 'bcm_sf2_io_ops' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 tcp: fix a stale ooo_last_skb after a replace (Eric Dumazet)
When an skb replaces another one in the ooo queue, I forgot to also update tp->ooo_last_skb if the replaced skb was the last one in the queue. To fix this, we can simply re-use the code that runs after an insertion, trying to merge skbs at the right of the current skb. This not only fixes the bug, but also removes all small skbs that might be a subset of the new one.

Example: We receive segments 2001:3001, 4001:5001. Then we receive 2001:8001: we should replace 2001:3001 with the big skb, but also remove 4001:5001 from the queue to save space.

packetdrill test demonstrating the bug:

    0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    +0 bind(3, ..., ...) = 0
    +0 listen(3, 1) = 0
    +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
    +0.100 < . 1:1(0) ack 1 win 1024
    +0 accept(3, ..., ...) = 4
    +0.01 < . 1001:2001(1000) ack 1 win 1024
    +0 > . 1:1(0) ack 1 <nop,nop, sack 1001:2001>
    +0.01 < . 1001:3001(2000) ack 1 win 1024
    +0 > . 1:1(0) ack 1 <nop,nop, sack 1001:2001 1001:3001>

Fixes: 9f5afeae5152 ("tcp: use an RB tree for ooo receive queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Yuchung Cheng <ycheng@google.com> Cc: Yaogong Wang <wygivan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
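The replace-and-prune behaviour from the example in this commit message can be sketched on a plain sorted array instead of the kernel's RB tree. Everything here is an assumption made for illustration: `struct seg` and `ooo_insert` are hypothetical, sequence number wraparound is ignored, and partially overlapping segments are left alone rather than merged.

```c
#include <stddef.h>
#include <stdint.h>

struct seg {
    uint32_t start; /* first sequence number in the segment */
    uint32_t end;   /* one past the last sequence number */
};

/*
 * Insert n into the array q, which holds *len segments sorted by start and
 * has room for at least *len + 1 entries. Every queued segment that is
 * fully covered by n is dropped first, so stale subsets do not linger.
 */
static void ooo_insert(struct seg *q, size_t *len, struct seg n)
{
    size_t i, kept = 0, pos = 0;

    /* Compact the array, skipping segments that n fully covers. */
    for (i = 0; i < *len; i++) {
        if (q[i].start >= n.start && q[i].end <= n.end)
            continue;
        q[kept++] = q[i];
    }

    /* Find the sorted position for n and shift the tail right by one. */
    while (pos < kept && q[pos].start < n.start)
        pos++;
    for (i = kept; i > pos; i--)
        q[i] = q[i - 1];
    q[pos] = n;
    *len = kept + 1;
}
```

Because the last element is always read as `q[*len - 1]` after the call, this sketch cannot keep a stale "last" pointer; the kernel version has to update its cached tp->ooo_last_skb explicitly, which is the step the fix restores.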
2016-09-16 Merge branch 'mediatek-reset-flow' (David S. Miller)
Sean Wang says:
====================
mediatek: add enhancement into the existing reset flow

The current driver only resets the DMA used by the descriptor rings, which can't guarantee recovery from all kinds of fatal errors, so this series:

1) tries to reset, from scratch, the underlying hardware resources on the Mediatek SoC that are required for ethernet to run.
2) refactors code to improve the reusability of existing code.
3) handles the race condition between the reset flow and the callbacks registered with the core driver that access the hardware.
4) introduces power domain usage into the hardware setup, so that the hardware can be cleanly and completely restored to its initial state.

Changes since v1:
- fix the build error when built as a module, caused by an undefined symbol for pinctrl_bind_pins, by using pinctrl_select_state instead to accomplish the pin mux setup during the reset process.
====================
Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net: ethernet: mediatek: avoid race condition during the reset process (Sean Wang)
Add protection against the race condition between the reset process and the hardware accesses made by the related callbacks. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net: ethernet: mediatek: add more resets for internal ethernet circuit block (Sean Wang)
struct mtk_eth already contains a struct regmap ethsys pointer to the address range of the internal circuit reset, so we reuse it to reset more internal blocks of the ethernet hardware, such as the packet processing engine (PPE) and frame engine (FE), instead of rstc, which deals with the FE only. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net: ethernet: mediatek: add the whole ethernet reset into the reset process (Sean Wang)
1) The original driver only resets the DMA used by the descriptor rings, which can't guarantee recovery from all kinds of fatal errors, so the patch tries to reset, from scratch, the underlying hardware resources on the Mediatek SoC that are required for ethernet to run, including power, pin mux control, clock and internal circuits on the ethernet, in order to restore the initial state that a rebooted machine gives.
2) Add a state variable inside struct mtk_eth to help distinguish whether mtk_hw_init is called for initialization at boot time or for re-initialization during the reset process.
3) Add a ge_mode variable inside struct mtk_mac for restoring the interface mode of the current setup for the target MAC.
4) Remove the __init attribute from the mtk_hw_init definition.
Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net: ethernet: mediatek: add controlling power domain the ethernet belongs to (Sean Wang)
Introduce control of the power domain that the digital circuit of the ethernet belongs to into the hardware initialization and deinitialization flow. This helps the entire ethernet hardware block restart cleanly and completely, returning to the initial state it has when the whole machine reboots. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net: ethernet: mediatek: cleanup error path inside mtk_hw_init (Sean Wang)
This cleans up the error path inside the mtk_hw_init call, so that it can exit appropriately when something fails, and also refactors the mtk_cleanup call to make part of its logic reusable on the error path. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net: ethernet: mediatek: add mtk_hw_deinit call as the opposite to mtk_hw_init call (Sean Wang)
Group the deinitialization of everything the mtk_hw_init call sets up, so that it can be reused by the reset process and the error path handling. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 net: ethernet: mediatek: refactoring mtk_hw_init to be reused (Sean Wang)
The existing mtk_hw_init includes both hardware and software initialization, which makes it slightly hard to reuse for reset recovery. Split it so that it keeps only the hardware initialization, and move the rest, such as IRQ registration and MDIO initialization, which concern the interface to the core driver, to a more suitable place, because there is no need to register the IRQ and re-initialize those structures again during the reset process. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 Merge tag 'rxrpc-rewrite-20160913-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs (David S. Miller)
David Howells says:
====================
rxrpc: Support IPv6

Here is a set of patches that add IPv6 support. They need to be applied on top of the just-posted miscellaneous fix patches. They are:

(1) Make autobinding of an unconnected socket work when sendmsg() is called to initiate a client call.
(2) Don't specify the protocol when creating the client socket, but rather take the default instead.
(3) Use rxrpc_extract_addr_from_skb() in a couple of places that were doing the same thing manually. This allows the IPv6 address extraction to be done in fewer places.
(4) Add IPv6 support. With this, calls can be made to IPv6 servers from userspace AF_RXRPC programs; AFS, however, can't use IPv6 yet as the RPC calls need to be upgradeable.
====================
Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 Merge tag 'rxrpc-rewrite-20160913-1' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs (David S. Miller)
David Howells says:
====================
rxrpc: Miscellaneous fixes

Here's a set of miscellaneous fix patches. There are a couple of points of note:

(1) There is one non-fix patch that adjusts the call ref tracking tracepoint to make kernel API-held refs on calls more obvious. This is a prerequisite for the patch that fixes prealloc refcounting.
(2) The final patch alters how jumbo packets that partially exceed the receive window are handled. Previously, space was being left in the Rx buffer for them, but this significantly hurts performance as the Rx window can't be increased to match the OpenAFS Tx window size. Instead, the excess subpackets are discarded and an EXCEEDS_WINDOW ACK is generated for the first. To avoid the problem of someone trying to run the kernel out of space by feeding the kernel a series of overlapping maximal jumbo packets, we stop allowing jumbo packets on a call if we encounter more than three jumbo packets with duplicate or excessive subpackets.
====================
Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 Merge branch 'libcxgb-next' (David S. Miller)
Varun Prakash says: ==================== iw_cxgb4,cxgbit: remove duplicate code This patch series removes duplicate code from iw_cxgb4 and cxgbit by adding common function definitions in libcxgb. Please review. ==================== Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 libcxgb,iw_cxgb4,cxgbit: add cxgb_mk_rx_data_ack() (Varun Prakash)
Add cxgb_mk_rx_data_ack() to remove duplicate code to form CPL_RX_DATA_ACK hardware command. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 libcxgb,iw_cxgb4,cxgbit: add cxgb_mk_abort_rpl() (Varun Prakash)
Add cxgb_mk_abort_rpl() to remove duplicate code to form CPL_ABORT_RPL hardware command. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 libcxgb,iw_cxgb4,cxgbit: add cxgb_mk_abort_req() (Varun Prakash)
Add cxgb_mk_abort_req() to remove duplicate code to form CPL_ABORT_REQ hardware command. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 libcxgb, iw_cxgb4, cxgbit: add cxgb_mk_close_con_req() (Varun Prakash)
Add cxgb_mk_close_con_req() to remove duplicate code to form CPL_CLOSE_CON_REQ hardware command. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 libcxgb,iw_cxgb4,cxgbit: add cxgb_mk_tid_release() (Varun Prakash)
Add cxgb_mk_tid_release() to remove duplicate code to form CPL_TID_RELEASE hardware command. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 libcxgb,iw_cxgb4,cxgbit: add cxgb_compute_wscale() (Varun Prakash)
Add cxgb_compute_wscale() in libcxgb_cm.h to remove its duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
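For readers unfamiliar with what a window-scale helper computes: TCP's 16-bit window field is shifted left by a negotiated scale factor, which is limited to 14. The sketch below picks the smallest shift that lets a 16-bit field cover the requested window. It is an illustration under assumptions, with the hypothetical name `compute_wscale`; it is not copied from libcxgb and may differ from what cxgb_compute_wscale() actually does.

```c
#include <stdint.h>

/*
 * Smallest window-scale shift (0..14) such that the largest 16-bit window,
 * 65535, shifted left by that amount is at least win bytes.
 */
static unsigned int compute_wscale(uint32_t win)
{
    unsigned int wscale = 0;

    /* 65535u << 14 still fits in 32 bits, so the shift cannot overflow. */
    while (wscale < 14 && (65535u << wscale) < win)
        wscale++;
    return wscale;
}
```

A 64 KiB window (65536 bytes) is one byte more than an unscaled field can express, so it needs a shift of 1, while anything above 65535 << 14 is clamped to the maximum shift of 14.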
2016-09-15 libcxgb,iw_cxgb4,cxgbit: add cxgb_best_mtu() (Varun Prakash)
Add cxgb_best_mtu() in libcxgb_cm.h to remove its duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 libcxgb,iw_cxgb4,cxgbit: add cxgb_is_neg_adv() (Varun Prakash)
Add cxgb_is_neg_adv() in libcxgb_cm.h to remove its duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 libcxgb,iw_cxgb4,cxgbit: add cxgb_find_route6() (Varun Prakash)
Add cxgb_find_route6() in libcxgb_cm.c to remove its duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 libcxgb,iw_cxgb4,cxgbit: add cxgb_find_route() (Varun Prakash)
Add cxgb_find_route() in libcxgb_cm.c to remove its duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 libcxgb,iw_cxgb4,cxgbit: add cxgb_get_4tuple() (Varun Prakash)
Add cxgb_get_4tuple() in libcxgb_cm.c to remove its duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 openvswitch: avoid deferred execution of recirc actions (Lance Richardson)
The ovs kernel data path currently defers the execution of all recirc actions until stack utilization is at a minimum. This is too limiting for some packet forwarding scenarios due to the small size of the deferred action FIFO (10 entries). For example, broadcast traffic sent out more than 10 ports with recirculation results in packet drops when the deferred action FIFO becomes full, as reported here: http://openvswitch.org/pipermail/dev/2016-March/067672.html Since the current recursion depth is available (it is already tracked by the exec_actions_level pcpu variable), we can use it to determine whether to execute recirculation actions immediately (safe when recursion depth is low) or defer execution until more stack space is available. With this change, the deferred action fifo size becomes a non-issue for currently failing scenarios because it is no longer used when there are three or fewer recursions through ovs_execute_actions(). Suggested-by: Pravin Shelar <pshelar@ovn.org> Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
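The decision this commit message describes, run the recirculation inline while the recursion depth is low and only queue it on the small FIFO otherwise, can be sketched as follows. The FIFO size of 10 and the "three or fewer recursions" threshold come from the message; `struct deferred_fifo`, `handle_recirc`, the enum and the integer packet ids are hypothetical stand-ins, not the datapath's real types.

```c
#define DEFERRED_FIFO_SIZE 10 /* FIFO size quoted in the commit message */
#define RECIRC_INLINE_LIMIT 3 /* assumed: depth at or below this runs inline */

struct deferred_fifo {
    int entries[DEFERRED_FIFO_SIZE]; /* stand-in for queued packets */
    unsigned int count;
};

enum recirc_result { RECIRC_INLINE, RECIRC_DEFERRED, RECIRC_DROPPED };

/*
 * Decide how to handle one recirculation at the given recursion level.
 * Shallow recursion executes immediately and never touches the FIFO;
 * deeper recursion is queued, and dropped once the FIFO is full.
 */
static enum recirc_result handle_recirc(struct deferred_fifo *fifo,
                                        unsigned int level, int pkt_id)
{
    if (level <= RECIRC_INLINE_LIMIT)
        return RECIRC_INLINE;
    if (fifo->count == DEFERRED_FIFO_SIZE)
        return RECIRC_DROPPED;
    fifo->entries[fifo->count++] = pkt_id;
    return RECIRC_DEFERRED;
}
```

In this model a broadcast to 12 ports at recursion level 1 is handled entirely inline, whereas the same 12 recirculations at a deeper level would fill the 10-entry FIFO and lose the last two.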
2016-09-15 Merge branch 'cls_flower-port-masks' (David S. Miller)
Or Gerlitz says: ==================== net/sched: cls_flower: Add ports masks This series adds the ability to specify tcp/udp port masks for TC/flower filter matches. I also removed an unused field from the flower keys struct and clarified the format of the recently added vlan attributes. v1--> v2 changes: * fixes typo in patch #2 title and change log (Sergei) * added acks provided by Jiri on v1 FWIW, by mistake the cover letter of V1 (but not the patches) carried the V2 tag; I hope this doesn't create too much confusion. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 net/sched: cls_flower: Specify vlan attributes format in the UAPI header (Or Gerlitz)
Specify the format (size and endianness) for the vlan attributes. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 net/sched: cls_flower: Remove an unused field from the filter key structure (Or Gerlitz)
Commit c3f8324188fa "net: Add full IPv6 addresses to flow_keys" added an unused instance of struct flow_dissector_key_addrs into struct fl_flow_key, remove it. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 net/sched: cls_flower: Support masking for matching on tcp/udp ports (Or Gerlitz)
Add the definitions for src/dst udp/tcp port masks and use them when setting && dumping the relevant keys. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
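Matching on a port with a mask means that only the bits selected by the mask have to agree between the packet and the filter key. A minimal sketch of that comparison, with the hypothetical helper name `port_match` (the classifier itself compares whole masked key structures rather than calling a function like this):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if pkt_port equals key on every bit that is set in mask. */
static bool port_match(uint16_t pkt_port, uint16_t key, uint16_t mask)
{
    return (pkt_port & mask) == (key & mask);
}
```

A mask of 0xffff is an exact match, 0xfff0 matches an aligned block of 16 ports, and 0 matches any port. Since the test is purely bitwise, it gives the same answer in host or network byte order as long as all three values use the same one.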
2016-09-15 alx: fix error handling in __alx_open (Tobias Regnery)
In commit 9ee7b683ea63 we moved the enablement of msi interrupts earlier in alx_init_intr. If there is an error in alx_alloc_rings, __alx_open returns with an error but msi (or msi-x) interrupts stay enabled. Add a new error label to disable msi (or msi-x) interrupts. Fixes: 9ee7b683ea63 ("alx: refactor msi enablement and disablement") Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 cxgb4vf: don't offload Rx checksums for IPv6 fragments (Hariprasad Shenai)
The checksum provided by the device doesn't include the L3 headers, as IPv6 expects. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 net_sched: Introduce skbmod action (Jamal Hadi Salim)
This action is intended to be an upgrade from a usability perspective from pedit (as well as operational debuggability). Compare this:

    sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
    u32 match ip protocol 1 0xff flowid 1:2 \
    action pedit munge offset -14 u8 set 0x02 \
    munge offset -13 u8 set 0x15 \
    munge offset -12 u8 set 0x15 \
    munge offset -11 u8 set 0x15 \
    munge offset -10 u16 set 0x1515 \
    pipe

to:

    sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
    u32 match ip protocol 1 0xff flowid 1:2 \
    action skbmod dmac 02:15:15:15:15:15

Also try to do a MAC address swap with pedit or, worse, try to debug a policy with destination mac, source mac and ethertype. Then make a few rules out of those and you'll get my point.

In the future, common use cases on pedit can be migrated to this action (as an example, different fields in ip v4/6, transports like tcp/udp/sctp etc). For this first cut, this allows modifying the basic ethernet header.

The most important ethernet use case at the moment is when redirecting or mirroring packets to a remote machine. The dst mac address needs a re-write so that the packet doesn't get dropped by, or confuse, an interconnecting (learning) switch, or get dropped by a target machine (which looks at the dst mac). And at times, when flipping back the packet, a swap of the MAC addresses is needed.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
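The two ethernet operations this message singles out, rewriting the destination MAC and swapping source and destination, amount to a few byte copies on the frame header. The sketch below works on a plain header struct in userspace; `struct eth_hdr`, `set_dmac` and `swap_mac` are hypothetical names for illustration and do not reflect how the skbmod action is implemented in the kernel.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Simplified ethernet header: destination, source, then the ethertype. */
struct eth_hdr {
    uint8_t dst[MAC_LEN];
    uint8_t src[MAC_LEN];
    uint16_t ethertype;
};

/* Overwrite the destination MAC, e.g. before redirecting to a remote host. */
static void set_dmac(struct eth_hdr *h, const uint8_t mac[MAC_LEN])
{
    memcpy(h->dst, mac, MAC_LEN);
}

/* Exchange source and destination MACs, e.g. when sending a packet back. */
static void swap_mac(struct eth_hdr *h)
{
    uint8_t tmp[MAC_LEN];

    memcpy(tmp, h->dst, MAC_LEN);
    memcpy(h->dst, h->src, MAC_LEN);
    memcpy(h->src, tmp, MAC_LEN);
}
```

Setting the destination to 02:15:15:15:15:15 with `set_dmac` corresponds to the single `skbmod dmac` rule in the message, which replaces the five pedit `munge` offsets.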
2016-09-15 Merge branch 'bpf-next' (David S. Miller)
Daniel Borkmann says: ==================== Misc cls_bpf/act_bpf improvements Two minor improvements to {cls,act}_bpf. For details please see individual patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 bpf: use skb_at_tc_ingress helper in tcf_bpf (Daniel Borkmann)
We have a small skb_at_tc_ingress() helper for testing for ingress, so make use of it. cls_bpf already uses it and so should act_bpf. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 bpf: drop unnecessary test in cls_bpf_classify and tcf_bpf (Daniel Borkmann)
The skb_mac_header_was_set() test in cls_bpf's and act_bpf's fast-path is actually unnecessary and can be removed altogether. This was added by commit a166151cbe33 ("bpf: fix bpf helpers to use skb->mac_header relative offsets"), which was later on improved by 3431205e0397 ("bpf: make programs see skb->data == L2 for ingress and egress"). We're always guaranteed to have valid mac header at the time we invoke cls_bpf_classify() or tcf_bpf(). Reason is that since 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") we do skb_reset_mac_header() in __dev_queue_xmit() before we could call into sch_handle_egress() or any subsequent enqueue. sch_handle_ingress() always sees a valid mac header as well (things like skb_reset_mac_len() would badly fail otherwise). Thus, drop the unnecessary test in classifier and action case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 net/sched: act_tunnel_key: Remove rcu_read_lock protection (Hadar Hen Zion)
Remove rcu_read_lock protection from tunnel_key_dump and use rtnl_dereference, dump operation is protected by rtnl lock. Also, remove rcu_read_lock from tunnel_key_release and use rcu_dereference_protected. Both operations are running exclusively and a writer couldn't modify t->params while those functions are executed. Fixes: 54d94fd89d90 ('net/sched: Introduce act_tunnel_key') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>