Age  Commit message  (Author)
2018-04-26  bpf: fix xdp_generic for bpf_adjust_tail usecase  (Nikita V. Shirokov)
When bpf_adjust_tail was introduced for generic xdp, it changed skb's tail pointer, so it was pointing to the new "end of the packet". However skb's len field wasn't properly modified, so the ethernet frame on the wire had the original (or even bigger, if adjust_head was used) size. This diff fixes that. Fixes: 198d83bb3 ("bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail") Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-26  tools, bpftool: Display license GPL compatible in prog show/list  (Jiri Olsa)
Display the license "gpl" string in bpftool prog command, like: # bpftool prog list 5: tracepoint name func tag 57cd311f2e27366b gpl loaded_at Apr 26/09:37 uid 0 xlated 16B not jited memlock 4096B # bpftool --json --pretty prog show [{ "id": 5, "type": "tracepoint", "name": "func", "tag": "57cd311f2e27366b", "gpl_compatible": true, "loaded_at": "Apr 26/09:37", "uid": 0, "bytes_xlated": 16, "jited": false, "bytes_memlock": 4096 } ] Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
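As a minimal sketch of consuming the JSON form shown above, the snippet below parses the sample record from this commit message and reads the new `gpl_compatible` field. The embedded text is copied from the message; the helper name `gpl_compatible_ids` is hypothetical and not part of bpftool.

```python
import json

# Sample record copied from the commit message above; in practice this text
# would come from running `bpftool --json --pretty prog show`.
SAMPLE_OUTPUT = """
[{
    "id": 5,
    "type": "tracepoint",
    "name": "func",
    "tag": "57cd311f2e27366b",
    "gpl_compatible": true,
    "loaded_at": "Apr 26/09:37",
    "uid": 0,
    "bytes_xlated": 16,
    "jited": false,
    "bytes_memlock": 4096
}]
"""


def gpl_compatible_ids(raw_json):
    """Return the ids of all programs flagged as GPL compatible.

    Hypothetical helper for illustration, not part of bpftool.
    """
    programs = json.loads(raw_json)
    # .get() tolerates records that lack the field (treated as not flagged).
    return [prog["id"] for prog in programs if prog.get("gpl_compatible")]


print(gpl_compatible_ids(SAMPLE_OUTPUT))
```

Using `.get()` rather than indexing is a design choice for the sketch: it keeps the helper usable on output that predates the field.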
2018-04-26  tools, bpf: Sync bpf.h uapi header  (Jiri Olsa)
Syncing the bpf.h uapi header with tools. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-26  bpf: Add gpl_compatible flag to struct bpf_prog_info  (Jiri Olsa)
Adding gpl_compatible flag to struct bpf_prog_info so it can be dumped via bpf_prog_get_info_by_fd and displayed via bpftool progs dump. Alexei noticed a 4-byte hole in struct bpf_prog_info, so we put the u32 flags field in there, and we can keep adding bit fields in there without breaking user space. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-26  Merge branch 'udp-gso'  (David S. Miller)
Willem de Bruijn says: ==================== udp gso Segmentation offload reduces cycles/byte for large packets by amortizing the cost of protocol stack traversal. This patchset implements GSO for UDP. A process can concatenate and submit multiple datagrams to the same destination in one send call by setting socket option SOL_UDP/UDP_SEGMENT with the segment size, or passing an analogous cmsg at send time. The stack will send the entire large (up to network layer max size) datagram through the protocol layer. At the GSO layer, it is broken up in individual segments. All receive the same network layer header and UDP src and dst port. All but the last segment have the same UDP header, but the last may differ in length and checksum. Initial results show a significant reduction in UDP cycles/byte. See the main patch for more details and benchmark results. udp 876 MB/s 14873 msg/s 624666 calls/s 11,205,777,429 cycles udp gso 2139 MB/s 36282 msg/s 36282 calls/s 11,204,374,561 cycles The patch set is broken down as follows: - patch 1 is a prerequisite: code rearrangement, noop otherwise - patch 2 implements the gso logic - patch 3 adds protocol stack support for UDP_SEGMENT - patch 4,5,7 are refinements - patch 6 adds the cmsg interface - patch 8..11 are tests This idea was presented previously at netconf 2017-2 http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf Changes v1 -> v2 - Convert __udp_gso_segment to modify headers after skb_segment - Split main patch into two, one for gso logic, one for UDP_SEGMENT Changes RFC -> v1 - MSG_MORE: fixed, by allowing checksum offload with corking if gso - SKB_GSO_UDP_L4: made independent from SKB_GSO_UDP and removed skb_is_ufo() wrapper - NETIF_F_GSO_UDP_L4: add to netdev_features_string and to netdev-features.txt add BUILD_BUG_ON to match SKB_GSO_UDP_L4 value - UDP_MAX_SEGMENTS: introduce limit on number of segments per gso skb to avoid extreme cases like IP_MAX_MTU/IPV4_MIN_MTU - CHECKSUM_PARTIAL: test against 
missing feature after ndo_features_check if not supported return error, analogous to udp_send_check - MSG_ZEROCOPY: removed, deferred for now ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-26  selftests: udp gso benchmark  (Willem de Bruijn)
Send udp data between a source and sink, optionally with udp gso. The two processes are expected to be run on separate hosts. A script is included that runs them together over loopback in a single namespace for functionality testing. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-26  selftests: udp gso with corking  (Willem de Bruijn)
Corked sockets take a different path to construct a udp datagram than the lockless fast path. Test this alternate path. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-26  selftests: udp gso with connected sockets  (Willem de Bruijn)
Connected sockets use path mtu instead of device mtu. Test this path by inserting a route mtu that is lower than the device mtu. Verify that the path mtu for the connection matches this lower number, then run the same test as in the connectionless case. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-26  selftests: udp gso  (Willem de Bruijn)
Validate udp gso, including edge cases (such as min/max gso sizes). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-26  udp: add gso support to virtual devices  (Willem de Bruijn)
Virtual devices such as tunnels and bonding can handle large packets. Only segment packets when reaching a physical or loopback device. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-26  udp: add gso segment cmsg  (Willem de Bruijn)
Allow specifying segment size in the send call. The new control message performs the same function as socket option UDP_SEGMENT while avoiding the extra system call. [ Export udp_cmsg_send for ipv6. -DaveM ] Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
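A rough sketch of how a sender might attach the segment size as a control message is shown below. The constant values (`SOL_UDP` as 17, `UDP_SEGMENT` as 103) and the 16-bit payload width reflect my understanding of the Linux UAPI and are not stated in this message; the helper names are hypothetical, and the send itself only works on a kernel that carries this patch set.

```python
import socket
import struct

SOL_UDP = 17       # assumed: same value as IPPROTO_UDP
UDP_SEGMENT = 103  # assumed value from linux/udp.h, defined here explicitly


def udp_segment_cmsg(gso_size):
    """Build the ancillary-data list carrying the segment size.

    Assumption: the kernel expects an unsigned 16-bit value in the cmsg.
    """
    if not 0 < gso_size <= 0xFFFF:
        raise ValueError("gso_size must fit in an unsigned 16-bit value")
    return [(SOL_UDP, UDP_SEGMENT, struct.pack("=H", gso_size))]


def send_segmented(sock, payload, gso_size, dest):
    """Send one large buffer and ask the stack to split it into datagrams
    of gso_size bytes, without a separate setsockopt() call."""
    return sock.sendmsg([payload], udp_segment_cmsg(gso_size), 0, dest)
```

A caller would create an ordinary `socket.SOCK_DGRAM` socket and pass it to `send_segmented()`; on kernels without UDP GSO the call is expected to fail rather than fall back silently.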
2018-04-26  udp: paged allocation with gso  (Willem de Bruijn)
When sending large datagrams that are later segmented, store data in page frags to avoid copying from linear in skb_segment. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-26  udp: better wmem accounting on gso  (Willem de Bruijn)
skb_segment by default transfers allocated wmem from the gso skb to the tail of the segment list. This underreports real truesize of the list, especially if the tail might be dropped. Similar to tcp_gso_segment, update wmem_alloc with the aggregate list truesize and make each segment responsible for its own share by setting skb->destructor. Clear gso_skb->destructor prior to calling skb_segment to skip the default assignment to tail. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-26  udp: generate gso with UDP_SEGMENT  (Willem de Bruijn)
Support generic segmentation offload for udp datagrams. Callers can concatenate and send at once the payload of multiple datagrams with the same destination. To set segment size, the caller sets socket option UDP_SEGMENT to the length of each discrete payload. This value must be smaller than or equal to the relevant MTU. A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a per send call basis. Total byte length may then exceed MTU. If not an exact multiple of segment size, the last segment will be shorter. The implementation adds a gso_size field to the udp socket, ip(v6) cmsg cookie and inet_cork structure to be able to set the value at setsockopt or cmsg time and to work with both lockless and corked paths. Initial benchmark numbers show UDP GSO about as expensive as TCP GSO. tcp tso 3197 MB/s 54232 msg/s 54232 calls/s 6,457,754,262 cycles tcp gso 1765 MB/s 29939 msg/s 29939 calls/s 11,203,021,806 cycles tcp without tso/gso * 739 MB/s 12548 msg/s 12548 calls/s 11,205,483,630 cycles udp 876 MB/s 14873 msg/s 624666 calls/s 11,205,777,429 cycles udp gso 2139 MB/s 36282 msg/s 36282 calls/s 11,204,374,561 cycles [*] after reverting commit 0a6b2a1dc2a2 ("tcp: switch to GSO being always on") Measured total system cycles ('-a') for one core while pinning both the network receive path and benchmark process to that core: perf stat -a -C 12 -e cycles \ ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4 Note the reduction in calls/s with GSO. Bytes per syscall increases from 1470 to 61818. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
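The splitting rule and the bytes-per-syscall figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: `segment_sizes` and `bytes_per_call` are hypothetical helpers, and the calculation assumes that "MB/s" in the benchmark table means 2^20 bytes per second, since that reading reproduces the quoted 1470 and 61818.

```python
MIB = 1 << 20  # assumption: the table's "MB" is 2**20 bytes


def segment_sizes(total_len, gso_size):
    """Payload length of each datagram produced from one large send.

    All segments carry gso_size bytes except possibly the last, which is
    shorter when total_len is not an exact multiple of gso_size.
    """
    if gso_size <= 0:
        raise ValueError("gso_size must be positive")
    full, rest = divmod(total_len, gso_size)
    return [gso_size] * full + ([rest] if rest else [])


def bytes_per_call(mb_per_s, calls_per_s):
    """Average payload bytes moved per system call, rounded down."""
    return mb_per_s * MIB // calls_per_s


print(segment_sizes(4000, 1472))
print(bytes_per_call(876, 624666), bytes_per_call(2139, 36282))
```

The 4000-byte send and 1472-byte segment size are illustrative values, not taken from the benchmark.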
2018-04-26  udp: add udp gso  (Willem de Bruijn)
Implement generic segmentation offload support for udp datagrams. A follow-up patch adds support to the protocol stack to generate such packets. UDP GSO is not UFO. UFO fragments a single large datagram. GSO splits a large payload into a number of discrete UDP datagrams. The implementation adds a GSO type SKB_GSO_UDP_L4 to differentiate it from UFO (SKB_GSO_UDP). IPPROTO_UDPLITE is excluded, as that protocol has no gso handler registered. [ Export __udp_gso_segment for ipv6. -DaveM ] Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-26  udp: expose inet cork to udp  (Willem de Bruijn)
UDP segmentation offload needs access to inet_cork in the udp layer. Pass the struct to ip(6)_make_skb instead of allocating it on the stack in that function itself. This patch is a noop otherwise. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
Merging net into net-next to help the bpf folks avoid some really ugly merge conflicts. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue  (David S. Miller)
Jeff Kirsher says: ==================== 1GbE Intel Wired LAN Driver Updates 2018-04-25 This series enables some ethtool and tc-flower filters to be offloaded to igb-based network controllers. This is useful when the system configuration wants to steer kinds of traffic to a specific hardware queue for i210 devices only. The first two patches in the series are bug fixes. The basis of this series is to export the internal API used to configure address filters, so they can be used by ethtool, and to extend the functionality so a source address can be handled. Then, we enable the tc-flower offloading implementation to re-use the same infrastructure as ethtool, storing the filters in the per-adapter "nfc" (Network Filter Config?) list. But for consistency, for destructive access they are separated, i.e. a filter added by tc-flower can only be removed by tc-flower, but ethtool can read them all. Only support for VLAN Prio, Source and Destination MAC Address, and Ethertype is enabled for now. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  (David S. Miller)
Daniel Borkmann says: ==================== pull-request: bpf 2018-04-25 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix to clear the percpu metadata_dst that could otherwise carry stale ip_tunnel_info, from William. 2) Fix that reduces the number of passes in x64 JIT with regards to dead code sanitation to avoid risk of prog rejection, from Gianluca. 3) Several fixes of sockmap programs, besides others, fixing a double page_put() in error path, missing refcount hold for pinned sockmap, adding required -target bpf for clang in sample Makefile, from John. 4) Fix to disable preemption in __BPF_PROG_RUN_ARRAY() paths, from Roman. 5) Fix tools/bpf/ Makefile with regards to a lex/yacc build error seen on older gcc-5, from John. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  bpf: fix for lex/yacc build error with gcc-5  (John Fastabend)
Fix build error found with Ubuntu shipped gcc-5 ~/git/bpf/tools/bpf$ make all Auto-detecting system features: ... libbfd: [ OFF ] ... disassembler-four-args: [ OFF ] CC bpf_jit_disasm.o LINK bpf_jit_disasm CC bpf_dbg.o /home/john/git/bpf/tools/bpf/bpf_dbg.c: In function ‘cmd_load’: /home/john/git/bpf/tools/bpf/bpf_dbg.c:1077:13: warning: ‘cont’ may be used uninitialized in this function [-Wmaybe-uninitialized] } else if (matches(subcmd, "pcap") == 0) { ^ LINK bpf_dbg CC bpf_asm.o make: *** No rule to make target `bpf_exp.yacc.o', needed by `bpf_asm'. Stop. Fixes: 5a8997f20715 ("tools: bpf: respect output directory during build") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-25  Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue  (David S. Miller)
Jeff Kirsher says: ==================== 10GbE Intel Wired LAN Driver Updates 2018-04-25 This series represents yet another phase of the macvlan cleanup Alex has been working on. The main goal of these changes is to make it so that we only support offloading what we can actually offload and we don't break any existing functionality. So for example we were claiming to advertise source mode macvlan and we were doing nothing of the sort, so support for that has been dropped. The biggest change with this set is that broadcast/multicast replication is no longer being supported in software. Alex dropped it as it leads to scaling issues when a broadcast frame has to be replicated up to 64 times. Beyond that this set goes through and optimizes the time needed to bring up and tear down the macvlan interfaces on ixgbe and provides a clean way for us to disable the macvlan offload when needed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  rds: ib: Fix missing call to rds_ib_dev_put in rds_ib_setup_qp  (Dag Moxnes)
The function rds_ib_setup_qp is calling rds_ib_get_client_data and should correspondingly call rds_ib_dev_put. This call was lost in the non-error path with the introduction of error handling done in commit 3b12f73a5c29 ("rds: ib: add error handle") Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  net/smc: keep clcsock reference in smc_tcp_listen_work()  (Ursula Braun)
The internal CLC socket should exist till the SMC-socket is released. Function tcp_listen_worker() releases the internal CLC socket of a listen socket, if an smc_close_active() is called. This function is called for the final release(), but it is called for shutdown SHUT_RDWR as well. This opens a door for protection faults, if socket calls using the internal CLC socket are called for a shutdown listen socket. With the changes of commit 3d502067599f ("net/smc: simplify wait when closing listen socket") there is no need anymore to release the internal CLC socket in function tcp_listen_worker(). It is sufficient to release it in smc_release(). Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reported-by: syzbot+9045fc589fcd196ef522@syzkaller.appspotmail.com Reported-by: syzbot+28a2c86cf19c81d871fa@syzkaller.appspotmail.com Reported-by: syzbot+9605e6cace1b5efd4a0a@syzkaller.appspotmail.com Reported-by: syzbot+cf9012c597c8379d535c@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  mkiss: remove redundant check for len > 0  (Colin Ian King)
The check for len > 0 is always true and hence redundant, as the same check is already made to enter the while-loop. It can be removed. Cleans up cppcheck warning: drivers/net/hamradio/mkiss.c:220: (warning) Identical inner 'if' condition is always true. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  net: amd8111e: remove redundant duplicated if statement  (Colin Ian King)
There are two identical nested if statements, the second is redundant and can be removed. Also clean up white space formatting. Cleans up cppcheck warning: drivers/net/ethernet/amd/amd8111e.c:1080: (warning) Identical inner 'if' condition is always true. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  sctp: remove the unused sctp_assoc_is_match function  (Xin Long)
After Commit 4f0087812648 ("sctp: apply rhashtable api to send/recv path"), there's no place using sctp_assoc_is_match, so remove it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  igb: Add support for adding offloaded clsflower filters  (Vinicius Costa Gomes)
This allows filters added by tc-flower and specifying MAC addresses, Ethernet types, and the VLAN priority field, to be offloaded to the controller. This reuses most of the infrastructure used by ethtool, but clsflower filters are kept in a separate list, so they are invisible to ethtool. To set up clsflower offloading: $ tc qdisc replace dev eth0 handle 100: parent root mqprio \ num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 2@2 hw 0 (clsflower offloading depends on the network driver being configured with multiple traffic classes; we use mqprio's 'num_tc' parameter to set it to 3) $ tc qdisc add dev eth0 ingress Examples of filters: $ tc filter add dev eth0 parent ffff: flower \ dst_mac aa:aa:aa:aa:aa:aa \ hw_tc 2 skip_sw (just a simple filter filtering for the destination MAC address and steering that traffic to queue 2) $ tc filter add dev enp2s0 parent ffff: proto 0x22f0 flower \ src_mac cc:cc:cc:cc:cc:cc \ hw_tc 1 skip_sw (as the i210 doesn't support steering traffic based on the source address alone, we need to use another steering criterion; in this case we are using the ethernet type (0x22f0) to steer traffic to queue 1) Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
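To make the mqprio arguments in the example above easier to read, the sketch below decodes the `map` and `queues` values into a priority to traffic class to queue mapping. It assumes the usual `count@offset` reading of the `queues` argument; the helper names are hypothetical and the code is only a model of the configuration, not driver logic.

```python
def parse_queues(specs):
    """Turn 'count@offset' strings into ranges of queue indices, one per tc."""
    ranges = []
    for spec in specs:
        count, offset = (int(part) for part in spec.split("@"))
        ranges.append(range(offset, offset + count))
    return ranges


# The 16-entry 'map' argument from the command above: priority -> tc.
PRIO_TO_TC = [2, 2, 1, 0] + [2] * 12
# The 'queues' argument from the command above: tc -> queue range.
TC_QUEUES = parse_queues(["1@0", "1@1", "2@2"])


def queues_for_priority(prio):
    """Queues that packets of the given priority may use."""
    return list(TC_QUEUES[PRIO_TO_TC[prio]])


for prio in (0, 2, 3):
    print(prio, queues_for_priority(prio))
```

Under this reading, traffic class 1 owns queue 1 and traffic class 2 starts at queue 2, which lines up with the `hw_tc 1` and `hw_tc 2` filter examples in the message.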
2018-04-25  Merge branch 'nfp-flower-tc-block-support-and-nfp-PCI-updates'  (David S. Miller)
Jakub Kicinski says: ==================== nfp: flower tc block support and nfp PCI updates This series improves the nfp PCIe code by making use of the new pcie_print_link_status() helper and resetting NFP locks when driver loads. This can help us avoid lock ups after host crashes and is rebooted with PCIe reset or when kdump kernel is loaded. The flower changes come from John, he says: This patchset fixes offload issues when multiple repr netdevs are bound to a tc block and filter rules added. Previously the rule would be passed to the reprs and would be rejected in all but the first as the cookie value will indicate a duplicate. The first patch extends the flow lookup function to consider both host context and ingress netdev along with the cookie value. This means that a rule with a given cookie can exist multiple times assuming the ingress netdev is different. The host context ensures that stats from fw are associated with the correct instance of the rule. The second patch protects against rejecting add/del/stat messages when a rule has a repr as both an ingress port and an egress dev. In such cases a callback can be triggered twice (once for ingress and once for egress) and can lead to duplicate rule detection or incorrect double calls. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  nfp: flower: ignore duplicate cb requests for same rule  (John Hurley)
If a flower rule has a repr both as ingress and egress port then 2 callbacks may be generated for the same rule request. Add an indicator to each flow as to whether or not it was added from an ingress registered cb. If so then ignore add/del/stat requests to it from an egress cb. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  nfp: flower: support offloading multiple rules with same cookie  (John Hurley)
When multiple netdevs are attached to a tc offload block and register for callbacks, a rule added to the block will be propagated to all netdevs. Previously these were detected as duplicates (based on cookie) and rejected. Modify the nfp rule lookup function to optionally include an ingress netdev and a host context along with the cookie value when searching for a rule. When a new rule is passed to the driver, the netdev the rule is to be attached to is considered when searching for duplicates. When a stats update is received from HW, the host context is used alongside the cookie to map to the correct host rule. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
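The lookup change described above can be modelled with two dictionaries, one keyed by cookie plus ingress netdev and one keyed by cookie plus host context. This is a simplified sketch, not the driver's data structure; the class name, the string netdev names and the way host contexts are allocated are all assumptions made for illustration.

```python
class FlowTable:
    """Toy model of a rule table where one cookie may appear once per netdev."""

    def __init__(self):
        self._by_netdev = {}  # (cookie, ingress_dev) -> host_ctx
        self._by_ctx = {}     # (cookie, host_ctx) -> ingress_dev
        self._next_ctx = 0    # assumption: contexts are simple counters

    def add(self, cookie, ingress_dev):
        """Add a rule; only the same cookie on the same netdev is a duplicate."""
        key = (cookie, ingress_dev)
        if key in self._by_netdev:
            raise ValueError("duplicate rule for this netdev")
        ctx = self._next_ctx
        self._next_ctx += 1
        self._by_netdev[key] = ctx
        self._by_ctx[(cookie, ctx)] = ingress_dev
        return ctx

    def rule_for_stats(self, cookie, host_ctx):
        """Map a stats update back to the netdev whose rule produced it."""
        return self._by_ctx.get((cookie, host_ctx))


table = FlowTable()
ctx0 = table.add(0x10, "repr0")
ctx1 = table.add(0x10, "repr1")  # same cookie, different ingress netdev
print(table.rule_for_stats(0x10, ctx1))
```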
2018-04-25  nfp: print PCIe link bandwidth on probe  (Jakub Kicinski)
To aid debugging of performance issues caused by limited PCIe bandwidth print the PCIe link information on probe. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  nfp: reset local locks on init  (Jakub Kicinski)
NFP locks record the owner when held, for PCIe devices the owner ID will be the PCIe link number. When driver loads it should scan known locks and if they indicate that they are held by local endpoint but the driver doesn't hold them - release them. Locks can be left taken for instance when kernel gets kexec-ed or after a crash. Management FW tries to clean up stale locks too, but it currently depends on PCIe link going down which doesn't always happen. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
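The release rule described above boils down to a filter over the known locks. The following is a minimal sketch under stated assumptions: locks are modelled as a dictionary from lock name to owner ID (or `None` when free), and the function name is hypothetical rather than anything in the driver.

```python
def stale_local_locks(locks, local_id, held_by_driver):
    """Return names of locks that claim the local endpoint as owner even
    though the driver does not hold them; these are the ones to release."""
    return [
        name
        for name, owner in locks.items()
        if owner == local_id and name not in held_by_driver
    ]


# Illustrative state after a crash or kexec: owner 0 is the local PCIe link.
locks = {"a": 0, "b": 1, "c": 0, "d": None}
print(stale_local_locks(locks, local_id=0, held_by_driver={"c"}))
```

Locks owned by another endpoint (here "b") are left alone, which matches the message's restriction to locks held by the local endpoint.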
2018-04-25  igb: Add the skeletons for tc-flower offloading  (Vinicius Costa Gomes)
This adds basic functions needed to implement offloading for filters created by tc-flower. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25  igb: Add MAC address support for ethtool nftuple filters  (Vinicius Costa Gomes)
This adds the capability of configuring the queue steering of arriving packets based on their source and destination MAC addresses. Source address steering (i.e. driving traffic to a specific queue), for the i210, does not work, but filtering does (i.e. accepting traffic based on the source address). So, trying to add a filter specifying only a source address will be an error. In practical terms this adds support for the following use cases, characterized by these examples: $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0 (this will direct packets with destination address "aa:aa:aa:aa:aa:aa" to the RX queue 0) $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 \ proto 0x22f0 action 3 (this will direct packets with source address "44:44:44:44:44:44" and ethertype 0x22f0 to the RX queue 3) Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25  igb: Enable nfc filters to specify MAC addresses  (Vinicius Costa Gomes)
This allows igb_add_filter()/igb_erase_filter() to work on filters that include MAC addresses (both source and destination). For now, this only exposes the functionality, the next commit glues ethtool into this. Later in this series, these APIs are used to allow offloading of cls_flower filters. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25  igb: Allow filters to be added for the local MAC address  (Vinicius Costa Gomes)
Users expect that, when adding a steering filter for the local MAC address, all the traffic directed to that address will go to some queue. Currently, it's not possible to configure entries in the "in use" state, which is the normal state of the local MAC address entry (it is the default). This patch allows overriding the steering configuration of "in use" entries if the filter to be added matches the address and address type (source or destination) of an existing entry. Entries referring to the local MAC address get special handling: when they are removed, only the steering configuration is reset. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25  igb: Add support for enabling queue steering in filters  (Vinicius Costa Gomes)
On some igb models (82575 and i210) the MAC address filters can control to which queue the packet will be assigned. This extends the 'state' with one more state to signify that queue selection should be enabled for that filter. As 82575 parts are no longer easily obtained (and this was developed against i210), only support for the i210 model is enabled. These functions are exported and will be used in the next patch. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25  igb: Add support for MAC address filters specifying source addresses  (Vinicius Costa Gomes)
Makes it possible to direct packets to queues based on their source address. Documents the expected usage of the 'flags' parameter. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25  igb: Enable the hardware traffic class feature bit for igb models  (Vinicius Costa Gomes)
This will allow functionality depending on the hardware being traffic class aware to work. In particular the tc-flower offloading checks verify that this bit is set. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25  igb: Fix queue selection on MAC filters on i210  (Vinicius Costa Gomes)
On the RAH registers there are semantic differences on the meaning of the "queue" parameter for traffic steering depending on the controller model: there is the 82575 meaning, in which "queue" means an RX Hardware Queue, and the i350 meaning, where it is a reception pool. The previous behaviour was having no effect for i210 based controllers because the QSEL bit of the RAH register wasn't being set. This patch separates the condition in discrete cases, so the different handling is clearer. Fixes: 83c21335c876 ("igb: improve MAC filter handling") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25  net: rules: Move l3mdev attribute validation to a helper  (David Ahern)
Move the check on FRA_L3MDEV attribute to helper to improve the readability of fib_nl2rule. Update the extack messages to be clear when the configuration option is disabled versus an invalid value has been passed. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  sctp: fix identification of new acks for SFR-CACC  (Marcelo Ricardo Leitner)
It's currently written as: if (!tchunk->tsn_gap_acked) { [1] tchunk->tsn_gap_acked = 1; ... } if (TSN_lte(tsn, sack_ctsn)) { if (!tchunk->tsn_gap_acked) { /* SFR-CACC processing */ ... } } Which causes the SFR-CACC processing on ack reception to never process, as tchunk->tsn_gap_acked is always true by then. Block [1] was moved to that position by the commit marked below. This patch fixes it by doing SFR-CACC processing earlier, before tsn_gap_acked is set to true. Fixes: 31b02e154940 ("sctp: Failover transmitted list on transport delete") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
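The ordering problem described above can be reproduced in a small model. This is not the kernel code: the chunk is a plain dictionary, `cum_acked` stands in for the `TSN_lte(tsn, sack_ctsn)` test, and the returned flag merely marks whether the SFR-CACC branch would have run.

```python
def process_buggy(chunk, cum_acked):
    """Old order: the flag is set first, so the later check never passes."""
    sfr_cacc_ran = False
    if not chunk["tsn_gap_acked"]:
        chunk["tsn_gap_acked"] = True       # block [1] in the message
    if cum_acked:
        if not chunk["tsn_gap_acked"]:      # always false by this point
            sfr_cacc_ran = True
    return sfr_cacc_ran


def process_fixed(chunk, cum_acked):
    """New order: decide on SFR-CACC processing before the flag is set."""
    sfr_cacc_ran = bool(cum_acked and not chunk["tsn_gap_acked"])
    if not chunk["tsn_gap_acked"]:
        chunk["tsn_gap_acked"] = True
    return sfr_cacc_ran


print(process_buggy({"tsn_gap_acked": False}, True))
print(process_fixed({"tsn_gap_acked": False}, True))
```

Both variants leave the chunk marked as acked; they differ only in whether a newly acked chunk is noticed before the mark is applied.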
2018-04-25  sctp: fix const parameter violation in sctp_make_sack  (Marcelo Ricardo Leitner)
sctp_make_sack() makes changes to the asoc and this cast is just bypassing the const attribute. As there is no need to have the const there, just remove it and fix the violation. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  neighbour: support for NTF_EXT_LEARNED flag  (Roopa Prabhu)
This patch extends NTF_EXT_LEARNED support to the neighbour system. Example use-case: An Ethernet VPN implementation (e.g. in the FRR routing suite) can use this flag to add dynamic reachable external neigh entries learned via control plane. The use of neigh NTF_EXT_LEARNED in this patch is consistent with its use with bridge and vxlan fdb entries. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  ipv6: addrconf: don't evaluate keep_addr_on_down twice  (Ivan Vecera)
The addrconf_ifdown() evaluates keep_addr_on_down state twice. There is no need to do it. Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25  ipv6: sr: Compute flowlabel for outer IPv6 header of seg6 encap mode  (Ahmed Abdelsalam)
ECMP (equal-cost multipath) hashes are typically computed on the packets' 5-tuple (src IP, dst IP, src port, dst port, L4 proto). For encapsulated packets, the L4 data is not readily available and ECMP hashing will often revert to (src IP, dst IP). This will lead to traffic polarization on a single ECMP path, causing congestion and waste of network capacity. In IPv6, the 20-bit flow label field is also used as part of the ECMP hash. In the lack of L4 data, the hashing will be on (src IP, dst IP, flow label). Having a non-zero flow label is thus important for proper traffic load balancing when L4 data is unavailable (i.e., when packets are encapsulated). Currently, the seg6_do_srh_encap() function extracts the original packet's flow label and sets it as the outer IPv6 flow label. There are two issues with this behaviour: a) There is no guarantee that the inner flow label is set by the source. b) If the original packet is not IPv6, the flow label will be set to zero (e.g., IPv4 or L2 encap). This patch adds a function, named seg6_make_flowlabel(), that computes a flow label from a given skb. It supports IPv6, IPv4 and L2 payloads, and leverages the per namespace 'seg6_flowlabel' sysctl value. The currently supported behaviours are as follows: -1 set flowlabel to zero. 0 copy flowlabel from the inner packet in case of inner IPv6 (set flowlabel to 0 in case of IPv4/L2) 1 compute the flowlabel using seg6_make_flowlabel() This patch has been tested for IPv6, IPv4, and L2 traffic. Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com> Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
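The three sysctl behaviours listed above can be summarised as a small selection function. This is a sketch only: `flow_hash` is a stand-in for whatever value seg6_make_flowlabel() derives from the skb, which the message does not spell out, and the function and parameter names are hypothetical.

```python
FLOWLABEL_MASK = 0xFFFFF  # the IPv6 flow label is 20 bits wide


def outer_flowlabel(mode, inner_is_ipv6, inner_label, flow_hash):
    """Pick the outer IPv6 flow label according to the sysctl mode."""
    if mode == -1:
        return 0                                   # always zero
    if mode == 0:
        # Copy from an inner IPv6 packet; IPv4 and L2 payloads get zero.
        return inner_label & FLOWLABEL_MASK if inner_is_ipv6 else 0
    if mode == 1:
        # Computed label; flow_hash is an assumed input for this sketch.
        return flow_hash & FLOWLABEL_MASK
    raise ValueError("unsupported seg6_flowlabel mode")


print(hex(outer_flowlabel(0, True, 0xABCDE, 0x12345)))
print(hex(outer_flowlabel(1, False, 0, 0xFFF12345)))
```

Mode 1 is the only one that yields a non-zero label for IPv4 and L2 payloads, which is the case the message identifies as a load-balancing problem.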
2018-04-25  net: phy: allow scanning busses with missing phys  (Alexandre Belloni)
Some MDIO busses will error out when trying to read a phy address with no phy present at that address. In that case, probing the bus will fail because __mdiobus_register() is scanning the bus for all possible phy addresses. In case MII_PHYSID1 returns -EIO or -ENODEV, consider there is no phy at this address and set the phy ID to 0xffffffff which is then properly handled in get_phy_device(). Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
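A rough model of the scan behaviour described above: a read error of `EIO` or `ENODEV` on the first ID register is treated as "no phy here" instead of aborting the scan. The bus is represented by a caller-supplied `read_reg(addr, reg)` function, which is an assumption of this sketch, as are the helper names and the 32-address scan range.

```python
import errno

MII_PHYSID1 = 0x02      # first PHY identifier register
MII_PHYSID2 = 0x03      # second PHY identifier register
NO_PHY_ID = 0xFFFFFFFF  # ID value the message uses for "nothing at this address"


def read_phy_id(read_reg, addr):
    """Read the 32-bit phy ID, mapping EIO/ENODEV on MII_PHYSID1 to NO_PHY_ID."""
    try:
        hi = read_reg(addr, MII_PHYSID1)
    except OSError as err:
        if err.errno in (errno.EIO, errno.ENODEV):
            return NO_PHY_ID
        raise  # other errors still abort, as before
    lo = read_reg(addr, MII_PHYSID2)
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def scan_bus(read_reg, addresses=range(32)):
    """Return {address: phy_id} for every address that answers."""
    found = {}
    for addr in addresses:
        phy_id = read_phy_id(read_reg, addr)
        if phy_id != NO_PHY_ID:
            found[addr] = phy_id
    return found
```

With a `read_reg` that raises `OSError(errno.EIO, ...)` for every address except one, `scan_bus()` returns a single entry instead of failing on the first empty address.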
2018-04-25  igb: Fix not adding filter elements to the list  (Vinicius Costa Gomes)
Because the order of the parameters passed to 'hlist_add_behind()' was inverted, the 'parent' node was added "behind" the 'input'. As 'input' is not in the list, this causes the 'input' node to be lost. Fixes: 0e71def25281 ("igb: add support of RX network flow classification") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
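The effect of swapping the two arguments can be seen in a simplified singly linked model. This is not the kernel's hlist (there is no `pprev` pointer here); `add_behind(n, prev)` only mirrors the argument order of `hlist_add_behind()`, with the new node first and the node already on the list second.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


def add_behind(n, prev):
    """Link n right after prev; prev must already be on the list."""
    n.next = prev.next
    prev.next = n


def names(head):
    """Walk the list from head and collect node names."""
    out = []
    node = head
    while node is not None:
        out.append(node.name)
        node = node.next
    return out


def build():
    """Existing list: first -> parent."""
    first, parent = Node("first"), Node("parent")
    first.next = parent
    return first, parent


head, parent = build()
add_behind(Node("input"), parent)   # intended order: new node, then parent
correct = names(head)

head, parent = build()
add_behind(parent, Node("input"))   # inverted order, as in the bug
inverted = names(head)

print(correct, inverted)
```

In the inverted call the new node ends up pointing at the list, but nothing on the list points at it, so a walk from the head never reaches it.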
2018-04-25  ixgbe: Avoid performing unnecessary resets for macvlan offload  (Alexander Duyck)
The original implementation for macvlan offload has us performing a full port reset every time we added a new macvlan. This shouldn't be necessary and can be avoided with a few behavior changes. This patch updates the logic for the queues so that we have essentially 3 possible configurations for macvlan offload. They consist of 15 macvlans with 4 queues per macvlan, 31 macvlans with 2 queues per macvlan, and 63 macvlans with 1 queue per macvlan. As macvlans are added you will encounter up to 3 total resets if you add all the way up to 63, and after that the device will stay in the mode supporting up to 63 macvlans until the L2FW flag is cleared. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
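The three configurations and the "up to 3 total resets" figure above can be modelled as follows. This is a sketch under assumptions: it treats every change of configuration (including entering the first one) as one reset and never shrinks back, which is how I read the message; the names are hypothetical and the code is not taken from the driver.

```python
# (maximum macvlans, queues per macvlan), taken from the message above.
MODES = [(15, 4), (31, 2), (63, 1)]


def mode_for(count):
    """Smallest configuration that can hold `count` offloaded macvlans."""
    for limit, queues in MODES:
        if count <= limit:
            return limit, queues
    raise ValueError("at most 63 macvlans can be offloaded")


def resets_while_adding(total):
    """Count configuration changes while adding macvlans one at a time."""
    resets = 0
    current = None
    for count in range(1, total + 1):
        mode = mode_for(count)
        if mode != current:  # assumption: each mode change costs one reset
            resets += 1
            current = mode
    return resets


print(mode_for(16), resets_while_adding(63))
```

Compared with one reset per added macvlan, the model shows the reset count depending only on how many configuration boundaries are crossed.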
2018-04-25  ixgbe: Drop real_adapter from l2 fwd acceleration structure  (Alexander Duyck)
This patch drops the real_adapter member from the fwd_adapter structure. The general idea behind the change is that the real_adapter is carrying unnecessary data since we could always just grab the adapter structure from netdev_priv(macvlan->lowerdev) if we really needed to get at it. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>