summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-10-04dev: advertise the new nsid when the netns iface changesNicolas Dichtel
x-netns interfaces are bound to two netns: the link netns and the upper netns. Usually, this kind of interfaces is created in the link netns and then moved to the upper netns. At the end, the interface is visible only in the upper netns. The link nsid is advertised via netlink in the upper netns, thus the user always knows where is the link part. There is no such mechanism in the link netns. When the interface is moved to another netns, the user cannot "follow" it. This patch adds a new netlink attribute which helps to follow an interface which moves to another netns. When the interface is unregistered, the new nsid is advertised. If the interface is a x-netns interface (ie rtnl_link_ops->get_link_net is defined), the nsid is allocated if needed. CC: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04Merge branch 'bpf-cgroup-multi-prog'David S. Miller
Alexei Starovoitov says: ==================== bpf: muli prog support for cgroup-bpf v1->v2: - fixed accidentally swapped two lines which caused static_key not going to zero - addressed Martin's feedback and changed prog_query to be consistent with verifier output: return -enospc and fill supplied buffer instead of just returning -enospc when buffer is too small to fit all prog_ids v1: cgroup-bpf use cases are getting more advanced and running only one program per cgroup is no longer enough. Therefore introduce support for attaching multiple programs per cgroup and running a set of effective programs. These patches introduces BPF_F_ALLOW_MULTI flag for BPF_PROG_ATTACH cmd. The default is still NONE and behavior of BPF_F_ALLOW_OVERRIDE flag is unchanged. The difference between three possible flags for BPF_PROG_ATTACH command: - NONE(default): No further bpf programs allowed in the subtree. - BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program, the program in this cgroup yields to sub-cgroup program. - BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program, that cgroup program gets run in addition to the program in this cgroup. Most of the logic is in patch 1. Even when cgroup doesn't have any programs attached its set of effective program can be non-empty. To quickly execute them and avoid penalizing cgroups without any effective programs introduce 'struct bpf_prog_array' which has an optimization for cgroups with zero effective programs. Patch 2 introduces BPF_PROG_QUERY command for introspection Patch 3 makes verifier more strict for cgroup-bpf program types. Patch 4+ are tests. More details in individual patches ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04samples/bpf: use bpf_prog_query() interfaceAlexei Starovoitov
use BPF_PROG_QUERY command to strengthen test coverage Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04libbpf: add support for BPF_PROG_QUERYAlexei Starovoitov
add support for BPF_PROG_QUERY command to libbpf Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04libbpf: sync bpf.hAlexei Starovoitov
tools/include/uapi/linux/bpf.h got out of sync with actual kernel header. Update it. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04samples/bpf: add multi-prog cgroup test caseAlexei Starovoitov
create 5 cgroups, attach 6 progs and check that progs are executed as: cgrp1 (MULTI progs A, B) -> cgrp2 (OVERRIDE prog C) -> cgrp3 (MULTI prog D) -> cgrp4 (OVERRIDE prog E) -> cgrp5 (NONE prog F) the event in cgrp5 triggers execution of F,D,A,B in that order. if prog F is detached, the execution is E,D,A,B if prog F and D are detached, the execution is E,A,B if prog F, E and D are detached, the execution is C,A,B Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04libbpf: introduce bpf_prog_detach2()Alexei Starovoitov
introduce bpf_prog_detach2() that takes one more argument prog_fd vs bpf_prog_detach() that takes only attach_fd and type. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04bpf: enforce return code for cgroup-bpf programsAlexei Starovoitov
with addition of tnum logic the verifier got smart enough and we can enforce return codes at program load time. For now do so for cgroup-bpf program types. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04bpf: introduce BPF_PROG_QUERY commandAlexei Starovoitov
introduce BPF_PROG_QUERY command to retrieve a set of either attached programs to given cgroup or a set of effective programs that will execute for events within a cgroup Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> for cgroup bits Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04bpf: multi program support for cgroup+bpfAlexei Starovoitov
introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple bpf programs to a cgroup. The difference between three possible flags for BPF_PROG_ATTACH command: - NONE(default): No further bpf programs allowed in the subtree. - BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program, the program in this cgroup yields to sub-cgroup program. - BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program, that cgroup program gets run in addition to the program in this cgroup. NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't change their behavior. It only clarifies the semantics in relation to new flag. Only one program is allowed to be attached to a cgroup with NONE or BPF_F_ALLOW_OVERRIDE flag. Multiple programs are allowed to be attached to a cgroup with BPF_F_ALLOW_MULTI flag. They are executed in FIFO order (those that were attached first, run first) The programs of sub-cgroup are executed first, then programs of this cgroup and then programs of parent cgroup. All eligible programs are executed regardless of return code from earlier programs. To allow efficient execution of multiple programs attached to a cgroup and to avoid penalizing cgroups without any programs attached introduce 'struct bpf_prog_array' which is RCU protected array of pointers to bpf programs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> for cgroup bits Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04net: cache skb_shinfo() in skb_try_coalesce()Eric Dumazet
Compiler does not really know that skb_shinfo(to|from) are constants in skb_try_coalesce(), lets cache their values to shrink code. We might even take care of skb_zcopy() calls later. $ size net/core/skbuff.o.before net/core/skbuff.o text data bss dec hex filename 40727 1298 0 42025 a429 net/core/skbuff.o.before 40631 1298 0 41929 a3c9 net/core/skbuff.o Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04selftests: rtnetlink: try concurrent change of ifaliasFlorian Westphal
to make sure this is serialized correctly. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04rtnetlink: remove __rtnl_af_unregisterFlorian Westphal
switch the only caller to rtnl_af_unregister. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04rtnetlink: remove slave_validate callbackFlorian Westphal
no users in the tree. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04cxgb4vf: make a couple of functions staticColin Ian King
The functions t4vf_link_down_rc_str and t4vf_handle_get_port_info are local to the source and do not need to be in global scope, so make them static. Cleans up sparse warnings: symbol 't4vf_link_down_rc_str' was not declared. Should it be static? symbol 't4vf_handle_get_port_info' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04net: core: fix kerneldoc commentFlorian Westphal
net/core/dev.c:1306: warning: No description found for parameter 'name' net/core/dev.c:1306: warning: Excess function parameter 'alias' description in 'dev_get_alias' Fixes: 6c5570016b97 ("net: core: decouple ifalias get/set from rtnl lock") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04ravb: RX checksum offloadSimon Horman
Add support for RX checksum offload. This is enabled by default and may be disabled and re-enabled using ethtool: # ethtool -K eth0 rx off # ethtool -K eth0 rx on The RAVB provides a simple checksumming scheme which appears to be completely compatible with CHECKSUM_COMPLETE: sum of all packet data after the L2 header is appended to packet data; this may be trivially read by the driver and used to update the skb accordingly. In terms of performance throughput is close to gigabit line-rate both with and without RX checksum offload enabled. Perf output, however, appears to indicate that significantly less time is spent in do_csum(). This is as expected. Test results with RX checksum offload enabled: # /usr/bin/perf_3.16 record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162 MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo enable_enobufs failed: getprotobyname Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 937.54 Summary of output of perf report: 18.28% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 10.34% ksoftirqd/0 [kernel.kallsyms] [k] __pi_memcpy 9.83% ksoftirqd/0 [kernel.kallsyms] [k] ravb_poll 7.89% ksoftirqd/0 [kernel.kallsyms] [k] skb_put 4.01% ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive 3.37% netperf [kernel.kallsyms] [k] __arch_copy_to_user 3.17% swapper [kernel.kallsyms] [k] arch_cpu_idle 2.55% swapper [kernel.kallsyms] [k] tick_nohz_idle_enter 2.04% ksoftirqd/0 [kernel.kallsyms] [k] __pi___inval_dcache_area 2.03% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irq 1.96% ksoftirqd/0 [kernel.kallsyms] [k] __netdev_alloc_skb 1.59% ksoftirqd/0 [kernel.kallsyms] [k] __slab_alloc.isra.83 Test results without RX checksum offload enabled: # /usr/bin/perf_3.16 record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162 MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo enable_enobufs failed: getprotobyname Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 940.20 Summary of output of perf report: 17.10% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 10.99% ksoftirqd/0 [kernel.kallsyms] [k] __pi_memcpy 8.87% ksoftirqd/0 [kernel.kallsyms] [k] ravb_poll 8.16% ksoftirqd/0 [kernel.kallsyms] [k] skb_put 7.42% ksoftirqd/0 [kernel.kallsyms] [k] do_csum 3.91% ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive 2.31% swapper [kernel.kallsyms] [k] arch_cpu_idle 2.16% ksoftirqd/0 [kernel.kallsyms] [k] __pi___inval_dcache_area 2.14% ksoftirqd/0 [kernel.kallsyms] [k] __netdev_alloc_skb 1.93% netperf [kernel.kallsyms] [k] __arch_copy_to_user 1.79% swapper [kernel.kallsyms] [k] tick_nohz_idle_enter 1.63% ksoftirqd/0 [kernel.kallsyms] [k] __slab_alloc.isra.83 Above results collected on an R-Car Gen 3 Salvator-X/r8a7796 ES1.0. Also tested on a R-Car Gen 3 Salvator-X/r8a7795 ES1.0. By inspection this also appears to be compatible with the ravb found on R-Car Gen 2 SoCs, however, this patch is currently untested on such hardware. Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03cxgb4: add new T6 pci device id'sGanesh Goudar
Add 0x6085 T6 device id. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03Merge branch 'sctp-stream-schedulers'David S. Miller
Marcelo Ricardo Leitner says: ==================== Introduce SCTP Stream Schedulers This patchset introduces the SCTP Stream Schedulers are defined by https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13 It provides 3 schedulers at the moment: FCFS, Priority and Round Robin. The other 3, Round Robin per packet, Fair Capacity and Weighted Fair Capacity will be added later. More specifically, WFQ is required by WebRTC Datachannels. The draft also defines the idata chunk, allowing a usermsg to be interrupted by another piece of idata from another stream. This patchset *doesn't* include it. It will be posted later by Xin Long. Its integration with this patchset is very simple and it basically only requires a tweak in sctp_sched_dequeue_done(), to ignore datamsg boundaries. The first 5 patches are a preparation for the next ones. The most relevant patches are the 4th and 6th ones. More details are available on each patch. v2: changelog update on patch 3 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03sctp: introduce round robin stream schedulerMarcelo Ricardo Leitner
This patch introduces RFC Draft ndata section 3.2 Priority Based Scheduler (SCTP_SS_RR). Works by maintaining a list of enqueued streams and tracking the last one used to send data. When the datamsg is done, it switches to the next stream. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13 Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03sctp: introduce priority based stream schedulerMarcelo Ricardo Leitner
This patch introduces RFC Draft ndata section 3.4 Priority Based Scheduler (SCTP_SS_PRIO). It works by having a struct sctp_stream_priority for each priority configured. This struct is then enlisted on a queue ordered per priority if, and only if, there is a stream with data queued, so that dequeueing is very straightforward: either finish current datamsg or simply dequeue from the highest priority queued, which is the next stream pointed, and that's it. If there are multiple streams assigned with the same priority and with data queued, it will do round robin amongst them while respecting datamsgs boundaries (when not using idata chunks), to be reasonably fair. We intentionally don't maintain a list of priorities nor a list of all streams with the same priority to save memory. The first would mean at least 2 other pointers per priority (which, for 1000 priorities, that can mean 16kB) and the second would also mean 2 other pointers but per stream. As SCTP supports up to 65535 streams on a given asoc, that's 1MB. This impacts when giving a priority to some stream, as we have to find out if the new priority is already being used and if we can free the old one, and also when tearing down. The new fields in struct sctp_stream_out_ext and sctp_stream are added under a union because that memory is to be shared with other schedulers. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13 Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03sctp: add sockopt to get/set stream scheduler parametersMarcelo Ricardo Leitner
As defined per RFC Draft ndata Section 4.3.3, named as SCTP_STREAM_SCHEDULER_VALUE. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13 Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03sctp: add sockopt to get/set stream schedulerMarcelo Ricardo Leitner
As defined per RFC Draft ndata Section 4.3.2, named as SCTP_STREAM_SCHEDULER. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13 Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03sctp: introduce stream scheduler foundationsMarcelo Ricardo Leitner
This patch introduces the hooks necessary to do stream scheduling, as per RFC Draft ndata. It also introduces the first scheduler, which is what we do today but now factored out: first come first served (FCFS). With stream scheduling now we have to track which chunk was enqueued on which stream and be able to select another other than the in front of the main outqueue. So we introduce a list on sctp_stream_out_ext structure for this purpose. We reuse sctp_chunk->transmitted_list space for the list above, as the chunk cannot belong to the two lists at the same time. By using the union in there, we can have distinct names for these moments. sctp_sched_ops are the operations expected to be implemented by each scheduler. The dequeueing is a bit particular to this implementation but it is to match how we dequeue packets today. We first dequeue and then check if it fits the packet and if not, we requeue it at head. Thus why we don't have a peek operation but have dequeue_done instead, which is called once the chunk can be safely considered as transmitted. The check removed from sctp_outq_flush is now performed by sctp_stream_outq_migrate, which is only called during assoc setup. (sctp_sendmsg() also checks for it) The only operation that is foreseen but not yet added here is a way to signalize that a new packet is starting or that the packet is done, for round robin scheduler per packet, but is intentionally left to the patch that actually implements it. Support for I-DATA chunks, also described in this RFC, with user message interleaving is straightforward as it just requires the schedulers to probe for the feature and ignore datamsg boundaries when dequeueing. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13 Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03sctp: introduce sctp_chunk_stream_noMarcelo Ricardo Leitner
Add a helper to fetch the stream number from a given chunk. Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03sctp: introduce struct sctp_stream_out_extMarcelo Ricardo Leitner
With the stream schedulers, sctp_stream_out will become too big to be allocated by kmalloc and as we need to allocate with BH disabled, we cannot use __vmalloc in sctp_stream_init(). This patch moves out the stats from sctp_stream_out to sctp_stream_out_ext, which will be allocated only when the application tries to sendmsg something on it. Just the introduction of sctp_stream_out_ext would already fix the issue described above by splitting the allocation in two. Moving the stats to it also reduces the pressure on the allocator as we will ask for less memory atomically when creating the socket and we will use GFP_KERNEL later. Then, for stream schedulers, we will just use sctp_stream_out_ext. Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03sctp: factor out stream->in allocationMarcelo Ricardo Leitner
There is 1 place allocating it and another reallocating. Move such procedures to a common function. v2: updated changelog Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03sctp: factor out stream->out allocationMarcelo Ricardo Leitner
There is 1 place allocating it and 2 other reallocating. Move such procedures to a common function. Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03sctp: silence warns on sctp_stream_init allocationsMarcelo Ricardo Leitner
As SCTP supports up to 65535 streams, that can lead to very large allocations in sctp_stream_init(). As Xin Long noticed, systems with small amounts of memory are more prone to not have enough memory and dump warnings on dmesg initiated by user actions. Thus, silence them. Also, if the reallocation of stream->out is not necessary, skip it and keep the memory we already have. Reported-by: Xin Long <lucien.xin@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2017-10-03 This series contains updates to fm10k only. Jake provides majority of the changes in this series, starting with using fm10k_prepare_for_reset() if we lose PCIe link. Before we would detach the device and close the netdev, which left a lot of items still active, such as the Tx/Rx resources. This could cause problems where register reads would return potentially invalid values and would result in unknown driver behavior, so call fm10k_prepare_for_reset() much like we do for suspend/resume cycles. This will attempt to shutdown as much as possible to prevent possible issues. Then replaced the PCI specific legacy power management hooks with the new generic power management hooks for both suspend and hibernate. Introduced a workqueue item which monitors a queue of MAC and VLAN requests since a large number of MAC address or VLAN updates at once can overload the mailbox with too many messages at once. Fixed a cppcheck warning by properly declaring the min_rate and max_rate variables in the declaration and definition for .ndo_set_vf_bw, rather than using "unused" for the minimum rates. Joe Perches fixes the backward logic when using net_ratelimit(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03net: core: decouple ifalias get/set from rtnl lockFlorian Westphal
Device alias can be set by either rtnetlink (rtnl is held) or sysfs. rtnetlink hold the rtnl mutex, sysfs acquires it for this purpose. Add an extra mutex for it and use rcu to protect concurrent accesses. This allows the sysfs path to not take rtnl and would later allow to not hold it when dumping ifalias. Based on suggestion from Eric Dumazet. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03bonding: speed/duplex update at NETDEV_UP eventMahesh Bandewar
Some NIC drivers don't have correct speed/duplex settings at the time they send NETDEV_UP notification and that messes up the bonding state. Especially 802.3ad mode which is very sensitive to these settings. In the current implementation we invoke bond_update_speed_duplex() when we receive NETDEV_UP, however, ignore the return value. If the values we get are invalid (UNKNOWN), then slave gets removed from the aggregator with speed and duplex set to UNKNOWN while link is still marked as UP. This patch fixes this scenario. Also 802.3ad mode is sensitive to these conditions while other modes are not, so making sure that it doesn't change the behavior for other modes. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03mlxsw: spectrum: Add missing error code on allocation failureDan Carpenter
We accidentally return success if the kmalloc_array() call fails. Fixes: 0e14c7777acb ("mlxsw: spectrum: Add the multicast routing hardware logic") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03mlxsw: spectrum: Fix check for IS_ERR() instead of NULLDan Carpenter
mlxsw_afa_block_create() doesn't return error pointers, it returns NULL on error. Fixes: 0e14c7777acb ("mlxsw: spectrum: Add the multicast routing hardware logic") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03net: dsa: mt7530: make functions mt7530_phy_write staticColin Ian King
The function mt7530_phy_write is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warnings: symbol 'mt7530_phy_write' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03net: dsa: lan9303: make functions lan9303_mdio_phy_{read|write} staticColin Ian King
The functions lan9303_mdio_phy_write and lan9303_mdio_phy_read are local to the source and do not need to be in global scope, so make them static. Cleans up sparse warnings: symbol 'lan9303_mdio_phy_write' was not declared. Should it be static? symbol 'lan9303_mdio_phy_read' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03Merge branch 'mlxsw-mc-route-offload'David S. Miller
Jiri Pirko says: ==================== mlxsw: Add support for partial multicast route offload Yotam says: Previous patchset introduced support for offloading multicast MFC routes to the Spectrum hardware. As described in that patchset, no partial offloading is supported, i.e if a route has one output interface which is not a valid offloadable device (e.g. pimreg device, dummy device, management NIC), the route is trapped to the CPU and the forwarding is done in slow-path. Add support for partial offloading of multicast routes, by letting the hardware to forward the packet to all the in-hardware devices, while the kernel ipmr module will continue forwarding to all other interfaces. Similarly to the bridge, the kernel ipmr module will forward a marked packet to an interface only if the interface has a different parent ID than the packet's ingress interfaces. The first patch introduces the offload_mr_fwd_mark skb field, which can be used by offloading drivers to indicate that a packet had already gone through multicast forwarding in hardware, similarly to the offload_fwd_mark field that indicates that a packet had already gone through L2 forwarding in hardware. Patches 2 and 3 change the ipmr module to not forward packets that had already been forwarded by the hardware, i.e. packets that are marked with offload_mr_fwd_mark and the ingress VIF shares the same parent ID with the egress VIF. Patches 4, 5, 6 and 7 add the support in the mlxsw Spectrum driver for trap and forward routes, while marking the trapped packets with the offload_mr_fwd_mark. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03mlxsw: spectrum: mr: Support trap-and-forward routesYotam Gigi
Add the support of trap-and-forward route action in the multicast routing offloading logic. A route will be set to trap-and-forward action if one (or more) of its output interfaces is not offload-able, i.e. does not have a valid Spectrum RIF. This way, a route with mixed output VIFs list, which contains both offload-able and un-offload-able devices can go through partial offloading in hardware, and the rest will be done in the kernel ipmr module. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03mlxsw: spectrum: mr_tcam: Add trap-and-forward multicast routeYotam Gigi
In addition to the current multicast route actions, which include trap route action and a forward route action, add the trap-and-forward multicast route action, and implement it in the multicast routing hardware logic. To implement that, add a trap-and-forward ACL action as the last action in the route flexible action set. The used trap is the ACL2 trap, which marks the packets with offload_mr_forward_mark, to prevent the packet from being forwarded again by the kernel. Note: At that stage the offloading logic does not support trap-and-forward multicast routes. This patch adds the support only in the hardware logic. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03mlxsw: spectrum: Add trap for multicast trap-and-forward routesYotam Gigi
When a multicast route is configured with trap-and-forward action, the packets should be marked with skb->offload_mr_fwd_mark, in order to prevent the packets from being forwarded again by the kernel ipmr module. Due to this, it is not possible to use the already existing multicast trap (MLXSW_TRAP_ID_ACL1) as the packet should be marked differently. Add the MLXSW_TRAP_ID_ACL2 which is for trap-and-forward multicast routes, and set the offload_mr_fwd_mark skb field in its handler. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03mlxsw: acl: Introduce ACL trap and forward actionYotam Gigi
Use trap/discard flex action to implement trap and forward. The action will later be used for multicast routing, as the multicast routing mechanism is done using ACL flexible actions in Spectrum hardware. Using that action, it will be possible to implement a trap-and-forward route. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03ipv4: ipmr: Don't forward packets already forwarded by hardwareYotam Gigi
Change the ipmr module to not forward packets if: - The packet is marked with the offload_mr_fwd_mark, and - Both input interface and output interface share the same parent ID. This way, a packet can go through partial multicast forwarding in the hardware, where it will be forwarded only to the devices that share the same parent ID (AKA, reside inside the same hardware). The kernel will forward the packet to all other interfaces. To do this, add the ipmr_offload_forward helper, which per skb, ingress VIF and egress VIF, returns whether the forwarding was offloaded to hardware. The ipmr_queue_xmit frees the skb and does not forward it if the result is a true value. All the forwarding path code compiles out when the CONFIG_NET_SWITCHDEV is not set. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03ipv4: ipmr: Add the parent ID field to VIF structYotam Gigi
In order to allow the ipmr module to do partial multicast forwarding according to the device parent ID, add the device parent ID field to the VIF struct. This way, the forwarding path can use the parent ID field without invoking switchdev calls, which requires the RTNL lock. When a new VIF is added, set the device parent ID field in it by invoking the switchdev_port_attr_get call. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03skbuff: Add the offload_mr_fwd_mark fieldYotam Gigi
Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is used to allow partial offloading of MFC multicast routes. Switchdev drivers can offload MFC multicast routes to the hardware by registering to the FIB notification chain. When one of the route output interfaces is not offload-able, i.e. has different parent ID, the route cannot be fully offloaded by the hardware. Examples to non-offload-able devices are a management NIC, dummy device, pimreg device, etc. Similar problem exists in the bridge module, as one bridge can hold interfaces with different parent IDs. At the bridge, the problem is solved by the offload_fwd_mark skb field. Currently, when a route cannot go through full offload, the only solution for a switchdev driver is not to offload it at all and let the packet go through slow path. Using the offload_mr_fwd_mark field, a driver can indicate that a packet was already forwarded by hardware to all the devices with the same parent ID as the input device. Further patches in this patch-set are going to enhance ipmr to skip multicast forwarding to devices with the same parent ID if a packets is marked with that field. The reason why the already existing "offload_fwd_mark" bit cannot be used is that a switchdev driver would want to make the distinction between a packet that has already gone through L2 forwarding but did not go through multicast forwarding, and a packet that has already gone through both L2 and multicast forwarding. For example: when a packet is ingressing from a switchport enslaved to a bridge, which is configured with multicast forwarding, the following scenarios are possible: - The packet can be trapped to the CPU due to exception while multicast forwarding (for example, MTU error). In that case, it had already gone through L2 forwarding in the hardware, thus A switchdev driver would want to set the skb->offload_fwd_mark and not the skb->offload_mr_fwd_mark. - The packet can also be trapped due to a pimreg/dummy device used as one of the output interfaces. In that case, it can go through both L2 and (partial) multicast forwarding inside the hardware, thus a switchdev driver would want to set both the skb->offload_fwd_mark and skb->offload_mr_fwd_mark. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellaox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03cxgb4: Update comment for min_mtuArjun Vynipadath
We have lost a comment for minimum mtu value set for netdevice with 'commit d894be57ca92 ("ethernet: use net core MTU range checking in more drivers"). Updating it accordingly. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bwJacob Keller
We've had support for setting both a minimum and maximum bandwidth via .ndo_set_vf_bw since commit 883a9ccbae56 ("fm10k: Add support for SR-IOV to driver", 2014-09-20). Likely because we do not support minimum rates, the declaration mis-ordered the "unused" parameter, which causes warnings when analyzed with cppcheck. Fix this warning by properly declaring the min_rate and max_rate variables in the declaration and definition (rather than using "unused"). Also rename "rate" to max_rate so as to clarify that we only support setting the maximum rate. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03fm10k: prefer %s and __func__ for diagnostic printsJacob Keller
Don't hard code the function names in the diagnostic output when these reset related routines fail. Instead, use %s and __func__ so that future refactors don't need to change the print outs. Additionally, while we are here, add missing function header comments for the new reset_prepare and reset_done function handlers. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03fm10k: Fix misuse of net_ratelimit()Joe Perches
Correct the backward logic using !net_ratelimit() Miscellanea: o Add a blank line before the error return label Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03fm10k: bump version numberJacob Keller
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03fm10k: use the MAC/VLAN queue for VF<->PF MAC/VLAN requestsJacob Keller
Now that we have a working MAC/VLAN queue for handling MAC/VLAN messages from the netdev, replace the default handler for the VF<->PF messages. This new handler is very similar to the default code, but uses the MAC/VLAN queue instead of sending the message directly. Unfortunately we can't easily re-use the default code, so we'll just replace the entire function. This ensures that a VF requesting a large number of VLANs or MAC addresses does not start a reset cycle, as explained in the commit which introduced the message queue. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Ngai-mint Kwan <ngai-mint.kwan@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>