Age | Commit message (Collapse) | Author |
|
Adds support for calling sock_ops BPF program when there is a
retransmission. Three arguments are used; one for the sequence number,
another for the number of segments retransmitted, and the last one for
the return value of tcp_transmit_skb (0 => success).
Does not include syn-ack retransmissions.
New op: BPF_SOCK_OPS_RETRANS_CB.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add support for reading many more tcp_sock fields
state, same as sk->sk_state
rtt_min same as sk->rtt_min.s[0].v (current rtt_min)
snd_ssthresh
rcv_nxt
snd_nxt
snd_una
mss_cache
ecn_flags
rate_delivered
rate_interval_us
packets_out
retrans_out
total_retrans
segs_in
data_segs_in
segs_out
data_segs_out
lost_out
sacked_out
sk_txhash
bytes_received (__u64)
bytes_acked (__u64)
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Adds an optional call to sock_ops BPF program based on whether the
BPF_SOCK_OPS_RTO_CB_FLAG is set in bpf_sock_ops_flags.
The BPF program is passed 2 arguments: icsk_retransmits and whether the
RTO has expired.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary
use is to determine if there should be calls to sock_ops bpf program at
various points in the TCP code. The field is initialized to zero,
disabling the calls. A sock_ops BPF program can set it, per connection and
as necessary, when the connection is established.
It also adds support for reading and writting the field within a
sock_ops BPF program. Reading is done by accessing the field directly.
However, writing is done through the helper function
bpf_sock_ops_cb_flags_set, in order to return an error if a BPF program
is trying to set a callback that is not supported in the current kernel
(i.e. running an older kernel). The helper function returns 0 if it was
able to set all of the bits set in the argument, a positive number
containing the bits that could not be set, or -EINVAL if the socket is
not a full TCP socket.
Examples of where one could call the bpf program:
1) When RTO fires
2) When a packet is retransmitted
3) When the connection terminates
4) When a packet is sent
5) When a packet is received
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Adds support for passing up to 4 arguments to sock_ops bpf functions. It
reusues the reply union, so the bpf_sock_ops structures are not
increased in size.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds a macro, SOCK_OPS_SET_FIELD, for writing to
struct tcp_sock or struct sock fields. This required adding a new
field "temp" to struct bpf_sock_ops_kern for temporary storage that
is used by sock_ops_convert_ctx_access. It is used to store and recover
the contents of a register, so the register can be used to store the
address of the sk. Since we cannot overwrite the dst_reg because it
contains the pointer to ctx, nor the src_reg since it contains the value
we want to store, we need an extra register to contain the address
of the sk.
Also adds the macro SOCK_OPS_GET_OR_SET_FIELD that calls one of the
GET or SET macros depending on the value of the TYPE field.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Create a wrapper around tc_can_offload() that takes an additional
extack pointer argument in order to output an error message if TC
offload is disabled on the device.
In this way, the error message is handled by the core and can be the
same for all drivers.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add extack support for hardware offload of classifiers. In order
to achieve this, a pointer to a struct netlink_ext_ack is added to the
struct tc_cls_common_offload that is passed to the callback for setting
up the classifier. Function tc_cls_common_offload_init() is updated to
support initialization of this new attribute.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Expose the number of times the link has been going UP or DOWN, and
update the "carrier_changes" counter to be the sum of these two events.
While at it, also update the sysfs-class-net documentation to cover:
carrier_changes (3.15), carrier_up_count (4.16) and carrier_down_count
(4.16)
Signed-off-by: David Decotigny <decot@googlers.com>
[Florian:
* rebase
* add documentation
* merge carrier_changes with up/down counters]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit ccfdec908922 ("macsec: Add support for GCM-AES-256 cipher suite")
changed a few values in the uapi headers for MACsec.
Because of existing userspace implementations, we need to preserve the
value of MACSEC_DEFAULT_CIPHER_ID. Not doing that resulted in
wpa_supplicant segfaults when a secure channel was created using the
default cipher. Thus, swap MACSEC_DEFAULT_CIPHER_{ID,ALT} back to their
original values.
Changing the maximum length of the MACSEC_SA_ATTR_KEY attribute is
unnecessary, as the previous value (MACSEC_MAX_KEY_LEN, which was 128B)
is large enough to carry 32-bytes keys. This patch reverts
MACSEC_MAX_KEY_LEN to 128B and restores the old length check on
MACSEC_SA_ATTR_KEY.
Fixes: ccfdec908922 ("macsec: Add support for GCM-AES-256 cipher suite")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement a new helper function fwnode_get_next_available_child_node(),
which enables obtaining next enabled child fwnode, which
works on a similar basis to OF's of_get_next_available_child().
This commit also introduces a macro, thanks to which it is
possible to iterate over the available fwnodes, using the
new function described above.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Until now there were two very similar functions allowing
to get Linux IRQ number from ACPI handle (acpi_irq_get())
and OF node (of_irq_get()). The first one appeared to be used
only as a subroutine of platform_irq_get(), which (in the generic
code) limited IRQ obtaining from _CRS method only to nodes
associated to kernel's struct platform_device.
This patch introduces a new helper routine - fwnode_irq_get(),
which allows to get the IRQ number directly from the fwnode
to be used as common for OF/ACPI worlds. It is usable not
only for the parents fwnodes, but also for the child nodes
comprising their own _CRS methods with interrupts description.
In order to be able o satisfy compilation with !CONFIG_ACPI
and also simplify the new code, introduce a helper macro
(ACPI_HANDLE_FWNODE), with which it is possible to reach
an ACPI handle directly from its fwnode.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Until now there were two almost identical functions for
obtaining network PHY mode - of_get_phy_mode() and,
more generic, device_get_phy_mode(). However it is not uncommon,
that the network interface is represented as a child
of the actual controller, hence it is not associated
directly to any struct device, required by the latter
routine.
This commit allows for getting the PHY mode for
children nodes in the ACPI world by introducing a new function -
fwnode_get_phy_mode(). This commit also changes
device_get_phy_mode() routine to be its wrapper, in order
to prevent unnecessary duplication.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Until now there were two almost identical functions for
obtaining MAC address - of_get_mac_address() and, more generic,
device_get_mac_address(). However it is not uncommon,
that the network interface is represented as a child
of the actual controller, hence it is not associated
directly to any struct device, required by the latter
routine.
This commit allows for getting the MAC address for
children nodes in the ACPI world by introducing a new function -
fwnode_get_mac_address(). This commit also changes
device_get_mac_address() routine to be its wrapper, in order
to prevent unnecessary duplication.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2018-01-19
From: Or Gerlitz <ogerlitz@mellanox.com>
=======
First six patches of this series further enhances the mlx5 hairpin support.
The first two patches deal with using different hairpin instances
for flows whose packets have different priorities to align with the port
TX QoS model. The next four patches allow us to do HW spreading
of flows over a set of hairpin pairs using RSS. The last two patches
change the driver to also set the size of the HW hairpin queues.
========
Next four patches from Eran Ben Elisha <eranbe@mellanox.com>:
Add more debug data for TX timeout handling, and further enhance and optimize
TX timeout handling upon lost interrupts, which adds a mechanism for explicitly
polling EQ in case of a TX timeout in order to recover from a lost interrupt.
If this is not the case (no pending EQEs), perform a channels full recovery as
usual.
From Kamal Heib <kamalh@mellanox.com>, Two patches to extend the stats group API
to have an update_stats() callback which will be used to fetch the hardware or
software counters data, this will improve the current API and reduce code
duplication.
From Gal Pressman <galp@mellanox.com>, Last patch, Add likely to the common RX checksum
flow.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree. Basically, a new extension for ip6tables, simplification work of
nf_tables that saves us 500 LoC, allow raw table registration before
defragmentation, conversion of the SNMP helper to use the ASN.1 code
generator, unique 64-bit handle for all nf_tables objects and fixes to
address fallout from previous nf-next batch. More specifically, they
are:
1) Seven patches to remove family abstraction layer (struct nft_af_info)
in nf_tables, this simplifies our codebase and it saves us 64 bytes per
net namespace.
2) Add IPv6 segment routing header matching for ip6tables, from Ahmed
Abdelsalam.
3) Allow to register iptable_raw table before defragmentation, some
people do not want to waste cycles on defragmenting traffic that is
going to be dropped, hence add a new module parameter to enable this
behaviour in iptables and ip6tables. From Subash Abhinov
Kasiviswanathan. This patch needed a couple of follow up patches to
get things tidy from Arnd Bergmann.
4) SNMP helper uses the ASN.1 code generator, from Taehee Yoo. Several
patches for this helper to prepare this change are also part of this
patch series.
5) Add 64-bit handles to uniquely objects in nf_tables, from Harsha
Sharma.
6) Remove log message that several netfilter subsystems print at
boot/load time.
7) Restore x_tables module autoloading, that got broken in a previous
patch to allow singleton NAT hook callback registration per hook
spot, from Florian Westphal. Moreover, return EBUSY to report that
the singleton NAT hook slot is already in instead.
8) Several fixes for the new nf_tables flowtable representation,
including incorrect error check after nf_tables_flowtable_lookup(),
missing Kconfig dependencies that lead to build breakage and missing
initialization of priority and hooknum in flowtable object.
9) Missing NETFILTER_FAMILY_ARP dependency in Kconfig for the clusterip
target. This is due to recent updates in the core to shrink the hook
array size and compile it out if no specific family is enabled via
.config file. Patch from Florian Westphal.
10) Remove duplicated include header files, from Wei Yongjun.
11) Sparse warning fix for the NFPROTO_INET handling from the core
due to missing static function definition, also from Wei Yongjun.
12) Restore ICMPv6 Parameter Problem error reporting when
defragmentation fails, from Subash Abhinov Kasiviswanathan.
13) Remove obsolete owner field initialization from struct
file_operations, patch from Alexey Dobriyan.
14) Use boolean datatype where needed in the Netfilter codebase, from
Gustavo A. R. Silva.
15) Remove double semicolon in dynset nf_tables expression, from
Luis de Bethencourt.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-01-19
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) bpf array map HW offload, from Jakub.
2) support for bpf_get_next_key() for LPM map, from Yonghong.
3) test_verifier now runs loaded programs, from Alexei.
4) xdp cpumap monitoring, from Jesper.
5) variety of tests, cleanups and small x64 JIT optimization, from Daniel.
6) user space can now retrieve HW JITed program, from Jiong.
Note there is a minor conflict between Russell's arm32 JIT fixes
and removal of bpf_jit_enable variable by Daniel which should
be resolved by keeping Russell's comment and removing that variable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The BPF verifier conflict was some minor contextual issue.
The TUN conflict was less trivial. Cong Wang fixed a memory leak of
tfile->tx_array in 'net'. This is an skb_array. But meanwhile in
net-next tun changed tfile->tx_arry into tfile->tx_ring which is a
ptr_ring.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack handling for the tcf_change_indev function which
is common used by TC classifier implementations.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for classifier delete callback api. This
prepares to handle extack support inside each specific classifier
implementation.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The tcf_exts_validate function calls the act api change callback. For
preparing extack support for act api, this patch adds the extack as
parameter for this function which is common used in cls implementations.
Furthermore the tcf_exts_validate will call action init callback which
prepares the TC action subsystem for extack support.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for classifier change callback api. This
prepares to handle extack support inside each specific classifier
implementation.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch changes some code style issues pointed out by checkpatch
inside the TC cls subsystem.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow to specify the size of the hairpin queues along with the
packet buffer data size from the core setup code.
If the driver doesn't provide this, the FW applies proper value that
matches the provided data size and a FW chosen RQ stride size.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Enhance the hairpin setup code at the core to support a set of N
(RQ,SQ) pairs. This will be later used by the caller to set RSS
spreading among the different RQs.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
A persistent connection may send tiny amount of data (e.g. health-check)
for a long period of time. BBR's windowed min RTT filter may only see
RTT samples from delayed ACKs causing BBR to grossly over-estimate
the path delay depending how much the ACK was delayed at the receiver.
This patch skips RTT samples that are likely coming from delayed ACKs. Note
that it is possible the sender never obtains a valid measure to set the
min RTT. In this case BBR will continue to set cwnd to initial window
which seems fine because the connection is thin stream.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Helmut reported a bug about devision by zero while
running traffic and doing physical cable pull test.
When the cable unplugged the ppms become zero, so when
dividing the current ppms by the previous ppms in the
next dim iteration there is devision by zero.
This patch prevent this division for both ppms and epms.
Fixes: c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism")
Fixes: 4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
Reported-by: Helmut Grauer <helmut.grauer@de.ibm.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When CONFIG_KASAN is set, we can use relatively large amounts of kernel
stack space:
net/caif/cfctrl.c:555:1: warning: the frame size of 1600 bytes is larger than 1280 bytes [-Wframe-larger-than=]
This adds convenience wrappers around cfpkt_extr_head(), which is responsible
for most of the stack growth. With those wrapper functions, gcc apparently
starts reusing the stack slots for each instance, thus avoiding the
problem.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Without this patch, I drown in a sea of unknown attribute warnings
Link: http://lkml.kernel.org/r/20180117024539.27354-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch allows deletion of objects via unique handle which can be
listed via '-a' option.
Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Tell user space about device on which the map was created.
Unfortunate reality of user ABI makes sharing this code
with program offload difficult but the information is the
same.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Doc BPF ld/ldx size defines as comments in code, as it makes in
faster to lookup in a programming/review setting, than looking up
the sizes in Documentation/networking/filter.txt.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
For host JIT, there are "jited_len"/"bpf_func" fields in struct bpf_prog
used by all host JIT targets to get jited image and it's length. While for
offload, targets are likely to have different offload mechanisms that these
info are kept in device private data fields.
Therefore, BPF_OBJ_GET_INFO_BY_FD syscall needs an unified way to get JIT
length and contents info for offload targets.
One way is to introduce new callback to parse device private data then fill
those fields in bpf_prog_info. This might be a little heavy, the other way
is to add generic fields which will be initialized by all offload targets.
This patch follow the second approach to introduce two new fields in
struct bpf_dev_offload and teach bpf_prog_get_info_by_fd about them to fill
correct jited_prog_len and jited_prog_insns in bpf_prog_info.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2018-01-16
this is a pull request for net-next/master consisting of 9 patches.
This is a series of patches, some of them initially by Franklin S Cooper
Jr, which was picked up by Faiz Abbas. Faiz Abbas added some patches
while working on this series, I contributed one as well.
The first two patches add support to CAN device infrastructure to limit
the bitrate of a CAN adapter if the used CAN-transceiver has a certain
maximum bitrate.
The remaining patches improve the m_can driver. They add support for
bitrate limiting to the driver, clean up the driver and add support for
runtime PM.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The trailing semicolon is an empty statement that does no operation.
It is completely stripped out by the compiler. Removing it since it doesn't do
anything.
Fixes: 5f35227ea34b ("net: Generalize ndo_gso_check to ndo_features_check")
Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch allows userspace to attach eBPF filter to tun. This will
allow to implement VM dataplane filtering in a more efficient way
compared to cBPF filter by allowing either qemu or libvirt to
attach eBPF filter to tun.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"A delayacct statistics correctness fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
delayacct: Account blkio completion on the correct task
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti bits and fixes from Thomas Gleixner:
"This last update contains:
- An objtool fix to prevent a segfault with the gold linker by
changing the invocation order. That's not just for gold, it's a
general robustness improvement.
- An improved error message for objtool which spares tearing hairs.
- Make KASAN fail loudly if there is not enough memory instead of
oopsing at some random place later
- RSB fill on context switch to prevent RSB underflow and speculation
through other units.
- Make the retpoline/RSB functionality work reliably for both Intel
and AMD
- Add retpoline to the module version magic so mismatch can be
detected
- A small (non-fix) update for cpufeatures which prevents cpu feature
clashing for the upcoming extra mitigation bits to ease
backporting"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
module: Add retpoline tag to VERMAGIC
x86/cpufeature: Move processor tracing out of scattered features
objtool: Improve error message for bad file argument
objtool: Fix seg fault with gold linker
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
x86/retpoline: Fill RSB on context switch for affected CPUs
x86/kasan: Panic if there is not enough memory to boot
|
|
Introduce two new attributes to be used for qdisc creation and dumping.
One for ingress block, one for egress block. Introduce a set of ops that
qdisc which supports block sharing would implement.
Passing block indexes in qdisc change is not supported yet and it is
checked and forbidded.
In future, these attributes are to be reused for specifying block
indexes for classes as well. As of this moment however, it is not
supported so a check is in place to forbid it.
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As the tcm_ifindex with value TCM_IFINDEX_MAGIC_BLOCK is invalid ifindex,
use it to indicate that we work with block, instead of qdisc.
So if tcm_ifindex is set to TCM_IFINDEX_MAGIC_BLOCK, tcm_parent is used
to carry block_index.
If the block is set to be shared between at least 2 qdiscs, it is
forbidden to use the qdisc handle to add/delete filters. In that case,
userspace has to pass block_index.
Also, for dump of the filters, in case the block is shared in between at
least 2 qdiscs, the each filter is dumped with tcm_ifindex value
TCM_IFINDEX_MAGIC_BLOCK and tcm_parent set to block_index. That gives
the user clear indication, that the filter belongs to a shared block
and not only to one qdisc under which it is dumped.
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During block bind, we need to check tc offload feature. If it is
disabled yet still the block contains offloaded filters, forbid the
bind. Also forbid to register callback for a block that already
contains offloaded filters, as the play back is not supported now.
For keeping track of offloaded filters there is a new counter
introduced, alongside with couple of helpers called from cls_* code.
These helpers set and clear TCA_CLS_FLAGS_IN_HW flag.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Both are no longer used, so remove them.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Couple of classifiers call netif_keep_dst directly on q->dev. That is
not possible to do directly for shared blocke where multiple qdiscs are
owning the block. So introduce a infrastructure to keep track of the
block owners in list and use this list to implement block variant of
netif_keep_dst.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow qdiscs to share filter blocks among them. Each qdisc type has to
use block get/put extended modifications that enable sharing.
Shared blocks are tracked within each net namespace and identified
by u32 index. This index is passed from user during the qdisc creation.
If user passes index that is not used by any other qdisc, new block
is created. If user passes index that is already used, the existing
block will be re-used.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
So far, there was possible only to register a single filter chain
pointer to block->chain[0]. However, when the blocks will get shareable,
we need to allow multiple filter chain pointers registration.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 0dfb33a0d7e2 ("sch_red: report backlog information") copied
child's backlog into RED's backlog. Back then RED did not maintain
its own backlog counts. This has changed after commit 2ccccf5fb43f
("net_sched: update hierarchical backlog too") and commit d7f4f332f082
("sch_red: update backlog as well"). Copying is no longer necessary.
Tested:
$ tc -s qdisc show dev veth0
qdisc red 1: root refcnt 2 limit 400000b min 30000b max 30000b ecn
Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0)
backlog 1260b 14p requeues 14
marked 0 early 0 pdrop 0 other 0
qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s
Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0)
backlog 1260b 14p requeues 14
Recently RED offload was added. We need to make sure drivers don't
depend on resetting the stats. This means backlog should be treated
like any other statistic:
total_stat = new_hw_stat - prev_hw_stat;
Adjust mlxsw.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Nogah Frankel <nogahf@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The latest merge between net and net-next introduced a complier assert in
mlx5 driver. In hca_cap_bits older fields are kept along with newer
fields that should have replaced them.
Fixes: c02b3741eb99 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a marker for retpoline to the module VERMAGIC. This catches the case
when a non RETPOLINE compiled module gets loaded into a retpoline kernel,
making it insecure.
It doesn't handle the case when retpoline has been runtime disabled. Even
in this case the match of the retcompile status will be enforced. This
implies that even with retpoline run time disabled all modules loaded need
to be recompiled.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Cc: torvalds@linux-foundation.org
Link: https://lkml.kernel.org/r/20180116205228.4890-1-andi@firstfloor.org
|
|
Overlapping changes all over.
The mini-qdisc bits were a little bit tricky, however.
Signed-off-by: David S. Miller <davem@davemloft.net>
|