Age | Commit message (Collapse) | Author |
|
This patch adds handling of GSO packets in bpf_lwt_push_ip_encap()
(called from bpf_lwt_push_encap):
* IPIP, GRE, and UDP encapsulation types are deduced by looking
into iphdr->protocol or ipv6hdr->next_header;
* SCTP GSO packets are not supported (as bpf_skb_proto_4_to_6
and similar do);
* UDP_L4 GSO packets are also not supported (although they are
not blocked in bpf_skb_proto_4_to_6 and similar), as
skb_decrease_gso_size() will break it;
* SKB_GSO_DODGY bit is set.
Note: it may be possible to support SCTP and UDP_L4 gso packets;
but as these cases seem to be not well handled by other
tunneling/encapping code paths, the solution should
be generic enough to apply to all tunneling/encapping code.
v8 changes:
- make sure that if GRE or UDP encap is detected, there is
enough of pushed bytes to cover both IP[v6] + GRE|UDP headers;
- do not reject double-encapped packets;
- whitelist TCP GSO packets rather than block SCTP GSO and
UDP GSO.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap BPF helper.
It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN and
BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
to packets (e.g. IP/GRE, GUE, IPIP).
This is useful when thousands of different short-lived flows should be
encapped, each with different and dynamically determined destination.
Although lwtunnels can be used in some of these scenarios, the ability
to dynamically generate encap headers adds more flexibility, e.g.
when routing depends on the state of the host (reflected in global bpf
maps).
v7 changes:
- added a call skb_clear_hash();
- removed calls to skb_set_transport_header();
- refuse to encap GSO-enabled packets.
v8 changes:
- fix build errors when LWT is not enabled.
Note: the next patch in the patchset with deal with GSO-enabled packets,
which are currently rejected at encapping attempt.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds all needed plumbing in preparation to allowing
bpf programs to do IP encapping via bpf_lwt_push_encap. Actual
implementation is added in the next patch in the patchset.
Of note:
- bpf_lwt_push_encap can now be called from BPF_PROG_TYPE_LWT_XMIT
prog types in addition to BPF_PROG_TYPE_LWT_IN;
- if the skb being encapped has GSO set, encapsulation is limited
to IPIP/IP+GRE/IP+GUE (both IPv4 and IPv6);
- as route lookups are different for ingress vs egress, the single
external bpf_lwt_push_encap BPF helper is routed internally to
either bpf_lwt_in_push_encap or bpf_lwt_xmit_push_encap BPF_CALLs,
depending on prog type.
v8 changes: fixed a typo.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This can remove redundant check
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit b639583f9e36d044ac1b13090ae812266992cbac.
As per discussion with Jakub Kicinski and Michal Kubecek,
this will be better addressed by soon-too-come ethtool netlink
API with additional indication that given configuration request
is supposed to be persisted.
Also, remove the parameter support from bnxt_en driver.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Michal Kubecek <mkubecek@suse.cz>
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Number of devlink attributes has grown over 128, causing the
following warning:
../net/core/devlink.c: In function ‘devlink_nl_cmd_region_read_dumpit’:
../net/core/devlink.c:3740:1: warning: the frame size of 1064 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
^
Since the number of attributes is only going to grow allocate
the array dynamically.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need the port to be both ethernet and have the rigth netdev,
not one or the other.
Fixes: ddb6e99e2db1 ("ethtool: add compat for devlink info")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add WARN_ON to make sure that all sub objects of a devlink device are
cleanedup before freeing the devlink device.
This helps to catch any driver bugs.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
For GSO packets they adjust gso_size to maintain the same MTU.
The gso size can only be safely adjusted on bytestream protocols.
Commit d02f51cbcf12 ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
to deal with gso sctp skbs") excluded SKB_GSO_SCTP.
Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
gso_size unit per datagram. Also exclude these.
Move from a blacklist to a whitelist check to future proof against
additional such new GSO types, e.g., for fraglist based GRO.
Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds a helper function BPF_FUNC_tcp_sock and it
is currently available for cg_skb and sched_(cls|act):
struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk);
int cg_skb_foo(struct __sk_buff *skb) {
struct bpf_tcp_sock *tp;
struct bpf_sock *sk;
__u32 snd_cwnd;
sk = skb->sk;
if (!sk)
return 1;
tp = bpf_tcp_sock(sk);
if (!tp)
return 1;
snd_cwnd = tp->snd_cwnd;
/* ... */
return 1;
}
A 'struct bpf_tcp_sock' is also added to the uapi bpf.h to provide
read-only access. bpf_tcp_sock has all the existing tcp_sock's fields
that has already been exposed by the bpf_sock_ops.
i.e. no new tcp_sock's fields are exposed in bpf.h.
This helper returns a pointer to the tcp_sock. If it is not a tcp_sock
or it cannot be traced back to a tcp_sock by sk_to_full_sk(), it
returns NULL. Hence, the caller needs to check for NULL before
accessing it.
The current use case is to expose members from tcp_sock
to allow a cg_skb_bpf_prog to provide per cgroup traffic
policing/shaping.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The next patch will introduce a new "struct bpf_tcp_sock" which
exposes the same tcp_sock's fields already exposed in
"struct bpf_sock_ops".
This patch refactor the existing convert_ctx_access() codes for
"struct bpf_sock_ops" to get them ready to be reused for
"struct bpf_tcp_sock". The "rtt_min" is not refactored
in this patch because its handling is different from other
fields.
The SOCK_OPS_GET_TCP_SOCK_FIELD is new. All other SOCK_OPS_XXX_FIELD
changes are code move only.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds "state", "dst_ip4", "dst_ip6" and "dst_port" to the
bpf_sock. The userspace has already been using "state",
e.g. inet_diag (ss -t) and getsockopt(TCP_INFO).
This patch also allows narrow load on the following existing fields:
"family", "type", "protocol" and "src_port". Unlike IP address,
the load offset is resticted to the first byte for them but it
can be relaxed later if there is a use case.
This patch also folds __sock_filter_check_size() into
bpf_sock_is_valid_access() since it is not called
by any where else. All bpf_sock checking is in
one place.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
before accessing the fields in sock. For example, in __netdev_pick_tx:
static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
struct net_device *sb_dev)
{
/* ... */
struct sock *sk = skb->sk;
if (queue_index != new_index && sk &&
sk_fullsock(sk) &&
rcu_access_pointer(sk->sk_dst_cache))
sk_tx_queue_set(sk, new_index);
/* ... */
return queue_index;
}
This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
where a few of the convert_ctx_access() in filter.c has already been
accessing the skb->sk sock_common's fields,
e.g. sock_ops_convert_ctx_access().
"__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
Some of the fileds in "bpf_sock" will not be directly
accessible through the "__sk_buff->sk" pointer. It is limited
by the new "bpf_sock_common_is_valid_access()".
e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
are not allowed.
The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
can be used to get a sk with all accessible fields in "bpf_sock".
This helper is added to both cg_skb and sched_(cls|act).
int cg_skb_foo(struct __sk_buff *skb) {
struct bpf_sock *sk;
sk = skb->sk;
if (!sk)
return 1;
sk = bpf_sk_fullsock(sk);
if (!sk)
return 1;
if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
return 1;
/* some_traffic_shaping(); */
return 1;
}
(1) The sk is read only
(2) There is no new "struct bpf_sock_common" introduced.
(3) Future kernel sock's members could be added to bpf_sock only
instead of repeatedly adding at multiple places like currently
in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.
(4) After "sk = skb->sk", the reg holding sk is in type
PTR_TO_SOCK_COMMON_OR_NULL.
(5) After bpf_sk_fullsock(), the return type will be in type
PTR_TO_SOCKET_OR_NULL which is the same as the return type of
bpf_sk_lookup_xxx().
However, bpf_sk_fullsock() does not take refcnt. The
acquire_reference_state() is only depending on the return type now.
To avoid it, a new is_acquire_function() is checked before calling
acquire_reference_state().
(6) The WARN_ON in "release_reference_state()" is no longer an
internal verifier bug.
When reg->id is not found in state->refs[], it means the
bpf_prog does something wrong like
"bpf_sk_release(bpf_sk_fullsock(skb->sk))" where reference has
never been acquired by calling "bpf_sk_fullsock(skb->sk)".
A -EINVAL and a verbose are done instead of WARN_ON. A test is
added to the test_verifier in a later patch.
Since the WARN_ON in "release_reference_state()" is no longer
needed, "__release_reference_state()" is folded into
"release_reference_state()" also.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
net/core/ethtool.c:3023:19: warning: address of array
'ext_m_spec->h_dest' will always evaluate to 'true'
[-Wpointer-bool-conversion]
if (ext_m_spec->h_dest) {
~~ ~~~~~~~~~~~~^~~~~~
h_dest is an array, it can't be null so remove this check.
Fixes: eca4205f9ec3 ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator")
Link: https://github.com/ClangBuiltLinux/linux/issues/353
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, user can do dump or get of param values right after the
devlink params are registered. However the driver may not be initialized
which is an issue. The same problem happens during notification
upon param registration. Allow driver to publish devlink params
whenever it is ready to handle get() ops. Note that this cannot
be resolved by init reordering, as the "driverinit" params have
to be available before the driver is initialized (it needs the param
values there).
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
An ipvlan bug fix in 'net' conflicted with the abstraction away
of the IPV6 specific support in 'net-next'.
Similarly, a bug fix for mlx5 in 'net' conflicted with the flow
action conversion in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add devlink health dump commands, in order to run an dump operation
over a specific reporter.
The supported operations are dump_get in order to get last saved
dump (if not exist, dump now) and dump_clear to clear last saved
dump.
It is expected from driver's callback for diagnose command to fill it
via the devlink fmsg API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add devlink health diagnose command, in order to run a diagnose
operation over a specific reporter.
It is expected from driver's callback for diagnose command to fill it
via the devlink fmsg API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add devlink health recover command to the uapi, in order to allow the user
to execute a recover operation over a specific reporter.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add devlink health set command, in order to set configuration parameters
for a specific reporter.
Supported parameters are:
- graceful_period: Time interval between auto recoveries (in msec)
- auto_recover: Determines if the devlink shall execute recover upon
receiving error for the reporter
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add devlink health get command to provide reporter/s data for user space.
Add the ability to get data per reporter or dump data from all available
reporters.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Upon error discover, every driver can report it to the devlink health
mechanism via devlink_health_report function, using the appropriate
reporter registered to it. Driver can pass error specific context which
will be delivered to it as part of the dump / recovery callbacks.
Once an error is reported, devlink health will do the following actions:
* A log is being send to the kernel trace events buffer
* Health status and statistics are being updated for the reporter instance
* Object dump is being taken and stored at the reporter instance (as long
as there is no other dump which is already stored)
* Auto recovery attempt is being done. Depends on:
- Auto Recovery configuration
- Grace period vs. Time since last recover
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Devlink health reporter is an instance for reporting, diagnosing and
recovering from run time errors discovered by the reporters.
Define it's data structure and supported operations.
In addition, expose devlink API to create and destroy a reporter.
Each devlink instance will hold it's own reporters list.
As part of the allocation, driver shall provide a set of callbacks which
will be used by devlink in order to handle health reports and user
commands related to this reporter. In addition, driver is entitled to
provide some priv pointer, which can be fetched from the reporter by
devlink_health_reporter_priv function.
For each reporter, devlink will hold a metadata of statistics,
dump msg and status.
For passing dumps and diagnose data to the user-space, it will use devlink
fmsg API.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Devlink fmsg is a mechanism to pass descriptors between drivers and
devlink, in json-like format. The API allows the driver to add nested
attributes such as object, object pair and value array, in addition to
attributes such as name and value.
Driver can use this API to fill the fmsg context in a format which will be
translated by the devlink to the netlink message later.
There is no memory allocation in advance (other than the initial list
head), and it dynamically allocates messages descriptors and add them to
the list on the fly.
When it needs to send the data using SKBs to the netlink layer, it
fragments the data between different SKBs. In order to do this
fragmentation, it uses virtual nests attributes, to avoid actual
nesting use which cannot be divided between different SKBs.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-02-07
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add a riscv64 JIT for BPF, from Björn.
2) Implement BTF deduplication algorithm for libbpf which takes BTF type
information containing duplicate per-compilation unit information and
reduces it to an equivalent set of BTF types with no duplication and
without loss of information, from Andrii.
3) Offloaded and native BPF XDP programs can coexist today, enable also
offloaded and generic ones as well, from Jakub.
4) Expose various BTF related helper functions in libbpf as API which
are in particular helpful for JITed programs, from Yonghong.
5) Fix the recently added JMP32 code emission in s390x JIT, from Heiko.
6) Fix BPF kselftests' tcp_{server,client}.py to be able to run inside
a network namespace, also add a fix for libbpf to get libbpf_print()
working, from Stanislav.
7) Fixes for bpftool documentation, from Prashant.
8) Type cleanup in BPF kselftests' test_maps.c to silence a gcc8 warning,
from Breno.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that we have a dedicated NDO for getting a port's parent ID, get rid
of SWITCHDEV_ATTR_ID_PORT_PARENT_ID and convert all callers to use the
NDO exclusively. This is a preliminary change to getting rid of
switchdev_ops eventually.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for getting rid of switchdev_ops, create a dedicated NDO
operation for getting the port's parent identifier. There are
essentially two classes of drivers that need to implement getting the
port's parent ID which are VF/PF drivers with a built-in switch, and
pure switchdev drivers such as mlxsw, ocelot, dsa etc.
We introduce a helper function: dev_get_port_parent_id() which supports
recursion into the lower devices to obtain the first port's parent ID.
Convert the bridge, core and ipv4 multicast routing code to check for
such ndo_get_port_parent_id() and call the helper function when valid
before falling back to switchdev_port_attr_get(). This will allow us to
convert all relevant drivers in one go instead of having to implement
both switchdev_port_attr_get() and ndo_get_port_parent_id() operations,
then get rid of switchdev_port_attr_get().
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Define a tracepoint and allow user to trace messages in case of an hardware
error code for hardware associated with devlink instance.
Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds a function to translate the ethtool_rx_flow_spec
structure to the flow_rule representation.
This allows us to reuse code from the driver side given that both flower
and ethtool_rx_flow interfaces use the same representation.
This patch also includes support for the flow type flags FLOW_EXT,
FLOW_MAC_EXT and FLOW_RSS.
The ethtool_rx_flow_spec_input wrapper structure is used to convey the
rss_context field, that is away from the ethtool_rx_flow_spec structure,
and the ethtool_rx_flow_spec structure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This new infrastructure defines the nic actions that you can perform
from existing network drivers. This infrastructure allows us to avoid a
direct dependency with the native software TC action representation.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch wraps the dissector key and mask - that flower uses to
represent the matching side - around the flow_match structure.
To avoid a follow up patch that would edit the same LoCs in the drivers,
this patch also wraps this new flow match structure around the flow rule
object. This new structure will also contain the flow actions in follow
up patches.
This introduces two new interfaces:
bool flow_rule_match_key(rule, dissector_id)
that returns true if a given matching key is set on, and:
flow_rule_match_XYZ(rule, &match);
To fetch the matching side XYZ into the match container structure, to
retrieve the key and the mask with one single call.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit a25717d2b604 ("xdp: support simultaneous driver and
hw XDP attachment") users can load an XDP program for offload and
in native driver mode simultaneously. Allow a similar mix of
offload and SKB mode/generic XDP.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
net/core/sock.c: In function 'sock_setsockopt':
net/core/sock.c:914:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
sock_set_flag(sk, SOCK_TSTAMP_NEW);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/sock.c:915:2: note: here
case SO_TIMESTAMPING_OLD:
^~~~
Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Shared buffer allocation is usually done in cell increments.
Drivers will either round up the allocation or refuse the
configuration if it's not an exact multiple of cell size.
Drivers know exactly the cell size of shared buffer, so help
out users by providing this information in dumps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add new socket timeout options that are y2038 safe.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: ccaulfie@redhat.com
Cc: davem@davemloft.net
Cc: deller@gmx.de
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: cluster-devel@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
as the time format. struct timeval is not y2038 safe.
The subsequent patches in the series add support for new socket
timeout options with _NEW suffix that will use y2038 safe
data structures. Although the existing struct timeval layout
is sufficiently wide to represent timeouts, because of the way
libc will interpret time_t based on user defined flag, these
new flags provide a way of having a structure that is the same
for all architectures consistently.
Rename the existing options with _OLD suffix forms so that the
right option is enabled for userspace applications according
to the architecture and time_t definition of libc.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: ccaulfie@redhat.com
Cc: deller@gmx.de
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: cluster-devel@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add SO_TIMESTAMPING_NEW variant of socket timestamp options.
This is the y2038 safe versions of the SO_TIMESTAMPING_OLD
for all architectures.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: chris@zankel.net
Cc: fenghua.yu@intel.com
Cc: rth@twiddle.net
Cc: tglx@linutronix.de
Cc: ubraun@linux.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of
socket timestamp options.
These are the y2038 safe versions of the SO_TIMESTAMP_OLD
and SO_TIMESTAMPNS_OLD for all architectures.
Note that the format of scm_timestamping.ts[0] is not changed
in this patch.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: jejb@parisc-linux.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: linux-alpha@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the
way they are currently defined, are not y2038 safe.
Subsequent patches in the series add new y2038 safe versions
of these options which provide 64 bit timestamps on all
architectures uniformly.
Hence, rename existing options with OLD tag suffixes.
Also note that kernel will not use the untagged SO_TIMESTAMP*
and SCM_TIMESTAMP* options internally anymore.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: deller@gmx.de
Cc: dhowells@redhat.com
Cc: jejb@parisc-linux.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: linux-afs@lists.infradead.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is a cleanup to prepare for the addition of 64-bit time_t
in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems
unnecessarily complex and error-prone, moving it all into the
main setsockopt()/getsockopt() implementation requires half
as much code and is easier to extend.
32-bit user space can now use old_timeval32 on both 32-bit
and 64-bit machines, while 64-bit code can use
__old_kernel_timeval.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2019-02-01
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) introduce bpf_spin_lock, from Alexei.
2) convert xdp samples to libbpf, from Maciej.
3) skip verifier tests for unsupported program/map types, from Stanislav.
4) powerpc64 JIT support for BTF line info, from Sandipan.
5) assorted fixed, from Valdis, Jesper, Jiong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If driver did not fill the fw_version field, try to call into
the new devlink get_info op and collect the versions that way.
We assume ethtool was always reporting running versions.
v4:
- use IS_REACHABLE() to avoid problems with DEVLINK=m (kbuildbot).
v3 (Jiri):
- do a dump and then parse it instead of special handling;
- concatenate all versions (well, all that fit :)).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ethtool -i has a few fixed-size fields which can be used to report
firmware version and expansion ROM version. Unfortunately, modern
hardware has more firmware components. There is usually some
datapath microcode, management controller, PXE drivers, and a
CPLD load. Running ethtool -i on modern controllers reveals the
fact that vendors cram multiple values into firmware version field.
Here are some examples from systems I could lay my hands on quickly:
tg3: "FFV20.2.17 bc 5720-v1.39"
i40e: "6.01 0x800034a4 1.1747.0"
nfp: "0.0.3.5 0.25 sriov-2.1.16 nic"
Add a new devlink API to allow retrieving multiple versions, and
provide user-readable name for those versions.
While at it break down the versions into three categories:
- fixed - this is the board/fixed component version, usually vendors
report information like the board version in the PCI VPD,
but it will benefit from naming and common API as well;
- running - this is the running firmware version;
- stored - this is firmware in the flash, after firmware update
this value will reflect the flashed version, while the
running version may only be updated after reboot.
v3:
- add per-type helpers instead of using the special argument (Jiri).
RFCv2:
- remove the nesting in attr DEVLINK_ATTR_INFO_VERSIONS (now
versions are mixed with other info attrs)l
- have the driver report versions from the same callback as
other info.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ethtool -i has served us well for a long time, but its showing
its limitations more and more. The device information should
also be reported per device not per-netdev.
Lay foundation for a simple devlink-based way of reading device
info. Add driver name and device serial number as initial pieces
of information exposed via this new API.
v3:
- rename helpers (Jiri);
- rename driver name attr (Jiri);
- remove double spacing in commit message (Jiri).
RFC v2:
- wrap the skb into an opaque structure (Jiri);
- allow the serial number of be any length (Jiri & Andrew);
- add driver name (Jonathan).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2019-01-31
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) disable preemption in sender side of socket filters, from Alexei.
2) fix two potential deadlocks in syscall bpf lookup and prog_register,
from Martin and Alexei.
3) fix BTF to allow typedef on func_proto, from Yonghong.
4) two bpftool fixes, from Jiri and Paolo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to provide more meaningful messages to user when the process of
loading xdp program onto network interface failed, let's add extack
messages within dev_change_xdp_fd.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
bpf program serialize access to other variables.
Example:
struct hash_elem {
int cnt;
struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
if (val) {
bpf_spin_lock(&val->lock);
val->cnt++;
bpf_spin_unlock(&val->lock);
}
Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
cause dead locks.
- only one 'struct bpf_spin_lock' is allowed per map element.
It drastically simplifies implementation yet allows bpf program to use
any number of bpf_spin_locks.
- when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
insufficient preemption checks
Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
Other solutions to avoid nested bpf_spin_lock are possible.
Like making sure that all networking progs run with softirq disabled.
spin_lock_irqsave is the simplest and doesn't add overhead to the
programs that don't use it.
- arch_spinlock_t is used when its implemented as queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
extra flag during map_create, but specifying it via BTF is cleaner.
It provides introspection for map key/value and reduces user mistakes.
Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
to request kernel to grab bpf_spin_lock before rewriting the value.
That will serialize access to map elements.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
We recently changed this function in commit f9fc54d313fa ("ethtool:
check the return value of get_regs_len") such that if "reglen" is zero
we return directly. That means we can remove this condition as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
wake_on_lan - Enables Wake on Lan for this port. If enabled,
the controller asserts a wake pin based on the WOL type.
v2->v3:
- Define only WOL types used now and define them as bitfield, so that
mutliple WOL types can be enabled upon power on.
- Modify "wake-on-lan" name to "wake_on_lan" to be symmetric with
previous definitions.
- Rename DEVLINK_PARAM_WOL_XXX to DEVLINK_PARAM_WAKE_XXX to be
symmetrical with ethtool WOL definitions.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add notification call for devlink port param set, register and unregister
functions.
Add devlink_port_param_value_changed() function to enable the driver notify
devlink on value change. Driver should use this function after value was
changed on any configuration mode part to driverinit.
v7->v8:
Order devlink_port_param_value_changed() definitions followed by
devlink_param_value_changed()
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|