summaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)Author
2018-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument. net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump)) in 'net-next'. Thanks to David Ahern for the heads up. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL insteadXin Long
According to rfc7496 section 4.3 or 4.4: sprstat_policy: This parameter indicates for which PR-SCTP policy the user wants the information. It is an error to use SCTP_PR_SCTP_NONE in sprstat_policy. If SCTP_PR_SCTP_ALL is used, the counters provided are aggregated over all supported policies. We change to dump pr_assoc and pr_stream all status by SCTP_PR_SCTP_ALL instead, and return error for SCTP_PR_SCTP_NONE, as it also said "It is an error to use SCTP_PR_SCTP_NONE in sprstat_policy. " Fixes: 826d253d57b1 ("sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt") Fixes: d229d48d183f ("sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctp") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI ↵Justin.Lee1@Dell.com
command The new command (NCSI_CMD_SEND_CMD) is added to allow user space application to send NC-SI command to the network card. Also, add a new attribute (NCSI_ATTR_DATA) for transferring request and response. The work flow is as below. Request: User space application -> Netlink interface (msg) -> new Netlink handler - ncsi_send_cmd_nl() -> ncsi_xmit_cmd() Response: Response received - ncsi_rcv_rsp() -> internal response handler - ncsi_rsp_handler_xxx() -> ncsi_rsp_handler_netlink() -> ncsi_send_netlink_rsp () -> Netlink interface (msg) -> user space application Command timeout - ncsi_request_timeout() -> ncsi_send_netlink_timeout () -> Netlink interface (msg with zero data length) -> user space application Error: Error detected -> ncsi_send_netlink_err () -> Netlink interface (err msg) -> user space application Signed-off-by: Justin Lee <justin.lee1@dell.com> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapterMaciej W. Rozycki
Add support for the DEC FDDIcontroller 700 (DEFZA), Digital Equipment Corporation's first-generation FDDI network interface adapter, made for TURBOchannel and based on a discrete version of what eventually became Motorola's widely used CAMEL chipset. The CAMEL chipset is present for example in the DEC FDDIcontroller TURBOchannel, EISA and PCI adapters (DEFTA/DEFEA/DEFPA) that we support with the `defxx' driver, however the host bus interface logic and the firmware API are different in the DEFZA and hence a separate driver is required. There isn't much to say about the driver except that it works, but there is one peculiarity to mention. The adapter implements two Tx/Rx queue pairs. Of these one pair is the usual network Tx/Rx queue pair, in this case used by the adapter to exchange frames with the ring, via the RMC (Ring Memory Controller) chip. The Tx queue is handled directly by the RMC chip and resides in onboard packet memory. The Rx queue is maintained via DMA in host memory by adapter's firmware copying received data stored by the RMC in onboard packet memory. The other pair is used to communicate SMT frames with adapter's firmware. Any SMT frame received from the RMC via the Rx queue must be queued back by the driver to the SMT Rx queue for the firmware to process. Similarly the firmware uses the SMT Tx queue to supply the driver with SMT frames that must be queued back to the Tx queue for the RMC to send to the ring. This solution was chosen because the designers ran out of PCB space and could not squeeze in more logic onto the board that would be required to handle this SMT frame traffic without the need to involve the driver, as with the later DEFTA/DEFEA/DEFPA adapters. Finally the driver does some Frame Control byte decoding, so to avoid magic numbers some macros are added to <linux/if_fddi.h>. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were easy to resolve using immediate context mostly, except the cls_u32.c one where I simply too the entire HEAD chunk. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12Merge tag 'mac80211-next-for-davem-2018-10-12' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Highlights: * merge net-next, so I can finish the hwsim workqueue removal * fix TXQ NULL pointer issue that was reported multiple times * minstrel cleanups from Felix * simplify lib80211 code by not using skcipher, note that this will conflict with the crypto tree (and this new code here should be used) * use new netlink policy validation in nl80211 * fix up SAE (part of WPA3) in client-mode * FTM responder support in the stack ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net: bridge: add support for per-port vlan statsNikolay Aleksandrov
This patch adds an option to have per-port vlan stats instead of the default global stats. The option can be set only when there are no port vlans in the bridge since we need to allocate the stats if it is set when vlans are being added to ports (and respectively free them when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the largest user of options. The current stats design allows us to add these without any changes to the fast-path, it all comes down to the per-vlan stats pointer which, if this option is enabled, will be allocated for each port vlan instead of using the global bridge-wide one. CC: bridge@lists.linux-foundation.org CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12nl80211: Add per peer statistics to compute FCS error rateAnkita Bajaj
Add support for drivers to report the total number of MPDUs received and the number of MPDUs received with an FCS error from a specific peer. These counters will be incremented only when the TA of the frame matches the MAC address of the peer irrespective of FCS error. It should be noted that the TA field in the frame might be corrupted when there is an FCS error and TA matching logic would fail in such cases. Hence, FCS error counter might not be fully accurate, but it can provide help in detecting bad RX links in significant number of cases. This FCS error counter without full accuracy can be used, e.g., to trigger a kick-out of a connected client with a bad link in AP mode to force such a client to roam to another AP. Signed-off-by: Ankita Bajaj <bankita@codeaurora.org> Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-10Merge tag 'rxrpc-fixes-20181008' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix packet reception code Here are a set of patches that prepares for and fix problems in rxrpc's package reception code. There serious problems are: (A) There's a window between binding the socket and setting the data_ready hook in which packets can find their way into the UDP socket's receive queues. (B) The skb_recv_udp() will return an error (and clear the error state) if there was an error on the Tx side. rxrpc doesn't handle this. (C) The rxrpc data_ready handler doesn't fully drain the UDP receive queue. (D) The rxrpc data_ready handler assumes it is called in a non-reentrant state. The second patch fixes (A) - (C); the third patch renders (B) and (C) non-issues by using the recap_rcv hook instead of data_ready - and the final patch fixes (D). That last is the most complex. The preparatory patches are: (1) Fix some places that are doing things in the wrong net namespace. (2) Stop taking the rcu read lock as it's held by the IP input routine in the call chain. (3) Only end the Tx phase if *we* rotated the final packet out of the Tx buffer. (4) Don't assume that the call state won't change after dropping the call_state lock. (5) Only take receive window and MTU suze parameters from an ACK packet if it's the latest ACK packet. (6) Record connection-level abort information correctly. (7) Fix a trace line. And then there are three main patches - note that these are mixed in with the preparatory patches somewhat: (1) Fix the setup window (A), skb_recv_udp() error check (B) and packet drainage (C). (2) Switch to using the encap_rcv instead of data_ready to cut out the effects of the UDP read queues and get the packets delivered directly. (3) Add more locking into the various packet input paths to defend against re-entrance (D). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2018-10-08 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow BPF programs to perform lookups for sockets in a network namespace. This would allow programs to determine early on in processing whether the stack is expecting to receive the packet, and perform some action (eg drop, forward somewhere) based on this information. 2) per-cpu cgroup local storage from Roman Gushchin. Per-cpu cgroup local storage is very similar to simple cgroup storage except all the data is per-cpu. The main goal of per-cpu variant is to implement super fast counters (e.g. packet counters), which don't require neither lookups, neither atomic operations in a fast path. The example of these hybrid counters is in selftests/bpf/netcnt_prog.c 3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet 4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski 5) rename of libbpf interfaces from Andrey Ignatov. libbpf is maturing as a library and should follow good practices in library design and implementation to play well with other libraries. This patch set brings consistent naming convention to global symbols. 6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov to let Apache2 projects use libbpf 7) various AF_XDP fixes from Björn and Magnus ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree: 1) Support for matching on ipsec policy already set in the route, from Florian Westphal. 2) Split set destruction into deactivate and destroy phase to make it fit better into the transaction infrastructure, also from Florian. This includes a patch to warn on imbalance when setting the new activate and deactivate interfaces. 3) Release transaction list from the workqueue to remove expensive synchronize_rcu() from configuration plane path. This speeds up configuration plane quite a bit. From Florian Westphal. 4) Add new xfrm/ipsec extension, this new extension allows you to match for ipsec tunnel keys such as source and destination address, spi and reqid. From Máté Eckl and Florian Westphal. 5) Add secmark support, this includes connsecmark too, patches from Christian Gottsche. 6) Allow to specify remaining bytes in xt_quota, from Chenbo Feng. One follow up patch to calm a clang warning for this one, from Nathan Chancellor. 7) Flush conntrack entries based on layer 3 family, from Kristian Evensen. 8) New revision for cgroups2 to shrink the path field. 9) Get rid of obsolete need_conntrack(), as a result from recent demodularization works. 10) Use WARN_ON instead of BUG_ON, from Florian Westphal. 11) Unused exported symbol in nf_nat_ipv4_fn(), from Florian. 12) Remove superfluous check for timeout netlink parser and dump functions in layer 4 conntrack helpers. 13) Unnecessary redundant rcu read side locks in NAT redirect, from Taehee Yoo. 14) Pass nf_hook_state structure to error handlers, patch from Florian Westphal. 15) Remove ->new() interface from layer 4 protocol trackers. Place them in the ->packet() interface. From Florian. 16) Place conntrack ->error() handling in the ->packet() interface. Patches from Florian Westphal. 17) Remove unused parameter in the pernet initialization path, also from Florian. 18) Remove additional parameter to specify layer 3 protocol when looking up for protocol tracker. From Florian. 19) Shrink array of layer 4 protocol trackers, from Florian. 20) Check for linear skb only once from the ALG NAT mangling codebase, from Taehee Yoo. 21) Use rhashtable_walk_enter() instead of deprecated rhashtable_walk_init(), also from Taehee. 22) No need to flush all conntracks when only one single address is gone, from Tan Hu. 23) Remove redundant check for NAT flags in flowtable code, from Taehee Yoo. 24) Use rhashtable_lookup() instead of rhashtable_lookup_fast() from netfilter codebase, since rcu read lock side is already assumed in this path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08netlink: Add new socket option to enable strict checking on dumpsDavid Ahern
Add a new socket option, NETLINK_DUMP_STRICT_CHK, that userspace can use via setsockopt to request strict checking of headers and attributes on dump requests. To get dump features such as kernel side filtering based on data in the header or attributes appended to the dump request, userspace must call setsockopt() for NETLINK_DUMP_STRICT_CHK and a non-zero value. Since the netlink sock and its flags are private to the af_netlink code, the strict checking flag is passed to dump handlers via a flag in the netlink_callback struct. For old userspace on new kernel there is no impact as all of the data checks in later patches are wrapped in a check on the new strict flag. For new userspace on old kernel, the setsockopt will fail and even if new userspace sets data in the headers and appended attributes the kernel will silently ignore it. Moving forward when the setsockopt succeeds, the new userspace on old kernel means the dump request can pass an attribute the kernel does not understand. The dump will then fail as the older kernel does not understand it. New userspace on new kernel setting the socket option gets the benefit of the improved data dump. Kernel side the NETLINK_DUMP_STRICT_CHK uapi is converted to a generic NETLINK_F_STRICT_CHK flag which can potentially be leveraged for tighter checking on the NEW, DEL, and SET commands. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08rxrpc: Use the UDP encap_rcv hookDavid Howells
Use the UDP encap_rcv hook to cut the bit out of the rxrpc packet reception in which a packet is placed onto the UDP receive queue and then immediately removed again by rxrpc. Going via the queue in this manner seems like it should be unnecessary. This does, however, require the invention of a value to place in encap_type as that's one of the conditions to switch packets out to the encap_rcv hook. Possibly the value doesn't actually matter for anything other than sockopts on the UDP socket, which aren't accessible outside of rxrpc anyway. This seems to cut a bit of time out of the time elapsed between each sk_buff being timestamped and turning up in rxrpc (the final number in the following trace excerpts). I measured this by making the rxrpc_rx_packet trace point print the time elapsed between the skb being timestamped and the current time (in ns), e.g.: ... 424.278721: rxrpc_rx_packet: ... ACK 25026 So doing a 512MiB DIO read from my test server, with an unmodified kernel: N min max sum mean stddev 27605 2626 7581 7.83992e+07 2840.04 181.029 and with the patch applied: N min max sum mean stddev 27547 1895 12165 6.77461e+07 2459.29 255.02 Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-08Merge remote-tracking branch 'net-next/master' into mac80211-nextJohannes Berg
Merge net-next, which pulled in net, so I can merge a few more patches that would otherwise conflict. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-07net/smc: retain old name for diag_mode fieldEugene Syromiatnikov
Commit c601171d7a60 ("net/smc: provide smc mode in smc_diag.c") changed the name of diag_fallback field of struct smc_diag_msg structure to diag_mode. However, this structure is a part of UAPI, and this change breaks user space applications that use it ([1], for example). Since the new name is more suitable, convert the field to a union that provides access to the data via both the new and the old name. [1] https://gitlab.com/strace/strace/blob/v4.24/netlink_smc_diag.c#L165 Fixes: c601171d7a60 ("net/smc: provide smc mode in smc_diag.c") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-07net/smc: use __aligned_u64 for 64-bit smc_diag fieldsEugene Syromiatnikov
Commit 4b1b7d3b30a6 ("net/smc: add SMC-D diag support") introduced new UAPI-exposed structure, struct smcd_diag_dmbinfo. However, it's not usable by compat binaries, as it has different layout there. Probably, the most straightforward fix that will avoid similar issues in the future is to use __aligned_u64 for 64-bit fields. Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-10-05mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizesAnshuman Khandual
ARM64 architecture also supports 32MB and 512MB HugeTLB page sizes. This just adds mmap() system call argument encoding for them. Link: http://lkml.kernel.org/r/1537841300-6979-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-04tc: Add support for configuring the taprio schedulerVinicius Costa Gomes
This traffic scheduler allows traffic classes states (transmission allowed/not allowed, in the simplest case) to be scheduled, according to a pre-generated time sequence. This is the basis of the IEEE 802.1Qbv specification. Example configuration: tc qdisc replace dev enp3s0 parent root handle 100 taprio \ num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 2@2 \ base-time 1528743495910289987 \ sched-entry S 01 300000 \ sched-entry S 02 300000 \ sched-entry S 04 300000 \ clockid CLOCK_TAI The configuration format is similar to mqprio. The main difference is the presence of a schedule, built by multiple "sched-entry" definitions, each entry has the following format: sched-entry <CMD> <GATE MASK> <INTERVAL> The only supported <CMD> is "S", which means "SetGateStates", following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK> is a bitmask where each bit is a associated with a traffic class, so bit 0 (the least significant bit) being "on" means that traffic class 0 is "active" for that schedule entry. <INTERVAL> is a time duration in nanoseconds that specifies for how long that state defined by <CMD> and <GATE MASK> should be held before moving to the next entry. This schedule is circular, that is, after the last entry is executed it starts from the first one, indefinitely. The other parameters can be defined as follows: - base-time: specifies the instant when the schedule starts, if 'base-time' is a time in the past, the schedule will start at base-time + (N * cycle-time) where N is the smallest integer so the resulting time is greater than "now", and "cycle-time" is the sum of all the intervals of the entries in the schedule; - clockid: specifies the reference clock to be used; The parameters should be similar to what the IEEE 802.1Q family of specification defines. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04dns: Allow the dns resolver to retrieve a server setDavid Howells
Allow the DNS resolver to retrieve a set of servers and their associated addresses, ports, preference and weight ratings. In terms of communication with userspace, "srv=1" is added to the callout string (the '1' indicating the maximum data version supported by the kernel) to ask the userspace side for this. If the userspace side doesn't recognise it, it will ignore the option and return the usual text address list. If the userspace side does recognise it, it will return some binary data that begins with a zero byte that would cause the string parsers to give an error. The second byte contains the version of the data in the blob (this may be between 1 and the version specified in the callout data). The remainder of the payload is version-specific. In version 1, the payload looks like (note that this is packed): u8 Non-string marker (ie. 0) u8 Content (0 => Server list) u8 Version (ie. 1) u8 Source (eg. DNS_RECORD_FROM_DNS_SRV) u8 Status (eg. DNS_LOOKUP_GOOD) u8 Number of servers foreach-server { u16 Name length (LE) u16 Priority (as per SRV record) (LE) u16 Weight (as per SRV record) (LE) u16 Port (LE) u8 Source (eg. DNS_RECORD_FROM_NSS) u8 Status (eg. DNS_LOOKUP_GOT_NOT_FOUND) u8 Protocol (eg. DNS_SERVER_PROTOCOL_UDP) u8 Number of addresses char[] Name (not NUL-terminated) foreach-address { u8 Family (AF_INET{,6}) union { u8[4] ipv4_addr u8[16] ipv6_addr } } } This can then be used to fetch a whole cell's VL-server configuration for AFS, for example. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-03netfilter: xt_quota: fix the behavior of xt_quota moduleChenbo Feng
A major flaw of the current xt_quota module is that quota in a specific rule gets reset every time there is a rule change in the same table. It makes the xt_quota module not very useful in a table in which iptables rules are changed at run time. This fix introduces a new counter that is visible to userspace as the remaining quota of the current rule. When userspace restores the rules in a table, it can restore the counter to the remaining quota instead of resetting it to the full quota. Signed-off-by: Chenbo Feng <fengc@google.com> Suggested-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-03bpf: Add helper to retrieve socket in BPFJoe Stringer
This patch adds new BPF helper functions, bpf_sk_lookup_tcp() and bpf_sk_lookup_udp() which allows BPF programs to find out if there is a socket listening on this host, and returns a socket pointer which the BPF program can then access to determine, for instance, whether to forward or drop traffic. bpf_sk_lookup_xxx() may take a reference on the socket, so when a BPF program makes use of this function, it must subsequently pass the returned pointer into the newly added sk_release() to return the reference. By way of example, the following pseudocode would filter inbound connections at XDP if there is no corresponding service listening for the traffic: struct bpf_sock_tuple tuple; struct bpf_sock_ops *sk; populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof tuple, netns, 0); if (!sk) { // Couldn't find a socket listening for this traffic. Drop. return TC_ACT_SHOT; } bpf_sk_release(sk, 0); return TC_ACT_OK; Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-02cfg80211: support FTM responder configuration/statisticsPradeep Kumar Chitrapu
Allow userspace to enable fine timing measurement responder functionality with configurable lci/civic parameters in AP mode. This can be done at AP start or changing beacon parameters. A new EXT_FEATURE flag is introduced for drivers to advertise the capability. Also nl80211 API support for retrieving statistics is added. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> [remove unused cfg80211_ftm_responder_params, clarify docs, move validation into policy] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-01bpf: introduce per-cpu cgroup local storageRoman Gushchin
This commit introduced per-cpu cgroup local storage. Per-cpu cgroup local storage is very similar to simple cgroup storage (let's call it shared), except all the data is per-cpu. The main goal of per-cpu variant is to implement super fast counters (e.g. packet counters), which don't require neither lookups, neither atomic operations. >From userspace's point of view, accessing a per-cpu cgroup storage is similar to other per-cpu map types (e.g. per-cpu hashmaps and arrays). Writing to a per-cpu cgroup storage is not atomic, but is performed by copying longs, so some minimal atomicity is here, exactly as with other per-cpu maps. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-28netfilter: nf_tables: add SECMARK supportChristian Göttsche
Add the ability to set the security context of packets within the nf_tables framework. Add a nft_object for holding security contexts in the kernel and manipulating packets on the wire. Convert the security context strings at rule addition time to security identifiers. This is the same behavior like in xt_SECMARK and offers better performance than computing it per packet. Set the maximum security context length to 256. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-09-25 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Allow for RX stack hardening by implementing the kernel's flow dissector in BPF. Idea was originally presented at netconf 2017 [0]. Quote from merge commit: [...] Because of the rigorous checks of the BPF verifier, this provides significant security guarantees. In particular, the BPF flow dissector cannot get inside of an infinite loop, as with CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot read outside of packet bounds, because all memory accesses are checked. Also, with BPF the administrator can decide which protocols to support, reducing potential attack surface. Rarely encountered protocols can be excluded from dissection and the program can be updated without kernel recompile or reboot if a bug is discovered. [...] Also, a sample flow dissector has been implemented in BPF as part of this work, from Petar and Willem. [0] http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf 2) Add support for bpftool to list currently active attachment points of BPF networking programs providing a quick overview similar to bpftool's perf subcommand, from Yonghong. 3) Fix a verifier pruning instability bug where a union member from the register state was not cleared properly leading to branches not being pruned despite them being valid candidates, from Alexei. 4) Various smaller fast-path optimizations in XDP's map redirect code, from Jesper. 5) Enable to recognize BPF_MAP_TYPE_REUSEPORT_SOCKARRAY maps in bpftool, from Roman. 6) Remove a duplicate check in libbpf that probes for function storage, from Taeung. 7) Fix an issue in test_progs by avoid checking for errno since on success its value should not be checked, from Mauricio. 8) Fix unused variable warning in bpf_getsockopt() helper when CONFIG_INET is not configured, from Anders. 9) Fix a compilation failure in the BPF sample code's use of bpf_flow_keys, from Prashant. 10) Minor cleanups in BPF code, from Yue and Zhong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Version bump conflict in batman-adv, take what's in net-next. iavf conflict, adjustment of netdev_ops in net-next conflicting with poll controller method removal in net. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25Revert "uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct ↵Lubomir Rintel
member name" This changes UAPI, breaking iwd and libell: ell/key.c: In function 'kernel_dh_compute': ell/key.c:205:38: error: 'struct keyctl_dh_params' has no member named 'private'; did you mean 'dh_private'? struct keyctl_dh_params params = { .private = private, ^~~~~~~ dh_private This reverts commit 8a2336e549d385bb0b46880435b411df8d8200e8. Fixes: 8a2336e549d3 ("uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name") Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: David Howells <dhowells@redhat.com> cc: Randy Dunlap <rdunlap@infradead.org> cc: Mat Martineau <mathew.j.martineau@linux.intel.com> cc: Stephan Mueller <smueller@chronox.de> cc: James Morris <jmorris@namei.org> cc: "Serge E. Hallyn" <serge@hallyn.com> cc: Mat Martineau <mathew.j.martineau@linux.intel.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: <stable@vger.kernel.org> Signed-off-by: James Morris <james.morris@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-24net/core: Add new basic hardware counterEelco Chaudron
Add a new hardware specific basic counter, TCA_STATS_BASIC_HW. This can be used to count packets/bytes processed by hardware offload. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net: if_arp: use define instead of hard-coded valueHåkon Bugge
uapi/linux/if_arp.h includes linux/netdevice.h, which uses IFNAMSIZ. Hence, use it instead of hard-coded value. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net: if_arp: Fix incorrect indentsHåkon Bugge
Fixing incorrect indents and align comments. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmGreg Kroah-Hartman
Paolo writes: "It's mostly small bugfixes and cleanups, mostly around x86 nested virtualization. One important change, not related to nested virtualization, is that the ability for the guest kernel to trap CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is now masked by default. This is because the feature is detected through an MSR; a very bad idea that Intel seems to like more and more. Some applications choke if the other fields of that MSR are not initialized as on real hardware, hence we have to disable the whole MSR by default, as was the case before Linux 4.12." * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits) KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs kvm: selftests: Add platform_info_test KVM: x86: Control guest reads of MSR_PLATFORM_INFO KVM: x86: Turbo bits in MSR_PLATFORM_INFO nVMX x86: Check VPID value on vmentry of L2 guests nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2 KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv KVM: VMX: check nested state and CR4.VMXE against SMM kvm: x86: make kvm_{load|put}_guest_fpu() static x86/hyper-v: rename ipi_arg_{ex,non_ex} structures KVM: VMX: use preemption timer to force immediate VMExit KVM: VMX: modify preemption timer bit only when arming timer KVM: VMX: immediately mark preemption timer expired only for zero value KVM: SVM: Switch to bitmap_zalloc() KVM/MMU: Fix comment in walk_shadow_page_lockless_end() kvm: selftests: use -pthread instead of -lpthread KVM: x86: don't reset root in kvm_mmu_setup() kvm: mmu: Don't read PDPTEs when paging is not enabled x86/kvm/lapic: always disable MMIO interface in x2APIC mode KVM: s390: Make huge pages unavailable in ucontrol VMs ...
2018-09-20Merge tag 'sound-4.19-rc5' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Takashi writes: "sound fixes for 4.19-rc5 here comes a collection of various fixes, mostly for stable-tree or regression fixes. Two relatively high LOCs are about the (rather simple) conversion of uapi integer types in topology API, and a regression fix about HDMI hotplug notification on AMD HD-audio. The rest are all small individual fixes like ASoC Intel Skylake race condition, minor uninitialized page leak in emu10k1 ioctl, Firewire audio error paths, and so on." * tag 'sound-4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits) ALSA: fireworks: fix memory leak of response buffer at error path ALSA: oxfw: fix memory leak of discovered stream formats at error path ALSA: oxfw: fix memory leak for model-dependent data at error path ALSA: bebob: fix memory leak for M-Audio FW1814 and ProjectMix I/O at error path ALSA: hda - Enable runtime PM only for discrete GPU ALSA: oxfw: fix memory leak of private data ALSA: firewire-tascam: fix memory leak of private data ALSA: firewire-digi00x: fix memory leak of private data sound: don't call skl_init_chip() to reset intel skl soc sound: enable interrupt after dma buffer initialization Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation" ALSA: emu10k1: fix possible info leak to userspace on SNDRV_EMU10K1_IOCTL_INFO ASoC: cs4265: fix MMTLR Data switch control ASoC: AMD: Ensure reset bit is cleared before configuring ALSA: fireface: fix memory leak in ff400_switch_fetching_mode() ALSA: bebob: use address returned by kmalloc() instead of kernel stack for streaming DMA mapping ASoC: rsnd: don't fallback to PIO mode when -EPROBE_DEFER ASoC: rsnd: adg: care clock-frequency size ASoC: uniphier: change status to orphan ASoC: rsnd: fixup not to call clk_get/set under non-atomic ...
2018-09-20KVM: x86: Control guest reads of MSR_PLATFORM_INFODrew Schmitt
Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access to reads of MSR_PLATFORM_INFO. Disabling access to reads of this MSR gives userspace the control to "expose" this platform-dependent information to guests in a clear way. As it exists today, guests that read this MSR would get unpopulated information if userspace hadn't already set it (and prior to this patch series, only the CPUID faulting information could have been populated). This existing interface could be confusing if guests don't handle the potential for incorrect/incomplete information gracefully (e.g. zero reported for base frequency). Signed-off-by: Drew Schmitt <dasch@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-18Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two new tls tests added in parallel in both net and net-next. Used Stephen Rothwell's linux-next resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17Merge tag 'asoc-v4.19-rc4' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v4.19 This is the usual set of small fixes scatterd around various drivers, plus one fix for DAPM and a UAPI build fix. There's not a huge amount that stands out here relative to anything else.
2018-09-17netfilter: xt_cgroup: shrink size of v2 pathPablo Neira Ayuso
cgroup v2 path field is PATH_MAX which is too large, this is placing too much pressure on memory allocation for people with many rules doing cgroup v1 classid matching, side effects of this are bug reports like: https://bugzilla.kernel.org/show_bug.cgi?id=200639 This patch registers a new revision that shrinks the cgroup path to 512 bytes, which is the same approach we follow in similar extensions that have a path field. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Tejun Heo <tj@kernel.org>
2018-09-17netfilter: nf_tables: add xfrm expressionFlorian Westphal
supports fetching saddr/daddr of tunnel mode states, request id and spi. If direction is 'in', use inbound skb secpath, else dst->xfrm. Joint work with Máté Eckl. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: nf_tables: rt: allow checking if dst has xfrm attachedFlorian Westphal
Useful e.g. to avoid NATting inner headers of to-be-encrypted packets. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-14flow_dissector: implements flow dissector BPF hookPetar Penkov
Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector path. The BPF program is per-network namespace. Signed-off-by: Petar Penkov <ppenkov@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-13ipv6: Add sockopt IPV6_MULTICAST_ALL analogue to IP_MULTICAST_ALLAndre Naujoks
The socket option will be enabled by default to ensure current behaviour is not changed. This is the same for the IPv4 version. A socket bound to in6addr_any and a specific port will receive all traffic on that port. Analogue to IP_MULTICAST_ALL, disable this behaviour, if one or more multicast groups were joined (using said socket) and only pass on multicast traffic from groups, which were explicitly joined via this socket. Without this option disabled a socket (system even) joined to multiple multicast groups is very hard to get right. Filtering by destination address has to take place in user space to avoid receiving multicast traffic from other multicast groups, which might have traffic on the same port. The extension of the IP_MULTICAST_ALL socketoption to just apply to ipv6, too, is not done to avoid changing the behaviour of current applications. Signed-off-by: Andre Naujoks <nautsch2@gmail.com> Acked-By: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-09-12geneve: add ttl inherit supportHangbin Liu
Similar with commit 72f6d71e491e6 ("vxlan: add ttl inherit support"), currently ttl == 0 means "use whatever default value" on geneve instead of inherit inner ttl. To respect compatibility with old behavior, let's add a new IFLA_GENEVE_TTL_INHERIT for geneve ttl inherit support. Reported-by: Jianlin Shi <jishi@redhat.com> Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12net: bridge: add support for sticky fdb entriesNikolay Aleksandrov
Add support for entries which are "sticky", i.e. will not change their port if they show up from a different one. A new ndm flag is introduced for that purpose - NTF_STICKY. We allow to set it only to non-local entries. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10net: sched: cls_flower: dump offload count valueVlad Buslov
Change flower in_hw_count type to fixed-size u32 and dump it as TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared blocks and re-offload functionality. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10perf/UAPI: Clearly mark __PERF_SAMPLE_CALLCHAIN_EARLY as internal usePeter Zijlstra
Vince noted that commit: 6cbc304f2f36 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)") 'leaked' __PERF_SAMPLE_CALLCHAIN_EARLY into the UAPI namespace. And while sys_perf_event_open() will error out if you try to use it, it is exposed. Clearly mark it for internal use only to avoid any confusion. Requested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-05if_link: add IFLA_TARGET_NETNSID aliasChristian Brauner
This adds IFLA_TARGET_NETNSID as an alias for IFLA_IF_NETNSID for RTM_*LINK requests. The new name is clearer and also aligns with the newly introduced IFA_TARGET_NETNSID propert for RTM_*ADDR requests. Signed-off-by: Christian Brauner <christian@brauner.io> Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05if_addr: add IFA_TARGET_NETNSIDChristian Brauner
This adds a new IFA_TARGET_NETNSID property to be used by address families such as PF_INET and PF_INET6. The IFA_TARGET_NETNSID property can be used to send a network namespace identifier as part of a request. If a IFA_TARGET_NETNSID property is identified it will be used to retrieve the target network namespace in which the request is to be made. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Jiri Benc <jbenc@redhat.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05packet: add sockopt to ignore outgoing packetsVincent Whitchurch
Currently, the only way to ignore outgoing packets on a packet socket is via the BPF filter. With MSG_ZEROCOPY, packets that are looped into AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even if the filter run from packet_rcv() would reject them. So the presence of a packet socket on the interface takes away the benefits of MSG_ZEROCOPY, even if the packet socket is not interested in outgoing packets. (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily cloned, but the cost for that is much lower.) Add a socket option to allow AF_PACKET sockets to ignore outgoing packets to solve this. Note that the *BSDs already have something similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT. The first intended user is lldpd. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05nl80211: Add CAN_REPLACE_PTK0 APIAlexander Wetzel
Drivers able to correctly replace a in-use key should set @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 to allow the user space (e.g. hostapd or wpa_supplicant) to rekey PTK keys. The user space must detect a PTK rekey attempt and only go ahead with it when the driver has set this flag. If the driver is not supporting the feature the user space either must not replace the PTK key or perform a full re-association instead. Ignoring this flag and continuing to rekey the connection can still work but has to be considered insecure and broken. Depending on the driver it can leak clear text packets or freeze the connection and is only supported to allow the user space to be updated. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Reviewed-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>