summaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2020-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz Augusto von Dentz. 2) Add GSO partial support to igc, from Sasha Neftin. 3) Several cleanups and improvements to r8169 from Heiner Kallweit. 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a device self-test. From Andrew Lunn. 5) Start moving away from custom driver versions, use the globally defined kernel version instead, from Leon Romanovsky. 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin. 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet. 8) Add sriov and vf support to hinic, from Luo bin. 9) Support Media Redundancy Protocol (MRP) in the bridging code, from Horatiu Vultur. 10) Support netmap in the nft_nat code, from Pablo Neira Ayuso. 11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina Dubroca. Also add ipv6 support for espintcp. 12) Lots of ReST conversions of the networking documentation, from Mauro Carvalho Chehab. 13) Support configuration of ethtool rxnfc flows in bcmgenet driver, from Doug Berger. 14) Allow to dump cgroup id and filter by it in inet_diag code, from Dmitry Yakunin. 15) Add infrastructure to export netlink attribute policies to userspace, from Johannes Berg. 16) Several optimizations to sch_fq scheduler, from Eric Dumazet. 17) Fallback to the default qdisc if qdisc init fails because otherwise a packet scheduler init failure will make a device inoperative. From Jesper Dangaard Brouer. 18) Several RISCV bpf jit optimizations, from Luke Nelson. 19) Correct the return type of the ->ndo_start_xmit() method in several drivers, it's netdev_tx_t but many drivers were using 'int'. From Yunjian Wang. 20) Add an ethtool interface for PHY master/slave config, from Oleksij Rempel. 21) Add BPF iterators, from Yonghang Song. 22) Add cable test infrastructure, including ethool interfaces, from Andrew Lunn. Marvell PHY driver is the first to support this facility. 23) Remove zero-length arrays all over, from Gustavo A. R. Silva. 24) Calculate and maintain an explicit frame size in XDP, from Jesper Dangaard Brouer. 25) Add CAP_BPF, from Alexei Starovoitov. 26) Support terse dumps in the packet scheduler, from Vlad Buslov. 27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei. 28) Add devm_register_netdev(), from Bartosz Golaszewski. 29) Minimize qdisc resets, from Cong Wang. 30) Get rid of kernel_getsockopt and kernel_setsockopt in order to eliminate set_fs/get_fs calls. From Christoph Hellwig. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits) selftests: net: ip_defrag: ignore EPERM net_failover: fixed rollback in net_failover_open() Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv" Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" vmxnet3: allow rx flow hash ops only when rss is enabled hinic: add set_channels ethtool_ops support selftests/bpf: Add a default $(CXX) value tools/bpf: Don't use $(COMPILE.c) bpf, selftests: Use bpf_probe_read_kernel s390/bpf: Use bcr 0,%0 as tail call nop filler s390/bpf: Maintain 8-byte stack alignment selftests/bpf: Fix verifier test selftests/bpf: Fix sample_cnt shared between two threads bpf, selftests: Adapt cls_redirect to call csum_level helper bpf: Add csum_level helper for fixing up csum levels bpf: Fix up bpf_skb_adjust_room helper's skb csum setting sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf() crypto/chtls: IPv6 support for inline TLS Crypto/chcr: Fixes a coccinile check error Crypto/chcr: Fixes compilations warnings ...
2020-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-06-01 The following pull-request contains BPF updates for your *net-next* tree. We've added 55 non-merge commits during the last 1 day(s) which contain a total of 91 files changed, 4986 insertions(+), 463 deletions(-). The main changes are: 1) Add rx_queue_mapping to bpf_sock from Amritha. 2) Add BPF ring buffer, from Andrii. 3) Attach and run programs through devmap, from David. 4) Allow SO_BINDTODEVICE opt in bpf_setsockopt, from Ferenc. 5) link based flow_dissector, from Jakub. 6) Use tracing helpers for lsm programs, from Jiri. 7) Several sk_msg fixes and extensions, from John. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01crypto/chtls: IPv6 support for inline TLSVinay Kumar Yadav
Extends support to IPv6 for Inline TLS server. Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> v1->v2: - cc'd tcp folks. v2->v3: - changed EXPORT_SYMBOL() to EXPORT_SYMBOL_GPL() Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01ipv6: fix IPV6_ADDRFORM operation logicHangbin Liu
Socket option IPV6_ADDRFORM supports UDP/UDPLITE and TCP at present. Previously the checking logic looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol != IPPROTO_TCP) break; After commit b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation"), TCP was blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; break; else break; Then after commit 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation") UDP/UDPLITE were blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; if (sk->sk_protocol == IPPROTO_TCP) do_some_check; if (sk->sk_protocol != IPPROTO_TCP) break; Fix it by using Eric's code and simply remove the break in TCP check, which looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; else break; Fixes: 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01net: Make locking in sock_bindtoindex optionalFerenc Fejes
The sock_bindtoindex intended for kernel wide usage however it will lock the socket regardless of the context. This modification relax this behavior optionally: locking the socket will be optional by calling the sock_bindtoindex with lock_sk = true. The modification applied to all users of the sock_bindtoindex. Signed-off-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/bee6355da40d9e991b2f2d12b67d55ebb5f5b207.1590871065.git.fejes@inf.elte.hu
2020-06-01Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Introduce crypto_shash_tfm_digest() and use it wherever possible. - Fix use-after-free and race in crypto_spawn_alg. - Add support for parallel and batch requests to crypto_engine. Algorithms: - Update jitter RNG for SP800-90B compliance. - Always use jitter RNG as seed in drbg. Drivers: - Add Arm CryptoCell driver cctrng. - Add support for SEV-ES to the PSP driver in ccp" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits) crypto: hisilicon - fix driver compatibility issue with different versions of devices crypto: engine - do not requeue in case of fatal error crypto: cavium/nitrox - Fix a typo in a comment crypto: hisilicon/qm - change debugfs file name from qm_regs to regs crypto: hisilicon/qm - add DebugFS for xQC and xQE dump crypto: hisilicon/zip - add debugfs for Hisilicon ZIP crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC crypto: hisilicon/qm - add debugfs to the QM state machine crypto: hisilicon/qm - add debugfs for QM crypto: stm32/crc32 - protect from concurrent accesses crypto: stm32/crc32 - don't sleep in runtime pm crypto: stm32/crc32 - fix multi-instance crypto: stm32/crc32 - fix run-time self test issue. crypto: stm32/crc32 - fix ext4 chksum BUG_ON() crypto: hisilicon/zip - Use temporary sqe when doing work crypto: hisilicon - add device error report through abnormal irq crypto: hisilicon - remove codes of directly report device errors through MSI crypto: hisilicon - QM memory management optimization crypto: hisilicon - unify initial value assignment into QM ...
2020-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-29Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-05-29 1) Several fixes for ESP gro/gso in transport and beet mode when IPv6 extension headers are present. From Xin Long. 2) Fix a wrong comment on XFRMA_OFFLOAD_DEV. From Antony Antony. 3) Fix sk_destruct callback handling on ESP in TCP encapsulation. From Sabrina Dubroca. 4) Fix a use after free in xfrm_output_gso when used with vxlan. From Xin Long. 5) Fix secpath handling of VTI when used wiuth IPCOMP. From Xin Long. 6) Fix an oops when deleting a x-netns xfrm interface. From Nicolas Dichtel. 7) Fix a possible warning on policy updates. We had a case where it was possible to add two policies with the same lookup keys. From Xin Long. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-29Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2020-05-29 1) Add IPv6 encapsulation support for ESP over UDP and TCP. From Sabrina Dubroca. 2) Remove unneeded reference when initializing xfrm interfaces. From Nicolas Dichtel. 3) Remove some indirect calls from the state_afinfo. From Florian Westphal. Please note that this pull request has two merge conflicts between commit: 0c922a4850eb ("xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish") from Linus' tree and commit: 2ab6096db2f1 ("xfrm: remove output_finish indirection from xfrm_state_afinfo") from the ipsec-next tree. and between commit: 3986912f6a9a ("ipv6: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctl") from the net-next tree and commit: 0146dca70b87 ("xfrm: add support for UDPv6 encapsulation of ESP") from the ipsec-next tree. Both conflicts can be resolved as done in linux-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28ipv6: add ip6_sock_set_addr_preferencesChristoph Hellwig
Add a helper to directly set the IPV6_ADD_PREFERENCES sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28ipv6: add ip6_sock_set_v6onlyChristoph Hellwig
Add a helper to directly set the IPV6_V6ONLY sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_bindtoindexChristoph Hellwig
Add a helper to directly set the SO_BINDTOIFINDEX sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: ipv6: support RFC 6069 (TCP-LD)Eric Dumazet
Make tcp_ld_RTO_revert() helper available to IPv6, and implement RFC 6069 : Quoting this RFC : 3. Connectivity Disruption Indication For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of the ICMP destination unreachable message of code 0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6 destination unreachable message of code 0 (no route to destination) [RFC4443]. As with IPv4, a router should generate an ICMPv6 destination unreachable message of code 0 in response to a packet that cannot be delivered to its destination address because it lacks a matching entry in its routing table. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-25tcp: allow traceroute -Mtcp for unpriv usersEric Dumazet
Unpriv users can use traceroute over plain UDP sockets, but not TCP ones. $ traceroute -Mtcp 8.8.8.8 You do not have enough privileges to use this traceroute method. $ traceroute -n -Mudp 8.8.8.8 traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets 1 192.168.86.1 3.631 ms 3.512 ms 3.405 ms 2 10.1.10.1 4.183 ms 4.125 ms 4.072 ms 3 96.120.88.125 20.621 ms 19.462 ms 20.553 ms 4 96.110.177.65 24.271 ms 25.351 ms 25.250 ms 5 69.139.199.197 44.492 ms 43.075 ms 44.346 ms 6 68.86.143.93 27.969 ms 25.184 ms 25.092 ms 7 96.112.146.18 25.323 ms 96.112.146.22 25.583 ms 96.112.146.26 24.502 ms 8 72.14.239.204 24.405 ms 74.125.37.224 16.326 ms 17.194 ms 9 209.85.251.9 18.154 ms 209.85.247.55 14.449 ms 209.85.251.9 26.296 ms^C We can easily support traceroute over TCP, by queueing an error message into socket error queue. Note that applications need to set IP_RECVERR/IPV6_RECVERR option to enable this feature, and that the error message is only queued while in SYN_SNT state. socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3 setsockopt(3, SOL_IPV6, IPV6_RECVERR, [1], 4) = 0 setsockopt(3, SOL_SOCKET, SO_TIMESTAMP_OLD, [1], 4) = 0 setsockopt(3, SOL_IPV6, IPV6_UNICAST_HOPS, [5], 4) = 0 connect(3, {sa_family=AF_INET6, sin6_port=htons(8787), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2002:a05:6608:297::", &sin6_addr), sin6_scope_id=0}, 28) = -1 EHOSTUNREACH (No route to host) recvmsg(3, {msg_name={sa_family=AF_INET6, sin6_port=htons(8787), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2002:a05:6608:297::", &sin6_addr), sin6_scope_id=0}, msg_namelen=1024->28, msg_iov=[{iov_base="`\r\337\320\0004\6\1&\7\370\260\200\231\16\27\0\0\0\0\0\0\0\0 \2\n\5f\10\2\227"..., iov_len=1024}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1590340680, tv_usec=272424}}, {cmsg_len=60, cmsg_level=SOL_IPV6, cmsg_type=IPV6_RECVERR}], msg_controllen=96, msg_flags=MSG_ERRQUEUE}, MSG_ERRQUEUE) = 144 Suggested-by: Maciej Żenczykowski <maze@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-05-23 The following pull-request contains BPF updates for your *net-next* tree. We've added 50 non-merge commits during the last 8 day(s) which contain a total of 109 files changed, 2776 insertions(+), 2887 deletions(-). The main changes are: 1) Add a new AF_XDP buffer allocation API to the core in order to help lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe as well as mlx5 have been moved over to the new API and also gained a small improvement in performance, from Björn Töpel and Magnus Karlsson. 2) Add getpeername()/getsockname() attach types for BPF sock_addr programs in order to allow for e.g. reverse translation of load-balancer backend to service address/port tuple from a connected peer, from Daniel Borkmann. 3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers being non-NULL, e.g. if after an initial test another non-NULL test on that pointer follows in a given path, then it can be pruned right away, from John Fastabend. 4) Larger rework of BPF sockmap selftests to make output easier to understand and to reduce overall runtime as well as adding new BPF kTLS selftests that run in combination with sockmap, also from John Fastabend. 5) Batch of misc updates to BPF selftests including fixing up test_align to match verifier output again and moving it under test_progs, allowing bpf_iter selftest to compile on machines with older vmlinux.h, and updating config options for lirc and v6 segment routing helpers, from Stanislav Fomichev, Andrii Nakryiko and Alan Maguire. 6) Conversion of BPF tracing samples outdated internal BPF loader to use libbpf API instead, from Daniel T. Lee. 7) Follow-up to BPF kernel test infrastructure in order to fix a flake in the XDP selftests, from Jesper Dangaard Brouer. 8) Minor improvements to libbpf's internal hashmap implementation, from Ian Rogers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22ip6_tunnel: add generic MPLS receive supportVadim Fedorenko
Add support for MPLS in receive side. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22tunnel6: support for IPPROTO_MPLSVadim Fedorenko
This patch is just preparation for MPLS support in ip6_tunnel Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22ip6_tunnel: add MPLS transmit supportVadim Fedorenko
Add ETH_P_MPLS_UC as supported protocol. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22ip6_tunnel: simplify transmit pathVadim Fedorenko
Merge ip{4,6}ip6_tnl_xmit functions into one universal ipxip6_tnl_xmit in preparation for adding MPLS support. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22nexthop: support for fdb ecmp nexthopsRoopa Prabhu
This patch introduces ecmp nexthops and nexthop groups for mac fdb entries. In subsequent patches this is used by the vxlan driver fdb entries. The use case is E-VPN multihoming [1,2,3] which requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (This is analogous to a multi-homed LAG but over vxlan). Changes include new nexthop flag NHA_FDB for nexthops referenced by fdb entries. These nexthops only have ip. This patch includes appropriate checks to avoid routes referencing such nexthops. example: $ip nexthop add id 12 via 172.16.1.2 fdb $ip nexthop add id 13 via 172.16.1.3 fdb $ip nexthop add id 102 group 12/13 fdb $bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self [1] E-VPN https://tools.ietf.org/html/rfc7432 [2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365 [3] LPC talk with mention of nexthop groups for L2 ecmp http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf v4 - fixed uninitialized variable reported by kernel test robot Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21net: don't return invalid table id error when we fall back to PF_UNSPECSabrina Dubroca
In case we can't find a ->dumpit callback for the requested (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're in the same situation as if userspace had requested a PF_UNSPEC dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all the registered RTM_GETROUTE handlers. The requested table id may or may not exist for all of those families. commit ae677bbb4441 ("net: Don't return invalid table id error when dumping all families") fixed the problem when userspace explicitly requests a PF_UNSPEC dump, but missed the fallback case. For example, when we pass ipv6.disable=1 to a kernel with CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y, the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in rtnl_dump_all, and listing IPv6 routes will unexpectedly print: # ip -6 r Error: ipv4: MR table does not exist. Dump terminated commit ae677bbb4441 introduced the dump_all_families variable, which gets set when userspace requests a PF_UNSPEC dump. However, we can't simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the fallback case to get dump_all_families == true, because some messages types (for example RTM_GETRULE and RTM_GETNEIGH) only register the PF_UNSPEC handler and use the family to filter in the kernel what is dumped to userspace. We would then export more entries, that userspace would have to filter. iproute does that, but other programs may not. Instead, this patch removes dump_all_families and updates the RTM_GETROUTE handlers to check if the family that is being dumped is their own. When it's not, which covers both the intentional PF_UNSPEC dumps (as dump_all_families did) and the fallback case, ignore the missing table id error. Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-20handle the group_source_req options directlyAl Viro
Native ->setsockopt() handling of these options (MCAST_..._SOURCE_GROUP and MCAST_{,UN}BLOCK_SOURCE) consists of copyin + call of a helper that does the actual work. The only change needed for ->compat_setsockopt() is a slightly different copyin - the helpers can be reused as-is. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-20ipv6: take handling of group_source_req options into a helperAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-20ipv[46]: do compat setsockopt for MCAST_{JOIN,LEAVE}_GROUP directlyAl Viro
direct parallel to the way these two are handled in the native ->setsockopt() instances - the helpers that do the real work are already separated and can be reused as-is in this case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-20ipv6: do compat setsockopt for MCAST_MSFILTER directlyAl Viro
similar to the ipv4 counterpart of that patch - the same trick used to align the tail array properly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-20ip6_mc_msfilter(): pass the address list separatelyAl Viro
that way we'll be able to reuse it for compat case Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-20get rid of compat_mc_getsockopt()Al Viro
now we can do MCAST_MSFILTER in compat ->getsockopt() without playing silly buggers with copying things back and forth. We can form a native struct group_filter (sans the variable-length tail) on stack, pass that + pointer to the tail of original request to the helper doing the bulk of the work, then do the rest of copyout - same as the native getsockopt() does. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-20ip*_mc_gsfget(): lift copyout of struct group_filter into callersAl Viro
pass the userland pointer to the array in its tail, so that part gets copied out by our functions; copyout of everything else is done in the callers. Rationale: reuse for compat; the array is the same in native and compat, the layout of parts before it is different for compat. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-20compat_ip{,v6}_setsockopt(): enumerate MCAST_... options explicitlyAl Viro
We want to check if optname is among the MCAST_... ones; do that as an explicit switch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-19ipv6: use ->ndo_tunnel_ctl in addrconf_set_dstaddrChristoph Hellwig
Use the new ->ndo_tunnel_ctl instead of overriding the address limit and using ->ndo_do_ioctl just to do a pointless user copy. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19ipv6: streamline addrconf_set_dstaddrChristoph Hellwig
Factor out a addrconf_set_sit_dstaddr helper for the actual work if we found a SIT device, and only hold the rtnl lock around the device lookup and that new helper, as there is no point in holding it over a copy_from_user call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19ipv6: stub out even more of addrconf_set_dstaddr if SIT is disabledChristoph Hellwig
There is no point in copying the structure from userspace or looking up a device if SIT support is not disabled and we'll eventually return -ENODEV anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19sit: impement ->ndo_tunnel_ctlChristoph Hellwig
Implement the ->ndo_tunnel_ctl method, and use ip_tunnel_ioctl to handle userspace requests for the SIOCGETTUNNEL, SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19sit: refactor ipip6_tunnel_ioctlChristoph Hellwig
Split the ioctl handler into one function per command instead of having a all the logic sit in one giant switch statement. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19bpf: Add get{peer, sock}name attach types for sock_addrDaniel Borkmann
As stated in 983695fa6765 ("bpf: fix unconnected udp hooks"), the objective for the existing cgroup connect/sendmsg/recvmsg/bind BPF hooks is to be transparent to applications. In Cilium we make use of these hooks [0] in order to enable E-W load balancing for existing Kubernetes service types for all Cilium managed nodes in the cluster. Those backends can be local or remote. The main advantage of this approach is that it operates as close as possible to the socket, and therefore allows to avoid packet-based NAT given in connect/sendmsg/recvmsg hooks we only need to xlate sock addresses. This also allows to expose NodePort services on loopback addresses in the host namespace, for example. As another advantage, this also efficiently blocks bind requests for applications in the host namespace for exposed ports. However, one missing item is that we also need to perform reverse xlation for inet{,6}_getname() hooks such that we can return the service IP/port tuple back to the application instead of the remote peer address. The vast majority of applications does not bother about getpeername(), but in a few occasions we've seen breakage when validating the peer's address since it returns unexpectedly the backend tuple instead of the service one. Therefore, this trivial patch allows to customise and adds a getpeername() as well as getsockname() BPF cgroup hook for both IPv4 and IPv6 in order to address this situation. Simple example: # ./cilium/cilium service list ID Frontend Service Type Backend 1 1.2.3.4:80 ClusterIP 1 => 10.0.0.10:80 Before; curl's verbose output example, no getpeername() reverse xlation: # curl --verbose 1.2.3.4 * Rebuilt URL to: 1.2.3.4/ * Trying 1.2.3.4... * TCP_NODELAY set * Connected to 1.2.3.4 (10.0.0.10) port 80 (#0) > GET / HTTP/1.1 > Host: 1.2.3.4 > User-Agent: curl/7.58.0 > Accept: */* [...] After; with getpeername() reverse xlation: # curl --verbose 1.2.3.4 * Rebuilt URL to: 1.2.3.4/ * Trying 1.2.3.4... * TCP_NODELAY set * Connected to 1.2.3.4 (1.2.3.4) port 80 (#0) > GET / HTTP/1.1 > Host: 1.2.3.4 > User-Agent: curl/7.58.0 > Accept: */* [...] Originally, I had both under a BPF_CGROUP_INET{4,6}_GETNAME type and exposed peer to the context similar as in inet{,6}_getname() fashion, but API-wise this is suboptimal as it always enforces programs having to test for ctx->peer which can easily be missed, hence BPF_CGROUP_INET{4,6}_GET{PEER,SOCK}NAME split. Similarly, the checked return code is on tnum_range(1, 1), but if a use case comes up in future, it can easily be changed to return an error code instead. Helper and ctx member access is the same as with connect/sendmsg/etc hooks. [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net
2020-05-18ipv6: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctlChristoph Hellwig
To prepare removing the global routing_ioctl hack start lifting the code into a newly added ipv6 ->compat_ioctl handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-18ipv6: lift copy_from_user out of ipv6_route_ioctlChristoph Hellwig
Prepare for better compat ioctl handling by moving the user copy out of ipv6_route_ioctl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16netns: enable to inherit devconf from current netnsNicolas Dichtel
The goal is to be able to inherit the initial devconf parameters from the current netns, ie the netns where this new netns has been created. This is useful in a containers environment where /proc/sys is read only. For example, if a pod is created with specifics devconf parameters and has the capability to create netns, the user expects to get the same parameters than his 'init_net', which is not the real init_net in this case. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16ipv6: Fix suspicious RCU usage warning in ip6mrMadhuparna Bhowmik
This patch fixes the following warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc4-next-20200507-syzkaller #0 Not tainted ----------------------------- net/ipv6/ip6mr.c:124 RCU-list traversed in non-reader section!! ipmr_new_table() returns an existing table, but there is no table at init. Therefore the condition: either holding rtnl or the list is empty is used. Fixes: d1db275dd3f6e ("ipv6: ip6mr: support multiple tables") Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Move the bpf verifier trace check into the new switch statement in HEAD. Resolve the overlapping changes in hinic, where bug fixes overlap the addition of VF support. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-14 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON. 2) support for narrow loads in bpf_sock_addr progs and additional helpers in cg-skb progs, from Andrey. 3) bpf benchmark runner, from Andrii. 4) arm and riscv JIT optimizations, from Luke. 5) bpf iterator infrastructure, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14esp6: calculate transport_header correctly when sel.family != AF_INET6Xin Long
In esp6_init_state() for beet mode when x->sel.family != AF_INET6: x->props.header_len = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead) + IPV4_BEET_PHMAXLEN + (sizeof(struct ipv6hdr) - sizeof(struct iphdr)) In xfrm6_beet_gso_segment() skb->transport_header is supposed to move to the end of the ph header for IPPROTO_BEETPH, so if x->sel.family != AF_INET6 and it's IPPROTO_BEETPH, it should do: skb->transport_header -= (sizeof(struct ipv6hdr) - sizeof(struct iphdr)); skb->transport_header += ph->hdrlen * 8; And IPV4_BEET_PHMAXLEN is only reserved for PH header, so if x->sel.family != AF_INET6 and it's not IPPROTO_BEETPH, it should do: skb->transport_header -= (sizeof(struct ipv6hdr) - sizeof(struct iphdr)); skb->transport_header -= IPV4_BEET_PHMAXLEN; Thanks Sabrina for looking deep into this issue. Fixes: 7f9e40eb18a9 ("esp6: add gso_segment for esp6 beet mode") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-05-13ipv6: set msg_control_is_user in do_ipv6_getsockoptChristoph Hellwig
While do_ipv6_getsockopt does not call the high-level recvmsg helper, the msghdr eventually ends up being passed to put_cmsg anyway, and thus needs msg_control_is_user set to the proper value. Fixes: 1f466e1f15cf ("net: cleanly handle kernel vs user buffers for ->msg_control") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-13bpf: Enable bpf_iter targets registering ctx argument typesYonghong Song
Commit b121b341e598 ("bpf: Add PTR_TO_BTF_ID_OR_NULL support") adds a field btf_id_or_null_non0_off to bpf_prog->aux structure to indicate that the first ctx argument is PTR_TO_BTF_ID reg_type and all others are PTR_TO_BTF_ID_OR_NULL. This approach does not really scale if we have other different reg types in the future, e.g., a pointer to a buffer. This patch enables bpf_iter targets registering ctx argument reg types which may be different from the default one. For example, for pointers to structures, the default reg_type is PTR_TO_BTF_ID for tracing program. The target can register a particular pointer type as PTR_TO_BTF_ID_OR_NULL which can be used by the verifier to enforce accesses. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180221.2949882-1-yhs@fb.com
2020-05-13bpf: Change func bpf_iter_unreg_target() signatureYonghong Song
Change func bpf_iter_unreg_target() parameter from target name to target reg_info, similar to bpf_iter_reg_target(). Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180220.2949737-1-yhs@fb.com
2020-05-13bpf: net: Refactor bpf_iter target registrationYonghong Song
Currently bpf_iter_reg_target takes parameters from target and allocates memory to save them. This is really not necessary, esp. in the future we may grow information passed from targets to bpf_iter manager. The patch refactors the code so target reg_info becomes static and bpf_iter manager can just take a reference to it. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200513180219.2949605-1-yhs@fb.com
2020-05-12netlabel: cope with NULL catmapPaolo Abeni
The cipso and calipso code can set the MLS_CAT attribute on successful parsing, even if the corresponding catmap has not been allocated, as per current configuration and external input. Later, selinux code tries to access the catmap if the MLS_CAT flag is present via netlbl_catmap_getlong(). That may cause null ptr dereference while processing incoming network traffic. Address the issue setting the MLS_CAT flag only if the catmap is really allocated. Additionally let netlbl_catmap_getlong() cope with NULL catmap. Reported-by: Matthew Sheets <matthew.sheets@gd-ms.com> Fixes: 4b8feff251da ("netlabel: fix the horribly broken catmap functions") Fixes: ceba1832b1b2 ("calipso: Set the calipso socket label to match the secattr.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-09net: bpf: Add netlink and ipv6_route bpf_iter targetsYonghong Song
This patch added netlink and ipv6_route targets, using the same seq_ops (except show() and minor changes for stop()) for /proc/net/{netlink,ipv6_route}. The net namespace for these targets are the current net namespace at file open stage, similar to /proc/net/{netlink,ipv6_route} reference counting the net namespace at seq_file open stage. Since module is not supported for now, ipv6_route is supported only if the IPV6 is built-in, i.e., not compiled as a module. The restriction can be lifted once module is properly supported for bpf_iter. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175910.2476329-1-yhs@fb.com
2020-05-08ipv6: use DST_NOCOUNT in ip6_rt_pcpu_alloc()Eric Dumazet
We currently have to adjust ipv6 route gc_thresh/max_size depending on number of cpus on a server, this makes very little sense. If the kernels sets /proc/sys/net/ipv6/route/gc_thresh to 1024 and /proc/sys/net/ipv6/route/max_size to 4096, then we better not track the percpu dst that our implementation uses. Only routes not added (directly or indirectly) by the admin should be tracked and limited. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@kernel.org> Cc: Maciej Żenczykowski <maze@google.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>