summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-09-10selftests/bpf: Fix test_ksyms on non-SMP kernelsIlya Leoshkevich
On non-SMP kernels __per_cpu_start is not 0, so look it up in kallsyms. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200910171336.3161995-1-iii@linux.ibm.com
2020-09-10bpf: Plug hole in struct bpf_sk_lookup_kernLorenz Bauer
As Alexei points out, struct bpf_sk_lookup_kern has two 4-byte holes. This leads to suboptimal instructions being generated (IPv4, x86): 1372 struct bpf_sk_lookup_kern ctx = { 0xffffffff81b87f30 <+624>: xor %eax,%eax 0xffffffff81b87f32 <+626>: mov $0x6,%ecx 0xffffffff81b87f37 <+631>: lea 0x90(%rsp),%rdi 0xffffffff81b87f3f <+639>: movl $0x110002,0x88(%rsp) 0xffffffff81b87f4a <+650>: rep stos %rax,%es:(%rdi) 0xffffffff81b87f4d <+653>: mov 0x8(%rsp),%eax 0xffffffff81b87f51 <+657>: mov %r13d,0x90(%rsp) 0xffffffff81b87f59 <+665>: incl %gs:0x7e4970a0(%rip) 0xffffffff81b87f60 <+672>: mov %eax,0x8c(%rsp) 0xffffffff81b87f67 <+679>: movzwl 0x10(%rsp),%eax 0xffffffff81b87f6c <+684>: mov %ax,0xa8(%rsp) 0xffffffff81b87f74 <+692>: movzwl 0x38(%rsp),%eax 0xffffffff81b87f79 <+697>: mov %ax,0xaa(%rsp) Fix this by moving around sport and dport. pahole confirms there are no more holes: struct bpf_sk_lookup_kern { u16 family; /* 0 2 */ u16 protocol; /* 2 2 */ __be16 sport; /* 4 2 */ u16 dport; /* 6 2 */ struct { __be32 saddr; /* 8 4 */ __be32 daddr; /* 12 4 */ } v4; /* 8 8 */ struct { const struct in6_addr * saddr; /* 16 8 */ const struct in6_addr * daddr; /* 24 8 */ } v6; /* 16 16 */ struct sock * selected_sk; /* 32 8 */ bool no_reuseport; /* 40 1 */ /* size: 48, cachelines: 1, members: 8 */ /* padding: 7 */ /* last cacheline: 48 bytes */ }; The assembly also doesn't contain the pesky rep stos anymore: 1372 struct bpf_sk_lookup_kern ctx = { 0xffffffff81b87f60 <+624>: movzwl 0x10(%rsp),%eax 0xffffffff81b87f65 <+629>: movq $0x0,0xa8(%rsp) 0xffffffff81b87f71 <+641>: movq $0x0,0xb0(%rsp) 0xffffffff81b87f7d <+653>: mov %ax,0x9c(%rsp) 0xffffffff81b87f85 <+661>: movzwl 0x38(%rsp),%eax 0xffffffff81b87f8a <+666>: movq $0x0,0xb8(%rsp) 0xffffffff81b87f96 <+678>: mov %ax,0x9e(%rsp) 0xffffffff81b87f9e <+686>: mov 0x8(%rsp),%eax 0xffffffff81b87fa2 <+690>: movq $0x0,0xc0(%rsp) 0xffffffff81b87fae <+702>: movl $0x110002,0x98(%rsp) 0xffffffff81b87fb9 <+713>: mov %eax,0xa0(%rsp) 0xffffffff81b87fc0 <+720>: mov %r13d,0xa4(%rsp) 1: https://lore.kernel.org/bpf/CAADnVQKE6y9h2fwX6OS837v-Uf+aBXnT_JXiN_bbo2gitZQ3tA@mail.gmail.com/ Fixes: e9ddbb7707ff ("bpf: Introduce SK_LOOKUP program type with a dedicated attach point") Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20200910110248.198326-1-lmb@cloudflare.com
2020-09-10tools: bpftool: Add "inner_map" to "bpftool map create" outer mapsQuentin Monnet
There is no support for creating maps of types array-of-map or hash-of-map in bpftool. This is because the kernel needs an inner_map_fd to collect metadata on the inner maps to be supported by the new map, but bpftool does not provide a way to pass this file descriptor. Add a new optional "inner_map" keyword that can be used to pass a reference to a map, retrieve a fd to that map, and pass it as the inner_map_fd. Add related documentation and bash completion. Note that we can reference the inner map by its name, meaning we can have several times the keyword "name" with different meanings (mandatory outer map name, and possibly a name to use to find the inner_map_fd). The bash completion will offer it just once, and will not suggest "name" on the following command: # bpftool map create /sys/fs/bpf/my_outer_map type hash_of_maps \ inner_map name my_inner_map [TAB] Fixing that specific case seems too convoluted. Completion will work as expected, however, if the outer map name comes first and the "inner_map name ..." is passed second. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200910102652.10509-4-quentin@isovalent.com
2020-09-10tools: bpftool: Keep errors for map-of-map dumps if distinct from ENOENTQuentin Monnet
When dumping outer maps or prog_array maps, and on lookup failure, bpftool simply skips the entry with no error message. This is because the kernel returns non-zero when no value is found for the provided key, which frequently happen for those maps if they have not been filled. When such a case occurs, errno is set to ENOENT. It seems unlikely we could receive other error codes at this stage (we successfully retrieved map info just before), but to be on the safe side, let's skip the entry only if errno was ENOENT, and not for the other errors. v3: New patch Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200910102652.10509-3-quentin@isovalent.com
2020-09-10tools: bpftool: Clean up function to dump map entryQuentin Monnet
The function used to dump a map entry in bpftool is a bit difficult to follow, as a consequence to earlier refactorings. There is a variable ("num_elems") which does not appear to be necessary, and the error handling would look cleaner if moved to its own function. Let's clean it up. No functional change. v2: - v1 was erroneously removing the check on fd maps in an attempt to get support for outer map dumps. This is already working. Instead, v2 focuses on cleaning up the dump_map_elem() function, to avoid similar confusion in the future. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200910102652.10509-2-quentin@isovalent.com
2020-09-10selftests: bpf: Test iterating a sockmapLorenz Bauer
Add a test that exercises a basic sockmap / sockhash iteration. For now we simply count the number of elements seen. Once sockmap update from iterators works we can extend this to perform a full copy. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200909162712.221874-4-lmb@cloudflare.com
2020-09-10net: Allow iterating sockmap and sockhashLorenz Bauer
Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and hashtable implementation. sockmap and sockhash share the same iteration context: a pointer to an arbitrary key and a pointer to a socket. Both pointers may be NULL, and so BPF has to perform a NULL check before accessing them. Technically it's not possible for sockhash iteration to yield a NULL socket, but we ignore this to be able to use a single iteration point. Iteration will visit all keys that remain unmodified during the lifetime of the iterator. It may or may not visit newly added ones. Switch from using rcu_dereference_raw to plain rcu_dereference, so we gain another guard rail if CONFIG_PROVE_RCU is enabled. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200909162712.221874-3-lmb@cloudflare.com
2020-09-10net: sockmap: Remove unnecessary sk_fullsock checksLorenz Bauer
The lookup paths for sockmap and sockhash currently include a check that returns NULL if the socket we just found is not a full socket. However, this check is not necessary. On insertion we ensure that we have a full socket (caveat around sock_ops), so request sockets are not a problem. Time-wait sockets are allocated separate from the original socket and then fed into the hashdance. They don't affect the sockets already stored in the sockmap. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200909162712.221874-2-lmb@cloudflare.com
2020-09-10tools: bpftool: Include common options from separate fileQuentin Monnet
Nearly all man pages for bpftool have the same common set of option flags (--help, --version, --json, --pretty, --debug). The description is duplicated across all the pages, which is more difficult to maintain if the description of an option changes. It may also be confusing to sort out what options are not "common" and should not be copied when creating new manual pages. Let's move the description for those common options to a separate file, which is included with a RST directive when generating the man pages. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200909162500.17010-3-quentin@isovalent.com
2020-09-10tools: bpftool: Print optional built-in features along with versionQuentin Monnet
Bpftool has a number of features that can be included or left aside during compilation. This includes: - Support for libbfd, providing the disassembler for JIT-compiled programs. - Support for BPF skeletons, used for profiling programs or iterating on the PIDs of processes associated with BPF objects. In order to make it easy for users to understand what features were compiled for a given bpftool binary, print the status of the two features above when showing the version number for bpftool ("bpftool -V" or "bpftool version"). Document this in the main manual page. Example invocations: $ bpftool version ./bpftool v5.9.0-rc1 features: libbfd, skeletons $ bpftool -p version { "version": "5.9.0-rc1", "features": { "libbfd": true, "skeletons": true } } Some other parameters are optional at compilation ("DISASM_FOUR_ARGS_SIGNATURE", LIBCAP support) but they do not impact significantly bpftool's behaviour from a user's point of view, so their status is not reported. Available commands and supported program types depend on the version number, and are therefore not reported either. Note that they are already available, albeit without JSON, via bpftool's help messages. v3: - Use a simple list instead of boolean values for plain output. v2: - Fix JSON (object instead or array for the features). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200909162500.17010-2-quentin@isovalent.com
2020-09-10selftests, bpftool: Add bpftool (and eBPF helpers) documentation buildQuentin Monnet
eBPF selftests include a script to check that bpftool builds correctly with different command lines. Let's add one build for bpftool's documentation so as to detect errors or warning reported by rst2man when compiling the man pages. Also add a build to the selftests Makefile to make sure we build bpftool documentation along with bpftool when building the selftests. This also builds and checks warnings for the man page for eBPF helpers, which is built along bpftool's documentation. This change adds rst2man as a dependency for selftests (it comes with Python's "docutils"). v2: - Use "--exit-status=1" option for rst2man instead of counting lines from stderr. - Also build bpftool as part as the selftests build (and not only when the tests are actually run). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200909162251.15498-3-quentin@isovalent.com
2020-09-10tools: bpftool: Log info-level messages when building bpftool man pagesQuentin Monnet
To build man pages for bpftool (and for eBPF helper functions), rst2man can log different levels of information. Let's make it log all levels to keep the RST files clean. Doing so, rst2man complains about double colons, used for literal blocks, that look like underlines for section titles. Let's add the necessary blank lines. v2: - Use "--verbose" instead of "-r 1" (same behaviour but more readable). - Pass it through a RST2MAN_OPTS variable so we can easily pass other options too. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200909162251.15498-2-quentin@isovalent.com
2020-09-10bpf: Remove duplicate headersChen Zhou
Remove duplicate headers which are included twice. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200908132201.184005-1-chenzhou10@huawei.com
2020-09-09perf: Stop using deprecated bpf_program__title()Andrii Nakryiko
Switch from deprecated bpf_program__title() API to bpf_program__section_name(). Also drop unnecessary error checks because neither bpf_program__title() nor bpf_program__section_name() can fail or return NULL. Fixes: 521095842027 ("libbpf: Deprecate notion of BPF program "title" in favor of "section name"") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/bpf/20200908180127.1249-1-andriin@fb.com
2020-09-09selftests/bpf: Fix test_sysctl_loop{1, 2} failure due to clang changeYonghong Song
Andrii reported that with latest clang, when building selftests, we have error likes: error: progs/test_sysctl_loop1.c:23:16: in function sysctl_tcp_mem i32 (%struct.bpf_sysctl*): Looks like the BPF stack limit of 512 bytes is exceeded. Please move large on stack variables into BPF per-cpu array map. The error is triggered by the following LLVM patch: https://reviews.llvm.org/D87134 For example, the following code is from test_sysctl_loop1.c: static __always_inline int is_tcp_mem(struct bpf_sysctl *ctx) { volatile char tcp_mem_name[] = "net/ipv4/tcp_mem/very_very_very_very_long_pointless_string"; ... } Without the above LLVM patch, the compiler did optimization to load the string (59 bytes long) with 7 64bit loads, 1 8bit load and 1 16bit load, occupying 64 byte stack size. With the above LLVM patch, the compiler only uses 8bit loads, but subregister is 32bit. So stack requirements become 4 * 59 = 236 bytes. Together with other stuff on the stack, total stack size exceeds 512 bytes, hence compiler complains and quits. To fix the issue, removing "volatile" key word or changing "volatile" to "const"/"static const" does not work, the string is put in .rodata.str1.1 section, which libbpf did not process it and errors out with libbpf: elf: skipping unrecognized data section(6) .rodata.str1.1 libbpf: prog 'sysctl_tcp_mem': bad map relo against '.L__const.is_tcp_mem.tcp_mem_name' in section '.rodata.str1.1' Defining the string const as global variable can fix the issue as it puts the string constant in '.rodata' section which is recognized by libbpf. In the future, when libbpf can process '.rodata.str*.*' properly, the global definition can be changed back to local definition. Defining tcp_mem_name as a global, however, triggered a verifier failure. ./test_progs -n 7/21 libbpf: load bpf program failed: Permission denied libbpf: -- BEGIN DUMP LOG --- libbpf: invalid stack off=0 size=1 verification time 6975 usec stack depth 160+64 processed 889 insns (limit 1000000) max_states_per_insn 4 total_states 14 peak_states 14 mark_read 10 libbpf: -- END LOG -- libbpf: failed to load program 'sysctl_tcp_mem' libbpf: failed to load object 'test_sysctl_loop2.o' test_bpf_verif_scale:FAIL:114 #7/21 test_sysctl_loop2.o:FAIL This actually exposed a bpf program bug. In test_sysctl_loop{1,2}, we have code like const char tcp_mem_name[] = "<...long string...>"; ... char name[64]; ... for (i = 0; i < sizeof(tcp_mem_name); ++i) if (name[i] != tcp_mem_name[i]) return 0; In the above code, if sizeof(tcp_mem_name) > 64, name[i] access may be out of bound. The sizeof(tcp_mem_name) is 59 for test_sysctl_loop1.c and 79 for test_sysctl_loop2.c. Without promotion-to-global change, old compiler generates code where the overflowed stack access is actually filled with valid value, so hiding the bpf program bug. With promotion-to-global change, the code is different, more specifically, the previous loading constants to stack is gone, and "name" occupies stack[-64:0] and overflow access triggers a verifier error. To fix the issue, adjust "name" buffer size properly. Reported-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200909171542.3673449-1-yhs@fb.com
2020-09-08selftests/bpf: Add test for map_ptr arithmeticYonghong Song
Change selftest map_ptr_kern.c with disabling inlining for one of subtests, which will fail the test without previous verifier change. Also added to verifier test for both "map_ptr += scalar" and "scalar += map_ptr" arithmetic. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200908175703.2463721-1-yhs@fb.com
2020-09-08bpf: Permit map_ptr arithmetic with opcode add and offset 0Yonghong Song
Commit 41c48f3a98231 ("bpf: Support access to bpf map fields") added support to access map fields with CORE support. For example, struct bpf_map { __u32 max_entries; } __attribute__((preserve_access_index)); struct bpf_array { struct bpf_map map; __u32 elem_size; } __attribute__((preserve_access_index)); struct { __uint(type, BPF_MAP_TYPE_ARRAY); __uint(max_entries, 4); __type(key, __u32); __type(value, __u32); } m_array SEC(".maps"); SEC("cgroup_skb/egress") int cg_skb(void *ctx) { struct bpf_array *array = (struct bpf_array *)&m_array; /* .. array->map.max_entries .. */ } In kernel, bpf_htab has similar structure, struct bpf_htab { struct bpf_map map; ... } In the above cg_skb(), to access array->map.max_entries, with CORE, the clang will generate two builtin's. base = &m_array; /* access array.map */ map_addr = __builtin_preserve_struct_access_info(base, 0, 0); /* access array.map.max_entries */ max_entries_addr = __builtin_preserve_struct_access_info(map_addr, 0, 0); max_entries = *max_entries_addr; In the current llvm, if two builtin's are in the same function or in the same function after inlining, the compiler is smart enough to chain them together and generates like below: base = &m_array; max_entries = *(base + reloc_offset); /* reloc_offset = 0 in this case */ and we are fine. But if we force no inlining for one of functions in test_map_ptr() selftest, e.g., check_default(), the above two __builtin_preserve_* will be in two different functions. In this case, we will have code like: func check_hash(): reloc_offset_map = 0; base = &m_array; map_base = base + reloc_offset_map; check_default(map_base, ...) func check_default(map_base, ...): max_entries = *(map_base + reloc_offset_max_entries); In kernel, map_ptr (CONST_PTR_TO_MAP) does not allow any arithmetic. The above "map_base = base + reloc_offset_map" will trigger a verifier failure. ; VERIFY(check_default(&hash->map, map)); 0: (18) r7 = 0xffffb4fe8018a004 2: (b4) w1 = 110 3: (63) *(u32 *)(r7 +0) = r1 R1_w=invP110 R7_w=map_value(id=0,off=4,ks=4,vs=8,imm=0) R10=fp0 ; VERIFY_TYPE(BPF_MAP_TYPE_HASH, check_hash); 4: (18) r1 = 0xffffb4fe8018a000 6: (b4) w2 = 1 7: (63) *(u32 *)(r1 +0) = r2 R1_w=map_value(id=0,off=0,ks=4,vs=8,imm=0) R2_w=invP1 R7_w=map_value(id=0,off=4,ks=4,vs=8,imm=0) R10=fp0 8: (b7) r2 = 0 9: (18) r8 = 0xffff90bcb500c000 11: (18) r1 = 0xffff90bcb500c000 13: (0f) r1 += r2 R1 pointer arithmetic on map_ptr prohibited To fix the issue, let us permit map_ptr + 0 arithmetic which will result in exactly the same map_ptr. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200908175702.2463625-1-yhs@fb.com
2020-09-07tools, bpf: Synchronise BPF UAPI header with toolsQuentin Monnet
Synchronise the bpf.h header under tools, to report the fixes recently brought to the documentation for the BPF helpers. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200904161454.31135-4-quentin@isovalent.com
2020-09-07bpf: Fix formatting in documentation for BPF helpersQuentin Monnet
Fix a formatting error in the description of bpf_load_hdr_opt() (rst2man complains about a wrong indentation, but what is missing is actually a blank line before the bullet list). Fix and harmonise the formatting for other helpers. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200904161454.31135-3-quentin@isovalent.com
2020-09-07tools: bpftool: Fix formatting in bpftool-link documentationQuentin Monnet
Fix a formatting error in the documentation for bpftool-link, so that the man page can build correctly. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200904161454.31135-2-quentin@isovalent.com
2020-09-04samples, bpf: Add xsk_fwd test file to .gitignoreDaniel T. Lee
This commit adds xsk_fwd test file to .gitignore which is newly added to samples/bpf. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200904063434.24963-2-danieltimlee@gmail.com
2020-09-04samples, bpf: Replace bpf_program__title() with bpf_program__section_name()Daniel T. Lee
From commit 521095842027 ("libbpf: Deprecate notion of BPF program "title" in favor of "section name""), the term title has been replaced with section name in libbpf. Since the bpf_program__title() has been deprecated, this commit switches this function to bpf_program__section_name(). Due to this commit, the compilation warning issue has also been resolved. Fixes: 521095842027 ("libbpf: Deprecate notion of BPF program "title" in favor of "section name"") Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200904063434.24963-1-danieltimlee@gmail.com
2020-09-04libbpf: Fix potential multiplication overflowAndrii Nakryiko
Detected by LGTM static analyze in Github repo, fix potential multiplication overflow before result is casted to size_t. Fixes: 8505e8709b5e ("libbpf: Implement generalized .BTF.ext func/line info adjustment") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200904041611.1695163-2-andriin@fb.com
2020-09-04libbpf: Fix another __u64 cast in printfAndrii Nakryiko
Another issue of __u64 needing either %lu or %llu, depending on the architecture. Fix with cast to `unsigned long long`. Fixes: 7e06aad52929 ("libbpf: Add multi-prog section support for struct_ops") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200904041611.1695163-1-andriin@fb.com
2020-09-03selftests/bpf: Fix check in global_data_init.Hao Luo
The returned value of bpf_object__open_file() should be checked with libbpf_get_error() rather than NULL. This fix prevents test_progs from crash when test_global_data.o is not present. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200903200528.747884-1-haoluo@google.com
2020-09-03Merge branch 'libbpf-support-bpf-to-bpf-calls'Alexei Starovoitov
Andrii Nakryiko says: ==================== Currently, libbpf supports a limited form of BPF-to-BPF subprogram calls. The restriction is that entry-point BPF program should use *all* of defined sub-programs in BPF .o file. If any of the subprograms is not used, such entry-point BPF program will be rejected by verifier as containing unreachable dead code. This is not a big limitation for cases with single entry-point BPF programs, but is quite a heavy restriction for multi-programs that use only partially overlapping set of subprograms. This patch set removes all such restrictions and adds complete support for using BPF sub-program calls on BPF side. This is achieved through libbpf tracking subprograms individually and detecting which subprograms are used by any given entry-point BPF program, and subsequently only appending and relocating code for just those used subprograms. In addition, libbpf now also supports multiple entry-point BPF programs within the same ELF section. This allows to structure code so that there are few variants of BPF programs of the same type and attaching to the same target (e.g., for tracepoints and kprobes) without the need to worry about ELF section name clashes. This patch set opens way for more wider adoption of BPF subprogram calls, especially for real-world production use-cases with complicated net of subprograms. This will allow to further scale BPF verification process through good use of global functions, which can be verified independently. This is also important prerequisite for static linking which allows static BPF libraries to not worry about naming clashes for section names, as well as use static non-inlined functions (subprograms) without worries of verifier rejecting program due to dead code. Patch set is structured as follows: - patched 1-6 contain all the libbpf changes necessary to support multi-prog sections and bpf2bpf subcalls; - patch 7 adds dedicated selftests validating all combinations of possible sub-calls (within and across sections, static vs global functions); - patch 8 deprecated bpf_program__title() in favor of bpf_program__section_name(). The intent was to also deprecate bpf_object__find_program_by_title() as it's now non-sensical with multiple programs per section. But there were too many selftests uses of this and I didn't want to delay this patches further and make it even bigger, so left it for a follow up cleanup; - patches 9-10 remove uses for title-related APIs from bpftool and bpf_program__title() use from selftests; - patch 11 is converting fexit_bpf2bpf to have explicit subtest (it does contain 4 subtests, which are not handled as sub-tests); - patches 12-14 convert few complicated BPF selftests to use __noinline functions to further validate correctness of libbpf's bpf2bpf processing logic. v2->v3: - explained subprog relocation algorithm in more details (Alexei); - pyperf, strobelight and cls_redirect got new subprog variants, leaving other modes intact (Alexei); v1->v2: - rename DEPRECATED to LIBBPF_DEPRECATED to avoid name clashes; - fix test_subprogs build; - convert a bunch of complicated selftests to __noinline (Alexei). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-03selftests/bpf: Add __noinline variant of cls_redirect selftestAndrii Nakryiko
As one of the most complicated and close-to-real-world programs, cls_redirect is a good candidate to exercise libbpf's logic of handling bpf2bpf calls. So add variant with using explicit __noinline for majority of functions except few most basic ones. If those few functions are inlined, verifier starts to complain about program instruction limit of 1mln instructions being exceeded, most probably due to instruction overhead of doing a sub-program call. Convert user-space part of selftest to have to sub-tests: with and without inlining. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/20200903203542.15944-15-andriin@fb.com
2020-09-03selftests/bpf: Modernize xdp_noinline test w/ skeleton and __noinlineAndrii Nakryiko
Update xdp_noinline to use BPF skeleton and force __noinline on helper sub-programs. Also, split existing logic into v4- and v6-only to complicate sub-program calling patterns (partially overlapped sets of functions for entry-level BPF programs). Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-14-andriin@fb.com
2020-09-03selftests/bpf: Add subprogs to pyperf, strobemeta, and l4lb_noinline testsAndrii Nakryiko
Add use of non-inlined subprogs to few bigger selftests to excercise libbpf's bpf2bpf handling logic. Also split l4lb_all selftest into two sub-tests. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-13-andriin@fb.com
2020-09-03selftests/bpf: Turn fexit_bpf2bpf into test with subtestsAndrii Nakryiko
There are clearly 4 subtests, so make it official. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-12-andriin@fb.com
2020-09-03libbpf: Deprecate notion of BPF program "title" in favor of "section name"Andrii Nakryiko
BPF program title is ambigious and misleading term. It is ELF section name, so let's just call it that and deprecate bpf_program__title() API in favor of bpf_program__section_name(). Additionally, using bpf_object__find_program_by_title() is now inherently dangerous and ambiguous, as multiple BPF program can have the same section name. So deprecate this API as well and recommend to switch to non-ambiguous bpf_object__find_program_by_name(). Internally, clean up usage and mis-usage of BPF program section name for denoting BPF program name. Shorten the field name to prog->sec_name to be consistent with all other prog->sec_* variables. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-11-andriin@fb.com
2020-09-03selftests/bpf: Don't use deprecated libbpf APIsAndrii Nakryiko
Remove all uses of bpf_program__title() and bpf_program__find_program_by_title(). Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-10-andriin@fb.com
2020-09-03tools/bpftool: Replace bpf_program__title() with bpf_program__section_name()Andrii Nakryiko
bpf_program__title() is deprecated, switch to bpf_program__section_name() and avoid compilation warnings. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-9-andriin@fb.com
2020-09-03selftests/bpf: Add selftest for multi-prog sections and bpf-to-bpf callsAndrii Nakryiko
Add a selftest excercising bpf-to-bpf subprogram calls, as well as multiple entry-point BPF programs per section. Also make sure that BPF CO-RE works for such set ups both for sub-programs and for multi-entry sections. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-8-andriin@fb.com
2020-09-03libbpf: Add multi-prog section support for struct_opsAndrii Nakryiko
Adjust struct_ops handling code to work with multi-program ELF sections properly. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-7-andriin@fb.com
2020-09-03libbpf: Implement generalized .BTF.ext func/line info adjustmentAndrii Nakryiko
Complete multi-prog sections and multi sub-prog support in libbpf by properly adjusting .BTF.ext's line and function information. Mark exposed btf_ext__reloc_func_info() and btf_ext__reloc_func_info() APIs as deprecated. These APIs have simplistic assumption that all sub-programs are going to be appended to all main BPF programs, which doesn't hold in real life. It's unlikely there are any users of this API, as it's very libbpf internals-specific. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-6-andriin@fb.com
2020-09-03libbpf: Make RELO_CALL work for multi-prog sections and sub-program callsAndrii Nakryiko
This patch implements general and correct logic for bpf-to-bpf sub-program calls. Only sub-programs used (called into) from entry-point (main) BPF program are going to be appended at the end of main BPF program. This ensures that BPF verifier won't encounter any dead code due to copying unreferenced sub-program. This change means that each entry-point (main) BPF program might have a different set of sub-programs appended to it and potentially in different order. This has implications on how sub-program call relocations need to be handled, described below. All relocations are now split into two categores: data references (maps and global variables) and code references (sub-program calls). This distinction is important because data references need to be relocated just once per each BPF program and sub-program. These relocation are agnostic to instruction locations, because they are not code-relative and they are relocating against static targets (maps, variables with fixes offsets, etc). Sub-program RELO_CALL relocations, on the other hand, are highly-dependent on code position, because they are recorded as instruction-relative offset. So BPF sub-programs (those that do calls into other sub-programs) can't be relocated once, they need to be relocated each time such a sub-program is appended at the end of the main entry-point BPF program. As mentioned above, each main BPF program might have different subset and differen order of sub-programs, so call relocations can't be done just once. Splitting data reference and calls relocations as described above allows to do this efficiently and cleanly. bpf_object__find_program_by_name() will now ignore non-entry BPF programs. Previously one could have looked up '.text' fake BPF program, but the existence of such BPF program was always an implementation detail and you can't do much useful with it. Now, though, all non-entry sub-programs get their own BPF program with name corresponding to a function name, so there is no more '.text' name for BPF program. This means there is no regression, effectively, w.r.t. API behavior. But this is important aspect to highlight, because it's going to be critical once libbpf implements static linking of BPF programs. Non-entry static BPF programs will be allowed to have conflicting names, but global and main-entry BPF program names should be unique. Just like with normal user-space linking process. So it's important to restrict this aspect right now, keep static and non-entry functions as internal implementation details, and not have to deal with regressions in behavior later. This patch leaves .BTF.ext adjustment as is until next patch. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200903203542.15944-5-andriin@fb.com
2020-09-03libbpf: Support CO-RE relocations for multi-prog sectionsAndrii Nakryiko
Fix up CO-RE relocation code to handle relocations against ELF sections containing multiple BPF programs. This requires lookup of a BPF program by its section name and instruction index it contains. While it could have been done as a simple loop, it could run into performance issues pretty quickly, as number of CO-RE relocations can be quite large in real-world applications, and each CO-RE relocation incurs BPF program look up now. So instead of simple loop, implement a binary search by section name + insn offset. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200903203542.15944-4-andriin@fb.com
2020-09-03libbpf: Parse multi-function sections into multiple BPF programsAndrii Nakryiko
Teach libbpf how to parse code sections into potentially multiple bpf_program instances, based on ELF FUNC symbols. Each BPF program will keep track of its position within containing ELF section for translating section instruction offsets into program instruction offsets: regardless of BPF program's location in ELF section, it's first instruction is always at local instruction offset 0, so when libbpf is working with relocations (which use section-based instruction offsets) this is critical to make proper translations. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200903203542.15944-3-andriin@fb.com
2020-09-03libbpf: Ensure ELF symbols table is found before further ELF processingAndrii Nakryiko
libbpf ELF parsing logic might need symbols available before ELF parsing is completed, so we need to make sure that symbols table section is found in a separate pass before all the subsequent sections are processed. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200903203542.15944-2-andriin@fb.com
2020-09-02xsk: Fix use-after-free in failed shared_umem bindMagnus Karlsson
Fix use-after-free when a shared umem bind fails. The code incorrectly tried to free the allocated buffer pool both in the bind code and then later also when the socket was released. Fix this by setting the buffer pool pointer to NULL after the bind code has freed the pool, so that the socket release code will not try to free the pool. This is the same solution as the regular, non-shared umem code path has. This was missing from the shared umem path. Fixes: b5aea28dca13 ("xsk: Add shared umem support between queue ids") Reported-by: syzbot+5334f62e4d22804e646a@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1599032164-25684-1-git-send-email-magnus.karlsson@intel.com
2020-09-02xsk: Fix null check on error return pathGustavo A. R. Silva
Currently, dma_map is being checked, when the right object identifier to be null-checked is dma_map->dma_pages, instead. Fix this by null-checking dma_map->dma_pages. Fixes: 921b68692abb ("xsk: Enable sharing of dma mappings") Addresses-Coverity-ID: 1496811 ("Logically dead code") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20200902150750.GA7257@embeddedor
2020-09-02xsk: Fix possible segfault at xskmap entry insertionMagnus Karlsson
Fix possible segfault when entry is inserted into xskmap. This can happen if the socket is in a state where the umem has been set up, the Rx ring created but it has yet to be bound to a device. In this case the pool has not yet been created and we cannot reference it for the existence of the fill ring. Fix this by removing the whole xsk_is_setup_for_bpf_map function. Once upon a time, it was used to make sure that the Rx and fill rings where set up before the driver could call xsk_rcv, since there are no tests for the existence of these rings in the data path. But these days, we have a state variable that we test instead. When it is XSK_BOUND, everything has been set up correctly and the socket has been bound. So no reason to have the xsk_is_setup_for_bpf_map function anymore. Fixes: 7361f9c3d719 ("xsk: Move fill and completion rings to buffer pool") Reported-by: syzbot+febe51d44243fbc564ee@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1599037569-26690-1-git-send-email-magnus.karlsson@intel.com
2020-09-02xsk: Fix possible segfault in xsk umem diagnosticsMagnus Karlsson
Fix possible segfault in the xsk diagnostics code when dumping information about the umem. This can happen when a umem has been created, but the socket has not been bound yet. In this case, the xsk buffer pool does not exist yet and we cannot dump the information that was moved from the umem to the buffer pool. Fix this by testing for the existence of the buffer pool and if not there, do not dump any of that information. Fixes: c2d3d6a47462 ("xsk: Move queue_id, dev and need_wakeup to buffer pool") Fixes: 7361f9c3d719 ("xsk: Move fill and completion rings to buffer pool") Reported-by: syzbot+3f04d36b7336f7868066@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1599036743-26454-1-git-send-email-magnus.karlsson@intel.com
2020-09-02selftests/bpf: Test task_file iterator without visiting pthreadsYonghong Song
Modified existing bpf_iter_test_file.c program to check whether all accessed files from the main thread or not. Modified existing bpf_iter_test_file program to check whether all accessed files from the main thread or not. $ ./test_progs -n 4 ... #4/7 task_file:OK ... #4 bpf_iter:OK Summary: 1/24 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200902023113.1672863-1-yhs@fb.com
2020-09-02bpf: Avoid iterating duplicated files for task_file iteratorYonghong Song
Currently, task_file iterator iterates all files from all tasks. This may potentially visit a lot of duplicated files if there are many tasks sharing the same files, e.g., typical pthreads where these pthreads and the main thread are sharing the same files. This patch changed task_file iterator to skip a particular task if that task shares the same files as its group_leader (the task having the same tgid and also task->tgid == task->pid). This will preserve the same result, visiting all files from all tasks, and will reduce runtime cost significantl, e.g., if there are a lot of pthreads and the process has a lot of open files. Suggested-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/bpf/20200902023112.1672792-1-yhs@fb.com
2020-09-01Merge branch 'dpaa2-eth-add-a-dpaa2_eth_-prefix-to-all-functions'David S. Miller
Ioana Ciornei says: ==================== dpaa2-eth: add a dpaa2_eth_ prefix to all functions This is just a quick cleanup that aims at adding a dpaa2_eth_ prefix to all functions within the dpaa2-eth driver even if those are static and private to the driver. The main reason for doing this is that looking a perf top, for example, is becoming an inconvenience because one cannot easily determine which entries are dpaa2-eth related or not. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01dpaa2-eth: add a dpaa2_eth_ prefix to all functions in dpaa2-eth-dcb.cIoana Ciornei
Some static functions in the dpaa2-eth driver don't have the dpaa2_eth_ prefix and this is becoming an inconvenience when looking at, for example, a perf top output and trying to determine easily which entries are dpaa2-eth related. Ammend this by adding the prefix to all the functions. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01dpaa2-eth: add a dpaa2_eth_ prefix to all functions in dpaa2-eth.cIoana Ciornei
Some static functions in the dpaa2-eth driver don't have the dpaa2_eth_ prefix and this is becoming an inconvenience when looking at, for example, a perf top output and trying to determine easily which entries are dpaa2-eth related. Ammend this by adding the prefix to all the functions. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01dpaa2-eth: add a dpaa2_eth_ prefix to all functions in dpaa2-ethtool.cIoana Ciornei
Some static functions in the dpaa2-eth driver don't have the dpaa2_eth_ prefix and this is becoming an inconvenience when looking at, for example, a perf top output and trying to determine easily which entries are dpaa2-eth related. Ammend this by adding the prefix to all the functions. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>