summaryrefslogtreecommitdiff
path: root/samples/bpf/libbpf.h
AgeCommit message (Collapse)Author
2017-03-23A Sample of using socket cookie and uid for traffic monitoringChenbo Feng
Add a sample program to demostrate the possible usage of get_socket_cookie and get_socket_uid helper function. The program will store bytes and packets counting of in/out traffic monitored by iptables and store the stats in a bpf map in per socket base. The owner uid of the socket will be stored as part of the data entry. A shell script for running the program is also included. Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20samples/bpf: Move open_raw_sock to separate headerJoe Stringer
This function was declared in libbpf.c and was the only remaining function in this library, but has nothing to do with BPF. Shift it out into a new header, sock_example.h, and include it from the relevant samples. Signed-off-by: Joe Stringer <joe@ovn.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161209024620.31660-8-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20samples/bpf: Remove perf_event_open() declarationJoe Stringer
This declaration was made in samples/bpf/libbpf.c for convenience, but there's already one in tools/perf/perf-sys.h. Reuse that one. Committer notes: Testing it: $ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/ make[1]: Entering directory '/home/build/v4.9.0-rc8+' CHK include/config/kernel.release GEN ./Makefile CHK include/generated/uapi/linux/version.h Using /home/acme/git/linux as source for kernel CHK include/generated/utsrelease.h CHK include/generated/timeconst.h CHK include/generated/bounds.h CHK include/generated/asm-offsets.h CALL /home/acme/git/linux/scripts/checksyscalls.sh HOSTCC samples/bpf/test_verifier.o HOSTCC samples/bpf/libbpf.o HOSTCC samples/bpf/../../tools/lib/bpf/bpf.o HOSTCC samples/bpf/test_maps.o HOSTCC samples/bpf/sock_example.o HOSTCC samples/bpf/bpf_load.o <SNIP> HOSTLD samples/bpf/trace_event HOSTLD samples/bpf/sampleip HOSTLD samples/bpf/tc_l2_redirect make[1]: Leaving directory '/home/build/v4.9.0-rc8+' $ Also tested the offwaketime resulting from the rebuild, seems to work as before. Signed-off-by: Joe Stringer <joe@ovn.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161209024620.31660-7-joe@ovn.org [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20tools lib bpf: Add bpf_prog_{attach,detach}Joe Stringer
Commit d8c5b17f2bc0 ("samples: bpf: add userspace example for attaching eBPF programs to cgroups") added these functions to samples/libbpf, but during this merge all of the samples libbpf functionality is shifting to tools/lib/bpf. Shift these functions there. Committer notes: Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value, just like the other wrapper calls to sys_bpf with bpf_attr to make this build in older toolchais, such as the ones in CentOS 5 and 6. Signed-off-by: Joe Stringer <joe@ovn.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org Link: https://github.com/joestringer/linux/commit/353e6f298c3d0a92fa8bfa61ff898c5050261a12.patch Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20samples/bpf: Switch over to libbpfJoe Stringer
Now that libbpf under tools/lib/bpf/* is synced with the version from samples/bpf, we can get rid most of the libbpf library here. Committer notes: Built it in a docker fedora rawhide container and ran it in the f25 host, seems to work just like it did before this patch, i.e. the switch to tools/lib/bpf/ doesn't seem to have introduced problems and Joe said he tested it with all the entries in samples/bpf/ and other code he found: [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install <SNIP> [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/ [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/ make[1]: Entering directory '/tmp/build/linux' CHK include/config/kernel.release HOSTCC scripts/basic/fixdep GEN ./Makefile CHK include/generated/uapi/linux/version.h Using /git/linux as source for kernel CHK include/generated/utsrelease.h HOSTCC scripts/basic/bin2c HOSTCC arch/x86/tools/relocs_32.o HOSTCC arch/x86/tools/relocs_64.o LD samples/bpf/built-in.o <SNIP> HOSTCC samples/bpf/fds_example.o HOSTCC samples/bpf/sockex1_user.o /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create': /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers] insns, insns_cnt, "GPL", 0, ^~~~~ In file included from /git/linux/samples/bpf/libbpf.h:5:0, from /git/linux/samples/bpf/bpf_load.h:4, from /git/linux/samples/bpf/fds_example.c:15: /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *' int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns, ^~~~~~~~~~~~~~~~ HOSTCC samples/bpf/sockex2_user.o <SNIP> HOSTCC samples/bpf/xdp_tx_iptunnel_user.o clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h \ -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \ -Wno-compare-distinct-pointer-types \ -Wno-gnu-variable-sized-type-not-at-end \ -Wno-address-of-packed-member -Wno-tautological-compare \ -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o HOSTLD samples/bpf/tc_l2_redirect <SNIP> HOSTLD samples/bpf/lwt_len_hist HOSTLD samples/bpf/xdp_tx_iptunnel make[1]: Leaving directory '/tmp/build/linux' [root@f5065a7d6272 linux]# And then, in the host: [root@jouet bpf]# mount | grep "docker.*devicemapper\/" /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota) [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/ [root@jouet bpf]# file offwaketime offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped [root@jouet bpf]# readelf -SW offwaketime offwaketime offwaketime_kern.o offwaketime_user.o [root@jouet bpf]# readelf -SW offwaketime_kern.o There are 11 section headers, starting at offset 0x700: Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 0] NULL 0000000000000000 000000 000000 00 0 0 0 [ 1] .strtab STRTAB 0000000000000000 000658 0000a8 00 0 0 1 [ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4 [ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8 [ 4] .relkprobe/try_to_wake_up REL 0000000000000000 0005a8 000020 10 10 3 8 [ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8 [ 6] .reltracepoint/sched/sched_switch REL 0000000000000000 0005c8 000090 10 10 5 8 [ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4 [ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1 [ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4 [10] .symtab SYMTAB 0000000000000000 000488 000120 18 1 4 8 Key to Flags: W (write), A (alloc), X (execute), M (merge), S (strings) I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown) O (extra OS processing required) o (OS specific), p (processor specific) [root@jouet bpf]# ./offwaketime | head -3 qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4 firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1 swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61 [root@jouet bpf]# Signed-off-by: Joe Stringer <joe@ovn.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Cc: netdev@vger.kernel.org Link: https://github.com/joestringer/linux/commit/5c40f54a52b1f437123c81e21873f4b4b1f9bd55.patch Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15samples/bpf: Make samples more libbpf-centricJoe Stringer
Switch all of the sample code to use the function names from tools/lib/bpf so that they're consistent with that, and to declare their own log buffers. This allow the next commit to be purely devoted to getting rid of the duplicate library in samples/bpf. Committer notes: Testing it: On a fedora rawhide container, with clang/llvm 3.9, sharing the host linux kernel git tree: # make O=/tmp/build/linux/ headers_install # make O=/tmp/build/linux -C samples/bpf/ Since I forgot to make it privileged, just tested it outside the container, using what it generated: # uname -a Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/ # ls -la offwaketime -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime # file offwaketime offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped # readelf -SW offwaketime_kern.o | grep PROGBITS [ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4 [ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8 [ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8 [ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4 [ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1 [ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4 # ./offwaketime | head -5 swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106 CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2 Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5 firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13 JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2 # Signed-off-by: Joe Stringer <joe@ovn.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20161214224342.12858-2-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-25samples: bpf: add userspace example for attaching eBPF programs to cgroupsDaniel Mack
Add a simple userpace program to demonstrate the new API to attach eBPF programs to cgroups. This is what it does: * Create arraymap in kernel with 4 byte keys and 8 byte values * Load eBPF program The eBPF program accesses the map passed in to store two pieces of information. The number of invocations of the program, which maps to the number of packets received, is stored to key 0. Key 1 is incremented on each iteration by the number of bytes stored in the skb. * Detach any eBPF program previously attached to the cgroup * Attach the new program to the cgroup using BPF_PROG_ATTACH * Once a second, read map[0] and map[1] to see how many bytes and packets were seen on any socket of tasks in the given cgroup. The program takes a cgroup path as 1st argument, and either "ingress" or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument, which will make the generated eBPF program return 0 instead of 1, so the kernel will drop the packet. libbpf gained two new wrappers for the new syscall commands. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24samples/bpf: fix sockex2 exampleAlexei Starovoitov
since llvm commit "Do not expand UNDEF SDNode during insn selection lowering" llvm will generate code that uses uninitialized registers for cases where C code is actually uses uninitialized data. So this sockex2 example is technically broken. Fix it by initializing on the stack variable fully. Also increase verifier buffer limit, since verifier output may not fit in 64k for this sockex2 code depending on llvm version. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-29bpf: allow access into map value arraysJosef Bacik
Suppose you have a map array value that is something like this struct foo { unsigned iter; int array[SOME_CONSTANT]; }; You can easily insert this into an array, but you cannot modify the contents of foo->array[] after the fact. This is because we have no way to verify we won't go off the end of the array at verification time. This patch provides a start for this work. We accomplish this by keeping track of a minimum and maximum value a register could be while we're checking the code. Then at the time we try to do an access into a MAP_VALUE we verify that the maximum offset into that region is a valid access into that memory region. So in practice, code such as this unsigned index = 0; if (foo->iter >= SOME_CONSTANT) foo->iter = index; else index = foo->iter++; foo->array[index] = bar; would be allowed, as we can verify that index will always be between 0 and SOME_CONSTANT-1. If you wish to use signed values you'll have to have an extra check to make sure the index isn't less than 0, or do something like index %= SOME_CONSTANT. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08samples/bpf: add map_flags to bpf loaderAlexei Starovoitov
note old loader is compatible with new kernel. map_flags are optional Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02bpf: add sample usages for persistent maps/progsDaniel Borkmann
This patch adds a couple of stand-alone examples on how BPF_OBJ_PIN and BPF_OBJ_GET commands can be used. Example with maps: # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42 bpf: map fd:3 (Success) bpf: pin ret:(0,Success) bpf: fd:3 u->(1:42) ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):42 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24 bpf: get fd:3 (Success) bpf: fd:3 u->(1:24) ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):24 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -P -m bpf: map fd:3 (Success) bpf: pin ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):0 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -G -m bpf: get fd:3 (Success) Example with progs: # ./fds_example -F /sys/fs/bpf/p -P -p bpf: prog fd:3 (Success) bpf: pin ret:(0,Success) bpf sock:4 <- fd:3 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p -G -p bpf: get fd:3 (Success) bpf: sock:4 <- fd:3 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o bpf: prog fd:5 (Success) bpf: pin ret:(0,Success) bpf: sock:3 <- fd:5 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p2 -G -p bpf: get fd:3 (Success) bpf: sock:4 <- fd:3 attached ret:(0,Success) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12bpf: add unprivileged bpf testsAlexei Starovoitov
Add new tests samples/bpf/test_verifier: unpriv: return pointer checks that pointer cannot be returned from the eBPF program unpriv: add const to pointer unpriv: add pointer to pointer unpriv: neg pointer checks that pointer arithmetic is disallowed unpriv: cmp pointer with const unpriv: cmp pointer with pointer checks that comparison of pointers is disallowed Only one case allowed 'void *value = bpf_map_lookup_elem(..); if (value == 0) ...' unpriv: check that printk is disallowed since bpf_trace_printk is not available to unprivileged unpriv: pass pointer to helper function checks that pointers cannot be passed to functions that expect integers If function expects a pointer the verifier allows only that type of pointer. Like 1st argument of bpf_map_lookup_elem() must be pointer to map. (applies to non-root as well) unpriv: indirectly pass pointer on stack to helper function checks that pointer stored into stack cannot be used as part of key passed into bpf_map_lookup_elem() unpriv: mangle pointer on stack 1 unpriv: mangle pointer on stack 2 checks that writing into stack slot that already contains a pointer is disallowed unpriv: read pointer from stack in small chunks checks that < 8 byte read from stack slot that contains a pointer is disallowed unpriv: write pointer into ctx checks that storing pointers into skb->fields is disallowed unpriv: write pointer into map elem value checks that storing pointers into element values is disallowed For example: int bpf_prog(struct __sk_buff *skb) { u32 key = 0; u64 *value = bpf_map_lookup_elem(&map, &key); if (value) *value = (u64) skb; } will be rejected. unpriv: partial copy of pointer checks that doing 32-bit register mov from register containing a pointer is disallowed unpriv: pass pointer to tail_call checks that passing pointer as an index into bpf_tail_call is disallowed unpriv: cmp map pointer with zero checks that comparing map pointer with constant is disallowed unpriv: write into frame pointer checks that frame pointer is read-only (applies to root too) unpriv: cmp of frame pointer checks that R10 cannot be using in comparison unpriv: cmp of stack pointer checks that Rx = R10 - imm is ok, but comparing Rx is not unpriv: obfuscate stack pointer checks that Rx = R10 - imm is ok, but Rx -= imm is not Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Add BQL support to via-rhine, from Tino Reichardt. 2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers can support hw switch offloading. From Floria Fainelli. 3) Allow 'ip address' commands to initiate multicast group join/leave, from Madhu Challa. 4) Many ipv4 FIB lookup optimizations from Alexander Duyck. 5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel Borkmann. 6) Remove the ugly compat support in ARP for ugly layers like ax25, rose, etc. And use this to clean up the neigh layer, then use it to implement MPLS support. All from Eric Biederman. 7) Support L3 forwarding offloading in switches, from Scott Feldman. 8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed up route lookups even further. From Alexander Duyck. 9) Many improvements and bug fixes to the rhashtable implementation, from Herbert Xu and Thomas Graf. In particular, in the case where an rhashtable user bulk adds a large number of items into an empty table, we expand the table much more sanely. 10) Don't make the tcp_metrics hash table per-namespace, from Eric Biederman. 11) Extend EBPF to access SKB fields, from Alexei Starovoitov. 12) Split out new connection request sockets so that they can be established in the main hash table. Much less false sharing since hash lookups go direct to the request sockets instead of having to go first to the listener then to the request socks hashed underneath. From Eric Dumazet. 13) Add async I/O support for crytpo AF_ALG sockets, from Tadeusz Struk. 14) Support stable privacy address generation for RFC7217 in IPV6. From Hannes Frederic Sowa. 15) Hash network namespace into IP frag IDs, also from Hannes Frederic Sowa. 16) Convert PTP get/set methods to use 64-bit time, from Richard Cochran. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits) fm10k: Bump driver version to 0.15.2 fm10k: corrected VF multicast update fm10k: mbx_update_max_size does not drop all oversized messages fm10k: reset head instead of calling update_max_size fm10k: renamed mbx_tx_dropped to mbx_tx_oversized fm10k: update xcast mode before synchronizing multicast addresses fm10k: start service timer on probe fm10k: fix function header comment fm10k: comment next_vf_mbx flow fm10k: don't handle mailbox events in iov_event path and always process mailbox fm10k: use separate workqueue for fm10k driver fm10k: Set PF queues to unlimited bandwidth during virtualization fm10k: expose tx_timeout_count as an ethtool stat fm10k: only increment tx_timeout_count in Tx hang path fm10k: remove extraneous "Reset interface" message fm10k: separate PF only stats so that VF does not display them fm10k: use hw->mac.max_queues for stats fm10k: only show actual queues, not the maximum in hardware fm10k: allow creation of VLAN on default vid fm10k: fix unused warnings ...
2015-04-02samples/bpf: Add simple non-portable kprobe filter exampleAlexei Starovoitov
tracex1_kern.c - C program compiled into BPF. It attaches to kprobe:netif_receive_skb() When skb->dev->name == "lo", it prints sample debug message into trace_pipe via bpf_trace_printk() helper function. tracex1_user.c - corresponding user space component that: - loads BPF program via bpf() syscall - opens kprobes:netif_receive_skb event via perf_event_open() syscall - attaches the program to event via ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); - prints from trace_pipe Note, this BPF program is non-portable. It must be recompiled with current kernel headers. kprobe is not a stable ABI and BPF+kprobe scripts may no longer be meaningful when kernel internals change. No matter in what way the kernel changes, neither the kprobe, nor the BPF program can ever crash or corrupt the kernel, assuming the kprobes, perf and BPF subsystem has no bugs. The verifier will detect that the program is using bpf_trace_printk() and the kernel will print 'this is a DEBUG kernel' warning banner, which means that bpf_trace_printk() should be used for debugging of the BPF program only. Usage: $ sudo tracex1 ping-19826 [000] d.s2 63103.382648: : skb ffff880466b1ca00 len 84 ping-19826 [000] d.s2 63103.382684: : skb ffff880466b1d300 len 84 ping-19826 [000] d.s2 63104.382533: : skb ffff880466b1ca00 len 84 ping-19826 [000] d.s2 63104.382594: : skb ffff880466b1d300 len 84 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1427312966-8434-7-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-01ebpf: export BPF_PSEUDO_MAP_FD to uapiDaniel Borkmann
We need to export BPF_PSEUDO_MAP_FD to user space, as it's used in the ELF BPF loader where instructions are being loaded that need map fixups. An initial stage loads all maps into the kernel, and later on replaces related instructions in the eBPF blob with BPF_PSEUDO_MAP_FD as source register and the actual fd as immediate value. The kernel verifier recognizes this keyword and replaces the map fd with a real pointer internally. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05samples: bpf: trivial eBPF program in CAlexei Starovoitov
this example does the same task as previous socket example in assembler, but this one does it in C. eBPF program in kernel does: /* assume that packet is IPv4, load one byte of IP->proto */ int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)); long *value; value = bpf_map_lookup_elem(&my_map, &index); if (value) __sync_fetch_and_add(value, 1); Corresponding user space reads map[tcp], map[udp], map[icmp] and prints protocol stats every second Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05samples: bpf: example of stateful socket filteringAlexei Starovoitov
this socket filter example does: - creates arraymap in kernel with key 4 bytes and value 8 bytes - loads eBPF program which assumes that packet is IPv4 and loads one byte of IP->proto from the packet and uses it as a key in a map r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)]; *(u32*)(fp - 4) = r0; value = bpf_map_lookup_elem(map_fd, fp - 4); if (value) (*(u64*)value) += 1; - attaches this program to raw socket - every second user space reads map[IPPROTO_TCP], map[IPPROTO_UDP], map[IPPROTO_ICMP] to see how many packets of given protocol were seen on loopback interface Usage: $sudo samples/bpf/sock_example TCP 0 UDP 0 ICMP 0 packets TCP 187600 UDP 0 ICMP 4 packets TCP 376504 UDP 0 ICMP 8 packets TCP 563116 UDP 0 ICMP 12 packets TCP 753144 UDP 0 ICMP 16 packets Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18bpf: add a testsuite for eBPF mapsAlexei Starovoitov
. check error conditions and sanity of hash and array map APIs . check large maps (that kernel gracefully switches to vmalloc from kmalloc) . check multi-process parallel access and stress test Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26bpf: mini eBPF library, test stubs and verifier testsuiteAlexei Starovoitov
1. the library includes a trivial set of BPF syscall wrappers: int bpf_create_map(int key_size, int value_size, int max_entries); int bpf_update_elem(int fd, void *key, void *value); int bpf_lookup_elem(int fd, void *key, void *value); int bpf_delete_elem(int fd, void *key); int bpf_get_next_key(int fd, void *key, void *next_key); int bpf_prog_load(enum bpf_prog_type prog_type, const struct sock_filter_int *insns, int insn_len, const char *license); bpf_prog_load() stores verifier log into global bpf_log_buf[] array and BPF_*() macros to build instructions 2. test stubs configure eBPF infra with 'unspec' map and program types. These are fake types used by user space testsuite only. 3. verifier tests valid and invalid programs and expects predefined error log messages from kernel. 40 tests so far. $ sudo ./test_verifier #0 add+sub+mul OK #1 unreachable OK #2 unreachable2 OK #3 out of range jump OK #4 out of range jump2 OK #5 test1 ld_imm64 OK ... Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>