linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2016-12-23	Merge branch 'perf-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "On the kernel side there's two x86 PMU driver fixes and a uprobes fix, plus on the tooling side there's a number of fixes and some late updates" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) perf sched timehist: Fix invalid period calculation perf sched timehist: Remove hardcoded 'comm_width' check at print_summary perf sched timehist: Enlarge default 'comm_width' perf sched timehist: Honour 'comm_width' when aligning the headers perf/x86: Fix overlap counter scheduling bug perf/x86/pebs: Fix handling of PEBS buffer overflows samples/bpf: Move open_raw_sock to separate header samples/bpf: Remove perf_event_open() declaration samples/bpf: Be consistent with bpf_load_program bpf_insn parameter tools lib bpf: Add bpf_prog_{attach,detach} samples/bpf: Switch over to libbpf perf diff: Do not overwrite valid build id perf annotate: Don't throw error for zero length symbols perf bench futex: Fix lock-pi help string perf trace: Check if MAP_32BIT is defined (again) samples/bpf: Make perf_event_read() static uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation samples/bpf: Make samples more libbpf-centric tools lib bpf: Add flags to bpf_create_map() tools lib bpf: use __u32 from linux/types.h ...
2016-12-20	samples/bpf: Move open_raw_sock to separate header	Joe Stringer
	This function was declared in libbpf.c and was the only remaining function in this library, but has nothing to do with BPF. Shift it out into a new header, sock_example.h, and include it from the relevant samples. Signed-off-by: Joe Stringer <joe@ovn.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161209024620.31660-8-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20	samples/bpf: Remove perf_event_open() declaration	Joe Stringer
	This declaration was made in samples/bpf/libbpf.c for convenience, but there's already one in tools/perf/perf-sys.h. Reuse that one. Committer notes: Testing it: $ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/ make[1]: Entering directory '/home/build/v4.9.0-rc8+' CHK include/config/kernel.release GEN ./Makefile CHK include/generated/uapi/linux/version.h Using /home/acme/git/linux as source for kernel CHK include/generated/utsrelease.h CHK include/generated/timeconst.h CHK include/generated/bounds.h CHK include/generated/asm-offsets.h CALL /home/acme/git/linux/scripts/checksyscalls.sh HOSTCC samples/bpf/test_verifier.o HOSTCC samples/bpf/libbpf.o HOSTCC samples/bpf/../../tools/lib/bpf/bpf.o HOSTCC samples/bpf/test_maps.o HOSTCC samples/bpf/sock_example.o HOSTCC samples/bpf/bpf_load.o <SNIP> HOSTLD samples/bpf/trace_event HOSTLD samples/bpf/sampleip HOSTLD samples/bpf/tc_l2_redirect make[1]: Leaving directory '/home/build/v4.9.0-rc8+' $ Also tested the offwaketime resulting from the rebuild, seems to work as before. Signed-off-by: Joe Stringer <joe@ovn.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161209024620.31660-7-joe@ovn.org [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20	samples/bpf: Switch over to libbpf	Joe Stringer
	Now that libbpf under tools/lib/bpf/* is synced with the version from samples/bpf, we can get rid most of the libbpf library here. Committer notes: Built it in a docker fedora rawhide container and ran it in the f25 host, seems to work just like it did before this patch, i.e. the switch to tools/lib/bpf/ doesn't seem to have introduced problems and Joe said he tested it with all the entries in samples/bpf/ and other code he found: [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install <SNIP> [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/ [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/ make[1]: Entering directory '/tmp/build/linux' CHK include/config/kernel.release HOSTCC scripts/basic/fixdep GEN ./Makefile CHK include/generated/uapi/linux/version.h Using /git/linux as source for kernel CHK include/generated/utsrelease.h HOSTCC scripts/basic/bin2c HOSTCC arch/x86/tools/relocs_32.o HOSTCC arch/x86/tools/relocs_64.o LD samples/bpf/built-in.o <SNIP> HOSTCC samples/bpf/fds_example.o HOSTCC samples/bpf/sockex1_user.o /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create': /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers] insns, insns_cnt, "GPL", 0, ^~~~~ In file included from /git/linux/samples/bpf/libbpf.h:5:0, from /git/linux/samples/bpf/bpf_load.h:4, from /git/linux/samples/bpf/fds_example.c:15: /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn ' but argument is of type 'const struct bpf_insn ' int bpf_load_program(enum bpf_prog_type type, struct bpf_insn insns, ^~~~~~~~~~~~~~~~ HOSTCC samples/bpf/sockex2_user.o <SNIP> HOSTCC samples/bpf/xdp_tx_iptunnel_user.o clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h \ -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \ -Wno-compare-distinct-pointer-types \ -Wno-gnu-variable-sized-type-not-at-end \ -Wno-address-of-packed-member -Wno-tautological-compare \ -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -\| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o HOSTLD samples/bpf/tc_l2_redirect <SNIP> HOSTLD samples/bpf/lwt_len_hist HOSTLD samples/bpf/xdp_tx_iptunnel make[1]: Leaving directory '/tmp/build/linux' [root@f5065a7d6272 linux]# And then, in the host: [root@jouet bpf]# mount \| grep "docker.devicemapper\/" /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota) [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/ [root@jouet bpf]# file offwaketime offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped [root@jouet bpf]# readelf -SW offwaketime offwaketime offwaketime_kern.o offwaketime_user.o [root@jouet bpf]# readelf -SW offwaketime_kern.o There are 11 section headers, starting at offset 0x700: Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 0] NULL 0000000000000000 000000 000000 00 0 0 0 [ 1] .strtab STRTAB 0000000000000000 000658 0000a8 00 0 0 1 [ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4 [ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8 [ 4] .relkprobe/try_to_wake_up REL 0000000000000000 0005a8 000020 10 10 3 8 [ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8 [ 6] .reltracepoint/sched/sched_switch REL 0000000000000000 0005c8 000090 10 10 5 8 [ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4 [ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1 [ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4 [10] .symtab SYMTAB 0000000000000000 000488 000120 18 1 4 8 Key to Flags: W (write), A (alloc), X (execute), M (merge), S (strings) I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown) O (extra OS processing required) o (OS specific), p (processor specific) [root@jouet bpf]# ./offwaketime \| head -3 qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4 firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1 swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61 [root@jouet bpf]# Signed-off-by: Joe Stringer <joe@ovn.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Cc: netdev@vger.kernel.org Link: https://github.com/joestringer/linux/commit/5c40f54a52b1f437123c81e21873f4b4b1f9bd55.patch Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-17	Merge branch 'kbuild' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - prototypes for x86 asm-exported symbols (Adam Borowski) and a warning about missing CRCs (Nick Piggin) - asm-exports fix for LTO (Nicolas Pitre) - thin archives improvements (Nick Piggin) - linker script fix for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (Nick Piggin) - genksyms support for __builtin_va_list keyword - misc minor fixes * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: x86/kbuild: enable modversions for symbols exported from asm kbuild: fix scripts/adjust_autoksyms.sh* for the no modules case scripts/kallsyms: remove last remnants of --page-offset option make use of make variable CURDIR instead of calling pwd kbuild: cmd_export_list: tighten the sed script kbuild: minor improvement for thin archives build kbuild: modpost warn if export version crc is missing kbuild: keep data tables through dead code elimination kbuild: improve linker compatibility with lib-ksyms.o build genksyms: Regenerate parser kbuild/genksyms: handle va_list type kbuild: thin archives for multi-y targets kbuild: kallsyms allow 3-pass generation if symbols size has changed
2016-12-11	make use of make variable CURDIR instead of calling pwd	Uwe Kleine-König
	make already provides the current working directory in a variable, so make use of it instead of forking a shell. Also replace usage of PWD by CURDIR. PWD is provided by most shells, but not all, so this makes the build system more robust. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Michal Marek <mmarek@suse.com>
2016-12-08	bpf: xdp: Add XDP example for head adjustment	Martin KaFai Lau
	The XDP prog checks if the incoming packet matches any VIP:PORT combination in the BPF hashmap. If it is, it will encapsulate the packet with a IPv4/v6 header as instructed by the value of the BPF hashmap and then XDP_TX it out. The VIP:PORT -> IP-Encap-Info can be specified by the cmd args of the user prog. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03	samples, bpf: Add automated test for cgroup filter attachments	Sargun Dhillon
	This patch adds the sample program test_cgrp2_attach2. This program is similar to test_cgrp2_attach, but it performs automated testing of the cgroupv2 BPF attached filters. It runs the following checks: * Simple filter attachment * Application of filters to child cgroups * Overriding filters on child cgroups * Checking that this still works when the parent filter is removed The filters that are used here are simply allow all / deny all filters, so it isn't checking the actual functionality of the filters, but rather the behaviour around detachment / attachment. If net_cls is enabled, this test will fail. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03	samples, bpf: Refactor test_current_task_under_cgroup - separate out helpers	Sargun Dhillon
	This patch modifies test_current_task_under_cgroup_user. The test has several helpers around creating a temporary environment for cgroup testing, and moving the current task around cgroups. This set of helpers can then be used in other tests. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03	samples/bpf: silence compiler warnings	Alexei Starovoitov
	silence some of the clang compiler warnings like: include/linux/fs.h:2693:9: warning: comparison of unsigned enum expression < 0 is always false arch/x86/include/asm/processor.h:491:30: warning: taking address of packed member 'sp0' of class or structure 'x86_hw_tss' may result in an unaligned pointer value include/linux/cgroup-defs.h:326:16: warning: field 'cgrp' with variable sized type 'struct cgroup' not at the end of a struct or class is a GNU extension since they add too much noise to samples/bpf/ build. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02	samples/bpf: add userspace example for prohibiting sockets	David Ahern
	Add examples preventing a process in a cgroup from opening a socket based family, protocol and type. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02	samples: bpf: add userspace example for modifying sk_bound_dev_if	David Ahern
	Add a simple program to demonstrate the ability to attach a bpf program to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they are created. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02	bpf: Add tests and samples for LWT-BPF	Thomas Graf
	Adds a series of tests to verify the functionality of attaching BPF programs at LWT hooks. Also adds a sample which collects a histogram of packet sizes which pass through an LWT hook. $ ./lwt_len_hist.sh Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.253.2 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 39857.69 1 -> 1 : 0 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 0 \| \| 16 -> 31 : 0 \| \| 32 -> 63 : 22 \| \| 64 -> 127 : 98 \| \| 128 -> 255 : 213 \| \| 256 -> 511 : 1444251 \|****** \| 512 -> 1023 : 660610 \|* \| 1024 -> 2047 : 535241 \| \| 2048 -> 4095 : 19 \| \| 4096 -> 8191 : 180 \| \| 8192 -> 16383 : 5578023 \|********************************* \| 16384 -> 32767 : 632099 \|* \| 32768 -> 65535 : 6575 \| \| Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30	samples/bpf: fix include path	Alexei Starovoitov
	Fix the following build error: HOSTCC samples/bpf/test_lru_dist.o ../samples/bpf/test_lru_dist.c:25:22: fatal error: bpf_util.h: No such file or directory This is due to objtree != srctree. Use srctree, since that's where bpf_util.h is located. Fixes: e00c7b216f34 ("bpf: fix multiple issues in selftest suite and samples") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27	bpf: fix multiple issues in selftest suite and samples	Daniel Borkmann
	1) The test_lru_map and test_lru_dist fails building on my machine since the sys/resource.h header is not included. 2) test_verifier fails in one test case where we try to call an invalid function, since the verifier log output changed wrt printing function names. 3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for retrieving the number of possible CPUs. This is broken at least in our scenario and really just doesn't work. glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF. First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* \| wc -l, if that fails, depending on the config, it either tries to count CPUs in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead. If /proc/cpuinfo has some issue, it returns just 1 worst case. This oddity is nothing new [1], but semantics/behaviour seems to be settled. _SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if that fails it looks into /proc/stat for cpuX entries, and if also that fails for some reason, /proc/cpuinfo is consulted (and returning 1 if unlikely all breaks down). While that might match num_possible_cpus() from the kernel in some cases, it's really not guaranteed with CPU hotplugging, and can result in a buffer overflow since the array in user space could have too few number of slots, and on perpcu map lookup, the kernel will write beyond that memory of the value buffer. William Tu reported such mismatches: [...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu() happens when CPU hotadd is enabled. For example, in Fusion when setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64 -smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf() will be 2 [2]. [...] Documentation/cputopology.txt says /sys/devices/system/cpu/possible outputs cpu_possible_mask. That is the same as in num_possible_cpus(), so first step would be to fix the _SC_NPROCESSORS_CONF calls with our own implementation. Later, we could add support to bpf(2) for passing a mask via CPU_SET(3), for example, to just select a subset of CPUs. BPF samples code needs this fix as well (at least so that people stop copying this). Thus, define bpf_num_possible_cpus() once in selftests and import it from there for the sample code to avoid duplicating it. The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated. After all three issues are fixed, the test suite runs fine again: # make run_tests \| grep self selftests: test_verifier [PASS] selftests: test_maps [PASS] selftests: test_lru_map [PASS] selftests: test_kmod.sh [PASS] [1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html Fixes: 3059303f59cf ("samples/bpf: update tracex[23] examples to use per-cpu maps") Fixes: 86af8b4191d2 ("Add sample for adding simple drop program to link") Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY") Fixes: e15596717948 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH") Fixes: ebb676daa1a3 ("bpf: Print function name in addition to function id") Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: William Tu <u9012063@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25	samples: bpf: add userspace example for attaching eBPF programs to cgroups	Daniel Mack
	Add a simple userpace program to demonstrate the new API to attach eBPF programs to cgroups. This is what it does: * Create arraymap in kernel with 4 byte keys and 8 byte values * Load eBPF program The eBPF program accesses the map passed in to store two pieces of information. The number of invocations of the program, which maps to the number of packets received, is stored to key 0. Key 1 is incremented on each iteration by the number of bytes stored in the skb. * Detach any eBPF program previously attached to the cgroup * Attach the new program to the cgroup using BPF_PROG_ATTACH * Once a second, read map[0] and map[1] to see how many bytes and packets were seen on any socket of tasks in the given cgroup. The program takes a cgroup path as 1st argument, and either "ingress" or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument, which will make the generated eBPF program return 0 instead of 1, so the kernel will drop the packet. libbpf gained two new wrappers for the new syscall commands. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15	bpf: Add tests for the LRU bpf_htab	Martin KaFai Lau
	This patch has some unit tests and a test_lru_dist. The test_lru_dist reads in the numeric keys from a file. The files used here are generated by a modified fio-genzipf tool originated from the fio test suit. The sample data file can be found here: https://github.com/iamkafai/bpf-lru The zipf.* data files have 100k numeric keys and the key is also ranged from 1 to 100k. The test_lru_dist outputs the number of unique keys (nr_unique). F.e. The following means, 61239 of them is unique out of 100k keys. nr_misses means it cannot be found in the LRU map, so nr_misses must be >= nr_unique. test_lru_dist also simulates a perfect LRU map as a comparison: [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \ /root/zipf.100k.a1_01.out 4000 1 ... test_parallel_lru_dist (map_type:9 map_flags:0x0): task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000) task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000) .... test_parallel_lru_dist (map_type:9 map_flags:0x2): task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000) task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000) [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \ /root/zipf.100k.a0_01.out 40000 1 ... test_parallel_lru_dist (map_type:9 map_flags:0x0): task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000) task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000) ... test_parallel_lru_dist (map_type:9 map_flags:0x2): task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000) task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000) LRU map has also been added to map_perf_test: /* Global LRU / [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \ ./map_perf_test 16 $i \| awk '{r += $3}END{print r " updates"}'; done 1 cpus: 2934082 updates 4 cpus: 7391434 updates 8 cpus: 6500576 updates / Percpu LRU */ [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \ ./map_perf_test 32 $i \| awk '{r += $3}END{print r " updates"}'; done 1 cpus: 2896553 updates 4 cpus: 9766395 updates 8 cpus: 17460553 updates Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	Several cases of bug fixes in 'net' overlapping other changes in 'net-next-. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-12	bpf: Add test for bpf_redirect to ipip/ip6tnl	Martin KaFai Lau
	The test creates two netns, ns1 and ns2. The host (the default netns) has an ipip or ip6tnl dev configured for tunneling traffic to the ns2. ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback) The test is to have ns1 pinging VIPs configured at the loopback interface in ns2. The VIPs are 10.10.1.102 and 2401:face::66 (which are configured at lo@ns2). [Note: 0x66 => 102]. At ns1, the VIPs are routed _via_ the host. At the host, bpf programs are installed at the veth to redirect packets from a veth to the ipip/ip6tnl. The test is configured in a way so that both ingress and egress can be tested. At ns2, the ipip/ip6tnl dev is configured with the local and remote address specified. The return path is routed to the dev ipip/ip6tnl. During egress test, the host also locally tests pinging the VIPs to ensure that bpf_redirect at egress also works for the direct egress (i.e. not forwarding from dev ve1 to ve2). Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18	bpf: add initial suite for selftests	Daniel Borkmann
	Add a start of a test suite for kernel selftests. This moves test_verifier and test_maps over to tools/testing/selftests/bpf/ along with various code improvements and also adds a script for invoking test_bpf module. The test suite can simply be run via selftest framework, f.e.: # cd tools/testing/selftests/bpf/ # make # make run_tests Both test_verifier and test_maps were kind of misplaced in samples/bpf/ directory and we were looking into adding them to selftests for a while now, so it can be picked up by kbuild bot et al and hopefully also get more exposure and thus new test case additions. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	samples/bpf: add sampleip example	Brendan Gregg
	sample instruction pointer and frequency count in a BPF map Signed-off-by: Brendan Gregg <bgregg@netflix.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	samples/bpf: add perf_event+bpf example	Alexei Starovoitov
	The bpf program is called 50 times a second and does hashmap[kern&user_stackid]++ It's primary purpose to check that key bpf helpers like map lookup, update, get_stackid, trace_printk and ctx access are all working. It checks: - PERF_COUNT_HW_CPU_CYCLES on all cpus - PERF_COUNT_HW_CPU_CYCLES for current process and inherited perf_events to children - PERF_COUNT_SW_CPU_CLOCK on all cpus - PERF_COUNT_SW_CPU_CLOCK for current process Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19	samples/bpf: Add tunnel set/get tests.	William Tu
	The patch creates sample code exercising bpf_skb_{set,get}_tunnel_key, and bpf_skb_{set,get}_tunnel_opt for GRE, VXLAN, and GENEVE. A native tunnel device is created in a namespace to interact with a lwtunnel device out of the namespace, with metadata enabled. The bpf_skb_set_* program is attached to tc egress and bpf_skb_get_* is attached to egress qdisc. A ping between two tunnels is used to verify correctness and the result of bpf_skb_get_* printed by bpf_trace_printk. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12	samples/bpf: Add test_current_task_under_cgroup test	Sargun Dhillon
	This test has a BPF program which writes the last known pid to call the sync syscall within a given cgroup to a map. The user mode program creates its own mount namespace, and mounts the cgroupsv2 hierarchy in there, as on all current test systems (Ubuntu 16.04, Debian), the cgroupsv2 vfs is unmounted by default. Once it does this, it proceeds to test. The test checks for positive and negative condition. It ensures that when it's part of a given cgroup, its pid is captured in the map, and that when it leaves the cgroup, this doesn't happen. It populate a cgroups arraymap prior to execution in userspace. This means that the program must be run in the same cgroups namespace as the programs that are being traced. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	samples/bpf: Add test/example of using bpf_probe_write_user bpf helper	Sargun Dhillon
	This example shows using a kprobe to act as a dnat mechanism to divert traffic for arbitrary endpoints. It rewrite the arguments to a syscall while they're still in userspace, and before the syscall has a chance to copy the argument into kernel space. Although this is an example, it also acts as a test because the mapped address is 255.255.255.255:555 -> real address, and that's not a legal address to connect to. If the helper is broken, the example will fail on the intermediate steps, as well as the final step to verify the rewrite of userspace memory succeeded. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19	bpf: add sample for xdp forwarding and rewrite	Brenden Blanco
	Add a sample that rewrites and forwards packets out on the same interface. Observed single core forwarding performance of ~10Mpps. Since the mlx4 driver under test recycles every single packet page, the perf output shows almost exclusively just the ring management and bpf program work. Slowdowns are likely occurring due to cache misses. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19	Add sample for adding simple drop program to link	Brenden Blanco
	Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX hook of a link. With the drop-only program, observed single core rate is ~20Mpps. Other tests were run, for instance without the dropcnt increment or without reading from the packet header, the packet rate was mostly unchanged. $ perf record -a samples/bpf/xdp1 $(</sys/class/net/eth0/ifindex) proto 17: 20403027 drops/s ./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4 Running... ctrl^C to stop Device: eth4@0 Result: OK: 11791017(c11788327+d2689) usec, 59622913 (60byte,0frags) 5056638pps 2427Mb/sec (2427186240bps) errors: 0 Device: eth4@1 Result: OK: 11791012(c11787906+d3106) usec, 60526944 (60byte,0frags) 5133311pps 2463Mb/sec (2463989280bps) errors: 0 Device: eth4@2 Result: OK: 11791019(c11788249+d2769) usec, 59868091 (60byte,0frags) 5077431pps 2437Mb/sec (2437166880bps) errors: 0 Device: eth4@3 Result: OK: 11795039(c11792403+d2636) usec, 59483181 (60byte,0frags) 5043067pps 2420Mb/sec (2420672160bps) errors: 0 perf report --no-children: 26.05% ksoftirqd/0 [mlx4_en] [k] mlx4_en_process_rx_cq 17.84% ksoftirqd/0 [mlx4_en] [k] mlx4_en_alloc_frags 5.52% ksoftirqd/0 [mlx4_en] [k] mlx4_en_free_frag 4.90% swapper [kernel.vmlinux] [k] poll_idle 4.14% ksoftirqd/0 [kernel.vmlinux] [k] get_page_from_freelist 2.78% ksoftirqd/0 [kernel.vmlinux] [k] __free_pages_ok 2.57% ksoftirqd/0 [kernel.vmlinux] [k] bpf_map_lookup_elem 2.51% swapper [mlx4_en] [k] mlx4_en_process_rx_cq 1.94% ksoftirqd/0 [kernel.vmlinux] [k] percpu_array_map_lookup_elem 1.45% swapper [mlx4_en] [k] mlx4_en_alloc_frags 1.35% ksoftirqd/0 [kernel.vmlinux] [k] free_one_page 1.33% swapper [kernel.vmlinux] [k] intel_idle 1.04% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5c5 0.96% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c58d 0.93% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c6ee 0.92% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c6b9 0.89% ksoftirqd/0 [kernel.vmlinux] [k] __alloc_pages_nodemask 0.83% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c686 0.83% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5d5 0.78% ksoftirqd/0 [mlx4_en] [k] mlx4_alloc_pages.isra.23 0.77% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5b4 0.77% ksoftirqd/0 [kernel.vmlinux] [k] net_rx_action machine specs: receiver - Intel E5-1630 v3 @ 3.70GHz sender - Intel E5645 @ 2.40GHz Mellanox ConnectX-3 @40G Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01	cgroup: bpf: Add an example to do cgroup checking in BPF	Martin KaFai Lau
	test_cgrp2_array_pin.c: A userland program that creates a bpf_map (BPF_MAP_TYPE_GROUP_ARRAY), pouplates/updates it with a cgroup2's backed fd and pins it to a bpf-fs's file. The pinned file can be loaded by tc and then used by the bpf prog later. This program can also update an existing pinned array and it could be useful for debugging/testing purpose. test_cgrp2_tc_kern.c: A bpf prog which should be loaded by tc. It is to demonstrate the usage of bpf_skb_in_cgroup. test_cgrp2_tc.sh: A script that glues the test_cgrp2_array_pin.c and test_cgrp2_tc_kern.c together. The idea is like: 1. Load the test_cgrp2_tc_kern.o by tc 2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY with a cgroup fd 3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been dropped because of a match on the cgroup Most of the lines in test_cgrp2_tc.sh is the boilerplate to setup the cgroup/bpf-fs/net-devices/netns...etc. It is not bulletproof on errors but should work well enough and give enough debug info if things did not go well. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06	samples/bpf: add 'pointer to packet' tests	Alexei Starovoitov
	parse_simple.c - packet parser exapmle with single length check that filters out udp packets for port 9 parse_varlen.c - variable length parser that understand multiple vlan headers, ipip, ipip6 and ip options to filter out udp or tcp packets on port 9. The packet is parsed layer by layer with multitple length checks. parse_ldabs.c - classic style of packet parsing using LD_ABS instruction. Same functionality as parse_simple. simple = 24.1Mpps per core varlen = 22.7Mpps ldabs = 21.4Mpps Parser with LD_ABS instructions is slower than full direct access parser which does more packet accesses and checks. These examples demonstrate the choice bpf program authors can make between flexibility of the parser vs speed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29	samples/bpf: like LLC also verify and allow redefining CLANG command	Jesper Dangaard Brouer
	Users are likely to manually compile both LLVM 'llc' and 'clang' tools. Thus, also allow redefining CLANG and verify command exist. Makefile implementation wise, the target that verify the command have been generalized. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29	samples/bpf: allow make to be run from samples/bpf/ directory	Jesper Dangaard Brouer
	It is not intuitive that 'make' must be run from the top level directory with argument "samples/bpf/" to compile these eBPF samples. Introduce a kbuild make file trick that allow make to be run from the "samples/bpf/" directory itself. It basically change to the top level directory and call "make samples/bpf/" with the "/" slash after the directory name. Also add a clean target that only cleans this directory, by taking advantage of the kbuild external module setting M=$PWD. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29	samples/bpf: Makefile verify LLVM compiler avail and bpf target is supported	Jesper Dangaard Brouer
	Make compiling samples/bpf more user friendly, by detecting if LLVM compiler tool 'llc' is available, and also detect if the 'bpf' target is available in this version of LLVM. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29	samples/bpf: add back functionality to redefine LLC command	Jesper Dangaard Brouer
	It is practical to be-able-to redefine the location of the LLVM command 'llc', because not all distros have a LLVM version with bpf target support. Thus, it is sometimes required to compile LLVM from source, and sometimes it is not desired to overwrite the distros default LLVM version. This feature was removed with 128d1514be35 ("samples/bpf: Use llc in PATH, rather than a hardcoded value"). Add this features back. Note that it is possible to redefine the LLC on the make command like: make samples/bpf/ LLC=~/git/llvm/build/bin/llc Fixes: 128d1514be35 ("samples/bpf: Use llc in PATH, rather than a hardcoded value") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-09	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller

2016-04-07	samples/bpf: add tracepoint vs kprobe performance tests	Alexei Starovoitov
	the first microbenchmark does fd=open("/proc/self/comm"); for() { write(fd, "test"); } and on 4 cpus in parallel: writes per sec base (no tracepoints, no kprobes) 930k with kprobe at __set_task_comm() 420k with tracepoint at task:task_rename 730k For kprobe + full bpf program manully fetches oldcomm, newcomm via bpf_probe_read. For tracepint bpf program does nothing, since arguments are copied by tracepoint. 2nd microbenchmark does: fd=open("/dev/urandom"); for() { read(fd, buf); } and on 4 cpus in parallel: reads per sec base (no tracepoints, no kprobes) 300k with kprobe at urandom_read() 279k with tracepoint at random:urandom_read 290k bpf progs attached to kprobe and tracepoint are noop. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	samples/bpf: Use llc in PATH, rather than a hardcoded value	Naveen N. Rao
	While at it, remove the generation of .s files and fix some typos in the related comment. Cc: Alexei Starovoitov <ast@fb.com> Cc: David S. Miller <davem@davemloft.net> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08	samples/bpf: add map performance test	Alexei Starovoitov
	performance tests for hash map and per-cpu hash map with and without pre-allocation Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08	samples/bpf: add bpf map stress test	Alexei Starovoitov
	this test calls bpf programs from different contexts: from inside of slub, from rcu, from pretty much everywhere, since it kprobes all spin_lock functions. It stresses the bpf hash and percpu map pre-allocation, deallocation logic and call_rcu mechanisms. User space part adding more stress by walking and deleting map elements. Note that due to nature bpf_load.c the earlier kprobe+bpf programs are already active while loader loads new programs, creates new kprobes and attaches them. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-20	samples/bpf: offwaketime example	Alexei Starovoitov
	This is simplified version of Brendan Gregg's offwaketime: This program shows kernel stack traces and task names that were blocked and "off-CPU", along with the stack traces and task names for the threads that woke them, and the total elapsed time from when they blocked to when they were woken up. The combined stacks, task names, and total time is summarized in kernel context for efficiency. Example: $ sudo ./offwaketime \| flamegraph.pl > demo.svg Open demo.svg in the browser as FlameGraph visualization. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	bpf: samples: exclude asm/sysreg.h for arm64	Yang Shi
	commit 338d4f49d6f7114a017d294ccf7374df4f998edc ("arm64: kernel: Add support for Privileged Access Never") includes sysreg.h into futex.h and uaccess.h. But, the inline assembly used by asm/sysreg.h is incompatible with llvm so it will cause BPF samples build failure for ARM64. Since sysreg.h is useless for BPF samples, just exclude it from Makefile via defining __ASM_SYSREG_H. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02	bpf: add sample usages for persistent maps/progs	Daniel Borkmann
	This patch adds a couple of stand-alone examples on how BPF_OBJ_PIN and BPF_OBJ_GET commands can be used. Example with maps: # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42 bpf: map fd:3 (Success) bpf: pin ret:(0,Success) bpf: fd:3 u->(1:42) ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):42 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24 bpf: get fd:3 (Success) bpf: fd:3 u->(1:24) ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):24 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -P -m bpf: map fd:3 (Success) bpf: pin ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):0 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -G -m bpf: get fd:3 (Success) Example with progs: # ./fds_example -F /sys/fs/bpf/p -P -p bpf: prog fd:3 (Success) bpf: pin ret:(0,Success) bpf sock:4 <- fd:3 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p -G -p bpf: get fd:3 (Success) bpf: sock:4 <- fd:3 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o bpf: prog fd:5 (Success) bpf: pin ret:(0,Success) bpf: sock:3 <- fd:5 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p2 -G -p bpf: get fd:3 (Success) bpf: sock:4 <- fd:3 attached ret:(0,Success) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22	samples: bpf: add bpf_perf_event_output example	Alexei Starovoitov
	Performance test and example of bpf_perf_event_output(). kprobe is attached to sys_write() and trivial bpf program streams pid+cookie into userspace via PERF_COUNT_SW_BPF_OUTPUT event. Usage: $ sudo ./bld_x64/samples/bpf/trace_output recv 2968913 events per sec Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09	samples/bpf: example of get selected PMU counter value	Kaixu Xia
	This is a simple example and shows how to use the new ability to get the selected Hardware PMU counter value. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-23	bpf: BPF based latency tracing	Daniel Wagner
	BPF offers another way to generate latency histograms. We attach kprobes at trace_preempt_off and trace_preempt_on and calculate the time it takes to from seeing the off/on transition. The first array is used to store the start time stamp. The key is the CPU id. The second array stores the log2(time diff). We need to use static allocation here (array and not hash tables). The kprobes hooking into trace_preempt_on\|off should not calling any dynamic memory allocation or free path. We need to avoid recursivly getting called. Besides that, it reduces jitter in the measurement. CPU 0 latency : count distribution 1 -> 1 : 0 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 0 \| \| 16 -> 31 : 0 \| \| 32 -> 63 : 0 \| \| 64 -> 127 : 0 \| \| 128 -> 255 : 0 \| \| 256 -> 511 : 0 \| \| 512 -> 1023 : 0 \| \| 1024 -> 2047 : 0 \| \| 2048 -> 4095 : 166723 \|************************************* \| 4096 -> 8191 : 19870 \|* \| 8192 -> 16383 : 6324 \| \| 16384 -> 32767 : 1098 \| \| 32768 -> 65535 : 190 \| \| 65536 -> 131071 : 179 \| \| 131072 -> 262143 : 18 \| \| 262144 -> 524287 : 4 \| \| 524288 -> 1048575 : 1363 \| \| CPU 1 latency : count distribution 1 -> 1 : 0 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 0 \| \| 16 -> 31 : 0 \| \| 32 -> 63 : 0 \| \| 64 -> 127 : 0 \| \| 128 -> 255 : 0 \| \| 256 -> 511 : 0 \| \| 512 -> 1023 : 0 \| \| 1024 -> 2047 : 0 \| \| 2048 -> 4095 : 114042 \|************************************* \| 4096 -> 8191 : 9587 \| \| 8192 -> 16383 : 4140 \| \| 16384 -> 32767 : 673 \| \| 32768 -> 65535 : 179 \| \| 65536 -> 131071 : 29 \| \| 131072 -> 262143 : 4 \| \| 262144 -> 524287 : 1 \| \| 524288 -> 1048575 : 364 \| \| CPU 2 latency : count distribution 1 -> 1 : 0 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 0 \| \| 16 -> 31 : 0 \| \| 32 -> 63 : 0 \| \| 64 -> 127 : 0 \| \| 128 -> 255 : 0 \| \| 256 -> 511 : 0 \| \| 512 -> 1023 : 0 \| \| 1024 -> 2047 : 0 \| \| 2048 -> 4095 : 40147 \|*************************************** \| 4096 -> 8191 : 2300 \|* \| 8192 -> 16383 : 828 \| \| 16384 -> 32767 : 178 \| \| 32768 -> 65535 : 59 \| \| 65536 -> 131071 : 2 \| \| 131072 -> 262143 : 0 \| \| 262144 -> 524287 : 1 \| \| 524288 -> 1048575 : 174 \| \| CPU 3 latency : count distribution 1 -> 1 : 0 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 0 \| \| 16 -> 31 : 0 \| \| 32 -> 63 : 0 \| \| 64 -> 127 : 0 \| \| 128 -> 255 : 0 \| \| 256 -> 511 : 0 \| \| 512 -> 1023 : 0 \| \| 1024 -> 2047 : 0 \| \| 2048 -> 4095 : 29626 \|************************************* \| 4096 -> 8191 : 2704 \| \| 8192 -> 16383 : 1090 \| \| 16384 -> 32767 : 160 \| \| 32768 -> 65535 : 72 \| \| 65536 -> 131071 : 32 \| \| 131072 -> 262143 : 26 \| \| 262144 -> 524287 : 12 \| \| 524288 -> 1048575 : 298 \| \| All this is based on the trace3 examples written by Alexei Starovoitov <ast@plumgrid.com>. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21	samples/bpf: bpf_tail_call example for networking	Alexei Starovoitov
	Usage: $ sudo ./sockex3 IP src.port -> dst.port bytes packets 127.0.0.1.42010 -> 127.0.0.1.12865 1568 8 127.0.0.1.59526 -> 127.0.0.1.33778 11422636 173070 127.0.0.1.33778 -> 127.0.0.1.59526 11260224828 341974 127.0.0.1.12865 -> 127.0.0.1.42010 1832 12 IP src.port -> dst.port bytes packets 127.0.0.1.42010 -> 127.0.0.1.12865 1568 8 127.0.0.1.59526 -> 127.0.0.1.33778 23198092 351486 127.0.0.1.33778 -> 127.0.0.1.59526 22972698518 698616 127.0.0.1.12865 -> 127.0.0.1.42010 1832 12 this example is similar to sockex2 in a way that it accumulates per-flow statistics, but it does packet parsing differently. sockex2 inlines full packet parser routine into single bpf program. This sockex3 example have 4 independent programs that parse vlan, mpls, ip, ipv6 and one main program that starts the process. bpf_tail_call() mechanism allows each program to be small and be called on demand potentially multiple times, so that many vlan, mpls, ip in ip, gre encapsulations can be parsed. These and other protocol parsers can be added or removed at runtime. TLVs can be parsed in similar manner. Note, tail_call_cnt dynamic check limits the number of tail calls to 32. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21	samples/bpf: bpf_tail_call example for tracing	Alexei Starovoitov
	kprobe example that demonstrates how future seccomp programs may look like. It attaches to seccomp_phase1() function and tail-calls other BPF programs depending on syscall number. Existing optimized classic BPF seccomp programs generated by Chrome look like: if (sd.nr < 121) { if (sd.nr < 57) { if (sd.nr < 22) { if (sd.nr < 7) { if (sd.nr < 4) { if (sd.nr < 1) { check sys_read } else { if (sd.nr < 3) { check sys_write and sys_open } else { check sys_close } } } else { } else { } else { } else { } else { } the future seccomp using native eBPF may look like: bpf_tail_call(&sd, &syscall_jmp_table, sd.nr); which is simpler, faster and leaves more room for per-syscall checks. Usage: $ sudo ./tracex5 <...>-366 [001] d... 4.870033: : read(fd=1, buf=00007f6d5bebf000, size=771) <...>-369 [003] d... 4.870066: : mmap <...>-369 [003] d... 4.870077: : syscall=110 (one of get/set uid/pid/gid) <...>-369 [003] d... 4.870089: : syscall=107 (one of get/set uid/pid/gid) sh-369 [000] d... 4.891740: : read(fd=0, buf=00000000023d1000, size=512) sh-369 [000] d... 4.891747: : write(fd=1, buf=00000000023d3000, size=512) sh-369 [000] d... 4.891747: : read(fd=1, buf=00000000023d3000, size=512) Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12	samples/bpf: fix in-source build of samples with clang	Brenden Blanco
	in-source build of 'make samples/bpf/' was incorrectly using default compiler instead of invoking clang/llvm. out-of-source build was ok. Fixes: a80857822b0c ("samples: bpf: trivial eBPF program in C") Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-15	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	Linus Torvalds
	Pull networking updates from David Miller: 1) Add BQL support to via-rhine, from Tino Reichardt. 2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers can support hw switch offloading. From Floria Fainelli. 3) Allow 'ip address' commands to initiate multicast group join/leave, from Madhu Challa. 4) Many ipv4 FIB lookup optimizations from Alexander Duyck. 5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel Borkmann. 6) Remove the ugly compat support in ARP for ugly layers like ax25, rose, etc. And use this to clean up the neigh layer, then use it to implement MPLS support. All from Eric Biederman. 7) Support L3 forwarding offloading in switches, from Scott Feldman. 8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed up route lookups even further. From Alexander Duyck. 9) Many improvements and bug fixes to the rhashtable implementation, from Herbert Xu and Thomas Graf. In particular, in the case where an rhashtable user bulk adds a large number of items into an empty table, we expand the table much more sanely. 10) Don't make the tcp_metrics hash table per-namespace, from Eric Biederman. 11) Extend EBPF to access SKB fields, from Alexei Starovoitov. 12) Split out new connection request sockets so that they can be established in the main hash table. Much less false sharing since hash lookups go direct to the request sockets instead of having to go first to the listener then to the request socks hashed underneath. From Eric Dumazet. 13) Add async I/O support for crytpo AF_ALG sockets, from Tadeusz Struk. 14) Support stable privacy address generation for RFC7217 in IPV6. From Hannes Frederic Sowa. 15) Hash network namespace into IP frag IDs, also from Hannes Frederic Sowa. 16) Convert PTP get/set methods to use 64-bit time, from Richard Cochran. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits) fm10k: Bump driver version to 0.15.2 fm10k: corrected VF multicast update fm10k: mbx_update_max_size does not drop all oversized messages fm10k: reset head instead of calling update_max_size fm10k: renamed mbx_tx_dropped to mbx_tx_oversized fm10k: update xcast mode before synchronizing multicast addresses fm10k: start service timer on probe fm10k: fix function header comment fm10k: comment next_vf_mbx flow fm10k: don't handle mailbox events in iov_event path and always process mailbox fm10k: use separate workqueue for fm10k driver fm10k: Set PF queues to unlimited bandwidth during virtualization fm10k: expose tx_timeout_count as an ethtool stat fm10k: only increment tx_timeout_count in Tx hang path fm10k: remove extraneous "Reset interface" message fm10k: separate PF only stats so that VF does not display them fm10k: use hw->mac.max_queues for stats fm10k: only show actual queues, not the maximum in hardware fm10k: allow creation of VLAN on default vid fm10k: fix unused warnings ...
2015-04-06	tc: bpf: add checksum helpers	Alexei Starovoitov
	Commit 608cd71a9c7c ("tc: bpf: generalize pedit action") has added the possibility to mangle packet data to BPF programs in the tc pipeline. This patch adds two helpers bpf_l3_csum_replace() and bpf_l4_csum_replace() for fixing up the protocol checksums after the packet mangling. It also adds 'flags' argument to bpf_skb_store_bytes() helper to avoid unnecessary checksum recomputations when BPF programs adjusting l3/l4 checksums and documents all three helpers in uapi header. Moreover, a sample program is added to show how BPF programs can make use of the mangle and csum helpers. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02	samples/bpf: Add kmem_alloc()/free() tracker tool	Alexei Starovoitov
	One BPF program attaches to kmem_cache_alloc_node() and remembers all allocated objects in the map. Another program attaches to kmem_cache_free() and deletes corresponding object from the map. User space walks the map every second and prints any objects which are older than 1 second. Usage: $ sudo tracex4 Then start few long living processes. The 'tracex4' will print something like this: obj 0xffff880465928000 is 13sec old was allocated at ip ffffffff8105dc32 obj 0xffff88043181c280 is 13sec old was allocated at ip ffffffff8105dc32 obj 0xffff880465848000 is 8sec old was allocated at ip ffffffff8105dc32 obj 0xffff8804338bc280 is 15sec old was allocated at ip ffffffff8105dc32 $ addr2line -fispe vmlinux ffffffff8105dc32 do_fork at fork.c:1665 As soon as processes exit the memory is reclaimed and 'tracex4' prints nothing. Similar experiment can be done with the __kmalloc()/kfree() pair. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1427312966-8434-10-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>