summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-10-24perf list: Make vendor event matching case insensitiveAndi Kleen
Make the 'perf list' glob matching for vendor events case insensitive. This allows to use the upper case vendor events with perf list too. Now the following works: % perf list LONGEST_LAT ... cache: longest_lat_cache.miss [Core-originated cacheable demand requests missed LLC] longest_lat_cache.reference [Core-originated cacheable demand requests that refer to LLC] Signed-off-by: Andi Kleen <ak@linux.intel.com> Suggested-by: Ingo Molnar <mingo@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1476899402-31460-1-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf trace: Use the syscall raw_syscalls:sys_enter timestampArnaldo Carvalho de Melo
Instead of the one when another syscall takes place while another is being processed (in another CPU, but we show it serialized, so need to "interrupt" the other), and also when finally showing the sys_enter + sys_exit + duration, where we were showing the sample->time for the sys_exit, duh. Before: # perf trace sleep 1 <SNIP> 0.373 ( 0.001 ms): close(fd: 3 ) = 0 1000.626 (1000.211 ms): nanosleep(rqtp: 0x7ffd6ddddfb0) = 0 1000.653 ( 0.003 ms): close(fd: 1 ) = 0 1000.657 ( 0.002 ms): close(fd: 2 ) = 0 1000.667 ( 0.000 ms): exit_group( ) # After: # perf trace sleep 1 <SNIP> 0.336 ( 0.001 ms): close(fd: 3 ) = 0 0.373 (1000.086 ms): nanosleep(rqtp: 0x7ffe303e9550) = 0 1000.481 ( 0.002 ms): close(fd: 1 ) = 0 1000.485 ( 0.001 ms): close(fd: 2 ) = 0 1000.494 ( 0.000 ms): exit_group( ) [root@jouet linux]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ecbzgmu2ni6glc6zkw8p1zmx@git.kernel.org Fixes: 752fde44fd1c ("perf trace: Support interrupted syscalls") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf trace: Remove thread_trace->exit_timeArnaldo Carvalho de Melo
Not used at all, we need just the entry_time to calculate the syscall duration. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-js6r09zdwlzecvaei7t4l3vd@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf bench futex: Cache align the worker structSebastian Andrzej Siewior
It popped up in perf testing that the worker consumes some amount of CPU. It boils down to the increment of `ops` which causes cache line bouncing between the individual threads. This patch aligns the struct by 256 bytes to ensure that not a cache line is shared among CPUs. 128 byte is the x86 worst case and grep says that L1_CACHE_SHIFT is set to 8 on s390. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161016190803.3392-1-bigeasy@linutronix.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf tools: Use normal error reporting when processing PERF_RECORD_READ eventsArnaldo Carvalho de Melo
We already have handling for errors when processing PERF_RECORD_ events, so instead of calling die() when not being able to alloc, propagate the error, so that the normal UI exit sequence can take place, the user be warned and possibly the terminal be properly reset to a sane mode. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-r90je3c009a125dvs3525yge@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf tools: Normalize sq_quote_argv() error reportingArnaldo Carvalho de Melo
It already returns whatever strbuf_(grow|addch)() returns in case of failure, so just return -ENOSPC in the only case where it was die()ing. When it returns, its only caller will call die() anyway, so no need to be so eager, die later. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-as05b7mbogprlwi8iarwns8e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf bench mem: Move boilerplate memory allocation to the infrastructureArnaldo Carvalho de Melo
Instead of having all tests perform alloc/free, do it in the code that calls the do_cycles() and do_gettimeofday() functions. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-lywj4mbdb1m9x1z9asivwuuy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf trace: Implement --delayAlexis Berlemont
In the perf wiki todo-list[1], there is an entry regarding initial-delay and 'perf trace'; the following small patch tries to fulfill this point. It has been generated against the branch tip/perf/core. It has only been implemented in the "trace__run" case. Ex.: $ sudo strace -- ./perf trace --delay 5 sleep 1 2>&1 ... fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 ioctl(7, PERF_EVENT_IOC_ID, 0x7ffc8fd35718) = 0 ioctl(11, PERF_EVENT_IOC_SET_OUTPUT, 0x7) = 0 fcntl(11, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 ioctl(11, PERF_EVENT_IOC_ID, 0x7ffc8fd35718) = 0 write(6, "\0", 1) = 1 close(6) = 0 nanosleep({0, 5000000}, NULL) = 0 # DELAY OF 5 MS BEFORE ENABLING THE EVENTS ioctl(3, PERF_EVENT_IOC_ENABLE, 0) = 0 ioctl(4, PERF_EVENT_IOC_ENABLE, 0) = 0 ioctl(5, PERF_EVENT_IOC_ENABLE, 0) = 0 ioctl(7, PERF_EVENT_IOC_ENABLE, 0) = 0 ... [1]: https://perf.wiki.kernel.org/index.php/Todo Signed-off-by: Alexis Berlemont <alexis.berlemont@gmail.com> Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161010054328.4028-2-alexis.berlemont@gmail.com [ Add entry to the manpage, cut'n'pasted from stat's and record's ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf hists browser: Dynamically change verbosity levelAlexis Berlemont
Here is a small patch which tries to fulfill a point in the perf todo list: * Make pressing 'V' multiple times to go on cycling thru various verbosity levels in 'perf top', so that info that is present in 'perf top -v' can be obtained without having to restart the tool (acme). After a small grep in the code, the max verbosity level seems 3; so, we cycle at 4; I did not dare define a MAX_VERBOSE_LEVEL constant. Signed-off-by: Alexis Berlemont <alexis.berlemont@gmail.com> Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161012214823.14324-2-alexis.berlemont@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf tools: Fix typo "No enough" to "Not enough"Alexander Alemayhu
The latter version occurs much more when running git grep. Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161013161811.4939-1-alexander@alemayhu.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf pmu: Only print Using CPUID message onceAndi Kleen
With uncore event aliases which are duplicated over multiple PMUs the "Using CPUID" message with -v could be printed many times. Only print it once. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1476393332-20732-3-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf jit: Add jitdump format specification documentStephane Eranian
This patch adds a formal specification of the jitdump format. The goal is to help jit runtime developers implement the jitdump support without having to read the jvmti code. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-10-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf jit: Check JITHEADER_VERSIONStefano Sanfilippo
Check the version number when opening a jitdump file. Accept older versions, but not newer ones. Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org> Signed-off-by: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-9-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf jit: Generate .eh_frame/.eh_frame_hdr in DSOStefano Sanfilippo
When the jit_buf_desc contains unwinding information, it is emitted as eh_frame unwinding sections in the DSOs generated by perf inject. The unwinding information is required to unwind of JITed code which do not maintain the frame pointer register during function calls. It can be emitted by V8 / Chromium when the --perf_prof_unwinding_info is passed to V8. The eh_frame and eh_frame_hdr sections are emitted immediately after the .text. The .eh_frame is aligned at a 8-byte boundary, and .eh_frame_hdr at a 4-byte one. Since size of the .eh_frame is required to be a multiple of the word size, which means there will never be additional padding between it and the .eh_frame_hdr on machines where the word size is 4 or 8 bytes. However, additional padding might be inserted between .text and .eh_frame to reach the correct alignment, which will always be 8 bytes, also on 32bit machines. The reasoning behind this choice is that 4 extra bytes of padding worst case are not a large cost for the advantage of removing word-size dependent offset calculations when emitting the jitdump. Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org> Signed-off-by: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-8-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf jit: Add unwinding supportStefano Sanfilippo
This record is intended to provide unwinding information in the eh_frame format. This is required to unwind JITed code which does not maintain the frame pointer register during function calls. The eh_frame unwinding information can be emitted by V8 / Chromium when the --perf_prof_unwinding_info is passed. A record of type jr_code_unwinding_info comes before the jr_code_load it referred to and contains both the .eh_frame and .eh_frame_hdr. The fields in the header have the following meaning: * unwinding_size: size of the eh_frame and eh_frame_hdr, necessary for distinguishing the content from the padding. * eh_frame_hdr_size: as the name says. * mapped_size: size of the payload that was in memory at runtime. typically unwinding_size if the .eh_frame_hdr and .eh_frame were mapped, or 0 if they weren't. It should always be the former case, since the .eh_frame is guaranteed to be mapped in memory. However, certain JITs might want to inject an .eh_frame_hdr with an empty LUT to trigger fp-based unwinding fallback in libunwind. The only part of the .eh_frame_hdr that libunwind reads from remote memory is the LUT, and since there is none, mapping the unwinding info in memory is not necessary, and 0 in this field signifies that it wasn't. This practical hack allows to save bytes in code memory for those JIT compilers that might or might not maintain a valid frame pointer. The payload that follows is assumed to contain first the .eh_frame and then the .eh_header_hdr, with no padding between the two. Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org> Signed-off-by: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-7-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf jit: Do not assume pgoff is zeroStefano Sanfilippo
When calculating .eh_frame_hdr base and LUT offsets do not always assume that pgoff is zero. The assumption is false for DSOs built from the jitdump by perf inject, because the ELF header did not exist in memory at sampling time. Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org> Signed-off-by: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-6-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf jit: Make perf skip unknown recordsStefano Sanfilippo
The behavior before this commit was to skip the remaining portion of the jitdump in case an unknown record was found, including those records that perf could handle. With this change, parsing a record with an unknown id will cause a warning to be emitted, the record will be skipped and parsing will resume from the next (valid) one. The patch aims at making perf more future proof, by extracting as much information as possible from jitdumps. Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org> Signed-off-by: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-5-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf jit: Remove unecessary padding in jitdump fileStephane Eranian
This patch removes all the string padding generated in the jitdump file. They are not necessary and were adding unnecessary complexity. Modern processors can handle unaligned accesses quite well. The perf.data/ jitdump file are always post-processed, no need to add extra complexity for no real gain. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-4-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf jit: Enable jitdump support without dwarfMaciej Debski
This patch modifies the build dependencies on the jitdump support in perf. As it stands jitdump was wrongfully made dependent 100% on using DWARF. However, the dwarf dependency, only exist if generating the source line table in genelf_debug.c. The rest of the support does not need DWARF. This patch removes the dependency on DWARF for the entire jitdump support. It keeps it only for the genelf_debug.c support. Signed-off-by: Maciej Debski <maciejd@google.com> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-3-git-send-email-eranian@google.com Fixes: e12b202f8fb9 ("perf jitdump: Build only on supported archs") [ Make it build only if NO_LIBELF isn't defined, as jitdump.o will only be built in that case ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf jit: Improve error messages from JVMTIStephane Eranian
This patch improves the usefulness of error messages generated by the JVMTI interfac.e This can help identify the root cause of a problem by printing the actual error code. The patch adds a new helper function called print_error(). Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-2-git-send-email-eranian@google.com [ Handle failure to convert numeric error to a string in print_error() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf jit: Add NT_GNU_BUILD_ID definition for older distrosArnaldo Carvalho de Melo
Such as CentOS5, where such define is not present in elf.h. This file, genelf.c, wasn't being built for several systems, because it mistakenly was conditional on some DWARF features, now that it is just needing libelf, after "perf jit: Enable jitdump support without dwarf" it fails. So, as preparation for "perf jit: Enable jitdump support without dwarf", conditionally define it, if not available. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Maciej Debski <maciejd@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-k09qay1cmr0l3fzprmztzy3o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf jit: Avoid returning garbage for a ret variableArnaldo Carvalho de Melo
When the loop body isn't executed at all, then the 'ret' local variable, that is uninitialized will be used as the return value. This triggers this error on Alpine Linux: CC /tmp/build/perf/util/demangle-java.o CC /tmp/build/perf/util/demangle-rust.o CC /tmp/build/perf/util/jitdump.o CC /tmp/build/perf/util/genelf.o util/jitdump.c: In function 'jit_process': util/jitdump.c:622:3: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized] fprintf(stderr, "injected: %s (%d)\n", path, ret); ^ util/jitdump.c:584:6: note: 'ret' was declared here int ret; ^ FLEX /tmp/build/perf/util/parse-events-flex.c / $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-alpine-linux-musl/5.3.0/lto-wrapper Target: x86_64-alpine-linux-musl Configured with: /home/buildozer/aports/main/gcc/src/gcc-5.3.0/configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info +--build=x86_64-alpine-linux-musl --host=x86_64-alpine-linux-musl --target=x86_64-alpine-linux-musl --with-pkgversion='Alpine 5.3.0' --enable-checking=release +--disable-fixed-point --disable-libstdcxx-pch --disable-multilib --disable-nls --disable-werror --disable-symvers --enable-__cxa_atexit --enable-esp +--enable-cloog-backend --enable-languages=c,c++,objc,java,fortran,ada --disable-libssp --disable-libmudflap --disable-libsanitizer --enable-shared +--enable-threads --enable-tls --with-system-zlib Thread model: posix gcc version 5.3.0 (Alpine 5.3.0) But this so far got under the radar, not causing any build problem, till the "perf jit: enable jitdump support without dwarf" gets applied, when the above problem takes place, some combination of inlining or whatever, the problem is real, so fix it by initializing the variable to zero. Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Maciej Debski <maciejd@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lkml.kernel.org/r/20161013200437.GA12815@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf tools: Implement branch_type event parameterAndi Kleen
It can be useful to specify branch type state per event, for example if we want to collect both software trace points and last branch PMU events in a single collection. Currently this doesn't work because the software trace point errors out with -b. There was already a branch-type parameter to configure branch sample types per event in the parser, but it was stubbed out. This patch implements the necessary plumbing to actually enable it. Now: $ perf record -e sched:sched_switch,cpu/cpu-cycles,branch_type=any/ ... works. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1476306127-19721-1-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf record: Improve documentation of event parametersAndi Kleen
- Some editing (params -> parameters) - Point to the now more complete list of parameters in the perf list manpage. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1476381433-22959-1-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf header: Display feature name on write failureJiri Olsa
Display name of feature instead of just the number during recording data. Before: failed to write feature 13 Now: failed to write feature HEADER_CPU_TOPOLOGY Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-k9d9trozi5kkx737cy8n5xh5@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf header: Display missing featuresJiri Olsa
Display missing features in header info, like: $ perf report --header-only # ======== # captured on: Mon Oct 10 09:39:47 2016 ... # missing features: HEADER_TRACING_DATA HEADER_CPU_TOPOLOGY ... To help in diagnosing problems. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-bh5gp84gobdmyl345dcp64se@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf report: Move captured info to generic header infoJiri Olsa
It's not displayed in TUI now, putting it into generic part. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-5fk88kejqgi50ye7xdkhiloz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24tools lib: Add for_each_clear_bit macroJiri Olsa
Adding for_each_clear_bit macro plus all its the necessary backbone functions. Taken from related kernel code. It will be used in following patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-cayv2zbqi0nlmg5sjjxs1775@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24tools lib traceevent: Add version for traceevent shared objectJiri Olsa
Adding version support for libtraceevent.so object. Using the existing EVENT_PARSE_VERSION variable to construct the .so object version string, which now consists of: $(EP_VERSION).$(EP_PATCHLEVEL).$(EP_EXTRAVERSION) Looks like it was created for this purpose anyway. The build will now produce following traeceevent libraries: $ ll libtraceevent* libtraceevent.a libtraceevent.so -> libtraceevent.so.1.1.0 libtraceevent.so.1 -> libtraceevent.so.1.1.0 libtraceevent.so.1.1.0 Also the install target will carry them: $ make DESTDIR=/tmp/krava prefix=/usr install INSTALL trace_plugins INSTALL libtraceevent.a INSTALL libtraceevent.so.1.1.0 $ find /tmp/krava/ | xargs ls -l ... /tmp/krava/usr/lib64: total 572 libtraceevent.a libtraceevent.so -> libtraceevent.so.1.1.0 libtraceevent.so.1 -> libtraceevent.so.1.1.0 libtraceevent.so.1.1.0 ... Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-v64z62fh0dwt0ueie5usrnac@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24tools lib traceevent: Rename LIB_FILE to LIB_TARGETJiri Olsa
To ease up following patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-zpv5gd8y7clwrhh6dq03ucd5@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24tools lib traceevent: Add do_install_mkdir Makefile functionJiri Olsa
Decompose the do_install function to ease up the following patch a little. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-zzs19yx8seyors532vuer37w@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24tools lib traceevent: Add install_headers targetJiri Olsa
Adding install_headers target to install all headers under 'include/traceevent' path, like: $ make DESTDIR=/tmp/krava prefix=/usr install_headers $ find /tmp/krava/ -type f /tmp/krava/usr/include/traceevent/kbuffer.h /tmp/krava/usr/include/traceevent/event-utils.h /tmp/krava/usr/include/traceevent/event-parse.h Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-if70lj3zhdc3csdqm5webjvc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf tools: Sync copy of x86's syscall tableArnaldo Carvalho de Melo
To get up to the recent compat pread/pwrite changes, that albeit not being used by 'perf trace' due to some raw_syscalls tracepoint limitations, trigger this warning when building perf: Warning: x86_64's syscall_64.tbl differs from kernel Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ilgqhxd9ubkg5f66bx0bht2t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf script: Support insn and insnlenAndi Kleen
When looking at Intel PT traces with perf script it is useful to have some indication of the instruction. Dump the instruction bytes and instruction length, which can be used for simple pattern analysis in scripts. % perf record -e intel_pt// foo % perf script --itrace=i0ns -F ip,insn,insnlen ffffffff8101232f ilen: 5 insn: 0f 1f 44 00 00 ffffffff81012334 ilen: 1 insn: 5b ffffffff81012335 ilen: 1 insn: 5d ffffffff81012336 ilen: 1 insn: c3 ffffffff810123e3 ilen: 1 insn: 5b ffffffff810123e4 ilen: 2 insn: 41 5c ffffffff810123e6 ilen: 1 insn: 5d ffffffff810123e7 ilen: 1 insn: c3 ffffffff810124a6 ilen: 2 insn: 31 c0 ffffffff810124a8 ilen: 9 insn: 41 83 bc 24 a8 01 00 00 01 ffffffff810124b1 ilen: 2 insn: 75 87 ... Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/1475847747-30994-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf intel-pt/bts: Report instruction bytes and length in sampleAndi Kleen
Change Intel PT and BTS to pass up the length and the instruction bytes of the decoded or sampled instruction in the perf sample. The decoder already knows this information, we just need to pass it up. Since it is only a couple of movs it is not very expensive. Handle instruction cache too. Make sure ilen is always initialized. Used in the next patch. [Adrian: re-base on top (and adjust for) instruction buffer size tidy-up] [Adrian: add BTS support and adjust commit message accordingly] Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/1475847747-30994-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf intel-pt/bts: Tidy instruction buffer size usageAdrian Hunter
Tidy instruction buffer size usage in preparation for copying the instruction bytes onto samples. The instruction buffer is presently used for debugging, so rename its size macro from INTEL_PT_INSN_DBG_BUF_SZ to INTEL_PT_INSN_BUF_SZ, and use it everywhere. Note that the maximum instruction size is 15 which is a less efficient size to copy than 16, which is why a separate buffer size is used. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1475847747-30994-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-22Merge tag 'perf-c2c-for-mingo-20161021' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull new 'perf c2c' tool from Arnaldo Carvalho de Melo: - The 'perf c2c' tool provides means for Shared Data C2C/HITM analysis. It allows you to track down cacheline contention. The tool is based on x86's load latency and precise store facility events provided by Intel CPUs. It was tested by Joe Mario and has proven to be useful, finding some cacheline contentions. Joe also wrote a blog about c2c tool with examples: https://joemario.github.io/blog/2016/09/01/c2c-blog/ Excerpt of the content on this site: --- At a high level, “perf c2c” will show you: * The cachelines where false sharing was detected. * The readers and writers to those cachelines, and the offsets where those accesses occurred. * The pid, tid, instruction addr, function name, binary object name for those readers and writers. * The source file and line number for each reader and writer. * The average load latency for the loads to those cachelines. * Which numa nodes the samples a cacheline came from and which CPUs were involved. Using perf c2c is similar to using the Linux perf tool today. First collect data with “perf c2c record” Then generate a report output with “perf c2c report” --- There one finds extensive details on using the tool, with tips on reducing the volume of samples while still capturing enough to do its job. (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21perf c2c report: Add --show-all optionJiri Olsa
Normally we limit the main list to contain only entries with HITM % value > 0.0005, but it might be useful to display all captured entries. Adding --show-all option for that. Requested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-nokgjdwikbegec5jzj4mxhqc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Add --no-source optionJiri Olsa
Add a possibility to disable source line column with new --no-source option. It source line data could take lot of time to retrieve, so it could be a performance burden for big data. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-8p6s2727fq8nbsm3it5gix3p@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c: Add man page and creditsJiri Olsa
Add man page for c2c command and credits to builtin-c2c.c file. Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-twbp391v8v9f5idp584hlfov@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Add help windowsJiri Olsa
Adding help windows to display key/action mappings for both browsers. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-zni4apopx6a9eyxsosm1ebh1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Iterate node display in browserJiri Olsa
Adding TUI support to switch between Node entry versions in real time with 'n' key. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-xqbw4h4dxig54wff7fd14lao@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Add support to manage symbol name lengthJiri Olsa
The width of symbol and source line entries could get really long and not convenient to display. Adding support to display only patrt of such strings and possibility to switch to full length by uing --full-symbols option or 's' key in TUI browser. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-yxf5hfteyfaoi8xrgczqtyha@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Add cacheline index entryJiri Olsa
It's convenient to have an index for each cacheline to help discussions about results over the phone. Add new 'Index' and 'Num' fields in main and single cacheline tables. $ perf c2c report ================================================= Shared Data Cache Line Table ================================================= # # Total Lcl ----- LLC Load Hitm ----- # Index Cacheline records Hitm Total Lcl Rmt ... # ..... .................. ....... ....... ....... ....... ....... # 0 0xffff880036233b40 1 11.11% 1 1 0 1 0xffff88009ccb2900 1 11.11% 1 1 0 2 0xffff8800b5b3bc40 7 11.11% 1 1 0 ... ================================================= Shared Cache Line Distribution Pareto ================================================= # # ----- HITM ----- -- Store Refs -- Data address # Num Rmt Lcl L1 Hit L1 Miss Offset Pid ... # ..... ....... ....... ....... ....... .................. ....... # ------------------------------------------------------------- 0 0 1 0 0 0xffff880036233b40 ------------------------------------------------------------- 0.00% 100.00% 0.00% 0.00% 0x30 0 ------------------------------------------------------------- 1 0 1 0 0 0xffff88009ccb2900 ------------------------------------------------------------- 0.00% 100.00% 0.00% 0.00% 0x28 549 ... Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-4dhfagaz57tvrfjbg8nd2h4u@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Recalc width of global sort entriesJiri Olsa
Using resort callbacks to compute the columns' width. Computing only the global ones, c2c entries have fixed width only. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-zyayvq2u3dzyf3y7i9jza0lw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Allow to set cacheline sort fieldsJiri Olsa
Allowing user to configure the way the single cacheline data are sorted after being sorted by offset. Adding 'c' option to specify sorting fields for single cacheline: -c, --coalesce <coalesce fields> coalesce fields: pid,tid,iaddr,dso It's allowed to use following combination of fields: pid - process pid tid - process tid iaddr - code address dso - shared object Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-aka8z31umxoq2gqr5mjd81zr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Add support to choose local HITMsJiri Olsa
Currently we sort and limit displayed data based on the remote HITMs count. Adding support to switch to local HITMs via --display option: --display ... lcl,rmt Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-inykbom2f19difvsu1e18avr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Limit the cachelines table entriesJiri Olsa
Add a limit for entries number of the cachelines table entries. By default now it's the 0.0005% minimum of remote HITMs. Also display only cachelines with remote hitm or store data. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-inykbom2f19difvsu1e18avr@git.kernel.org [ Disabled for now ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Allow to report callchainsJiri Olsa
Add --call-graph option to properly setup callchain code. Adding default settings to display callchains whenever they are stored in the perf.data. Committer Notes: Testing it: [root@jouet ~]# perf c2c record -a -g sleep 5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 5.331 MB perf.data (4263 samples) ] [root@jouet ~]# perf evlist -v cpu/mem-loads,ldlat=30/P: type: 4, size: 112, config: 0x1cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, { bp_addr, config1 }: 0x1f cpu/mem-stores/P: type: 4, size: 112, config: 0x82d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 [root@jouet ~]# perf c2c report --stats ================================================= Trace Event Information ================================================= Total records : 4263 Locked Load/Store Operations : 220 Load Operations : 2130 Loads - uncacheable : 1 Loads - IO : 7 Loads - Miss : 86 Loads - no mapping : 5 Load Fill Buffer Hit : 609 Load L1D hit : 612 ================================================= Trace Event Information ================================================= Total records : 4263 Locked Load/Store Operations : 220 Load Operations : 2130 Loads - uncacheable : 1 Loads - IO : 7 Loads - Miss : 86 Loads - no mapping : 5 Load Fill Buffer Hit : 609 Load L1D hit : 612 Load L2D hit : 27 Load LLC hit : 607 Load Local HITM : 15 Load Remote HITM : 0 Load Remote HIT : 0 Load Local DRAM : 176 Load Remote DRAM : 0 Load MESI State Exclusive : 176 Load MESI State Shared : 0 Load LLC Misses : 176 LLC Misses to Local DRAM : 100.0% LLC Misses to Remote DRAM : 0.0% LLC Misses to Remote cache (HIT) : 0.0% LLC Misses to Remote cache (HITM) : 0.0% Store Operations : 2133 Store - uncacheable : 0 Store - no mapping : 1 Store L1D Hit : 1967 Store L1D Miss : 165 No Page Map Rejects : 145 Unable to parse data source : 0 ================================================= Global Shared Cache Line Event Information ================================================= Total Shared Cache Lines : 15 Load HITs on shared lines : 26 Fill Buffer Hits on shared lines : 7 L1D hits on shared lines : 3 L2D hits on shared lines : 0 LLC hits on shared lines : 16 Locked Access on shared lines : 2 Store HITs on shared lines : 8 Store L1D hits on shared lines : 7 Total Merged records : 23 ================================================= c2c details ================================================= Events : cpu/mem-loads,ldlat=30/P : cpu/mem-stores/P [root@jouet ~]# [root@jouet ~]# perf c2c report Shared Data Cache Line Table (2378 entries) Total --- LLC Load Hitm -- -- Store Reference - - Load Dram - LLC Total - Core Load Hit - Cacheline records %hitm Total Lcl Rmt Total L1Hit L1Miss Lcl Rmt Ld Miss Loads FB L1 L2 - 0xffff880024380c00 10 0.00% 0 0 0 6 6 0 0 0 0 4 1 3 0 - 0.13% _raw_spin_lock_irqsave - 0.07% ep_poll sys_epoll_wait do_syscall_64 return_from_SYSCALL_64 + 0x103573 - 0.05% ep_poll_callback __wake_up_common - __wake_up_sync_key - 0.02% pipe_read __vfs_read vfs_read sys_read do_syscall_64 return_from_SYSCALL_64 0xfdad + 0.02% sock_def_readable + 0.02% ep_scan_ready_list.constprop.12 + 0.00% mutex_lock + 0.00% __wake_up_common + 0xffff880024380c40 1 0.00% 0 0 0 1 1 0 0 0 0 0 0 0 0 + 0xffff880024380c80 1 0.00% 0 0 0 0 0 0 0 0 0 1 0 0 0 - 0xffff8800243e9f00 1 0.00% 0 0 0 1 1 0 0 0 0 0 0 0 0 enqueue_entity enqueue_task_fair activate_task ttwu_do_activate try_to_wake_up wake_up_process hrtimer_wakeup __hrtimer_run_queues hrtimer_interrupt local_apic_timer_interrupt smp_apic_timer_interrupt apic_timer_interrupt cpuidle_enter call_cpuidle help ------------- And when presing 'd' to see the cacheline details: Cacheline 0xffff880024380c00 ----- HITM ----- -- Store Refs -- --------- cycles ----- cpu Rmt Lcl L1 Hit L1 Miss Off Pid Tid rmt hitm lcl hitm load cnt Symbol - 0.00% 0.00% 100.00% 0.00% 0x0 1473 1474:Chrome_ChildIOT 0 0 41 2 [k] _raw_spin_lock_irqsave [kernel] - _raw_spin_lock_irqsave - 51.52% ep_poll sys_epoll_wait do_syscall_64 return_from_SYSCALL_64 - 0x103573 47.19% 0 4.33% 0xc30bd - 35.93% ep_poll_callback __wake_up_common - __wake_up_sync_key - 18.20% pipe_read __vfs_read vfs_read sys_read do_syscall_64 return_from_SYSCALL_64 0xfdad - 17.73% sock_def_readable unix_stream_sendmsg sock_sendmsg ___sys_sendmsg __sys_sendmsg sys_sendmsg do_syscall_64 return_from_SYSCALL_64 __GI___libc_sendmsg 0x12c036af1fc0 0x16a4050 0x894928ec83485354 + 12.45% ep_scan_ready_list.constprop.12 + 0.00% 0.00% 0.00% 0.00% 0x8 1473 1474:Chrome_ChildIOT 0 0 102 1 [k] mutex_lock [kernel] + 0.00% 0.00% 0.00% 0.00% 0x38 1473 1473:chrome 0 0 88 1 [k] __wake_up_common [kernel] help Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-inykbom2f19difvsu1e18avr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Add c2c related stats stdio outputJiri Olsa
Display c2c related configuration options/setup. So far it's output of monitored events: $ perf c2c report --stats ... ================================================= c2c details ================================================= Events : cpu/mem-loads,ldlat=50/pp : cpu/mem-stores/pp Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-ypz84f3a9fumyttrxurm458z@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>