Age | Commit message (Collapse) | Author |
|
It would be useful to support sorting for all blocks by the sampled
cycles percent per block. This is useful to concentrate on the globally
hottest blocks.
This patch implements a new option "--total-cycles" which sorts all
blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent:
percent = block sampled cycles aggregation / total sampled cycles
Note that, this patch only supports "--stdio" mode.
For example,
# perf record -b ./div
# perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 2M of event 'cycles'
# Event count (approx.): 2753248
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... ................................................ .................
#
26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div
15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so
5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div
4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so
4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div
3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div
3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so
3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so
2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so
2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so
2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so
2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div
1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so
1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div
1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so
0.25% 182.5K 0.02% 1 [random_r.c:388 -> random_r.c:391] libc-2.27.so
0.00% 48 1.07% 48 [x86_pmu_enable+284 -> x86_pmu_enable+298] [kernel.kallsyms]
0.00% 74 1.64% 74 [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92] [kernel.kallsyms]
0.00% 73 1.62% 73 [vm_mmap+0 -> vm_mmap+48] [kernel.kallsyms]
0.00% 63 0.69% 31 [up_write+0 -> up_write+34] [kernel.kallsyms]
0.00% 13 0.29% 13 [setup_arg_pages+396 -> setup_arg_pages+413] [kernel.kallsyms]
0.00% 3 0.07% 3 [setup_arg_pages+418 -> setup_arg_pages+450] [kernel.kallsyms]
0.00% 616 6.84% 308 [security_mmap_file+0 -> security_mmap_file+72] [kernel.kallsyms]
0.00% 23 0.51% 23 [security_mmap_file+77 -> security_mmap_file+87] [kernel.kallsyms]
0.00% 4 0.02% 1 [sched_clock+0 -> sched_clock+4] [kernel.kallsyms]
0.00% 4 0.02% 1 [sched_clock+9 -> sched_clock+12] [kernel.kallsyms]
0.00% 1 0.02% 1 [rcu_nmi_exit+0 -> rcu_nmi_exit+9] [kernel.kallsyms]
Committer testing:
This should provide material for hours of endless joy, both from looking
for suspicious things in the implementation of this patch, such as the
top one:
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
As well from things that look legit:
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux]
:-)
Very short system wide taken branches session:
# perf record -h -b
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-b, --branch-any sample any taken branches
#
# perf record -b
^C[ perf record: Woken up 595 times to write data ]
[ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
#
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
#
# perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 6M of event 'cycles'
# Event count (approx.): 6299936
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... ...................................................................... ....................
#
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so
0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux]
0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux]
0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux]
0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux]
0.30% 260.8K 0.07% 564 [clear_page_64.S:47 -> clear_page_64.S:50] [kernel.vmlinux]
0.28% 215.3K 0.05% 369 [traps.c:623 -> traps.c:628] [kernel.vmlinux]
0.23% 178.1K 0.04% 278 [entry_64.S:271 -> entry_64.S:275] [kernel.vmlinux]
0.20% 152.6K 0.09% 706 [paravirt.c:177 -> paravirt.c:179] [kernel.vmlinux]
0.20% 155.8K 0.05% 373 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux]
0.18% 136.6K 0.03% 222 [msr.h:105 -> msr.h:166] [kernel.vmlinux]
0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux]
0.16% 118.3K 0.01% 44 [entry_64.S:632 -> entry_64.S:657] [kernel.vmlinux]
0.14% 104.5K 0.00% 28 [rwsem.c:1541 -> rwsem.c:1544] [kernel.vmlinux]
0.13% 99.2K 0.01% 53 [spinlock.c:150 -> spinlock.c:152] [kernel.vmlinux]
0.13% 95.5K 0.00% 35 [swap.c:456 -> swap.c:471] [kernel.vmlinux]
0.12% 96.2K 0.05% 407 [copy_user_64.S:175 -> copy_user_64.S:209] [kernel.vmlinux]
0.11% 85.9K 0.00% 31 [swap.c:400 -> page-flags.h:188] [kernel.vmlinux]
0.10% 73.0K 0.01% 52 [paravirt.h:763 -> list.h:131] [kernel.vmlinux]
0.07% 56.2K 0.03% 214 [filemap.c:1524 -> filemap.c:1557] [kernel.vmlinux]
0.07% 54.2K 0.02% 145 [memory.c:1032 -> memory.c:1049] [kernel.vmlinux]
0.07% 50.3K 0.00% 39 [mmzone.c:49 -> mmzone.c:69] [kernel.vmlinux]
0.06% 48.3K 0.01% 40 [paravirt.h:768 -> page_alloc.c:3304] [kernel.vmlinux]
0.06% 46.7K 0.02% 155 [memory.c:1032 -> memory.c:1056] [kernel.vmlinux]
0.06% 46.9K 0.01% 103 [swap.c:867 -> swap.c:902] [kernel.vmlinux]
0.06% 47.8K 0.00% 34 [entry_64.S:1201 -> entry_64.S:1202] [kernel.vmlinux]
-----------------------------------------------------------
v7:
---
Use use_browser in report__browse_block_hists for supporting
stdio and potential tui mode.
v6:
---
Create report__browse_block_hists in block-info.c (codes are
moved from builtin-report.c). It's called from
perf_evlist__tty_browse_hists.
v5:
---
1. Move all block functions to block-info.c
2. Move the code of setting ms in block hist_entry to
other patch.
v4:
---
1. Use new option '--total-cycles' to replace
'-s total_cycles' in v3.
2. Move block info collection out of block info
printing.
v3:
---
1. Use common function block_info__process_sym to
process the blocks per symbol.
2. Remove the nasty hack for skipping calculation
of column length
3. Some minor cleanup
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Since 'perf top' shares the histogram browser with 'perf report', then
the same explanation in the previous cset applies.
An additional example uses a pair of SDT events available for systemtap:
# perf probe --exec=/usr/bin/stap '%*:*'
Added new events:
sdt_stap:benchmark__thread__start (on %* in /usr/bin/stap)
sdt_stap:benchmark (on %* in /usr/bin/stap)
sdt_stap:benchmark__thread__end (on %* in /usr/bin/stap)
sdt_stap:pass6__start (on %* in /usr/bin/stap)
sdt_stap:pass6__end (on %* in /usr/bin/stap)
sdt_stap:pass5__start (on %* in /usr/bin/stap)
sdt_stap:pass5__end (on %* in /usr/bin/stap)
sdt_stap:pass0__start (on %* in /usr/bin/stap)
sdt_stap:pass0__end (on %* in /usr/bin/stap)
sdt_stap:pass1a__start (on %* in /usr/bin/stap)
sdt_stap:pass1b__start (on %* in /usr/bin/stap)
sdt_stap:pass1__end (on %* in /usr/bin/stap)
sdt_stap:pass2__start (on %* in /usr/bin/stap)
sdt_stap:pass2__end (on %* in /usr/bin/stap)
sdt_stap:pass3__start (on %* in /usr/bin/stap)
sdt_stap:pass3__end (on %* in /usr/bin/stap)
sdt_stap:pass4__start (on %* in /usr/bin/stap)
sdt_stap:pass4__end (on %* in /usr/bin/stap)
sdt_stap:benchmark__start (on %* in /usr/bin/stap)
sdt_stap:benchmark__end (on %* in /usr/bin/stap)
sdt_stap:cache__get (on %* in /usr/bin/stap)
sdt_stap:cache__clean (on %* in /usr/bin/stap)
sdt_stap:cache__add__module (on %* in /usr/bin/stap)
sdt_stap:cache__add__source (on %* in /usr/bin/stap)
sdt_stap:stap_system__complete (on %* in /usr/bin/stap)
sdt_stap:stap_system__start (on %* in /usr/bin/stap)
sdt_stap:stap_system__spawn (on %* in /usr/bin/stap)
sdt_stap:stap_system__fork (on %* in /usr/bin/stap)
sdt_stap:intern_string (on %* in /usr/bin/stap)
sdt_stap:client__start (on %* in /usr/bin/stap)
sdt_stap:client__end (on %* in /usr/bin/stap)
You can now use it in all perf tools, such as:
perf record -e sdt_stap:client__end -aR sleep 1
#
From these we're use the two below to run systemtap's test suite:
# perf record -e sdt_stap:pass2__*,cycles:P make installcheck > /dev/null
^C[ perf record: Woken up 8 times to write data ]
[ perf record: Captured and wrote 2.691 MB perf.data (39638 samples) ]
Terminated
# perf script | grep sdt_stap
stap 28979 [000] 19424.302660: sdt_stap:pass2__start: (561b9a537de3) arg1=140730364262544
stap 28979 [000] 19424.333083: sdt_stap:pass2__end: (561b9a53a9e1) arg1=140730364262544
stap 29045 [006] 19424.933460: sdt_stap:pass2__start: (563edddcede3) arg1=140722674883152
stap 29045 [006] 19424.963794: sdt_stap:pass2__end: (563edddd19e1) arg1=140722674883152
# perf script | grep cycles | wc -l
39634
#
Looking at the whole perf.data file:
[root@quaco testsuite]# perf report | grep cycles:P -A25
# Samples: 39K of event 'cycles:P'
# Event count (approx.): 34044267368
#
# Overhead Command Shared Object Symbol
# ........ ....... .................... ................................
#
3.50% cc1 cc1 [.] ht_lookup_with_hash
3.04% cc1 cc1 [.] _cpp_lex_token
2.11% cc1 cc1 [.] ggc_internal_alloc
1.83% cc1 cc1 [.] cpp_get_token_with_location
1.68% cc1 libc-2.29.so [.] _int_malloc
1.41% cc1 cc1 [.] linemap_position_for_column
1.25% cc1 cc1 [.] ggc_internal_cleared_alloc
1.20% cc1 cc1 [.] c_lex_with_flags
1.18% cc1 cc1 [.] get_combined_adhoc_loc
1.05% cc1 libc-2.29.so [.] malloc
1.01% cc1 libc-2.29.so [.] _int_free
0.96% stap stap [.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, stringtable_hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, true> > > >
0.78% stap stap [.] lexer::scan
0.74% cc1 cc1 [.] _cpp_lex_direct
0.70% cc1 cc1 [.] pop_scope
0.70% cc1 cc1 [.] c_parser_declspecs
0.69% stap libc-2.29.so [.] _int_malloc
0.68% cc1 cc1 [.] htab_find_slot
0.68% cc1 [kernel.vmlinux] [k] prepare_exit_to_usermode
0.64% cc1 [kernel.vmlinux] [k] clear_page_erms
[root@quaco testsuite]#
And now only what happens in slices demarcated by those start/end SDT
events:
[root@quaco testsuite]# perf report --switch-on=sdt_stap:pass2__start --switch-off=sdt_stap:pass2__end | grep cycles:P -A100
# Samples: 240 of event 'cycles:P'
# Event count (approx.): 206491934
#
# Overhead Command Shared Object Symbol
# ........ ....... ................... ................................................
#
38.99% stap stap [.] systemtap_session::register_library_aliases
19.47% stap stap [.] match_key::operator<
15.01% stap libc-2.29.so [.] __memcmp_avx2_movbe
5.19% stap libc-2.29.so [.] _int_malloc
2.50% stap libstdc++.so.6.0.26 [.] std::_Rb_tree_insert_and_rebalance
2.30% stap stap [.] match_node::build_no_more
2.07% stap libc-2.29.so [.] malloc
1.66% stap stap [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::find
1.66% stap stap [.] match_node::bind
1.58% stap [kernel.vmlinux] [k] prepare_exit_to_usermode
1.17% stap [kernel.vmlinux] [k] native_irq_return_iret
0.87% stap stap [.] 0x0000000000032ec4
0.77% stap libstdc++.so.6.0.26 [.] std::_Rb_tree_increment
0.47% stap stap [.] std::vector<derived_probe_builder*, std::allocator<derived_probe_builder*> >::_M_realloc_insert<derived_probe_builder* const&>
0.47% stap [kernel.vmlinux] [k] get_page_from_freelist
0.47% stap [kernel.vmlinux] [k] swapgs_restore_regs_and_return_to_usermode
0.47% stap [kernel.vmlinux] [k] do_user_addr_fault
0.46% stap [kernel.vmlinux] [k] __pagevec_lru_add_fn
0.46% stap stap [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::_M_emplace_unique<std::pair<match_key, match_node*> >
0.42% stap libstdc++.so.6.0.26 [.] 0x00000000000c18fa
0.40% stap [kernel.vmlinux] [k] interrupt_entry
0.40% stap [kernel.vmlinux] [k] update_load_avg
0.40% stap [kernel.vmlinux] [k] __intel_pmu_disable_all
0.40% stap [kernel.vmlinux] [k] clear_page_erms
0.39% stap [kernel.vmlinux] [k] __mod_node_page_state
0.39% stap [kernel.vmlinux] [k] error_entry
0.39% stap [kernel.vmlinux] [k] sync_regs
0.38% stap [kernel.vmlinux] [k] __handle_mm_fault
0.38% stap stap [.] derive_probes
#
# (Tip: System-wide collection from all CPUs: perf record -a)
#
[root@quaco testsuite]#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Link: https://lkml.kernel.org/n/tip-408hvumcnyn93a0auihnawew@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
- Fix a typo in the man page
- Fix a tip that doesn't make any sense.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190628220900.13741-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently only a single explicit time range is accepted. Add support for
multiple ranges separated by spaces, which requires the string to be
quoted. Update the time utils test accordingly.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190604130017.31207-20-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Correct some punctuation and spelling and correct the format to show
that the time resolution is nanoseconds not microseconds.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190604130017.31207-16-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now 'perf report' can show whole time periods with 'perf script', but
the user still has to find individual samples of interest manually.
It would be expensive and complicated to search for the right samples in
the whole perf file. Typically users only need to look at a small number
of samples for useful analysis.
Also the full scripts tend to show samples of all CPUs and all threads
mixed up, which can be very confusing on larger systems.
Add a new --samples option to save a small random number of samples per
hist entry.
Use a reservoir sample technique to select a representatve number of
samples.
Then allow browsing the samples using 'perf script' as part of the hist
entry context menu. This automatically adds the right filters, so only
the thread or cpu of the sample is displayed. Then we use less' search
functionality to directly jump the to the time stamp of the selected
sample.
It uses different menus for assembler and source display. Assembler
needs xed installed and source needs debuginfo.
Currently it only supports as many samples as fit on the screen due to
some limitations in the slang ui code.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311174605.GA29294@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a time sort key to perf report to display samples for different time
quantums separately. This allows easier analysis of workloads that
change over time, and also will allow looking at the context of samples.
% perf record ...
% perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
...
0.67% 277061.87300 [.] _dl_start
0.50% 277061.87300 [.] f1
0.50% 277061.87300 [.] f2
0.33% 277061.87300 [.] main
0.29% 277061.87300 [.] _dl_lookup_symbol_x
0.29% 277061.87300 [.] dl_main
0.29% 277061.87300 [.] do_lookup_x
0.17% 277061.87300 [.] _dl_debug_initialize
0.17% 277061.87300 [.] _dl_init_paths
0.08% 277061.87300 [.] check_match
0.04% 277061.87300 [.] _dl_count_modids
1.33% 277061.87400 [.] f1
1.33% 277061.87400 [.] f2
1.33% 277061.87400 [.] main
1.17% 277061.87500 [.] main
1.08% 277061.87500 [.] f1
1.08% 277061.87500 [.] f2
1.00% 277061.87600 [.] main
0.83% 277061.87600 [.] f1
0.83% 277061.87600 [.] f2
1.00% 277061.87700 [.] main
Committer notes:
Rename 'time' argument to hist_time() to htime to overcome this in older
distros:
cc1: warnings being treated as errors
util/hist.c: In function 'hist_time':
util/hist.c:251: error: declaration of 'time' shadows a global declaration
/usr/include/time.h:186: error: shadowed declaration is here
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Many workloads change over time. 'perf report' currently aggregates the
whole time range reported in perf.data.
This patch adds an option for a time quantum to quantisize the perf.data
over time.
This just adds the option, will be used in follow on patches for a time
sort key.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20190305144758.12397-6-andi@firstfloor.org
[ Use NSEC_PER_[MU]SEC ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Upcoming changes add timestamp output in perf report. Add a --ns
argument similar to perf script to support nanoseconds resolution when
needed.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20190305144758.12397-5-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Go over the tools/ files that are maintained in Arnaldo's tree and
fix common typos: half of them were in comments, the other half
in JSON files.
No change in functionality intended.
Committer notes:
This was split from a larger patch as there are code that is,
additionally, maintained outside the kernel tree, so to ease cherry
picking and/or backporting, split this into multiple patches.
In this particular case, it affects documentation, so may be interesting
to cherry pick as it is information that is presented to the user.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181203102200.GA104797@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add explanations for new columns "IPC" and "IPC coverage" in perf
documentation.
v5:
---
Update the description according to Ingo's comments.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1543586097-27632-5-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Set annotation percent type from following choices:
global-period, local-period, global-hits, local-hits
With following report option setup the percent type will be passed to
annotation browser:
$ perf report --percent-type period-local
The local/global keywords set if the percentage is computed in the scope
of the function (local) or the whole data (global). The period/hits
keywords set the base the percentage is computed on - the samples period
or the number of samples (hits).
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-21-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add DSO size to perf report/top sort output list.
This includes adding a map__size fn to map.h, which is
approximately equal to the DSO data file_size:
DSO file size map (end-start) file / (end-start)
libwebkit2gtk-4.0.so.37.24.9 43260072 41295872 95%
libglib-2.0.so.0.5400.1 1125680 1118208 99%
libc-2.26.so 1960656 1925120 101%
libdbus-1.so.3.14.13 309456 303104 102%
Sample output:
$ ./perf report -s dso_size,dso
Samples: 2K of event 'cycles:uppp', Event count (approx.): 128373340
Overhead DSO size Shared Object
90.62% unknown [unknown]
2.87% 1118208 libglib-2.0.so.0.5400.1
1.92% 303104 libdbus-1.so.3.14.13
1.42% 1925120 libc-2.26.so
0.77% 41295872 libwebkit2gtk-4.0.so.37.24.9
0.61% 335872 libgobject-2.0.so.0.5400.1
0.41% 1052672 libgdk-3.so.0.2200.25
0.36% 106496 libpthread-2.26.so
0.29% 221184 dbus-daemon
0.17% 159744 ld-2.26.so
0.13% 49152 libwayland-client.so.0.3.0
0.12% 1642496 libgio-2.0.so.0.5400.1
0.09% 7327744 libgtk-3.so.0.2200.25
0.09% 12324864 libmozjs-52.so.0.0.0
0.05% 4796416 perf
0.04% 843776 libgjs.so.0.0.0
0.03% 1409024 libmutter-clutter-1.so
Committer testing:
To sort by DSO size, use:
# perf report -F dso_size,dso,overhead -s dso_size
<SNIP>
3465216 libdns-export.so.174.0.1 0.00%
3522560 libgc.so.1.0.3 0.00%
3538944 libbfd-2.29-13.fc27.so 0.59%
3670016 libunistring.so.2.1.0 0.00%
3723264 libguile-2.0.so.22.8.1 0.00%
3776512 libgio-2.0.so.0.5400.3 0.00%
3891200 libc-2.26.so 0.96%
3944448 libmozjs-17.0.so 0.00%
4218880 libperl.so.5.26.1 0.18%
4452352 libpython2.7.so.1.0 0.02%
4472832 perf 0.02%
4603904 git 0.01%
4751360 libcrypto.so.1.1.0g 0.00%
5005312 libslang.so.2.3.1 0.00%
7315456 libgtk-3.so.0.2200.26 0.09%
8818688 i965_dri.so 2.46%
8818688 i965_dri.so (deleted) 1.26%
12414976 libmozjs-52.so.0.0.0 0.03%
23642112 cc1 2.02%
27889664 [kernel.kallsyms] 25.41%
80834560 libxul.so (deleted) 15.68%
98078720 chrome 32.03%
1056964608 [kernel.kallsyms] 1.59%
#
Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180327060956.1c01ebe67a2a941bb4468c6f@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We've had this in 'perf top' for quite a while, useful if one wishes
to force using /proc/kcore to do annotation using the patched kernel
instead of the ELF image it started from, aka vmlinux.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ircpvox4wzsv7gasrpb28fw9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The "mem-loads" event only works when PEBS is enabled, so add the "/p"
("precise") suffix to the examples.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20180209163909.9240-1-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-v0gcd4u9tktrvjjsp6y7ouv4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add support to display group output for if non grouped events are
detected and user forces --group option. Now for non-group events
recorded like:
$ perf record -e 'cycles,instructions' ls
you can still get group output by using --group option
in report:
$ perf report --group --stdio
...
# Overhead Command Shared Object Symbol
# ................ ....... ................ ......................
#
17.67% 0.00% ls libc-2.25.so [.] _IO_do_write@@GLIB
15.59% 25.94% ls ls [.] calculate_columns
15.41% 31.35% ls libc-2.25.so [.] __strcoll_l
...
Committer note:
We should improve on this by making sure that the first line states that
this is not a group, but since the user doesn't have to force group view
when really using grouped events (e.g. '{cycles,instructions}'), the
user better know what is being done...
Requested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180209092734.GB20449@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Previously it was only allowed to use at most 10 time slices in 'perf
report --time'.
This patch removes this limitation.
For example, following command line is OK (12 time slices)
perf report --stdio --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1515596433-24653-8-git-send-email-yao.jin@linux.intel.com
[ No need to check for NULL to call free, use zfree ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Similar to --tasks, producing the same output plus /proc/<PID>/maps
similar lines for each mmap record present in a perf.data file.
Please note that not all mmaps are stored, for instance, some of the
non-executable mmaps are only stored when 'perf record --data' is used,
when the user wants to resolve data accesses in addition to asking for
executable mmaps to get the DSO with symtabs.
E.g.:
# perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.018 MB perf.data (7 samples) ]
[root@jouet ~]# perf report --mmaps
# pid tid ppid comm
0 0 -1 |swapper
4137 4137 -1 |sleep
5628a35a1000-5628a37aa000 r-xp 00000000 3147148 /usr/bin/sleep
7fb65ad51000-7fb65b134000 r-xp 00000000 3149795 /usr/lib64/libc-2.26.so
7fb65b134000-7fb65b35e000 r-xp 00000000 3149715 /usr/lib64/ld-2.26.so
7ffd94b9f000-7ffd94ba1000 r-xp 00000000 0 [vdso]
#
# perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (8 samples) ]
# perf report --mmaps
# pid tid ppid comm
0 0 -1 |swapper
4161 4161 -1 |sleep
55afae69a000-55afae8a3000 r-xp 00000000 3147148 /usr/bin/sleep
7f569f00d000-7f569f3f0000 r-xp 00000000 3149795 /usr/lib64/libc-2.26.so
7f569f3f0000-7f569f61a000 r-xp 00000000 3149715 /usr/lib64/ld-2.26.so
7fff6fffe000-7fff70000000 r-xp 00000000 0 [vdso]
#
# perf record time sleep 1
0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 2156maxresident)k
0inputs+0outputs (0major+73minor)pagefaults 0swaps
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (14 samples) ]
# perf report --mmaps
# pid tid ppid comm
0 0 -1 |swapper
4281 4281 -1 |time
560560dca000-560560fcf000 r-xp 00000000 3190458 /usr/bin/time
7fc175196000-7fc175579000 r-xp 00000000 3149795 /usr/lib64/libc-2.26.so
7fc175579000-7fc1757a3000 r-xp 00000000 3149715 /usr/lib64/ld-2.26.so
7ffc924f6000-7ffc924f8000 r-xp 00000000 0 [vdso]
4282 4282 4281 | sleep
560560dca000-560560fcf000 r-xp 00000000 3190458 /usr/bin/time
564b4de3c000-564b4e045000 r-xp 00000000 3147148 /usr/bin/sleep
7f6a5a716000-7f6a5aaf9000 r-xp 00000000 3149795 /usr/lib64/libc-2.26.so
7f6a5aaf9000-7f6a5ad23000 r-xp 00000000 3149715 /usr/lib64/ld-2.26.so
7fc175196000-7fc175579000 r-xp 00000000 3149795 /usr/lib64/libc-2.26.so
7fc175579000-7fc1757a3000 r-xp 00000000 3149715 /usr/lib64/ld-2.26.so
7ffc924f6000-7ffc924f8000 r-xp 00000000 0 [vdso]
7ffcec7e6000-7ffcec7e8000 r-xp 00000000 0 [vdso]
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zulwdlg5rfowogr1qznorvvc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add --tasks option to display monitored tasks stored in perf.data.
Displaying pid/tid/ppid plus the command string aligned to distinguish
parent and child tasks.
$ perf record -a
...
$ perf report --tasks
# pid tid ppid comm
0 0 -1 |swapper
2 2 0 | kthreadd
14080 14080 2 | kworker/u17:1
4 4 2 | kworker/0:0H
6 6 2 | mm_percpu_wq
...
1 1 0 | systemd
23242 23242 1 | firefox
23242 23298 23242 | Cache2 I/O
23242 23304 23242 | GMPThread
...
1195 1195 1 | login
1611 1611 1195 | bash
1639 1639 1611 | startx
1663 1663 1639 | xinit
1673 1673 1663 | xmonad-x86_64-l
23939 23939 1673 | xterm
23941 23941 23939 | bash
23963 23963 23941 | mutt
24954 24954 23963 | offlineimap
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180107160356.28203-13-jolsa@kernel.org
[ Make it --tasks, plural, --task works as well, as its unambiguous ]
[ Use machine__find_thread(), not findnew(), as pointed out by Namhyung ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add --stats option to display quick data statistics of event numbers,
without any further processing, like the one at the end of the perf
report -D command.
$ perf report --stat
Aggregated stats:
TOTAL events: 4566
MMAP events: 113
LOST events: 19
COMM events: 3
FORK events: 400
SAMPLE events: 3315
MMAP2 events: 32
FINISHED_ROUND events: 681
THREAD_MAP events: 1
CPU_MAP events: 1
TIME_CONV events: 1
I found this useful when hunting lost events for another change.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180107160356.28203-12-jolsa@kernel.org
[ Rename it to --stats, plural ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf report has a --time option to limit the time range of output. It
only supports absolute time.
Now this option is extended to support multiple time ranges and support
the percent of time.
For example:
1. Select the first and second 10% time slices:
perf report --time 10%/1,10%/2
2. Select from 0% to 10% and 30% to 40% slices:
perf report --time 0%-10%,30%-40%
Changelog:
v6: Fix the merge issue with latest perf/core branch.
No functional changes.
v5: Add checking of first/last sample time to detect if it's recorded
in perf.data. If it's not recorded, returns error message to user.
v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample
v3: Since the definitions of first_sample_time/last_sample_time
are moved from perf_session to perf_evlist so change the
related code.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512738826-2628-6-git-send-email-yao.jin@linux.intel.com
[ Add missing colons at end of examples in the man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now that we have caches in place to speed up the process of finding
inlined frames and srcline information repeatedly, we can enable this
useful option by default.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a new sort option "phys_daddr" for --mem-mode sort. With this
option applied, perf can sort and report by sample's physical address.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1504026672-7304-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Mostly in the documentation.
Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170503131350.cebeecd8bd0f2968417626ab@arm.com
[ Fix spelling of "parameter" in one of the spell-checked lines ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Often it is interesting to know how costly a given source line is in
total. Previously, one had to build these sums manually based on all
addresses that pointed to the same source line. This patch introduces
srcline as a sort key, which will do the aggregation for us.
Paired with the recent addition of showing inline frames, this makes
perf report much more useful for many C++ work loads.
The following shows the new feature in action. First, let's show the
status quo output when we sort by address. The result contains many hist
entries that generate the same output:
~~~~~~~~~~~~~~~~
$ perf report --stdio --inline -g address
# Children Self Command Shared Object Symbol
# ........ ........ ............ ................... .........................................
#
99.89% 35.34% cpp-inlining cpp-inlining [.] main
|
|--64.55%--main complex:655
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/complex:664 (inline)
| |
| |--60.31%--hypot +20
| | |
| | |--8.52%--__hypot_finite +273
| | |
| | |--7.32%--__hypot_finite +411
...
--35.34%--_start +4194346
__libc_start_main +241
|
|--6.65%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
|
|--2.70%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
|
|--1.69%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
...
~~~~~~~~~~~~~~~~
With this patch and `-g srcline` we instead get the following output:
~~~~~~~~~~~~~~~~
$ perf report --stdio --inline -g srcline
# Children Self Command Shared Object Symbol
# ........ ........ ............ ................... .........................................
#
99.89% 35.34% cpp-inlining cpp-inlining [.] main
|
|--64.55%--main complex:655
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/complex:664 (inline)
| |
| |--64.02%--hypot
| | |
| | --59.81%--__hypot_finite
| |
| --0.53%--cabs
|
--35.34%--_start
__libc_start_main
|
|--12.48%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
...
~~~~~~~~~~~~~~~~
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It takes some time to look for inline stack for callgraph addresses. So
it provides new option "--inline" to let user decide if enable this
feature.
--inline:
If a callgraph address belongs to an inlined function, the inline stack
will be printed. Each entry is the inline function name or file/line.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1490474069-15823-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This patch introduces a cgroup identifier entry field in perf report to
identify or distinguish data of different cgroups. It uses the device
number and inode number of cgroup namespace, included in perf data with
the new PERF_RECORD_NAMESPACES event, as cgroup identifier.
With the assumption that each container is created with it's own cgroup
namespace, this allows assessment/analysis of multiple containers at
once.
A simple test for this would be to clone a few processes passing
SIGCHILD & CLONE_NEWCROUP flags to each of them, execute shell and run
different workloads on each of those contexts, while running perf
record command with --namespaces option.
Shown below is the output of perf report, sorted with cgroup identifier,
on perf.data generated with the above test scenario, clearly indicating
one context's considerable use of kernel memory in comparison with
others:
$ perf report -s cgroup_id,sample --stdio
#
# Total Lost Samples: 0
#
# Samples: 5K of event 'kmem:kmalloc'
# Event count (approx.): 5965
#
# Overhead cgroup id (dev/inode) Samples
# ........ ..................... ............
#
81.27% 3/0xeffffffb 4848
16.24% 3/0xf00000d0 969
1.16% 3/0xf00000ce 69
0.82% 3/0xf00000cf 49
0.50% 0/0x0 30
While this is a start, there is further scope of improving this. For
example, instead of cgroup namespace's device and inode numbers, dev
and inode numbers of some or all namespaces may be used to distinguish
which processes are running in a given container context.
Also, scripts to map device and inode info to containers sounds
plausible for better tracing of containers.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891933338.25309.756882900782042645.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit 2f3f9bcf000b ("perf tools: Add +field argument support for
--field option") by Jiri Olsa <jolsa@kernel.org> introduced +field style
argument support for --field option.
This is useful but not updated documentation. This add a little
description there.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170313083252.23644-1-changbin.du@intel.com
[ Slightly improved the phrase structure ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add new sort key 'symbol_size' to allow user to sort by symbol size, or
(more usefully) display the symbol size using --fields=...,symbol_size.
Committer note:
Testing it together with the recently added -q, to remove the headers,
and using the '+' sign with -s, to add the symbol_size sort order to
the default, which is '-s/--sort comm,dso,symbol':
# perf report -q -s +symbol_size | head -10
10.39% swapper [kernel.vmlinux] [k] intel_idle 270
3.45% swapper [kernel.vmlinux] [k] update_blocked_averages 1546
2.61% swapper [kernel.vmlinux] [k] update_load_avg 1292
2.36% swapper [kernel.vmlinux] [k] update_cfs_shares 240
1.83% swapper [kernel.vmlinux] [k] __hrtimer_run_queues 606
1.74% swapper [kernel.vmlinux] [k] update_cfs_rq_load_avg. 1187
1.66% swapper [kernel.vmlinux] [k] apic_timer_interrupt 152
1.60% CPU 0/KVM [kvm] [k] kvm_set_msr_common 3046
1.60% gnome-shell libglib-2.0.so.0 [.] g_slist_find 37
1.46% gnome-termina libglib-2.0.so.0 [.] g_hash_table_lookup 370
#
Signed-off-by: Charles Baylis <charles.baylis@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1487943176-13840-1-git-send-email-charles.baylis@linaro.org
[ Use symbol__size(), remove needless %lld + (long long) casting ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The -q/--quiet option is to suppress any message. Sometimes users just
want to see the numbers and it can be used for that case.
Before:
$ perf report | head -15
Failed to open /lib/modules/3.19.3-3-ARCH/kernel/fs/ext4/ext4.ko.gz, continuing without symbols
Failed to open /lib/modules/3.19.3-3-ARCH/kernel/fs/jbd2/jbd2.ko.gz, continuing without symbols
Failed to open /tmp/perf-14507.map, continuing without symbols
...
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 39K of event 'cycles'
# Event count (approx.): 30444796573
#
# Overhead Command Shared Object Symbol
# ........ ........... ................... .........................
#
9.28% swapper [kernel.vmlinux] [k] intel_idle
5.64% swapper [kernel.vmlinux] [k] native_write_msr_safe
1.93% swapper [kernel.vmlinux] [k] __switch_to
1.89% swapper [kernel.vmlinux] [k] menu_select
1.75% sched-pipe [kernel.vmlinux] [k] __switch_to
After:
$ perf report -q | head
9.28% swapper [kernel.vmlinux] [k] intel_idle
5.64% swapper [kernel.vmlinux] [k] native_write_msr_safe
1.93% swapper [kernel.vmlinux] [k] __switch_to
1.89% swapper [kernel.vmlinux] [k] menu_select
1.75% sched-pipe [kernel.vmlinux] [k] __switch_to
1.67% swapper [kernel.vmlinux] [k] cpu_startup_entry
1.48% sched-pipe [kernel.vmlinux] [k] enqueue_entity
1.46% swapper [kernel.vmlinux] [k] __schedule
1.36% swapper [kernel.vmlinux] [k] native_read_tsc
1.34% sched-pipe [kernel.vmlinux] [k] __schedule
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: kernel-team@lge.com
Link: http://lkml.kernel.org/r/20170217081742.17417-4-namhyung@kernel.org
[ Removed builtin-report.c verbose > 0 hunk added to the previous patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The fact that the --children option is enabled by default is buried deep
at the end of the help page, in the overhead calculation section. This
make it explicit right where the option is listed, following the same
way other default options are described
Signed-off-by: Yannick Brosseau <scientist@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20161202160732.29058-1-scientist@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add option to allow user to control analysis window. e.g., collect data
for time window and analyze a segment of interest within that window.
Committer notes:
Testing it:
Using the perf.data file captured via 'perf kmem record':
# perf report --header-only
# ========
# captured on: Tue Nov 29 16:01:53 2016
# hostname : jouet
# os release : 4.8.8-300.fc25.x86_64
# perf version : 4.9.rc6.g5a6aca
# arch : x86_64
# nrcpus online : 4
# nrcpus avail : 4
# cpudesc : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
# cpuid : GenuineIntel,6,61,4
# total memory : 20254660 kB
# cmdline : /home/acme/bin/perf kmem record usleep 1
# event : name = kmem:kmalloc, , id = { 931980, 931981, 931982, 931983 }, type = 2, size = 112, config = 0x1b9, { sample_period, sample_freq } = 1, sample_typ
# event : name = kmem:kmalloc_node, , id = { 931984, 931985, 931986, 931987 }, type = 2, size = 112, config = 0x1b7, { sample_period, sample_freq } = 1, sampl
# event : name = kmem:kfree, , id = { 931988, 931989, 931990, 931991 }, type = 2, size = 112, config = 0x1b5, { sample_period, sample_freq } = 1, sample_type
# event : name = kmem:kmem_cache_alloc, , id = { 931992, 931993, 931994, 931995 }, type = 2, size = 112, config = 0x1b8, { sample_period, sample_freq } = 1, s
# event : name = kmem:kmem_cache_alloc_node, , id = { 931996, 931997, 931998, 931999 }, type = 2, size = 112, config = 0x1b6, { sample_period, sample_freq } =
# event : name = kmem:kmem_cache_free, , id = { 932000, 932001, 932002, 932003 }, type = 2, size = 112, config = 0x1b4, { sample_period, sample_freq } = 1, sa
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, intel_pt = 7, intel_bts = 6, uncore_arb = 13, cstate_pkg = 15, breakpoint = 5, uncore_cbox_1 = 12, power = 9, software = 1, uncore_im
# HEADER_CACHE info available, use -I to display
# missing features: HEADER_BRANCH_STACK HEADER_GROUP_DESC HEADER_AUXTRACE HEADER_STAT
# ========
#
# # Looking at just the histogram entries for the first event:
#
# perf report | head -33
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 40 of event 'kmem:kmalloc'
# Event count (approx.): 40
#
# Overhead Trace output
# ........ ...............................................................................................................
#
37.50% call_site=ffffffffb91ad3c7 ptr=0xffff88895fc05000 bytes_req=4096 bytes_alloc=4096 gfp_flags=GFP_KERNEL
10.00% call_site=ffffffffb9258416 ptr=0xffff888a1dc61f00 bytes_req=240 bytes_alloc=256 gfp_flags=GFP_KERNEL|__GFP_ZERO
7.50% call_site=ffffffffb9258416 ptr=0xffff888a2640ac00 bytes_req=240 bytes_alloc=256 gfp_flags=GFP_KERNEL|__GFP_ZERO
2.50% call_site=ffffffffb92759ba ptr=0xffff888a26776000 bytes_req=4096 bytes_alloc=4096 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb9276864 ptr=0xffff8886f6b82600 bytes_req=136 bytes_alloc=192 gfp_flags=GFP_KERNEL|__GFP_ZERO
2.50% call_site=ffffffffb9276903 ptr=0xffff888aefcf0460 bytes_req=32 bytes_alloc=32 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb92ad0ce ptr=0xffff888756c98a00 bytes_req=392 bytes_alloc=512 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb92ad0ce ptr=0xffff888756c9ba00 bytes_req=504 bytes_alloc=512 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb92ad301 ptr=0xffff888a31747600 bytes_req=128 bytes_alloc=128 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb92ad511 ptr=0xffff888a9d26a2a0 bytes_req=28 bytes_alloc=32 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb936a7fb ptr=0xffff88873e8c11a0 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb936a7fb ptr=0xffff88873e8c12c0 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb936a7fb ptr=0xffff88873e8c1540 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb936a7fb ptr=0xffff88873e8c15a0 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb936a7fb ptr=0xffff88873e8c15e0 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb936a7fb ptr=0xffff88873e8c16e0 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb936a7fb ptr=0xffff88873e8c1c20 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb936a7fb ptr=0xffff888a9d26a2a0 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL
2.50% call_site=ffffffffb9373e66 ptr=0xffff8889f1931240 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_ATOMIC|__GFP_ZERO
2.50% call_site=ffffffffb9373e66 ptr=0xffff8889f1931980 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_ATOMIC|__GFP_ZERO
2.50% call_site=ffffffffb9373e66 ptr=0xffff8889f1931a00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_ATOMIC|__GFP_ZERO
#
# # And then limiting using the example for 'perf kmem stat --time' used
# # in the previous changeset committer note we see that there were no
# # kmem:kmalloc in that last part of the file, but there were some
# # kmem:kmem_cache_alloc ones:
#
# perf report --time 20119.782088, --stdio
#
# Total Lost Samples: 0
#
# Samples: 0 of event 'kmem:kmalloc'
# Event count (approx.): 0
#
# Overhead Trace output
# ........ ............
#
# Samples: 0 of event 'kmem:kmalloc_node'
# Event count (approx.): 0
#
# Overhead Trace output
# ........ ............
#
# Samples: 0 of event 'kmem:kfree'
# Event count (approx.): 0
#
# Overhead Trace output
# ........ ............
#
# Samples: 8 of event 'kmem:kmem_cache_alloc'
# Event count (approx.): 8
#
# Overhead Trace output
# ........ ..................................................................................................................
#
75.00% call_site=ffffffffb9333b42 ptr=0xffff888bdf1a39c0 bytes_req=48 bytes_alloc=48 gfp_flags=GFP_NOFS|__GFP_ZERO
12.50% call_site=ffffffffb90ad33a ptr=0xffff8889f071f6e0 bytes_req=160 bytes_alloc=160 gfp_flags=GFP_ATOMIC|__GFP_NOTRACK
12.50% call_site=ffffffffb9287cc1 ptr=0xffff8889b12722d8 bytes_req=104 bytes_alloc=104 gfp_flags=GFP_NOFS|__GFP_ZERO
#
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1480439746-42695-7-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
'perf report --stdio' will colorize entries with most hits and possibly
some other aspects of its output, but those colors gets suppressed if we
redirect the output to a non-tty, allow keeping the colors by adding a
new option, --stdio-color, now this use case will also output escape
sequences for colors:
$ perf annotate --stdio-color | more
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3iuawqjldu4i8gziot7e3d5n@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add "srcline_from" and "srcline_to" branch sort keys that allow to show
the source lines of a branch.
That makes it much easier to track down where particular branches happen
in the program, for example to examine branch mispredictions, or to
associate it with cycle counts:
% perf record -b -e cycles:p ./tcall
% perf report --sort srcline_from,srcline_to,mispredict
...
15.10% tcall.c:18 tcall.c:10 N
14.83% tcall.c:11 tcall.c:5 N
14.12% tcall.c:7 tcall.c:12 N
14.04% tcall.c:12 tcall.c:5 N
12.42% tcall.c:17 tcall.c:18 N
12.39% tcall.c:7 tcall.c:13 N
12.27% tcall.c:13 tcall.c:17 N
...
% perf report --sort srcline_from,srcline_to,cycles
...
17.12% tcall.c:18 tcall.c:11 1
17.01% tcall.c:12 tcall.c:6 1
16.98% tcall.c:11 tcall.c:6 1
15.91% tcall.c:17 tcall.c:18 1
6.38% tcall.c:7 tcall.c:17 7
4.80% tcall.c:7 tcall.c:12 8
4.21% tcall.c:7 tcall.c:17 8
2.67% tcall.c:7 tcall.c:12 7
2.62% tcall.c:7 tcall.c:12 10
2.10% tcall.c:7 tcall.c:17 9
1.58% tcall.c:7 tcall.c:12 6
1.44% tcall.c:7 tcall.c:12 5
1.38% tcall.c:7 tcall.c:12 9
1.06% tcall.c:7 tcall.c:17 13
1.05% tcall.c:7 tcall.c:12 4
1.01% tcall.c:7 tcall.c:17 6
Open issues:
- Some kernel symbols get misresolved.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1463775308-32748-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We cannot limit processing stacks from the current value of the sysctl,
as we may be processing perf.data files, possibly from other machines.
Instead use the old PERF_MAX_STACK_DEPTH, the sysctl default, that can
be overriden using --max-stack or equivalent.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Fixes: 4cb93446c587 ("perf tools: Set the maximum allowed stack from /proc/sys/kernel/perf_event_max_stack")
Link: http://lkml.kernel.org/n/tip-eqeutsr7n7wy0c36z24ytvii@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
/proc/sys/kernel/perf_event_max_stack
There is an upper limit to what tooling considers a valid callchain,
and it was tied to the hardcoded value in the kernel,
PERF_MAX_STACK_DEPTH (127), now that this can be tuned via a sysctl,
make it read it and use that as the upper limit, falling back to
PERF_MAX_STACK_DEPTH for kernels where this sysctl isn't present.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-yjqsd30nnkogvj5oyx9ghir9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1458823940-24583-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The --hierarchy option is to show output in hierarchy mode. It extends
folding/unfolding in the TUI and GTK browsers to support sort items as
well as callchains. Users can toggle the items to see the performance
result at wanted level.
$ perf report --hierarchy --tui
Overhead Command / Shared Object / Symbol
--------------------------------------------------
+ 32.96% gnome-shell
- 15.11% swapper
- 14.97% [kernel.vmlinux]
6.82% [k] intel_idle
0.66% [k] menu_select
0.43% [k] __hrtimer_start_range_ns
...
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1456326830-30456-17-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The --percent-limit option was changed to be applied to callchains as
well as to hist entries recently, but it missed to update the doc.
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1454508683-5735-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The description of the memory sort key (used by --mem-mode) was
misplaced. Move it under the --sort option so that it can be referenced
properly.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1454508683-5735-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1451991518-25673-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The --raw-trace option allows disabling pretty printing by the event's
print_fmt or plugin. Besides that, each dynamic sort key now can
receive a 'raw' suffix separated by '/' to ask for the raw trace of a
specific field.
$ perf report -s comm,kmem:kmalloc.gfp_flags
...
# Overhead Command gfp_flags
# ........ ....... ...................
#
99.89% perf GFP_NOFS|GFP_ZERO
0.06% sleep GFP_KERNEL
0.03% perf GFP_KERNEL|GFP_ZERO
0.01% perf GFP_KERNEL
Now
$ perf report -s comm,kmem:kmalloc.gfp_flags --raw-trace
or
$ perf report -s comm,kmem:kmalloc.gfp_flags/raw
...
# Overhead Command gfp_flags
# ........ ....... ..........
#
99.89% perf 32848
0.06% sleep 208
0.03% perf 32976
0.01% perf 208
Suggested-and-Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-9-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now -g/--call-graph option supports how to display callchain values.
Possible values are 'percent', 'period' and 'count'. The percent is
same as before and it's the default behavior. The period displays the
raw period value rather than the percentage. The count displays the
number of occurrences.
$ perf report --no-children --stdio -g percent
...
39.93% swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--28.63%-- start_secondary
|
--11.30%-- rest_init
$ perf report --no-children --show-total-period --stdio -g period
...
39.93% 13018705 swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--9334403-- start_secondary
|
--3684302-- rest_init
$ perf report --no-children --show-nr-samples --stdio -g count
...
39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--57-- start_secondary
|
--23-- rest_init
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1447047946-1691-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add new call chain option (-g) 'folded' to print callchains in a line.
The callchains are separated by semicolons, and preceded by (absolute)
percent values and a space.
For example, the following 20 lines can be printed in 3 lines with the
folded output mode:
$ perf report -g flat --no-children | grep -v ^# | head -20
60.48% swapper [kernel.vmlinux] [k] intel_idle
54.60%
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
start_secondary
5.88%
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
$ perf report -g folded --no-children | grep -v ^# | head -3
60.48% swapper [kernel.vmlinux] [k] intel_idle
54.60% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
5.88% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
This mode is supported only for --stdio now and intended to be used by
some scripts like in FlameGraphs[1]. Support for other UI might be
added later.
[1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
Requested-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1447047946-1691-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
So that it can be more consistent with other --show-* options. The old
name (--showcpuutilization) is provided only for compatibility.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445701767-12731-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The --call-graph option is complex so we should provide better guide for
users. Also change help message to be consistent with config option
names. Now perf top will show help like below:
$ perf top --call-graph
Error: option `call-graph' requires a value
Usage: perf top [<options>]
--call-graph <record_mode[,record_size],print_type,threshold[,print_limit],order,sort_key[,branch]>
setup and enables call-graph (stack chain/backtrace):
record_mode: call graph recording mode (fp|dwarf|lbr)
record_size: if record_mode is 'dwarf', max size of stack recording (<bytes>)
default: 8192 (bytes)
print_type: call graph printing style (graph|flat|fractal|none)
threshold: minimum call graph inclusion threshold (<percent>)
print_limit: maximum number of call graph entry (<number>)
order: call graph order (caller|callee)
sort_key: call graph sort key (function|address)
branch: include last branch info to call graph (branch)
Default: fp,graph,0.5,caller,function
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445524112-5201-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Which is the most common default found in other similar tools.
Requested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://www.youtube.com/watch?v=nXaxk27zwlk
Link: http://lkml.kernel.org/n/tip-v8lq36aispvdwgxdmt9p9jd9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The --max_stack option was added as an optimization to reduce processing time,
so people specifying --max-stack might get a increased processing time if
combined with synthesized callchains, but otherwise no real harm.
A warning about setting both --max_stack and the synthesized callchains max
depth seems like overkill. Amend the documentation.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/560A5155.4060105@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Introduce --socket-filter option for 'perf report' to only show entries
for a processor socket that match this filter.
$ perf report --socket-filter 1 --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 752 of event 'cycles'
# Event count (approx.): 350995599
# Processor Socket: 1
#
# Overhead Command Shared Object Symbol
# ........ ......... ................ .................................
#
97.02% test test [.] plusB_c
0.97% test test [.] plusA_c
0.23% swapper [kernel.vmlinux] [k] acpi_idle_do_entry
0.09% rcu_sched [kernel.vmlinux] [k] dyntick_save_progress_counter
0.01% swapper [kernel.vmlinux] [k] task_waking_fair
0.00% swapper [kernel.vmlinux] [k] run_timer_softirq
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1441377946-44429-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This patch enable perf report to sort by processor socket:
$ perf report --stdio --sort socket,comm,dso,symbol
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 686 of event 'cycles'
# Event count (approx.): 349215462
#
# Overhead SOCKET Command Shared Object Symbol
# ........ ...... ....... ................ ............................
#
97.05% 000 test test [.] plusB_c
0.98% 000 test test [.] plusA_c
0.93% 001 perf [kernel.vmlinux] [k] smp_call_function_single
0.19% 001 perf [kernel.vmlinux] [k] page_fault
0.19% 001 swapper [kernel.vmlinux] [k] pm_qos_request
0.16% 000 test [kernel.vmlinux] [k] add_mm_counter_fast
Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1441377946-44429-2-git-send-email-kan.liang@intel.com
[ Fix col calc, un-allcapsify col header & read the topology when not using perf.data ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|