path: root/tools
Age | Commit message | Author
2017-09-18 | selftests: net: More graceful finding of `ip'. | Daniel Díaz
The ip tool might be provided by another package (such as Busybox), not necessarily implementing the -Version switch. Trying an actual usage (`ip link show') might be a better test that would work with all implementations of `ip'. Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
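The idea in this message, probing a tool by actually using it instead of asking for its version, can be sketched as follows. This is only an illustration in Python, not the selftest's shell code; the helper name `tool_works` is hypothetical.

```python
import subprocess


def tool_works(argv):
    """Return True if running argv succeeds (exit status 0).

    Hypothetical helper: it probes a tool by real usage rather than by
    a version switch that some implementations may not provide.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The binary is missing or cannot be executed.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Probe `ip` with a real subcommand, as the commit message suggests.
    print("ip usable:", tool_works(["ip", "link", "show"]))
```

A probe of this kind should behave the same for any implementation of `ip` that supports `link show`, which is the point the commit message makes.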
2017-09-18 | perf tools: Fix leaking rec_argv in error cases | Martin Kepplinger
Let's free the allocated rec_argv in case we return early, in order to avoid leaking memory. This adds free() at a few very similar places across the tree where it was missing. Signed-off-by: Martin Kepplinger <martink@posteo.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Martin kepplinger <martink@posteo.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170913191419.29806-1-martink@posteo.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-18 | perf pmu: Improve error messages for missing PMUs | Andi Kleen
When a PMU is missing print a better error message mentioning the missing PMU.
  % mkdir empty
  % mount --bind empty /sys/devices/msr
  % perf stat -M Summary true
  event syntax error: '{inst_retired.any,cycles}:W,{cpu_clk_unhalted.thread}:W,{inst_retired.any}:W,{cpu_clk_unhalted.ref_tsc,msr/tsc/}:W,{fp_comp_ops_exe.sse_scalar..'
                       \___ Cannot find PMU `msr'. Missing kernel support?
It still cannot find the right column for aliases, but it's already a vast improvement.
v2: Check asprintf
Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170913215006.32222-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
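As a rough illustration of the behaviour described here, the sketch below builds the same kind of message when a requested PMU is not among the available ones. It is written in Python for brevity; the function name `missing_pmu_message` and the idea of passing the available PMUs as a set are assumptions made for this example, not perf's actual code.

```python
def missing_pmu_message(pmu, available_pmus):
    """Return an error string naming a missing PMU, or None if it exists.

    `available_pmus` is a set of PMU names supplied by the caller
    (hypothetical; perf discovers PMUs itself).
    """
    if pmu in available_pmus:
        return None
    return "Cannot find PMU `%s'. Missing kernel support?" % pmu


if __name__ == "__main__":
    print(missing_pmu_message("msr", {"cpu"}))
```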
2017-09-18 | perf machine: Optimize a bit the machine__findnew_thread() methods | Arnaldo Carvalho de Melo
In some cases we already have calculated the hash bucket, so reuse it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-800zehjsyy03er4s4jf0e99v@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-18 | perf machine: Use hashtable for machine threads | Kan Liang
To process any event, perf first needs to find the thread in the machine. The machine maintains an rb tree to store all threads, and the rb tree is protected by a rw lock. That is not a problem for current perf, which processes events serially. However, it becomes a scalability issue when events are processed in parallel, especially on a heavily loaded system with many threads. Introduce a hashtable to divide the big rb tree into many small rb trees for threads. The index is thread id % hashtable size. This reduces lock contention.
Committer notes: Renamed some variables and function names to reduce semantic confusion:
  'struct threads' pointers: thread -> threads
  threads hashtable index: tid -> hash_bucket
  struct threads *machine__thread() -> machine__threads()
Cast tid to (unsigned int) to handle -1 in machine__threads() (Kan Liang)
Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1505096603-215017-2-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
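The bucketing scheme described in this message can be sketched as below. This is a Python illustration under stated assumptions, not perf's C implementation: the table size of 256 is assumed, a plain dict stands in for each small rb tree, a simple mutex stands in for the rw lock, and the names `hash_bucket`, `ThreadTable` and `findnew` are hypothetical.

```python
import threading

TABLE_SIZE = 256  # assumed bucket count for this sketch


def hash_bucket(tid, table_size=TABLE_SIZE):
    """Map a thread id to a bucket index: tid % table size.

    The tid is first reduced to an unsigned 32-bit value, mirroring the
    (unsigned int) cast in the committer notes, so that tid == -1 still
    yields a valid index (in C, -1 % 256 would be negative).
    """
    return (tid & 0xFFFFFFFF) % table_size


class ThreadTable:
    """Threads split across buckets, each bucket with its own lock."""

    def __init__(self, table_size=TABLE_SIZE):
        self.table_size = table_size
        # One dict per bucket stands in for one small rb tree.
        self.buckets = [{} for _ in range(table_size)]
        self.locks = [threading.Lock() for _ in range(table_size)]

    def findnew(self, tid):
        """Return the entry for tid, creating it if it does not exist."""
        index = hash_bucket(tid, self.table_size)
        with self.locks[index]:  # only this bucket is locked
            return self.buckets[index].setdefault(tid, {"tid": tid})
```

Because two lookups contend only when their thread ids land in the same bucket, parallel event processing should spend less time waiting on a lock than with a single tree.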
2017-09-17 | Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull misc fixes from Thomas Gleixner:
- A fix for a user space regression in /proc/$PID/stat
- A couple of objtool fixes:
  ~ Plug a memory leak
  ~ Avoid accessing empty sections which upsets certain binutil versions
  ~ Prevent corrupting the obj file when section sizes did not change
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  fs/proc: Report eip/esp in /prod/PID/stat for coredumping
  objtool: Fix object file corruption
  objtool: Do not retrieve data from empty sections
  objtool: Fix memory leak in elf_create_rela_section()
2017-09-16 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds
Pull networking fixes from David Miller:
1) Fix hotplug deadlock in hv_netvsc, from Stephen Hemminger.
2) Fix double-free in rmnet driver, from Dan Carpenter.
3) INET connection socket layer can double put request sockets, fix from Eric Dumazet.
4) Don't match collect metadata-mode tunnels if the device is down, from Haishuang Yan.
5) Do not perform TSO6/GSO on ipv6 packets with extensions headers in be2net driver, from Suresh Reddy.
6) Fix scaling error in gen_estimator, from Eric Dumazet.
7) Fix 64-bit statistics deadlock in systemport driver, from Florian Fainelli.
8) Fix use-after-free in sctp_sock_dump, from Xin Long.
9) Reject invalid BPF_END instructions in verifier, from Edward Cree.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
  Documentation: link in networking docs
  tcp: fix data delivery rate
  bpf/verifier: reject BPF_ALU64|BPF_END
  sctp: do not mark sk dumped when inet_sctp_diag_fill returns err
  sctp: fix an use-after-free issue in sctp_sock_dump
  netvsc: increase default receive buffer size
  tcp: update skb->skb_mstamp more carefully
  net: ipv4: fix l3slave check for index returned in IP_PKTINFO
  net: smsc911x: Quieten netif during suspend
  net: systemport: Fix 64-bit stats deadlock
  net: vrf: avoid gcc-4.6 warning
  qed: remove unnecessary call to memset
  tg3: clean up redundant initialization of tnapi
  tls: make tls_sw_free_resources static
  sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
  MAINTAINERS: review Renesas DT bindings as well
  net_sched: gen_estimator: fix scaling error in bytes/packets samples
  nfp: wait for the NSP resource to appear on boot
  nfp: wait for board state before talking to the NSP
  ...
2017-09-15 | bpf/verifier: reject BPF_ALU64|BPF_END | Edward Cree
Neither ___bpf_prog_run nor the JITs accept it. Also adds a new test case. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
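The check this commit describes can be illustrated with the opcode encoding. The sketch below is in Python rather than the verifier's C; the constants are the instruction class and operation values from the Linux BPF UAPI headers, while the function names are hypothetical and the real verifier checks far more than this one combination.

```python
# Instruction class and operation values from the Linux BPF UAPI headers.
BPF_ALU = 0x04
BPF_ALU64 = 0x07
BPF_END = 0xD0


def bpf_class(code):
    """Extract the instruction class (low three bits of the opcode)."""
    return code & 0x07


def bpf_op(code):
    """Extract the operation (high four bits of the opcode)."""
    return code & 0xF0


def is_alu64_end(code):
    """Return True for the BPF_ALU64|BPF_END combination.

    Per the commit message, neither the interpreter nor the JITs
    accepted this combination at the time, so a verifier should
    reject it.
    """
    return bpf_class(code) == BPF_ALU64 and bpf_op(code) == BPF_END
```

Note that the operation bits only identify a byte-swap within the ALU classes; the same bit pattern means something else in the jump classes, which is why the class is tested first.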
2017-09-15 | objtool: Fix object file corruption | Josh Poimboeuf
Arnd Bergmann reported that a randconfig build was failing with the following link error: built-in.o: member arch/x86/kernel/time.o in archive is not an object It turns out the link failed because the time.o file had been corrupted by objtool: nm: arch/x86/kernel/time.o: File format not recognized In certain rare cases, when a .o file's ORC table is very small, the .data section size doesn't change because it's page aligned. Because all the existing sections haven't changed size, libelf doesn't detect any section header changes, and so it doesn't update the section header table properly. Instead it writes junk in the section header entries for the new ORC sections. Make sure libelf properly updates the section header table by setting the ELF_F_DIRTY flag in the top level elf struct. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 627fce14809b ("objtool: Add ORC unwind table generation") Link: http://lkml.kernel.org/r/e650fd0f2d8a209d1409a9785deb101fdaed55fb.1505459813.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-15 | objtool: Do not retrieve data from empty sections | Petr Vandrovec
Binutils 2.29-9 in Debian returns an error when elf_getdata is invoked on an empty section (.note.GNU-stack in all kernel files), causing immediate failure of the kernel build with: elf_getdata: can't manipulate null section As nothing is done with sections that have zero size, just do not retrieve their data at all. Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2ce30a44349065b70d0f00e71e286dc0cbe745e6.1505459652.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-15 | objtool: Fix memory leak in elf_create_rela_section() | Martin Kepplinger
Let's free the allocated char array 'relaname' before returning, in order to avoid leaking memory. Signed-off-by: Martin Kepplinger <martink@posteo.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: mingo.kernel.org@gmail.com Link: http://lkml.kernel.org/r/20170914060138.26472-1-martink@posteo.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 | Merge tag 'kbuild-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild | Linus Torvalds
Pull Kbuild updates from Masahiro Yamada:
- Use Make-builtin $(abspath ...) helper to get absolute path
- Add W=2 extra warning option to detect unused macros
- Use more KCONFIG_CONFIG instead of hard-coded .config
- Fix bugs of tar*-pkg targets
* tag 'kbuild-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: buildtar: do not print successful message if tar returns error
  kbuild: buildtar: fix tar error when CONFIG_MODULES is disabled
  kbuild: Use KCONFIG_CONFIG in buildtar
  Kbuild: enable -Wunused-macros warning for "make W=2"
  kbuild: use $(abspath ...) instead of $(shell cd ... && /bin/pwd)
2017-09-13 | mm: treewide: remove GFP_TEMPORARY allocation flag | Michal Hocko
GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. Its primary motivation was to allow users to tell that an allocation is short lived, so that the allocator can try to place such allocations close together and prevent long term fragmentation. As much as this sounds like a reasonable semantic, it becomes much less clear when to use the highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the context holding that memory sleep? Can it take locks? It seems there is no good answer for those questions.
The current implementation of GFP_TEMPORARY is basically GFP_KERNEL | __GFP_RECLAIMABLE, which in itself is tricky because basically none of the existing callers provide a way to reclaim the allocated memory. So this is rather misleading and hard to evaluate for any benefits.
I have checked some random users and none of them has added the flag with a specific justification. I suspect most of them just copied from other existing users and others just thought it might be a good idea to use without any measuring. This suggests that GFP_TEMPORARY just encourages cargo cult usage without any reasoning.
I believe that our gfp flags are quite complex already and especially those with highlevel semantic should be clearly defined to prevent confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and replacing all existing users to simply use GFP_KERNEL. Please note that SLAB users with shrinkers will still get the __GFP_RECLAIMABLE heuristic and so they will be placed properly for memory fragmentation prevention.
I can see reasons we might want some gfp flag to reflect short-term allocations but I propose starting from a clear semantic definition and only then adding users with proper justification.
This was brought up before LSF this year by Matthew [1] and it turned out that GFP_TEMPORARY really doesn't have a clear semantic. It seems to be a heuristic without any measured advantage for most (if not all) of its current users. The follow up discussion has revealed that opinions on what might be a temporary allocation differ a lot between developers. So rather than trying to tweak existing users into a semantic which they haven't expected I propose to simply remove the flag and start from scratch if we really need a semantic for short term allocations.
[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
[akpm@linux-foundation.org: fix typo] [akpm@linux-foundation.org: coding-style fixes] [sfr@canb.auug.org.au: drm/i915: fix up]
Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Neil Brown <neilb@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13 | perf vendor events: Add JSON metrics for Skylake server | Andi Kleen
Add JSON metrics for Skylake server. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13 | perf vendor events: Add JSON metrics for Broadwell DE | Andi Kleen
Add JSON metrics for Broadwell DE. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13 | perf vendor events: Add JSON metrics for Broadwell Server | Andi Kleen
Add JSON metrics for Broadwell Server. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13 | perf vendor events: Add JSON metrics for Haswell EP | Andi Kleen
Add JSON metrics for Haswell EP. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13 | perf vendor events: Add JSON metrics for Ivy Town | Andi Kleen
Add JSON metrics for Ivy Town. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13 | perf vendor events: Add JSON metrics for Haswell | Andi Kleen
Add JSON metrics for Haswell. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13 | perf vendor events: Add JSON metrics for Ivy Bridge | Andi Kleen
Add JSON metrics for Ivy Bridge. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13 | perf vendor events: Add JSON metrics for Sandy Bridge EP | Andi Kleen
Add JSON metrics for Sandy Bridge EP. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13 | perf vendor events: Add JSON metrics for Sandy Bridge | Andi Kleen
Add JSON metrics for Sandy Bridge. Committer testing: # grep "model name" /proc/cpuinfo | head -1 model name : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz # perf list metricgroup List of pre-defined events (to be used in -e): Metric Groups: DSB FLOPS Frontend Frontend_Bandwidth Pipeline Ports_Utilization Power SMT Summary TopDownL1 # perf stat -M Power --metric-only -a sleep 1 Performance counter stats for 'system wide': Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency 0.8 0.0 98.1 0.0 0.0 0.0 23.4 0.0 1.001153658 seconds time elapsed # perf stat -v -M Power --metric-only -a sleep 1 Using CPUID GenuineIntel-6-2A metric expr cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc for Turbo_Utilization found event cpu_clk_unhalted.thread found event cpu_clk_unhalted.ref_tsc metric expr (cstate_core@c3\-residency@ / msr@tsc@) * 100 for C3_Core_Residency found event cstate_core/c3-residency/ found event msr/tsc/ metric expr (cstate_core@c6\-residency@ / msr@tsc@) * 100 for C6_Core_Residency found event cstate_core/c6-residency/ found event msr/tsc/ metric expr (cstate_core@c7\-residency@ / msr@tsc@) * 100 for C7_Core_Residency found event cstate_core/c7-residency/ found event msr/tsc/ metric expr (cstate_pkg@c2\-residency@ / msr@tsc@) * 100 for C2_Pkg_Residency found event cstate_pkg/c2-residency/ found event msr/tsc/ metric expr (cstate_pkg@c3\-residency@ / msr@tsc@) * 100 for C3_Pkg_Residency found event cstate_pkg/c3-residency/ found event msr/tsc/ metric expr (cstate_pkg@c6\-residency@ / msr@tsc@) * 100 for C6_Pkg_Residency found event cstate_pkg/c6-residency/ found event msr/tsc/ metric expr (cstate_pkg@c7\-residency@ / msr@tsc@) * 100 for C7_Pkg_Residency found event cstate_pkg/c7-residency/ found event msr/tsc/ adding 
{cpu_clk_unhalted.thread,cpu_clk_unhalted.ref_tsc}:W,{cstate_core/c3-residency/,msr/tsc/}:W,{cstate_core/c6-residency/,msr/tsc/}:W,{cstate_core/c7-residency/,msr/tsc/}:W,{cstate_pkg/c2-residency/,msr/tsc/}:W,{cstate_pkg/c3-residency/,msr/tsc/}:W,{cstate_pkg/c6-residency/,msr/tsc/}:W,{cstate_pkg/c7-residency/,msr/tsc/}:W cpu_clk_unhalted.thread -> cpu/event=0x3c/ cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/ Weak group for cstate_pkg/c2-residency//2 failed Weak group for cstate_pkg/c3-residency//2 failed Weak group for cstate_pkg/c6-residency//2 failed Weak group for cstate_pkg/c7-residency//2 failed cpu_clk_unhalted.thread: 5564185 4002833569 4002833569 cpu_clk_unhalted.ref_tsc: 7325424 4002833569 4002833569 cstate_core/c3-residency/: 68293 4003027101 4003027101 msr/tsc/: 12451294472 4003027101 4003027101 cstate_core/c6-residency/: 12238830163 4003260984 4003260984 msr/tsc/: 12452017806 4003260984 4003260984 cstate_core/c7-residency/: 0 4003489648 4003489648 msr/tsc/: 12452725162 4003489648 4003489648 cstate_pkg/c2-residency/: 1830054 1000913138 1000913138 msr/tsc/: 12453441079 4003717513 4003717513 cstate_pkg/c3-residency/: 0 1000973570 1000973570 msr/tsc/: 12454177865 4003954758 4003954758 cstate_pkg/c6-residency/: 2940448859 1001032370 1001032370 msr/tsc/: 12454833890 4004166118 4004166118 cstate_pkg/c7-residency/: 0 1001049818 1001049818 msr/tsc/: 12454919470 4004194204 4004194204 Performance counter stats for 'system wide': Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency 0.8 0.0 98.3 0.0 0.0 0.0 23.6 0.0 1.001126519 seconds time elapsed # Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
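The metric expressions printed in the verbose run can be checked by hand against the raw counts shown in the same output. The short Python sketch below does that arithmetic for three of the Power metrics; the helper name `residency_percent` is hypothetical, and each residency count is paired with the msr/tsc/ value printed directly after it.

```python
def residency_percent(residency, tsc):
    """Evaluate an expression of the form (residency / tsc) * 100."""
    return residency / tsc * 100


# Raw counts copied from the verbose perf stat output above.
turbo_utilization = 5564185 / 7325424  # cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc
c6_core_residency = residency_percent(12238830163, 12452017806)
c6_pkg_residency = residency_percent(2940448859, 12454833890)

if __name__ == "__main__":
    print(round(turbo_utilization, 1))
    print(round(c6_core_residency, 1))
    print(round(c6_pkg_residency, 1))
```

Rounded to one decimal place, these agree with the Turbo_Utilization, C6_Core_Residency and C6_Pkg_Residency columns of the second run (0.8, 98.3 and 23.6).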
2017-09-13 | perf vendor events: Add JSON metrics for Skylake | Andi Kleen
Add JSON metrics for Skylake. Committer testing: # grep "model name" /proc/cpuinfo | head -1 model name : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz # uname -a Linux seventh 4.12.0-rc6+ #1 SMP Fri Jun 30 16:40:55 -03 2017 x86_64 x86_64 x86_64 GNU/Linux # perf stat --metric-only -M Summary -a sleep 1 Performance counter stats for 'system wide': Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization 34021097.0 0.0 119424171.0 0.0 0.0 0.0 0.0 1.001001793 seconds time elapsed # perf list metricgroup List of pre-defined events (to be used in -e): Metric Groups: DSB FLOPS Frontend Frontend_Bandwidth Memory_BW Memory_Bound Memory_Lat Pipeline Ports_Utilization Power SMT Summary TLB TopDownL1 Unknown_Branches # perf stat --metric-only -M Ports_Utilization -a sleep 1 Performance counter stats for 'system wide': ILP 1475828.0 1.000688547 seconds time elapsed # perf stat -v --metric-only -M Ports_Utilization -a sleep 1 Using CPUID GenuineIntel-6-9E metric expr uops_executed.thread / ( uops_executed.core_cycles_ge_1 / 2) if #smt_on else uops_executed.core_cycles_ge_1 for ILP found event uops_executed.thread found event uops_executed.core_cycles_ge_1 adding {uops_executed.thread,uops_executed.core_cycles_ge_1}:W uops_executed.thread -> cpu/umask=0x1,period=2000003,event=0xb1/ uops_executed.core_cycles_ge_1 -> cpu/umask=0x2,period=2000003,cmask=1,event=0xb1/ uops_executed.thread: 8115271 4002547654 4002547654 uops_executed.core_cycles_ge_1: 3282969 4002547654 4002547654 Performance counter stats for 'system wide': ILP 3282969.0 1.000719870 seconds time elapsed # Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
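The ILP expression in the verbose output is conditional on `#smt_on`. A small Python sketch of how it evaluates is given below, assuming the conditional binds more loosely than the division (which is how Python reads the same shape of expression, and which is consistent with the value printed in the verbose run). The function name `ilp` is hypothetical.

```python
def ilp(uops_executed_thread, core_cycles_ge_1, smt_on):
    """Evaluate the ILP metric expression from the verbose output.

    With SMT on:  uops_executed.thread / (uops_executed.core_cycles_ge_1 / 2)
    With SMT off: uops_executed.core_cycles_ge_1
    """
    if smt_on:
        return uops_executed_thread / (core_cycles_ge_1 / 2)
    return core_cycles_ge_1


if __name__ == "__main__":
    # Raw counts from the verbose run above, with SMT assumed off.
    print(float(ilp(8115271, 3282969, smt_on=False)))
```

With the counts from the verbose run and SMT taken to be off, the result equals the uops_executed.core_cycles_ge_1 count, matching the ILP value of 3282969.0 that perf printed.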
2017-09-13 | perf vendor events: Add JSON metrics for Broadwell | Andi Kleen
Add JSON metrics for Broadwell. Committer testing: # uname -a Linux jouet 4.13.0-rc7+ #3 SMP Sat Sep 2 09:04:44 -03 2017 x86_64 x86_64 x86_64 GNU/Linux # grep "model name" /proc/cpuinfo | head -1 model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz # perf list metricgroup List of pre-defined events (to be used in -e): Metric Groups: DSB FLOPS Frontend Frontend_Bandwidth Memory_BW Memory_Bound Memory_Lat Pipeline Ports_Utilization Power SMT Summary TLB TopDownL1 Unknown_Branches # perf stat -M Power --metric-only -a sleep 1 Performance counter stats for 'system wide': Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.003502904 seconds time elapsed # # perf stat -M Memory_BW --metric-only -a sleep 1 Performance counter stats for 'system wide': MLP 1.7 1.001364525 seconds time elapsed # # perf stat -M TLB --metric-only -a sleep 1 Performance counter stats for 'system wide': Page_Walks_Utilization 0.1 1.005962198 seconds time elapsed # # perf stat -M Summary --metric-only -a sleep 1 Performance counter stats for 'system wide': Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization 7281856697.0 0.0 11150898087.0 1.0 0.0 1.0 0.7 1.012134025 seconds time elapsed # Running in verbose mode shows which counters and expressions are being used: # perf stat -v -M Summary --metric-only -a sleep 1 Using CPUID GenuineIntel-6-3D metric expr 1 / inst_retired.any / cycles for CPI found event inst_retired.any found event cycles metric expr cpu_clk_unhalted.thread for CLKS found event cpu_clk_unhalted.thread metric expr inst_retired.any for Instructions found event inst_retired.any metric expr cpu_clk_unhalted.ref_tsc / msr@tsc@ for CPU_Utilization found event cpu_clk_unhalted.ref_tsc found event msr/tsc/ metric expr ( 1*( fp_arith_inst_retired.scalar_single + fp_arith_inst_retired.scalar_double ) + 2* 
fp_arith_inst_retired.128b_packed_double + 4*( fp_arith_inst_retired.128b_packed_single + fp_arith_inst_retired.256b_packed_double ) + 8* fp_arith_inst_retired.256b_packed_single ) / 1000000000 / duration_time for GFLOPs found event fp_arith_inst_retired.scalar_single found event fp_arith_inst_retired.scalar_double found event fp_arith_inst_retired.128b_packed_double found event fp_arith_inst_retired.128b_packed_single found event fp_arith_inst_retired.256b_packed_double found event fp_arith_inst_retired.256b_packed_single found event duration_time metric expr 1 - cpu_clk_thread_unhalted.one_thread_active / ( cpu_clk_thread_unhalted.ref_xclk_any / 2 ) if #smt_on else 0 for SMT_2T_Utilization found event cpu_clk_thread_unhalted.one_thread_active found event cpu_clk_thread_unhalted.ref_xclk_any metric expr cpu_clk_unhalted.ref_tsc:u / cpu_clk_unhalted.ref_tsc for Kernel_Utilization found event cpu_clk_unhalted.ref_tsc:u found event cpu_clk_unhalted.ref_tsc adding {inst_retired.any,cycles}:W,{cpu_clk_unhalted.thread}:W,{inst_retired.any}:W,{cpu_clk_unhalted.ref_tsc,msr/tsc/}:W,{fp_arith_inst_retired.scalar_single,fp_arith_inst_retired.scalar_double,fp_arith_inst_retired.128b_packed_double,fp_arith_inst_retired.128b_packed_single,fp_arith_inst_retired.256b_packed_double,fp_arith_inst_retired.256b_packed_single,duration_time}:W,{cpu_clk_thread_unhalted.one_thread_active,cpu_clk_thread_unhalted.ref_xclk_any}:W,{cpu_clk_unhalted.ref_tsc:u,cpu_clk_unhalted.ref_tsc}:W inst_retired.any -> cpu/event=0xc0/ cpu_clk_unhalted.thread -> cpu/event=0x3c/ inst_retired.any -> cpu/event=0xc0/ cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/ fp_arith_inst_retired.scalar_single -> cpu/umask=0x2,period=2000003,event=0xc7/ fp_arith_inst_retired.scalar_double -> cpu/umask=0x1,period=2000003,event=0xc7/ fp_arith_inst_retired.128b_packed_double -> cpu/umask=0x4,period=2000003,event=0xc7/ fp_arith_inst_retired.128b_packed_single -> cpu/umask=0x8,period=2000003,event=0xc7/ 
fp_arith_inst_retired.256b_packed_double -> cpu/umask=0x10,period=2000003,event=0xc7/ fp_arith_inst_retired.256b_packed_single -> cpu/umask=0x20,period=2000003,event=0xc7/ cpu_clk_thread_unhalted.one_thread_active -> cpu/umask=0x2,period=2000003,event=0x3c/ cpu_clk_thread_unhalted.ref_xclk_any -> cpu/umask=0x1,any=1,period=2000003,event=0x3c/ cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/ cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/ Weak group for fp_arith_inst_retired.scalar_single/7 failed Weak group for cpu_clk_unhalted.ref_tsc:u/2 failed inst_retired.any: 8704146437 4026374016 619883741 cycles: 11180800018 4026374016 619883741 cpu_clk_unhalted.thread: 11140030295 4026323772 931621933 inst_retired.any: 8643115117 4026260510 1243595906 cpu_clk_unhalted.ref_tsc: 10201638510 4026184297 1247351077 msr/tsc/: 10378022785 4026184297 1247351077 fp_arith_inst_retired.scalar_single: 134697 4026102728 1559210545 fp_arith_inst_retired.scalar_double: 274339 4026007348 1870014984 fp_arith_inst_retired.128b_packed_double: 1639 4025886054 1866736918 fp_arith_inst_retired.128b_packed_single: 0 4025776614 2175106569 fp_arith_inst_retired.256b_packed_double: 0 4025681734 1235551129 fp_arith_inst_retired.256b_packed_single: 0 4025582962 1232398454 duration_time: 0 4025552913 4025552913 cpu_clk_thread_unhalted.one_thread_active: 10505 4025474649 923893076 cpu_clk_thread_unhalted.ref_xclk_any: 394992110 4025474649 923893076 cpu_clk_unhalted.ref_tsc:u: 5341421014 4025360315 1231634198 cpu_clk_unhalted.ref_tsc: 10258278508 4025252611 307909362 Performance counter stats for 'system wide': Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization 8704146437.0 0.0 11140030295.0 1.0 0.0 1.0 0.5 1.006783654 seconds time elapsed # Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: 
http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13 | perf stat: Fall weak group back even for EBADF | Andi Kleen
It's not possible to run a package event and a per cpu event in the same group. This is used by some of the power metrics. They work correctly when not using a group. Normally weak groups should handle that, but in this case EBADF is returned instead of the normal EINVAL. $ strace -e perf_event_open ./perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1 Using CPUID GenuineIntel-6-3E perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = -1 EINVAL (Invalid argument) perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument) perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument) perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument) perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = 3 perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, 3, 0) = 4 perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 1, 0, 0) = -1 EBADF (Bad file descriptor) and perf errors out. Make weak groups trigger a fall back for EBADF too. Then this case works correctly: $ perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1 Using CPUID GenuineIntel-6-3E Weak group for cstate_pkg/c2-residency//2 failed cstate_pkg/c2-residency/: 476709882 1000598460 1000598460 msr/tsc/: 39625837911 12007369110 12007369110 Performance counter stats for 'system wide': 476,709,882 cstate_pkg/c2-residency/ 39,625,837,911 msr/tsc/ 1.000697588 seconds time elapsed This fixes perf stat -M Power ... 
$ perf stat -M Power --metric-only -a sleep 1 Performance counter stats for 'system wide': Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency 1.0 0.7 30.0 0.0 0.9 0.1 0.4 0.0 1.001240740 seconds time elapsed Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170905211324.32427-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
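The fallback behaviour described in this message can be sketched as follows. This is a simplified Python illustration, not perf's C code: `open_event` is a hypothetical caller-supplied function standing in for the perf_event_open() call, and cleanup of handles opened before the failure is omitted.

```python
import errno

# Errors that make a weak group fall back to ungrouped events.
# EBADF is the case this commit adds next to the usual EINVAL.
FALLBACK_ERRNOS = {errno.EINVAL, errno.EBADF}


def open_weak_group(events, open_event):
    """Open events as one group, or ungrouped if the group is refused.

    open_event(name, leader) is a hypothetical callable that returns a
    handle or raises OSError. Returns (handles, grouped).
    """
    handles = []
    leader = None
    try:
        for name in events:
            handle = open_event(name, leader)
            handles.append(handle)
            if leader is None:
                leader = handle  # the first event leads the group
        return handles, True
    except OSError as err:
        if err.errno not in FALLBACK_ERRNOS:
            raise  # any other error is still fatal
    # Weak group failed: retry every event on its own.
    return [open_event(name, None) for name in events], False
```

Treating EBADF like EINVAL is what lets the package event and the per-cpu event from the example be counted separately instead of aborting the run.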
2017-09-13 | perf tools: Make copyfile_offset() static | Arnaldo Carvalho de Melo
There is no usage outside util.c, and this is the only remaining reason for fcntl.h to be included in util.h (to get the loff_t definition in Alpine Linux), so make it static. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-2dzlsao7k6ihozs5karw6kpx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13 | perf config: Allow creating empty config set for config file autogeneration | Taeung Song
When there isn't a config file (e.g. ~/.perfconfig) or it is empty, the config set wasn't created. If the config set does not exist, a config file can't be autogenerated. So allow creating an empty config set in that case, so that config file autogeneration can be supported.
Before:
  $ rm -f ~/.perfconfig
  $ perf config --user report.children=false
  $ cat ~/.perfconfig
  cat: /root/.perfconfig: No such file or directory
But I think it should work even if there isn't a config file.
After:
  $ rm -f ~/.perfconfig
  $ perf config --user report.children=false
  $ cat ~/.perfconfig
  # this file is auto-generated.
  [report]
  children = false
NOTE: As a result, if perf_config_set__init() fails, it looks as if the config set isn't freed. But that isn't a problem, because the config set will be freed by perf_config_set__delete() at the end of cmd_config().
Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1504754336-9824-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
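The change in behaviour, returning an empty config set rather than none when the file is missing, might look like the following in outline. This is a Python sketch with a deliberately simplified parser; the function name `load_config_set` and the nested-dict representation are assumptions for illustration and do not mirror perf's data structures.

```python
import os


def load_config_set(path):
    """Return the config at path as {section: {key: value}}.

    A missing or empty file yields an empty set rather than no set,
    so a caller can still add values and generate the file.
    """
    config = {}
    if not os.path.exists(path):
        return config
    section = None
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            if line.startswith("[") and line.endswith("]"):
                section = line[1:-1]
                config.setdefault(section, {})
            elif "=" in line and section is not None:
                key, value = line.split("=", 1)
                config[section][key.strip()] = value.strip()
    return config
```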
2017-09-13perf config: Write a config file just onceTaeung Song
Currently set_config() can be called repeatedly, once for each input config, in a case like the one below: $ perf config kmem.default=slab report.children=false ... That is wasteful, so gather all the given config key=value pairs and write the config file just once. Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1504754331-9776-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
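The "parse everything, then write once" idea can be sketched as follows. This is a minimal illustration, not perf's actual code: `struct config_pair`, `parse_pair()` and `write_config_once()` are hypothetical names, and the output format is simplified to one `key = value` line per pair.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration only; these names are not perf's real API. */
struct config_pair {
	char key[64];
	char value[64];
};

/* Split "section.name=value" into key and value; returns 0 on success. */
static int parse_pair(const char *arg, struct config_pair *out)
{
	const char *eq = strchr(arg, '=');
	size_t klen;

	if (!eq || eq == arg)
		return -1;
	klen = (size_t)(eq - arg);
	if (klen >= sizeof(out->key) || strlen(eq + 1) >= sizeof(out->value))
		return -1;
	memcpy(out->key, arg, klen);
	out->key[klen] = '\0';
	strcpy(out->value, eq + 1);
	return 0;
}

/*
 * Parse every argument first, then emit all pairs in a single pass.
 * Returns the number of pairs written, or -1 if any argument is
 * malformed (in which case nothing is written).
 */
static int write_config_once(FILE *fp, int argc, const char **argv)
{
	struct config_pair pairs[16];
	int i;

	assert(fp && argv);
	if (argc < 0 || argc > 16)
		return -1;
	for (i = 0; i < argc; i++)
		if (parse_pair(argv[i], &pairs[i]) < 0)
			return -1;
	for (i = 0; i < argc; i++)
		fprintf(fp, "%s = %s\n", pairs[i].key, pairs[i].value);
	return argc;
}
```

Validating all pairs before the single write pass also means a malformed argument cannot leave a half-updated file behind.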
2017-09-13perf tools: Use scandir() to replace readdir()Kan Liang
In perf_event__synthesize_threads() perf goes through all proc files serially using readdir(). scandir() takes a snapshot of /proc, which is multithreading friendly. It's possible that some threads are added during event synthesis and are missed. But the number of lost threads should be small, and they should not impact the final analysis. Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1504806954-150842-3-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf ui progress: Add size info into progress barJiri Olsa
Adding the size values '[current/total]' into progress bar, to show more detailed progress of data reading. Adding new ui_progress__init_size function to specify we want to display the size. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170908120510.22515-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf ui progress: Add ui specific init functionJiri Olsa
Add a UI-specific init function that allows setting up the progress bar width based on the current screen scale. Add a TUI init function to get finer-grained updates of the progress bar. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170908120510.22515-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf tools: Add python-clean targetJiri Olsa
To be able to clean up only Python-related binaries. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170908084621.31595-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf script: Support user regsAndi Kleen
Teach perf script to print user regs. % perf record --user-regs=ip,sp ... % perf script -F ip,sym,uregs ... ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637 ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637 ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637 ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637 ffffffff9e00cc12 intel_pmu_handle_irq ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637 v2: Rebased on top of phys-addr patches Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170905184057.26135-1-andi@firstfloor.org [ Use PRIu64 for regs->abi in print_sample_uregs() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf record: Support direct --user-regs argumentsAndi Kleen
USER_REGS can currently only be collected implicitly with call graph recording. Sometimes it is useful to see them separately, and filter them. Add a new --user-regs option to record that is similar to --intr-regs, but acts on user regs. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170905170029.19722-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf stat: Update walltime_nsecs_stats in interval modeAndi Kleen
Some metrics (like GFLOPs) need walltime_nsecs_stats for each interval. Compute it for each interval instead of only at the end. Pointed out by Jiri. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-12-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf stat: Hide internal duration_time counterAndi Kleen
Some perf stat metrics use an internal "duration_time" metric. It is not correctly printed however. So hide it during output to avoid confusing users with 0 counts. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-11-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf stat: Support duration_time for metricsAndi Kleen
Some of the metric formulas (like GFLOPs) need to know how long the measurement period is. Support an internal event called duration_time, which reports time in seconds. It maps to the dummy event, but is special cased for statistics to report the walltime duration. So far it is not printed, but only used internally for metrics. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-10-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf stat: Don't use ctx for saved values lookupAndi Kleen
We don't need to use ctx to look up events for saved values. The context is already part of the evsel pointer, which is the primary key. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-9-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf list: Add metric groups to perf listAndi Kleen
Add code to perf list to print metric groups, and metrics that don't have an event name. The metricgroup code collects the eventgroups and events into a rblist, and then prints them according to the configured filters. The metricgroups are printed by default, but can be limited by perf list metric or perf list metricgroup % perf list metricgroup .. Metric Groups: DSB: DSB_Coverage [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] FLOPS: GFLOPs [Giga Floating Point Operations Per Second] Frontend: IFetch_Line_Utilization [Rough Estimation of fraction of fetched lines bytes that were likely consumed by program instructions] Frontend_Bandwidth: DSB_Coverage [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] Memory_BW: MLP [Memory-Level-Parallelism (average number of L1 miss demand load when there is at least 1 such miss)] v2: Check return value of asprintf to fix warning on FC26 Fix key in lookup/addition for the groups list Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-8-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf stat: Support JSON metrics in perf statAndi Kleen
Add generic support for standalone metrics specified in JSON files to perf stat. A metric is a formula that uses multiple events to compute a higher level result (e.g. IPC). Previously metrics were always tied to an event and automatically enabled with that event. Now change that so we can have standalone metrics. They are in the same JSON data structure as events, but don't have an event name. We also allow organizing the metrics into metric groups, which provide a shortcut to select several related metrics at once. Add a new -M / --metrics option to perf stat that adds the metrics or metric groups specified. Add the core code to manage and parse the metric groups. They are collected from the JSON data structures into a separate rblist. When computing shadow values look for metrics in that list. Then they are computed using the existing saved values infrastructure in stat-shadow.c. The actual JSON metrics are in a separate pull request. % perf stat -M Summary --metric-only -a sleep 1 Performance counter stats for 'system wide': Instructions CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization 317614222.0 1392930775.0 0.0 0.0 0.2 0.1 1.001497549 seconds time elapsed % perf stat -M GFLOPs flops Performance counter stats for 'flops': 3,999,541,471 fp_comp_ops_exe.sse_scalar_single # 1.2 GFLOPs (66.65%) 14 fp_comp_ops_exe.sse_scalar_double (66.65%) 0 fp_comp_ops_exe.sse_packed_double (66.67%) 0 fp_comp_ops_exe.sse_packed_single (66.70%) 0 simd_fp_256.packed_double (66.70%) 0 simd_fp_256.packed_single (66.67%) 0 duration_time 3.238372845 seconds time elapsed v2: Add missing header file v3: Move find_map to pmu.c Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
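To make "a metric is a formula that uses multiple events" concrete, here is a small sketch of how a GFLOPs-style value could be derived from the counters in the sample output above. The per-event weights (1, 2, 4, 8 operations per instruction) and the `struct fp_counts` / `gflops()` names are assumptions made for illustration; the real formula lives in the JSON metric files, which are not part of this commit.

```c
#include <assert.h>

/* Raw counts for the events shown in the 'perf stat -M GFLOPs' output. */
struct fp_counts {
	unsigned long long scalar_single;
	unsigned long long scalar_double;
	unsigned long long packed_double;
	unsigned long long packed_single;
	unsigned long long simd256_packed_double;
	unsigned long long simd256_packed_single;
};

/*
 * Combine the event counts into floating point operations and divide
 * by the wall-clock duration (the role played by duration_time).
 * The weights below are assumed for this sketch, not taken from perf.
 * Returns 0.0 for a non-positive duration instead of dividing by zero.
 */
static double gflops(const struct fp_counts *c, double duration_sec)
{
	double ops;

	assert(c);
	if (duration_sec <= 0.0)
		return 0.0;
	ops = 1.0 * (double)(c->scalar_single + c->scalar_double)
	    + 2.0 * (double)c->packed_double
	    + 4.0 * (double)(c->packed_single + c->simd256_packed_double)
	    + 8.0 * (double)c->simd256_packed_single;
	return ops / 1e9 / duration_sec;
}
```

Feeding in the sample counts (3,999,541,471 scalar single, 14 scalar double, zeros elsewhere) with a duration of 3.238372845 seconds gives roughly 1.2, in line with the "1.2 GFLOPs" shown in the output.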
2017-09-13perf pmu: Extract function to get JSON alias mapAndi Kleen
Extract the code to get the per cpu JSON alias into a separate function for reuse. No behavior changes. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-6-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf stat: Print generic metric header even for failed expressionsAndi Kleen
Print the generic metric header even when the expression evaluation failed. Otherwise an expression that fails on the first collections due to division by zero may suddenly reappear later without a header. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-5-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf stat: Factor out generic metric printingAndi Kleen
The 'perf stat' shadow metric printing already supports generic metrics. Factor out the code doing that into a separate function that can be re-used in a later patch. No behavior changes. v2: Fix indentation Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf vendor events: Support metric_group and no event name in JSON parserAndi Kleen
Some enhancements to the JSON parser to prepare for metrics support - Parse the new MetricGroup field - Support JSON events with no event name, that have only MetricName. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-3-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13perf tools: Support weak groups in 'perf stat'Andi Kleen
Setting up groups can be complicated due to the scheduling restrictions of different PMUs. User tools usually don't understand all these restrictions. Still, in many cases it is useful to set up groups, and they work most of the time. However, if the group is set up wrong, some members will not report any value because they never get scheduled. Add a concept of a 'weak group': try to set up a group, but if it's not schedulable, fall back to not using a group. That gives us the best of both worlds: groups if they work, but still a usable fallback if they don't. In theory it would be possible to have more complex fallback strategies (e.g. try to split the group in half), but the simple fallback of not using a group seems to work for now. So far the weak group is only implemented for perf stat, not for record. Here's an unschedulable group (on IvyBridge with SMT on): % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1 73,806,067 branches 4,848,144 branch-misses # 6.57% of all branches 14,754,458 l1d.replacement 24,905,558 l2_lines_in.all <not supported> l2_rqsts.all_code_rd <------- will never report anything With the weak group: % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1 125,366,055 branches (80.02%) 9,208,402 branch-misses # 7.35% of all branches (80.01%) 24,560,249 l1d.replacement (80.00%) 43,174,971 l2_lines_in.all (80.05%) 31,891,457 l2_rqsts.all_code_rd (79.92%) The extra event is scheduled with some extra multiplexing. v2: Move fallback code to separate function. Add comment on for_each_group_member Adjust to new perf_evsel__close interface v3: Fix debug print out. 
Committer testing: Before: # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1 Performance counter stats for 'system wide': <not counted> branches <not counted> branch-misses <not counted> l1d.replacement <not counted> l2_lines_in.all <not supported> l2_rqsts.all_code_rd 1.002147212 seconds time elapsed # perf stat -e '{branches,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1 Performance counter stats for 'system wide': 83,207,892 branches 11,065,444 l1d.replacement 28,484,024 l2_lines_in.all 12,186,179 l2_rqsts.all_code_rd 1.001739493 seconds time elapsed After: # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}':W -a sleep 1 Performance counter stats for 'system wide': 543,323,909 branches (80.01%) 27,100,512 branch-misses # 4.99% of all branches (80.02%) 50,402,905 l1d.replacement (80.03%) 67,385,892 l2_lines_in.all (80.01%) 21,352,885 l2_rqsts.all_code_rd (79.94%) 1.001086658 seconds time elapsed # Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/20170831194036.30146-2-andi@firstfloor.org [ Add a "'perf stat' only, for now" comment in the man page, suggested by Jiri ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
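The weak group fallback described in this commit can be modelled with a toy example. Everything here is a simplification made up for illustration: the "a group fits if it needs no more counters than the PMU has" rule stands in for the real, far more involved PMU scheduling constraints, and `struct event`, `group_schedulable()` and `open_events()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EVENTS 16

struct event {
	const char *name;
	int leader;	/* index of the group leader; own index if ungrouped */
};

/* Toy scheduling rule (assumption): the whole group must fit at once. */
static bool group_schedulable(int nr_events, int nr_counters)
{
	return nr_events <= nr_counters;
}

/*
 * Try to set up all events as one group led by events[0]. If that
 * group cannot be scheduled and `weak` is set, break the group up so
 * every event is its own leader and can be multiplexed independently.
 * Returns true if the events end up in a usable configuration.
 */
static bool open_events(struct event *events, int nr_events,
			int nr_counters, bool weak)
{
	int i;

	assert(events && nr_events > 0 && nr_events <= MAX_EVENTS);
	for (i = 0; i < nr_events; i++)
		events[i].leader = 0;
	if (group_schedulable(nr_events, nr_counters))
		return true;
	if (!weak)
		return false;	/* strict group: members never get scheduled */
	for (i = 0; i < nr_events; i++)
		events[i].leader = i;
	return true;
}
```

With five events and four counters, the strict group fails while the weak one degrades to five independent events, mirroring the `:W` example above where all five counts are reported with roughly 80% multiplexing.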
2017-09-13perf sched timehist: Add pid and tid optionsDavid Ahern
Add options to only show events for specific pid(s) and tid(s). Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1504288152-19690-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-13Merge tag 'perf-urgent-for-mingo-4.14-20170912' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fix TUI progress bar when the delta between the new total and that of the previous update is greater than the progress "step" (screen width progress bar block) (Jiri Olsa) - Make tools/lib/api make DEBUG=1 build use -D_FORTIFY_SOURCE=2 not to cripple debuginfo, just like tools/perf/ does (Jiri Olsa) - Avoid leaking the 'perf.data' file to workloads started from the 'perf record' command line by using the O_CLOEXEC open flag (Jiri Olsa) - Fix building when libunwind's 'unwind.h' file is present in the include path, clashing with tools/perf/util/unwind.h (Milian Wolff) - Check per .perfconfig section entry flag, not just per section (Taeung Song) - Support running perf binaries with a dash in their name, needed to run perf as an AppImage (Milian Wolff) - Wait for the right child by using waitpid() when running workloads from 'perf stat', also to fix using perf as an AppImage (Milian Wolff) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-12Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf tooling updates from Ingo Molnar: "Perf tooling updates and fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf annotate browser: Help for cycling thru hottest instructions with TAB/shift+TAB perf stat: Only auto-merge events that are PMU aliases perf test: Add test case for PERF_SAMPLE_PHYS_ADDR perf script: Support physical address perf mem: Support physical address perf sort: Add sort option for physical address perf tools: Support new sample type for physical address perf vendor events powerpc: Remove duplicate events perf intel-pt: Fix syntax in documentation of config option perf test powerpc: Fix 'Object code reading' test perf trace: Support syscall name globbing perf syscalltbl: Support glob matching on syscall names perf report: Calculate the average cycles of iterations
2017-09-12perf stat: Wait for the correct childMilian Wolff
When packaging the perf userland application into an AppImage, the wait() call in perf stat returned too early. It turned out that some other child process exited, but not the one perf stat launched: $ sudo strace -e fork,execve,clone,wait4 -f ./perf-x86_64.AppImage stat sleep 1 execve("./perf-git.3a73b7f9-x86_64.AppImage", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x7ffec1bbf050 /* 18 vars */) = 0 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3912 strace: Process 3912 attached [pid 3912] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3914 strace: Process 3914 attached [pid 3912] +++ exited with 0 +++ [pid 3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3912, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- [pid 3914] clone(strace: Process 3915 attached child_stack=0x7f6a6d9fefb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6a6d9ff9d0, tls=0x7f6a6d9ff700, child_tidptr=0x7f6a6d9ff9d0) = 3915 [pid 3911] execve("/tmp/.mount_perf-g6VYMpl/AppRun", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x14aab70 /* 21 vars */) = 0 [pid 3911] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4ae113c4d0) = 3916 strace: Process 3916 attached [pid 3911] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3912 [pid 3916] execve("/usr/libexec/perf-core/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory) [pid 3916] execve("/tmp/./sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory) [pid 3916] execve("/home/milian/.bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory) [pid 3916] execve("/usr/lib/icecream/libexec/icecc/bin/sleep", ["sleep", 
"1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory) [pid 3916] execve("/ssd2/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory) [pid 3916] execve("/home/milian/.bin/kf5/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory) [pid 3916] execve("/ssd2/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory) [pid 3916] execve("/home/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory) [pid 3916] execve("/home/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory) [pid 3916] execve("/usr/local/sbin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory) [pid 3916] execve("/usr/local/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory) [pid 3916] execve("/usr/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */ Performance counter stats for 'sleep 1': <not counted> task-clock <not counted> context-switches <not counted> cpu-migrations <not counted> page-faults <not counted> cycles <not counted> instructions <not counted> branches <not counted> branch-misses 0.000047194 seconds time elapsed [pid 3916] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=3911, si_uid=0} --- [pid 3916] +++ killed by SIGTERM +++ [pid 3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=3916, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} --- [pid 3915] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3914, si_uid=0} --- [pid 3911] +++ exited with 0 +++ [pid 3915] --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=3914, si_uid=0} --- [pid 3915] +++ exited with 0 +++ +++ exited with 0 +++ This patch uses waitpid instead to ensure the call waits for the debuggee application 
launched by 'perf stat'. This fixes 'perf stat' when launched from an AppImage: $ ./perf-x86_64.AppImage stat sleep 1 Performance counter stats for 'sleep 1': 0.357235 task-clock (msec) # 0.000 CPUs utilized 1 context-switches # 0.003 M/sec 0 cpu-migrations # 0.000 K/sec 50 page-faults # 0.140 M/sec 1269602 cycles # 3.554 GHz 654278 instructions # 0.52 insn per cycle 129963 branches # 363.803 M/sec 7082 branch-misses # 5.45% of all branches 1.000633420 seconds time elapsed Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170912152523.4497-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
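The difference between wait() and waitpid() that this fix relies on can be sketched as follows: wait() returns as soon as any child exits, while waitpid() with an explicit PID only returns for that child. The helpers `spawn_exit()` and `wait_for_child()` are hypothetical names for this example and are not the code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with `code`; returns its PID or -1. */
static pid_t spawn_exit(int code)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(code);
	return pid;
}

/*
 * Wait for exactly `pid`, ignoring any other children that may exit
 * first. Returns the child's exit status, or -1 on error or if the
 * child did not exit normally.
 */
static int wait_for_child(pid_t pid)
{
	int status;

	assert(pid > 0);
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If an unrelated child (such as the AppImage helper processes visible in the strace output) has already exited, `wait_for_child(workload)` still reports the workload's status, whereas a plain wait() could have returned the unrelated child instead.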
2017-09-12perf tools: Support running perf binaries with a dash in their nameMilian Wolff
Previously the part behind "perf-" was interpreted as an internal perf command. If the suffix could not be handled, the execution was stopped. This makes it impossible to launch perf binaries that got renamed to have the `perf-` prefix. This is e.g. the case for appimages (e.g. "perf-x86_64.AppImage"), but would also apply to all other scenarios where users symlink or rename perf themselves: Status quo with the broken behavior: $ ln -s ./perf ./perf-custom-suffix $ ./perf-custom-suffix list cannot handle custom-suffix internally$ Also note the missing newline at the end of the error message. With this patch applied, the above works properly: $ ./perf-custom-suffix list List of pre-defined events (to be used in -e): ... Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20170911111422.31903-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
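The behaviour described above can be sketched as a small decision function: treat the part after "perf-" as an internal command only if it is actually known, and otherwise behave like plain perf. This is a simplified model under assumptions, not the patch itself; the `builtins` table and the `is_builtin()` / `internal_command()` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of built-in commands, for illustration only. */
static const char *const builtins[] = { "stat", "record", "report", "list" };

static int is_builtin(const char *cmd)
{
	for (size_t i = 0; i < sizeof(builtins) / sizeof(builtins[0]); i++)
		if (strcmp(cmd, builtins[i]) == 0)
			return 1;
	return 0;
}

/*
 * Given the basename of argv[0], return the internal command encoded
 * in a "perf-<cmd>" name, or NULL if the program should simply behave
 * like plain "perf" and take its command from the arguments.
 */
static const char *internal_command(const char *progname)
{
	const char *suffix;

	assert(progname);
	if (strncmp(progname, "perf-", 5) != 0)
		return NULL;
	suffix = progname + 5;
	return is_builtin(suffix) ? suffix : NULL;
}
```

Under this model `perf-stat` still dispatches to `stat`, while `perf-custom-suffix` and `perf-x86_64.AppImage` fall through to normal argument parsing instead of stopping with an error.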