perf intel-pt: Add mispred-all config option to aid use with autofdo

autofdo incorrectly expects branch flags to include either mispred or
predicted.  In fact mispred = predicted = 0 is valid and means the flags
are not supported, which they aren't by Intel PT.

To make autofdo work, add a config option which will cause Intel PT
decoder to set the mispred flag on all branches.

Below is an example of using Intel PT with autofdo.  The example is
also added to the Intel PT documentation.  It requires autofdo
(https://github.com/google/autofdo) and gcc version 5.  The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
amended to take the number of elements as a parameter.

	$ gcc-5 -O3 sort.c -o sort_optimized
	$ ./sort_optimized 30000
	Bubble sorting array of 30000 elements
	2254 ms

	$ cat ~/.perfconfig
	[intel-pt]
		mispred-all

	$ perf record -e intel_pt//u ./sort 3000
	Bubble sorting array of 3000 elements
	58 ms
	[ perf record: Woken up 2 times to write data ]
	[ perf record: Captured and wrote 3.939 MB perf.data ]
	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
	$ ./sort_autofdo 30000
	Bubble sorting array of 30000 elements
	2155 ms

Note there is currently no advantage to using Intel PT instead of LBR,
but that may change in the future if greater use is made of the data.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-26-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
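The flag semantics described in the commit message can be sketched in C as follows. This is an illustration only, not code from perf or autofdo: the struct `branch_flags_sketch` and the helpers `strict_consumer_accepts()` and `synth_flags()` are hypothetical names, assumed here to model a consumer that requires one of the two flags and a decoder that sets mispred when the option is enabled.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for per-branch flags.
 * Both bits clear means "prediction information not supported".
 */
struct branch_flags_sketch {
	unsigned int mispred : 1;   /* branch was mispredicted */
	unsigned int predicted : 1; /* branch was correctly predicted */
};

/*
 * Models a consumer that wrongly assumes one of the two flags
 * is always set, and so rejects mispred = predicted = 0.
 */
static bool strict_consumer_accepts(struct branch_flags_sketch f)
{
	return f.mispred || f.predicted;
}

/*
 * Models the effect of the mispred-all option (assumed behavior):
 * with no prediction information available, both flags start at 0,
 * and the option forces mispred to 1 on every branch.
 */
static struct branch_flags_sketch synth_flags(bool mispred_all)
{
	struct branch_flags_sketch f = { 0, 0 };

	if (mispred_all)
		f.mispred = 1;
	return f;
}
```

In this sketch, `synth_flags(false)` yields flags that the strict consumer rejects, while `synth_flags(true)` yields flags it accepts, which is the workaround the option provides.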
author: Adrian Hunter <adrian.hunter@intel.com> 2015-09-25 16:15:56 +0300
committer: Arnaldo Carvalho de Melo <acme@redhat.com> 2015-09-28 17:21:00 -0300
commit: ba11ba65e02836c475427ae199adfc2d8cc4a900 (patch)
tree: eb9e5d47f7147af4a662b92c0ba2c6ac2bc9dc27 /tools/perf/Documentation
parent: f56fb9864c501dc85ebe40af5bf925dd07d990c0 (diff)
1 files changed, 29 insertions, 0 deletions
diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index a0fbb5d71f7d..be764f9ec769 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -764,3 +764,32 @@ perf inject also accepts the --itrace option in which case tracing data is
 removed and replaced with the synthesized events. e.g.
 
 	perf inject --itrace -i perf.data -o perf.data.new
+
+Below is an example of using Intel PT with autofdo.  It requires autofdo
+(https://github.com/google/autofdo) and gcc version 5.  The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
+amended to take the number of elements as a parameter.
+
+	$ gcc-5 -O3 sort.c -o sort_optimized
+	$ ./sort_optimized 30000
+	Bubble sorting array of 30000 elements
+	2254 ms
+
+	$ cat ~/.perfconfig
+	[intel-pt]
+		mispred-all
+
+	$ perf record -e intel_pt//u ./sort 3000
+	Bubble sorting array of 3000 elements
+	58 ms
+	[ perf record: Woken up 2 times to write data ]
+	[ perf record: Captured and wrote 3.939 MB perf.data ]
+	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
+	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
+	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+	$ ./sort_autofdo 30000
+	Bubble sorting array of 30000 elements
+	2155 ms
+
+Note there is currently no advantage to using Intel PT instead of LBR, but
+that may change in the future if greater use is made of the data.