summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2016-12-25clocksource: Use a plain u64 instead of cycle_tThomas Gleixner
There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
2016-12-15Merge tag 'trace-v4.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release has a few updates: - STM can hook into the function tracer - Function filtering now supports more advance glob matching - Ftrace selftests updates and added tests - Softirq tag in traces now show only softirqs - ARM nop added to non traced locations at compile time - New trace_marker_raw file that allows for binary input - Optimizations to the ring buffer - Removal of kmap in trace_marker - Wakeup and irqsoff tracers now adhere to the set_graph_notrace file - Other various fixes and clean ups" * tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits) selftests: ftrace: Shift down default message verbosity kprobes/trace: Fix kprobe selftest for newer gcc tracing/kprobes: Add a helper method to return number of probe hits tracing/rb: Init the CPU mask on allocation tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too fgraph: Handle a case where a tracer ignores set_graph_notrace tracing: Replace kmap with copy_from_user() in trace_marker writing ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it tracing: Allow benchmark to be enabled at early_initcall() tracing: Have system enable return error if one of the events fail tracing: Do not start benchmark on boot up tracing: Have the reg function allow to fail ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline ring-buffer: Froce rb_update_write_stamp() to be inlined ring-buffer: Force inline of hotpath helper functions tracing: Make __buffer_unlock_commit() always_inline tracing: Make tracepoint_printk a static_key ring-buffer: Always inline rb_event_data() ring-buffer: Make rb_reserve_next_event() always inlined ...
2016-12-13Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: "This is the main block pull request this series. Contrary to previous release, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one. The major parts of this pull request is: - Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph. - Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code. - Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me. - Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me. - Support from SMR drives in the block layer and for SD. From Hannes and Shaun. - Multi-connection support for nbd. From Josef. - Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph. - A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq. - Support for WRITE_ZEROES from Chaitanya. - Lightnvm updates from Javier/Matias. - Supoort for FC for the nvme-over-fabrics code. From James Smart. - A bunch of fixes from a whole slew of people, too many to name here" * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits) blk-stat: fix a few cases of missing batch flushing blk-flush: run the queue when inserting blk-mq flush elevator: make the rqhash helpers exported blk-mq: abstract out blk_mq_dispatch_rq_list() helper blk-mq: add blk_mq_start_stopped_hw_queue() block: improve handling of the magic discard payload blk-wbt: don't throttle discard or write zeroes nbd: use dev_err_ratelimited in io path nbd: reset the setup task for NBD_CLEAR_SOCK nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME nvme-fabrics: Add target support for FC transport nvme-fabrics: Add host support for FC transport nvme-fabrics: Add FC transport LLDD api definitions nvme-fabrics: Add FC transport FC-NVME definitions nvme-fabrics: Add FC transport error codes to nvme.h Add type 0x28 NVME type code to scsi fc headers nvme-fabrics: patch target code in prep for FC transport support nvme-fabrics: set sqe.command_id in core not transports parser: add u64 number parser nvme-rdma: align to generic ib_event logging helper ...
2016-12-13Merge tag 'pstore-v4.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: "Improvements and fixes to pstore subsystem: - add additional checks for bad platform data - remove bounce buffer in console writer - protect read/unlink race with a mutex - correctly give up during dump locking failures - increase ftrace bandwidth by splitting ftrace buffers per CPU" * tag 'pstore-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: ramoops: add pdata NULL check to ramoops_probe pstore: Convert console write to use ->write_buf pstore: Protect unlink with read_mutex pstore: Use global ftrace filters for function trace filtering ftrace: Provide API to use global filtering for ftrace ops pstore: Clarify context field przs as dprzs pstore: improve error report for failed setup pstore: Merge per-CPU ftrace records into one pstore: Add ftrace timestamp counter ramoops: Split ftrace buffer space into per-CPU zones pstore: Make ramoops_init_przs generic for other prz arrays pstore: Allow prz to control need for locking pstore: Warn on PSTORE_TYPE_PMSG using deprecated function pstore: Make spinlock per zone instead of global pstore: Actually give up during locking failure
2016-12-12Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The time/timekeeping/timer folks deliver with this update: - Fix a reintroduced signed/unsigned issue and cleanup the whole signed/unsigned mess in the timekeeping core so this wont happen accidentaly again. - Add a new trace clock based on boot time - Prevent injection of random sleep times when PM tracing abuses the RTC for storage - Make posix timers configurable for real tiny systems - Add tracepoints for the alarm timer subsystem so timer based suspend wakeups can be instrumented - The usual pile of fixes and updates to core and drivers" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) timekeeping: Use mul_u64_u32_shr() instead of open coding it timekeeping: Get rid of pointless typecasts timekeeping: Make the conversion call chain consistently unsigned timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion alarmtimer: Add tracepoints for alarm timers trace: Update documentation for mono, mono_raw and boot clock trace: Add an option for boot clock as trace clock timekeeping: Add a fast and NMI safe boot clock timekeeping/clocksource_cyc2ns: Document intended range limitation timekeeping: Ignore the bogus sleep time if pm_trace is enabled selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous" clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map() arm64: dts: rockchip: Arch counter doesn't tick in system suspend clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend posix-timers: Make them configurable posix_cpu_timers: Move the add_device_randomness() call to a proper place timer: Move sys_alarm from timer.c to itimer.c ptp_clock: Allow for it to be optional Kconfig: Regenerate *.c_shipped files after previous changes ...
2016-12-12Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp hotplug updates from Thomas Gleixner: "This is the final round of converting the notifier mess to the state machine. The removal of the notifiers and the related infrastructure will happen around rc1, as there are conversions outstanding in other trees. The whole exercise removed about 2000 lines of code in total and in course of the conversion several dozen bugs got fixed. The new mechanism allows to test almost every hotplug step standalone, so usage sites can exercise all transitions extensively. There is more room for improvement, like integrating all the pointlessly different architecture mechanisms of synchronizing, setting cpus online etc into the core code" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) tracing/rb: Init the CPU mask on allocation soc/fsl/qbman: Convert to hotplug state machine soc/fsl/qbman: Convert to hotplug state machine zram: Convert to hotplug state machine KVM/PPC/Book3S HV: Convert to hotplug state machine arm64/cpuinfo: Convert to hotplug state machine arm64/cpuinfo: Make hotplug notifier symmetric mm/compaction: Convert to hotplug state machine iommu/vt-d: Convert to hotplug state machine mm/zswap: Convert pool to hotplug state machine mm/zswap: Convert dst-mem to hotplug state machine mm/zsmalloc: Convert to hotplug state machine mm/vmstat: Convert to hotplug state machine mm/vmstat: Avoid on each online CPU loops mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead() tracing/rb: Convert to hotplug state machine oprofile/nmi timer: Convert to hotplug state machine net/iucv: Use explicit clean up labels in iucv_init() x86/pci/amd-bus: Convert to hotplug state machine x86/oprofile/nmi: Convert to hotplug state machine ...
2016-12-12kprobes/trace: Fix kprobe selftest for newer gccMarcin Nowakowski
Commit 265a5b7ee3eb ("kprobes/trace: Fix kprobe selftest for gcc 4.6") has added __used attribute to kprobe_trace_selftest_target to ensure that the method is listed in kallsyms table. However, even though the method remains in the kernel image, the actual call is optimized away as there are no side effects and the return value is never checked. Add a return value check and a 'noinline' attribute to ensure that an inlined copy of the method is not used by the caller. Also add checks that verify that the kprobe was really hit, as at the moment the tests show positive results despite the test method being optimized away. Finally, add __init annotations to find_trace_probe_file() and kprobe_trace_selftest_target() as they are only called from within an __init method. Link: http://lkml.kernel.org/r/1481293178-3128-2-git-send-email-marcin.nowakowski@imgtec.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-12tracing/kprobes: Add a helper method to return number of probe hitsMarcin Nowakowski
The number of probe hits is stored in a percpu variable and therefore can't be read directly. Add a helper method trace_kprobe_nhit() that performs the required calculation. It will be used in a follow-up commit that changes kprobe selftests to verify the number of probe hits. Link: http://lkml.kernel.org/r/1481293178-3128-1-git-send-email-marcin.nowakowski@imgtec.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-12tracing/rb: Init the CPU mask on allocationSebastian Andrzej Siewior
Before commit b32614c03413 ("tracing/rb: Convert to hotplug state machine") the allocated cpumask was initialized to the mask of ONLINE or POSSIBLE CPUs. After the CPU hotplug changes the buffer initialisation moved to trace_rb_cpu_prepare() but I forgot to initially set the cpumask to zero. This is done now. Link: http://lkml.kernel.org/r/20161207133133.hzkcqfllxcdi3joz@linutronix.de Fixes: b32614c03413 ("tracing/rb: Convert to hotplug state machine") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-12tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate resultsPavankumar Kondeti
The 's' flag is supposed to indicate that a softirq is running. This can be detected by testing the preempt_count with SOFTIRQ_OFFSET. The current code tests the preempt_count with SOFTIRQ_MASK, which would be true even when softirqs are disabled but not serving a softirq. Link: http://lkml.kernel.org/r/1481300417-3564-1-git-send-email-pkondeti@codeaurora.org Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions tooSteven Rostedt (Red Hat)
Currently both the wakeup and irqsoff traces do not handle set_graph_notrace well. The ftrace infrastructure will ignore the return paths of all functions leaving them hanging without an end: # echo '*spin*' > set_graph_notrace # cat trace [...] _raw_spin_lock() { preempt_count_add() { do_raw_spin_lock() { update_rq_clock(); Where the '*spin*' functions should have looked like this: _raw_spin_lock() { preempt_count_add(); do_raw_spin_lock(); } update_rq_clock(); Instead, have the wakeup and irqsoff tracers ignore the functions that are set by the set_graph_notrace like the function_graph tracer does. Move the logic in the function_graph tracer into a header to allow wakeup and irqsoff tracers to use it as well. Cc: Namhyung Kim <namhyung.kim@lge.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09fgraph: Handle a case where a tracer ignores set_graph_notraceSteven Rostedt (Red Hat)
Both the wakeup and irqsoff tracers can use the function graph tracer when the display-graph option is set. The problem is that they ignore the notrace file, and record the entry of functions that would be ignored by the function_graph tracer. This causes the trace->depth to be recorded into the ring buffer. The set_graph_notrace uses a trick by adding a large negative number to the trace->depth when a graph function is to be ignored. On trace output, the graph function uses the depth to record a stack of functions. But since the depth is negative, it accesses the array with a negative number and causes an out of bounds access that can cause a kernel oops or corrupt data. Have the print functions handle cases where a tracer still records functions even when they are in set_graph_notrace. Also add warnings if the depth is below zero before accessing the array. Note, the function graph logic will still prevent the return of these functions from being recorded, which means that they will be left hanging without a return. For example: # echo '*spin*' > set_graph_notrace # echo 1 > options/display-graph # echo wakeup > current_tracer # cat trace [...] _raw_spin_lock() { preempt_count_add() { do_raw_spin_lock() { update_rq_clock(); Where it should look like: _raw_spin_lock() { preempt_count_add(); do_raw_spin_lock(); } update_rq_clock(); Cc: stable@vger.kernel.org Cc: Namhyung Kim <namhyung.kim@lge.com> Fixes: 29ad23b00474 ("ftrace: Add set_graph_notrace filter") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09tracing: Replace kmap with copy_from_user() in trace_marker writingSteven Rostedt (Red Hat)
Instead of using get_user_pages_fast() and kmap_atomic() when writing to the trace_marker file, just allocate enough space on the ring buffer directly, and write into it via copy_from_user(). Writing into the trace_marker file use to allocate a temporary buffer to perform the copy_from_user(), as we didn't want to write into the ring buffer if the copy failed. But as a trace_marker write is suppose to be extremely fast, and allocating memory causes other tracepoints to trigger, Peter Zijlstra suggested using get_user_pages_fast() and kmap_atomic() to keep the user space pages in memory and reading it directly. But Henrik Austad had issues with this because it required taking the mm->mmap_sem and causing long delays with the write. Instead, just allocate the space in the ring buffer and use copy_from_user() directly. If it faults, return -EFAULT and write "<faulted>" into the ring buffer. Link: http://lkml.kernel.org/r/20161208124018.72dd0f86@gandalf.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Henrik Austad <henrik@austad.us> Cc: Peter Zijlstra <peterz@infradead.org> Updates: d696b58ca2c3ca "tracing: Do not allocate buffer for trace_marker" Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09tracing: Allow benchmark to be enabled at early_initcall()Steven Rostedt (Red Hat)
The trace event start up selftests fails when the trace benchmark is enabled, because it is disabled during boot. It really only needs to be disabled before scheduling is set up, as it creates a thread. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09tracing: Have system enable return error if one of the events failSteven Rostedt (Red Hat)
If one of the events within a system fails to enable when "1" is written to the system "enable" file, it should return an error. Note, some events may still be enabled, but the user should know that something did go wrong. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09tracing: Do not start benchmark on boot upSteven Rostedt (Red Hat)
Trace events are enabled very early on boot up via the boot command line parameter. The benchmark tool creates a new thread to perform the trace event benchmarking. But at start up, it is called before scheduling is set up and because it creates a new thread before the init thread is created, this crashes the kernel. Have the benchmark fail to register when started via the kernel command line. Also, since the registering of a tracepoint now can handle failure cases, return -ENOMEM instead of warning if the thread cannot be created. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09tracing: Have the reg function allow to failSteven Rostedt (Red Hat)
Some tracepoints have a registration function that gets enabled when the tracepoint is enabled. There may be cases that the registraction function must fail (for example, can't allocate enough memory). In this case, the tracepoint should also fail to register, otherwise the user would not know why the tracepoint is not working. Cc: David Howells <dhowells@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Anton Blanchard <anton@samba.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-07tracing/rb: Init the CPU mask on allocationSebastian Andrzej Siewior
Before commit b32614c03413 ("tracing/rb: Convert to hotplug state machine") the allocated cpumask was initialized to the mask of online or possible CPUs. After the CPU hotplug changes the buffer initialization moved to trace_rb_cpu_prepare() but the cpumask is allocated with alloc_cpumask() and therefor has random content. As a consequence the cpu buffers are not initialized and a later access dereferences a NULL pointer. Use zalloc_cpumask() instead so trace_rb_cpu_prepare() initializes the buffers properly. Fixes: b32614c03413 ("tracing/rb: Convert to hotplug state machine") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20161207133133.hzkcqfllxcdi3joz@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02tracing/rb: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. The notifier in struct ring_buffer is replaced by the multi instance interface. Upon __ring_buffer_alloc() invocation, cpuhp_state_add_instance() will invoke the trace_rb_cpu_prepare() on each CPU. This callback may now fail. This means __ring_buffer_alloc() will fail and cleanup (like previously) and during a CPU up event this failure will not allow the CPU to come up. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161126231350.10321-7-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-29trace: Add an option for boot clock as trace clockJoel Fernandes
Unlike monotonic clock, boot clock as a trace clock will account for time spent in suspend useful for tracing suspend/resume. This uses earlier introduced infrastructure for using the fast boot clock. Signed-off-by: Joel Fernandes <joelaf@google.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Link: http://lkml.kernel.org/r/1480372524-15181-7-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-23ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inlineSteven Rostedt (Red Hat)
Both rb_end_commit() and rb_set_commit_to_write() are in the fast path of the ring buffer recording. Make sure they are always inlined. Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23ring-buffer: Froce rb_update_write_stamp() to be inlinedSteven Rostedt (Red Hat)
The function rb_update_write_stamp() is in the hotpath of the ring buffer recording. Make sure that it is inlined as well. There's not many places that call it. Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23ring-buffer: Force inline of hotpath helper functionsSteven Rostedt (Red Hat)
There's several small helper functions in ring_buffer.c that are used in the hot path. For some reason, even though they are marked inline, gcc tends not to enforce it. Make sure these functions are always inlined. Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23tracing: Make __buffer_unlock_commit() always_inlineSteven Rostedt (Red Hat)
The function __buffer_unlock_commit() is called in a few places outside of trace.c. But for the most part, it should really be inlined, as it is in the hot path of the trace_events. For the callers outside of trace.c, create a new function trace_buffer_unlock_commit_nostack(), as the reason it was used was to avoid the stack tracing that trace_buffer_unlock_commit() could do. Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23tracing: Make tracepoint_printk a static_keySteven Rostedt (Red Hat)
Currently, when tracepoint_printk is set (enabled by the "tp_printk" kernel command line), it causes trace events to print via printk(). This is a very dangerous operation, but is useful for debugging. The issue is, it's seldom used, but it is always checked even if it's not enabled by the kernel command line. Instead of having this feature called by a branch against a variable, turn that variable into a static key, and this will remove the test and jump. To simplify things, the functions output_printk() and trace_event_buffer_commit() were moved from trace_events.c to trace.c. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23ring-buffer: Always inline rb_event_data()Steven Rostedt (Red Hat)
The rb_event_data() is the fast path of getting the ring buffer data from an event. Externally, ring_buffer_event_data() is used to access this function. But unfortunately, rb_event_data() is not inlined, and calling ring_buffer_event_data() causes that function to be called again. Force rb_event_data() to be inlined to lower the number of operations needed when calling ring_buffer_event_data(). Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23ring-buffer: Make rb_reserve_next_event() always inlinedSteven Rostedt (Red Hat)
The function rb_reserved_next_event() is called by two functions: ring_buffer_lock_reserve() and ring_buffer_write(). This is in a very hot path of the tracing code, and it is best that they are not functions. The two callers are basically wrapers for rb_reserver_next_event(). Removing the function calls can save execution time in the hotpath of tracing. Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23tracing: Create a always_inlined __trace_buffer_lock_reserve()Steven Rostedt (Red Hat)
As Andi Kleen pointed out in the Link below, the trace events has quite a bit of code execution. A lot of that happens to be calling functions, where some of them should simply be inlined. One of these functions happens to be trace_buffer_lock_reserve() which is also a global, but it is used throughout the file it is defined in. Create a __trace_buffer_lock_reserve() that is always inlined that the file can benefit from. Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-22tracing: Add error checks to creation of event filesSteven Rostedt (Red Hat)
The creation of the set_event_pid file was assigned to a variable "entry" but that variable was never used. Ideally, it should be used to check if the file was created and warn if it was not. The files header_page, header_event should also be checked and a warning if they fail to be created. The "enable" file was moved up, as it is a more crucial file to have and a hard failure (return -ENOMEM) should be returned if it is not created. Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-22tracing: Add hook to function tracing for other subsystems to useChunyan Zhang
Currently Function traces can be only exported to the ring buffer. This adds a trace_export concept which can process traces and export them to a registered destination as an addition to the current one that outputs to Ftrace - i.e. ring buffer. In this way, if we want function traces to be sent to other destinations rather than only to the ring buffer, we just need to register a new trace_export and implement its own .write() function for writing traces to storage. With this patch, only function tracing (trace type is TRACE_FN) is supported. Link: http://lkml.kernel.org/r/1479715043-6534-2-git-send-email-zhang.chunyan@linaro.org Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All conflicts were simple overlapping changes except perhaps for the Thunder driver. That driver has a change_mtu method explicitly for sending a message to the hardware. If that fails it returns an error. Normally a driver doesn't need an ndo_change_mtu method becuase those are usually just range changes, which are now handled generically. But since this extra operation is needed in the Thunder driver, it has to stay. However, if the message send fails we have to restore the original MTU before the change because the entire call chain expects that if an error is thrown by ndo_change_mtu then the MTU did not change. Therefore code is added to nicvf_change_mtu to remember the original MTU, and to restore it upon nicvf_update_hw_max_frs() failue. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15ftrace: Provide API to use global filtering for ftrace opsJoel Fernandes
Currently the global_ops filtering hash is not available to outside users registering for function tracing. Provide an API for those users to be able to choose global filtering. This is in preparation for pstore's ftrace feature to be able to use the global filters. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Joel Fernandes <joelaf@google.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15tracing: Add new trace_marker_rawSteven Rostedt
A new file is created: /sys/kernel/debug/tracing/trace_marker_raw This allows for appications to create data structures and write the binary data directly into it, and then read the trace data out from trace_pipe_raw into the same type of data structure. This saves on converting numbers into ASCII that would be required by trace_marker. Suggested-by: Olof Johansson <olof@lixom.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14tracing: Allow wakeup_dl tracer to be used by instancesZhou Chengming
Allow wakeup_dl tracer to be used by instances, like wakeup tracer and wakeup_rt tracer. Link: http://lkml.kernel.org/r/1479093553-31264-1-git-send-email-zhouchengming1@huawei.com Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14tracing/filter: Define op as the enum that it isSteven Rostedt (Red Hat)
The trace_events_file.c filter logic can be a bit complex. I copy this into a userspace program where I can debug it a bit easier. One issue is the op is defined in most places as an int instead of as an enum, and gdb just gives the value when debugging. Having the actual op name shown in gdb is more useful. This has no functionality change, but helps in debugging when the file is debugged in user space. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14tracing: Optimise comparison filters and fix binary and for 64 bitSteven Rostedt (Red Hat)
Currently the filter logic for comparisons (like greater-than and less-than) are used, they share the same function and a switch statement is used to jump to the comparison type to perform. This is done in the extreme hot path of the tracing code, and it does not take much more space to create a unique comparison function to perform each type of comparison and remove the switch statement. Also, a bug was found where the binary and operation for 64 bits could fail if the resulting bits were greater than 32 bits, because the result was passed into a 32 bit variable. This was fixed when adding the separate binary and function. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14ftrace: Support full glob matchingMasami Hiramatsu
Use glob_match() to support flexible glob wildcards (*,?) and character classes ([) for ftrace. Since the full glob matching is slower than the current partial matching routines(*pat, pat*, *pat*), this leaves those routines and just add MATCH_GLOB for complex glob expression. e.g. ---- [root@localhost tracing]# echo 'sched*group' > set_ftrace_filter [root@localhost tracing]# cat set_ftrace_filter sched_free_group sched_change_group sched_create_group sched_online_group sched_destroy_group sched_offline_group [root@localhost tracing]# echo '[Ss]y[Ss]_*' > set_ftrace_filter [root@localhost tracing]# head set_ftrace_filter sys_arch_prctl sys_rt_sigreturn sys_ioperm SyS_iopl sys_modify_ldt SyS_mmap SyS_set_thread_area SyS_get_thread_area SyS_set_tid_address sys_fork ---- Link: http://lkml.kernel.org/r/147566869501.29136.6462645009894738056.stgit@devbox Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip recordsSteven Rostedt (Red Hat)
When a module is first loaded and its function ip records are added to the ftrace list of functions to modify, they are set to DISABLED, as their text is still in a read only state. When the module is fully loaded, and can be updated, the flag is cleared, and if their's any functions that should be tracing them, it is updated at that moment. But there's several locations that do record accounting and should ignore records that are marked as disabled, or they can cause issues. Alexei already fixed one location, but others need to be addressed. Cc: stable@vger.kernel.org Fixes: b7ffffbb46f2 "ftrace: Add infrastructure for delayed enabling of module functions" Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14ftrace: Ignore FTRACE_FL_DISABLED while walking dyn_ftrace recordsAlexei Starovoitov
ftrace_shutdown() checks for sanity of ftrace records and if dyn_ftrace->flags is not zero, it will warn. It can happen that 'flags' are set to FTRACE_FL_DISABLED at this point, since some module was loaded, but before ftrace_module_enable() cleared the flags for this module. In other words the module.c is doing: ftrace_module_init(mod); // calls ftrace_update_code() that sets flags=FTRACE_FL_DISABLED ... // here ftrace_shutdown() is called that warns, since err = prepare_coming_module(mod); // didn't have a chance to clear FTRACE_FL_DISABLED Fix it by ignoring disabled records. It's similar to what __ftrace_hash_rec_update() is already doing. Link: http://lkml.kernel.org/r/1478560460-3818619-1-git-send-email-ast@fb.com Cc: stable@vger.kernel.org Fixes: b7ffffbb46f2 "ftrace: Add infrastructure for delayed enabling of module functions" Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-10-28block: better op and flags encodingChristoph Hellwig
Now that we don't need the common flags to overflow outside the range of a 32-bit type we can encode them the same way for both the bio and request fields. This in addition allows us to place the operation first (and make some room for more ops while we're at it) and to stop having to shift around the operation values. In addition this allows passing around only one value in the block layer instead of two (and eventuall also in the file systems, but we can do that later) and thus clean up a lot of code. Last but not least this allows decreasing the size of the cmd_flags field in struct request to 32-bits. Various functions passing this value could also be updated, but I'd like to avoid the churn for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-22bpf: add helper for retrieving current numa node idDaniel Borkmann
Use case is mainly for soreuseport to select sockets for the local numa node, but since generic, lets also add this for other networking and tracing program types. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-12Disable the __builtin_return_address() warning globally after allLinus Torvalds
This affectively reverts commit 377ccbb48373 ("Makefile: Mute warning for __builtin_return_address(>0) for tracing only") because it turns out that it really isn't tracing only - it's all over the tree. We already also had the warning disabled separately for mm/usercopy.c (which this commit also removes), and it turns out that we will also want to disable it for get_lock_parent_ip(), that is used for at least TRACE_IRQFLAGS. Which (when enabled) ends up being all over the tree. Steven Rostedt had a patch that tried to limit it to just the config options that actually triggered this, but quite frankly, the extra complexity and abstraction just isn't worth it. We have never actually had a case where the warning is actually useful, so let's just disable it globally and not worry about it. Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "The usual rocket science from the trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: tracing/syscalls: fix multiline in error message text lib/Kconfig.debug: fix DEBUG_SECTION_MISMATCH description doc: vfs: fix fadvise() sycall name x86/entry: spell EBX register correctly in documentation securityfs: fix securityfs_create_dir comment irq: Fix typo in tracepoint.xml
2016-10-06Merge tag 'trace-v4.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release cycle is rather small. Just a few fixes to tracing. The big change is the addition of the hwlat tracer. It not only detects SMIs, but also other latency that's caused by the hardware. I have detected some latency from large boxes having bus contention" * tag 'trace-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Call traceoff trigger after event is recorded ftrace/scripts: Add helper script to bisect function tracing problem functions tracing: Have max_latency be defined for HWLAT_TRACER as well tracing: Add NMI tracing in hwlat detector tracing: Have hwlat trace migrate across tracing_cpumask CPUs tracing: Add documentation for hwlat_detector tracer tracing: Added hardware latency tracer ftrace: Access ret_stack->subtime only in the function profiler function_graph: Handle TRACE_BPUTS in print_graph_comment tracing/uprobe: Drop isdigit() check in create_trace_uprobe
2016-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) BBR TCP congestion control, from Neal Cardwell, Yuchung Cheng and co. at Google. https://lwn.net/Articles/701165/ 2) Do TCP Small Queues for retransmits, from Eric Dumazet. 3) Support collect_md mode for all IPV4 and IPV6 tunnels, from Alexei Starovoitov. 4) Allow cls_flower to classify packets in ip tunnels, from Amir Vadai. 5) Support DSA tagging in older mv88e6xxx switches, from Andrew Lunn. 6) Support GMAC protocol in iwlwifi mwm, from Ayala Beker. 7) Support ndo_poll_controller in mlx5, from Calvin Owens. 8) Move VRF processing to an output hook and allow l3mdev to be loopback, from David Ahern. 9) Support SOCK_DESTROY for UDP sockets. Also from David Ahern. 10) Congestion control in RXRPC, from David Howells. 11) Support geneve RX offload in ixgbe, from Emil Tantilov. 12) When hitting pressure for new incoming TCP data SKBs, perform a partial rathern than a full purge of the OFO queue (which could be huge). From Eric Dumazet. 13) Convert XFRM state and policy lookups to RCU, from Florian Westphal. 14) Support RX network flow classification to igb, from Gangfeng Huang. 15) Hardware offloading of eBPF in nfp driver, from Jakub Kicinski. 16) New skbmod packet action, from Jamal Hadi Salim. 17) Remove some inefficiencies in snmp proc output, from Jia He. 18) Add FIB notifications to properly propagate route changes to hardware which is doing forwarding offloading. From Jiri Pirko. 19) New dsa driver for qca8xxx chips, from John Crispin. 20) Implement RFC7559 ipv6 router solicitation backoff, from Maciej Żenczykowski. 21) Add L3 mode to ipvlan, from Mahesh Bandewar. 22) Support 802.1ad in mlx4, from Moshe Shemesh. 23) Support hardware LRO in mediatek driver, from Nelson Chang. 24) Add TC offloading to mlx5, from Or Gerlitz. 25) Convert various drivers to ethtool ksettings interfaces, from Philippe Reynes. 26) TX max rate limiting for cxgb4, from Rahul Lakkireddy. 27) NAPI support for ath10k, from Rajkumar Manoharan. 28) Support XDP in mlx5, from Rana Shahout and Saeed Mahameed. 29) UDP replicast support in TIPC, from Richard Alpe. 30) Per-queue statistics for qed driver, from Sudarsana Reddy Kalluru. 31) Support BQL in thunderx driver, from Sunil Goutham. 32) TSO support in alx driver, from Tobias Regnery. 33) Add stream parser engine and use it in kcm. 34) Support async DHCP replies in ipconfig module, from Uwe Kleine-König. 35) DSA port fast aging for mv88e6xxx driver, from Vivien Didelot. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1715 commits) mlxsw: switchx2: Fix misuse of hard_header_len mlxsw: spectrum: Fix misuse of hard_header_len net/faraday: Stop NCSI device on shutdown net/ncsi: Introduce ncsi_stop_dev() net/ncsi: Rework the channel monitoring net/ncsi: Allow to extend NCSI request properties net/ncsi: Rework request index allocation net/ncsi: Don't probe on the reserved channel ID (0x1f) net/ncsi: Introduce NCSI_RESERVED_CHANNEL net/ncsi: Avoid unused-value build warning from ia64-linux-gcc net: Add netdev all_adj_list refcnt propagation to fix panic net: phy: Add Edge-rate driver for Microsemi PHYs. vmxnet3: Wake queue from reset work i40e: avoid NULL pointer dereference and recursive errors on early PCI error qed: Add RoCE ll2 & GSI support qed: Add support for memory registeration verbs qed: Add support for QP verbs qed: PD,PKEY and CQ verb support qed: Add support for RoCE hw init qede: Add qedr framework ...
2016-10-03Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull low-level x86 updates from Ingo Molnar: "In this cycle this topic tree has become one of those 'super topics' that accumulated a lot of changes: - Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on x86 - preceded by an array of changes. v4.8 saw preparatory changes in this area already - this is the rest of the work. Includes the thread stack caching performance optimization. (Andy Lutomirski) - switch_to() cleanups and all around enhancements. (Brian Gerst) - A large number of dumpstack infrastructure enhancements and an unwinder abstraction. The secret long term plan is safe(r) live patching plus maybe another attempt at debuginfo based unwinding - but all these current bits are standalone enhancements in a frame pointer based debug environment as well. (Josh Poimboeuf) - More __ro_after_init and const annotations. (Kees Cook) - Enable KASLR for the vmemmap memory region. (Thomas Garnier)" [ The virtually mapped stack changes are pretty fundamental, and not x86-specific per se, even if they are only used on x86 right now. ] * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) x86/asm: Get rid of __read_cr4_safe() thread_info: Use unsigned long for flags x86/alternatives: Add stack frame dependency to alternative_call_2() x86/dumpstack: Fix show_stack() task pointer regression x86/dumpstack: Remove dump_trace() and related callbacks x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder oprofile/x86: Convert x86_backtrace() to use the new unwinder x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder perf/x86: Convert perf_callchain_kernel() to use the new unwinder x86/unwind: Add new unwind interface and implementations x86/dumpstack: Remove NULL task pointer convention fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK lib/syscall: Pin the task stack in collect_syscall() x86/process: Pin the target stack in get_wchan() x86/dumpstack: Pin the target stack when dumping it kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function sched/core: Add try_get_task_stack() and put_task_stack() x86/entry/64: Fix a minor comment rebase error iommu/amd: Don't put completion-wait semaphore on stack ...
2016-10-03Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel side changes were: - uprobes enhancements (Masami Hiramatsu) - Uncore group events enhancements (David Carrillo-Cisneros) - x86 Intel: Add support for Skylake server uncore PMUs (Kan Liang) - x86 Intel: LBR cleanups and enhancements, for better branch annotation tracking (Peter Zijlstra) - x86 Intel: Add support for PTWRITE and power event tracing (Alexander Shishkin) - ... various fixes, cleanups and smaller enhancements. Lots of tooling changes - a couple of highlights: - Support event group view with hierarchy mode in 'perf top' and 'perf report' (Namhyung Kim) e.g.: $ perf record -e '{cycles,instructions}' make $ perf report --hierarchy --stdio ... # Overhead Command / Shared Object / Symbol # ...................... .................................. ... 25.74% 27.18%sh 19.96% 24.14%libc-2.24.so 9.55% 14.64%[.] __strcmp_sse2 1.54% 0.00%[.] __tfind 1.07% 1.13%[.] _int_malloc 0.95% 0.00%[.] __strchr_sse2 0.89% 1.39%[.] __tsearch 0.76% 0.00%[.] strlen - Add branch stack / basic block info to 'perf annotate --stdio', where for each branch, we add an asm comment after the instruction with information on how often it was taken and predicted. See example with color output at: http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png (Peter Zijlstra) - Add support for using symbols in address filters with Intel PT and ARM CoreSight (hardware assisted tracing facilities) (Adrian Hunter, Mathieu Poirier) - Add support for interacting with Coresight PMU ETMs/PTMs, that are IP blocks to perform hardware assisted tracing on a ARM CPU core (Mathieu Poirier) - Support generating cross arch probes, i.e. if you specify a vmlinux file for different arch than the one in the host machine, $ perf probe --definition function_name args will generate the probe definition string needed to append to the target machine /sys/kernel/debug/tracing/kprobes_events file, using scripting (Masami Hiramatsu). - Allow configuring the default 'perf report -s' sort order in ~/.perfconfig, for instance, "sym,dso" may be more fitting for kernel developers. (Arnaldo Carvalho de Melo) - ... plus lots of other changes, refactorings, features and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (149 commits) perf tests: Add dwarf unwind test for powerpc perf probe: Match linkage name with mangled name perf probe: Fix to cut off incompatible chars from group name perf probe: Skip if the function address is 0 perf probe: Ignore the error of finding inline instance perf intel-pt: Fix decoding when there are address filters perf intel-pt: Enable decoder to handle TIP.PGD with missing IP perf intel-pt: Read address filter from AUXTRACE_INFO event perf intel-pt: Record address filter in AUXTRACE_INFO event perf intel-pt: Add a helper function for processing AUXTRACE_INFO perf intel-pt: Fix missing error codes processing auxtrace_info perf intel-pt: Add support for recording the max non-turbo ratio perf intel-pt: Fix snapshot overlap detection decoder errors perf probe: Increase debug level of SDT debug messages perf record: Add support for using symbols in address filters perf symbols: Add dso__last_symbol() perf record: Fix error paths perf record: Rename label 'out_symbol_exit' perf script: Fix vanished idle symbols perf evsel: Add support for address filters ...
2016-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Three sets of overlapping changes. Nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30Merge branch 'x86/urgent' into x86/asmThomas Gleixner
Get the cr4 fixes so we can apply the final cleanup
2016-09-29tracing/syscalls: fix multiline in error message textColin Ian King
pr_info message spans two lines and the literal string is missing a white space between words. Add the white space. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>