summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2014-07-18tracing: Convert local function_graph functions to staticSteven Rostedt (Red Hat)
Local functions should be static. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18ftrace: Do not copy old hash when resettingWang Nan
Do not waste time copying the old hash if the hash is going to be reset. Just allocate a new hash and free the old one, as that is the same result as copying te old one and then resetting it. Link: http://lkml.kernel.org/p/1405384820-48837-1-git-send-email-wangnan0@huawei.com Signed-off-by: Wang Nan <wangnan0@huawei.com> [ SDR: Removed unused ftrace_filter_reset() function ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18tracing: let user specify tracing_thresh after selecting function_graphStanislav Fomichev
Currently, tracing_thresh works only if we specify it before selecting function_graph tracer. If we do the opposite, tracing_thresh will change it's value, but it will not be applied. To fix it, we add update_thresh callback which is called whenever tracing_thresh is updated and for function_graph tracer we register handler which reinitializes tracer depending on tracing_thresh. Link: http://lkml.kernel.org/p/20140718111727.GA3206@stfomichev-desktop.yandex.net Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on()Corey Minyard
The code for resizing the trace ring buffers has to run the per-cpu resize on the CPU itself. The code was using preempt_off() and running the code for the current CPU directly, otherwise calling schedule_work_on(). At least on RT this could result in the following: |BUG: sleeping function called from invalid context at kernel/rtmutex.c:673 |in_atomic(): 1, irqs_disabled(): 0, pid: 607, name: bash |3 locks held by bash/607: |CPU: 0 PID: 607 Comm: bash Not tainted 3.12.15-rt25+ #124 |(rt_spin_lock+0x28/0x68) |(free_hot_cold_page+0x84/0x3b8) |(free_buffer_page+0x14/0x20) |(rb_update_pages+0x280/0x338) |(ring_buffer_resize+0x32c/0x3dc) |(free_snapshot+0x18/0x38) |(tracing_set_tracer+0x27c/0x2ac) probably via |cd /sys/kernel/debug/tracing/ |echo 1 > events/enable ; sleep 2 |echo 1024 > buffer_size_kb If we just always use schedule_work_on(), there's no need for the preempt_off(). So do that. Link: http://lkml.kernel.org/p/1405537633-31518-1-git-send-email-cminyard@mvista.com Reported-by: Stanislav Meduna <stano@meduna.org> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TESTSteven Rostedt (Red Hat)
All users of function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST have been removed. We can safely remove them from the kernel. Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18ftrace: Remove function_trace_stop check from list funcSteven Rostedt (Red Hat)
function_trace_stop is no longer used to stop function tracing. Remove the check from __ftrace_ops_list_func(). Also, call FTRACE_WARN_ON() instead of setting function_trace_stop if a ops has no func to call. Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18ftrace: Do no disable function tracing on enabling function tracingSteven Rostedt (Red Hat)
When function tracing is being updated function_trace_stop is set to keep from tracing the updates. This was fine when function tracing was done from stop machine. But it is no longer done that way and this can cause real tracing to be missed. Remove it. Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18ftrace-graph: Remove usage of ftrace_stop() in ftrace_graph_stop()Steven Rostedt (Red Hat)
All archs now use ftrace_graph_is_dead() to stop function graph tracing. Remove the usage of ftrace_stop() as that is no longer needed. Cc: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-17ftrace-graph: Remove dependency of ftrace_stop() from ftrace_graph_stop()Steven Rostedt (Red Hat)
ftrace_stop() is going away as it disables parts of function tracing that affects users that should not be affected. But ftrace_graph_stop() is built on ftrace_stop(). Here's another example of killing all of function tracing because something went wrong with function graph tracing. Instead of disabling all users of function tracing on function graph error, disable only function graph tracing. A new function is created called ftrace_graph_is_dead(). This is called in strategic paths to prevent function graph from doing more harm and allowing at least a warning to be printed before the system crashes. NOTE: ftrace_stop() is still used until all the archs are converted over to use ftrace_graph_is_dead(). After that, ftrace_stop() will be removed. Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-17PM / Sleep: Remove ftrace_stop/start() from suspend and hibernateSteven Rostedt (Red Hat)
ftrace_stop() and ftrace_start() were added to the suspend and hibernate process because there was some function within the work flow that caused the system to reboot if it was traced. This function has recently been found (restore_processor_state()). Now there's no reason to disable function tracing while we are going into suspend or hibernate, which means that being able to trace this will help tremendously in debugging any issues with suspend or hibernate. This also means that the ftrace_stop/start() functions can be removed and simplify the function tracing code a bit. Link: http://lkml.kernel.org/r/1518201.VD9cU33jRU@vostro.rjw.lan Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16ftrace: Allow archs to specify if they need a separate function graph trampolineSteven Rostedt (Red Hat)
Currently if an arch supports function graph tracing, the core code will just assign the function graph trampoline to the function graph addr that gets called. But as the old method for function graph tracing always calls the function trampoline first and that calls the function graph trampoline, some archs may have the function graph trampoline dependent on operations that were done in the function trampoline. This causes function graph tracer to break on those archs. Instead of having the default be to set the function graph ftrace_ops to the function graph trampoline, have it instead just set it to zero which will keep it from jumping to a trampoline that is not set up to be jumped directly too. Link: http://lkml.kernel.org/r/53BED155.9040607@nvidia.com Reported-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com> Tested-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-09Merge branch 'trace/ftrace/urgent' into trace/ftrace/coreSteven Rostedt (Red Hat)
Needed 099ed151675c "tracing: Remove ftrace_stop/start() from reading the trace file" for the removal of ftrace_start/stop().
2014-07-01tracing: Remove ftrace_stop/start() from reading the trace fileSteven Rostedt (Red Hat)
Disabling reading and writing to the trace file should not be able to disable all function tracing callbacks. There's other users today (like kprobes and perf). Reading a trace file should not stop those from happening. Cc: stable@vger.kernel.org # 3.0+ Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Add description of set_graph_notrace to tracing/READMENamhyung Kim
It was missing the description of set_graph_notrace file. Add it. Link: http://lkml.kernel.org/p/1402590233-22321-5-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Improve message of empty set_ftrace_notrace fileNamhyung Kim
When there's no entry in set_ftrace_notrace, it'll print nothing, but it's better to print something like below like set_graph_notrace does: #### no functions disabled #### Link: http://lkml.kernel.org/p/1402644246-4649-1-git-send-email-namhyung@kernel.org Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Improve message of empty set_graph_notrace fileNamhyung Kim
When there's no entry in set_graph_notrace, it'll print below message #### all functions enabled #### While this is technically correct, it's better to print like below: #### no functions disabled #### Link: http://lkml.kernel.org/p/1402590233-22321-3-git-send-email-namhyung@kernel.org Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Add ftrace_graph_notrace boot parameterNamhyung Kim
The ftrace_graph_notrace option is for specifying notrace filter for function graph tracer at boot time. It can be altered after boot using set_graph_notrace file on the debugfs. Link: http://lkml.kernel.org/p/1402590233-22321-2-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Convert pr_warning() to pr_warn() in trace_events.cFabian Frederick
Convert pr_warning to standard pr_warn Define pr_fmt(fmt) fmt to avoid any future default fmt definition Link: http://lkml.kernel.org/p/1402141388-21144-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01ftrace: Do not copy hash if O_TRUNC is setNamhyung Kim
When a filter file is open for writing and O_TRUNC is set, there's no need to copy and free the filter entries. Link: http://lkml.kernel.org/p/1402474014-28655-2-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01ftrace: Fix memory leak on failure path in ftrace_allocate_pages()Namhyung Kim
As struct ftrace_page is managed in a single linked list, it should free from the start page. Link: http://lkml.kernel.org/p/1402474014-28655-1-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01ftrace: Get rid of obsolete global_start_up variableNamhyung Kim
It seems like it's a leftover from commit 4104d326b670 ("ftrace: Remove global function list and call function directly"). As it isn't updated at all, checking its value is meaningless. Let's get rid of it. Link: http://lkml.kernel.org/p/1402584972-17824-1-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Add trace_seq_buffer_ptr() helper functionSteven Rostedt (Red Hat)
There's several locations in the kernel that open code the calculation of the next location in the trace_seq buffer. This is usually done with p->buffer + p->len Instead of having this open coded, supply a helper function in the header to do it for them. This function is called trace_seq_buffer_ptr(). Link: http://lkml.kernel.org/p/20140626220129.452783019@goodmis.org Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Remove unnecessary null test before debugfs_remove()Fabian Frederick
This fixes checkpatch warning: "WARNING: debugfs_remove(NULL) is safe this check is probably not required" Link: http://lkml.kernel.org/p/1403802871-8599-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Remove trace_seq_reserve()Steven Rostedt (Red Hat)
trace_seq_reserve() has no users in the kernel, it just wastes space. Remove it. Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Make trace_seq_putmem_hex() more robustSteven Rostedt (Red Hat)
Currently trace_seq_putmem_hex() can only take as a parameter a pointer to something that is 8 bytes or less, otherwise it will overflow the buffer. This is protected by a macro that encompasses the call to trace_seq_putmem_hex() that has a BUILD_BUG_ON() for the variable before it is passed in. This is not very robust and if trace_seq_putmem_hex() ever gets used outside that macro it will cause issues. Instead of only being able to produce a hex output of memory that is for a single word, change it to be more robust and allow any size input. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Clean up trace_seq.cSteven Rostedt (Red Hat)
For using trace_seq_*() functions in NMI context, I posted a patch to move it to the lib/ directory. This caused Andrew Morton to take a look at the code. He went through and gave a lot of comments about missing kernel doc, inconsistent types for the save variable, mix match of EXPORT_SYMBOL_GPL() and EXPORT_SYMBOL() as well as missing EXPORT_SYMBOL*()s. There were a few comments about the way variables were being compared (int vs uint). All these were good review comments and should be implemented regardless of if trace_seq.c should be moved to lib/ or not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Move the trace_seq_* functions into its own trace_seq.c fileSteven Rostedt (Red Hat)
The trace_seq_*() functions are a nice utility that allows users to manipulate buffers with printf() like formats. It has its own trace_seq.h header in include/linux and should be in its own file. Being tied with trace_output.c is rather awkward. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01ftrace: Simplify ftrace_hash_disable/enable path in ftrace_hash_moveMasami Hiramatsu
Simplify ftrace_hash_disable/enable path in ftrace_hash_move for hardening the process if the memory allocation failed. Link: http://lkml.kernel.org/p/20140617110442.15167.81076.stgit@kbuild-fedora.novalocal Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01ftrace: Add trampolines to enabled_functions debug fileSteven Rostedt (Red Hat)
The enabled_functions is used to help debug the dynamic function tracing. Adding what trampolines are attached to files is useful for debugging. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01ftrace: Optimize function graph to be called directlySteven Rostedt (Red Hat)
Function graph tracing is a bit different than the function tracers, as it is processed after either the ftrace_caller or ftrace_regs_caller and we only have one place to modify the jump to ftrace_graph_caller, the jump needs to happen after the restore of registeres. The function graph tracer is dependent on the function tracer, where even if the function graph tracing is going on by itself, the save and restore of registers is still done for function tracing regardless of if function tracing is happening, before it calls the function graph code. If there's no function tracing happening, it is possible to just call the function graph tracer directly, and avoid the wasted effort to save and restore regs for function tracing. This requires adding new flags to the dyn_ftrace records: FTRACE_FL_TRAMP FTRACE_FL_TRAMP_EN The first is set if the count for the record is one, and the ftrace_ops associated to that record has its own trampoline. That way the mcount code can call that trampoline directly. In the future, trampolines can be added to arbitrary ftrace_ops, where you can have two or more ftrace_ops registered to ftrace (like kprobes and perf) and if they are not tracing the same functions, then instead of doing a loop to check all registered ftrace_ops against their hashes, just call the ftrace_ops trampoline directly, which would call the registered ftrace_ops function directly. Without this patch perf showed: 0.05% hackbench [kernel.kallsyms] [k] ftrace_caller 0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save 0.05% hackbench [kernel.kallsyms] [k] native_sched_clock 0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit 0.04% hackbench [kernel.kallsyms] [k] preempt_trace 0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return 0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check 0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller See that the ftrace_caller took up more time than the ftrace_graph_caller did. With this patch: 0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit 0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard 0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller 0.04% hackbench [kernel.kallsyms] [k] sched_clock The ftrace_caller is no where to be found and ftrace_graph_caller still takes up the same percentage. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-30tracing/uprobes: Fix the usage of uprobe_buffer_enable() in probe_event_enable()Oleg Nesterov
The usage of uprobe_buffer_enable() added by dcad1a20 is very wrong, 1. uprobe_buffer_enable() and uprobe_buffer_disable() are not balanced, _enable() should be called only if !enabled. 2. If uprobe_buffer_enable() fails probe_event_enable() should clear tp.flags and free event_file_link. 3. If uprobe_register() fails it should do uprobe_buffer_disable(). Link: http://lkml.kernel.org/p/20140627170146.GA18332@redhat.com Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Fixes: dcad1a204f72 "tracing/uprobes: Fetch args before reserving a ring buffer" Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-30tracing/uprobes: Kill the bogus UPROBE_HANDLER_REMOVE code in ↵Oleg Nesterov
uprobe_dispatcher() I do not know why dd9fa555d7bb "tracing/uprobes: Move argument fetching to uprobe_dispatcher()" added the UPROBE_HANDLER_REMOVE, but it looks wrong. OK, perhaps it makes sense to avoid store_trace_args() if the tracee is nacked by uprobe_perf_filter(). But then we should kill the same code in uprobe_perf_func() and unify the TRACE/PROFILE filtering (we need to do this anyway to mix perf/ftrace). Until then this code actually adds the pessimization because uprobe_perf_filter() will be called twice and return T in likely case. Link: http://lkml.kernel.org/p/20140627170143.GA18329@redhat.com Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-30uprobes: Change unregister/apply to WARN() if uprobe/consumer is goneOleg Nesterov
Add WARN_ON's into uprobe_unregister() and uprobe_apply() to ensure that nobody tries to play with the dead uprobe/consumer. This helps to catch the bugs like the one fixed by the previous patch. In the longer term we should fix this poorly designed interface. uprobe_register() should return "struct uprobe *" which should be passed to apply/unregister. Plus other semantic changes, see the changelog in commit 41ccba029e94. Link: http://lkml.kernel.org/p/20140627170140.GA18322@redhat.com Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-30tracing/uprobes: Revert "Support mix of ftrace and perf"Oleg Nesterov
This reverts commit 43fe98913c9f67e3b523615ee3316f9520a623e0. This patch is very wrong. Firstly, this change leads to unbalanced uprobe_unregister(). Just for example, # perf probe -x /lib/libc.so.6 syscall # echo 1 >> /sys/kernel/debug/tracing/events/probe_libc/enable # perf record -e probe_libc:syscall whatever after that uprobe is dead (unregistered) but the user of ftrace/perf can't know this, and it looks as if nobody hits this probe. This would be easy to fix, but there are other reasons why it is not simple to mix ftrace and perf. If nothing else, they can't share the same ->consumer.filter. This is fixable too, but probably we need to fix the poorly designed uprobe_register() interface first. At least "register" and "apply" should be clearly separated. Link: http://lkml.kernel.org/p/20140627170136.GA18319@redhat.com Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com> Cc: stable@vger.kernel.org # v3.14 Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-30ftrace: Add ftrace_rec_counter() macro to simplify the codeSteven Rostedt (Red Hat)
The ftrace dynamic record has a flags element that also has a counter. Instead of hard coding "rec->flags & ~FTRACE_FL_MASK" all over the place. Use a macro instead. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-30ftrace: Allow no regs if no more callbacks require itSteven Rostedt (Red Hat)
When registering a function callback for the function tracer, the ops can specify if it wants to save full regs (like an interrupt would) for each function that it traces, or if it does not care about regs and just wants to have the fastest return possible. Once a ops has registered a function, if other ops register that function they all will receive the regs too. That's because it does the work once, it does it for everyone. Now if the ops wanting regs unregisters the function so that there's only ops left that do not care about regs, those ops will still continue getting regs and going through the work for it on that function. This is because the disabling of the rec counter only sees the ops registered, and does not see the ops that are still attached, and does not know if the current ops that are still attached want regs or not. To play it safe, it just keeps regs being processed until no function is registered anymore. Instead of doing that, check the ops that are still registered for that function and if none want regs for it anymore, then disable the processing of regs. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-25Merge tag 'trace-fixes-v3.16-rc1-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing cleanups and fixes from Steven Rostedt: "This includes three patches from Oleg Nesterov. The first is a fix to a race condition that happens between enabling/disabling syscall tracepoints and new process creations (the check to go into the ptrace path for a process can be set when it shouldn't, or not set when it should). Not a major bug but one that should be fixed and even applied to stable. The other two patches are cleanup/fixes that are not that critical, but for an -rc1 release would be nice to have. They both deal with syscall tracepoints. It also includes a patch to introduce a new macro for the TRACE_EVENT() format called __field_struct(). Originally, __field() was used to record any variable into a trace event, but with the addition of setting the "is signed" attribute, the check causes anything but a primitive variable to fail to compile. That is, structs and unions can't be used as they once were. When the "is signed" check was introduce there were only primitive variables being recorded. But that will change soon and it was reported that __field() causes build failures. To solve the __field() issue, __field_struct() is introduced to allow trace_events to be able to record complex types too" * tag 'trace-fixes-v3.16-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Add __field_struct macro for TRACE_EVENT() tracing: syscall_regfunc() should not skip kernel threads tracing: Change syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread() tracing: Fix syscall_*regfunc() vs copy_process() race
2014-06-23kernel/watchdog.c: print traces for all cpus on lockup detectionAaron Tomlin
A 'softlockup' is defined as a bug that causes the kernel to loop in kernel mode for more than a predefined period to time, without giving other tasks a chance to run. Currently, upon detection of this condition by the per-cpu watchdog task, debug information (including a stack trace) is sent to the system log. On some occasions, we have observed that the "victim" rather than the actual "culprit" (i.e. the owner/holder of the contended resource) is reported to the user. Often this information has proven to be insufficient to assist debugging efforts. To avoid loss of useful debug information, for architectures which support NMI, this patch makes it possible to improve soft lockup reporting. This is accomplished by issuing an NMI to each cpu to obtain a stack trace. If NMI is not supported we just revert back to the old method. A sysctl and boot-time parameter is available to toggle this feature. [dzickus@redhat.com: add CONFIG_SMP in certain areas] [akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations] [mq@suse.cz: fix warning] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jan Moskyto Matejka <mq@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23mm, pcp: allow restoring percpu_pagelist_fraction defaultDavid Rientjes
Oleg reports a division by zero error on zero-length write() to the percpu_pagelist_fraction sysctl: divide error: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800d5aeb6e0 ti: ffff8800d87a2000 task.ti: ffff8800d87a2000 RIP: 0010: percpu_pagelist_fraction_sysctl_handler+0x84/0x120 RSP: 0018:ffff8800d87a3e78 EFLAGS: 00010246 RAX: 0000000000000f89 RBX: ffff88011f7fd000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000010 RBP: ffff8800d87a3e98 R08: ffffffff81d002c8 R09: ffff8800d87a3f50 R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000060 R13: ffffffff81c3c3e0 R14: ffffffff81cfddf8 R15: ffff8801193b0800 FS: 00007f614f1e9740(0000) GS:ffff88011f440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f614f1fa000 CR3: 00000000d9291000 CR4: 00000000000006e0 Call Trace: proc_sys_call_handler+0xb3/0xc0 proc_sys_write+0x14/0x20 vfs_write+0xba/0x1e0 SyS_write+0x46/0xb0 tracesys+0xe1/0xe6 However, if the percpu_pagelist_fraction sysctl is set by the user, it is also impossible to restore it to the kernel default since the user cannot write 0 to the sysctl. This patch allows the user to write 0 to restore the default behavior. It still requires a fraction equal to or larger than 8, however, as stated by the documentation for sanity. If a value in the range [1, 7] is written, the sysctl will return EINVAL. This successfully solves the divide by zero issue at the same time. Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Oleg Drokin <green@linuxhacker.ru> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23kernel/watchdog.c: remove preemption restrictions when restarting lockup ↵Don Zickus
detector Peter Wu noticed the following splat on his machine when updating /proc/sys/kernel/watchdog_thresh: BUG: sleeping function called from invalid context at mm/slub.c:965 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init 3 locks held by init/1: #0: (sb_writers#3){.+.+.+}, at: [<ffffffff8117b663>] vfs_write+0x143/0x180 #1: (watchdog_proc_mutex){+.+.+.}, at: [<ffffffff810e02d3>] proc_dowatchdog+0x33/0x110 #2: (cpu_hotplug.lock){.+.+.+}, at: [<ffffffff810589c2>] get_online_cpus+0x32/0x80 Preemption disabled at:[<ffffffff810e0384>] proc_dowatchdog+0xe4/0x110 CPU: 0 PID: 1 Comm: init Not tainted 3.16.0-rc1-testing #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x4e/0x7a __might_sleep+0x11d/0x190 kmem_cache_alloc_trace+0x4e/0x1e0 perf_event_alloc+0x55/0x440 perf_event_create_kernel_counter+0x26/0xe0 watchdog_nmi_enable+0x75/0x140 update_timers_all_cpus+0x53/0xa0 proc_dowatchdog+0xe4/0x110 proc_sys_call_handler+0xb3/0xc0 proc_sys_write+0x14/0x20 vfs_write+0xad/0x180 SyS_write+0x49/0xb0 system_call_fastpath+0x16/0x1b NMI watchdog: disabled (cpu0): hardware events not enabled What happened is after updating the watchdog_thresh, the lockup detector is restarted to utilize the new value. Part of this process involved disabling preemption. Once preemption was disabled, perf tried to allocate a new event (as part of the restart). This caused the above BUG_ON as you can't sleep with preemption disabled. The preemption restriction seemed agressive as we are not doing anything on that particular cpu, but with all the online cpus (which are protected by the get_online_cpus lock). Remove the restriction and the BUG_ON goes away. Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reported-by: Peter Wu <peter@lekensteyn.nl> Tested-by: Peter Wu <peter@lekensteyn.nl> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23kexec: save PG_head_mask in VMCOREINFOPetr Tesarik
To allow filtering of huge pages, makedumpfile must be able to identify them in the dump. This can be done by checking the appropriate page flag, so communicate its value to makedumpfile through the VMCOREINFO interface. There's only one small catch. Depending on how many page flags are available on a given architecture, this bit can be called PG_head or PG_compound. I sent a similar patch back in 2012, but Eric Biederman did not like using an #ifdef. So, this time I'm adding a common symbol (PG_head_mask) instead. See https://lkml.org/lkml/2012/11/28/91 for the previous version. Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Shaohua Li <shli@kernel.org> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23CPU hotplug, smp: flush any pending IPI callbacks before CPU offlineSrivatsa S. Bhat
There is a race between the CPU offline code (within stop-machine) and the smp-call-function code, which can lead to getting IPIs on the outgoing CPU, *after* it has gone offline. Specifically, this can happen when using smp_call_function_single_async() to send the IPI, since this API allows sending asynchronous IPIs from IRQ disabled contexts. The exact race condition is described below. During CPU offline, in stop-machine, we don't enforce any rule in the _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the other CPUs disable their local interrupts. Due to this, we can encounter a situation in which an IPI is sent by one of the other CPUs to the outgoing CPU (while it is *still* online), but the outgoing CPU ends up noticing it only *after* it has gone offline. CPU 1 CPU 2 (Online CPU) (CPU going offline) Enter _PREPARE stage Enter _PREPARE stage Enter _DISABLE_IRQ stage = Got a device interrupt, and | Didn't notice the IPI the interrupt handler sent an | since interrupts were IPI to CPU 2 using | disabled on this CPU. smp_call_function_single_async() | = Enter _DISABLE_IRQ stage Enter _RUN stage Enter _RUN stage = Busy loop with interrupts | Invoke take_cpu_down() disabled. | and take CPU 2 offline = Enter _EXIT stage Enter _EXIT stage Re-enable interrupts Re-enable interrupts The pending IPI is noted immediately, but alas, the CPU is offline at this point. This of course, makes the smp-call-function IPI handler code running on CPU 2 unhappy and it complains about "receiving an IPI on an offline CPU". One real example of the scenario on CPU 1 is the block layer's complete-request call-path: __blk_complete_request() [interrupt-handler] raise_blk_irq() smp_call_function_single_async() However, if we look closely, the block layer does check that the target CPU is online before firing the IPI. So in this case, it is actually the unfortunate ordering/timing of events in the stop-machine phase that leads to receiving IPIs after the target CPU has gone offline. In reality, getting a late IPI on an offline CPU is not too bad by itself (this can happen even due to hardware latencies in IPI send-receive). It is a bug only if the target CPU really went offline without executing all the callbacks queued on its list. (Note that a CPU is free to execute its pending smp-call-function callbacks in a batch, without waiting for the corresponding IPIs to arrive for each one of those callbacks). So, fixing this issue can be broken up into two parts: 1. Ensure that a CPU goes offline only after executing all the callbacks queued on it. 2. Modify the warning condition in the smp-call-function IPI handler code such that it warns only if an offline CPU got an IPI *and* that CPU had gone offline with callbacks still pending in its queue. Achieving part 1 is straight-forward - just flush (execute) all the queued callbacks on the outgoing CPU in the CPU_DYING stage[1], including those callbacks for which the source CPU's IPIs might not have been received on the outgoing CPU yet. Once we do this, an IPI that arrives late on the CPU going offline (either due to the race mentioned above, or due to hardware latencies) will be completely harmless, since the outgoing CPU would have executed all the queued callbacks before going offline. Overall, this fix (parts 1 and 2 put together) additionally guarantees that we will see a warning only when the *IPI-sender code* is buggy - that is, if it queues the callback _after_ the target CPU has gone offline. [1]. The CPU_DYING part needs a little more explanation: by the time we execute the CPU_DYING notifier callbacks, the CPU would have already been marked offline. But we want to flush out the pending callbacks at this stage, ignoring the fact that the CPU is offline. So restructure the IPI handler code so that we can by-pass the "is-cpu-offline?" check in this particular case. (Of course, the right solution here is to fix CPU hotplug to mark the CPU offline _after_ invoking the CPU_DYING notifiers, but this requires a lot of audit to ensure that this change doesn't break any existing code; hence lets go with the solution proposed above until that is done). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Suggested-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sachin Kamat <sachin.kamat@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-21Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This is larger than usual: the main reason are the ARM symbol lookup speedups that came in late and were hard to resist. There's also a kprobes fix and various tooling fixes, plus the minimal re-enablement of the mmap2 support interface" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/kprobes: Fix build errors and blacklist context_track_user perf tests: Add test for closing dso objects on EMFILE error perf tests: Add test for caching dso file descriptors perf tests: Allow reuse of test_file function perf tests: Spawn child for each test perf tools: Add dso__data_* interface descriptons perf tools: Allow to close dso fd in case of open failure perf tools: Add file size check and factor dso__data_read_offset perf tools: Cache dso data file descriptor perf tools: Add global count of opened dso objects perf tools: Add global list of opened dso objects perf tools: Add data_fd into dso object perf tools: Separate dso data related variables perf tools: Cache register accesses for unwind processing perf record: Fix to honor user freq/interval properly perf timechart: Reflow documentation perf probe: Improve error messages in --line option perf probe: Improve an error message of perf probe --vars mode perf probe: Show error code and description in verbose mode perf probe: Improve error message for unknown member of data structure ...
2014-06-21Merge branch 'locking-urgent-for-linus.patch' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rtmutex fixes from Thomas Gleixner: "Another three patches to make the rtmutex code more robust. That's the last urgent fallout from the big futex/rtmutex investigation" * 'locking-urgent-for-linus.patch' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rtmutex: Plug slow unlock race rtmutex: Detect changes in the pi lock chain rtmutex: Handle deadlock detection smarter
2014-06-21tracing: syscall_regfunc() should not skip kernel threadsOleg Nesterov
syscall_regfunc() ignores the kernel threads because "it has no effect", see cc3b13c1 "Don't trace kernel thread syscalls" which added this check. However, this means that a user-space task spawned by call_usermodehelper() will run without TIF_SYSCALL_TRACEPOINT if sys_tracepoint_refcount != 0. Remove this check. The unnecessary report from ret_from_fork path mentioned by cc3b13c1 is no longer possible, see See commit fb45550d76bb5 "make sure that kernel_thread() callbacks call do_exit() themselves". A kernel_thread() callback can only return and take the int_ret_from_sys_call path after do_execve() succeeds, otherwise the kernel will crash. But in this case it is no longer a kernel thread and thus is needs TIF_SYSCALL_TRACEPOINT. Link: http://lkml.kernel.org/p/20140413185938.GD20668@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-21tracing: Change syscall_*regfunc() to check PF_KTHREAD and use ↵Oleg Nesterov
for_each_process_thread() 1. Remove _irqsafe from syscall_regfunc/syscall_unregfunc, read_lock(tasklist) doesn't need to disable irqs. 2. Change this code to avoid the deprecated do_each_thread() and use for_each_process_thread() (stolen from the patch from Frederic). 3. Change syscall_regfunc() to check PF_KTHREAD to skip the kernel threads, ->mm != NULL is the common mistake. Note: probably this check should be simply removed, needs another patch. [fweisbec@gmail.com: s/do_each_thread/for_each_process_thread/] Link: http://lkml.kernel.org/p/20140413185918.GC20668@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-21tracing: Fix syscall_*regfunc() vs copy_process() raceOleg Nesterov
syscall_regfunc() and syscall_unregfunc() should set/clear TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race with copy_process() and miss the new child which was not added to the process/thread lists yet. Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT under tasklist. Link: http://lkml.kernel.org/p/20140413185854.GB20668@redhat.com Cc: stable@vger.kernel.org # 2.6.33 Fixes: a871bd33a6c0 "tracing: Add syscall tracepoints" Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-19Merge tag 'pm+acpi-3.16-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "These are fixes mostly (ia64 regression related to the ACPI enumeration of devices, cpufreq regressions, fix for I2C controllers included in Intel SoCs, mvebu cpuidle driver fix related to sysfs) plus additional kernel command line arguments from Kees to make it possible to build kernel images with hibernation and the kernel address space randomization included simultaneously, a new ACPI battery driver quirk for a system with a broken BIOS and a couple of ACPI core cleanups. Specifics: - Fix for an ia64 regression introduced during the 3.11 cycle by a commit that modified the hardware initialization ordering and made device discovery fail on some systems. - Fix for a build problem on systems where the cpufreq-cpu0 driver is built-in and the cpu-thermal driver is modular from Arnd Bergmann. - Fix for a recently introduced computational mistake in the intel_pstate driver that leads to excessive rounding errors from Doug Smythies. - Fix for a failure code path in cpufreq_update_policy() that fails to unlock the locks acquired previously from Aaron Plattner. - Fix for the cpuidle mvebu driver to use shorter state names which will prevent the sysfs interface from returning mangled strings. From Gregory Clement. - ACPI LPSS driver fix to make sure that the I2C controllers included in BayTrail SoCs are not held in the reset state while they are being probed from Mika Westerberg. - New kernel command line arguments making it possible to build kernel images with hibernation and kASLR included at the same time and to select which of them will be used via the command line (they are still functionally mutually exclusive, though). From Kees Cook. - ACPI battery driver quirk for Acer Aspire V5-573G that fails to send battery status change notifications timely from Alexander Mezin. - Two ACPI core cleanups from Christoph Jaeger and Fabian Frederick" * tag 'pm+acpi-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpuidle: mvebu: Fix the name of the states cpufreq: unlock when failing cpufreq_update_policy() intel_pstate: Correct rounding in busy calculation ACPI: use kstrto*() instead of simple_strto*() ACPI / processor replace __attribute__((packed)) by __packed ACPI / battery: add quirk for Acer Aspire V5-573G ACPI / battery: use callback for setting up quirks ACPI / LPSS: Take I2C host controllers out of reset x86, kaslr: boot-time selectable with hibernation PM / hibernate: introduce "nohibernate" boot parameter cpufreq: cpufreq-cpu0: fix CPU_THERMAL dependency ACPI / ia64 / sba_iommu: Restore the working initialization ordering
2014-06-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-nextLinus Torvalds
Pull sparc fixes from David Miller: "Sparc sparse fixes from Sam Ravnborg" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (67 commits) sparc64: fix sparse warnings in int_64.c sparc64: fix sparse warning in ftrace.c sparc64: fix sparse warning in kprobes.c sparc64: fix sparse warning in kgdb_64.c sparc64: fix sparse warnings in compat_audit.c sparc64: fix sparse warnings in init_64.c sparc64: fix sparse warnings in aes_glue.c sparc: fix sparse warnings in smp_32.c + smp_64.c sparc64: fix sparse warnings in perf_event.c sparc64: fix sparse warnings in kprobes.c sparc64: fix sparse warning in tsb.c sparc64: clean up compat_sigset_t.seta handling sparc64: fix sparse "Should it be static?" warnings in signal32.c sparc64: fix sparse warnings in sys_sparc32.c sparc64: fix sparse warning in pci.c sparc64: fix sparse warnings in smp_64.c sparc64: fix sparse warning in prom_64.c sparc64: fix sparse warning in btext.c sparc64: fix sparse warnings in sys_sparc_64.c + unaligned_64.c sparc64: fix sparse warning in process_64.c ... Conflicts: arch/sparc/include/asm/pgtable_64.h
2014-06-16x86, kaslr: boot-time selectable with hibernationKees Cook
Changes kASLR from being compile-time selectable (blocked by CONFIG_HIBERNATION), to being boot-time selectable (with hibernation available by default) via the "kaslr" kernel command line. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>