summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2013-08-14context_tracking: User/kernel broundary cross trace eventsFrederic Weisbecker
This can be useful to track all kernel/user round trips. And it's also helpful to debug the context tracking subsystem. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-14context_tracking: Optimize context switch off case with static keysFrederic Weisbecker
No need for syscall slowpath if no CPU is full dynticks, rather nop this in this case. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-14context_tracking: Optimize guest APIs off case with static keyFrederic Weisbecker
Optimize guest entry/exit APIs with static keys. This minimize the overhead for those who enable CONFIG_NO_HZ_FULL without always using it. Having no range passed to nohz_full= should result in the probes overhead to be minimized. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-14context_tracking: Optimize main APIs off case with static keyFrederic Weisbecker
Optimize user and exception entry/exit APIs with static keys. This minimize the overhead for those who enable CONFIG_NO_HZ_FULL without always using it. Having no range passed to nohz_full= should result in the probes to be nopped (at least we hope so...). If this proves not be enough in the long term, we'll need to bring an exception slow path by re-routing the exception handlers. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-14context_tracking: Ground setup for static key useFrederic Weisbecker
Prepare for using a static key in the context tracking subsystem. This will help optimizing the off case on its many users: * user_enter, user_exit, exception_enter, exception_exit, guest_enter, guest_exit, vtime_*() Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13context_tracking: Remove full dynticks' hacky dependency on wide context ↵Frederic Weisbecker
tracking Now that the full dynticks subsystem only enables the context tracking on full dynticks CPUs, lets remove the dependency on CONTEXT_TRACKING_FORCE This dependency was a hack to enable the context tracking widely for the full dynticks susbsystem until the latter becomes able to enable it in a more CPU-finegrained fashion. Now CONTEXT_TRACKING_FORCE only stands for testing on archs that work on support for the context tracking while full dynticks can't be used yet due to unmet dependencies. It simulates a system where all CPUs are full dynticks so that RCU user extended quiescent states and dynticks cputime accounting can be tested on the given arch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13nohz: Only enable context tracking on full dynticks CPUsFrederic Weisbecker
The context tracking subsystem has the ability to selectively enable the tracking on any defined subset of CPU. This means that we can define a CPU range that doesn't run the context tracking and another range that does. Now what we want in practice is to enable the tracking on full dynticks CPUs only. In order to perform this, we just need to pass our full dynticks CPU range selection from the full dynticks subsystem to the context tracking. This way we can spare the overhead of RCU user extended quiescent state and vtime maintainance on the CPUs that are outside the full dynticks range. Just keep in mind the raw context tracking itself is still necessary everywhere. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13context_tracking: Fix runtime CPU off-caseFrederic Weisbecker
As long as the context tracking is enabled on any CPU, even a single one, all other CPUs need to keep track of their user <-> kernel boundaries cross as well. This is because a task can sleep while servicing an exception that happened in the kernel or in userspace. Then when the task eventually wakes up and return from the exception, the CPU needs to know if we resume in userspace or in the kernel. exception_exit() get this information from exception_enter() that saved the previous state. If the CPU where the exception happened didn't keep track of these informations, exception_exit() doesn't know which state tracking to restore on the CPU where the task got migrated and we may return to userspace with the context tracking subsystem thinking that we are in kernel mode. This can be fixed in the long term if we move our context tracking probes on very low level arch fast path user <-> kernel boundary, although even that is worrisome as an exception can still happen in the few instructions between the probe and the actual iret. Also we are not yet ready to set these probes in the fast path given the potential overhead problem it induces. So let's fix this by always enable context tracking even on CPUs that are not in the full dynticks range. OTOH we can spare the rcu_user_*() and vtime_user_*() calls there because the tick runs on these CPUs and we can handle RCU state machine and cputime accounting through it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13vtime: Update a few commentsFrederic Weisbecker
Update a stale comment from the old vtime era and document some locking that might be non obvious. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13context_tracing: Fix guest accounting with native vtimeFrederic Weisbecker
1) If context tracking is enabled with native vtime accounting (which combo is useless except for dev testing), we call vtime_guest_enter() and vtime_guest_exit() on host <-> guest switches. But those are stubs in this configurations. As a result, cputime is not correctly flushed on kvm context switches. 2) If context tracking runs but is disabled on some CPUs, those CPUs end up calling __guest_enter/__guest_exit which in turn call vtime_account_system(). We don't want to call this because we run in tick based accounting for these CPUs. Refactor the guest_enter/guest_exit code such that all combinations finally work. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13sched: Consolidate open coded preemptible() checksFrederic Weisbecker
preempt_schedule() and preempt_schedule_context() open code their preemptability checks. Use the standard API instead for consolidation. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Alex Shi <alex.shi@intel.com> Cc: Paul Turner <pjt@google.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Vincent Guittot <vincent.guittot@linaro.org>
2013-08-12Merge branch 'fortglx/3.11/time' of ↵Ingo Molnar
git://git.linaro.org/people/jstultz/linux into timers/urgent Pull small fix for v3.11 from John Stultz. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-25Merge branch 'timers/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/urgent Pull nohz fixes from Frederic Weisbecker. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-24nohz: fix compile warning in tick_nohz_init()Li Zhong
cpu is not used after commit 5b8621a68fdcd2baf1d3b413726f913a5254d46a Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-07-24nohz: Do not warn about unstable tsc unless user uses nohz_fullSteven Rostedt
If the user enables CONFIG_NO_HZ_FULL and runs the kernel on a machine with an unstable TSC, it will produce a WARN_ON dump as well as taint the kernel. This is a bit extreme for a kernel that just enables a feature but doesn't use it. The warning should only happen if the user tries to use the feature by either adding nohz_full to the kernel command line, or by enabling CONFIG_NO_HZ_FULL_ALL that makes nohz used on all CPUs at boot up. Note, this second feature should not (yet) be used by distros or anyone that doesn't care if NO_HZ is used or not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-07-22Merge tag 'trace-3.11-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes and cleanups from Steven Rostedt: "This contains fixes, optimizations and some clean ups Some of the fixes need to go back to 3.10. They are minor, and deal mostly with incorrect ref counting in accessing event files. There was a couple of optimizations that should have perf perform a bit better when accessing trace events. And some various clean ups. Some of the clean ups are necessary to help in a fix to a theoretical race between opening a event file and deleting that event" * tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open() tracing: Kill trace_array->waiter tracing: Do not (ab)use trace_seq in event_id_read() tracing: Simplify the iteration logic in f_start/f_next tracing: Add ref_data to function and fgraph tracer structs tracing: Miscellaneous fixes for trace_array ref counting tracing: Fix error handling to ensure instances can always be removed tracing/kprobe: Wait for disabling all running kprobe handlers tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare() tracing/syscall: Avoid perf_trace_buf_*() if sys_data->perf_events is empty tracing/function: Avoid perf_trace_buf_*() if event_function.perf_events is empty tracing: Typo fix on ring buffer comments tracing: Use trace_seq_puts()/trace_seq_putc() where possible tracing: Use correct config guard CONFIG_STACK_TRACER
2013-07-22sched_clock: Fix integer overflowBaruch Siach
The expression '(1 << 32)' happens to evaluate as 0 on ARM, but it evaluates as 1 on xtensa and x86_64. This zeros sched_clock_mask, and breaks sched_clock(). Set the type of 1 to 'unsigned long long' to get the value we need. Reported-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-07-19tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open()Oleg Nesterov
tracing_buffers_open() does trace_array_get() and then it wrongly inrcements tr->ref again under trace_types_lock. This means that every caller leaks trace_array: # cd /sys/kernel/debug/tracing/ # mkdir instances/X # true < instances/X/per_cpu/cpu0/trace_pipe_raw # rmdir instances/X rmdir: failed to remove `instances/X': Device or resource busy Link: http://lkml.kernel.org/r/20130719153644.GA18899@redhat.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-19Merge tag 'pm+acpi-3.11-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "These are fixes collected over the last week, most importnatly two cpufreq reverts fixing regressions introduced in 3.10, an autoseelp fix preventing systems using it from crashing during shutdown and two ACPI scan fixes related to hotplug. Specifics: - Two cpufreq commits from the 3.10 cycle introduced regressions. The first of them was buggy (it did way much more than it needed to do) and the second one attempted to fix an issue introduced by the first one. Fixes from Srivatsa S Bhat revert both. - If autosleep triggers during system shutdown and the shutdown callbacks of some device drivers have been called already, it may crash the system. Fix from Liu Shuo prevents that from happening by making try_to_suspend() check system_state. - The ACPI memory hotplug driver doesn't clear its driver_data on errors which may cause a NULL poiter dereference to happen later. Fix from Toshi Kani. - The ACPI namespace scanning code should not try to attach scan handlers to device objects that have them already, which may confuse things quite a bit, and it should rescan the whole namespace branch starting at the given node after receiving a bus check notify event even if the device at that particular node has been discovered already. Fixes from Rafael J Wysocki. - New ACPI video blacklist entry for a system whose initial backlight setting from the BIOS doesn't make sense. From Lan Tianyu. - Garbage string output avoindance for ACPI PNP from Liu Shuo. - Two Kconfig fixes for issues introduced recently in the s3c24xx cpufreq driver (when moving the driver to drivers/cpufreq) from Paul Bolle. - Trivial comment fix in pm_wakeup.h from Chanwoo Choi" * tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / video: ignore BIOS initial backlight value for Fujitsu E753 PNP / ACPI: avoid garbage in resource name cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression cpufreq: s3c24xx: fix "depends on ARM_S3C24XX" in Kconfig cpufreq: s3c24xx: rename CONFIG_CPU_FREQ_S3C24XX_DEBUGFS PM / Sleep: Fix comment typo in pm_wakeup.h PM / Sleep: avoid 'autosleep' in shutdown progress cpufreq: Revert commit a66b2e to fix suspend/resume regression ACPI / memhotplug: Fix a stale pointer in error path ACPI / scan: Always call acpi_bus_scan() for bus check notifications ACPI / scan: Do not try to attach scan handlers to devices having them
2013-07-19tracing: Kill trace_array->waiterOleg Nesterov
Trivial. trace_array->waiter has no users since 6eaaa5d5 "tracing/core: use appropriate waiting on trace_pipe". Link: http://lkml.kernel.org/r/20130719142036.GA1594@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing: Do not (ab)use trace_seq in event_id_read()Oleg Nesterov
event_id_read() has no reason to kmalloc "struct trace_seq" (more than PAGE_SIZE!), it can use a small buffer instead. Note: "if (*ppos) return 0" looks strange and even wrong, simple_read_from_buffer() handles ppos != 0 case corrrectly. And it seems that almost every user of trace_seq in this file should be converted too. Unless you use seq_open(), trace_seq buys nothing compared to the raw buffer, but it needs a bit more memory and code. Link: http://lkml.kernel.org/r/20130718184712.GA4786@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing: Simplify the iteration logic in f_start/f_nextOleg Nesterov
f_next() looks overcomplicated, and it is not strictly correct even if this doesn't matter. Say, FORMAT_FIELD_SEPERATOR should not return NULL (means EOF) if trace_get_fields() returns an empty list, we should simply advance to FORMAT_PRINTFMT as we do when we find the end of list. 1. Change f_next() to return "struct list_head *" rather than "ftrace_event_field *", and change f_show() to do list_entry(). This simplifies the code a bit, only f_show() needs to know about ftrace_event_field, and f_next() can play with ->prev directly 2. Change f_next() to not play with ->prev / return inside the switch() statement. It can simply set node = head/common_head, the prev-or-advance-to-the-next-magic below does all work. While at it. f_start() looks overcomplicated too. I don't think *pos == 0 makes sense as a separate case, just change this code to do "while" instead of "do/while". The patch also moves f_start() down, close to f_stop(). This is purely cosmetic, just to make the locking added by the next patch more clear/visible. Link: http://lkml.kernel.org/r/20130718184710.GA4783@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing: Add ref_data to function and fgraph tracer structsSteven Rostedt (Red Hat)
The selftest for function and function graph tracers are defined as __init, as they are only executed at boot up. The "tracer" structs that are associated to those tracers are not setup as __init as they are used after boot. To stop mismatch warnings, those structures need to be annotated with __ref_data. Currently, the tracer structures are defined to __read_mostly, as they do not really change. But in the future they should be converted to consts, but that will take a little work because they have a "next" pointer that gets updated when they are registered. That will have to wait till the next major release. Link: http://lkml.kernel.org/r/1373596735.17876.84.camel@gandalf.local.home Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing: Miscellaneous fixes for trace_array ref countingAlexander Z Lam
Some error paths did not handle ref counting properly, and some trace files need ref counting. Link: http://lkml.kernel.org/r/1374171524-11948-1-git-send-email-azl@google.com Cc: stable@vger.kernel.org # 3.10 Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing: Fix error handling to ensure instances can always be removedAlexander Z Lam
Remove debugfs directories for tracing instances during creation if an error occurs causing the trace_array for that instance to not be added to ftrace_trace_arrays. If the directory continues to exist after the error, it cannot be removed because the respective trace_array is not in ftrace_trace_arrays. Link: http://lkml.kernel.org/r/1373502874-1706-2-git-send-email-azl@google.com Cc: stable@vger.kernel.org # 3.10 Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing/kprobe: Wait for disabling all running kprobe handlersMasami Hiramatsu
Wait for disabling all running kprobe handlers when a kprobe event is disabled, since the caller, trace_remove_event_call() supposes that a removing event is disabled completely by disabling the event. With this change, ftrace can ensure that there is no running event handlers after disabling it. Link: http://lkml.kernel.org/r/20130709093526.20138.93100.stgit@mhiramat-M0-7522 Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()Oleg Nesterov
Every perf_trace_buf_prepare() caller does WARN_ONCE(size > PERF_MAX_TRACE_SIZE, message) and "message" is almost the same. Shift this WARN_ONCE() into perf_trace_buf_prepare(). This changes the meaning of _ONCE, but I think this is fine. - 4947014 2932448 10104832 17984294 1126b26 vmlinux + 4948422 2932448 10104832 17985702 11270a6 vmlinux on my build. Link: http://lkml.kernel.org/r/20130617170211.GA19813@redhat.com Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing/syscall: Avoid perf_trace_buf_*() if sys_data->perf_events is emptyOleg Nesterov
perf_trace_buf_prepare() + perf_trace_buf_submit(head, task => NULL) make no sense if hlist_empty(head). Change perf_syscall_enter/exit() to check sys_data->{enter,exit}_event->perf_events beforehand. Link: http://lkml.kernel.org/r/20130617170207.GA19806@redhat.com Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing/function: Avoid perf_trace_buf_*() if event_function.perf_events is ↵Oleg Nesterov
empty perf_trace_buf_prepare() + perf_trace_buf_submit(head, task => NULL) make no sense if hlist_empty(head). Change perf_ftrace_function_call() to check event_function.perf_events beforehand. Link: http://lkml.kernel.org/r/20130617170204.GA19803@redhat.com Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing: Typo fix on ring buffer commentszhangwei(Jovi)
There have some mismatch between comments with real function name, update it. This patch also add some missed function arguments description. Link: http://lkml.kernel.org/r/51E3B3B2.4080307@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing: Use trace_seq_puts()/trace_seq_putc() where possiblezhangwei(Jovi)
For string without format specifiers, use trace_seq_puts() or trace_seq_putc(). Link: http://lkml.kernel.org/r/51E3B3AC.1000605@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> [ fixed a trace_seq_putc(s, " ") to trace_seq_putc(s, ' ') ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18Merge tag 'driver-core-3.11-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg KH: "Here are some driver core patches for 3.11-rc2. They aren't really bugfixes, but a bunch of new helper macros for drivers to properly create attribute groups, which drivers and subsystems need to fix up a ton of race issues with incorrectly creating sysfs files (binary and normal) after userspace has been told that the device is present. Also here is the ability to create binary files as attribute groups, to solve that race condition, which was impossible to do before this, so that's my fault the drivers were broken. The majority of the .c changes is indenting and moving code around a bit. It affects no existing code, but allows the large backlog of 70+ patches that I already have created to start flowing into the different subtrees, instead of having to live in my driver-core tree, causing merge nightmares in linux-next for the next few months. These were finalized too late for the -rc1 merge window, which is why they were didn't make that pull request, testing and review from others didn't happen until a few weeks ago, and then there's the whole distraction of the past few days, which prevented these from getting to you sooner, sorry about that. Oh, and there's a bugfix for the documentation build warning in here as well. All of these have been in linux-next this week, with no reported problems" * tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver-core: fix new kernel-doc warning in base/platform.c sysfs: use file mode defines from stat.h sysfs: add more helper macro's for (bin_)attribute(_groups) driver core: add default groups to struct class driver core: Introduce device_create_groups sysfs: prevent warning when only using binary attributes sysfs: add support for binary attributes in groups driver core: device.h: add RW and RO attribute macros sysfs.h: add BIN_ATTR macro sysfs.h: add ATTRIBUTE_GROUPS() macro sysfs.h: add __ATTR_RW() macro
2013-07-16sysfs.h: add __ATTR_RW() macroGreg Kroah-Hartman
A number of parts of the kernel created their own version of this, might as well have the sysfs core provide it instead. Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-15tracing: Use correct config guard CONFIG_STACK_TRACERzhangwei(Jovi)
We should use CONFIG_STACK_TRACER to guard readme text of stack tracer related file, not CONFIG_STACKTRACE. Link: http://lkml.kernel.org/r/51E3B3A2.8080609@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-14kernel: delete __cpuinit usage from all core kernel filesPaul Gortmaker
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the uses of the __cpuinit macros from C files in the core kernel directories (kernel, init, lib, mm, and include) that don't really have a specific maintainer. [1] https://lkml.org/lkml/2013/5/20/589 Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14rcu: delete __cpuinit usage from all rcu filesPaul Gortmaker
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the drivers/rcu uses of the __cpuinit macros from all C files. [1] https://lkml.org/lkml/2013/5/20/589 Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@freedesktop.org> Cc: Dipankar Sarma <dipankar@in.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-15PM / Sleep: avoid 'autosleep' in shutdown progressLiu ShuoX
Prevent automatic system suspend from happening during system shutdown by making try_to_suspend() check system_state and return immediately if it is not SYSTEM_RUNNING. This prevents the following breakage from happening (scenario from Zhang Yanmin): Kernel starts shutdown and calls all device driver's shutdown callback. When a driver's shutdown is called, the last wakelock is released and suspend-to-ram starts. However, as some driver's shut down callbacks already shut down devices and disabled runtime pm, the suspend-to-ram calls driver's suspend callback without noticing that device is already off and causes crash. [rjw: Changelog] Signed-off-by: Liu ShuoX <shuox.liu@intel.com> Cc: 3.5+ <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs stuff from Al Viro: "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including making simple_lookup() usable for filesystems with non-NULL s_d_op, which allows us to get rid of quite a bit of ugliness" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sunrpc: now we can just set ->s_d_op cgroup: we can use simple_lookup() now efivarfs: we can use simple_lookup() now make simple_lookup() usable for filesystems that set ->s_d_op configfs: don't open-code d_alloc_name() __rpc_lookup_create_exclusive: pass string instead of qstr rpc_create_*_dir: don't bother with qstr llist: llist_add() can use llist_add_batch() llist: fix/simplify llist_add() and llist_add_batch() fput: turn "list_head delayed_fput_list" into llist_head fs/file_table.c:fput(): add comment Safer ABI for O_TMPFILE
2013-07-14cgroup: we can use simple_lookup() nowAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-13Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "Fix a potential deadlock versus hrtimers" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix HRTICK
2013-07-13Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: - core fix for missing round up in the generic irq chip implementation - new irq chip for MOXA SoCs - a few fixes and cleanups in the irqchip drivers * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: Add support for MOXA ART SoCs genirq: generic chip: Use DIV_ROUND_UP to calculate numchips irqchip: nvic: Fix wrong num_ct argument for irq_alloc_domain_generic_chips() irqchip: sun4i: Staticize sun4i_irq_ack() irqchip: vt8500: Staticize local symbols
2013-07-13Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: - watchdog fixes for full dynticks - improved debug output for full dynticks - remove an obsolete full dynticks check - two ARM SoC clocksource drivers for sharing across SoCs - tick broadcast fix for CPU hotplug * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick: broadcast: Check broadcast mode on CPU hotplug clocksource: arm_global_timer: Add ARM global timer support clocksource: Add Marvell Orion SoC timer nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs watchdog: Boot-disable by default on full dynticks watchdog: Rename confusing state variable watchdog: Register / unregister watchdog kthreads on sysctl control nohz: Warn if the machine can not perform nohz_full
2013-07-13Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: - fix for do_div() abuse on x86 - locking fix in perf core - a pile of (build) fixes and cleanups in perf tools * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) perf/x86: Fix incorrect use of do_div() in NMI warning perf: Fix perf_lock_task_context() vs RCU perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenario perf: Clone child context from parent context pmu perf script: Fix broken include in Context.xs perf tools: Fix -ldw/-lelf link test when static linking perf tools: Revert regression in configuration of Python support perf tools: Fix perf version generation perf stat: Fix per-socket output bug for uncore events perf symbols: Fix vdso list searching perf evsel: Fix missing increment in sample parsing perf tools: Update symbol_conf.nr_events when processing attribute events perf tools: Fix new_term() missing free on error path perf tools: Fix parse_events_terms() segfault on error path perf evsel: Fix count parameter to read call in event_format__new perf tools: fix a typo of a Power7 event name perf tools: Fix -x/--exclude-other option for report command perf evlist: Enhance perf_evlist__start_workload() perf record: Remove -f/--force option perf record: Remove -A/--append option ...
2013-07-13Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Thomas Gleixner: "Header cleanup as requested by Linus" (This is the "don't include support for ww_mutex in a header file that everybody wants, when almost nobody wants the ww part" change) * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mutex: Move ww_mutex definitions to ww_mutex.h
2013-07-13Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds
Pull MIPS updates from Ralf Baechle: "MIPS updates: - All the things that didn't make 3.10. - Removes the Windriver PPMC platform. Nobody will miss it. - Remove a workaround from kernel/irq/irqdomain.c which was there exclusivly for MIPS. Patch by Grant Likely. - More small improvments for the SEAD 3 platform - Improvments on the BMIPS / SMP support for the BCM63xx series. - Various cleanups of dead leftovers. - Platform support for the Cavium Octeon-based EdgeRouter Lite. Two large KVM patchsets didn't make it for this pull request because their respective authors are vacationing" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (124 commits) MIPS: Kconfig: Add missing MODULES dependency to VPE_LOADER MIPS: BCM63xx: CLK: Add dummy clk_{set,round}_rate() functions MIPS: SEAD3: Disable L2 cache on SEAD-3. MIPS: BCM63xx: Enable second core SMP on BCM6328 if available MIPS: BCM63xx: Add SMP support to prom.c MIPS: define write{b,w,l,q}_relaxed MIPS: Expose missing pci_io{map,unmap} declarations MIPS: Malta: Update GCMP detection. Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET" MIPS: APSP: Remove <asm/kspd.h> SSB: Kconfig: Amend SSB_EMBEDDED dependencies MIPS: microMIPS: Fix improper definition of ISA exception bit. MIPS: Don't try to decode microMIPS branch instructions where they cannot exist. MIPS: Declare emulate_load_store_microMIPS as a static function. MIPS: Fix typos and cleanup comment MIPS: Cleanup indentation and whitespace MIPS: BMIPS: support booting from physical CPU other than 0 MIPS: Only set cpu_has_mmips if SYS_SUPPORTS_MICROMIPS MIPS: GIC: Fix gic_set_affinity infinite loop MIPS: Don't save/restore OCTEON wide multiplier state on syscalls. ...
2013-07-12sched: Fix HRTICKPeter Zijlstra
David reported that the HRTICK sched feature was borken; which was enough motivation for me to finally fix it ;-) We should not allow hrtimer code to do softirq wakeups while holding scheduler locks. The hrtimer code only needs this when we accidentally try to program an expired time. We don't much care about those anyway since we have the regular tick to fall back to. Reported-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130628091853.GE29209@dyad.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-12tick: broadcast: Check broadcast mode on CPU hotplugStephen Boyd
On ARM systems the dummy clockevent is registered with the cpu hotplug notifier chain before any other per-cpu clockevent. This has the side-effect of causing the dummy clockevent to be registered first in every hotplug sequence. Because the dummy is first, we'll try to turn the broadcast source on but the code in tick_device_uses_broadcast() assumes the broadcast source is in periodic mode and calls tick_broadcast_start_periodic() unconditionally. On boot this isn't a problem because we typically haven't switched into oneshot mode yet (if at all). During hotplug, if the broadcast source isn't in periodic mode we'll replace the broadcast oneshot handler with the broadcast periodic handler and start emulating oneshot mode when we shouldn't. Due to the way the broadcast oneshot handler programs the next_event it's possible for it to contain KTIME_MAX and cause us to hang the system when the periodic handler tries to program the next tick. Fix this by using the appropriate function to start the broadcast source. Reported-by: Stephen Warren <swarren@nvidia.com> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Mark Rutland <Mark.Rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: ARM kernel mailing list <linux-arm-kernel@lists.infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joseph Lo <josephl@nvidia.com> Link: http://lkml.kernel.org/r/20130711140059.GA27430@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-12Merge branch 'linus' into timers/urgentThomas Gleixner
Get upstream changes so we can apply fixes against them Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-12mutex: Move ww_mutex definitions to ww_mutex.hMaarten Lankhorst
Move the definitions for wound/wait mutexes out to a separate header, ww_mutex.h. This reduces clutter in mutex.h, and increases readability. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Dave Airlie <airlied@gmail.com> Link: http://lkml.kernel.org/r/51D675DC.3000907@canonical.com [ Tidied up the code a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-12perf: Fix perf_lock_task_context() vs RCUPeter Zijlstra
Jiri managed to trigger this warning: [] ====================================================== [] [ INFO: possible circular locking dependency detected ] [] 3.10.0+ #228 Tainted: G W [] ------------------------------------------------------- [] p/6613 is trying to acquire lock: [] (rcu_node_0){..-...}, at: [<ffffffff810ca797>] rcu_read_unlock_special+0xa7/0x250 [] [] but task is already holding lock: [] (&ctx->lock){-.-...}, at: [<ffffffff810f2879>] perf_lock_task_context+0xd9/0x2c0 [] [] which lock already depends on the new lock. [] [] the existing dependency chain (in reverse order) is: [] [] -> #4 (&ctx->lock){-.-...}: [] -> #3 (&rq->lock){-.-.-.}: [] -> #2 (&p->pi_lock){-.-.-.}: [] -> #1 (&rnp->nocb_gp_wq[1]){......}: [] -> #0 (rcu_node_0){..-...}: Paul was quick to explain that due to preemptible RCU we cannot call rcu_read_unlock() while holding scheduler (or nested) locks when part of the read side critical section was preemptible. Therefore solve it by making the entire RCU read side non-preemptible. Also pull out the retry from under the non-preempt to play nice with RT. Reported-by: Jiri Olsa <jolsa@redhat.com> Helped-out-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>