summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2010-10-15Merge branch 'tip/perf/recordmcount-2' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2010-10-14ftrace: Rename config option HAVE_C_MCOUNT_RECORD to HAVE_C_RECORDMCOUNTSteven Rostedt
The config option used by archs to let the build system know that the C version of the recordmcount works for said arch is currently called HAVE_C_MCOUNT_RECORD which enables BUILD_C_RECORDMCOUNT. To be more consistent with the name that all archs may use, it has been renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since we are building a C recordmcount and not a mcount_record. Suggested-by: Ingo Molnar <mingo@elte.hu> Cc: <linux-arch@vger.kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: linux-kbuild@vger.kernel.org Cc: John Reiser <jreiser@bitwagon.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-14ftrace/x86: Add support for C version of recordmcountSteven Rostedt
This patch adds the support for the C version of recordmcount and compile times show ~ 12% improvement. After verifying this works, other archs can add: HAVE_C_MCOUNT_RECORD in its Kconfig and it will use the C version of recordmcount instead of the perl version. Cc: <linux-arch@vger.kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: linux-kbuild@vger.kernel.org Cc: John Reiser <jreiser@bitwagon.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-13tracing: Fix function-graph build warning on 32-bitBorislav Petkov
Fix kernel/trace/trace_functions_graph.c: In function ‘trace_print_graph_duration’: kernel/trace/trace_functions_graph.c:652: warning: comparison of distinct pointer types lacks a cast when building 36-rc6 on a 32-bit due to the strict type check failing in the min() macro. Signed-off-by: Borislav Petkov <bp@alien8.de> Cc: Chase Douglas <chase.douglas@canonical.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: <20100929080823.GA13595@liondog.tnic> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-09-22jump label: Initialize workqueue tracepoints *before* they are registeredJason Baron
Initialize the workqueue data structures *before* they are registered so that they are ready for callbacks. Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <e3a3383fc370ac7086625bebe89d9480d7caf372.1284733808.git.jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-15Merge branch 'tip/perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2010-09-14tracing: Remove leftover FTRACE_ENABLE/DISABLE_MCOUNT enumsSteven Rostedt
The enums for FTRACE_ENABLE_MCOUNT and FTRACE_DISABLE_MCOUNT were used as commands to ftrace_run_update_code(). But these commands were used by the old nasty ftrace daemon that has long been slain. This is a clean up patch to remove the references to these enums and simplify the code a little. Reported-by: Wu Zhangjin <wuzhangjin@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-14tracing: Do not trace in irq when funcgraph-irq option is zeroSteven Rostedt
When the function graph tracer funcgraph-irq option is zero, disable tracing in IRQs. This makes the option have two effects. 1) When reading the trace file, do not display the functions that happen in interrupt context (when detected) 2) [*new*] When recording a trace, skip those that are detected to be in interrupt by the 'in_irq()' function Note, in_irq() is updated at irq_enter() and irq_exit(). There are still functions that are recorded by the function graph tracer that is in interrupt context but outside the irq_enter/exit() routines. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-14tracing: Add funcgraph-irq option for function graph tracer.Jiri Olsa
It's handy to be able to disable the irq related output and not to have to jump over each irq related code, when you have no interrest in it. The option is by default enabled, so there's no change to current behaviour. It affects only the final output, so all the irq related data stay in the ring buffer. Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <20100907145344.GC1912@jolsa.brq.redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-14tracing: Fix reading of set_ftrace_filter across listsSteven Rostedt
If we do: # cd /sys/kernel/debug # echo 'do_IRQ:traceon schedule:traceon sys_write:traceon' > \ set_ftrace_filter # cat set_ftrace_filter We get the following output: #### all functions enabled #### sys_write:traceon:unlimited schedule:traceon:unlimited do_IRQ:traceon:unlimited This outputs two lists. One is the fact that all functions are currently enabled for function tracing, the other has three probed functions, which happen to have 'traceon' as their commands. Currently, when reading the first list (functions enabled) the seq_file code will receive a "NULL" from the t_next() function causing it to exit early. This makes "read()" from userspace stop reading the code at this boarder. Although read is allowed to do this, some (broken) applications might consider this an end of file and stop early. This patch adds the start of the second list to t_next() when it finishes the first list. It is a simple change and gives the set_ftrace_filter file nicer reading ability. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-14tracing: Keep track of set_ftrace_filter position and allow lseek againSteven Rostedt
This patch keeps track of the index within the elements of set_ftrace_filter and if the position goes backwards, it nicely resets and starts from the beginning again. This allows for lseek and pread to work properly now. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-14tracing: Replace typecasted void pointer in set_ftrace_filter codeSteven Rostedt
The set_ftrace_filter uses seq_file and reads from two lists. The pointer returned by t_next() can either be of type struct dyn_ftrace or struct ftrace_func_probe. If there is a bug (there was one) the wrong pointer may be used and the reference can cause an oops. This patch makes t_next() and friends only return the iterator structure which now has a pointer of type struct dyn_ftrace and struct ftrace_func_probe. The t_show() can now test if the pointer is NULL or not and if the pointer exists, it is guaranteed to be of the correct type. Now if there's a bug, only wrong data will be shown but not an oops. Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-14tracing: Do not reset *pos in set_ftrace_filterSteven Rostedt
After the filtered functions are read, the probed functions are read from the hash in set_ftrace_filter. When the hashed probed functions are read, the *pos passed in is reset. Instead of modifying the pos given to the read function, just record the pos where the filtered functions ended and subtract from that. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-10Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread perf symbols: Fix multiple initialization of symbol system perf: Fix CPU hotplug perf, trace: Fix module leak tracing/kprobe: Fix handling of C-unlike argument names tracing/kprobes: Fix handling of argument names perf probe: Fix handling of arguments names perf probe: Fix return probe support tracing/kprobe: Fix a memory leak in error case tracing: Do not allow llseek to set_ftrace_filter
2010-09-09tracing: t_start: reset FTRACE_ITER_HASH in case of seek/preadChris Wright
Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without having properly started the iterator to iterate the hash. This case is degenerate and, as discovered by Robert Swiecki, can cause t_hash_show() to misuse a pointer. This causes a NULL ptr deref with possible security implications. Tracked as CVE-2010-3079. Cc: Robert Swiecki <swiecki@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-09perf: Rework the PMU methodsPeter Zijlstra
Replace pmu::{enable,disable,start,stop,unthrottle} with pmu::{add,del,start,stop}, all of which take a flags argument. The new interface extends the capability to stop a counter while keeping it scheduled on the PMU. We replace the throttled state with the generic stopped state. This also allows us to efficiently stop/start counters over certain code paths (like IRQ handlers). It also allows scheduling a counter without it starting, allowing for a generic frozen state (useful for rotating stopped counters). The stopped state is implemented in two different ways, depending on how the architecture implemented the throttled state: 1) We disable the counter: a) the pmu has per-counter enable bits, we flip that b) we program a NOP event, preserving the counter state 2) We store the counter state and ignore all read/overflow events Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge reason: Pick up pending fixes before applying dependent new changes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09perf, trace: Fix module leakLi Zefan
Commit 1c024eca (perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events) caused a module refcount leak. Reported-And-Tested-by: Avi Kivity <avi@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4C7E1F12.8030304@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-08Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: gcc-4.6: kernel/*: Fix unused but set warnings mutex: Fix annotations to include it in kernel-locking docbook pid: make setpgid() system call use RCU read-side critical section MAINTAINERS: Add RCU's public git tree
2010-09-08tracing/kprobe: Fix handling of C-unlike argument namesMasami Hiramatsu
Check the argument name whether it is invalid (not C-like symbol name). This makes event format simple. Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113912.22882.62313.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-09-08tracing/kprobes: Fix handling of argument namesMasami Hiramatsu
Set "argN" name for each argument automatically if it has no specified name. Since dynamic trace event(kprobe_events) accepts special characters for its argument, its format can show those special characters (e.g. '$', '%', '+'). However, perf can't parse those format because of the character (especially '%') mess up the format. This sets "argX" name for those arguments if user omitted the argument names. E.g. # echo 'p do_fork %ax IP=%ip $stack' > tracing/kprobe_events # cat tracing/kprobe_events p:kprobes/p_do_fork_0 do_fork arg1=%ax IP=%ip arg3=$stack Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113906.22882.59312.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-09-08tracing/kprobe: Fix a memory leak in error caseMasami Hiramatsu
Fix a memory leak which happens when a field name conflicts with others. In error case, free_trace_probe() will free all arguments until nr_args, so this increments nr_args the begining of the loop instead of the end. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113846.22882.12670.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-09-08tracing: Do not allow llseek to set_ftrace_filterSteven Rostedt
Reading the file set_ftrace_filter does three things. 1) shows whether or not filters are set for the function tracer 2) shows what functions are set for the function tracer 3) shows what triggers are set on any functions 3 is independent from 1 and 2. The way this file currently works is that it is a state machine, and as you read it, it may change state. But this assumption breaks when you use lseek() on the file. The state machine gets out of sync and the t_show() may use the wrong pointer and cause a kernel oops. Luckily, this will only kill the app that does the lseek, but the app dies while holding a mutex. This prevents anyone else from using the set_ftrace_filter file (or any other function tracing file for that matter). A real fix for this is to rewrite the code, but that is too much for a -rc release or stable. This patch simply disables llseek on the set_ftrace_filter() file for now, and we can do the proper fix for the next major release. Reported-by: Robert Swiecki <swiecki@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Tavis Ormandy <taviso@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: vendor-sec@lst.de Cc: <stable@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-05gcc-4.6: kernel/*: Fix unused but set warningsAndi Kleen
No real bugs I believe, just some dead code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: andi@firstfloor.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-01ring-buffer: Place duplicate expression into a single functionSteven Rostedt
While discussing the strictness of the 80 character limit on the Kernel Summit Discussion mailing list, I showed examples that I broke that limit slightly with some algorithms. In discussing with John Linville, what looked better, I realized that two of the 80 char breaking culprits were an identical expression. As a clean up, this patch moves the identical expression into its own helper function and that is used instead. As a side effect, the offending code is now under the 80 character limit. :-) This clean up code also changes the expression from (A - B) - C to A - (B + C) This makes the code look a little nicer too. Cc: John W. Linville <linville@tuxdriver.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-31tracing: Fix a race in function profileLi Zefan
While we are reading trace_stat/functionX and someone just disabled function_profile at that time, we can trigger this: divide error: 0000 [#1] PREEMPT SMP ... EIP is at function_stat_show+0x90/0x230 ... This fix just takes the ftrace_profile_lock and checks if rec->counter is 0. If it's 0, we know the profile buffer has been reset. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: stable@kernel.org LKML-Reference: <4C723644.4040708@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-27Merge branch 'perf/urgent' into perf/coreFrederic Weisbecker
Conflicts: tools/perf/util/callchain.h Merge reason: Fix a non-trivial conflict with latest fixes
2010-08-25tracing/trace_stack: Fix stack trace on ppc64Anton Blanchard
save_stack_trace() stores the instruction pointer, not the function descriptor. On ppc64 the trace stack code currently dereferences the instruction pointer and shows 8 bytes of instructions in our backtraces: # cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (26 entries) ----- ---- -------- 0) 5424 112 0x6000000048000004 1) 5312 160 0x60000000ebad01b0 2) 5152 160 0x2c23000041c20030 3) 4992 240 0x600000007c781b79 4) 4752 160 0xe84100284800000c 5) 4592 192 0x600000002fa30000 6) 4400 256 0x7f1800347b7407e0 7) 4144 208 0xe89f0108f87f0070 8) 3936 272 0xe84100282fa30000 Since we aren't dealing with function descriptors, use %pS instead of %pF to fix it: # cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (26 entries) ----- ---- -------- 0) 5424 112 ftrace_call+0x4/0x8 1) 5312 160 .current_io_context+0x28/0x74 2) 5152 160 .get_io_context+0x48/0xa0 3) 4992 240 .cfq_set_request+0x94/0x4c4 4) 4752 160 .elv_set_request+0x60/0x84 5) 4592 192 .get_request+0x2d4/0x468 6) 4400 256 .get_request_wait+0x7c/0x258 7) 4144 208 .__make_request+0x49c/0x610 8) 3936 272 .generic_make_request+0x390/0x434 Signed-off-by: Anton Blanchard <anton@samba.org> Cc: rostedt@goodmis.org Cc: fweisbec@gmail.com LKML-Reference: <20100825013238.GE28360@kryten> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-19Merge branch 'tip/perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2010-08-19perf, tracing: add missing __percpu markupsNamhyung Kim
ftrace_event_call->perf_events, perf_trace_buf, fgraph_data->cpu_data and some local variables are percpu pointers missing __percpu markups. Add them. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <1281498479-28551-1-git-send-email-namhyung@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-08-19perf: Humanize the number of contextsFrederic Weisbecker
Instead of hardcoding the number of contexts for the recursions barriers, define a cpp constant to make the code more self-explanatory. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com>
2010-08-18tracing: Clean up seqfile code for format fileLi Zefan
Remove the nasty hack that marks a pointer's LSB to distinguish common fields from event fields. Replace it with a more sane approach. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4C6A23C2.9020606@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-16Merge branch 'tip/perf/urgent-3' of ↵Steven Rostedt
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into trace/tip/perf/urgent-4 Conflicts: kernel/trace/trace_events.c Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-13tracing: Sanitize value returned from write(trace_marker, "...", len)Marcin Slusarz
When userspace code writes non-new-line-terminated string to trace_marker file, write handler appends new-line and returns number of bytes written to trace buffer, so write(fd, "abc", 3) will return 4 That's unexpected and unfortunately it confuses glibc's fprintf function. Example: int main() { fprintf(stderr, "abc"); return 0; } $ gcc test.c -o test $ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer $ ./test 2>/sys/kernel/debug/tracing/trace_marker results in infinite loop: write(fd, "abc", 3) = 4 write(fd, "", 1) = 0 write(fd, "", 1) = 0 write(fd, "", 1) = 0 write(fd, "", 1) = 0 write(fd, "", 1) = 0 write(fd, "", 1) = 0 write(fd, "", 1) = 0 (...) ...and kernel trace buffer full of empty markers. Fix it by sanitizing write return value. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> LKML-Reference: <20100727231801.GB2826@joi.lan> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-12tracing/events: Convert format output to seq_fileSteven Rostedt
Two new events were added that broke the current format output. Both from the SCSI system: scsi_dispatch_cmd_done and scsi_dispatch_cmd_timeout The reason is that their print_fmt exceeded a page size. Since the output of the format used simple_read_from_buffer and trace_seq, it was limited to a page size in output. This patch converts the printing of the format of an event into seq_file, which allows greater than a page size to be shown. I diffed all event formats comparing the output with and without this patch. All matched except for the above two, which showed just: FORMAT TOO BIG without this patch, but now properly displays the output with this patch. v2: Remove updating *pos in seq start function. [ Thanks to Li Zefan for pointing that out ] Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com> Cc: James Bottomley <James.Bottomley@suse.de> Cc: Tomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-12block: add secure discardAdrian Hunter
Secure discard is the same as discard except that all copies of the discarded sectors (perhaps created by garbage collection) must also be erased. Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: Kyungmin Park <kmpark@infradead.org> Cc: Madhusudhan Chikkature <madhu.cr@ti.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ben Gardiner <bengardiner@nanometrics.ca> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-10Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits) block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n xen-blkfront: fix missing out label blkdev: fix blkdev_issue_zeroout return value block: update request stacking methods to support discards block: fix missing export of blk_types.h writeback: fix bad _bh spinlock nesting drbd: revert "delay probes", feature is being re-implemented differently drbd: Initialize all members of sync_conf to their defaults [Bugz 315] drbd: Disable delay probes for the upcomming release writeback: cleanup bdi_register writeback: add new tracepoints writeback: remove unnecessary init_timer call writeback: optimize periodic bdi thread wakeups writeback: prevent unnecessary bdi threads wakeups writeback: move bdi threads exiting logic to the forker thread writeback: restructure bdi forker loop a little writeback: move last_active to bdi writeback: do not remove bdi from bdi_list writeback: simplify bdi code a little writeback: do not lose wake-ups in bdi threads ... Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and drivers/scsi/scsi_error.c as per Jens.
2010-08-07Merge branch 'bkl/core' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: do_coredump: Do not take BKL init: Remove the BKL from startup code
2010-08-07Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits) workqueue: mark init_workqueues() as early_initcall() workqueue: explain for_each_*cwq_cpu() iterators fscache: fix build on !CONFIG_SYSCTL slow-work: kill it gfs2: use workqueue instead of slow-work drm: use workqueue instead of slow-work cifs: use workqueue instead of slow-work fscache: drop references to slow-work fscache: convert operation to use workqueue instead of slow-work fscache: convert object to use workqueue instead of slow-work workqueue: fix how cpu number is stored in work->data workqueue: fix mayday_mask handling on UP workqueue: fix build problem on !CONFIG_SMP workqueue: fix locking in retry path of maybe_create_worker() async: use workqueue for worker pool workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead workqueue: implement unbound workqueue workqueue: prepare for WQ_UNBOUND implementation libata: take advantage of cmwq and remove concurrency limitations workqueue: fix worker management invocation without pending works ... Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
2010-08-07block: push BKL into blktrace ioctlsArnd Bergmann
The blktrace driver currently needs the BKL, but we should not need to take that in the block layer, so just push it down into the driver itself. It is quite likely that the BKL is not actually required in blktrace code and could be removed in a follow-on patch. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07block: unify flags for struct bio and struct requestChristoph Hellwig
Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07block: remove wrappers for request type/flagsChristoph Hellwig
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in struct requests. This allows much easier grepping for different request types instead of unwinding through macros. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-06Merge branch 'timers-timekeeping-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: um: Fix read_persistent_clock fallout kgdb: Do not access xtime directly powerpc: Clean up obsolete code relating to decrementer and timebase powerpc: Rework VDSO gettimeofday to prevent time going backwards clocksource: Add __clocksource_updatefreq_hz/khz methods x86: Convert common clocksources to use clocksource_register_hz/khz timekeeping: Make xtime and wall_to_monotonic static hrtimer: Cleanup direct access to wall_to_monotonic um: Convert to use read_persistent_clock timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset powerpc: Cleanup xtime usage powerpc: Simplify update_vsyscall time: Kill off CONFIG_GENERIC_TIME time: Implement timespec_add x86: Fix vtime/file timestamp inconsistencies Trivial conflicts in Documentation/feature-removal-schedule.txt Much less trivial conflicts in arch/powerpc/kernel/time.c resolved as per Thomas' earlier merge commit 47916be4e28c ("Merge branch 'powerpc.cherry-picks' into timers/clocksource")
2010-08-06tracing: Fix ring_buffer_read_page reading out of page boundaryHuang Ying
With the configuration: CONFIG_DEBUG_PAGEALLOC=y and Shaohua's patch: [PATCH]x86: make spurious_fault check correct pte bit Function call graph trace with the following will trigger a page fault. # cd /sys/kernel/debug/tracing/ # echo function_graph > current_tracer # cat per_cpu/cpu1/trace_pipe_raw > /dev/null BUG: unable to handle kernel paging request at ffff880006e99000 IP: [<ffffffff81085572>] rb_event_length+0x1/0x3f PGD 1b19063 PUD 1b1d063 PMD 3f067 PTE 6e99160 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC last sysfs file: /sys/devices/virtual/net/lo/operstate CPU 1 Modules linked in: Pid: 1982, comm: cat Not tainted 2.6.35-rc6-aes+ #300 /Bochs RIP: 0010:[<ffffffff81085572>] [<ffffffff81085572>] rb_event_length+0x1/0x3f RSP: 0018:ffff880006475e38 EFLAGS: 00010006 RAX: 0000000000000ff0 RBX: ffff88000786c630 RCX: 000000000000001d RDX: ffff880006e98000 RSI: 0000000000000ff0 RDI: ffff880006e99000 RBP: ffff880006475eb8 R08: 000000145d7008bd R09: 0000000000000000 R10: 0000000000008000 R11: ffffffff815d9336 R12: ffff880006d08000 R13: ffff880006e605d8 R14: 0000000000000000 R15: 0000000000000018 FS: 00007f2b83e456f0(0000) GS:ffff880002100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff880006e99000 CR3: 00000000064a8000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process cat (pid: 1982, threadinfo ffff880006474000, task ffff880006e40770) Stack: ffff880006475eb8 ffffffff8108730f 0000000000000ff0 000000145d7008bd <0> ffff880006e98010 ffff880006d08010 0000000000000296 ffff88000786c640 <0> ffffffff81002956 0000000000000000 ffff8800071f4680 ffff8800071f4680 Call Trace: [<ffffffff8108730f>] ? ring_buffer_read_page+0x15a/0x24a [<ffffffff81002956>] ? return_to_handler+0x15/0x2f [<ffffffff8108a575>] tracing_buffers_read+0xb9/0x164 [<ffffffff810debfe>] vfs_read+0xaf/0x150 [<ffffffff81002941>] return_to_handler+0x0/0x2f [<ffffffff810248b0>] __bad_area_nosemaphore+0x17e/0x1a1 [<ffffffff81002941>] return_to_handler+0x0/0x2f [<ffffffff810248e6>] bad_area_nosemaphore+0x13/0x15 Code: 80 25 b2 16 b3 00 fe c9 c3 55 48 89 e5 f0 80 0d a4 16 b3 00 02 c9 c3 55 31 c0 48 89 e5 48 83 3d 94 16 b3 00 01 c9 0f 94 c0 c3 55 <8a> 0f 48 89 e5 83 e1 1f b8 08 00 00 00 0f b6 d1 83 fa 1e 74 27 RIP [<ffffffff81085572>] rb_event_length+0x1/0x3f RSP <ffff880006475e38> CR2: ffff880006e99000 ---[ end trace a6877bb92ccb36bb ]--- The root cause is that ring_buffer_read_page() may read out of page boundary, because the boundary checking is done after reading. This is fixed via doing boundary checking before reading. Reported-by: Shaohua Li <shaohua.li@intel.com> Cc: <stable@kernel.org> Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1280297641.2771.307.camel@yhuang-dev> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-06Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits) sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug sched: No need for bootmem special cases sched: Revert nohz_ratelimit() for now sched: Reduce update_group_power() calls sched: Update rq->clock for nohz balanced cpus sched: Fix spelling of sibling sched, cpuset: Drop __cpuexit from cpu hotplug callbacks sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check() sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand() sched: thread_group_cputime: Simplify, document the "alive" check sched: Remove the obsolete exit_state/signal hacks sched: task_tick_rt: Remove the obsolete ->signal != NULL check sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless sched: Fix comments to make them DocBook happy sched: Fix fix_small_capacity powerpc: Exclude arch_sd_sibiling_asym_packing() on UP powerpc: Enable asymmetric SMT scheduling on POWER7 sched: Add asymmetric group packing option for sibling domain sched: Fix capacity calculations for SMT4 sched: Change nohz idle load balancing logic to push model ...
2010-08-06Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits) tracing/kprobes: unregister_trace_probe needs to be called under mutex perf: expose event__process function perf events: Fix mmap offset determination perf, powerpc: fsl_emb: Restore setting perf_sample_data.period perf, powerpc: Convert the FSL driver to use local64_t perf tools: Don't keep unreferenced maps when unmaps are detected perf session: Invalidate last_match when removing threads from rb_tree perf session: Free the ref_reloc_sym memory at the right place x86,mmiotrace: Add support for tracing STOS instruction perf, sched migration: Librarize task states and event headers helpers perf, sched migration: Librarize the GUI class perf, sched migration: Make the GUI class client agnostic perf, sched migration: Make it vertically scrollable perf, sched migration: Parameterize cpu height and spacing perf, sched migration: Fix key bindings perf, sched migration: Ignore unhandled task states perf, sched migration: Handle ignored migrate out events perf: New migration tool overview tracing: Drop cpparg() macro perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call ... Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
2010-08-06tracing: Fix an unallocated memory access in function_graphShaohua Li
With CONFIG_DEBUG_PAGEALLOC, I observed an unallocated memory access in function_graph trace. It appears we find a small size entry in ring buffer, but we access it as a big size entry. The access overflows the page size and touches an unallocated page. Cc: <stable@kernel.org> Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <1280217994.32400.76.camel@sli10-desk.sh.intel.com> [ Added a comment to explain the problem - SDR ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-05ftrace,kdb: Allow dumping a specific cpu's buffer with ftdumpJason Wessel
In systems with more than one processor it is desirable to look at the per cpu trace buffers. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> CC: Frederic Weisbecker <fweisbec@gmail.com>
2010-08-05ftrace,kdb: Extend kdb to be able to dump the ftrace bufferJason Wessel
Add in a helper function to allow the kdb shell to dump the ftrace buffer. Modify trace.c to expose the capability to iterate over the ftrace buffer in a read only capacity. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> CC: Frederic Weisbecker <fweisbec@gmail.com>
2010-08-04tracing/kprobes: unregister_trace_probe needs to be called under mutexSrikar Dronamraju
Comment in unregister_trace_probe() says probe_lock will be held when it gets called. However there is a case where it might called without the probe_lock being held. Also since we are traversing the probe_list and deleting an element from the probe_list, probe_lock should be held. This was first pointed in uprobes traceevent review by Frederic Weisbecker here. (http://lkml.org/lkml/2010/5/12/106) Cc: Ingo Molnar <mingo@elte.hu> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20100630084548.GA10325@linux.vnet.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>