summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2011-05-18ftrace: Separate hash allocation and assignmentSteven Rostedt
When filtering, allocate a hash to insert the function records. After the filtering is complete, assign it to the ftrace_ops structure. This allows the ftrace_ops structure to have a much smaller array of hash buckets instead of wasting a lot of memory. A read only empty_hash is created to be the minimum size that any ftrace_ops can point to. When a new hash is created, it has the following steps: o Allocate a default hash. o Walk the function records assigning the filtered records to the hash o Allocate a new hash with the appropriate size buckets o Move the entries from the default hash to the new hash. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-18ftrace: Create a global_ops to hold the filter and notrace hashesSteven Rostedt
Combine the filter and notrace hashes to be accessed by a single entity, the global_ops. The global_ops is a ftrace_ops structure that is passed to different functions that can read or modify the filtering of the function tracer. The ftrace_ops structure was modified to hold a filter and notrace hashes so that later patches may allow each ftrace_ops to have its own set of rules to what functions may be filtered. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-18ftrace: Use hash instead for FTRACE_FL_FILTERSteven Rostedt
When multiple users are allowed to have their own set of functions to trace, having the FTRACE_FL_FILTER flag will not be enough to handle the accounting of those users. Each user will need their own set of functions. Replace the FTRACE_FL_FILTER with a filter_hash instead. This is temporary until the rest of the function filtering accounting gets in. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-18ftrace: Replace FTRACE_FL_NOTRACE flag with a hash of ignored functionsSteven Rostedt
To prepare for the accounting system that will allow multiple users of the function tracer, having the FTRACE_FL_NOTRACE as a flag in the dyn_trace record does not make sense. All ftrace_ops will soon have a hash of functions they should trace and not trace. By making a global hash of functions not to trace makes this easier for the transition. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-18irq: Export functions to allow modular irq driversJonathan Cameron
Export handle_simple_irq, irq_modify_status, irq_alloc_descs, irq_free_descs and generic_handle_irq to allow their usage in modules. First user is IIO, which wants to be built modular, but needs to be able to create irq chips, allocate and configure interrupt descriptors and handle demultiplexing interrupts. [ tglx: Moved the uninlinig of generic_handle_irq to a separate patch ] Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk> Link: http://lkml.kernel.org/r/%3C1305711544-505-1-git-send-email-jic23%40cam.ac.uk%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-18genirq: Uninline and sanity check generic_handle_irq()Thomas Gleixner
generic_handle_irq() is missing a NULL pointer check for the result of irq_to_desc. This was a not a big problem, but we want to expose it to drivers, so we better have sanity checks in place. Add a return value as well, which indicates that the irq number was valid and the handler was invoked. Based on the pure code move from Jonathan Cameron. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jonathan Cameron <jic23@cam.ac.uk>
2011-05-18genirq: Remove pointless ifdefsThomas Gleixner
kernel/irq/ is only built when CONFIG_GENERIC_HARDIRQS=y. So making code inside of kernel/irq/ conditional on CONFIG_GENERIC_HARDIRQS is pointless. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-17PM: Allow drivers to allocate memory from .prepare() callbacks safelyRafael J. Wysocki
If device drivers allocate substantial amounts of memory (above 1 MB) in their hibernate .freeze() callbacks (or in their legacy suspend callbcks during hibernation), the subsequent creation of hibernate image may fail due to the lack of memory. This is the case, because the drivers' .freeze() callbacks are executed after the hibernate memory preallocation has been carried out and the preallocated amount of memory may be too small to cover the new driver allocations. Unfortunately, the drivers' .prepare() callbacks also are executed after the hibernate memory preallocation has completed, so they are not suitable for allocating additional memory either. Thus the only way a driver can safely allocate memory during hibernation is to use a hibernate/suspend notifier. However, the notifiers are called before the freezing of user space and the drivers wanting to use them for allocating additional memory may not know how much memory needs to be allocated at that point. To let device drivers overcome this difficulty rework the hibernation sequence so that the memory preallocation is carried out after the drivers' .prepare() callbacks have been executed, so that the .prepare() callbacks can be used for allocating additional memory to be used by the drivers' .freeze() callbacks. Update documentation to match the new behavior of the code. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17PM: Remove CONFIG_PM_VERBOSERafael J. Wysocki
Now that we have CONFIG_DYNAMIC_DEBUG there is no need for yet another flag causing dev_dbg() and pr_debug() statements in the core PM code to produce output. Moreover, CONFIG_PM_VERBOSE causes so much output to be generated that it's not really useful and almost no one sets it. References: https://bugzilla.kernel.org/show_bug.cgi?id=23182 Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17Merge branch 'power-domains' into for-linusRafael J. Wysocki
* power-domains: PM: Fix build issue in clock_ops.c for CONFIG_PM_RUNTIME unset PM: Revert "driver core: platform_bus: allow runtime override of dev_pm_ops" OMAP1 / PM: Use generic clock manipulation routines for runtime PM PM / Runtime: Generic clock manipulation rountines for runtime PM (v6) PM / Runtime: Add subsystem data field to struct dev_pm_info OMAP2+ / PM: move runtime PM implementation to use device power domains PM / Platform: Use generic runtime PM callbacks directly shmobile: Use power domains for platform runtime PM PM: Export platform bus type's default PM callbacks PM: Make power domain callbacks take precedence over subsystem ones
2011-05-17Merge branch 'syscore' into for-linusRafael J. Wysocki
* syscore: PM: Remove sysdev suspend, resume and shutdown operations PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM PM / AVR32: Use struct syscore_ops instead of sysdevs for PM PM / Blackfin: Use struct syscore_ops instead of sysdevs for PM ARM / Samsung: Use struct syscore_ops for "core" power management ARM / PXA: Use struct syscore_ops for "core" power management ARM / SA1100: Use struct syscore_ops for "core" power management ARM / Integrator: Use struct syscore_ops for core PM ARM / OMAP: Use struct syscore_ops for "core" power management ARM: Use struct syscore_ops instead of sysdevs for PM in common code
2011-05-17Revert "PM / Hibernate: Reduce autotuned default image size"Rafael J. Wysocki
This reverts commit bea3864fb627d110933cfb8babe048b63c4fc76e (PM / Hibernate: Reduce autotuned default image size), because users are now able to resolve the issue this commit was supposed to address in a different way (i.e. by using the new /sys/power/reserved_size interface). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17PM / Hibernate: Add sysfs knob to control size of memory for driversRafael J. Wysocki
Martin reports that on his system hibernation occasionally fails due to the lack of memory, because the radeon driver apparently allocates too much of it during the device freeze stage. It turns out that the amount of memory allocated by radeon during hibernation (and presumably during system suspend too) depends on the utilization of the GPU (e.g. hibernating while there are two KDE 4 sessions with compositing enabled causes radeon to allocate more memory than for one KDE 4 session). In principle it should be possible to use image_size to make the memory preallocation mechanism free enough memory for the radeon driver, but in practice it is not easy to guess the right value because of the way the preallocation code uses image_size. For this reason, it seems reasonable to allow users to control the amount of memory reserved for driver allocations made after the hibernate preallocation, which currently is constant and amounts to 1 MB. Introduce a new sysfs file, /sys/power/reserved_size, whose value will be used as the amount of memory to reserve for the post-preallocation reservations made by device drivers, in bytes. For backwards compatibility, set its default (and initial) value to the currently used number (1 MB). References: https://bugzilla.kernel.org/show_bug.cgi?id=34102 Reported-and-tested-by: Martin Steigerwald <Martin@Lichtvoll.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17kmod: always provide usermodehelper_disable()Kay Sievers
We need to prevent kernel-forked processes during system poweroff. Such processes try to access the filesystem whose disks we are trying to shutdown at the same time. This causes delays and exceptions in the storage drivers. A follow-up patch will add these calls and need usermodehelper_disable() also on systems without suspend support. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17PM: Print a warning if firmware is requested when tasks are frozenRafael J. Wysocki
Some drivers erroneously use request_firmware() from their ->resume() (or ->thaw(), or ->restore()) callbacks, which is not going to work unless the firmware has been built in. This causes system resume to stall until the firmware-loading timeout expires, which makes users think that the resume has failed and reboot their machines unnecessarily. For this reason, make _request_firmware() print a warning and return immediately with error code if it has been called when tasks are frozen and it's impossible to start any new usermode helpers. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Reviewed-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
2011-05-17Freezer: Use SMP barriersMike Frysinger
The freezer processes are dealing with multiple threads running simultaneously, and on a UP system, the memory reads/writes do not need barriers to keep things in sync. These are only needed on SMP systems, so use SMP barriers instead. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17PM / Suspend: Do not ignore error codes returned by suspend_enter()MyungJoo Ham
The current implementation of suspend-to-RAM returns 0 if there is an error from suspend_enter(), because suspend_devices_and_enter() ignores the return value from suspend_enter(). This patch addresses this issue and properly keep the error return from suspend_enter() and let suspend_devices_and_enter relay the error return. Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tick: Clear broadcast active bit when switching to oneshot rtc: mc13xxx: Don't call rtc_device_register while holding lock rtc: rp5c01: Initialize drvdata before registering device rtc: pcap: Initialize drvdata before registering device rtc: msm6242: Initialize drvdata before registering device rtc: max8998: Initialize drvdata before registering device rtc: max8925: Initialize drvdata before registering device rtc: m41t80: Initialize clientdata before registering device rtc: ds1286: Initialize drvdata before registering device rtc: ep93xx: Initialize drvdata before registering device rtc: davinci: Initialize drvdata before registering device rtc: mxc: Initialize drvdata before registering device clocksource: Install completely before selecting
2011-05-16tick: Clear broadcast active bit when switching to oneshotThomas Gleixner
The first cpu which switches from periodic to oneshot mode switches also the broadcast device into oneshot mode. The broadcast device serves as a backup for per cpu timers which stop in deeper C-states. To avoid starvation of the cpus which might be in idle and depend on broadcast mode it marks the other cpus as broadcast active and sets the brodcast expiry value of those cpus to the next tick. The oneshot mode broadcast bit for the other cpus is sticky and gets only cleared when those cpus exit idle. If a cpu was not idle while the bit got set in consequence the bit prevents that the broadcast device is armed on behalf of that cpu when it enters idle for the first time after it switched to oneshot mode. In most cases that goes unnoticed as one of the other cpus has usually a timer pending which keeps the broadcast device armed with a short timeout. Now if the only cpu which has a short timer active has the bit set then the broadcast device will not be armed on behalf of that cpu and will fire way after the expected timer expiry. In the case of Christians bug report it took ~145 seconds which is about half of the wrap around time of HPET (the limit for that device) due to the fact that all other cpus had no timers armed which expired before the 145 seconds timeframe. The solution is simply to clear the broadcast active bit unconditionally when a cpu switches to oneshot mode after the first cpu switched the broadcast device over. It's not idle at that point otherwise it would not be executing that code. [ I fundamentally hate that broadcast crap. Why the heck thought some folks that when going into deep idle it's a brilliant concept to switch off the last device which brings the cpu back from that state? ] Thanks to Christian for providing all the valuable debug information! Reported-and-tested-by: Christian Hoffmann <email@christianhoffmann.info> Cc: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1105161105170.3078%40ionos%3E Cc: stable@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-16sched: Fix and optimise calculation of the weight-inverseStephan Baerwolf
If the inverse loadweight should be zero, function "calc_delta_mine" calculates the inverse of "lw->weight" (in 32bit integer ops). This calculation is actually a little bit impure (because it is inverting something around "lw-weight"+1), especially when "lw->weight" becomes smaller. The correct inverse would be 1/lw->weight multiplied by "WMULT_CONST" for fixcomma-scaling it into integers. (So WMULT_CONST/lw->weight ...) The old, impure algorithm took two divisions for inverting lw->weight, the new, more exact one only takes one and an additional unlikely-if. Signed-off-by: Stephan Baerwolf <stephan.baerwolf@tu-ilmenau.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-0pz0wnyalr4tk4ln11xwumdx@git.kernel.org [ This could explain some aritmetical issues for small shares but nothing concrete has been reported yet so we are not confident enough to queue this up in sched/urgent and for -stable backport. But if anyone finds this commit and sees it to fix some badness then we can certainly change our mind! ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-16sched: Avoid going ahead if ->cpus_allowed is not changedYong Zhang
If cpumask_equal(&p->cpus_allowed, new_mask) is true, seems there is no reason to prevent set_cpus_allowed_ptr() return directly. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Acked-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110509140705.GA2219@zhy Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-16sched, rt: Update rq clock when unthrottling of an otherwise idle CPUMike Galbraith
If an RT task is awakened while it's rt_rq is throttled, the time between wakeup/enqueue and unthrottle/selection may be accounted as rt_time if the CPU is idle. Set rq->skip_clock_update negative upon throttle release to tell put_prev_task() that we need a clock update. Reported-by: Thomas Giesel <skoe@directbox.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1304059010.7472.1.camel@marge.simson.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-16sched: Fix rt_rq runtime leakage bugCheng Xu
This patch is to fix the real-time scheduler bug reported at: https://lkml.org/lkml/2011/4/26/13 That is, when running multiple real-time threads on every logical CPUs and then turning off one CPU, the kernel will bug at function __disable_runtime(). Function __disable_runtime() bugs and reports leakage of rt_rq runtime. The root cause is __disable_runtime() assumes it iterates through all the existing rt_rq's while walking rq->leaf_rt_rq_list, which actually contains only runnable rt_rq's. This problem also applies to __enable_runtime() and print_rt_stats(). The patch is based on above analysis, appears to fix the problem, but is only lightly tested. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Cheng Xu <chengxu@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4DCE1F12.6040609@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-13Cache user_ns in struct credSerge E. Hallyn
If !CONFIG_USERNS, have current_user_ns() defined to (&init_user_ns). Get rid of _current_user_ns. This requires nsown_capable() to be defined in capability.c rather than as static inline in capability.h, so do that. Request_key needs init_user_ns defined at current_user_ns if !CONFIG_USERNS, so forward-declare that in cred.h if !CONFIG_USERNS at current_user_ns() define. Compile-tested with and without CONFIG_USERNS. Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> [ This makes a huge performance difference for acl_permission_check(), up to 30%. And that is one of the hottest kernel functions for loads that are pathname-lookup heavy. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-13job control: reorganize wait_task_stopped()Tejun Heo
wait_task_stopped() tested task_stopped_code() without acquiring siglock and, if stop condition existed, called wait_task_stopped() and directly returned the result. This patch moves the initial task_stopped_code() testing into wait_task_stopped() and make wait_consider_task() fall through to wait_task_continue() on 0 return. This is for the following two reasons. * Because the initial task_stopped_code() test is done without acquiring siglock, it may race against SIGCONT generation. The stopped condition might have been replaced by continued state by the time wait_task_stopped() acquired siglock. This may lead to unexpected failure of WNOHANG waits. This reorganization addresses this single race case but there are other cases - TASK_RUNNING -> TASK_STOPPED transition and EXIT_* transitions. * Scheduled ptrace updates require changes to the initial test which would fit better inside wait_task_stopped(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2011-05-12sched: Remove unused parameters from sched_fork() and wake_up_new_task()Samir Bellabes
sched_fork() and wake_up_new_task() are defined with a parameter 'unsigned long clone_flags', which is unused. This patch removes the parameters. Signed-off-by: Samir Bellabes <sam@synack.fr> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1305130685-1047-1-git-send-email-sam@synack.fr Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-12Merge commit 'v2.6.39-rc7' into sched/coreIngo Molnar
2011-05-11PM: Remove sysdev suspend, resume and shutdown operationsRafael J. Wysocki
Since suspend, resume and shutdown operations in struct sysdev_class and struct sysdev_driver are not used any more, remove them. Also drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used for executing those operations and modify all of their users accordingly. This reduces kernel code size quite a bit and reduces its complexity. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-11PM / Hibernate: Fix ioctl SNAPSHOT_S2RAMRafael J. Wysocki
The SNAPSHOT_S2RAM ioctl used for implementing the feature allowing one to suspend to RAM after creating a hibernation image is currently broken, because it doesn't clear the "ready" flag in the struct snapshot_data object handled by it. As a result, the SNAPSHOT_UNFREEZE doesn't work correctly after SNAPSHOT_S2RAM has returned and the user space hibernate task cannot thaw the other processes as appropriate. Make SNAPSHOT_S2RAM clear data->ready to fix this problem. Tested-by: Alexandre Felipe Muller de Souza <alexandrefm@mandriva.com.br> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org
2011-05-11PM / Hibernate: Make snapshot_release() restore GFP maskRafael J. Wysocki
If the process using the hibernate user space interface closes /dev/snapshot after creating a hibernation image without thawing tasks, snapshot_release() should call pm_restore_gfp_mask() to restore the GFP mask used before the creation of the image. Make that happen. Tested-by: Alexandre Felipe Muller de Souza <alexandrefm@mandriva.com.br> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org
2011-05-11PM: Fix warning in pm_restrict_gfp_mask() during SNAPSHOT_S2RAM ioctlRafael J. Wysocki
A warning is printed by pm_restrict_gfp_mask() while the SNAPSHOT_S2RAM ioctl is being executed after creating a hibernation image, because pm_restrict_gfp_mask() has been called once already before the image creation and suspend_devices_and_enter() calls it once again. This happens after commit 452aa6999e6703ffbddd7f6ea124d3 (mm/pm: force GFP_NOIO during suspend/hibernation and resume). To avoid this issue, move pm_restrict_gfp_mask() and pm_restore_gfp_mask() from suspend_devices_and_enter() to its caller in kernel/power/suspend.c. Reported-by: Alexandre Felipe Muller de Souza <alexandrefm@mandriva.com.br> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org
2011-05-10Merge commit 'v2.6.39-rc7' into perf/coreIngo Molnar
Merge reason: pull in the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-09ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()Tejun Heo
GROUP_STOP_TRAPPING waiting mechanism piggybacks on signal->wait_chldexit which is primarily used to implement waiting for wait(2) and friends. When do_wait() waits on signal->wait_chldexit, it uses a custom wake up callback, child_wait_callback(), which expects the child task which is waking up the parent to be passed in as @key to filter out spurious wakeups. task_clear_group_stop_trapping() used __wake_up_sync() which uses NULL @key causing the following oops if the parent was doing do_wait(). BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8 IP: [<ffffffff810499f9>] child_wait_callback+0x29/0x80 PGD 1d899067 PUD 1e418067 PMD 0 Oops: 0000 [#1] PREEMPT SMP last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/local_cpus CPU 2 Modules linked in: Pid: 4498, comm: test-continued Not tainted 2.6.39-rc6-work+ #32 Bochs Bochs RIP: 0010:[<ffffffff810499f9>] [<ffffffff810499f9>] child_wait_callback+0x29/0x80 RSP: 0000:ffff88001b889bf8 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff88001fab3af8 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88001d91df20 RBP: ffff88001b889c08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffff88001fb70550 R14: 0000000000000000 R15: 0000000000000001 FS: 00007f26ccae4700(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000000002d8 CR3: 000000001b8ac000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process test-continued (pid: 4498, threadinfo ffff88001b888000, task ffff88001fb88000) Stack: ffff88001b889c18 ffff88001fb70538 ffff88001b889c58 ffffffff810312f9 0000000000000001 0000000200000001 ffff88001b889c58 ffff88001fb70518 0000000000000002 0000000000000082 0000000000000001 0000000000000000 Call Trace: [<ffffffff810312f9>] __wake_up_common+0x59/0x90 [<ffffffff81035263>] __wake_up_sync_key+0x53/0x80 [<ffffffff810352a0>] __wake_up_sync+0x10/0x20 [<ffffffff8105a984>] task_clear_jobctl_trapping+0x44/0x50 [<ffffffff8105bcbc>] ptrace_stop+0x7c/0x290 [<ffffffff8105c20a>] do_signal_stop+0x28a/0x2d0 [<ffffffff8105d27f>] get_signal_to_deliver+0x14f/0x5a0 [<ffffffff81002175>] do_signal+0x75/0x7b0 [<ffffffff8100292d>] do_notify_resume+0x5d/0x70 [<ffffffff8182e36a>] retint_signal+0x46/0x8c Code: 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 8b 47 d8 83 f8 03 74 3a 85 c0 49 89 c8 75 23 89 c0 48 8b 5f e0 4c 8d 0c 40 31 c0 <4b> 39 9c c8 d8 02 00 00 74 1d 48 83 c4 08 5b c9 c3 66 0f 1f 44 Fix it by using __wake_up_sync_key() and passing in the child as @key. I still think it's a mistake to piggyback on wait_chldexit for this. Given the relative low frequency of ptrace use, we would be much better off leaving already complex wait_chldexit alone and using bit waitqueue. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
2011-05-09signal: sys_sigprocmask() needs retarget_shared_pending()Oleg Nesterov
sys_sigprocmask() changes current->blocked by hand. Convert this code to use set_current_blocked(). Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2011-05-07perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu()Lai Jiangshan
The rcu callback swevent_hlist_release_rcu() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(swevent_hlist_release_rcu). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07perf,rcu: convert call_rcu(free_ctx) to kfree_rcu()Lai Jiangshan
The rcu callback free_ctx() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(free_ctx). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07cgroup,rcu: convert call_rcu(__free_css_id_cb) to kfree_rcu()Lai Jiangshan
The rcu callback __free_css_id_cb() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(__free_css_id_cb). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07cgroup,rcu: convert call_rcu(free_cgroup_rcu) to kfree_rcu()Lai Jiangshan
The rcu callback free_cgroup_rcu() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(free_cgroup_rcu). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07cgroup,rcu: convert call_rcu(free_css_set_rcu) to kfree_rcu()Lai Jiangshan
The rcu callback free_css_set_rcu() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(free_css_set_rcu). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07rcu: permit rcu_read_unlock() to be called while holding runqueue locksPaul E. McKenney
Avoid calling into the scheduler while holding core RCU locks. This allows rcu_read_unlock() to be called while holding the runqueue locks, but only as long as there was no chance of the RCU read-side critical section having been preempted. (Otherwise, if RCU priority boosting is enabled, rcu_read_unlock() might call into the scheduler in order to unboost itself, which might allows self-deadlock on the runqueue locks within the scheduler.) Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-07Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf tools: Makefile: Use gcc to determine ARCH perf events, x86: Fix Intel Nehalem and Westmere last level cache event definitions hw_breakpoints, powerpc: Fix CONFIG_HAVE_HW_BREAKPOINT off-case in ptrace_set_debugreg() sh, hw_breakpoints: Fix racy access to ptrace breakpoints arm, hw_breakpoints: Fix racy access to ptrace breakpoints powerpc, hw_breakpoints: Fix racy access to ptrace breakpoints x86, hw_breakpoints: Fix racy access to ptrace breakpoints ptrace: Prepare to fix racy accesses on task breakpoints
2011-05-06reboot: disable usermodehelper to prevent fs accessKay Sievers
In case CONFIG_UEVENT_HELPER_PATH is not set to "", which it should be on every system, the kernel forks processes during shutdown, which try to access the rootfs, even when the binary does not exist. It causes exceptions and long delays in the disk driver, which gets read requests at the time it tries to shut down the disk. This patch disables all kernel-forked processes during reboot to allow a clean poweroff. Cc: Tejun Heo <htejun@gmail.com> Tested-By: Anton Guda <atu@dmeti.dp.ua> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-06Regression: partial revert "tracing: Remove lock_depth from event entry"Arjan van de Ven
This partially reverts commit e6e1e2593592a8f6f6380496655d8c6f67431266. That commit changed the structure layout of the trace structure, which in turn broke PowerTOP (1.9x generation) quite badly. I appreciate not wanting to expose the variable in question, and PowerTOP was not using it, so I've replaced the variable with just a padding field - that way if in the future a new field is needed it can just use this padding field. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-06sched: Shorten the construction of the span cpu mask of sched domainHillf Danton
For a given node, when constructing the cpumask for its sched_domain to span, if there is no best node available after searching, further efforts could be saved, based on small change in the return value of find_next_best_node(). Signed-off-by: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Yong Zhang <yong.zhang0@gmail.com> Link: http://lkml.kernel.org/r/BANLkTi%3DqPWxRAa6%2BdT3ohEP6Z%3D0v%2Be4EXA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06sched: Wrap the 'cfs_rq->nr_spread_over' field with CONFIG_SCHED_DEBUGRakib Mullick
cfs_rq->nr_spread_over is only used when CONFIG_SCHED_DEBUG is set. So wrap it with CONFIG_SCHED_DEBUG. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1304528026.15681.3.camel@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-05rcu: provide rcu_virt_note_context_switch() function.Gleb Natapov
Provide rcu_virt_note_context_switch() for vitalization use to note quiescent state during guest entry. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-05rcu: get rid of signed overflow in check_cpu_stall()Paul E. McKenney
Signed integer overflow is undefined by the C standard, so move calculations to unsigned. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-05rcu: optimize rcutinyEric Dumazet
rcu_sched_qs() currently calls local_irq_save()/local_irq_restore() up to three times. Remove irq masking from rcu_qsctr_help() / invoke_rcu_kthread() and do it once in rcu_sched_qs() / rcu_bh_qs() This generates smaller code as well. text data bss dec hex filename 2314 156 24 2494 9be kernel/rcutiny.old.o 2250 156 24 2430 97e kernel/rcutiny.new.o Fix an outdated comment for rcu_qsctr_help() Move invoke_rcu_kthread() definition before its use. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: prevent call_rcu() from diving into rcu core if irqs disabledPaul E. McKenney
This commit marks a first step towards making call_rcu() have real-time behavior. If irqs are disabled, don't dive into the RCU core. Later on, this new early exit will wake up the per-CPU kthread, which first must be modified to handle the cases involving callback storms. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05rcu: further lower priority in rcu_yield()Paul E. McKenney
Although rcu_yield() dropped from real-time to normal priority, there is always the possibility that the competing tasks have been niced. So nice to 19 in rcu_yield() to help ensure that other tasks have a better chance of running. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>