summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2010-04-14perf: Store active software events in a hashlistFrederic Weisbecker
Each time a software event triggers, we need to walk through the entire list of events from the current cpu and task contexts to retrieve a running perf event that matches. We also need to check a matching perf event is actually counting. This walk is wasteful and makes the event fast path scaling down with a growing number of events running on the same contexts. To solve this, we store the running perf events in a hashlist to get an immediate access to them against their type:event_id when they trigger. v2: - Fix SWEVENT_HLIST_SIZE definition (and re-learn some basic maths along the way) - Only allocate hlist for online cpus, but keep track of the refcount on offline possible cpus too, so that we allocate it if needed when it becomes online. - Drop the kref use as it's not adapted to our tricks anymore. v3: - Fix bad refcount check (address instead of value). Thanks to Eric Dumazet who spotted this. - While exiting cpu, move the hlist release out of the IPI path to lock the hlist mutex sanely. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu>
2010-04-08Merge branch 'linus' into perf/coreIngo Molnar
Semantic conflict: arch/x86/kernel/cpu/perf_event_intel_ds.c Merge reason: pick up latest fixes, fix the conflict Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-08virtio: disable multiport console support.Michael S. Tsirkin
Move MULTIPORT feature and related config changes out of exported headers, and disable the feature at runtime. At this point, it seems less risky to keep code around until we can enable it than rip it out completely. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-04-07Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip: x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards x86: Increase CONFIG_NODES_SHIFT max to 10 ibft, x86: Change reserve_ibft_region() to find_ibft_region() x86, hpet: Fix bug in RTC emulation x86, hpet: Erratum workaround for read after write of HPET comparator bootmem, x86: Fix 32bit numa system without RAM on node 0 nobootmem, x86: Fix 32bit numa system without RAM on node 0 x86: Handle overlapping mptables x86: Make e820_remove_range to handle all covered case x86-32, resume: do a global tlb flush in S4 resume
2010-04-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: mixart: range checking proc file ALSA: hda - Fix a wrong array range check in patch_realtek.c ALSA: ASoC: move dma_data from snd_soc_dai to snd_soc_pcm_stream ALSA: hda - Enable amplifiers on Acer Inspire 6530G ASoC: Only do WM8994 bias off transition from standby ASoC: Don't use DCS_DATAPATH_BUSY for WM hubs devices ASoC: Don't do runtime wm_hubs DC servo updates if using offset correction ASoC: Support second DC servo readback method for wm_hubs ASoC: Avoid wraparound in wm_hubs DC servo correction ALSA: echoaudio - Eliminate use after free ALSA: i2c: cleanup: change parameter to pointer ALSA: hda - Add MSI blacklist for Aopen MZ915-M ASoC: OMAP: Fix capture pointer handling for OMAP1510 to work correctly with recent ALSA PCM code ALSA: hda - Update document about MSI and interrupts ALSA: hda: Fix 0 dB offset for Lenovo Thinkpad models using AD1981 ALSA: hda - Add missing printk argument in previous patch ASoC: Fix passing platform_data to ac97 bus users and fix a leak ALSA: hda - Fix ADC/MUX assignment of ALC269 codec ALSA: hda - Fix invalid bit values passed to snd_hda_codec_amp_stereo() ASoC: wm8994: playback => capture
2010-04-07memcg: fix race in file_mapped accountingKAMEZAWA Hiroyuki
Presently, memcg's FILE_MAPPED accounting has following race with move_account (happens at rmdir()). increment page->mapcount (rmap.c) mem_cgroup_update_file_mapped() move_account() lock_page_cgroup() check page_mapped() if page_mapped(page)>1 { FILE_MAPPED -1 from old memcg FILE_MAPPED +1 to old memcg } ..... overwrite pc->mem_cgroup unlock_page_cgroup() lock_page_cgroup() FILE_MAPPED + 1 to pc->mem_cgroup unlock_page_cgroup() Then, old memcg (-1 file mapped) new memcg (+2 file mapped) This happens because move_account see page_mapped() which is not guarded by lock_page_cgroup(). This patch adds FILE_MAPPED flag to page_cgroup and move account information based on it. Now, all checks are synchronous with lock_page_cgroup(). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Balbir Singh <balbir@in.ibm.com> Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Andrea Righi <arighi@develer.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07pagemap: fix pfn calculation for hugepageNaoya Horiguchi
When we look into pagemap using page-types with option -p, the value of pfn for hugepages looks wrong (see below.) This is because pte was evaluated only once for one vma although it should be updated for each hugepage. This patch fixes it. $ page-types -p 3277 -Nl -b huge voffset offset len flags 7f21e8a00 11e400 1 ___U___________H_G________________ 7f21e8a01 11e401 1ff ________________TG________________ ^^^ 7f21e8c00 11e400 1 ___U___________H_G________________ 7f21e8c01 11e401 1ff ________________TG________________ ^^^ One hugepage contains 1 head page and 511 tail pages in x86_64 and each two lines represent each hugepage. Voffset and offset mean virtual address and physical address in the page unit, respectively. The different hugepages should not have the same offset value. With this patch applied: $ page-types -p 3386 -Nl -b huge voffset offset len flags 7fec7a600 112c00 1 ___UD__________H_G________________ 7fec7a601 112c01 1ff ________________TG________________ ^^^ 7fec7a800 113200 1 ___UD__________H_G________________ 7fec7a801 113201 1ff ________________TG________________ ^^^ OK More info: - This patch modifies walk_page_range()'s hugepage walker. But the change only affects pagemap_read(), which is the only caller of hugepage callback. - Without this patch, hugetlb_entry() callback is called per vma, that doesn't match the natural expectation from its name. - With this patch, hugetlb_entry() is called per hugepte entry and the callback can become much simpler. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07kernel.h: fix wrong usage of __ratelimit()Yong Zhang
When __ratelimit() returns 1 this means that we can go ahead. Signed-off-by: Yong Zhang <yong.zhang@windriver.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07vfs: rename block_fsync() to blkdev_fsync()Andrew Morton
Requested by hch, for consistency now it is exported. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Anton Blanchard <anton@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07raw: fsync method is now requiredAnton Blanchard
Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke the raw driver. We now call through generic_file_aio_write -> generic_write_sync -> vfs_fsync_range. vfs_fsync_range has: if (!fop || !fop->fsync) { ret = -EINVAL; goto out; } But drivers/char/raw.c doesn't set an fsync method. We have two options: fix it or remove the raw driver completely. I'm happy to do either, the fact this has been broken for so long suggests it is rarely used. The patch below adds an fsync method to the raw driver. My knowledge of the block layer is pretty sketchy so this could do with a once over. If we instead decide to remove the raw driver, this patch might still be useful as a backport to 2.6.33 and 2.6.32. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07include/linux/kfifo.h: fix INIT_KFIFO()David Härdeman
DECLARE_KFIFO creates a union with a struct kfifo and a buffer array with size [size + sizeof(struct kfifo)]. INIT_KFIFO then sets the buffer pointer in struct kfifo to point to the beginning of the buffer array which means that the first call to kfifo_in will overwrite members of the struct kfifo. Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Stefani Seibold <stefani@seibold.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07bitops: remove temporary for_each_bit()Andrew Morton
Migration has been completed so remove this now. There's one straggler in linux-next's drivers/mtd/sm_ftl.c. A patch has been sent. Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07Merge branch 'fix/asoc' into for-linusTakashi Iwai
2010-04-06Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: libata: unlock HPA if device shrunk libata: disable NCQ on Crucial C300 SSD libata: don't whine on spurious IRQ
2010-04-06libata: unlock HPA if device shrunkTejun Heo
Some BIOSes don't configure HPA during boot but do so while resuming. This causes harddrives to shrink during resume making libata detach and reattach them. This can be worked around by unlocking HPA if old size equals native size. Add ATA_DFLAG_UNLOCK_HPA so that HPA unlocking can be controlled per-device and update ata_dev_revalidate() such that it sets ATA_DFLAG_UNLOCK_HPA and fails with -EIO when the above condition is detected. This patch fixes the following bug. https://bugzilla.kernel.org/show_bug.cgi?id=15396 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Oleksandr Yermolenko <yaa.bta@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-04-05Fix up possibly racy module refcountingNick Piggin
Module refcounting is implemented with a per-cpu counter for speed. However there is a race when tallying the counter where a reference may be taken by one CPU and released by another. Reference count summation may then see the decrement without having seen the previous increment, leading to lower than expected count. A module which never has its actual reference drop below 1 may return a reference count of 0 due to this race. Module removal generally runs under stop_machine, which prevents this race causing bugs due to removal of in-use modules. However there are other real bugs in module.c code and driver code (module_refcount is exported) where the callers do not run under stop_machine. Fix this by maintaining running per-cpu counters for the number of module refcount increments and the number of refcount decrements. The increments are tallied after the decrements, so any decrement seen will always have its corresponding increment counted. The final refcount is the difference of the total increments and decrements, preventing a low-refcount from being returned. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: saving negative to unsigned char 9p: return on mutex_lock_interruptible() 9p: Creating files with names too long should fail with ENAMETOOLONG. 9p: Make sure we are able to clunk the cached fid on umount 9p: drop nlink remove fs/9p: Clunk the fid resulting from partial walk of the name 9p: documentation update 9p: Fix setting of protocol flags in v9fs_session_info structure.
2010-04-05ALSA: ASoC: move dma_data from snd_soc_dai to snd_soc_pcm_streamDaniel Mack
This fixes a memory corruption when ASoC devices are used in full-duplex mode. Specifically for pxa-ssp code, where this pointer is dynamically allocated for each direction and destroyed upon each stream start. All other platforms are fixed blindly, I couldn't even compile-test them. Sorry for any breakage I may have caused. [Note that this is a backported version for 2.6.34. Upstream commit is fd23b7dee] Signed-off-by: Daniel Mack <daniel@caiaq.de> Reported-by: Sven Neumann <s.neumann@raumfeld.com> Reported-by: Michael Hirsch <m.hirsch@raumfeld.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2010-04-05Merge branch 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/miscLinus Torvalds
* 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc: eeepc-wmi: include slab.h staging/otus: include slab.h from usbdrv.h percpu: don't implicitly include slab.h from percpu.h kmemcheck: Fix build errors due to missing slab.h include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h iwlwifi: don't include iwl-dev.h from iwl-devtrace.h x86: don't include slab.h from arch/x86/include/asm/pgtable_32.h Fix up trivial conflicts in include/linux/percpu.h due to is_kernel_percpu_address() having been introduced since the slab.h cleanup with the percpu_up.c splitup.
2010-04-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: module: add stub for is_module_percpu_address percpu, module: implement and use is_kernel/module_percpu_address() module: encapsulate percpu handling better and record percpu_size
2010-04-059p: Make sure we are able to clunk the cached fid on umountAneesh Kumar K.V
dcache prune happen on umount. So we cannot mark the client satus disconnect. That will prevent a 9p call to the server Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-04-05Merge branch 'master' into export-slabhTejun Heo
2010-04-04Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Always build the powerpc perf_arch_fetch_caller_regs version perf: Always build the stub perf_arch_fetch_caller_regs version perf, probe-finder: Build fix on Debian perf/scripts: Tuple was set from long in both branches in python_process_event() perf: Fix 'perf sched record' deadlock perf, x86: Fix callgraphs of 32-bit processes on 64-bit kernels perf, x86: Fix AMD hotplug & constraint initialization x86: Move notify_cpu_starting() callback to a later stage x86,kgdb: Always initialize the hw breakpoint attribute perf: Use hot regs with software sched switch/migrate events perf: Correctly align perf event tracing buffer
2010-04-04perf: Fetch hot regs from the template callerFrederic Weisbecker
Trace events can be defined from a template using DECLARE_EVENT_CLASS/DEFINE_EVENT or directly with TRACE_EVENT. In both cases we have a template tracepoint handler, used to record the trace, to which we pass our ftrace event instance. In the function level, if the class is named "foo" and the event is named "blah", we have the following chain of calls: perf_trace_blah() -> perf_trace_templ_foo() In the case we have several events sharing the class "blah", we'll have multiple users of perf_trace_templ_foo(), and it won't be inlined by the compiler. This is usually what happens with the DECLARE_EVENT_CLASS/DEFINE_EVENT based definition. But if perf_trace_blah() is the only caller of perf_trace_templ_foo() there are fair chances that it will be inlined. The problem is that we fetch the regs from perf_trace_templ_foo() after we rewinded the frame pointer to the second caller, we want to reach the caller of perf_trace_blah() to get the right source of the event. And we do this by always assuming that perf_trace_templ_foo() is not inlined. But as shown above this is not always true. And if it is inlined we miss the first caller, losing the most important level of precision. We get: 61.31% ls [kernel.kallsyms] [k] do_softirq | --- do_softirq irq_exit do_IRQ common_interrupt | |--25.00%-- tty_buffer_request_room Instead of: 61.31% ls [kernel.kallsyms] [k] __do_softirq | --- __do_softirq do_softirq irq_exit do_IRQ common_interrupt | |--25.00%-- tty_buffer_request_room To fix this, we fetch the regs from perf_trace_blah() rather than perf_trace_templ_foo() so that we don't have to deal with inlining surprises. That also bring us the advantage of having the true source of the event even if we don't have frame pointers. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu>
2010-04-04ALSA: i2c: cleanup: change parameter to pointerDan Carpenter
We actually pass an array of 7 chars not 5. This silences a smatch warning. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-04-02Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds
* master.kernel.org:/home/rmk/linux-2.6-arm: ARM: 5965/1: Fix soft lockup in at91 udc driver ARM: 6006/1: ARM: Use the correct NOP size in memmove for Thumb-2 kernel builds ARM: 6005/1: arm: kprobes: fix register corruption with jprobes ARM: 6003/1: removing compilation warning from pl061.h ARM: 6001/1: removing compilation warning comming from clkdev.h ARM: 6000/1: removing compilation warning comming from <asm/irq.h> ARM: 5999/1: Including device.h and resource.h header files in linux/amba/bus.h ARM: 5997/1: ARM: Correct the VFPv3 detection ARM: 5996/1: ARM: Change the mandatory barriers implementation (4/4) ARM: 5995/1: ARM: Add L2x0 outer_sync() support (3/4) ARM: 5994/1: ARM: Add outer_cache_fns.sync function pointer (2/4) ARM: 5993/1: ARM: Move the outer_cache definitions into a separate file (1/4)
2010-04-02Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: Freezer: Fix buggy resume test for tasks frozen with cgroup freezer Freezer: Only show the state of tasks refusing to freeze
2010-04-02Merge branch 'perf/urgent' into perf/coreIngo Molnar
Conflicts: arch/x86/kernel/cpu/perf_event.c Merge reason: Resolve the conflict, pick up fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02Merge branch 'perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent
2010-04-01ibft, x86: Change reserve_ibft_region() to find_ibft_region()Yinghai Lu
This allows arch code could decide the way to reserve the ibft. And we should reserve ibft as early as possible, instead of BOOTMEM stage, in case the table is in RAM range and is not reserved by BIOS (this will often be the case.) Move to just after find_smp_config(). Also when CONFIG_NO_BOOTMEM=y, We will not have reserve_bootmem() anymore. -v2: fix typo about ibft pointed by Konrad Rzeszutek Wilk <konrad@darnok.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4BB510FB.80601@kernel.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Jones <pjones@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> CC: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-01Merge branch 'drm-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (76 commits) drm/radeon/kms: enable ACPI powermanagement mode on radeon gpus. drm/radeon/kms: rs400/480 should set common registers. drm/radeon/kms: add sanity check to wptr. drm/radeon/kms/evergreen: get DP working drm/radeon/kms: add hw_i2c module option drm/radeon/kms: use new pre/post_xfer i2c bit algo hooks drm/radeon/kms: disable MSI on IGP chips drm/radeon/kms: display watermark updates (v2) drm/radeon/kms/dp: disable training pattern on the sink at the end of link training drm/radeon/kms: minor fixes for eDP with LCD* device tags (v2) drm/radeon/kms/dp: remove extraneous training complete call drm/radeon/kms/atom: minor fixes to transmitter setup drm/radeon/kms: Only restrict BO to visible VRAM size when pinning to VRAM. drm: fix build error when SYSRQ is disabled drm/radeon/kms: fix macbookpro connector quirk drm/radeon/r6xx/r7xx: further safe reg clean up drm/radeon: bump the UMS driver version for r6xx/r7xx const buffer support drm/radeon/kms: bump the version for r6xx/r7xx const buffer support drm/radeon/r6xx/r7xx: CS parser fixes drm/radeon/kms: fix some typos in r6xx/r7xx hpd setup ... Fix up MSI-related conflicts in drivers/gpu/drm/radeon/radeon_irq_kms.c
2010-04-01perf: Use hot regs with software sched switch/migrate eventsFrederic Weisbecker
Scheduler's task migration events don't work because they always pass NULL regs perf_sw_event(). The event hence gets filtered in perf_swevent_add(). Scheduler's context switches events use task_pt_regs() to get the context when the event occured which is a wrong thing to do as this won't give us the place in the kernel where we went to sleep but the place where we left userspace. The result is even more wrong if we switch from a kernel thread. Use the hot regs snapshot for both events as they belong to the non-interrupt/exception based events family. Unlike page faults or so that provide the regs matching the exact origin of the event, we need to save the current context. This makes the task migration event working and fix the context switch callchains and origin ip. Example: perf record -a -e cs Before: 10.91% ksoftirqd/0 0 [k] 0000000000000000 | --- (nil) perf_callchain perf_prepare_sample __perf_event_overflow perf_swevent_overflow perf_swevent_add perf_swevent_ctx_event do_perf_sw_event __perf_sw_event perf_event_task_sched_out schedule run_ksoftirqd kthread kernel_thread_helper After: 23.77% hald-addon-stor [kernel.kallsyms] [k] schedule | --- schedule | |--60.00%-- schedule_timeout | wait_for_common | wait_for_completion | blk_execute_rq | scsi_execute | scsi_execute_req | sr_test_unit_ready | | | |--66.67%-- sr_media_change | | media_changed | | cdrom_media_changed | | sr_block_media_changed | | check_disk_change | | cdrom_open v2: Always build perf_arch_fetch_caller_regs() now that software events need that too. They don't need it from modules, unlike trace events, so we keep the EXPORT_SYMBOL in trace_event_perf.c Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>
2010-03-31Merge branch 'v2.6.34-rc2' into drm-linusDave Airlie
2010-03-31module: add stub for is_module_percpu_addressRandy Dunlap
Fix build for CONFIG_MODULES not enabled by providing a stub for is_module_percpu_address(). kernel/lockdep.c:605: error: implicit declaration of function 'is_module_percpu_address' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-03-30percpu: don't implicitly include slab.h from percpu.hTejun Heo
percpu.h has always been including slab.h to get k[mz]alloc/free() for UP inline implementation. percpu.h being used by very low level headers including module.h and sched.h, this meant that a lot files unintentionally got slab.h inclusion. Lee Schermerhorn was trying to make topology.h use percpu.h and got bitten by this implicit inclusion. The right thing to do is break this ultimately unnecessary dependency. The previous patch added explicit inclusion of either gfp.h or slab.h to the source files using them. This patch updates percpu.h such that slab.h is no longer included from percpu.h. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits) r8169: offical fix for CVE-2009-4537 (overlength frame DMAs) ipv6: Don't drop cache route entry unless timer actually expired. tulip: Add missing parens. r8169: fix broken register writes pcnet_cs: add new id bonding: fix broken multicast with round-robin mode drivers/net: Fix continuation lines e1000: do not modify tx_queue_len on link speed change net: ipmr/ip6mr: prevent out-of-bounds vif_table access ixgbe: Do not run all Diagnostic offline tests when VFs are active igb: use correct bits to identify if managability is enabled benet: Fix compile warnnings in drivers/net/benet/be_ethtool.c net: Add MSG_WAITFORONE flag to recvmmsg e1000e: do not modify tx_queue_len on link speed change igbvf: do not modify tx_queue_len on link speed change ipv4: Restart rt_intern_hash after emergency rebuild (v2) ipv4: Cleanup struct net dereference in rt_intern_hash net: fix netlink address dumping in IPv4/IPv6 tulip: Fix null dereference in uli526x_rx_packet() gianfar: fix undo of reserve() ...
2010-03-29ext3: fix broken handling of EXT3_STATE_NEWLinus Torvalds
In commit 9df93939b735 ("ext3: Use bitops to read/modify EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to use bitops for its state handling. However, unline the same ext4 change, it didn't actually change the name of the field when it changed the semantics of it. As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that initialized the field to EXT3_STATE_NEW. And that does not work _at_all_ when we're now working with individually named bits rather than values that get masked. So the code tried to mark the state to be new, but in actual fact set the field to EXT3_STATE_JDATA. Which makes no sense at all, and screws up all the code that checks whether the inode was newly allocated. In particular, it made the xattr code unhappy, and caused various random behavior, like apparently https://bugzilla.redhat.com/show_bug.cgi?id=577911 So fix the initialization, and rename the field to match ext4 so that we don't have this happen again. Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Daniel J Walsh <dwalsh@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-29ARM: 6003/1: removing compilation warning from pl061.hviresh kumar
pl061.h is using u8 type. including <linux/types.h> in pl061.h to avoid warning. Signed-off-by: Viresh Kumar <viresh.kumar@st.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-03-29ARM: 5999/1: Including device.h and resource.h header files in linux/amba/bus.hviresh kumar
linux/amba/bus.h have dependencies on linux/device.h and linux/resource.h, but it doesn't include them. We get compilation errors in our files which include bus.h but doesn't include device.h and resource.h. This patch includes device.h and resource.h in linux/amba/bus.h file. Signed-off-by: Viresh Kumar <viresh.kumar@st.com> Acked-by: Linux Walleij <linux.ml.walleij@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-03-29SLOW_WORK: CONFIG_SLOW_WORK_PROC should be CONFIG_SLOW_WORK_DEBUGDavid Howells
CONFIG_SLOW_WORK_PROC was changed to CONFIG_SLOW_WORK_DEBUG, but not in all instances. Change the remaining instances. This makes the debugfs file display the time mark and the owner's description again. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-29percpu, module: implement and use is_kernel/module_percpu_address()Tejun Heo
lockdep has custom code to check whether a pointer belongs to static percpu area which is somewhat broken. Implement proper is_kernel/module_percpu_address() and replace the custom code. On UP, percpu variables are regular static variables and can't be distinguished from them. Always return %false on UP. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com>
2010-03-29module: encapsulate percpu handling better and record percpu_sizeTejun Heo
Better encapsulate module static percpu area handling so that code outsidef of CONFIG_SMP ifdef doesn't deal with mod->percpu directly and add mod->percpu_size and record percpu_size in it. Both percpu fields are compiled out on UP. While at it, mark mod->percpu w/ __percpu. This is to prepare for is_module_percpu_address(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
2010-03-27net: Add MSG_WAITFORONE flag to recvmmsgBrandon L Black
Add new flag MSG_WAITFORONE for the recvmmsg() syscall. When this flag is specified for a blocking socket, recvmmsg() will only block until at least 1 packet is available. The default behavior is to block until all vlen packets are available. This flag has no effect on non-blocking sockets or when used in combination with MSG_DONTWAIT. Signed-off-by: Brandon L Black <blblack@gmail.com> Acked-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: x86/PCI: truncate _CRS windows with _LEN > _MAX - _MIN + 1 x86/PCI: for host bridge address space collisions, show conflicting resource frv/PCI: remove redundant warnings x86/PCI: remove redundant warnings PCI: don't say we claimed a resource if we failed PCI quirk: Disable MSI on VIA K8T890 systems PCI quirk: RS780/RS880: work around missing MSI initialization PCI quirk: only apply CX700 PCI bus parking quirk if external VT6212L is present PCI: complain about devices that seem to be broken PCI: print resources consistently with %pR PCI: make disabled window printk style match the enabled ones PCI: break out primary/secondary/subordinate for readability PCI: for address space collisions, show conflicting resource resources: add interfaces that return conflict information PCI: cleanup error return for pcix get and set mmrbc functions PCI: fix access of PCI_X_CMD by pcix get and set mmrbc functions PCI: kill off pci_register_set_vga_state() symbol export. PCI: fix return value from pcix_get_max_mmrbc()
2010-03-26Freezer: Fix buggy resume test for tasks frozen with cgroup freezerMatt Helsley
When the cgroup freezer is used to freeze tasks we do not want to thaw those tasks during resume. Currently we test the cgroup freezer state of the resuming tasks to see if the cgroup is FROZEN. If so then we don't thaw the task. However, the FREEZING state also indicates that the task should remain frozen. This also avoids a problem pointed out by Oren Ladaan: the freezer state transition from FREEZING to FROZEN is updated lazily when userspace reads or writes the freezer.state file in the cgroup filesystem. This means that resume will thaw tasks in cgroups which should be in the FROZEN state if there is no read/write of the freezer.state file to trigger this transition before suspend. NOTE: Another "simple" solution would be to always update the cgroup freezer state during resume. However it's a bad choice for several reasons: Updating the cgroup freezer state is somewhat expensive because it requires walking all the tasks in the cgroup and checking if they are each frozen. Worse, this could easily make resume run in N^2 time where N is the number of tasks in the cgroup. Finally, updating the freezer state from this code path requires trickier locking because of the way locks must be ordered. Instead of updating the freezer state we rely on the fact that lazy updates only manage the transition from FREEZING to FROZEN. We know that a cgroup with the FREEZING state may actually be FROZEN so test for that state too. This makes sense in the resume path even for partially-frozen cgroups -- those that really are FREEZING but not FROZEN. Reported-by: Oren Ladaan <orenl@cs.columbia.edu> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Cc: stable@kernel.org Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-03-26Merge branch 'urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6 * 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6: pcmcia: use dev_pm_ops for class pcmcia_socket_class power: support _noirq actions on device types and classes pcmcia: allow for four multifunction subdevices (again) pcmcia: do not use ioports < 0x100 on x86 pd6729: Coding Style fixes
2010-03-26Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: time: Fix accumulation bug triggered by long delay. posix-cpu-timers: Reset expire cache when no timer is running timer stats: Fix del_timer_sync() and try_to_del_timer_sync() clockevents: Sanitize min_delta_ns adjustment and prevent overflows
2010-03-26Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Remove excessive early_res debug output softlockup: Stop spurious softlockup messages due to overflow rcu: Fix local_irq_disable() CONFIG_PROVE_RCU=y false positives rcu: Fix tracepoints & lockdep false positive rcu: Make rcu_read_lock_bh_held() allow for disabled BH
2010-03-26x86, perf, bts, mm: Delete the never used BTS-ptrace codePeter Zijlstra
Support for the PMU's BTS features has been upstreamed in v2.6.32, but we still have the old and disabled ptrace-BTS, as Linus noticed it not so long ago. It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without regard for other uses (perf) and doesn't provide the flexibility needed for perf either. Its users are ptrace-block-step and ptrace-bts, since ptrace-bts was never used and ptrace-block-step can be implemented using a much simpler approach. So axe all 3000 lines of it. That includes the *locked_memory*() APIs in mm/mlock.c as well. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Markus Metzger <markus.t.metzger@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20100325135413.938004390@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>