summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2014-01-25introduce __fcheck_files() to fix rcu_dereference_check_fdtable(), kill ↵Oleg Nesterov
rcu_my_thread_group_empty() rcu_dereference_check_fdtable() looks very wrong, 1. rcu_my_thread_group_empty() was added by 844b9a8707f1 "vfs: fix RCU-lockdep false positive due to /proc" but it doesn't really fix the problem. A CLONE_THREAD (without CLONE_FILES) task can hit the same race with get_files_struct(). And otoh rcu_my_thread_group_empty() can suppress the correct warning if the caller is the CLONE_FILES (without CLONE_THREAD) task. 2. files->count == 1 check is not really right too. Even if this files_struct is not shared it is not safe to access it lockless unless the caller is the owner. Otoh, this check is sub-optimal. files->count == 0 always means it is safe to use it lockless even if files != current->files, but put_files_struct() has to take rcu_read_lock(). See the next patch. This patch removes the buggy checks and turns fcheck_files() into __fcheck_files() which uses rcu_dereference_raw(), the "unshared" callers, fget_light() and fget_raw_light(), can use it to avoid the warning from RCU-lockdep. fcheck_files() is trivially reimplemented as rcu_lockdep_assert() plus __fcheck_files(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25sysctl: Add neg_one as a standard constraintAaron Tomlin
Add neg_one to the list of standard constraints - will be used by the next patch. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: oleg@redhat.com Link: http://lkml.kernel.org/r/1390239253-24030-2-git-send-email-atomlin@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25kgdb/kdb: Fix no KDB config problemMike Travis
Some code added to the debug_core module had KDB dependencies that it shouldn't have. Move the KDB dependent REASON back to the caller to remove the dependency in the debug core code. Update the call from the UV NMI handler to conform to the new interface. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Hedi Berriche <hedi@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/20140114162551.318251993@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25Merge branch 'timers/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/urgent Pull dynticks cleanups from Frederic Weisbecker. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-24Merge tag 'pm+acpi-3.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "As far as the number of commits goes, the top spot belongs to ACPI this time with cpufreq in the second position and a handful of PM core, PNP and cpuidle updates. They are fixes and cleanups mostly, as usual, with a couple of new features in the mix. The most visible change is probably that we will create struct acpi_device objects (visible in sysfs) for all devices represented in the ACPI tables regardless of their status and there will be a new sysfs attribute under those objects allowing user space to check that status via _STA. Consequently, ACPI device eject or generally hot-removal will not delete those objects, unless the table containing the corresponding namespace nodes is unloaded, which is extremely rare. Also ACPI container hotplug will be handled quite a bit differently and cpufreq will support CPU boost ("turbo") generically and not only in the acpi-cpufreq driver. Specifics: - ACPI core changes to make it create a struct acpi_device object for every device represented in the ACPI tables during all namespace scans regardless of the current status of that device. In accordance with this, ACPI hotplug operations will not delete those objects, unless the underlying ACPI tables go away. - On top of the above, new sysfs attribute for ACPI device objects allowing user space to check device status by triggering the execution of _STA for its ACPI object. From Srinivas Pandruvada. - ACPI core hotplug changes reducing code duplication, integrating the PCI root hotplug with the core and reworking container hotplug. - ACPI core simplifications making it use ACPI_COMPANION() in the code "glueing" ACPI device objects to "physical" devices. - ACPICA update to upstream version 20131218. This adds support for the DBG2 and PCCT tables to ACPICA, fixes some bugs and improves debug facilities. From Bob Moore, Lv Zheng and Betty Dall. - Init code change to carry out the early ACPI initialization earlier. That should allow us to use ACPI during the timekeeping initialization and possibly to simplify the EFI initialization too. From Chun-Yi Lee. - Clenups of the inclusions of ACPI headers in many places all over from Lv Zheng and Rashika Kheria (work in progress). - New helper for ACPI _DSM execution and rework of the code in drivers that uses _DSM to execute it via the new helper. From Jiang Liu. - New Win8 OSI blacklist entries from Takashi Iwai. - Assorted ACPI fixes and cleanups from Al Stone, Emil Goode, Hanjun Guo, Lan Tianyu, Masanari Iida, Oliver Neukum, Prarit Bhargava, Rashika Kheria, Tang Chen, Zhang Rui. - intel_pstate driver updates, including proper Baytrail support, from Dirk Brandewie and intel_pstate documentation from Ramkumar Ramachandra. - Generic CPU boost ("turbo") support for cpufreq from Lukasz Majewski. - powernow-k6 cpufreq driver fixes from Mikulas Patocka. - cpufreq core fixes and cleanups from Viresh Kumar, Jane Li, Mark Brown. - Assorted cpufreq drivers fixes and cleanups from Anson Huang, John Tobias, Paul Bolle, Paul Walmsley, Sachin Kamat, Shawn Guo, Viresh Kumar. - cpuidle cleanups from Bartlomiej Zolnierkiewicz. - Support for hibernation APM events from Bin Shi. - Hibernation fix to avoid bringing up nonboot CPUs with ACPI EC disabled during thaw transitions from Bjørn Mork. - PM core fixes and cleanups from Ben Dooks, Leonardo Potenza, Ulf Hansson. - PNP subsystem fixes and cleanups from Dmitry Torokhov, Levente Kurusa, Rashika Kheria. - New tool for profiling system suspend from Todd E Brandt and a cpupower tool cleanup from One Thousand Gnomes" * tag 'pm+acpi-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (153 commits) thermal: exynos: boost: Automatic enable/disable of BOOST feature (at Exynos4412) cpufreq: exynos4x12: Change L0 driver data to CPUFREQ_BOOST_FREQ Documentation: cpufreq / boost: Update BOOST documentation cpufreq: exynos: Extend Exynos cpufreq driver to support boost cpufreq / boost: Kconfig: Support for software-managed BOOST acpi-cpufreq: Adjust the code to use the common boost attribute cpufreq: Add boost frequency support in core intel_pstate: Add trace point to report internal state. cpufreq: introduce cpufreq_generic_get() routine ARM: SA1100: Create dummy clk_get_rate() to avoid build failures cpufreq: stats: create sysfs entries when cpufreq_stats is a module cpufreq: stats: free table and remove sysfs entry in a single routine cpufreq: stats: remove hotplug notifiers cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly cpufreq: speedstep: remove unused speedstep_get_state platform: introduce OF style 'modalias' support for platform bus PM / tools: new tool for suspend/resume performance optimization ACPI: fix module autoloading for ACPI enumerated devices ACPI: add module autoloading support for ACPI enumerated devices ACPI: fix create_modalias() return value handling ...
2014-01-23Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge second patch-bomb from Andrew Morton: - various misc bits - the rest of MM - add generic fixmap.h, use it - backlight updates - dynamic_debug updates - printk() updates - checkpatch updates - binfmt_elf - ramfs - init/ - autofs4 - drivers/rtc - nilfs - hfsplus - Documentation/ - coredump - procfs - fork - exec - kexec - kdump - partitions - rapidio - rbtree - userns - memstick - w1 - decompressors * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (197 commits) lib/decompress_unlz4.c: always set an error return code on failures romfs: fix returm err while getting inode in fill_super drivers/w1/masters/w1-gpio.c: add strong pullup emulation drivers/memstick/host/rtsx_pci_ms.c: fix ms card data transfer bug userns: relax the posix_acl_valid() checks arch/sh/kernel/dwarf.c: use rbtree postorder iteration helper instead of solution using repeated rb_erase() fs-ext3-use-rbtree-postorder-iteration-helper-instead-of-opencoding-fix fs/ext3: use rbtree postorder iteration helper instead of opencoding fs/jffs2: use rbtree postorder iteration helper instead of opencoding fs/ext4: use rbtree postorder iteration helper instead of opencoding fs/ubifs: use rbtree postorder iteration helper instead of opencoding net/netfilter/ipset/ip_set_hash_netiface.c: use rbtree postorder iteration instead of opencoding rbtree/test: test rbtree_postorder_for_each_entry_safe() rbtree/test: move rb_node to the middle of the test struct rapidio: add modular rapidio core build into powerpc and mips branches partitions/efi: complete documentation of gpt kernel param purpose kdump: add /sys/kernel/vmcoreinfo ABI documentation kdump: fix exported size of vmcoreinfo note kexec: add sysctl to disable kexec_load fs/exec.c: call arch_pick_mmap_layout() only once ...
2014-01-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto update from Herbert Xu: "Here is the crypto update for 3.14: - Improved crypto_memneq helper - Use cyprto_memneq in arch-specific crypto code - Replaced orphaned DCP driver with Freescale MXS DCP driver - Added AVX/AVX2 version of AESNI-GCM encode and decode - Added AMD Cryptographic Coprocessor (CCP) driver - Misc fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (41 commits) crypto: aesni - fix build on x86 (32bit) crypto: mxs - Fix sparse non static symbol warning crypto: ccp - CCP device enabled/disabled changes crypto: ccp - Cleanup hash invocation calls crypto: ccp - Change data length declarations to u64 crypto: ccp - Check for caller result area before using it crypto: ccp - Cleanup scatterlist usage crypto: ccp - Apply appropriate gfp_t type to memory allocations crypto: drivers - Sort drivers/crypto/Makefile ARM: mxs: dts: Enable DCP for MXS crypto: mxs - Add Freescale MXS DCP driver crypto: mxs - Remove the old DCP driver crypto: ahash - Fully restore ahash request before completing crypto: aesni - fix build on x86 (32bit) crypto: talitos - Remove redundant dev_set_drvdata crypto: ccp - Remove redundant dev_set_drvdata crypto: crypto4xx - Remove redundant dev_set_drvdata crypto: caam - simplify and harden key parsing crypto: omap-sham - Fix Polling mode for larger blocks crypto: tcrypt - Added speed tests for AEAD crypto alogrithms in tcrypt test suite ...
2014-01-23Merge git://git.infradead.org/users/eparis/auditLinus Torvalds
Pull audit update from Eric Paris: "Again we stayed pretty well contained inside the audit system. Venturing out was fixing a couple of function prototypes which were inconsistent (didn't hurt anything, but we used the same value as an int, uint, u32, and I think even a long in a couple of places). We also made a couple of minor changes to when a couple of LSMs called the audit system. We hoped to add aarch64 audit support this go round, but it wasn't ready. I'm disappearing on vacation on Thursday. I should have internet access, but it'll be spotty. If anything goes wrong please be sure to cc rgb@redhat.com. He'll make fixing things his top priority" * git://git.infradead.org/users/eparis/audit: (50 commits) audit: whitespace fix in kernel-parameters.txt audit: fix location of __net_initdata for audit_net_ops audit: remove pr_info for every network namespace audit: Modify a set of system calls in audit class definitions audit: Convert int limit uses to u32 audit: Use more current logging style audit: Use hex_byte_pack_upper audit: correct a type mismatch in audit_syscall_exit() audit: reorder AUDIT_TTY_SET arguments audit: rework AUDIT_TTY_SET to only grab spin_lock once audit: remove needless switch in AUDIT_SET audit: use define's for audit version audit: documentation of audit= kernel parameter audit: wait_for_auditd rework for readability audit: update MAINTAINERS audit: log task info on feature change audit: fix incorrect set of audit_sock audit: print error message when fail to create audit socket audit: fix dangling keywords in audit_log_set_loginuid() output audit: log on errors from filter user rules ...
2014-01-23kdump: fix exported size of vmcoreinfo noteVivek Goyal
Right now we seem to be exporting the max data size contained inside vmcoreinfo note. But this does not include the size of meta data around vmcore info data. Like name of the note and starting and ending elf_note. I think user space expects total size and that size is put in PT_NOTE elf header. Things seem to be fine so far because we are not using vmcoreinfo note to the maximum capacity. But as it starts filling up, to capacity, at some point of time, problem will be visible. I don't think user space will be broken with this change. So there is no need to introduce vmcoreinfo2. This change is safe and backward compatible. More explanation on why this change is safe is below. vmcoreinfo contains information about kernel which user space needs to know to do things like filtering. For example, various kernel config options or information about size or offset of some data structures etc. All this information is commmunicated to user space with an ELF note present in ELF /proc/vmcore file. Currently vmcoreinfo data size is 4096. With some elf note meta data around it, actual size is 4132 bytes. But we are using barely 25% of that size. Rest is empty. So even if we tell user space that size of ELf note is 4096 and not 4132, nothing will be broken becase after around 1000 bytes, everything is zero anyway. But once we start filling up the note to the capacity, and not report the full size of note, bad things will start happening. Either some data will be lost or tools will be confused that they did not fine the zero note at the end. So I think this change is safe and should not break existing tools. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Cc: Dan Aloni <da-x@monatomic.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23kexec: add sysctl to disable kexec_loadKees Cook
For general-purpose (i.e. distro) kernel builds it makes sense to build with CONFIG_KEXEC to allow end users to choose what kind of things they want to do with kexec. However, in the face of trying to lock down a system with such a kernel, there needs to be a way to disable kexec_load (much like module loading can be disabled). Without this, it is too easy for the root user to modify kernel memory even when CONFIG_STRICT_DEVMEM and modules_disabled are set. With this change, it is still possible to load an image for use later, then disable kexec_load so the image (or lack of image) can't be altered. The intention is for using this in environments where "perfect" enforcement is hard. Without a verified boot, along with verified modules, and along with verified kexec, this is trying to give a system a better chance to defend itself (or at least grow the window of discoverability) against attack in the face of a privilege escalation. In my mind, I consider several boot scenarios: 1) Verified boot of read-only verified root fs loading fd-based verification of kexec images. 2) Secure boot of writable root fs loading signed kexec images. 3) Regular boot loading kexec (e.g. kcrash) image early and locking it. 4) Regular boot with no control of kexec image at all. 1 and 2 don't exist yet, but will soon once the verified kexec series has landed. 4 is the state of things now. The gap between 2 and 4 is too large, so this change creates scenario 3, a middle-ground above 4 when 2 and 1 are not possible for a system. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rik van Riel <riel@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23kernel/signal.c: change do_signal_stop/do_sigaction to use while_each_thread()Oleg Nesterov
Change do_signal_stop() and do_sigaction() to avoid next_thread() and use while_each_thread() instead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Sameer Nanda <snanda@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23kernel/sys.c: k_getrusage() can use while_each_thread()Oleg Nesterov
Change k_getrusage() to use while_each_thread(), no changes in the compiled code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Sameer Nanda <snanda@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23exec: kill task_struct->did_execOleg Nesterov
We can kill either task->did_exec or PF_FORKNOEXEC, they are mutually exclusive. The patch kills ->did_exec because it has a single user. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23kernel/fork.c: remove redundant NULL check in dup_mm()Daeseok Youn
current->mm doesn't need a NULL check in dup_mm(). Becasue dup_mm() is used only in copy_mm() and current->mm is checked whether it is NULL or not in copy_mm() before calling dup_mm(). Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23kernel/fork.c: fix coding style issuesDaeseok Youn
Fix errors reported by checkpatch.pl. One error is parentheses, the other is a whitespace issue. Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23kernel/fork.c: make dup_mm() staticDaeSeok Youn
dup_mm() is used only in kernel/fork.c Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23printk: flush conflicting continuation lineArun KS
An earlier newline was missing and current print is from different task. In this scenario flush the continuation line and store this line seperatly. This patch fix the below scenario of timestamp interleaving, [ 28.154370 ] read_word_reg : reg[0x 3], reg[0x 4] data [0x 642] [ 28.155428 ] uart disconnect [ 31.947341 ] dvfs[cpufreq.c<275>]:plug-in cpu<1> done [ 28.155445 ] UART detached : send switch state 201 [ 32.014112 ] read_reg : reg[0x 3] data[0x21] [akpm@linux-foundation.org: simplify and condense the code] Signed-off-by: Arun KS <getarunks@gmail.com> Signed-off-by: Arun KS <arun.ks@broadcom.com> Cc: Joe Perches <joe@perches.com> Cc: Tejun Heo <tj@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23numa: add a sysctl for numa_balancingAndi Kleen
Add a working sysctl to enable/disable automatic numa memory balancing at runtime. This allows us to track down performance problems with this feature and is generally a good idea. This was possible earlier through debugfs, but only with special debugging options set. Also fix the boot message. [akpm@linux-foundation.org: s/sched_numa_balancing/sysctl_numa_balancing/] Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23tracing: Check if tracing is enabled in trace_puts()Steven Rostedt (Red Hat)
If trace_puts() is used very early in boot up, it can crash the machine if it is called before the ring buffer is allocated. If a trace_printk() is used with no arguments, then it will be converted into a trace_puts() and suffer the same fate. Cc: stable@vger.kernel.org # 3.10+ Fixes: 09ae72348ecc "tracing: Add trace_puts() for even faster trace_printk() tracing" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-23sched/clock: Fixup early initializationPeter Zijlstra
The code would assume sched_clock_stable() and switch to !stable later, this switch brings a discontinuity in time. The discontinuity on switching from stable to unstable was always present, but previously we would set stable/unstable before initializing TSC and usually stick to the one we start out with. So the static_key bits brought an extra switch where there previously wasn't one. Things are further complicated by the fact that we cannot use static_key as early as we usually call set_sched_clock_stable(). Fix things by tracking the stable state in a regular variable and only set the static_key to the right state on sched_clock_init(), which is ran right after late_time_init->tsc_init(). Before this we would not be using the TSC anyway. Reported-and-Tested-by: Sasha Levin <sasha.levin@oracle.com> Reported-by: dyoung@redhat.com Fixes: 35af99e646c7 ("sched/clock, x86: Use a static_key for sched_clock_stable") Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: paulmck@linux.vnet.ibm.com Cc: John Stultz <john.stultz@linaro.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: lenb@kernel.org Cc: rjw@rjwysocki.net Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: rui.zhang@intel.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140122115918.GG3694@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-23Revert "sched: Fix sleep time double accounting in enqueue entity"Vincent Guittot
This reverts commit 282cf499f03ec1754b6c8c945c9674b02631fb0f. With the current implementation, the load average statistics of a sched entity change according to other activity on the CPU even if this activity is done between the running window of the sched entity and have no influence on the running duration of the task. When a task wakes up on the same CPU, we currently update last_runnable_update with the return of __synchronize_entity_decay without updating the runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync the load_contrib of the se with the rq's blocked_load_contrib before removing it from the latter (with __synchronize_entity_decay) but we must keep last_runnable_update unchanged for updating runnable_avg_sum/period during the next update_entity_load_avg. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Cc: pjt@google.com Cc: alex.shi@linaro.org Link: http://lkml.kernel.org/r/1390376734-6800-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-22Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell. * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: module: Add missing newline in printk call. module: fix coding style export: declare ksymtab symbols module.h: Remove unnecessary semicolon params: improve standard definitions Add Documentation/module-signing.txt file
2014-01-23tracing: Fix formatting of trace README fileSteven Rostedt (Red Hat)
Fix the formatting of the README file in the trace debugfs to fit in an 80 character window. Also add a comment about the event trigger counter with regards to traceon and traceoff. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-22tracing/README: Add event file usage to tracing mini-HOWTOTom Zanussi
It would be useful to have a cheat-sheet for everything under tracing/events/ alongside the existing text describing the other files in the tracing/ dir. Add short descriptions of the directories and files under events/ along with examples, similar to the existing text for the other files in tracing/. Also clean up a few minor alignment problems noticed when adding the new text. Link: http://lkml.kernel.org/r/1389993104.3040.445.camel@empanada Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-22Merge tag 'trace-3.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This pull request has a new feature to ftrace, namely the trace event triggers by Tom Zanussi. A trigger is a way to enable an action when an event is hit. The actions are: o trace on/off - enable or disable tracing o snapshot - save the current trace buffer in the snapshot o stacktrace - dump the current stack trace to the ringbuffer o enable/disable events - enable or disable another event Namhyung Kim added updates to the tracing uprobes code. Having the uprobes add support for fetch methods. The rest are various bug fixes with the new code, and minor ones for the old code" * tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (38 commits) tracing: Fix buggered tee(2) on tracing_pipe tracing: Have trace buffer point back to trace_array ftrace: Fix synchronization location disabling and freeing ftrace_ops ftrace: Have function graph only trace based on global_ops filters ftrace: Synchronize setting function_trace_op with ftrace_trace_function tracing: Show available event triggers when no trigger is set tracing: Consolidate event trigger code tracing: Fix counter for traceon/off event triggers tracing: Remove double-underscore naming in syscall trigger invocations tracing/kprobes: Add trace event trigger invocations tracing/probes: Fix build break on !CONFIG_KPROBE_EVENT tracing/uprobes: Add @+file_offset fetch method uprobes: Allocate ->utask before handler_chain() for tracing handlers tracing/uprobes: Add support for full argument access methods tracing/uprobes: Fetch args before reserving a ring buffer tracing/uprobes: Pass 'is_return' to traceprobe_parse_probe_arg() tracing/probes: Implement 'memory' fetch method for uprobes tracing/probes: Add fetch{,_size} member into deref fetch method tracing/probes: Move 'symbol' fetch method to kprobes tracing/probes: Implement 'stack' fetch method for uprobes ...
2014-01-21Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge first patch-bomb from Andrew Morton: - a couple of misc things - inotify/fsnotify work from Jan - ocfs2 updates (partial) - about half of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) mm/migrate: remove unused function, fail_migrate_page() mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages mm/migrate: correct failure handling if !hugepage_migration_support() mm/migrate: add comment about permanent failure path mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure mm: compaction: reset scanner positions immediately when they meet mm: compaction: do not mark unmovable pageblocks as skipped in async compaction mm: compaction: detect when scanners meet in isolate_freepages mm: compaction: reset cached scanner pfn's before reading them mm: compaction: encapsulate defer reset logic mm: compaction: trace compaction begin and end memcg, oom: lock mem_cgroup_print_oom_info sched: add tracepoints related to NUMA task migration mm: numa: do not automatically migrate KSM pages mm: numa: trace tasks that fail migration due to rate limiting mm: numa: limit scope of lock for NUMA migrate rate limiting mm: numa: make NUMA-migrate related functions static lib/show_mem.c: show num_poisoned_pages when oom mm/hwpoison: add '#' to hwpoison_inject mm/memblock: use WARN_ONCE when MAX_NUMNODES passed as input parameter ...
2014-01-21Merge branch 'for-3.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "The bulk of changes are cleanups and preparations for the upcoming kernfs conversion. - cgroup_event mechanism which is and will be used only by memcg is moved to memcg. - pidlist handling is updated so that it can be served by seq_file. Also, the list is not sorted if sane_behavior. cgroup documentation explicitly states that the file is not sorted but it has been for quite some time. - All cgroup file handling now happens on top of seq_file. This is to prepare for kernfs conversion. In addition, all operations are restructured so that they map 1-1 to kernfs operations. - Other cleanups and low-pri fixes" * 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (40 commits) cgroup: trivial style updates cgroup: remove stray references to css_id doc: cgroups: Fix typo in doc/cgroups cgroup: fix fail path in cgroup_load_subsys() cgroup: fix missing unlock on error in cgroup_load_subsys() cgroup: remove for_each_root_subsys() cgroup: implement for_each_css() cgroup: factor out cgroup_subsys_state creation into create_css() cgroup: combine css handling loops in cgroup_create() cgroup: reorder operations in cgroup_create() cgroup: make for_each_subsys() useable under cgroup_root_mutex cgroup: css iterations and css_from_dir() are safe under cgroup_mutex cgroup: unify pidlist and other file handling cgroup: replace cftype->read_seq_string() with cftype->seq_show() cgroup: attach cgroup_open_file to all cgroup files cgroup: generalize cgroup_pidlist_open_file cgroup: unify read path so that seq_file is always used cgroup: unify cgroup_write_X64() and cgroup_write_string() cgroup: remove cftype->read(), ->read_map() and ->write() hugetlb_cgroup: convert away from cftype->read() ...
2014-01-21Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue update from Tejun Heo: "Just one patch to add destroy_work_on_stack() annotations to help debugobj debugging" * 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
2014-01-21sched: add tracepoints related to NUMA task migrationMel Gorman
This patch adds three tracepoints o trace_sched_move_numa when a task is moved to a node o trace_sched_swap_numa when a task is swapped with another task o trace_sched_stick_numa when a numa-related migration fails The tracepoints allow the NUMA scheduler activity to be monitored and the following high-level metrics can be calculated o NUMA migrated stuck nr trace_sched_stick_numa o NUMA migrated idle nr trace_sched_move_numa o NUMA migrated swapped nr trace_sched_swap_numa o NUMA local swapped trace_sched_swap_numa src_nid == dst_nid (should never happen) o NUMA remote swapped trace_sched_swap_numa src_nid != dst_nid (should == NUMA migrated swapped) o NUMA group swapped trace_sched_swap_numa src_ngid == dst_ngid Maybe a small number of these are acceptable but a high number would be a major surprise. It would be even worse if bounces are frequent. o NUMA avg task migs. Average number of migrations for tasks o NUMA stddev task mig Self-explanatory o NUMA max task migs. Maximum number of migrations for a single task In general the intent of the tracepoints is to help diagnose problems where automatic NUMA balancing appears to be doing an excessive amount of useless work. [akpm@linux-foundation.org: remove semicolon-after-if, repair coding-style] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21kernel/power/snapshot.c: use memblock apis for early memory allocationsSantosh Shilimkar
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: Tony Lindgren <tony@atomide.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21kernel/printk/printk.c: use memblock apis for early memory allocationsSantosh Shilimkar
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21introduce for_each_thread() to replace the buggy while_each_thread()Oleg Nesterov
while_each_thread() and next_thread() should die, almost every lockless usage is wrong. 1. Unless g == current, the lockless while_each_thread() is not safe. while_each_thread(g, t) can loop forever if g exits, next_thread() can't reach the unhashed thread in this case. Note that this can happen even if g is the group leader, it can exec. 2. Even if while_each_thread() itself was correct, people often use it wrongly. It was never safe to just take rcu_read_lock() and loop unless you verify that pid_alive(g) == T, even the first next_thread() can point to the already freed/reused memory. This patch adds signal_struct->thread_head and task->thread_node to create the normal rcu-safe list with the stable head. The new for_each_thread(g, t) helper is always safe under rcu_read_lock() as long as this task_struct can't go away. Note: of course it is ugly to have both task_struct->thread_node and the old task_struct->thread_group, we will kill it later, after we change the users of while_each_thread() to use for_each_thread(). Perhaps we can kill it even before we convert all users, we can reimplement next_thread(t) using the new thread_head/thread_node. But we can't do this right now because this will lead to subtle behavioural changes. For example, do/while_each_thread() always sees at least one task, while for_each_thread() can do nothing if the whole thread group has died. Or thread_group_empty(), currently its semantics is not clear unless thread_group_leader(p) and we need to audit the callers before we can change it. So this patch adds the new interface which has to coexist with the old one for some time, hopefully the next changes will be more or less straightforward and the old one will go away soon. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Sergey Dyasly <dserrg@gmail.com> Tested-by: Sergey Dyasly <dserrg@gmail.com> Reviewed-by: Sameer Nanda <snanda@chromium.org> Acked-by: David Rientjes <rientjes@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21mm: add overcommit_kbytes sysctl variableJerome Marchand
Some applications that run on HPC clusters are designed around the availability of RAM and the overcommit ratio is fine tuned to get the maximum usage of memory without swapping. With growing memory, the 1%-of-all-RAM grain provided by overcommit_ratio has become too coarse for these workload (on a 2TB machine it represents no less than 20GB). This patch adds the new overcommit_kbytes sysctl variable that allow a much finer grain. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix nommu build] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21fsnotify: remove pointless NULL initializersJan Kara
We usually rely on the fact that struct members not specified in the initializer are set to NULL. So do that with fsnotify function pointers as well. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21fsnotify: remove .should_send_event callbackJan Kara
After removing event structure creation from the generic layer there is no reason for separate .should_send_event and .handle_event callbacks. So just remove the first one. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21fsnotify: do not share events between notification groupsJan Kara
Currently fsnotify framework creates one event structure for each notification event and links this event into all interested notification groups. This is done so that we save memory when several notification groups are interested in the event. However the need for event structure shared between inotify & fanotify bloats the event structure so the result is often higher memory consumption. Another problem is that fsnotify framework keeps path references with outstanding events so that fanotify can return open file descriptors with its events. This has the undesirable effect that filesystem cannot be unmounted while there are outstanding events - a regression for inotify compared to a situation before it was converted to fsnotify framework. For fanotify this problem is hard to avoid and users of fanotify should kind of expect this behavior when they ask for file descriptors from notified files. This patch changes fsnotify and its users to create separate event structure for each group. This allows for much simpler code (~400 lines removed by this patch) and also smaller event structures. For example on 64-bit system original struct fsnotify_event consumes 120 bytes, plus additional space for file name, additional 24 bytes for second and each subsequent group linking the event, and additional 32 bytes for each inotify group for private data. After the conversion inotify event consumes 48 bytes plus space for file name which is considerably less memory unless file names are long and there are several groups interested in the events (both of which are uncommon). Fanotify event fits in 56 bytes after the conversion (fanotify doesn't care about file names so its events don't have to have it allocated). A win unless there are four or more fanotify groups interested in the event. The conversion also solves the problem with unmount when only inotify is used as we don't have to grab path references for inotify events. [hughd@google.com: fanotify: fix corruption preventing startup] Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21module: Add missing newline in printk call.Tetsuo Handa
Add missing \n and also follow commit bddb12b3 "kernel/module.c: use pr_foo()". Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-01-20Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer changes from Ingo Molnar: - ARM clocksource/clockevent improvements and fixes - generic timekeeping updates: TAI fixes/improvements, cleanups - Posix cpu timer cleanups and improvements - dynticks updates: full dynticks bugfixes, optimizations and cleanups * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) clocksource: Timer-sun5i: Switch to sched_clock_register() timekeeping: Remove comment that's mostly out of date rtc-cmos: Add an alarm disable quirk timekeeper: fix comment typo for tk_setup_internals() timekeeping: Fix missing timekeeping_update in suspend path timekeeping: Fix CLOCK_TAI timer/nanosleep delays tick/timekeeping: Call update_wall_time outside the jiffies lock timekeeping: Avoid possible deadlock from clock_was_set_delayed timekeeping: Fix potential lost pv notification of time change timekeeping: Fix lost updates to tai adjustment clocksource: sh_cmt: Add clk_prepare/unprepare support clocksource: bcm_kona_timer: Remove unused bcm_timer_ids clocksource: vt8500: Remove deprecated IRQF_DISABLED clocksource: tegra: Remove deprecated IRQF_DISABLED clocksource: misc drivers: Remove deprecated IRQF_DISABLED clocksource: sh_mtu2: Remove unnecessary platform_set_drvdata() clocksource: sh_tmu: Remove unnecessary platform_set_drvdata() clocksource: armada-370-xp: Enable timer divider only when needed clocksource: clksrc-of: Warn if no clock sources are found clocksource: orion: Switch to sched_clock_register() ...
2014-01-20Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: - Add the initial implementation of SCHED_DEADLINE support: a real-time scheduling policy where tasks that meet their deadlines and periodically execute their instances in less than their runtime quota see real-time scheduling and won't miss any of their deadlines. Tasks that go over their quota get delayed (Available to privileged users for now) - Clean up and fix preempt_enable_no_resched() abuse all around the tree - Do sched_clock() performance optimizations on x86 and elsewhere - Fix and improve auto-NUMA balancing - Fix and clean up the idle loop - Apply various cleanups and fixes * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) sched: Fix __sched_setscheduler() nice test sched: Move SCHED_RESET_ON_FORK into attr::sched_flags sched: Fix up attr::sched_priority warning sched: Fix up scheduler syscall LTP fails sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls sched/core: Fix htmldocs warnings sched/deadline: No need to check p if dl_se is valid sched/deadline: Remove unused variables sched/deadline: Fix sparse static warnings m68k: Fix build warning in mac_via.h sched, thermal: Clean up preempt_enable_no_resched() abuse sched, net: Fixup busy_loop_us_clock() sched, net: Clean up preempt_enable_no_resched() abuse sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding sched/preempt, locking: Rework local_bh_{dis,en}able() sched/clock, x86: Avoid a runtime condition in native_sched_clock() sched/clock: Fix up clear_sched_clock_stable() sched/clock, x86: Use a static_key for sched_clock_stable sched/clock: Remove local_irq_disable() from the clocks sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs ...
2014-01-20Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Add Intel RAPL energy counter support (Stephane Eranian) - Clean up uprobes (Oleg Nesterov) - Optimize ring-buffer writes (Peter Zijlstra) Tooling side changes, user visible: - 'perf diff': - Add column colouring improvements (Ramkumar Ramachandra) - 'perf kvm': - Add guest related improvements, including allowing to specify a directory with guest specific /proc information (Dongsheng Yang) - Add shell completion support (Ramkumar Ramachandra) - Add '-v' option (Dongsheng Yang) - Support --guestmount (Dongsheng Yang) - 'perf probe': - Support showing source code, asking for variables to be collected at probe time and other 'perf probe' operations that use DWARF information. This supports only binaries with debugging information at this time, detached debuginfo (aka debuginfo packages) support should come in later patches (Masami Hiramatsu) - 'perf record': - Rename --no-delay option to --no-buffering, better reflecting its purpose and freeing up '--delay' to take the place of '--initial-delay', so that 'record' and 'stat' are consistent (Arnaldo Carvalho de Melo) - Default the -t/--thread option to no inheritance (Adrian Hunter) - Make per-cpu mmaps the default (Adrian Hunter) - 'perf report': - Improve callchain processing performance (Frederic Weisbecker) - Retain bfd reference to lookup source line numbers, greatly optimizing, among other use cases, 'perf report -s srcline' (Adrian Hunter) - Improve callchain processing performance even more (Namhyung Kim) - Add a perf.data file header window in the 'perf report' TUI, associated with the 'i' hotkey, providing a counterpart to the --header option in the stdio UI (Namhyung Kim) - 'perf script': - Add an option in 'perf script' to print the source line number (Adrian Hunter) - Add --header/--header-only options to 'script' and 'report', the default is not tho show the header info, but as this has been the default for some time, leave a single line explaining how to obtain that information (Jiri Olsa) - Add options to show comm, fork, exit and mmap PERF_RECORD_ events (Namhyung Kim) - Print callchains and symbols if they exist (David Ahern) - 'perf timechart' - Add backtrace support to CPU info - Print pid along the name - Add support for CPU topology - Add new option --highlight'ing threads, be it by name or, if a numeric value is provided, that run more than given duration (Stanislav Fomichev) - 'perf top': - Make 'perf top -g' refer to callchains, for consistency with other tools (David Ahern) - 'perf trace': - Handle old kernels where the "raw_syscalls" tracepoints were called plain "syscalls" (David Ahern) - Remove thread summary coloring, by Pekka Enberg. - Honour -m option in 'trace', the tool was offering the option to set the mmap size, but wasn't using it when doing the actual mmap on the events file descriptors (Jiri Olsa) - generic: - Backport libtraceevent plugin support (trace-cmd repository, with plugins for jbd2, hrtimer, kmem, kvm, mac80211, sched_switch, function, xen, scsi, cfg80211 (Jiri Olsa) - Print session information only if --stdio is given (Namhyung Kim) Tooling side changes, developer visible (plumbing): - Improve 'perf probe' exit path, release resources (Masami Hiramatsu) - Improve libtraceevent plugins exit path, allowing the registering of an unregister handler to be called at exit time (Namhyung Kim) - Add an alias to the build test makefile (make -C tools/perf build-test) (Namhyung Kim) - Get rid of die() and friends (good riddance!) in libtraceevent (Namhyung Kim) - Fix cross build problems related to pkgconfig and CROSS_COMPILE not being propagated to the feature tests, leading to features being tested in the host and then being enabled on the target (Mark Rutland) - Improve forked workload error reporting by sending the errno in the signal data queueing integer field, using sigqueue and by doing the signal setup in the evlist methods, removing open coded equivalents in various tools (Arnaldo Carvalho de Melo) - Do more auto exit cleanup chores in the 'evlist' destructor, so that the tools don't have to all do that sequence (Arnaldo Carvalho de Melo) - Pack 'struct perf_session_env' and 'struct trace' (Arnaldo Carvalho de Melo) - Add test for building detached source tarballs (Arnaldo Carvalho de Melo) - Move some header files (tools/perf/ to tools/include/ to make them available to other tools/ dwelling codebases (Namhyung Kim) - Move logic to warn about kptr_restrict'ed kernels to separate function in 'report' (Arnaldo Carvalho de Melo) - Move hist browser selection code to separate function (Arnaldo Carvalho de Melo) - Move histogram entries collapsing to separate function (Arnaldo Carvalho de Melo) - Introduce evlist__for_each() & friends (Arnaldo Carvalho de Melo) - Automate setup of FEATURE_CHECK_(C|LD)FLAGS-all variables (Jiri Olsa) - Move arch setup into seprate Makefile (Jiri Olsa) - Make libtraceevent install target quieter (Jiri Olsa) - Make tests/make output more compact (Jiri Olsa) - Ignore generated files in feature-checks (Chunwei Chen) - Introduce pevent_filter_strerror() in libtraceevent, similar in purpose to libc's strerror() function (Namhyung Kim) - Use perf_data_file methods to write output file in 'record' and 'inject' (Jiri Olsa) - Use pr_*() functions where applicable in 'report' (Namhyumg Kim) - Add 'machine' 'addr_location' struct to have full picture (machine, thread, map, symbol, addr) for a (partially) resolved address, reducing function signatures (Arnaldo Carvalho de Melo) - Reduce code duplication in the histogram entry creation/insertion (Arnaldo Carvalho de Melo) - Auto allocate annotation histogram data structures (Arnaldo Carvalho de Melo) - No need to test against NULL before calling free, also set freed memory in struct pointers to NULL, to help fixing use after free bugs (Arnaldo Carvalho de Melo) - Rename some struct DSO binary_type related members and methods, to clarify its purpose and need for differentiation (symtab_type, ie one is about the files .text, CFI, etc, i.e. its binary contents, and the other is about where the symbol table came from (Arnaldo Carvalho de Melo) - Convert to new topic libraries, starting with an API one (sysfs, debugfs, etc), renaming liblk in the process (Borislav Petkov) - Get rid of some more panic() like error handling in libtraceevent. (Namhyung Kim) - Get rid of panic() like calls in libtraceevent (Namyung Kim) - Start carving out symbol parsing routines (perf, just moving routines to topic files in tools/lib/symbol/, tools that want to use it need to integrate it directly, ie no tools/lib/symbol/Makefile is provided (Arnaldo Carvalho de Melo) - Assorted refactoring patches, moving code around and adding utility evlist methods that will be used in the IPT patchset (Adrian Hunter) - Assorted mmap_pages handling fixes (Adrian Hunter) - Several man pages typo fixes (Dongsheng Yang) - Get rid of several die() calls in libtraceevent (Namhyung Kim) - Use basename() in a more robust way, to avoid problems related to different system library implementations for that function (Stephane Eranian) - Remove open coded management of short_name_allocated member (Adrian Hunter) - Several cleanups in the "dso" methods, constifying some parameters and renaming some fields to clarify its purpose (Arnaldo Carvalho de Melo) - Add per-feature check flags, fixing libunwind related build problems on some architectures (Jean Pihet) - Do not disable source line lookup just because of one failure. (Adrian Hunter) - Several 'perf kvm' man page corrections (Dongsheng Yang) - Correct the message in feature-libnuma checking, swowing the right devel package names for various distros (Dongsheng Yang) - Polish 'readn()' function and introduce its counterpart, 'writen()' (Jiri Olsa) - Start moving timechart state from global variables to a 'perf_tool' derived 'timechart' struct (Arnaldo Carvalho de Melo) ... and lots of fixes and improvements I forgot to list" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits) perf tools: Remove unnecessary callchain cursor state restore on unmatch perf callchain: Spare double comparison of callchain first entry perf tools: Do proper comm override error handling perf symbols: Export elf_section_by_name and reuse perf probe: Release all dynamically allocated parameters perf probe: Release allocated probe_trace_event if failed perf tools: Add 'build-test' make target tools lib traceevent: Unregister handler when xen plugin is unloaded tools lib traceevent: Unregister handler when scsi plugin is unloaded tools lib traceevent: Unregister handler when jbd2 plugin is is unloaded tools lib traceevent: Unregister handler when cfg80211 plugin is unloaded tools lib traceevent: Unregister handler when mac80211 plugin is unloaded tools lib traceevent: Unregister handler when sched_switch plugin is unloaded tools lib traceevent: Unregister handler when kvm plugin is unloaded tools lib traceevent: Unregister handler when kmem plugin is unloaded tools lib traceevent: Unregister handler when hrtimer plugin is unloaded tools lib traceevent: Unregister handler when function plugin is unloaded tools lib traceevent: Add pevent_unregister_print_function() tools lib traceevent: Add pevent_unregister_event_handler() tools lib traceevent: fix pointer-integer size mismatch ...
2014-01-20Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: - add RCU torture scripts/tooling - static analysis improvements - update RCU documentation - miscellaneous fixes * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) rcu: Remove "extern" from function declarations in kernel/rcu/rcu.h rcu: Remove "extern" from function declarations in include/linux/*rcu*.h rcu/torture: Dynamically allocate SRCU output buffer to avoid overflow rcu: Don't activate RCU core on NO_HZ_FULL CPUs rcu: Warn on allegedly impossible rcu_read_unlock_special() from irq rcu: Add an RCU_INITIALIZER for global RCU-protected pointers rcu: Make rcu_assign_pointer's assignment volatile and type-safe bonding: Use RCU_INIT_POINTER() for better overhead and for sparse rcu: Add comment on evaluate-once properties of rcu_assign_pointer(). rcu: Provide better diagnostics for blocking in RCU callback functions rcu: Improve SRCU's grace-period comments rcu: Fix CONFIG_RCU_FANOUT_EXACT for odd fanout/leaf values rcu: Fix coccinelle warnings rcutorture: Stop tracking FSF's postal address rcutorture: Move checkarg to functions.sh rcutorture: Flag errors and warnings with color coding rcutorture: Record results from repeated runs of the same test scenario rcutorture: Test summary at end of run with less chattiness rcutorture: Update comment in kvm.sh listing typical RCU trace events rcutorture: Add tracing-enabled version of TREE08 ...
2014-01-20Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: - futex performance increases: larger hashes, smarter wakeups - mutex debugging improvements - lots of SMP ordering documentation updates - introduce the smp_load_acquire(), smp_store_release() primitives. (There are WIP patches that make use of them - not yet merged) - lockdep micro-optimizations - lockdep improvement: better cover IRQ contexts - liblockdep at last. We'll continue to monitor how useful this is * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) futexes: Fix futex_hashsize initialization arch: Re-sort some Kbuild files to hopefully help avoid some conflicts futexes: Avoid taking the hb->lock if there's nothing to wake up futexes: Document multiprocessor ordering guarantees futexes: Increase hash table size for better performance futexes: Clean up various details arch: Introduce smp_load_acquire(), smp_store_release() arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h locking/doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE mutexes: Give more informative mutex warning in the !lock->owner case powerpc: Full barrier for smp_mb__after_unlock_lock() rcu: Apply smp_mb__after_unlock_lock() to preserve grace periods Documentation/memory-barriers.txt: Downgrade UNLOCK+BLOCK locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier Documentation/memory-barriers.txt: Document ACCESS_ONCE() Documentation/memory-barriers.txt: Prohibit speculative writes Documentation/memory-barriers.txt: Add long atomic examples to memory-barriers.txt Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt Revert "smp/cpumask: Make CONFIG_CPUMASK_OFFSTACK=y usable without debug dependency" ...
2014-01-20Merge branch 'core-debug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core debug changes from Ingo Molnar: "Currently there are two methods to set the panic_timeout: via 'panic=X' boot commandline option, or via /proc/sys/kernel/panic. This tree adds a third panic_timeout configuration method: configuration via Kconfig, via CONFIG_PANIC_TIMEOUT=X - useful to distros that generally want their kernel defaults to come with the .config. CONFIG_PANIC_TIMEOUT defaults to 0, which was the previous default value of panic_timeout. Doing that unearthed a few arch trickeries regarding arch-special panic_timeout values and related complications - hopefully all resolved to the satisfaction of everyone" * 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: powerpc: Clean up panic_timeout usage MIPS: Remove panic_timeout settings panic: Make panic_timeout configurable
2014-01-19tracing: Fix buggered tee(2) on tracing_pipeAl Viro
In kernel/trace/trace.c we have this: static void tracing_pipe_buf_release(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { __free_page(buf->page); } static const struct pipe_buf_operations tracing_pipe_buf_ops = { .can_merge = 0, .map = generic_pipe_buf_map, .unmap = generic_pipe_buf_unmap, .confirm = generic_pipe_buf_confirm, .release = tracing_pipe_buf_release, .steal = generic_pipe_buf_steal, .get = generic_pipe_buf_get, }; with void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { page_cache_get(buf->page); } and I don't see anything that would've prevented tee(2) called on the pipe that got stuff spliced into it from that sucker. ->ops->get() will be called, then buf gets copied into target pipe's ->bufs[] and eventually readers get to both copies of the buffer. With get_page(page) look at that page __free_page(page) look at that page __free_page(page) which is not a good thing, to put it mildly. AFAICS, that ought to use the normal generic_pipe_buf_release() (aka page_cache_release(buf->page)), shouldn't it? [ SDR - As trace_pipe just allocates the page with alloc_page(GFP_KERNEL), and doesn't do anything special with it (no LRU logic). The __free_page() should be fine, as it wont actually free a page with reference count. Maybe there's a chance to leak memory? Anyway, This change is at a minimum good for being symmetric with generic_pipe_buf_get, it is fine to add. ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [ SDR - Removed no longer used tracing_pipe_buf_release ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-18cgroup: trivial style updatesSeongJae Park
* Place newline before function opening brace in cgroup_kill_sb(). * Insert space before assignment in attach_task_by_pid() tj: merged two patches into one. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-01-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fixes from Eric Biederman: "This is a set of 3 regression fixes. This fixes /proc/mounts when using "ip netns add <netns>" to display the actual mount point. This fixes a regression in clone that broke lxc-attach. This fixes a regression in the permission checks for mounting /proc that made proc unmountable if binfmt_misc was in use. Oops. My apologies for sending this pull request so late. Al Viro gave interesting review comments about the d_path fix that I wanted to address in detail before I sent this pull request. Unfortunately a bad round of colds kept from addressing that in detail until today. The executive summary of the review was: Al: Is patching d_path really sufficient? The prepend_path, d_path, d_absolute_path, and __d_path family of functions is a really mess. Me: Yes, patching d_path is really sufficient. Yes, the code is mess. No it is not appropriate to rewrite all of d_path for a regression that has existed for entirely too long already, when a two line change will do" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: vfs: Fix a regression in mounting proc fork: Allow CLONE_PARENT after setns(CLONE_NEWPID) vfs: In d_path don't call d_dname on a mount point
2014-01-17audit: fix location of __net_initdata for audit_net_opsRichard Guy Briggs
Fixup caught by checkpatch. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-17audit: remove pr_info for every network namespaceEric Paris
A message about creating the audit socket might be fine at startup, but a pr_info for every single network namespace created on a system isn't useful. Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-16sched: Fix __sched_setscheduler() nice testPeter Zijlstra
With the introduction of sched_attr::sched_nice we need to check if we've got permission to actually change the nice value. Daniel found that can_nice() would always fail; and upon inspection it turns out that can_nice() only tests to see if we can lower the nice value, but it doesn't validate if we're lowering or not. Therefore amend the test to only call can_nice() when we lower the nice value. Reported-and-Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: raistlin@linux.it Cc: juri.lelli@gmail.com Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Fixes: d50dde5a10f3 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Link: http://lkml.kernel.org/r/20140116165425.GA9481@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16futexes: Fix futex_hashsize initializationHeiko Carstens
"futexes: Increase hash table size for better performance" introduces a new alloc_large_system_hash() call. alloc_large_system_hash() however may allocate less memory than requested, e.g. limited by MAX_ORDER. Hence pass a pointer to alloc_large_system_hash() which will contain the hash shift when the function returns. Afterwards correctly set futex_hashsize. Fixes a crash on s390 where the requested allocation size was 4MB but only 1MB was allocated. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Jason Low <jason.low2@hp.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Link: http://lkml.kernel.org/r/20140116135450.GA4345@osiris Signed-off-by: Ingo Molnar <mingo@kernel.org>