summaryrefslogtreecommitdiff
path: root/init
AgeCommit message (Collapse)Author
2014-09-19init/main.c: Give init_task a canaryAaron Tomlin
Tasks get their end of stack set to STACK_END_MAGIC with the aim to catch stack overruns. Currently this feature does not apply to init_task. This patch removes this restriction. Note that a similar patch was posted by Prarit Bhargava some time ago but was never merged: http://marc.info/?l=linux-kernel&m=127144305403241&w=2 Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dzickus@redhat.com Cc: bmr@redhat.com Cc: jcastillo@redhat.com Cc: jgh@redhat.com Cc: minchan@kernel.org Cc: tglx@linutronix.de Cc: hannes@cmpxchg.org Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Daeseok Youn <daeseok.youn@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-17Revert "init: make rootdelay=N consistent with rootwait behaviour"Paul Gortmaker
This reverts commit 4dfe694f616e00e6fd83e5bbcd7a3c4d7113493d. In that, we did: Here we move the rootdelay code to be right beside the rootwait code, so that their behaviour is consistent. ...which is fine, but in hindsight, perhaps moving the rootwait to be beside the rootdelay would have been better. We also indicated: It should be noted that in doing so, the actions based on the saved_root_name[0] and initrd_load() were previously put on hold by rootdelay=N and now currently will not be delayed. However, I think consistent behaviour is more important than matching historical behaviour of delaying the above two operations. But Pavel reported an instance where an ARM target with root on MMC was failing to mount root, and Russell diagnosed it to the fact that the call to set ROOT_DEV within the saved_root_name[0] processing block mentioned above was no longer being delayed. Rather than moving both wait clauses to the original position of rootdelay and risking unearthing other possible corner case breakage at this point in time, we simply revert now and we can revisit trying the alternate/earlier location in another development cycle. Cc: Pavel Machek <pavel@denx.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-16Merge branch 'rcu-tasks.2014.09.10a' into HEADPaul E. McKenney
rcu-tasks.2014.09.10a: Add RCU-tasks flavor of RCU.
2014-09-16rcu: Fix attempt to avoid unsolicited offloading of callbacksPaul E. McKenney
Commit b58cc46c5f6b (rcu: Don't offload callbacks unless specifically requested) failed to adjust the callback lists of the CPUs that are known to be no-CBs CPUs only because they are also nohz_full= CPUs. This failure can result in callbacks that are posted during early boot getting stranded on nxtlist for CPUs whose no-CBs property becomes apparent late, and there can also be spurious warnings about offline CPUs posting callbacks. This commit fixes these problems by adding an early-boot rcu_init_nohz() that properly initializes the no-CBs CPUs. Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with CONFIG_RCU_NOCB_CPU=n do not exhibit this bug. Neither do kernels booted without the nohz_full= boot parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-09-13nohz: Move nohz full init call to tick initFrederic Weisbecker
This way we unbloat a bit main.c and more importantly we initialize nohz full after init_IRQ(). This dependency will be needed in further patches because nohz full needs irq work to raise its own IRQ. Information about the support for this ability on ARM64 is obtained on init_IRQ() which initialize the pointer to __smp_call_function. Since tick_init() is called right after init_IRQ(), this is a good place to call tick_nohz_init() and prepare for that dependency. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-07rcu: Add call_rcu_tasks()Paul E. McKenney
This commit adds a new RCU-tasks flavor of RCU, which provides call_rcu_tasks(). This RCU flavor's quiescent states are voluntary context switch (not preemption!) and userspace execution (not the idle loop -- use some sort of schedule_on_each_cpu() if you need to handle the idle tasks. Note that unlike other RCU flavors, these quiescent states occur in tasks, not necessarily CPUs. Includes fixes from Steven Rostedt. This RCU flavor is assumed to have very infrequent latency-tolerant updaters. This assumption permits significant simplifications, including a single global callback list protected by a single global lock, along with a single task-private linked list containing all tasks that have not yet passed through a quiescent state. If experience shows this assumption to be incorrect, the required additional complexity will be added. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-08-28init/do_mounts: better syntax descriptionPavel Machek
Specify hex device number unambiquously. Signed-off-by: Pavel Machek <pavel@denx.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-08-27kbuild: handle module compression while running 'make modules_install'.Bertrand Jacquin
Since module-init-tools (gzip) and kmod (gzip and xz) support compressed modules, it could be useful to include a support for compressing modules right after having them installed. Doing this in kbuild instead of per distro can permit to make this kind of usage more generic. This patch add a Kconfig entry to "Enable loadable module support" menu and let you choose to compress using gzip (default) or xz. Both gzip and xz does not used any extra -[1-9] option since Andi Kleen and Rusty Russell prove no gain is made using them. gzip is called with -n argument to avoid storing original filename inside compressed file, that way we can save some more bytes. On a v3.16 kernel, 'make allmodconfig' generated 4680 modules for a total of 378MB (no strip, no sign, no compress), the following table shows observed disk space gain based on the allmodconfig .config : | time | +-------------+-----------------+ | manual .ko | make | size | percent | compression | modules_install | | gain +-------------+-----------------+------+-------- - | | 18.61s | 378M | GZIP | 3m16s | 3m37s | 102M | 73.41% XZ | 5m22s | 5m39s | 77M | 79.83% The gain for restricted environnement seems to be interesting while uncompress can be time consuming but happens only while loading a module, that is generally done only once. This is fully compatible with signed modules while the signed module is compressed. module-init-tools or kmod handles decompression and provide to other layer the uncompressed but signed payload. Reviewed-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Bertrand Jacquin <beber@meleeweb.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-08-26mm: Fix CROSS_MEMORY_ATTACH help text grammarGeert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-08-17mm: Support compiling out madvise and fadviseJosh Triplett
Many embedded systems will not need these syscalls, and omitting them saves space. Add a new EXPERT config option CONFIG_ADVISE_SYSCALLS (default y) to support compiling them out. bloat-o-meter: add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-2250 (-2250) function old new delta sys_fadvise64 57 - -57 sys_fadvise64_64 691 - -691 sys_madvise 1502 - -1502 Signed-off-by: Josh Triplett <josh@joshtriplett.org>
2014-08-14mm: fix CROSS_MEMORY_ATTACH help text grammarGeert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08Merge branch 'akpm' (second patchbomb from Andrew Morton)Linus Torvalds
Merge more incoming from Andrew Morton: "Two new syscalls: memfd_create in "shm: add memfd_create() syscall" kexec_file_load in "kexec: implementation of new syscall kexec_file_load" And: - Most (all?) of the rest of MM - Lots of the usual misc bits - fs/autofs4 - drivers/rtc - fs/nilfs - procfs - fork.c, exec.c - more in lib/ - rapidio - Janitorial work in filesystems: fs/ufs, fs/reiserfs, fs/adfs, fs/cramfs, fs/romfs, fs/qnx6. - initrd/initramfs work - "file sealing" and the memfd_create() syscall, in tmpfs - add pci_zalloc_consistent, use it in lots of places - MAINTAINERS maintenance - kexec feature work" * emailed patches from Andrew Morton <akpm@linux-foundation.org: (193 commits) MAINTAINERS: update nomadik patterns MAINTAINERS: update usb/gadget patterns MAINTAINERS: update DMA BUFFER SHARING patterns kexec: verify the signature of signed PE bzImage kexec: support kexec/kdump on EFI systems kexec: support for kexec on panic using new system call kexec-bzImage64: support for loading bzImage using 64bit entry kexec: load and relocate purgatory at kernel load time purgatory: core purgatory functionality purgatory/sha256: provide implementation of sha256 in purgaotory context kexec: implementation of new syscall kexec_file_load kexec: new syscall kexec_file_load() declaration kexec: make kexec_segment user buffer pointer a union resource: provide new functions to walk through resources kexec: use common function for kimage_normal_alloc() and kimage_crash_alloc() kexec: move segment verification code in a separate function kexec: rename unusebale_pages to unusable_pages kernel: build bin2c based on config option CONFIG_BUILD_BIN2C bin2c: move bin2c in scripts/basic shm: wait for pins to be released when sealing ...
2014-08-08kernel: build bin2c based on config option CONFIG_BUILD_BIN2CVivek Goyal
currently bin2c builds only if CONFIG_IKCONFIG=y. But bin2c will now be used by kexec too. So make it compilation dependent on CONFIG_BUILD_BIN2C and this config option can be selected by CONFIG_KEXEC and CONFIG_IKCONFIG. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08init/main.c: code clean-upFabian Frederick
Fixing some checkpatch warnings(remove global initialization, move __initdata, coalesce formats ...) Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08initramfs: add write error checksDavid Engraf
On a system with low memory extracting the initramfs may fail. If this happens the user gets "Failed to execute /init" instead of an initramfs error. Check return value of sys_write and call error() when the write was incomplete or failed. Signed-off-by: David Engraf <david.engraf@sysgo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08initramfs: support initramfs that is bigger than 2GiBYinghai Lu
Now with 64bit bzImage and kexec tools, we support ramdisk that size is bigger than 2g, as we could put it above 4G. Found compressed initramfs image could not be decompressed properly. It turns out that image length is int during decompress detection, and it will become < 0 when length is more than 2G. Furthermore, during decompressing len as int is used for inbuf count, that has problem too. Change len to long, that should be ok as on 32 bit platform long is 32bits. Tested with following compressed initramfs image as root with kexec. gzip, bzip2, xz, lzma, lzop, lz4. run time for populate_rootfs(): size name Nehalem-EX Westmere-EX Ivybridge-EX 9034400256 root_img : 26s 24s 30s 3561095057 root_img.lz4 : 28s 27s 27s 3459554629 root_img.lzo : 29s 29s 28s 3219399480 root_img.gz : 64s 62s 49s 2251594592 root_img.xz : 262s 260s 183s 2226366598 root_img.lzma: 386s 376s 277s 2901482513 root_img.bz2 : 635s 599s Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Rashika Kheria <rashika.kheria@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kyungsik Lee <kyungsik.lee@lge.com> Cc: P J P <ppandit@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: "Daniel M. Weeks" <dan@danweeks.net> Cc: Alexandre Courbot <acourbot@nvidia.com> Cc: Jan Beulich <JBeulich@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08initramfs: support initrd that is bigger than 2GiBYinghai Lu
When initrd (compressed or not) is used, kernel report data corrupted with /dev/ram0. The root cause: During initramfs checking, if it is initrd, it will be transferred to /initrd.image with sys_write. sys_write only support 2G-4K write, so if the initrd ram is more than that, /initrd.image will not complete at all. Add local xwrite to loop calling sys_write to workaround the problem. Also need to use xwrite in write_buffer() to handle: image is uncompressed cpio and there is one big file (>2G) in it. unpack_to_rootfs ===> write_buffer ===> actions[]/do_copy At the same time, we don't need to worry about sys_read/sys_write in do_mounts_rd.c::crd_load. As decompressor will have fill/flush and local buffer that is smaller than 2G. Test with uncompressed initrd, and compressed ones with gz, bz2, lzma,xz, lzop. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: "Daniel M. Weeks" <dan@danweeks.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08init: make rootdelay=N consistent with rootwait behaviourPaul Gortmaker
Currently rootdelay=N and rootwait behave differently (aside from the obvious unbounded wait duration) because they are at different places in the init sequence. The difference manifests itself for md devices because the call to md_run_setup() lives between rootdelay and rootwait, so if you try to use rootdelay=20 to try and allow a slow RAID0 array to assemble, you get this: [ 4.526011] sd 6:0:0:0: [sdc] Attached SCSI removable disk [ 22.972079] md: Waiting for all devices to be available before autodetect i.e. you've achieved nothing other than delaying the probing 20s, when what you wanted was a 20s delay _after_ the probing for md devices was initiated. Here we move the rootdelay code to be right beside the rootwait code, so that their behaviour is consistent. It should be noted that in doing so, the actions based on the saved_root_name[0] and initrd_load() were previously put on hold by rootdelay=N and now currently will not be delayed. However, I think consistent behaviour is more important than matching historical behaviour of delaying the above two operations. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08Merge tag 'soc-for-3.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC platform changes from Olof Johansson: "This is the bulk of new SoC enablement and other platform changes for 3.17: - Samsung S5PV210 has been converted to DT and multiplatform - Clock drivers and bindings for some of the lower-end i.MX 1/2 platforms - Kirkwood, one of the popular Marvell platforms, is folded into the mvebu platform code, removing mach-kirkwood - Hwmod data for TI AM43xx and DRA7 platforms - More additions of Renesas shmobile platform support - Removal of plat-samsung contents that can be removed with S5PV210 being multiplatform/DT-enabled and the other two old platforms being removed New platforms (most with only basic support right now): - Hisilicon X5HD2 settop box chipset is introduced - Mediatek MT6589 (mobile chipset) is introduced - Broadcom BCM7xxx settop box chipset is introduced + as usual a lot other pieces all over the platform code" * tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (240 commits) ARM: hisi: remove smp from machine descriptor power: reset: move hisilicon reboot code ARM: dts: Add hix5hd2-dkb dts file. ARM: debug: Rename Hi3716 to HIX5HD2 ARM: hisi: enable hix5hd2 SoC ARM: hisi: add ARCH_HISI MAINTAINERS: add entry for Broadcom ARM STB architecture ARM: brcmstb: select GISB arbiter and interrupt drivers ARM: brcmstb: add infrastructure for ARM-based Broadcom STB SoCs ARM: configs: enable SMP in bcm_defconfig ARM: add SMP support for Broadcom mobile SoCs Documentation: arm: misc updates to Marvell EBU SoC status Documentation: arm: add URLs to public datasheets for the Marvell Armada XP SoC ARM: mvebu: fix build without platforms selected ARM: mvebu: add cpuidle support for Armada 38x ARM: mvebu: add cpuidle support for Armada 370 cpuidle: mvebu: add Armada 38x support cpuidle: mvebu: add Armada 370 support cpuidle: mvebu: rename the driver from armada-370-xp to mvebu-v7 ARM: mvebu: export the SCU address ...
2014-08-06printk: allow increasing the ring buffer depending on the number of CPUsLuis R. Rodriguez
The default size of the ring buffer is too small for machines with a large amount of CPUs under heavy load. What ends up happening when debugging is the ring buffer overlaps and chews up old messages making debugging impossible unless the size is passed as a kernel parameter. An idle system upon boot up will on average spew out only about one or two extra lines but where this really matters is on heavy load and that will vary widely depending on the system and environment. There are mechanisms to help increase the kernel ring buffer for tracing through debugfs, and those interfaces even allow growing the kernel ring buffer per CPU. We also have a static value which can be passed upon boot. Relying on debugfs however is not ideal for production, and relying on the value passed upon bootup is can only used *after* an issue has creeped up. Instead of being reactive this adds a proactive measure which lets you scale the amount of contributions you'd expect to the kernel ring buffer under load by each CPU in the worst case scenario. We use num_possible_cpus() to avoid complexities which could be introduced by dynamically changing the ring buffer size at run time, num_possible_cpus() lets us use the upper limit on possible number of CPUs therefore avoiding having to deal with hotplugging CPUs on and off. This introduces the kernel configuration option LOG_CPU_MAX_BUF_SHIFT which is used to specify the maximum amount of contributions to the kernel ring buffer in the worst case before the kernel ring buffer flips over, the size is specified as a power of 2. The total amount of contributions made by each CPU must be greater than half of the default kernel ring buffer size (1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon bootup. The kernel ring buffer is increased to the next power of two that would fit the required minimum kernel ring buffer size plus the additional CPU contribution. For example if LOG_BUF_SHIFT is 18 (256 KB) you'd require at least 128 KB contributions by other CPUs in order to trigger an increase of the kernel ring buffer. With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least anything over > 64 possible CPUs to trigger an increase. If you had 128 possible CPUs the amount of minimum required kernel ring buffer bumps to: ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB Since we require the ring buffer to be a power of two the new required size would be 1024 KB. This CPU contributions are ignored when the "log_buf_len" kernel parameter is used as it forces the exact size of the ring buffer to an expected power of two value. [pmladek@suse.cz: fix build] Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.cz> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Tested-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Petr Mladek <pmladek@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Arun KS <arunks.linux@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-09Merge branches 'doc.2014.07.08a', 'fixes.2014.07.09a', ↵Paul E. McKenney
'maintainers.2014.07.08b', 'nocbs.2014.07.07a' and 'torture.2014.07.07a' into HEAD doc.2014.07.08a: Documentation updates. fixes.2014.07.09a: Miscellaneous fixes. maintainers.2014.07.08b: Maintainership updates. nocbs.2014.07.07a: Callback-offloading fixes. torture.2014.07.07a: Torture-test updates.
2014-07-09rcu: Handle obsolete references to TINY_PREEMPT_RCUPaul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-07rcu: Don't offload callbacks unless specifically requestedPaul E. McKenney
Enabling NO_HZ_FULL currently has the side effect of enabling callback offloading on all CPUs. This results in lots of additional rcuo kthreads, and can also increase context switching and wakeups, even in cases where callback offloading is neither needed nor particularly desirable. This commit therefore enables callback offloading on a given CPU only if specifically requested at build time or boot time, or if that CPU has been specifically designated (again, either at build time or boot time) as a nohz_full CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-06-16kernel: add calibration_delay_done()Peter De Schrijver
Add calibration_delay_done() call and dummy implementation. This allows architectures to stop accepting registrations for new timer based delay functions. Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Stephen Warren <swarren@nvidia.com>
2014-06-11Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "Most of this is cleaning up various driver sysfs permissions so we can re-add the perm check (we unified the module param and sysfs checks, but the module ones were stronger so we weakened them temporarily). Param parsing gets documented, and also "--" now forces args to be handed to init (and ignored by the kernel). Module NX/RO protections get tightened: we now set them before calling parse_args()" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: module: set nx before marking module MODULE_STATE_COMING. samples/kobject/: avoid world-writable sysfs files. drivers/hid/hid-picolcd_fb: avoid world-writable sysfs files. drivers/staging/speakup/: avoid world-writable sysfs files. drivers/regulator/virtual: avoid world-writable sysfs files. drivers/scsi/pm8001/pm8001_ctl.c: avoid world-writable sysfs files. drivers/hid/hid-lg4ff.c: avoid world-writable sysfs files. drivers/video/fbdev/sm501fb.c: avoid world-writable sysfs files. drivers/mtd/devices/docg3.c: avoid world-writable sysfs files. speakup: fix incorrect perms on speakup_acntsa.c cpumask.h: silence warning with -Wsign-compare Documentation: Update kernel-parameters.tx param: hand arguments after -- straight to init modpost: Fix resource leak in read_dump()
2014-06-05Merge branch 'x86/espfix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull x86-64 espfix changes from Peter Anvin: "This is the espfix64 code, which fixes the IRET information leak as well as the associated functionality problem. With this code applied, 16-bit stack segments finally work as intended even on a 64-bit kernel. Consequently, this patchset also removes the runtime option that we added as an interim measure. To help the people working on Linux kernels for very small systems, this patchset also makes these compile-time configurable features" * 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" x86, espfix: Make it possible to disable 16-bit support x86, espfix: Make espfix64 a Kconfig option, fix UML x86, espfix: Fix broken header guard x86, espfix: Move espfix definitions into a separate header file x86-32, espfix: Remove filter for espfix32 due to race x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
2014-06-04init/main.c: remove an ifdefAndrew Morton
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid ↵Oleg Nesterov
CLONE_SIGHAND 1. Remove CLONE_KERNEL, it has no users and it is dangerous. The (old) comment says "List of flags we want to share for kernel threads" but this is not true, we do not want to share ->sighand by default. This flag can only be used if the caller is sure that both parent/child will never play with signals (say, allow_signal/etc). 2. Change rest_init() to clone kernel_init() without CLONE_SIGHAND. In this case CLONE_SIGHAND does not really hurt, and it looks like optimization because copy_sighand() can avoid kmem_cache_alloc(). But in fact this only adds the minor pessimization. kernel_init() is going to exec the init process, and de_thread() will need to unshare ->sighand and do kmem_cache_alloc(sighand_cachep) anyway, but it needs to do more work and take tasklist_lock and siglock. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04init/main.c: add initcall_blacklist kernel parameterPrarit Bhargava
When a module is built into the kernel the module_init() function becomes an initcall. Sometimes debugging through dynamic debug can help, however, debugging built in kernel modules is typically done by changing the .config, recompiling, and booting the new kernel in an effort to determine exactly which module caused a problem. This patchset can be useful stand-alone or combined with initcall_debug. There are cases where some initcalls can hang the machine before the console can be flushed, which can make initcall_debug output inaccurate. Having the ability to skip initcalls can help further debugging of these scenarios. Usage: initcall_blacklist=<list of comma separated initcalls> ex) added "initcall_blacklist=sgi_uv_sysfs_init" as a kernel parameter and the log contains: blacklisting initcall sgi_uv_sysfs_init ... ... initcall sgi_uv_sysfs_init blacklisted ex) added "initcall_blacklist=foo_bar,sgi_uv_sysfs_init" as a kernel parameter and the log contains: blacklisting initcall foo_bar blacklisting initcall sgi_uv_sysfs_init ... ... initcall sgi_uv_sysfs_init blacklisted [akpm@linux-foundation.org: tweak printk text] Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Rob Landley <rob@landley.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04init/main.c: don't use pr_debug()Andrew Morton
Pertially revert commit ea676e846a81 ("init/main.c: convert to pr_foo()"). Unbeknownst to me, pr_debug() is different from the other pr_foo() levels: pr_debug() is a no-op when DEBUG is not defined. Happily, init/main.c does have a #define DEBUG so we didn't break initcall_debug. But the functioning of initcall_debug should not be dependent upon the presence of that #define DEBUG. Reported-by: Russell King <rmk@arm.linux.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04kernel/printk: use symbolic defines for console loglevelsBorislav Petkov
... instead of naked numbers. Stuff in sysrq.c used to set it to 8 which is supposed to mean above default level so set it to DEBUG instead as we're terminating/killing all tasks and we want to be verbose there. Also, correct the check in x86_64_start_kernel which should be >= as we're clearly issuing the string there for all debug levels, not only the magical 10. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Joe Perches <joe@perches.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04sys_sgetmask/sys_ssetmask: add CONFIG_SGETMASK_SYSCALLFabian Frederick
sys_sgetmask and sys_ssetmask are obsolete system calls no longer supported in libc. This patch replaces architecture related __ARCH_WANT_SYS_SGETMAX by expert mode configuration.That option is enabled by default for those architectures. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Steven Miao <realmz6@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm/process_vm_access: move config option into init/KconfigKonstantin Khlebnikov
CONFIG_CROSS_MEMORY_ATTACH adds couple syscalls: process_vm_readv and process_vm_writev, it's a kind of IPC for copying data between processes. Currently this option is placed inside "Processor type and features". This patch moves it into "General setup" (where all other arch-independed syscalls and ipc features are placed) and changes prompt string to less cryptic. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Christopher Yeoh <cyeoh@au1.ibm.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Hugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04memcg: kill start_kernel()->mm_init_owner(&init_mm)Oleg Nesterov
Remove start_kernel()->mm_init_owner(&init_mm, &init_task). This doesn't really hurt but unnecessary and misleading. init_task is the "swapper" thread == current, its ->mm is always NULL. And init_mm can only be used as ->active_mm, not as ->mm. mm_init_owner() has a single caller with this patch, perhaps it should die. mm_init() can initialize ->owner under #ifdef. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Peter Chiang <pchiang@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04memcg: kill CONFIG_MM_OWNEROleg Nesterov
CONFIG_MM_OWNER makes no sense. It is not user-selectable, it is only selected by CONFIG_MEMCG automatically. So we can kill this option in init/Kconfig and do s/CONFIG_MM_OWNER/CONFIG_MEMCG/ globally. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04Documentation/memcg: warn about incomplete kmemcg stateVladimir Davydov
Kmemcg is currently under development and lacks some important features. In particular, it does not have support of kmem reclaim on memory pressure inside cgroup, which practically makes it unusable in real life. Let's warn about it in both Kconfig and Documentation to prevent complaints arising. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-21Merge commit '7ed6fb9b5a5510e4ef78ab27419184741169978a' into x86/espfixH. Peter Anvin
Merge in Linus' tree with: fa81511bb0bb x86-64, modify_ldt: Make support for 16-bit segments a runtime option ... reverted, to avoid a conflict. This commit is no longer necessary with the proper fix in place. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05asmlinkage: Add explicit __visible to drivers/*, lib/*, kernel/*Andi Kleen
As requested by Linus add explicit __visible to the asmlinkage users. This marks functions visible to assembler. Tree sweep for rest of tree. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-4-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-04x86, espfix: Make espfix64 a Kconfig option, fix UMLH. Peter Anvin
Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML build which had broken due to the non-existence of init_espfix_bsp() in UML: since UML uses its own Kconfig, this option does not appear in the UML build. This also makes it possible to make support for 16-bit segments a configuration option, for the people who want to minimize the size of the kernel. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Richard Weinberger <richard@nod.at> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
2014-04-30x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stackH. Peter Anvin
The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # consider after upstream merge
2014-04-28param: hand arguments after -- straight to initRusty Russell
The kernel passes any args it doesn't need through to init, except it assumes anything containing '.' belongs to the kernel (for a module). This change means all users can clearly distinguish which arguments are for init. For example, the kernel uses debug ("dee-bug") to mean log everything to the console, where systemd uses the debug from the Scandinavian "day-boog" meaning "fail to boot". If a future versions uses argv[] instead of reading /proc/cmdline, this confusion will be avoided. eg: test 'FOO="this is --foo"' -- 'systemd.debug="true true true"' Gives: argv[0] = '/debug-init' argv[1] = 'test' argv[2] = 'systemd.debug=true true true' envp[0] = 'HOME=/' envp[1] = 'TERM=linux' envp[2] = 'FOO=this is --foo' Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-04-18init/Kconfig: move the trusted keyring config option to general setupPeter Foley
The SYSTEM_TRUSTED_KEYRING config option is not in any menu, causing it to show up in the toplevel of the kernel configuration. Fix this by moving it under the General Setup menu. Signed-off-by: Peter Foley <pefoley2@pefoley.com> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-12Merge git://git.infradead.org/users/eparis/auditLinus Torvalds
Pull audit updates from Eric Paris. * git://git.infradead.org/users/eparis/audit: (28 commits) AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range audit: do not cast audit_rule_data pointers pointlesly AUDIT: Allow login in non-init namespaces audit: define audit_is_compat in kernel internal header kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c sched: declare pid_alive as inline audit: use uapi/linux/audit.h for AUDIT_ARCH declarations syscall_get_arch: remove useless function arguments audit: remove stray newline from audit_log_execve_info() audit_panic() call audit: remove stray newlines from audit_log_lost messages audit: include subject in login records audit: remove superfluous new- prefix in AUDIT_LOGIN messages audit: allow user processes to log from another PID namespace audit: anchor all pid references in the initial pid namespace audit: convert PPIDs to the inital PID namespace. pid: get pid_t ppid of task in init_pid_ns audit: rename the misleading audit_get_context() to audit_take_context() audit: Add generic compat syscall support audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL ...
2014-04-07initramfs: debug detected compression methodDaniel M. Weeks
This can greatly aid in narrowing down the real source of initramfs problems such as failures related to the compression of the in-kernel initramfs when an external initramfs is in use as well. Existing errors are ambiguous as to which initramfs is a problem and why. [akpm@linux-foundation.org: use pr_debug()] Signed-off-by: Daniel M. Weeks <dan@danweeks.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07kconfig: make allnoconfig disable options behind EMBEDDED and EXPERTJosh Triplett
"make allnoconfig" exists to ease testing of minimal configurations. Documentation/SubmitChecklist includes a note to test with allnoconfig. This helps catch missing dependencies on common-but-not-required functionality, which might otherwise go unnoticed. However, allnoconfig still leaves many symbols enabled, because they're hidden behind CONFIG_EMBEDDED or CONFIG_EXPERT. For instance, allnoconfig still has CONFIG_PRINTK and CONFIG_BLOCK enabled, so drivers don't typically get build-tested with those disabled. To address this, introduce a new Kconfig option "allnoconfig_y", used on symbols which only exist to hide other symbols. Set it on CONFIG_EMBEDDED (which then selects CONFIG_EXPERT). allnoconfig will then disable all the symbols hidden behind those. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge first patch-bomb from Andrew Morton: - Various misc bits - kmemleak fixes - small befs, codafs, cifs, efs, freexxfs, hfsplus, minixfs, reiserfs things - fanotify - I appear to have become SuperH maintainer - ocfs2 updates - direct-io tweaks - a bit of the MM queue - printk updates - MAINTAINERS maintenance - some backlight things - lib/ updates - checkpatch updates - the rtc queue - nilfs2 updates - Small Documentation/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (237 commits) Documentation/SubmittingPatches: remove references to patch-scripts Documentation/SubmittingPatches: update some dead URLs Documentation/filesystems/ntfs.txt: remove changelog reference Documentation/kmemleak.txt: updates fs/reiserfs/super.c: add __init to init_inodecache fs/reiserfs: move prototype declaration to header file fs/hfsplus/attributes.c: add __init to hfsplus_create_attr_tree_cache() fs/hfsplus/extents.c: fix concurrent acess of alloc_blocks fs/hfsplus/extents.c: remove unused variable in hfsplus_get_block nilfs2: update project's web site in nilfs2.txt nilfs2: update MAINTAINERS file entries fix nilfs2: verify metadata sizes read from disk nilfs2: add FITRIM ioctl support for nilfs2 nilfs2: add nilfs_sufile_trim_fs to trim clean segs nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl nilfs2: add nilfs_sufile_set_suinfo to update segment usage nilfs2: add struct nilfs_suinfo_update and flags nilfs2: update MAINTAINERS file entries fs/coda/inode.c: add __init to init_inodecache() BEFS: logging cleanup ...
2014-04-03init/do_mounts.c: fix comment errorchishanmingshen
Signed-off-by: chishanmingshen <chishanmingshen@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs, kernel: permit disabling the uselib syscallJosh Triplett
uselib hasn't been used since libc5; glibc does not use it. Support turning it off. When disabled, also omit the load_elf_library implementation from binfmt_elf.c, which only uselib invokes. bloat-o-meter: add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785) function old new delta padzero 39 36 -3 uselib_flags 20 - -20 sys_uselib 168 - -168 SyS_uselib 168 - -168 load_elf_library 426 - -426 The new CONFIG_USELIB defaults to `y'. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03sys_sysfs: Add CONFIG_SYSFS_SYSCALLFabian Frederick
sys_sysfs is an obsolete system call no longer supported by libc. - This patch adds a default CONFIG_SYSFS_SYSCALL=y - Option can be turned off in expert mode. - cond_syscall added to kernel/sys_ni.c [akpm@linux-foundation.org: tweak Kconfig help text] Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03Merge branch 'for-3.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot updates for cgroup: - The biggest one is cgroup's conversion to kernfs. cgroup took after the long abandoned vfs-entangled sysfs implementation and made it even more convoluted over time. cgroup's internal objects were fused with vfs objects which also brought in vfs locking and object lifetime rules. Naturally, there are places where vfs rules don't fit and nasty hacks, such as credential switching or lock dance interleaving inode mutex and cgroup_mutex with object serial number comparison thrown in to decide whether the operation is actually necessary, needed to be employed. After conversion to kernfs, internal object lifetime and locking rules are mostly isolated from vfs interactions allowing shedding of several nasty hacks and overall simplification. This will also allow implmentation of operations which may affect multiple cgroups which weren't possible before as it would have required nesting i_mutexes. - Various simplifications including dropping of module support, easier cgroup name/path handling, simplified cgroup file type handling and task_cg_lists optimization. - Prepatory changes for the planned unified hierarchy, which is still a patchset away from being actually operational. The dummy hierarchy is updated to serve as the default unified hierarchy. Controllers which aren't claimed by other hierarchies are associated with it, which BTW was what the dummy hierarchy was for anyway. - Various fixes from Li and others. This pull request includes some patches to add missing slab.h to various subsystems. This was triggered xattr.h include removal from cgroup.h. cgroup.h indirectly got included a lot of files which brought in xattr.h which brought in slab.h. There are several merge commits - one to pull in kernfs updates necessary for converting cgroup (already in upstream through driver-core), others for interfering changes in the fixes branch" * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits) cgroup: remove useless argument from cgroup_exit() cgroup: fix spurious lockdep warning in cgroup_exit() cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c cgroup: break kernfs active_ref protection in cgroup directory operations cgroup: fix cgroup_taskset walking order cgroup: implement CFTYPE_ONLY_ON_DFL cgroup: make cgrp_dfl_root mountable cgroup: drop const from @buffer of cftype->write_string() cgroup: rename cgroup_dummy_root and related names cgroup: move ->subsys_mask from cgroupfs_root to cgroup cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}() cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root cgroup: reorganize cgroup bootstrapping cgroup: relocate setting of CGRP_DEAD cpuset: use rcu_read_lock() to protect task_cs() cgroup_freezer: document freezer_fork() subtleties cgroup: update cgroup_transfer_tasks() to either succeed or fail cgroup: drop task_lock() protection around task->cgroups cgroup: update how a newly forked task gets associated with css_set ...