path: root/kernel
Age  Commit message  Author
2014-10-09prctl: PR_SET_MM -- factor out mmap_sem when updating mm::exe_fileCyrill Gorcunov
Instead of taking mm->mmap_sem inside prctl_set_mm_exe_file(), move it out and rename the helper to prctl_set_mm_exe_file_locked(). This allows the function to be reused in a later patch. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Julien Tinnes <jln@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
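The refactoring described in this commit follows a common convention: a helper with a "_locked" suffix expects its caller to already hold the lock, so several updates can share one critical section. The following is a minimal userspace sketch of that convention, not the kernel code. The demo_* types and functions are hypothetical stand-ins (a boolean flag plays the role of mmap_sem) and do not reflect the real prctl implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and mmap_sem; not kernel types. */
struct demo_lock {
    bool held;
};

struct demo_mm {
    struct demo_lock sem;
    const char *exe_file;
    unsigned long brk;
};

void demo_lock_acquire(struct demo_lock *l)
{
    assert(!l->held);
    l->held = true;
}

void demo_lock_release(struct demo_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* "_locked" suffix: the caller must already hold mm->sem. */
int demo_set_exe_file_locked(struct demo_mm *mm, const char *path)
{
    assert(mm->sem.held);
    if (path == NULL)
        return -1;
    mm->exe_file = path;
    return 0;
}

/* A caller that updates more than one field under a single lock hold. */
int demo_set_mm(struct demo_mm *mm, const char *path, unsigned long brk)
{
    int err;

    demo_lock_acquire(&mm->sem);
    err = demo_set_exe_file_locked(mm, path);
    if (err == 0)
        mm->brk = brk;
    demo_lock_release(&mm->sem);
    return err;
}
```

Moving the lock out of the helper means a caller such as demo_set_mm() can combine the helper with other updates without dropping and retaking the lock in between.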
2014-10-09mm: use may_adjust_brk helperCyrill Gorcunov
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Julien Tinnes <jln@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09kernel/kthread.c: partial revert of 81c98869faa5 ("kthread: ensure locality of task_struct allocations")Nishanth Aravamudan
After discussions with Tejun, we don't want to spread the use of cpu_to_mem() (and thus knowledge of allocators/NUMA topology details) into callers, but would rather ensure the callees correctly handle memoryless nodes. With the previous patches ("topology: add support for node_to_mem_node() to determine the fallback node" and "slub: fallback to node_to_mem_node() node if allocating on memoryless node") adding and using node_to_mem_node(), we can safely undo part of the change to the kthread logic from 81c98869faa5. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Han Pingtian <hanpt@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Anton Blanchard <anton@samba.org> Cc: Christoph Lameter <cl@linux.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09softlockup: make detector be aware of task switch of processes hogging cpuchai wen
For now, soft lockup detector warns once for each case of process softlockup. But the thread 'watchdog/n' may not always get the cpu at the time slot between the task switch of two processes hogging that cpu to reset soft_watchdog_warn. An example would be two processes hogging the cpu. Process A causes the softlockup warning and is killed manually by a user. Process B immediately becomes the new process hogging the cpu preventing the softlockup code from resetting the soft_watchdog_warn variable. This case is a false negative of "warn only once for a process", as there may be a different process that is going to hog the cpu. Resolve this by saving/checking the task pointer of the hogging process and use that to reset soft_watchdog_warn too. [dzickus@redhat.com: update comment] Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
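The fix described above can be sketched as a small state machine: remember which task triggered the last warning, and re-arm the warning when a different task is found hogging the CPU. This is a simplified userspace illustration under stated assumptions, not the watchdog code itself. The demo_* names are hypothetical, `warned` stands in for soft_watchdog_warn, and the real detector works per CPU from a timer callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct. */
struct demo_task {
    int pid;
};

struct demo_watchdog {
    bool warned;                        /* plays the role of soft_watchdog_warn */
    const struct demo_task *saved_task; /* task that triggered the last warning */
};

/*
 * Called on each check. `stuck` says whether the CPU currently looks
 * soft-locked. Returns true when a warning should be printed.
 */
bool demo_watchdog_check(struct demo_watchdog *wd,
                         const struct demo_task *cur, bool stuck)
{
    if (!stuck) {
        /* The watchdog thread got to run: reset everything. */
        wd->warned = false;
        wd->saved_task = NULL;
        return false;
    }

    /* A different task is now hogging the CPU: re-arm the warning. */
    if (wd->warned && wd->saved_task != cur)
        wd->warned = false;

    if (wd->warned)
        return false; /* already warned about this task */

    wd->warned = true;
    wd->saved_task = cur;
    return true;
}
```

With this logic, process A produces one warning, and process B taking over the CPU immediately afterwards produces a second one even though the flag was never cleared in between.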
2014-10-09Merge tag 'pm+acpi-3.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "Features-wise, to me the most important this time is a rework of wakeup interrupts handling in the core that makes them work consistently across all of the available sleep states, including suspend-to-idle. Many thanks to Thomas Gleixner for his help with this work. Second is an update of the generic PM domains code that has been in need of some care for quite a while. Unused code is being removed, DT support is being added and domains are now going to be attached to devices in bus type code in analogy with the ACPI PM domain. The majority of work here was done by Ulf Hansson who also has been the most active developer this time. Apart from this we have a traditional ACPICA update, this time to upstream version 20140828 and a few ACPI wakeup interrupts handling patches on top of the general rework mentioned above. There also are several cpufreq commits including renaming the cpufreq-cpu0 driver to cpufreq-dt, as this is what implements generic DT-based cpufreq support, and a new DT-based idle states infrastructure for cpuidle. In addition to that, the ACPI LPSS driver is updated, ACPI support for Apple machines is improved, a few bugs are fixed and a few cleanups are made all over. Finally, the Adaptive Voltage Scaling (AVS) subsystem now has a tree maintained by Kevin Hilman that will be merged through the PM tree. Numbers-wise, the generic PM domains update takes the lead this time with 32 non-merge commits, second is cpufreq (15 commits) and the 3rd place goes to the wakeup interrupts handling rework (13 commits). 
Specifics: - Rework the handling of wakeup IRQs by the IRQ core such that all of them will be switched over to "wakeup" mode in suspend_device_irqs() and in that mode the first interrupt will abort system suspend in progress or wake up the system if already in suspend-to-idle (or equivalent) without executing any interrupt handlers. Among other things that eliminates the wakeup-related motivation to use the IRQF_NO_SUSPEND interrupt flag with interrupts which don't really need it and should not use it (Thomas Gleixner and Rafael Wysocki) - Switch over ACPI to handling wakeup interrupts with the help of the new mechanism introduced by the above IRQ core rework (Rafael Wysocki) - Rework the core generic PM domains code to eliminate code that's not used, add DT support and add a generic mechanism by which devices can be added to PM domains automatically during enumeration (Ulf Hansson, Geert Uytterhoeven and Tomasz Figa). - Add debugfs-based mechanics for debugging generic PM domains (Maciej Matraszek). - ACPICA update to upstream version 20140828. Included are updates related to the SRAT and GTDT tables and the _PSx methods are in the METHOD_NAME list now (Bob Moore and Hanjun Guo). - Add _OSI("Darwin") support to the ACPI core (unfortunately, that can't really be done in a straightforward way) to prevent Thunderbolt from being turned off on Apple systems after boot (or after resume from system suspend) and rework the ACPI Smart Battery Subsystem (SBS) driver to work correctly with Apple platforms (Matthew Garrett and Andreas Noever). - ACPI LPSS (Low-Power Subsystem) driver update cleaning up the code, adding support for 133MHz I2C source clock on Intel Baytrail to it and making it avoid using UART RTS override with Auto Flow Control (Heikki Krogerus). 
- ACPI backlight updates removing the video_set_use_native_backlight quirk which is not necessary any more, making the code check the list of output devices returned by the _DOD method to avoid creating acpi_video interfaces that won't work and adding a quirk for Lenovo Ideapad Z570 (Hans de Goede, Aaron Lu and Stepan Bujnak) - New Win8 ACPI OSI quirks for some Dell laptops (Edward Lin) - Assorted ACPI code cleanups (Fabian Frederick, Rasmus Villemoes, Sudip Mukherjee, Yijing Wang, and Zhang Rui) - cpufreq core updates and cleanups (Viresh Kumar, Preeti U Murthy, Rasmus Villemoes) - cpufreq driver updates: cpufreq-cpu0/cpufreq-dt (driver name change among other things), ppc-corenet, powernv (Viresh Kumar, Preeti U Murthy, Shilpasri G Bhat, Lucas Stach) - cpuidle support for DT-based idle states infrastructure, new ARM64 cpuidle driver, cpuidle core cleanups (Lorenzo Pieralisi, Rasmus Villemoes) - ARM big.LITTLE cpuidle driver updates: support for DT-based initialization and Exynos5800 compatible string (Lorenzo Pieralisi, Kevin Hilman) - Rework of the test_suspend kernel command line argument and a new trace event for console resume (Srinivas Pandruvada, Todd E Brandt) - Second attempt to optimize swsusp_free() (hibernation core) to make it avoid going through all PFNs which may be way too slow on some systems (Joerg Roedel) - devfreq updates (Paul Bolle, Punit Agrawal, Ørjan Eide). 
- rockchip-io Adaptive Voltage Scaling (AVS) driver and AVS entry update in MAINTAINERS (Heiko Stübner, Kevin Hilman) - PM core fix related to clock management (Geert Uytterhoeven) - PM core's sysfs code cleanup (Johannes Berg)" * tag 'pm+acpi-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (105 commits) ACPI / fan: printk replacement PM / clk: Fix crash in clocks management code if !CONFIG_PM_RUNTIME PM / Domains: Rename cpu_data to cpuidle_data cpufreq: cpufreq-dt: fix potential double put of cpu OF node cpufreq: cpu0: rename driver and internals to 'cpufreq_dt' PM / hibernate: Iterate over set bits instead of PFNs in swsusp_free() cpufreq: ppc-corenet: remove duplicate update of cpu_data ACPI / sleep: Rework the handling of ACPI GPE wakeup from suspend-to-idle PM / sleep: Rename platform suspend/resume functions in suspend.c PM / sleep: Export dpm_suspend_late/noirq() and dpm_resume_early/noirq() ACPICA: Introduce acpi_enable_all_wakeup_gpes() ACPICA: Clear all non-wakeup GPEs in acpi_hw_enable_wakeup_gpe_block() ACPI / video: check _DOD list when creating backlight devices PM / Domains: Move dev_pm_domain_attach|detach() to pm_domain.h cpufreq: Replace strnicmp with strncasecmp cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec cpufreq: powernv: Set the pstate of the last hotplugged out cpu in policy->cpus to minimum cpufreq: Allow stop CPU callback to be used by all cpufreq drivers PM / devfreq: exynos: Enable building exynos PPMU as module PM / devfreq: Export helper functions for drivers ...
2014-10-09Merge tag 'pci-v3.18-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "The interesting things here are: - Turn on Config Request Retry Status Software Visibility. This caused hangs last time, but we included a fix this time. - Rework PCI device configuration to use _HPP/_HPX more aggressively - Allow PCI devices to be put into D3cold during system suspend - Add arm64 PCI support - Add APM X-Gene host bridge driver - Add TI Keystone host bridge driver - Add Xilinx AXI host bridge driver More detailed summary: Enumeration - Check Vendor ID only for Config Request Retry Status (Rajat Jain) - Enable Config Request Retry Status when supported (Rajat Jain) - Add generic domain handling (Catalin Marinas) - Generate uppercase hex for modalias interface class (Ricardo Ribalda Delgado) Resource management - Add missing MEM_64 mask in pci_assign_unassigned_bridge_resources() (Yinghai Lu) - Increase IBM ipr SAS Crocodile BARs to at least system page size (Douglas Lehr) PCI device hotplug - Prevent NULL dereference during pciehp probe (Andreas Noever) - Move _HPP & _HPX handling into core (Bjorn Helgaas) - Apply _HPP to PCIe devices as well as PCI (Bjorn Helgaas) - Apply _HPP/_HPX to display devices (Bjorn Helgaas) - Preserve SERR & PARITY settings when applying _HPP/_HPX (Bjorn Helgaas) - Preserve MPS and MRRS settings when applying _HPP/_HPX (Bjorn Helgaas) - Apply _HPP/_HPX to all devices, not just hot-added ones (Bjorn Helgaas) - Fix wait time in pciehp timeout message (Yinghai Lu) - Add more pciehp Slot Control debug output (Yinghai Lu) - Stop disabling pciehp notifications during init (Yinghai Lu) MSI - Remove arch_msi_check_device() (Alexander Gordeev) - Rename pci_msi_check_device() to pci_msi_supported() (Alexander Gordeev) - Move D0 check into pci_msi_check_device() (Alexander Gordeev) - Remove unused kobject from struct msi_desc (Yijing Wang) - Remove "pos" from the struct msi_desc msi_attrib (Yijing Wang) - Add "msi_bus" sysfs MSI/MSI-X 
control for endpoints (Yijing Wang) - Use __get_cached_msi_msg() instead of get_cached_msi_msg() (Yijing Wang) - Use __read_msi_msg() instead of read_msi_msg() (Yijing Wang) - Use __write_msi_msg() instead of write_msi_msg() (Yijing Wang) Power management - Drop unused runtime PM support code for PCIe ports (Rafael J. Wysocki) - Allow PCI devices to be put into D3cold during system suspend (Rafael J. Wysocki) AER - Add additional AER error strings (Gong Chen) - Make <linux/aer.h> standalone includable (Thierry Reding) Virtualization - Add ACS quirk for Solarflare SFC9120 & SFC9140 (Alex Williamson) - Add ACS quirk for Intel 10G NICs (Alex Williamson) - Add ACS quirk for AMD A88X southbridge (Marti Raudsepp) - Remove unused pci_find_upstream_pcie_bridge(), pci_get_dma_source() (Alex Williamson) - Add device flag helpers (Ethan Zhao) - Assume all Mellanox devices have broken INTx masking (Gavin Shan) Generic host bridge driver - Fix ioport_map() for !CONFIG_GENERIC_IOMAP (Liviu Dudau) - Add pci_register_io_range() and pci_pio_to_address() (Liviu Dudau) - Define PCI_IOBASE as the base of virtual PCI IO space (Liviu Dudau) - Fix the conversion of IO ranges into IO resources (Liviu Dudau) - Add pci_get_new_domain_nr() and of_get_pci_domain_nr() (Liviu Dudau) - Add support for parsing PCI host bridge resources from DT (Liviu Dudau) - Add pci_remap_iospace() to map bus I/O resources (Liviu Dudau) - Add arm64 architectural support for PCI (Liviu Dudau) APM X-Gene - Add APM X-Gene PCIe driver (Tanmay Inamdar) - Add arm64 DT APM X-Gene PCIe device tree nodes (Tanmay Inamdar) Freescale i.MX6 - Probe in module_init(), not fs_initcall() (Lucas Stach) - Delay enabling reference clock for SS until it stabilizes (Tim Harvey) Marvell MVEBU - Fix uninitialized variable in mvebu_get_tgt_attr() (Thomas Petazzoni) NVIDIA Tegra - Make sure the PCIe PLL is really reset (Eric Yuen) - Add error path tegra_msi_teardown_irq() cleanup (Jisheng Zhang) - Fix extended configuration space mapping 
(Peter Daifuku) - Implement resource hierarchy (Thierry Reding) - Clear CLKREQ# enable on port disable (Thierry Reding) - Add Tegra124 support (Thierry Reding) ST Microelectronics SPEAr13xx - Pass config resource through reg property (Pratyush Anand) Synopsys DesignWare - Use NULL instead of false (Fabio Estevam) - Parse bus-range property from devicetree (Lucas Stach) - Use pci_create_root_bus() instead of pci_scan_root_bus() (Lucas Stach) - Remove pci_assign_unassigned_resources() (Lucas Stach) - Check private_data validity in single place (Lucas Stach) - Setup and clear exactly one MSI at a time (Lucas Stach) - Remove open-coded bitmap operations (Lucas Stach) - Fix configuration base address when using 'reg' (Minghuan Lian) - Fix IO resource end address calculation (Minghuan Lian) - Rename get_msi_data() to get_msi_addr() (Minghuan Lian) - Add get_msi_data() to pcie_host_ops (Minghuan Lian) - Add support for v3.65 hardware (Murali Karicheri) - Fold struct pcie_port_info into struct pcie_port (Pratyush Anand) TI Keystone - Add TI Keystone PCIe driver (Murali Karicheri) - Limit MRSS for all downstream devices (Murali Karicheri) - Assume controller is already in RC mode (Murali Karicheri) - Set device ID based on SoC to support multiple ports (Murali Karicheri) Xilinx AXI - Add Xilinx AXI PCIe driver (Srikanth Thokala) - Fix xilinx_pcie_assign_msi() return value test (Dan Carpenter) Miscellaneous - Clean up whitespace (Quentin Lambert) - Remove assignments from "if" conditions (Quentin Lambert) - Move PCI_VENDOR_ID_VMWARE to pci_ids.h (Francesco Ruggeri) - x86: Mark DMI tables as initialization data (Mathias Krause) - x86: Move __init annotation to the correct place (Mathias Krause) - x86: Mark constants of pci_mmcfg_nvidia_mcp55() as __initconst (Mathias Krause) - x86: Constify pci_mmcfg_probes[] array (Mathias Krause) - x86: Mark PCI BIOS initialization code as such (Mathias Krause) - Parenthesize PCI_DEVID and PCI_VPD_LRDT_ID parameters (Megan Kamiya) - Remove 
unnecessary variable in pci_add_dynid() (Tobias Klauser)" * tag 'pci-v3.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (109 commits) arm64: dts: Add APM X-Gene PCIe device tree nodes PCI: Add ACS quirk for AMD A88X southbridge devices PCI: xgene: Add APM X-Gene PCIe driver PCI: designware: Remove open-coded bitmap operations PCI/MSI: Remove unnecessary temporary variable PCI/MSI: Use __write_msi_msg() instead of write_msi_msg() MSI/powerpc: Use __read_msi_msg() instead of read_msi_msg() PCI/MSI: Use __get_cached_msi_msg() instead of get_cached_msi_msg() PCI/MSI: Add "msi_bus" sysfs MSI/MSI-X control for endpoints PCI/MSI: Remove "pos" from the struct msi_desc msi_attrib PCI/MSI: Remove unused kobject from struct msi_desc PCI/MSI: Rename pci_msi_check_device() to pci_msi_supported() PCI/MSI: Move D0 check into pci_msi_check_device() PCI/MSI: Remove arch_msi_check_device() irqchip: armada-370-xp: Remove arch_msi_check_device() PCI/MSI/PPC: Remove arch_msi_check_device() arm64: Add architectural support for PCI PCI: Add pci_remap_iospace() to map bus I/O resources of/pci: Add support for parsing PCI host bridge resources from DT of/pci: Add pci_get_new_domain_nr() and of_get_pci_domain_nr() ... Conflicts: arch/arm64/boot/dts/apm-storm.dtsi
2014-10-09tracing: Clean up scheduling in trace_wakeup_test_thread()Steven Rostedt
Peter's new debugging tool triggers when tasks exit with !TASK_RUNNING. The code in trace_wakeup_test_thread() also has a single schedule() call that should be encompassed by a loop. This cleans up the code a little to make it a bit more robust and also makes the return exit properly with TASK_RUNNING. Link: http://lkml.kernel.org/p/20141008135216.76142204@gandalf.local.home Reported-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra <peterz@infreadead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-10-09Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull irq updates from Thomas Gleixner: "The irq department delivers: - a cleanup series to get rid of mindlessly copied code. - another bunch of new pointlessly different interrupt chip drivers. Adding homebrewn irq chips (and timers) to SoCs must provide a value add which is beyond the imagination of mere mortals. - the usual SoC irq controller updates, IOW my second cat herding project" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) irqchip: gic-v3: Implement CPU PM notifier irqchip: gic-v3: Refactor gic_enable_redist to support both enabling and disabling irqchip: renesas-intc-irqpin: Add minimal runtime PM support irqchip: renesas-intc-irqpin: Add helper variable dev = &pdev->dev irqchip: atmel-aic5: Add sama5d4 support irqchip: atmel-aic5: The sama5d3 has 48 IRQs Documentation: bcm7120-l2: Add Broadcom BCM7120-style L2 binding irqchip: bcm7120-l2: Add Broadcom BCM7120-style Level 2 interrupt controller irqchip: renesas-irqc: Add binding docs for new R-Car Gen2 SoCs irqchip: renesas-irqc: Add DT binding documentation irqchip: renesas-intc-irqpin: Document SoC-specific bindings openrisc: Get rid of handle_IRQ arm64: Get rid of handle_IRQ ARM: omap2: irq: Convert to handle_domain_irq ARM: imx: tzic: Convert to handle_domain_irq ARM: imx: avic: Convert to handle_domain_irq irqchip: or1k-pic: Convert to handle_domain_irq irqchip: atmel-aic5: Convert to handle_domain_irq irqchip: atmel-aic: Convert to handle_domain_irq irqchip: gic-v3: Convert to handle_domain_irq ...
2014-10-09Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull timer updates from Thomas Gleixner: "Nothing really exciting this time: - a few fixlets in the NOHZ code - a new ARM SoC timer abomination. One should expect that we have enough of them already, but they insist on inventing new ones. - the usual bunch of ARM SoC timer updates. That feels like herding cats" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: arm_arch_timer: Consolidate arch_timer_evtstrm_enable clocksource: arm_arch_timer: Enable counter access for 32-bit ARM clocksource: arm_arch_timer: Change clocksource name if CP15 unavailable clocksource: sirf: Disable counter before re-setting it clocksource: cadence_ttc: Add support for 32bit mode clocksource: tcb_clksrc: Sanitize IRQ request clocksource: arm_arch_timer: Discard unavailable timers correctly clocksource: vf_pit_timer: Support shutdown mode ARM: meson6: clocksource: Add Meson6 timer support ARM: meson: documentation: Add timer documentation clocksource: sh_tmu: Document r8a7779 binding clocksource: sh_mtu2: Document r7s72100 binding clocksource: sh_cmt: Document SoC specific bindings timerfd: Remove an always true check nohz: Avoid tick's double reprogramming in highres mode nohz: Fix spurious periodic tick behaviour in low-res dynticks mode
2014-10-09Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull timer fixes from Ingo Molnar: "Main changes: - Fix the deadlock reported by Dave Jones et al - Clean up and fix nohz_full interaction with arch abilities - nohz init code consolidation/cleanup" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: nohz: nohz full depends on irq work self IPI support nohz: Consolidate nohz full init code arm64: Tell irq work about self IPI support arm: Tell irq work about self IPI support x86: Tell irq work about self IPI support irq_work: Force raised irq work to run on irq work interrupt irq_work: Introduce arch_irq_work_has_interrupt() nohz: Move nohz full init call to tick init
2014-10-09s390/nohz: use a per-cpu flag for arch_needs_cpuMartin Schwidefsky
Move the nohz_delay bit from the s390_idle data structure to the per-cpu flags. Clear the nohz delay flag in __cpu_disable and remove the cpu hotplug notifier that used to do this. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcuIngo Molnar
Pull additional commits for locktorture, from Paul E. McKenney. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-09switch /dev/kmsg to ->write_iter()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Most notable changes in here: 1) By far the biggest accomplishment, thanks to a large range of contributors, is the addition of multi-send for transmit. This is the result of discussions back in Chicago, and the hard work of several individuals. Now, when the ->ndo_start_xmit() method of a driver sees skb->xmit_more as true, it can choose to defer the doorbell telling the driver to start processing the new TX queue entries. skb->xmit_more means that the generic networking is guaranteed to call the driver immediately with another SKB to send. There is logic added to the qdisc layer to dequeue multiple packets at a time, and the handling of mis-predicted offloads in software is now done with no locks held. Finally, pktgen is extended to have a "burst" parameter that can be used to test a multi-send implementation. Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4, virtio_net Adding support is almost trivial, so expect more drivers to support this optimization soon. I want to thank, in no particular or implied order, Jesper Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann, David Tat, Hannes Frederic Sowa, and Rusty Russell. 2) PTP and timestamping support in bnx2x, from Michal Kalderon. 3) Allow adjusting the rx_copybreak threshold for a driver via ethtool, and add rx_copybreak support to enic driver. From Govindarajulu Varadarajan. 4) Significant enhancements to the generic PHY layer and the bcm7xxx driver in particular (EEE support, auto power down, etc.) from Florian Fainelli. 5) Allow raw buffers to be used for flow dissection, allowing drivers to determine the optimal "linear pull" size for devices that DMA into pools of pages. The objective is to get exactly the necessary amount of headers into the linear SKB area pre-pulled, but no more. The new interface drivers use is eth_get_headlen(). 
From WANG Cong, with driver conversions (several had their own by-hand duplicated implementations) by Alexander Duyck and Eric Dumazet. 6) Support checksumming more smoothly and efficiently for encapsulations, and add "foo over UDP" facility. From Tom Herbert. 7) Add Broadcom SF2 switch driver to DSA layer, from Florian Fainelli. 8) eBPF now can load programs via a system call and has an extensive testsuite. Alexei Starovoitov and Daniel Borkmann. 9) Major overhaul of the packet scheduler to use RCU in several major areas such as the classifiers and rate estimators. From John Fastabend. 10) Add driver for Intel FM10000 Ethernet Switch, from Alexander Duyck. 11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric Dumazet. 12) Add Datacenter TCP congestion control algorithm support, From Florian Westphal. 13) Reorganize sk_buff so that __copy_skb_header() is significantly faster. From Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits) netlabel: directly return netlbl_unlabel_genl_init() net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers net: description of dma_cookie cause make xmldocs warning cxgb4: clean up a type issue cxgb4: potential shift wrapping bug i40e: skb->xmit_more support net: fs_enet: Add NAPI TX net: fs_enet: Remove non NAPI RX r8169:add support for RTL8168EP net_sched: copy exts->type in tcf_exts_change() wimax: convert printk to pr_foo() af_unix: remove 0 assignment on static ipv6: Do not warn for informational ICMP messages, regardless of type. Update Intel Ethernet Driver maintainers list bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING tipc: fix bug in multicast congestion handling net: better IFF_XMIT_DST_RELEASE support net/mlx4_en: remove NETDEV_TX_BUSY 3c59x: fix bad split of cpu_to_le32(pci_map_single()) net: bcmgenet: fix Tx ring priority programming ...
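Item 1 of the networking summary above describes how a driver may defer its doorbell while skb->xmit_more is set. The idea can be sketched in userspace as follows. This is an illustration only: the demo_* structures and counters are hypothetical and do not mirror struct sk_buff or any real driver, and real drivers also need to handle cases this sketch ignores, such as a full or stopped queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for an skb. */
struct demo_skb {
    int len;
    bool xmit_more; /* true: the stack will hand over another packet right away */
};

/* Hypothetical TX queue that counts posted descriptors and doorbell writes. */
struct demo_txq {
    int pending;   /* descriptors posted but not yet announced to hardware */
    int doorbells; /* number of (expensive) doorbell writes */
    int sent;      /* descriptors the hardware has been told about */
};

/* Sketch of a transmit routine that batches doorbell writes. */
void demo_start_xmit(struct demo_txq *q, const struct demo_skb *skb)
{
    /* Always post the descriptor for this packet. */
    q->pending++;

    /* More packets are guaranteed to follow: defer the doorbell. */
    if (skb->xmit_more)
        return;

    /* Last packet of the burst: one doorbell covers everything pending. */
    q->doorbells++;
    q->sent += q->pending;
    q->pending = 0;
}
```

In this model a burst of three packets with xmit_more set on the first two results in a single doorbell write instead of three, which is the saving the multi-send work is aiming for.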
2014-10-08tracing: Robustify wait loopPeter Zijlstra
The pending nested sleep debugging triggered on the potential stale TASK_INTERRUPTIBLE in this code. While there, fix the loop such that we won't revert to a while(1) yield() 'spin' loop if we ever get a spurious wakeup. And fix the actual issue by properly terminating the 'wait' loop by setting TASK_RUNNING. Link: http://lkml.kernel.org/p/20141008165110.GA14547@worktop.programming.kicks-ass.net Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
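The shape of a robust wait loop, as described in this commit, is: mark the task as sleeping, re-check the exit condition after every wakeup (so a spurious wakeup just goes around the loop again), and restore the running state on the way out. The following is a userspace model of that shape under stated assumptions. The demo_* names are hypothetical, the state enum stands in for TASK_INTERRUPTIBLE and TASK_RUNNING, and incrementing a counter stands in for schedule() returning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TASK_RUNNING and TASK_INTERRUPTIBLE. */
enum demo_state { DEMO_RUNNING, DEMO_INTERRUPTIBLE };

struct demo_waiter {
    enum demo_state state;
    int wakeups_seen; /* how many times the modelled schedule() returned */
};

typedef bool (*demo_cond_fn)(void *arg);

/* Wait until should_stop() is true, tolerating spurious wakeups. */
void demo_wait(struct demo_waiter *w, demo_cond_fn should_stop, void *arg)
{
    for (;;) {
        /* Set the sleeping state before testing the condition. */
        w->state = DEMO_INTERRUPTIBLE;
        if (should_stop(arg))
            break;
        /* Models schedule(): we were woken, possibly spuriously. */
        w->wakeups_seen++;
    }
    /* Terminate the wait properly: never leave with a stale state. */
    w->state = DEMO_RUNNING;
}

/* Example condition: becomes true after a given number of checks. */
bool demo_stop_after(void *arg)
{
    int *left = arg;

    if (*left == 0)
        return true;
    (*left)--;
    return false;
}
```

Each wakeup that does not satisfy the condition sends the loop back to sleep rather than falling through, and the state is always reset to running before the function returns.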
2014-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2014-10-08Merge tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - fix an NFSv4.1 state renewal regression - fix open/lock state recovery error handling - fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails - fix statd when reconnection fails - don't wake tasks during connection abort - don't start reboot recovery if lease check fails - fix duplicate proc entries Features: - pNFS block driver fixes and clean ups from Christoph - More code cleanups from Anna - Improve mmap() writeback performance - Replace use of PF_TRANS with a more generic mechanism for avoiding deadlocks in nfs_release_page" * tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (66 commits) NFSv4.1: Fix an NFSv4.1 state renewal regression NFSv4: fix open/lock state recovery error handling NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails NFS: Fabricate fscache server index key correctly SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT NFSv3: Fix missing includes of nfs3_fs.h NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() NFS: avoid waiting at all in nfs_release_page when congested. NFS: avoid deadlocks with loop-back mounted NFS filesystems. MM: export page_wakeup functions SCHED: add some "wait..on_bit...timeout()" interfaces. NFS: don't use STABLE writes during writeback. NFSv4: use exponential retry on NFS4ERR_DELAY for async requests. rpc: Add -EPERM processing for xs_udp_send_request() rpc: return sent and err from xs_sendpages() lockd: Try to reconnect if statd has moved SUNRPC: Don't wake tasks during connection abort Fixing lease renewal nfs: fix duplicate proc entries pnfs/blocklayout: Fix a 64-bit division/remainder issue in bl_map_stripe ...
2014-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto update from Herbert Xu: - add multibuffer infrastructure (single_task_running scheduler helper, OKed by Peter on lkml. - add SHA1 multibuffer implementation for AVX2. - reenable "by8" AVX CTR optimisation after fixing counter overflow. - add APM X-Gene SoC RNG support. - SHA256/SHA512 now handles unaligned input correctly. - set lz4 decompressed length correctly. - fix algif socket buffer allocation failure for 64K page machines. - misc fixes * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (47 commits) crypto: sha - Handle unaligned input data in generic sha256 and sha512. Revert "crypto: aesni - disable "by8" AVX CTR optimization" crypto: aesni - remove unused defines in "by8" variant crypto: aesni - fix counter overflow handling in "by8" variant hwrng: printk replacement crypto: qat - Removed unneeded partial state crypto: qat - Fix typo in name of tasklet_struct crypto: caam - Dynamic allocation of addresses for various memory blocks in CAAM. crypto: mcryptd - Fix typos in CRYPTO_MCRYPTD description crypto: algif - avoid excessive use of socket buffer in skcipher arm64: dts: add random number generator dts node to APM X-Gene platform. Documentation: rng: Add X-Gene SoC RNG driver documentation hwrng: xgene - add support for APM X-Gene SoC RNG support crypto: mv_cesa - Add missing #define crypto: testmgr - add test for lz4 and lz4hc crypto: lz4,lz4hc - fix decompression crypto: qat - Use pci_enable_msix_exact() instead of pci_enable_msix() crypto: drbg - fix maximum value checks on 32 bit systems crypto: drbg - fix sparse warning for cpu_to_be[32|64] crypto: sha-mb - sha1_mb_alg_state can be static ...
2014-10-08Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - eBPF JIT compiler for arm64 - CPU suspend backend for PSCI (firmware interface) with standard idle states defined in DT (generic idle driver to be merged via a different tree) - Support for CONFIG_DEBUG_SET_MODULE_RONX - Support for unmapped cpu-release-addr (outside kernel linear mapping) - set_arch_dma_coherent_ops() implemented and bus notifiers removed - EFI_STUB improvements when base of DRAM is occupied - Typos in KGDB macros - Clean-up to (partially) allow kernel building with LLVM - Other clean-ups (extern keyword, phys_addr_t usage) * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (51 commits) arm64: Remove unneeded extern keyword ARM64: make of_device_ids const arm64: Use phys_addr_t type for physical address aarch64: filter $x from kallsyms arm64: Use DMA_ERROR_CODE to denote failed allocation arm64: Fix typos in KGDB macros arm64: insn: Add return statements after BUG_ON() arm64: debug: don't re-enable debug exceptions on return from el1_dbg Revert "arm64: dmi: Add SMBIOS/DMI support" arm64: Implement set_arch_dma_coherent_ops() to replace bus notifiers of: amba: use of_dma_configure for AMBA devices arm64: dmi: Add SMBIOS/DMI support arm64: Correct ftrace calls to aarch64_insn_gen_branch_imm() arm64:mm: initialize max_mapnr using function set_max_mapnr setup: Move unmask of async interrupts after possible earlycon setup arm64: LLVMLinux: Fix inline arm64 assembly for use with clang arm64: pageattr: Correctly adjust unaligned start addresses net: bpf: arm64: fix module memory leak when JIT image build fails arm64: add PSCI CPU_SUSPEND based cpu_suspend support arm64: kernel: introduce cpu_init_idle CPU operation ...
2014-10-08Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM updates from Russell King: "Included in these updates are: - Performance optimisation to avoid writing the control register at every exception. - Use static inline instead of extern inline in ftrace code. - Crypto ARM assembly updates for big endian - Alignment of initrd/.init memory to page sizes when freeing to ensure that we fully free the regions - Add gcov support - A couple of preparatory patches for VDSO support: use _install_special_mapping, and randomize the sigpage placement above stack. - Add L2 ePAPR DT cache properties so that DT can specify the cache geometry. - Preparatory patch for FIQ (NMI) kernel C code for things like spinlock lockup debug. Following on from this are a couple of my patches cleaning up show_regs() and removing an unused (probably since 1.x days) do_unexp_fiq() function. - Use pr_warn() rather than pr_warning(). - A number of cleanups (smp, footbridge, return_address)" * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (21 commits) ARM: 8167/1: extend the reserved memory for initrd to be page aligned ARM: 8168/1: extend __init_end to a page align address ARM: 8169/1: l2c: parse cache properties from ePAPR definitions ARM: 8160/1: drop warning about return_address not using unwind tables ARM: 8161/1: footbridge: select machine dir based on ARCH_FOOTBRIDGE ARM: 8158/1: LLVMLinux: use static inline in ARM ftrace.h ARM: 8155/1: place sigpage at a random offset above stack ARM: 8154/1: use _install_special_mapping for sigpage ARM: 8153/1: Enable gcov support on the ARM architecture ARM: Avoid writing to control register on every exception ARM: 8152/1: Convert pr_warning to pr_warn ARM: remove unused do_unexp_fiq() function ARM: remove extraneous newline in show_regs() ARM: 8150/3: fiq: Replace default FIQ handler ARM: 8140/1: ep93xx: Enable DEBUG_LL_UART_PL01X ARM: 8139/1: versatile: Enable DEBUG_LL_UART_PL01X ARM: 8138/1: drop ISAR0 workaround for B15 ARM: 8136/1: sa1100: add Micro ASIC platform device ARM: 8131/1: 
arm/smp: Absorb boot_secondary() ARM: 8126/1: crypto: enable NEON SHA-384/SHA-512 for big endian ...
2014-10-07Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivialLinus Torvalds
Pull "trivial tree" updates from Jiri Kosina: "Usual pile from trivial tree everyone is so eagerly waiting for" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Remove MN10300_PROC_MN2WS0038 mei: fix comments treewide: Fix typos in Kconfig kprobes: update jprobe_example.c for do_fork() change Documentation: change "&" to "and" in Documentation/applying-patches.txt Documentation: remove obsolete pcmcia-cs from Changes Documentation: update links in Changes Documentation: Docbook: Fix generated DocBook/kernel-api.xml score: Remove GENERIC_HAS_IOMAP gpio: fix 'CONFIG_GPIO_IRQCHIP' comments tty: doc: Fix grammar in serial/tty dma-debug: modify check_for_stack output treewide: fix errors in printk genirq: fix reference in devm_request_threaded_irq comment treewide: fix synchronize_rcu() in comments checkstack.pl: port to AArch64 doc: queue-sysfs: minor fixes init/do_mounts: better syntax description MIPS: fix comment spelling powerpc/simpleboot: fix comment ...
2014-10-07Merge tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengineLinus Torvalds
Pull dmaengine updates from Dan Williams: "Even though this has fixes marked for -stable, given the size and the needed conflict resolutions this is 3.18-rc1/merge-window material. These patches have been languishing in my tree for a long while. The fact that I do not have the time to do proper/prompt maintenance of this tree is a primary factor in the decision to step down as dmaengine maintainer. That and the fact that the bulk of drivers/dma/ activity is going through Vinod these days. The net_dma removal has not been in -next. It has developed simple conflicts against mainline and net-next (for-3.18). Continuing thanks to Vinod for staying on top of drivers/dma/. Summary: 1/ Step down as dmaengine maintainer see commit 08223d80df38 "dmaengine maintainer update" 2/ Removal of net_dma, as it has been marked 'broken' since 3.13 (commit 77873803363c "net_dma: mark broken"), without reports of performance regression. 3/ Miscellaneous fixes" * tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine: net: make tcp_cleanup_rbuf private net_dma: revert 'copied_early' net_dma: simple removal dmaengine maintainer update dmatest: prevent memory leakage on error path in thread ioat: Use time_before_jiffies() dmaengine: fix xor sources continuation dma: mv_xor: Rename __mv_xor_slot_cleanup() to mv_xor_slot_cleanup() dma: mv_xor: Remove all callers of mv_xor_slot_cleanup() dma: mv_xor: Remove unneeded mv_xor_clean_completed_slots() call ioat: Use pci_enable_msix_exact() instead of pci_enable_msix() drivers: dma: Include appropriate header file in dca.c drivers: dma: Mark functions as static in dma_v3.c dma: mv_xor: Add DMA API error checks ioat/dca: Use dev_is_pci() to check whether it is pci device
2014-10-07Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linuxLinus Torvalds
Pull module update from Rusty Russell: "Nothing major: support for compressing modules, and auto-tainting params. PS. My virtio-next tree is empty: DaveM took the patches I had. There might be a virtio-rng starvation fix, but so far it's a bit voodoo so I will get to that in the next two days or it will wait" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: moduleparam: Resolve missing-field-initializer warning kbuild: handle module compression while running 'make modules_install'. modinst: wrap long lines in order to enhance cmd_modules_install modsign: lookup lines ending in .ko in .mod files modpost: simplify file name generation of *.mod.c files modpost: reduce visibility of symbols and constify r/o arrays param: check for tainting before calling set op. drm/i915: taint the kernel if unsafe module parameters are set module: add module_param_unsafe and module_param_named_unsafe module: make it possible to have unsafe, tainting module params module: rename KERNEL_PARAM_FL_NOARG to avoid confusion
2014-10-07Merge tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linuxLinus Torvalds
Pull "tinification" patches from Josh Triplett. Work on making smaller kernels. * tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux: bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_ mm: Support compiling out madvise and fadvise x86: Support compiling out human-friendly processor feature names x86: Drop support for /proc files when !CONFIG_PROC_FS x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE x86, boot: Use the usual -y -n mechanism for objects in vmlinux x86: Add "make tinyconfig" to configure the tiniest possible kernel x86, platform, kconfig: move kvmconfig functionality to a helper
2014-10-07Merge branch 'pm-domains'Rafael J. Wysocki
* pm-domains: (32 commits) PM / Domains: Rename cpu_data to cpuidle_data PM / Domains: Move dev_pm_domain_attach|detach() to pm_domain.h PM / Domains: Remove legacy API for adding devices through DT PM / Domains: Add genpd attach/detach callbacks PM / Domains: add debugfs listing of struct generic_pm_domain-s ACPI / PM: Convert acpi_dev_pm_detach() into a static function ARM: exynos: Move to generic PM domain DT bindings amba: Add support for attach/detach of PM domains spi: core: Convert to dev_pm_domain_attach|detach() mmc: sdio: Convert to dev_pm_domain_attach|detach() i2c: core: Convert to dev_pm_domain_attach|detach() drivercore / platform: Convert to dev_pm_domain_attach|detach() PM / Domains: Add APIs to attach/detach a PM domain for a device PM / Domains: Add generic OF-based PM domain look-up ACPI / PM: Assign the ->detach() callback when attaching the PM domain PM / Domains: Add a detach callback to the struct dev_pm_domain PM / domains: Spelling s/domian/domain/ PM / domains: Keep declaration of dev_power_governors together PM / domains: Remove default_stop_ok() API drivers: sh: Leave disabling of unused PM domains to genpd ...
2014-10-07Merge branch 'acpi-pm'Rafael J. Wysocki
* acpi-pm: ACPI / sleep: Rework the handling of ACPI GPE wakeup from suspend-to-idle PM / sleep: Rename platform suspend/resume functions in suspend.c PM / sleep: Export dpm_suspend_late/noirq() and dpm_resume_early/noirq()
2014-10-07Merge branch 'pm-sleep'Rafael J. Wysocki
* pm-sleep: PM / hibernate: Iterate over set bits instead of PFNs in swsusp_free() PM / sleep: new suspend_resume trace event for console resume PM / sleep: Update test_suspend option documentation PM / sleep: Enhance test_suspend option with repeat capability PM / sleep: Support freeze as test_suspend option PM / sysfs: avoid shadowing variables
2014-10-07Merge branch 'pm-genirq'Rafael J. Wysocki
* pm-genirq: PM / genirq: Document rules related to system suspend and interrupts PCI / PM: Make PCIe PME interrupts wake up from suspend-to-idle x86 / PM: Set IRQCHIP_SKIP_SET_WAKE for IOAPIC IRQ chip objects genirq: Simplify wakeup mechanism genirq: Mark wakeup sources as armed on suspend genirq: Create helper for flow handler entry check genirq: Distangle edge handler entry genirq: Avoid double loop on suspend genirq: Move MASK_ON_SUSPEND handling into suspend_device_irqs() genirq: Make use of pm misfeature accounting genirq: Add sanity checks for PM options on shared interrupt lines genirq: Move suspend/resume logic into irq/pm code PM / sleep: Mechanism for aborting system suspends unconditionally
2014-10-06workqueue: Use cond_resched_rcu_qs macroJoe Lawrence
Tidy up and use cond_resched_rcu_qs when calling cond_resched and reporting potential quiescent state to RCU. Splitting this change in this way allows easy backporting to -stable for kernel versions not having cond_resched_rcu_qs(). Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-10-06workqueue: Add quiescent state between work itemsJoe Lawrence
Similar to the stop_machine deadlock scenario on !PREEMPT kernels addressed in b22ce2785d97 "workqueue: cond_resched() after processing each work item", kworker threads requeueing back-to-back with zero jiffy delay can stall RCU. The cond_resched call introduced in that fix will yield only iff there are other higher priority tasks to run, so force a quiescent RCU state between work items. Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com> Link: https://lkml.kernel.org/r/20140926105227.01325697@jlaw-desktop.mno.stratus.com Link: https://lkml.kernel.org/r/20140929115445.40221d8e@jlaw-desktop.mno.stratus.com Fixes: b22ce2785d97 ("workqueue: cond_resched() after processing each work item") Cc: <stable@vger.kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-10-03Merge tag 'trace-fixes-v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds
Pull trace ring buffer iterator fix from Steven Rostedt: "While testing some new changes for 3.18, I kept hitting a bug every so often in the ring buffer. At first I thought it had to do with some of the changes I was working on, but then testing something else I realized that the bug was in 3.17 itself. I ran several bisects as the bug was not very reproducible, and finally came up with the commit that I could reproduce easily within a few minutes, and without the change I could run the tests over an hour without issue. The change fit the bug and I figured out a fix. That bad commit was: Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page" This commit fixed a bug, but in the process created another one. It used the wrong value as the cached value that is used to see if things changed while an iterator was in use. This made it look like a change always happened, and could cause the iterator to go into an infinite loop" * tag 'trace-fixes-v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Fix infinite spin in reading buffer
2014-10-03locking/lockdep: Revert qrwlock recusive stuffPeter Zijlstra
Commit f0bab73cb539 ("locking/lockdep: Restrict the use of recursive read_lock() with qrwlock") changed lockdep to try and conform to the qrwlock semantics which differ from the traditional rwlock semantics. In particular qrwlock is fair outside of interrupt context, but in interrupt context readers will ignore all fairness. The problem modeling this is that read and write side have different lock state (interrupts) semantics but we only have a single representation of these. Therefore lockdep will get confused, thinking the lock can cause interrupt lock inversions. So revert it for now; the old rwlock semantics were already imperfectly modeled and the qrwlock extra won't fit either. If we want to properly fix this, I think we need to resurrect the work Gautham did a few years ago that split the read and write state of locks: http://lwn.net/Articles/332801/ FWIW the locking selftest that would've failed (and was reported by Borislav earlier) is something like: RL(X1); /* IRQ-ON */ LOCK(A); UNLOCK(A); RU(X1); IRQ_ENTER(); RL(X1); /* IN-IRQ */ RU(X1); IRQ_EXIT(); At which point it would report that because A is an IRQ-unsafe lock we can suffer the following inversion: CPU0 CPU1 lock(A) lock(X1) lock(A) <IRQ> lock(X1) And this is 'wrong' because X1 can recurse (assuming the above locks are in fact read-locks) but lockdep doesn't know about this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Waiman Long <Waiman.Long@hp.com> Cc: ego@linux.vnet.ibm.com Cc: bp@alien8.de Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20140930132600.GA7444@worktop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03locking/rwsem: Avoid double checking before try acquiring write lockJason Low
Commit 9b0fc9c09f1b ("rwsem: skip initial trylock in rwsem_down_write_failed") checks whether there are known active lockers in order to avoid write trylocking using expensive cmpxchg() when it likely wouldn't get the lock. However, a subsequent patch was added such that we directly check for sem->count == RWSEM_WAITING_BIAS right before trying that cmpxchg(). Thus, commit 9b0fc9c09f1b now just adds overhead. This patch modifies it so that we only check whether count == RWSEM_WAITING_BIAS. Also, add a comment on why we do an "extra check" of count before the cmpxchg(). Signed-off-by: Jason Low <jason.low2@hp.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Chegu Vinod <chegu_vinod@hp.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1410913017.2447.22.camel@j-VirtualBox Signed-off-by: Ingo Molnar <mingo@kernel.org>
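The "extra check" of count before the cmpxchg() can be illustrated in user space. This is only a sketch using C11 atomics, not the kernel's rwsem code: the type `sketch_rwsem`, the function `sketch_try_write_lock()` and the two bias values are hypothetical stand-ins chosen for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the rwsem count encoding; the values are
 * illustrative and are not taken from the kernel headers. */
#define SKETCH_WAITING_BIAS      (-65536L)
#define SKETCH_ACTIVE_WRITE_BIAS (-65535L)

struct sketch_rwsem {
	atomic_long count;
};

/* Try to take the write lock. A plain load filters out the cases where
 * the compare-and-exchange could not succeed anyway. */
static bool sketch_try_write_lock(struct sketch_rwsem *sem)
{
	long expected = SKETCH_WAITING_BIAS;

	/* Cheap read first: skip the expensive atomic read-modify-write
	 * when the count already shows the lock cannot be taken. */
	if (atomic_load_explicit(&sem->count, memory_order_relaxed) !=
	    SKETCH_WAITING_BIAS)
		return false;

	/* Only now attempt the atomic swap; it can still lose a race. */
	return atomic_compare_exchange_strong(&sem->count, &expected,
					      SKETCH_ACTIVE_WRITE_BIAS);
}
```

The load does not replace the compare-and-exchange; it only avoids issuing one when failure is already visible.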
2014-10-03sched/dl: Use dl_bw_of() under rcu_read_lock_sched()Kirill Tkhai
rq->rd is freed using call_rcu_sched(), so rcu_read_lock() to access it is not enough. We should use either rcu_read_lock_sched() or preempt_disable(). Reported-by: Sasha Levin <sasha.levin@oracle.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Fixes: 66339c31bc39 "sched: Use dl_bw_of() under RCU read lock" Link: http://lkml.kernel.org/r/1412065417.20287.24.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03sched/fair: Delete resched_cpu() from idle_balance()Kirill Tkhai
We already reschedule env.dst_cpu in attach_tasks()->check_preempt_curr() if this is necessary. Furthermore, a higher priority class task may be current on dest rq, we shouldn't disturb it. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Cc: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140930210441.5258.55054.stgit@localhost Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03sched, time: Fix build error with 64 bit cputime_t on 32 bit systemsRik van Riel
On 32 bit systems cmpxchg cannot handle 64 bit values, so some additional magic is required to allow a 32 bit system with CONFIG_VIRT_CPU_ACCOUNTING_GEN=y enabled to build. Make sure the correct cmpxchg function is used when doing an atomic swap of a cputime_t. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: umgwanakikbuti@gmail.com Cc: fweisbec@gmail.com Cc: srao@redhat.com Cc: lwoodman@redhat.com Cc: atheurer@redhat.com Cc: oleg@redhat.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: linux390@de.ibm.com Cc: linux-arch@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Link: http://lkml.kernel.org/r/20140930155947.070cdb1f@annuminas.surriel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
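As a rough user-space analogue of "use the correct cmpxchg for a 64-bit cputime_t", the sketch below wraps a 64-bit compare-and-exchange behind one helper. It uses C11 atomics rather than the kernel's cmpxchg primitives, and the names `sketch_cputime_t` and `sketch_cmpxchg_cputime()` are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit cputime type, wider than a 32-bit machine word. */
typedef uint64_t sketch_cputime_t;

/* Atomically replace *ptr with new_val if it currently equals old.
 * Returns the value observed before the operation, mirroring the
 * return convention of the kernel's cmpxchg family. */
static sketch_cputime_t sketch_cmpxchg_cputime(_Atomic sketch_cputime_t *ptr,
					       sketch_cputime_t old,
					       sketch_cputime_t new_val)
{
	sketch_cputime_t expected = old;

	/* On failure, expected is overwritten with the current value;
	 * on success it still holds old, which was the previous value. */
	atomic_compare_exchange_strong(ptr, &expected, new_val);
	return expected;
}
```

Callers go through one helper that is always 64 bits wide, so they never pick a word-sized primitive by accident.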
2014-10-03sched: Improve sysbench performance by fixing spurious active migrationVincent Guittot
Since commit caeb178c60f4 ("sched/fair: Make update_sd_pick_busiest() ...") sd_pick_busiest returns a group that can be neither imbalanced nor overloaded but is only more loaded than others. This change has been introduced to ensure a better load balance in systems that are not overloaded but as a side effect, it can also generate useless active migrations between groups. Let's take the example of 3 tasks on a quad core system. We will always have an idle core so the load balance will find a busiest group (core) whenever an ILB is triggered and it will force an active migration (once above nr_balance_failed threshold) so the idle core becomes busy but another core will become idle. With the next ILB, the freshly idle core will try to pull the task of a busy CPU. The number of spurious active migrations is not so huge in a quad core system because the ILB is not triggered so much. But it becomes significant as soon as you have more than one sched_domain level like on a dual cluster of quad cores where the ILB is triggered every tick when you have more than 1 busy_cpu. We need to ensure that the migration generates a real improvement and will not only move the avg_load imbalance on another CPU. Before caeb178c60f4f93f1b45c0bc056b5cf6d217b67f, the filtering of such use case was ensured by the following test in f_b_g: if ((local->idle_cpus < busiest->idle_cpus) && busiest->sum_nr_running <= busiest->group_weight) This patch modifies the condition to take into account situations where the busiest group is not overloaded: If the diff between the number of idle cpus in 2 groups is less than or equal to 1 and the busiest group is not overloaded, moving a task will not improve the load balance but just move it. A test with sysbench on a dual cluster of quad cores gives the following results: command: sysbench --test=cpu --num-threads=5 --max-time=5 run The HZ is 200 which means that 1000 ticks have fired during the test. 
With Mainline, perf gives the following figures: Samples: 727 of event 'sched:sched_migrate_task' Event count (approx.): 727 Overhead Command Shared Object Symbol ........ ............... ............. .............. 12.52% migration/1 [unknown] [.] 00000000 12.52% migration/5 [unknown] [.] 00000000 12.52% migration/7 [unknown] [.] 00000000 12.10% migration/6 [unknown] [.] 00000000 11.83% migration/0 [unknown] [.] 00000000 11.83% migration/3 [unknown] [.] 00000000 11.14% migration/4 [unknown] [.] 00000000 10.87% migration/2 [unknown] [.] 00000000 2.75% sysbench [unknown] [.] 00000000 0.83% swapper [unknown] [.] 00000000 0.55% ktps65090charge [unknown] [.] 00000000 0.41% mmcqd/1 [unknown] [.] 00000000 0.14% perf [unknown] [.] 00000000 With this patch, perf gives the following figures Samples: 20 of event 'sched:sched_migrate_task' Event count (approx.): 20 Overhead Command Shared Object Symbol ........ ............... ............. .............. 80.00% sysbench [unknown] [.] 00000000 10.00% swapper [unknown] [.] 00000000 5.00% ktps65090charge [unknown] [.] 00000000 5.00% migration/1 [unknown] [.] 00000000 Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1412170735-5356-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
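The filtering rule stated in the commit message (idle-CPU difference of at most 1 and a busiest group that is not overloaded) can be written as a small predicate. This is a sketch of that sentence only, not the code in find_busiest_group(): the struct `sketch_group_stats` and the function `sketch_migration_is_useless()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical per-group statistics; field names are illustrative. */
struct sketch_group_stats {
	unsigned int idle_cpus;  /* number of idle CPUs in the group */
	bool overloaded;         /* more runnable tasks than capacity */
};

/* Returns true when pulling a task from busiest to local would only
 * move the imbalance around instead of reducing it. */
static bool sketch_migration_is_useless(const struct sketch_group_stats *local,
					const struct sketch_group_stats *busiest)
{
	return !busiest->overloaded &&
	       local->idle_cpus <= busiest->idle_cpus + 1;
}
```

In the 3-tasks-on-4-cores example, the idle core has one idle CPU and a busy core has none, so the predicate holds and the migration is skipped. The tick count quoted in the message follows from 200 ticks per second over the 5 second run, which is 1000 ticks.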
2014-10-03perf: Fix perf bug in fork()Peter Zijlstra
Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by calling perf_event_free_task() when failing sched_fork() we will not yet have done the memset() on ->perf_event_ctxp[] and will therefore try and 'free' the inherited contexts, which are still in use by the parent process. This is bad and might explain some outstanding fuzzer failures ... Suggested-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daeseok Youn <daeseok.youn@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20140929101201.GE5430@worktop Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03perf: Fix unclone_ctx() vs. lockingPeter Zijlstra
The idiot who did 4a1c0f262f88 ("perf: Fix lockdep warning on process exit") forgot to pay attention and fix all similar cases. Do so now. In particular, unclone_ctx() must be called while holding ctx->lock, therefore all such sites are broken for the same reason. Pull the put_ctx() call out from under ctx->lock. Reported-by: Sasha Levin <sasha.levin@oracle.com> Probably-also-reported-by: Vince Weaver <vincent.weaver@maine.edu> Fixes: 4a1c0f262f88 ("perf: Fix lockdep warning on process exit") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Cong Wang <cwang@twopensource.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140930172308.GI4241@worktop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-02perf: fix perf bug in fork()Peter Zijlstra
Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by calling perf_event_free_task() when failing sched_fork() we will not yet have done the memset() on ->perf_event_ctxp[] and will therefore try and 'free' the inherited contexts, which are still in use by the parent process. This is bad.. Suggested-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-02ring-buffer: Fix infinite spin in reading bufferSteven Rostedt (Red Hat)
Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page" fixed one bug but in the process caused another one. The reset is to update the header page, but that fix also changed the way the cached reads were updated. The cache reads are used to test if an iterator needs to be updated or not. A ring buffer iterator, when created, disables writes to the ring buffer but does not stop other readers or consuming reads from happening. Although all readers are synchronized via a lock, they are only synchronized when in the ring buffer functions. Those functions may be called by any number of readers. The iterator continues down when it's not interrupted by a consuming reader. If a consuming read occurs, the iterator starts from the beginning of the buffer. The way the iterator sees that a consuming read has happened since its last read is by checking the reader "cache". The cache holds the last counts of the read and the reader page itself. Commit 651e22f2701b changed what was saved by the cache_read when the rb_iter_reset() occurred, making the iterator never match the cache. Then if the iterator calls rb_iter_reset(), it will go into an infinite loop by checking if the cache doesn't match, doing the reset and retrying, just to see that the cache still doesn't match! Which should never happen as the reset is supposed to set the cache to the current value and there are locks that keep a consuming reader from having access to the data. Fixes: 651e22f2701b "ring-buffer: Always reset iterator to reader page" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
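The check, reset and retry cycle described above can be modelled in a few lines. This is a much simplified sketch, not the ring-buffer code: `sketch_buffer`, `sketch_iter`, `sketch_iter_reset()` and `sketch_iter_peek()` are hypothetical, and a single counter stands in for the cached read count and reader page.

```c
/* Hypothetical buffer: 'read' is bumped by every consuming read. */
struct sketch_buffer {
	unsigned long read;
};

/* Hypothetical iterator with a cached copy of the buffer's read count. */
struct sketch_iter {
	const struct sketch_buffer *buf;
	unsigned long cache_read;
	unsigned long pos;
};

/* Restart from the beginning and re-sync the cache. The loop in
 * sketch_iter_peek() terminates only because the cache is set to the
 * same value the check compares against; storing any other value here
 * would make the retry loop spin forever. */
static void sketch_iter_reset(struct sketch_iter *it)
{
	it->pos = 0;
	it->cache_read = it->buf->read;
}

/* Returns how many resets were needed before the cache matched. */
static int sketch_iter_peek(struct sketch_iter *it)
{
	int resets = 0;

	while (it->cache_read != it->buf->read) {
		sketch_iter_reset(it);
		resets++;
	}
	return resets;
}
```

With a correct reset the loop runs at most once per consuming read that happened since the last peek.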
2014-10-02Merge branches 'fiq' (early part), 'fixes', 'l2c' (early part) and 'misc' into for-nextRussell King
2014-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/usb/r8152.c net/netfilter/nfnetlink.c Both r8152 and nfnetlink conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-02aarch64: filter $x from kallsymsKyle McMartin
Similar to ARM, AArch64 is generating $x and $d syms... which isn't terribly helpful when looking at %pF output and the like. Filter those out in kallsyms, modpost and when looking at module symbols. Seems simplest since none of these check EM_ARM anyway, to just add it to the strchr used, rather than trying to make things overly complicated. initcall_debug improves: dmesg_before.txt: initcall $x+0x0/0x154 [sg] returned 0 after 26331 usecs dmesg_after.txt: initcall init_sg+0x0/0x154 [sg] returned 0 after 15461 usecs Signed-off-by: Kyle McMartin <kyle@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
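A filter of the kind described, built around strchr(), might look like the sketch below. It is an illustration rather than the patched kernel source: the function name `sketch_is_mapping_symbol()` is hypothetical, and the character set "axtd" plus the optional ".suffix" form are assumptions based on the $x and $d symbols named in the message and the usual ARM mapping-symbol naming.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true for mapping symbols such as "$x", "$d" or "$x.123",
 * which carry no useful name for %pF-style output. */
static bool sketch_is_mapping_symbol(const char *name)
{
	if (name[0] != '$' || name[1] == '\0')
		return false;

	/* Second character must be one of the known mapping classes.
	 * The '\0' case is excluded above because strchr() would
	 * otherwise match the terminator of the set string. */
	if (strchr("axtd", name[1]) == NULL)
		return false;

	/* Either the bare two-character form or one with a ".suffix". */
	return name[2] == '\0' || name[2] == '.';
}
```

A symbol lookup that skips names for which this returns true would report init_sg rather than $x, as in the dmesg comparison above.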
2014-10-01bpf: add search pruning optimization to verifierAlexei Starovoitov
Consider a C program represented in eBPF: int filter(int arg) { int a, b, c, *ptr; if (arg == 1) ptr = &a; else if (arg == 2) ptr = &b; else ptr = &c; *ptr = 0; return 0; } eBPF verifier has to follow all possible paths through the program to recognize that '*ptr = 0' instruction would be safe to execute in all situations. It does so by picking a path towards the end and observing changes to registers and stack at every insn until it reaches bpf_exit. Then it comes back to one of the previous branches and goes towards the end again with potentially different values in registers. When a program has a lot of branches, the number of possible combinations of branches is huge, so verifier has a hard limit of walking no more than 32k instructions. This limit can be reached and complex (but valid) programs could be rejected. Therefore it's important to recognize equivalent verifier states to prune this depth first search. Basic idea can be illustrated by the program (where .. are some eBPF insns): 1: .. 2: if (rX == rY) goto 4 3: .. 4: .. 5: .. 6: bpf_exit In the first pass towards bpf_exit the verifier will walk insns: 1, 2, 3, 4, 5, 6 Since insn#2 is a branch the verifier will remember its state in verifier stack to come back to it later. Since insn#4 is marked as 'branch target', the verifier will remember its state in explored_states[4] linked list. Once it reaches insn#6 successfully it will pop the state recorded at insn#2 and will continue. Without search pruning optimization verifier would have to walk 4, 5, 6 again, effectively simulating execution of insns 1, 2, 4, 5, 6 With search pruning it will check whether state at #4 after jumping from #2 is equivalent to one recorded in explored_states[4] during first pass. If there is an equivalent state, verifier can prune the search at #4 and declare this path to be safe as well. In other words two states at #4 are equivalent if execution of 1, 2, 3, 4 insns and 1, 2, 4 insns produces equivalent registers and stack. 
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
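The explored_states lookup described in this commit can be sketched as a per-instruction list of recorded states. This is a toy model, not the verifier: all `sketch_` names and sizes are hypothetical, the state is reduced to a few integer registers, and equivalence is taken to be exact equality, which is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_NREGS      4
#define SKETCH_MAX_INSNS  8
#define SKETCH_MAX_STATES 4

/* Toy verifier state: just a handful of register values. */
struct sketch_state {
	int regs[SKETCH_NREGS];
};

/* States already seen at each branch-target instruction. */
struct sketch_explored {
	struct sketch_state states[SKETCH_MAX_INSNS][SKETCH_MAX_STATES];
	int count[SKETCH_MAX_INSNS];
};

/* Returns true if an equivalent state was already explored at insn,
 * meaning the search can be pruned here. Otherwise the state is
 * recorded (while there is room) and the walk has to continue. */
static bool sketch_is_state_visited(struct sketch_explored *ex, int insn,
				    const struct sketch_state *cur)
{
	for (int i = 0; i < ex->count[insn]; i++)
		if (memcmp(&ex->states[insn][i], cur, sizeof(*cur)) == 0)
			return true;

	if (ex->count[insn] < SKETCH_MAX_STATES)
		ex->states[insn][ex->count[insn]++] = *cur;
	return false;
}
```

In the six-instruction example, the first arrival at insn 4 (via 1, 2, 3) records a state; the second arrival (via 1, 2) finds it and prunes, so insns 4, 5, 6 are not walked again.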
2014-09-30PM / hibernate: Iterate over set bits instead of PFNs in swsusp_free()Joerg Roedel
The existing implementation of swsusp_free iterates over all pfns in the system and checks every bit in the two memory bitmaps. This doesn't scale very well with large numbers of pfns, especially when the bitmaps are not populated very densely. Change the algorithm to iterate over the set bits in the bitmaps instead to make it scale better in large memory configurations. Also add a memory_bm_clear_current() helper function that clears the bit for the last position returned from the memory bitmap. This new version adds a !NULL check for the memory bitmaps before they are walked. Not doing so causes a kernel crash when the bitmaps are NULL. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
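Why walking set bits scales better than walking every pfn can be shown with a flat bitmap. The sketch below is not the hibernation code, which uses its own memory bitmap structure and two bitmaps; `sketch_next_set_bit()` and `sketch_free_marked()` are hypothetical helpers over a plain array of words.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BPW (sizeof(unsigned long) * CHAR_BIT)

/* Returns the index of the first set bit at or after start, or nbits
 * if there is none. Empty words are skipped in a single step. */
static size_t sketch_next_set_bit(const unsigned long *map, size_t nbits,
				  size_t start)
{
	size_t i = start;

	while (i < nbits) {
		unsigned long word = map[i / SKETCH_BPW] >> (i % SKETCH_BPW);

		if (word == 0) {
			/* Nothing left in this word: jump to the next one. */
			i += SKETCH_BPW - (i % SKETCH_BPW);
			continue;
		}
		while (!(word & 1UL)) {
			word >>= 1;
			i++;
		}
		return i < nbits ? i : nbits;
	}
	return nbits;
}

/* Visit only the set bits, clear each one, and return how many were
 * cleared. A real caller would free the corresponding page here. */
static size_t sketch_free_marked(unsigned long *map, size_t nbits)
{
	size_t freed = 0;

	for (size_t i = sketch_next_set_bit(map, nbits, 0); i < nbits;
	     i = sketch_next_set_bit(map, nbits, i + 1)) {
		map[i / SKETCH_BPW] &= ~(1UL << (i % SKETCH_BPW));
		freed++;
	}
	return freed;
}
```

With a sparsely populated bitmap the work is roughly proportional to the number of words plus the number of set bits, rather than one check per pfn.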
2014-09-30ACPI / sleep: Rework the handling of ACPI GPE wakeup from suspend-to-idleRafael J. Wysocki
The ACPI GPE wakeup from suspend-to-idle is currently based on using the IRQF_NO_SUSPEND flag for the ACPI SCI, but that is problematic for a couple of reasons. First, in principle the ACPI SCI may be shared and IRQF_NO_SUSPEND does not really work well with shared interrupts. Second, it may require the ACPI subsystem to special-case the handling of device notifications depending on whether or not they are received during suspend-to-idle in some places which would lead to fragile code. Finally, it's better to handle ACPI wakeup interrupts consistently with wakeup interrupts from other sources. For this reason, remove the IRQF_NO_SUSPEND flag from the ACPI SCI and use enable_irq_wake()/disable_irq_wake() with it instead, which requires two additional platform hooks to be added to struct platform_freeze_ops. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30PM / sleep: Rename platform suspend/resume functions in suspend.cRafael J. Wysocki
Rename several local functions related to platform handling during system suspend resume in suspend.c so that their names better reflect their roles. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30PM / sleep: Export dpm_suspend_late/noirq() and dpm_resume_early/noirq()Rafael J. Wysocki
Subsequent change sets will add platform-related operations between dpm_suspend_late() and dpm_suspend_noirq() as well as between dpm_resume_noirq() and dpm_resume_early() in suspend_enter(), so export these functions for suspend_enter() to be able to call them separately and split the invocations of dpm_suspend_end() and dpm_resume_start() in there accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30Merge branch 'pm-genirq' into acpi-pmRafael J. Wysocki