summaryrefslogtreecommitdiff
path: root/drivers/base/cpu.c
AgeCommit message (Collapse)Author
2019-11-04x86/bugs: Add ITLB_MULTIHIT bug infrastructureVineela Tummalapalli
Some processors may incur a machine check error possibly resulting in an unrecoverable CPU lockup when an instruction fetch encounters a TLB multi-hit in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type. The relevant erratum can be found here: https://bugzilla.kernel.org/show_bug.cgi?id=205195 There are other processors affected for which the erratum does not fully disclose the impact. This issue affects both bare-metal x86 page tables and EPT. It can be mitigated by either eliminating the use of large pages or by using careful TLB invalidations when changing the page size in the page tables. Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which are mitigated against this issue. Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com> Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-10-28x86/speculation/taa: Add sysfs reporting for TSX Async AbortPawan Gupta
Add the sysfs reporting file for TSX Async Abort. It exposes the vulnerability and the mitigation state similar to the existing files for the other hardware vulnerabilities. Sysfs file path is: /sys/devices/system/cpu/vulnerabilities/tsx_async_abort Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-05-14Merge branch 'x86-mds-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MDS mitigations from Thomas Gleixner: "Microarchitectural Data Sampling (MDS) is a hardware vulnerability which allows unprivileged speculative access to data which is available in various CPU internal buffers. This new set of misfeatures has the following CVEs assigned: CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory MDS attacks target microarchitectural buffers which speculatively forward data under certain conditions. Disclosure gadgets can expose this data via cache side channels. Contrary to other speculation based vulnerabilities the MDS vulnerability does not allow the attacker to control the memory target address. As a consequence the attacks are purely sampling based, but as demonstrated with the TLBleed attack samples can be postprocessed successfully. The mitigation is to flush the microarchitectural buffers on return to user space and before entering a VM. It's bolted on the VERW instruction and requires a microcode update. As some of the attacks exploit data structures shared between hyperthreads, full protection requires to disable hyperthreading. The kernel does not do that by default to avoid breaking unattended updates. The mitigation set comes with documentation for administrators and a deeper technical view" * 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/speculation/mds: Fix documentation typo Documentation: Correct the possible MDS sysfs values x86/mds: Add MDSUM variant to the MDS documentation x86/speculation/mds: Add 'mitigations=' support for MDS x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off x86/speculation/mds: Fix comment x86/speculation/mds: Add SMT warning message x86/speculation: Move arch_smt_update() call to after mitigation decisions x86/speculation/mds: Add mds=full,nosmt cmdline option Documentation: Add MDS vulnerability documentation Documentation: Move L1TF to separate directory x86/speculation/mds: Add mitigation mode VMWERV x86/speculation/mds: Add sysfs reporting for MDS x86/speculation/mds: Add mitigation control for MDS x86/speculation/mds: Conditionally clear CPU buffers on idle entry x86/kvm/vmx: Add MDS protection when L1D Flush is not active x86/speculation/mds: Clear CPU buffers on exit to user x86/speculation/mds: Add mds_clear_cpu_buffers() x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests x86/speculation/mds: Add BUG_MSBDS_ONLY ...
2019-03-06Merge tag 'driver-core-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big driver core patchset for 5.1-rc1 More patches than "normal" here this merge window, due to some work in the driver core by Alexander Duyck to rework the async probe functionality to work better for a number of devices, and independant work from Rafael for the device link functionality to make it work "correctly". Also in here is: - lots of BUS_ATTR() removals, the macro is about to go away - firmware test fixups - ihex fixups and simplification - component additions (also includes i915 patches) - lots of minor coding style fixups and cleanups. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits) driver core: platform: remove misleading err_alloc label platform: set of_node in platform_device_register_full() firmware: hardcode the debug message for -ENOENT driver core: Add missing description of new struct device_link field driver core: Fix PM-runtime for links added during consumer probe drivers/component: kerneldoc polish async: Add cmdline option to specify drivers to be async probed driver core: Fix possible supplier PM-usage counter imbalance PM-runtime: Fix __pm_runtime_set_status() race with runtime resume driver: platform: Support parsing GpioInt 0 in platform_get_irq() selftests: firmware: fix verify_reqs() return value Revert "selftests: firmware: remove use of non-standard diff -Z option" Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config" device: Fix comment for driver_data in struct device kernfs: Allocating memory for kernfs_iattrs with kmem_cache. sysfs: remove unused include of kernfs-internal.h driver core: Postpone DMA tear-down until after devres release driver core: Document limitation related to DL_FLAG_RPM_ACTIVE PM-runtime: Take suppliers into account in __pm_runtime_set_status() device.h: Add __cold to dev_<level> logging functions ...
2019-03-06x86/speculation/mds: Add sysfs reporting for MDSThomas Gleixner
Add the sysfs reporting file for MDS. It exposes the vulnerability and mitigation state similar to the existing files for the other speculative hardware vulnerabilities. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com>
2019-02-19PM / core: Add support to skip power management in device/driver modelSudeep Holla
All device objects in the driver model contain fields that control the handling of various power management activities. However, it's not always useful. There are few instances where pseudo devices are added to the model just to take advantage of many other features like kobjects, udev events, and so on. One such example is cpu devices and their caches. The sysfs for the cpu caches are managed by adding devices with cpu as the parent in cpu_device_create() when secondary cpu is brought online. Generally when the secondary CPUs are hotplugged back in as part of resume from suspend-to-ram, we call cpu_device_create() from the cpu hotplug state machine while the cpu device associated with that CPU is not yet ready to be resumed as the device_resume() call happens bit later. It's not really needed to set the flag is_prepared for cpu devices as they are mostly pseudo device and hotplug framework deals with state machine and not managed through the cpu device. This often results in annoying warning when resuming: Enabling non-boot CPUs ... CPU1: Booted secondary processor cache: parent cpu1 should not be sleeping CPU1 is up CPU2: Booted secondary processor cache: parent cpu2 should not be sleeping CPU2 is up .... and so on. So in order to fix these kind of errors, we could just completely avoid doing any power management related initialisations and operations if they are not used by these devices. Add no_pm flags to indicate that the device doesn't require any sort of PM activities and all of them can be completely skipped. We can use the same flag to also avoid adding not used *power* sysfs entries for these devices. For now, lets use this for cpu cache devices. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Eugeniu Rosca <erosca@de.adit-jv.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-31drivers: base: Use __printf markup to silence compilerMathieu Malaterre
Silence warnings (triggered at W=1) by adding relevant __printf attributes. drivers/base/cpu.c:432:2: warning: function '__cpu_device_create' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-20x86/speculation/l1tf: Add sysfs reporting for l1tfAndi Kleen
L1TF core kernel workarounds are cheap and normally always enabled, However they still should be reported in sysfs if the system is vulnerable or mitigated. Add the necessary CPU feature/bug bits. - Extend the existing checks for Meltdowns to determine if the system is vulnerable. All CPUs which are not vulnerable to Meltdown are also not vulnerable to L1TF - Check for 32bit non PAE and emit a warning as there is no practical way for mitigation due to the limited physical address bits - If the system has more than MAX_PA/2 physical memory the invert page workarounds don't protect the system against the L1TF attack anymore, because an inverted physical address will also point to valid memory. Print a warning in this case and report that the system is vulnerable. Add a function which returns the PFN limit for the L1TF mitigation, which will be used in follow up patches for sanity and range checks. [ tglx: Renamed the CPU feature bit to L1TF_PTEINV ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Dave Hansen <dave.hansen@intel.com>
2018-05-03x86/bugs: Expose /sys/../spec_store_bypassKonrad Rzeszutek Wilk
Add the sysfs file for the new vulerability. It does not do much except show the words 'Vulnerable' for recent x86 cores. Intel cores prior to family 6 are known not to be vulnerable, and so are some Atoms and some Xeon Phi. It assumes that older Cyrix, Centaur, etc. cores are immune. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-03-15driver core: cpu: use put_device() if device_register failArvind Yadav
if device_register() returned an error! Always use put_device() to give up the reference initialized. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-01Merge tag 'driver-core-4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of "big" driver core patches for 4.16-rc1. The majority of the work here is in the firmware subsystem, with reworks to try to attempt to make the code easier to handle in the long run, but no functional change. There's also some tree-wide sysfs attribute fixups with lots of acks from the various subsystem maintainers, as well as a handful of other normal fixes and changes. And finally, some license cleanups for the driver core and sysfs code. All have been in linux-next for a while with no reported issues" * tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits) device property: Define type of PROPERTY_ENRTY_*() macros device property: Reuse property_entry_free_data() device property: Move property_entry_free_data() upper firmware: Fix up docs referring to FIRMWARE_IN_KERNEL firmware: Drop FIRMWARE_IN_KERNEL Kconfig option USB: serial: keyspan: Drop firmware Kconfig options sysfs: remove DEBUG defines sysfs: use SPDX identifiers drivers: base: add coredump driver ops sysfs: add attribute specification for /sysfs/devices/.../coredump test_firmware: fix missing unlock on error in config_num_requests_store() test_firmware: make local symbol test_fw_config static sysfs: turn WARN() into pr_warn() firmware: Fix a typo in fallback-mechanisms.rst treewide: Use DEVICE_ATTR_WO treewide: Use DEVICE_ATTR_RO treewide: Use DEVICE_ATTR_RW sysfs.h: Use octal permissions component: add debugfs support bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate ...
2018-01-14Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti updates from Thomas Gleixner: "This contains: - a PTI bugfix to avoid setting reserved CR3 bits when PCID is disabled. This seems to cause issues on a virtual machine at least and is incorrect according to the AMD manual. - a PTI bugfix which disables the perf BTS facility if PTI is enabled. The BTS AUX buffer is not globally visible and causes the CPU to fault when the mapping disappears on switching CR3 to user space. A full fix which restores BTS on PTI is non trivial and will be worked on. - PTI bugfixes for EFI and trusted boot which make sure that the user space visible page table entries have the NX bit cleared - removal of dead code in the PTI pagetable setup functions - add PTI documentation - add a selftest for vsyscall to verify that the kernel actually implements what it advertises. - a sysfs interface to expose vulnerability and mitigation information so there is a coherent way for users to retrieve the status. - the initial spectre_v2 mitigations, aka retpoline: + The necessary ASM thunk and compiler support + The ASM variants of retpoline and the conversion of affected ASM code + Make LFENCE serializing on AMD so it can be used as speculation trap + The RSB fill after vmexit - initial objtool support for retpoline As I said in the status mail this is the most of the set of patches which should go into 4.15 except two straight forward patches still on hold: - the retpoline add on of LFENCE which waits for ACKs - the RSB fill after context switch Both should be ready to go early next week and with that we'll have covered the major holes of spectre_v2 and go back to normality" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) x86,perf: Disable intel_bts when PTI security/Kconfig: Correct the Documentation reference for PTI x86/pti: Fix !PCID and sanitize defines selftests/x86: Add test_vsyscall x86/retpoline: Fill return stack buffer on vmexit x86/retpoline/irq32: Convert assembler indirect jumps x86/retpoline/checksum32: Convert assembler indirect jumps x86/retpoline/xen: Convert Xen hypercall indirect jumps x86/retpoline/hyperv: Convert assembler indirect jumps x86/retpoline/ftrace: Convert ftrace assembler indirect jumps x86/retpoline/entry: Convert entry assembler indirect jumps x86/retpoline/crypto: Convert crypto assembler indirect jumps x86/spectre: Add boot time option to select Spectre v2 mitigation x86/retpoline: Add initial retpoline support objtool: Allow alternatives to be ignored objtool: Detect jumps to retpoline thunks x86/pti: Make unpoison of pgd for trusted boot work for real x86/alternatives: Fix optimize_nops() checking sysfs/cpu: Fix typos in vulnerability documentation x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC ...
2018-01-08sysfs/cpu: Add vulnerability folderThomas Gleixner
As the meltdown/spectre problem affects several CPU architectures, it makes sense to have common way to express whether a system is affected by a particular vulnerability or not. If affected the way to express the mitigation should be common as well. Create /sys/devices/system/cpu/vulnerabilities folder and files for meltdown, spectre_v1 and spectre_v2. Allow architectures to override the show function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de
2017-12-07driver core: add SPDX identifiers to all driver core filesGreg Kroah-Hartman
It's good to have SPDX identifiers in all files to make it easier to audit the kernel tree for correct licenses. Update the driver core files files with the correct SPDX license identifier based on the license text in the file itself. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This work is based on a script and data from Thomas Gleixner, Philippe Ombredanne, and Kate Stewart. Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: William Breathitt Gray <vilhelm.gray@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-13Merge tag 'pm-4.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "There are no real big ticket items here this time. The most noticeable change is probably the relocation of the OPP (Operating Performance Points) framework to its own directory under drivers/ as it has grown big enough for that. Also Viresh is now going to maintain it and send pull requests for it to me, so you will see this change in the git history going forward (but still not right now). Another noticeable set of changes is the modifications of the PM core, the PCI subsystem and the ACPI PM domain to allow of more integration between system-wide suspend/resume and runtime PM. For now it's just a way to avoid resuming devices from runtime suspend unnecessarily during system suspend (if the driver sets a flag to indicate its readiness for that) and in the works is an analogous mechanism to allow devices to stay suspended after system resume. In addition to that, we have some changes related to supporting frequency-invariant CPU utilization metrics in the scheduler and in the schedutil cpufreq governor on ARM and changes to add support for device performance states to the generic power domains (genpd) framework. The rest is mostly fixes and cleanups of various sorts. Specifics: - Relocate the OPP (Operating Performance Points) framework to its own directory under drivers/ and add support for power domain performance states to it (Viresh Kumar). - Modify the PM core, the PCI bus type and the ACPI PM domain to support power management driver flags allowing device drivers to specify their capabilities and preferences regarding the handling of devices with enabled runtime PM during system suspend/resume and clean up that code somewhat (Rafael Wysocki, Ulf Hansson). - Add frequency-invariant accounting support to the task scheduler on ARM and ARM64 (Dietmar Eggemann). - Fix PM QoS device resume latency framework to prevent "no restriction" requests from overriding requests with specific requirements and drop the confusing PM_QOS_FLAG_REMOTE_WAKEUP device PM QoS flag (Rafael Wysocki). - Drop legacy class suspend/resume operations from the PM core and drop legacy bus type suspend and resume callbacks from ARM/locomo (Rafael Wysocki). - Add min/max frequency support to devfreq and clean it up somewhat (Chanwoo Choi). - Rework wakeup support in the generic power domains (genpd) framework and update some of its users accordingly (Geert Uytterhoeven). - Convert timers in the PM core to use timer_setup() (Kees Cook). - Add support for exposing the SLP_S0 (Low Power S0 Idle) residency counter based on the LPIT ACPI table on Intel platforms (Srinivas Pandruvada). - Add per-CPU PM QoS resume latency support to the ladder cpuidle governor (Ramesh Thomas). - Fix a deadlock between the wakeup notify handler and the notifier removal in the ACPI core (Ville Syrjälä). - Fix a cpufreq schedutil governor issue causing it to use stale cached frequency values sometimes (Viresh Kumar). - Fix an issue in the system suspend core support code causing wakeup events detection to fail in some cases (Rajat Jain). - Fix the generic power domains (genpd) framework to prevent the PM core from using the direct-complete optimization with it as that is guaranteed to fail (Ulf Hansson). - Fix a minor issue in the cpuidle core and clean it up a bit (Gaurav Jindal, Nicholas Piggin). - Fix and clean up the intel_idle and ARM cpuidle drivers (Jason Baron, Len Brown, Leo Yan). - Fix a couple of minor issues in the OPP framework and clean it up (Arvind Yadav, Fabio Estevam, Sudeep Holla, Tobias Jordan). - Fix and clean up some cpufreq drivers and fix a minor issue in the cpufreq statistics code (Arvind Yadav, Bhumika Goyal, Fabio Estevam, Gautham Shenoy, Gustavo Silva, Marek Szyprowski, Masahiro Yamada, Robert Jarzmik, Zumeng Chen). - Fix minor issues in the system suspend and hibernation core, in power management documentation and in the AVS (Adaptive Voltage Scaling) framework (Helge Deller, Himanshu Jha, Joe Perches, Rafael Wysocki). - Fix some issues in the cpupower utility and document that Shuah Khan is going to maintain it going forward (Prarit Bhargava, Shuah Khan)" * tag 'pm-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (88 commits) tools/power/cpupower: add libcpupower.so.0.0.1 to .gitignore tools/power/cpupower: Add 64 bit library detection intel_idle: Graceful probe failure when MWAIT is disabled cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq freezer: Fix typo in freezable_schedule_timeout() comment PM / s2idle: Clear the events_check_enabled flag cpufreq: stats: Handle the case when trans_table goes beyond PAGE_SIZE cpufreq: arm_big_little: make cpufreq_arm_bL_ops structures const cpufreq: arm_big_little: make function arguments and structure pointer const cpuidle: Avoid assignment in if () argument cpuidle: Clean up cpuidle_enable_device() error handling a bit ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock PM / Domains: Fix genpd to deal with drivers returning 1 from ->prepare() cpuidle: ladder: Add per CPU PM QoS resume latency support PM / QoS: Fix device resume latency framework PM / domains: Rework governor code to be more consistent PM / Domains: Remove gpd_dev_ops.active_wakeup() callback soc: rockchip: power-domain: Use GENPD_FLAG_ACTIVE_WAKEUP soc: mediatek: Use GENPD_FLAG_ACTIVE_WAKEUP ARM: shmobile: pm-rmobile: Use GENPD_FLAG_ACTIVE_WAKEUP ...
2017-11-08PM / QoS: Fix device resume latency frameworkRafael J. Wysocki
The special value of 0 for device resume latency PM QoS means "no restriction", but there are two problems with that. First, device resume latency PM QoS requests with 0 as the value are always put in front of requests with positive values in the priority lists used internally by the PM QoS framework, causing 0 to be chosen as an effective constraint value. However, that 0 is then interpreted as "no restriction" effectively overriding the other requests with specific restrictions which is incorrect. Second, the users of device resume latency PM QoS have no way to specify that *any* resume latency at all should be avoided, which is an artificial limitation in general. To address these issues, modify device resume latency PM QoS to use S32_MAX as the "no constraint" value and 0 as the "no latency at all" one and rework its users (the cpuidle menu governor, the genpd QoS governor and the runtime PM framework) to follow these changes. Also add a special "n/a" value to the corresponding user space I/F to allow user space to indicate that it cannot accept any resume latencies at all for the given device. Fixes: 85dc0b8a4019 (PM / QoS: Make it possible to expose PM QoS latency constraints) Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323 Reported-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Tero Kristo <t-kristo@ti.com> Reviewed-by: Ramesh Thomas <ramesh.thomas@intel.com>
2017-10-31Revert "PM / QoS: Fix device resume latency PM QoS"Rafael J. Wysocki
This reverts commit 0cc2b4e5a020 (PM / QoS: Fix device resume latency PM QoS) as it introduced regressions on multiple systems and the fix-up in commit 2a9a86d5c813 (PM / QoS: Fix default runtime_pm device resume latency) does not address all of them. The original problem that commit 0cc2b4e5a020 was attempting to fix will be addressed later. Fixes: 0cc2b4e5a020 (PM / QoS: Fix device resume latency PM QoS) Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-10-27sched/isolation: Move isolcpus= handling to the housekeeping codeFrederic Weisbecker
We want to centralize the isolation features, to be done by the housekeeping subsystem and scheduler domain isolation is a significant part of it. No intended behaviour change, we just reuse the housekeeping cpumask and core code. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509072159-31808-11-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-24PM / QoS: Fix device resume latency PM QoSRafael J. Wysocki
The special value of 0 for device resume latency PM QoS means "no restriction", but there are two problems with that. First, device resume latency PM QoS requests with 0 as the value are always put in front of requests with positive values in the priority lists used internally by the PM QoS framework, causing 0 to be chosen as an effective constraint value. However, that 0 is then interpreted as "no restriction" effectively overriding the other requests with specific restrictions which is incorrect. Second, the users of device resume latency PM QoS have no way to specify that *any* resume latency at all should be avoided, which is an artificial limitation in general. To address these issues, modify device resume latency PM QoS to use S32_MAX as the "no constraint" value and 0 as the "no latency at all" one and rework its users (the cpuidle menu governor, the genpd QoS governor and the runtime PM framework) to follow these changes. Also add a special "n/a" value to the corresponding user space I/F to allow user space to indicate that it cannot accept any resume latencies at all for the given device. Fixes: 85dc0b8a4019 (PM / QoS: Make it possible to expose PM QoS latency constraints) Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323 Reported-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Alex Shi <alex.shi@linaro.org> Cc: All applicable <stable@vger.kernel.org>
2017-09-08treewide: make "nr_cpu_ids" unsignedAlexey Dobriyan
First, number of CPUs can't be negative number. Second, different signnnedness leads to suboptimal code in the following cases: 1) kmalloc(nr_cpu_ids * sizeof(X)); "int" has to be sign extended to size_t. 2) while (loff_t *pos < nr_cpu_ids) MOVSXD is 1 byte longed than the same MOV. Other cases exist as well. Basically compiler is told that nr_cpu_ids can't be negative which can't be deduced if it is "int". Code savings on allyesconfig kernel: -3KB add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370) function old new delta coretemp_cpu_online 450 512 +62 rcu_init_one 1234 1272 +38 pci_device_probe 374 399 +25 ... pgdat_reclaimable_pages 628 556 -72 select_fallback_rq 446 369 -77 task_numa_find_cpu 1923 1807 -116 Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-30CPU / PM: expose pm_qos_resume_latency for CPUsAlex Shi
The cpu-dma PM QoS constraint impacts all the cpus in the system. There is no way to let the user to choose a PM QoS constraint per cpu. The following patch exposes to the userspace a per cpu based sysfs file in order to let the userspace to change the value of the PM QoS latency constraint. This change is inoperative in its form and the cpuidle governors have to take into account the per cpu latency constraint in addition to the global cpu-dma latency constraint in order to operate properly. BTW The pm_qos_resume_latency usage defined in Documentation/ABI/testing/sysfs-devices-power The /sys/devices/.../power/pm_qos_resume_latency_us attribute contains the PM QoS resume latency limit for the given device, which is the maximum allowed time it can take to resume the device, after it has been suspended at run time, from a resume request to the moment the device will be ready to process I/O, in microseconds. If it is equal to 0, however, this means that the PM QoS resume latency may be arbitrary. Signed-off-by: Alex Shi <alex.shi@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-08-31cpu: clean up register_cpu funcAlex Shi
This patch could reduce one branch in this function. Also make the code more readble. Signed-off-by: Alex Shi <alex.shi@linaro.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> To: linux-kernel@vger.kernel.org To: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-pm@vger.kernel.org Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-20drivers/base/cpu.c: use __cpu_*_mask directlyRasmus Villemoes
The only user of the lvalue-ness of the cpu_*_mask variables is in drivers/base/cpu.c, and that is mostly a work-around for the fact that not even const variables can be used in static initialization. Now that the underlying struct cpumasks are exposed we can take their address. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-05cpu: Remove bogus __ref annotation of cpu_subsys_online()Mathias Krause
In commit 0db0628d9012 ("kernel: delete __cpuinit usage from all core kernel files") cpu_up() lost its __cpuinit annotation, vanishing the need for cpu_subsys_online() to have a __ref annotation. Just drop it to be able to catch real section mismatches in the future. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20show nohz_full cpus in sysfsRik van Riel
Currently there is no way to query which CPUs are in nohz_full mode from userspace. Export the CPU list running in nohz_full mode in sysfs, specifically in the file /sys/devices/system/cpu/nohz_full This can be used by system management tools like libvirt, openstack, and others to ensure proper task placement. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Mike Galbraith <umgwanakikbuti@gmail.com> Acked-by: Chris Metcalf <cmetcalf@ezchip.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20show isolated cpus in sysfsRik van Riel
After system bootup, there is no totally reliable way to see which CPUs are isolated, because the kernel may modify the CPUs specified on the isolcpus= kernel command line option. Export the CPU list that actually got isolated in sysfs, specifically in the file /sys/devices/system/cpu/isolated This can be used by system management tools like libvirt, openstack, and others to ensure proper placement of tasks. Suggested-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Mike Galbraith <umgwanakikbuti@gmail.com> Acked-by: Chris Metcalf <cmetcalf@ezchip.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-13drivers/base: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. * Line termination only requires one extra space at the end of the buffer. Use PAGE_SIZE - 1 instead of PAGE_SIZE - 2 when formatting. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-07drivers: base: add cpu_device_create to support per-cpu devicesSudeep Holla
This patch adds a new function to create per-cpu devices. This helps in: 1. reusing the device infrastructure to create any cpu related attributes and corresponding sysfs instead of creating and dealing with raw kobjects directly 2. retaining the legacy path(/sys/devices/system/cpu/..) to support existing sysfs ABI 3. avoiding to create links in the bus directory pointing to the device as there would be per-cpu instance of these devices with the same name since dev->bus is not populated to cpu_sysbus on purpose Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-07cpumask: factor out show_cpumap into separate helper functionSudeep Holla
Many sysfs *_show function use cpu{list,mask}_scnprintf to copy cpumap to the buffer aligned to PAGE_SIZE, append '\n' and '\0' to return null terminated buffer with newline. This patch creates a new helper function cpumap_print_to_pagebuf in cpumask.h using newly added bitmap_print_to_pagebuf and consolidates most of those sysfs functions using the new helper function. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Suggested-by: Stephen Boyd <sboyd@codeaurora.org> Tested-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: x86@kernel.org Cc: linux-acpi@vger.kernel.org Cc: linux-pci@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-18x86: align x86 arch with generic CPU modalias handlingArd Biesheuvel
The x86 CPU feature modalias handling existed before it was reimplemented generically. This patch aligns the x86 handling so that it (a) reuses some more code that is now generic; (b) uses the generic format for the modalias module metadata entry, i.e., it now uses 'cpu:type:x86,venVVVVfamFFFFmodMMMM:feature:,XXXX,YYYY' instead of the 'x86cpu:vendor:VVVV:family:FFFF:model:MMMM:feature:,XXXX,YYYY' that was used before. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-18cpu: add generic support for CPU feature based module autoloadingArd Biesheuvel
This patch adds support for advertising optional CPU features over udev using the modalias, and for declaring compatibility with/dependency upon such a feature in a module. The mapping between feature numbers and actual features should be provided by the architecture in a file called <asm/cpufeature.h> which exports the following functions/macros: - cpu_feature(FEAT), a preprocessor macro that maps token FEAT to a numeric index; - bool cpu_have_feature(n), returning whether this CPU has support for feature #n; - MAX_CPU_FEATURES, an upper bound for 'n' in the previous function. The feature can then be enabled by setting CONFIG_GENERIC_CPU_AUTOPROBE for the architecture. For instance, a module that registers its module init function using module_cpu_feature_match(FEAT_X, module_init_function) will be probed automatically when the CPU's support for the 'FEAT_X' feature is advertised over udev, and will only allow the module to be loaded by hand if the 'FEAT_X' feature is supported. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-30hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()Toshi Kani
cpu_hotplug_driver_lock() serializes CPU online/offline operations when ARCH_CPU_PROBE_RELEASE is set. This lock interface is no longer necessary with the following reason: - lock_device_hotplug() now protects CPU online/offline operations, including the probe & release interfaces enabled by ARCH_CPU_PROBE_RELEASE. The use of cpu_hotplug_driver_lock() is redundant. - cpu_hotplug_driver_lock() is only valid when ARCH_CPU_PROBE_RELEASE is defined, which is misleading and is only enabled on powerpc. This patch removes the cpu_hotplug_driver_lock() interface. As a result, ARCH_CPU_PROBE_RELEASE only enables / disables the cpu probe & release interface as intended. There is no functional change in this patch. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25hotplug / x86: Add hotplug lock to missing placesToshi Kani
lock_device_hotplug[_sysfs]() serializes CPU & Memory online/offline and hotplug operations. However, this lock is not held in the debug interfaces below that initiate CPU online/offline operations. - _debug_hotplug_cpu(), cpu0 hotplug test interface enabled by CONFIG_DEBUG_HOTPLUG_CPU0. - cpu_probe_store() and cpu_release_store(), cpu hotplug test interface enabled by CONFIG_ARCH_CPU_PROBE_RELEASE. This patch changes the above interfaces to hold lock_device_hotplug(). Signed-off-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-03Merge tag 'pm+acpi-3.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: 1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction of Intel Thunderbolt support on systems that use ACPI for signalling Thunderbolt hotplug events. This also should make ACPIPHP work in some cases in which it was known to have problems. From Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov. 2) ACPI core code cleanups and dock station support cleanups from Jiang Liu and Rafael J Wysocki. 3) Fixes for locking problems related to ACPI device hotplug from Rafael J Wysocki. 4) ACPICA update to version 20130725 includig fixes, cleanups, support for more than 256 GPEs per GPE block and a change to make the ACPI PM Timer optional (we've seen systems without the PM Timer in the field already). One of the fixes, related to the DeRefOf operator, is necessary to prevent some Windows 8 oriented AML from causing problems to happen. From Bob Moore, Lv Zheng, and Jung-uk Kim. 5) Removal of the old and long deprecated /proc/acpi/event interface and related driver changes from Thomas Renninger. 6) ACPI and Xen changes to make the reduced hardware sleep work with the latter from Ben Guthro. 7) ACPI video driver cleanups and a blacklist of systems that should not tell the BIOS that they are compatible with Windows 8 (or ACPI backlight and possibly other things will not work on them). From Felipe Contreras. 8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo, Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen, Toshi Kani, and Wei Yongjun. 9) cpufreq ondemand governor target frequency selection change to reduce oscillations between min and max frequencies (essentially, it causes the governor to choose target frequencies proportional to load) from Stratos Karafotis. 10) cpufreq fixes allowing sysfs attributes file permissions to be preserved over suspend/resume cycles Srivatsa S Bhat. 11) Removal of Device Tree parsing for CPU device nodes from multiple cpufreq drivers that required some changes related to of_get_cpu_node() to be made in a few architectures and in the driver core. From Sudeep KarkadaNagesha. 12) cpufreq core fixes and cleanups related to mutual exclusion and driver module references from Viresh Kumar, Lukasz Majewski and Rafael J Wysocki. 13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap, Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo, Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd, Stratos Karafotis, and Viresh Kumar. 14) Fixes to prevent race conditions in coupled cpuidle from happening from Colin Cross. 15) cpuidle core fixes and cleanups from Daniel Lezcano and Tuukka Tikkanen. 16) Assorted cpuidle fixes and cleanups from Daniel Lezcano, Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij, and Sahara. 17) System sleep tracing changes from Todd E Brandt and Shuah Khan. 18) PNP subsystem conversion to using struct dev_pm_ops for power management from Shuah Khan. * tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (217 commits) cpufreq: Don't use smp_processor_id() in preemptible context cpuidle: coupled: fix race condition between pokes and safe state cpuidle: coupled: abort idle if pokes are pending cpuidle: coupled: disable interrupts after entering safe state ACPI / hotplug: Remove containers synchronously driver core / ACPI: Avoid device hot remove locking issues cpufreq: governor: Fix typos in comments cpufreq: governors: Remove duplicate check of target freq in supported range cpufreq: Fix timer/workqueue corruption due to double queueing ACPI / EC: Add ASUSTEK L4R to quirk list in order to validate ECDT ACPI / thermal: Add check of "_TZD" availability and evaluating result cpufreq: imx6q: Fix clock enable balance ACPI: blacklist win8 OSI for buggy laptops cpufreq: tegra: fix the wrong clock name cpuidle: Change struct menu_device field types cpuidle: Add a comment warning about possible overflow cpuidle: Fix variable domains in get_typical_interval() cpuidle: Fix menu_device->intervals type cpuidle: CodingStyle: Break up multiple assignments on single line cpuidle: Check called function parameter in get_typical_interval() ...
2013-08-21driver/core: cpu: initialize of_node in cpu's device strutureSudeep KarkadaNagesha
CPUs are also registered as devices but the of_node in these cpu devices are not initialized. Currently different drivers requiring to access cpu device node are parsing the nodes themselves and initialising the of_node in cpu device. The of_node in all the cpu devices needs to be initialized properly and at one place. The best place to update this is CPU subsystem driver when registering the cpu devices. The OF/DT core library now provides of_get_cpu_node to retrieve a cpu device node for a given logical index by abstracting the architecture specific details. This patch uses of_get_cpu_node to assign of_node when registering the cpu devices. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
2013-08-12driver core / cpu: Check if NUMA node is valid before bringing CPU upRafael J. Wysocki
There is a potential race condition between cpu_subsys_online() and either acpi_processor_remove() or remove_memory() that execute try_offline_node(). Namely, it is possible that cpu_subsys_online() will run right after the CPUs NUMA node has been put offline and cpu_to_node() executed by it will return NUMA_NO_NODE (-1). In that case the CPU is gone and it doesn't make sense to call cpu_up() for it, so make cpu_subsys_online() return -ENODEV then. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-14drivers: delete __cpuinit usage from all remaining drivers filesPaul Gortmaker
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the remaining one-off uses of the __cpuinit macros from all C files in the drivers/* directory. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-03Merge tag 'pm+acpi-3.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "This time the total number of ACPI commits is slightly greater than the number of cpufreq commits, but Viresh Kumar (who works on cpufreq) remains the most active patch submitter. To me, the most significant change is the addition of offline/online device operations to the driver core (with the Greg's blessing) and the related modifications of the ACPI core hotplug code. Next are the freezer updates from Colin Cross that should make the freezing of tasks a bit less heavy weight. We also have a couple of regression fixes, a number of fixes for issues that have not been identified as regressions, two new drivers and a bunch of cleanups all over. Highlights: - Hotplug changes to support graceful hot-removal failures. It sometimes is necessary to fail device hot-removal operations gracefully if they cannot be carried out completely. For example, if memory from a memory module being hot-removed has been allocated for the kernel's own use and cannot be moved elsewhere, it's desirable to fail the hot-removal operation in a graceful way rather than to crash the kernel, but currenty a success or a kernel crash are the only possible outcomes of an attempted memory hot-removal. Needless to say, that is not a very attractive alternative and it had to be addressed. However, in order to make it work for memory, I first had to make it work for CPUs and for this purpose I needed to modify the ACPI processor driver. It's been split into two parts, a resident one handling the low-level initialization/cleanup and a modular one playing the actual driver's role (but it binds to the CPU system device objects rather than to the ACPI device objects representing processors). That's been sort of like a live brain surgery on a patient who's riding a bike. So this is a little scary, but since we found and fixed a couple of regressions it caused to happen during the early linux-next testing (a month ago), nobody has complained. As a bonus we remove some duplicated ACPI hotplug code, because the ACPI-based CPU hotplug is now going to use the common ACPI hotplug code. - Lighter weight freezing of tasks. These changes from Colin Cross and Mandeep Singh Baines are targeted at making the freezing of tasks a bit less heavy weight operation. They reduce the number of tasks woken up every time during the freezing, by using the observation that the freezer simply doesn't need to wake up some of them and wait for them all to call refrigerator(). The time needed for the freezer to decide to report a failure is reduced too. Also reintroduced is the check causing a lockdep warining to trigger when try_to_freeze() is called with locks held (which is generally unsafe and shouldn't happen). - cpufreq updates First off, a commit from Srivatsa S Bhat fixes a resume regression introduced during the 3.10 cycle causing some cpufreq sysfs attributes to return wrong values to user space after resume. The fix is kind of fresh, but also it's pretty obvious once Srivatsa has identified the root cause. Second, we have a new freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to provide information previously available via related_cpus. From Lan Tianyu. Finally, we fix a number of issues, mostly related to the CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean up some code. The majority of changes from Viresh Kumar with bits from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and Tang Yuantian. - ACPICA update A usual bunch of updates from the ACPICA upstream. During the 3.4 cycle we introduced support for ACPI 5 extended sleep registers, but they are only supposed to be used if the HW-reduced mode bit is set in the FADT flags and the code attempted to use them without checking that bit. That caused suspend/resume regressions to happen on some systems. Fix from Lv Zheng causes those registers to be used only if the HW-reduced mode bit is set. Apart from this some other ACPICA bugs are fixed and code cleanups are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and Zhang Rui. - cpuidle updates New driver for Xilinx Zynq processors is added by Michal Simek. Multidriver support simplification, addition of some missing kerneldoc comments and Kconfig-related fixes come from Daniel Lezcano. - ACPI power management updates Changes to make suspend/resume work correctly in Xen guests from Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and cleanups and fixes of the ACPI device power state selection routine. - ACPI documentation updates Some previously missing pieces of ACPI documentation are added by Lv Zheng and Aaron Lu (hopefully, that will help people to uderstand how the ACPI subsystem works) and one outdated doc is updated by Hanjun Guo. - Assorted ACPI updates We finally nailed down the IA-64 issue that was the reason for reverting commit 9f29ab11ddbf ("ACPI / scan: do not match drivers against objects having scan handlers"), so we can fix it and move the ACPI scan handler check added to the ACPI video driver back to the core. A mechanism for adding CMOS RTC address space handlers is introduced by Lan Tianyu to allow some EC-related breakage to be fixed on some systems. A spec-compliant implementation of acpi_os_get_timer() is added by Mika Westerberg. The evaluation of _STA is added to do_acpi_find_child() to avoid situations in which a pointer to a disabled device object is returned instead of an enabled one with the same _ADR value. From Jeff Wu. Intel BayTrail PCH (Platform Controller Hub) support is added to the ACPI driver for Intel Low-Power Subsystems (LPSS) and that driver is modified to work around a couple of known BIOS issues. Changes from Mika Westerberg and Heikki Krogerus. The EC driver is fixed by Vasiliy Kulikov to use get_user() and put_user() instead of dereferencing user space pointers blindly. Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi Kani. - Assorted power management updates The "runtime idle" helper routine is changed to take the return values of the callbacks executed by it into account and to call rpm_suspend() if they return 0, which allows us to reduce the overall code bloat a bit (by dropping some code that's not necessary any more after that modification). The runtime PM documentation is updated by Alan Stern (to reflect the "runtime idle" behavior change). New trace points for PM QoS are added by Sahara (<keun-o.park@windriver.com>). PM QoS documentation is updated by Lan Tianyu. Code cleanups are made and minor issues are addressed by Bernie Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan. - devfreq updates New driver for the Exynos5-bus device from Abhilash Kesavan. Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham, Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun. - OMAP power management updates Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver updates from Andrii Tseglytskyi and Nishanth Menon." * tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits) cpufreq: Fix cpufreq regression after suspend/resume ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state() PM / Sleep: Warn about system time after resume with pm_trace cpufreq: don't leave stale policy pointer in cdbs->cur_policy acpi-cpufreq: Add new sysfs attribute freqdomain_cpus cpufreq: make sure frequency transitions are serialized ACPI: implement acpi_os_get_timer() according the spec ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan ACPI: Add CMOS RTC Operation Region handler support ACPI / processor: Drop unused variable from processor_perflib.c cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases ...
2013-05-30CPU: Fix sysfs cpu/online of offlined CPUsToshi Kani
As reported by Dave Hansen, sysfs cpu/online shows 1 for offlined CPUs at boot. Fix this problem by initializing dev.offline with cpu_online() when registering a CPU. References: https://lkml.org/lkml/2013/5/29/403 Reported-and-tested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-21cpu: make sure that cpu/online file created before KOBJ_ADD is emittedIgor Mammedov
It fixes race between udev and hotplugged CPU registration by defining "online" attribute statically, so that device_add() would create it before notifying udev about new CPU. Original issue report is at https://lkml.org/lkml/2012/4/30/198 " > On Mon, Apr 30, 2012 at 11:36:23AM -0400, Konrad Rzeszutek Wilk wrote: > > Hey Greg, > > > > Hoping you can help with some guidance on how to fix this. > > > > The issue is with CPU hotplug is that when a CPU goes up > > it calls 'arch_register_cpu' which eventually calls > > register_cpu. That function does these two things: > > > > 251 error = device_register(&cpu->dev); > > 252 if (!error && cpu->hotpluggable) > > 253 register_cpu_control(cpu); > > > > and the device_register creates a nice little SysFS directory: > > > > /sys/devices/system/cpu/cpu2/ which at line 251 has the 'add' attribute > > but no 'online' attribute. udev then tries to echo 1 to the 'online' > > and it we get: > > udevd-work[2421]: error opening ATTR{/sys/devices/system/cpu/cpu2/online} for writing: No such file or directory > > > > Line 253 creates said 'online' and at that time udev [or the system admin] > > can write 1 to 'online' and the CPU goes up. > > > > So .. any thoughts? Is there some way to inhibit from uevent being sent > > until line 253 has run? " Signed-off-by: Igor Mammedov <imammedo@redhat.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-21cpu: fix "crash_notes" and "crash_notes_size" leaks in register_cpu()Igor Mammedov
"crash_notes" and "crash_notes_size" are dynamically created with device_create_file() but aren't deleted anywhere. Define "crash_notes" and "crash_notes_size" statically via attribute groups so that device_register would create them automatically and files would be destroyed when CPU is destroyed. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-12ACPI / processor: Use common hotplug infrastructureRafael J. Wysocki
Split the ACPI processor driver into two parts, one that is non-modular, resides in the ACPI core and handles the enumeration and hotplug of processors and one that implements the rest of the existing processor driver functionality. The non-modular part uses an ACPI scan handler object to enumerate processors on the basis of information provided by the ACPI namespace and to hook up with the common ACPI hotplug infrastructure. It also populates the ACPI handle of each processor device having a corresponding object in the ACPI namespace, which allows the driver proper to bind to those devices, and makes the driver bind to them if it is readily available (i.e. loaded) when the scan handler's .attach() routine is running. There are a few reasons to make this change. First, switching the ACPI processor driver to using the common ACPI hotplug infrastructure reduces code duplication and size considerably, even though a new file is created along with a header comment etc. Second, since the common hotplug code attempts to offline devices before starting the (non-reversible) removal procedure, it will abort (and possibly roll back) hot-remove operations involving processors if cpu_down() returns an error code for one of them instead of continuing them blindly (if /sys/firmware/acpi/hotplug/force_remove is unset). That is a more desirable behavior than what the current code does. Finally, the separation of the scan/hotplug part from the driver proper makes it possible to simplify the driver's .remove() routine, because it doesn't need to worry about the possible cleanup related to processor removal any more (the scan/hotplug part is responsible for that now) and can handle device removal and driver removal symmetricaly (i.e. as appropriate). Some user-visible changes in sysfs are made (for example, the 'sysdev' link from the ACPI device node to the processor device's directory is gone and a 'physical_node' link is present instead and a corresponding 'firmware_node' is present in the processor device's directory, the processor driver is now visible under /sys/bus/cpu/drivers/ and bound to the processor device), but that shouldn't affect the functionality that users care about (frequency scaling, C-states and thermal management). Tested on my venerable Toshiba Portege R500. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
2013-05-12Driver core: Use generic offline/online for CPU offline/onlineRafael J. Wysocki
Rework the CPU hotplug code in drivers/base/cpu.c to use the generic offline/online support introduced previously instead of its own CPU-specific code. For this purpose, modify cpu_subsys to provide offline and online callbacks for CONFIG_HOTPLUG_CPU set and remove the code handling the CPU-specific 'online' sysfs attribute. This modification is not supposed to change the user-observable behavior of the kernel (i.e. the 'online' attribute will be present in exactly the same place in sysfs and should trigger exactly the same actions as before). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
2013-04-29numa, cpu hotplug: change links of CPU and node when changing node number by ↵Yasuaki Ishimatsu
onlining CPU When booting x86 system contains memoryless node, node numbers of CPUs on memoryless node were changed to nearest online node number by init_cpu_to_node() because the node is not online. In my system, node numbers of cpu#30-44 and 75-89 were changed from 2 to 0 as follows: $ numactl --hardware available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 node 0 size: 32394 MB node 0 free: 27898 MB node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 node 1 size: 32768 MB node 1 free: 30335 MB If we hot add memory to memoryless node and offine/online all CPUs on the node, node numbers of these CPUs are changed to correct node numbers by srat_detect_node() because the node become online. In this case, node numbers of cpu#30-44 and 75-89 were changed from 0 to 2 in my system as follows: $ numactl --hardware available: 3 nodes (0-2) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 node 0 size: 32394 MB node 0 free: 27218 MB node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 node 1 size: 32768 MB node 1 free: 30014 MB node 2 cpus: 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 node 2 size: 16384 MB node 2 free: 16384 MB But "cpu to node" and "node to cpu" links were not changed as follows: $ ls /sys/devices/system/cpu/cpu30/|grep node node0 $ ls /sys/devices/system/node/node0/|grep cpu30 cpu30 "numactl --hardware" shows that cpu30 belongs to node 2. But sysfs links does not change. This patch changes "cpu to node" and "node to cpu" links when node number changed by onlining CPU. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-03sysfs: fix crash_notes_size build warningArnd Bergmann
commit eca4549f57 "sysfs: Add crash_notes_size to export percpu note size" adds a printk that outputs a size_t value as %lu when it should be %zu, resulting in this warning. drivers/base/cpu.c: In function 'show_crash_notes_size': drivers/base/cpu.c:142:2: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat=] Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-29sysfs: Add crash_notes_size to export percpu note sizeZhang Yanfei
For percpu notes, we are exporting only address and not size. So the userspace tool kexec-tools is putting an upper limit of 1024 and putting the value in p_memsz and p_filesz fields. So the patch add the new sysfile crash_notes_size to export the exact percpu note size and let the kexec-tools parse it intead of using 1024. The idea came from Vivek Goyal. And a later patch will be sent to kexec-tools to let it parse the size. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-16drivers/base/cpu.c: Fix typo in commentRalf Baechle
[ We should make fun of people who can't speel too, but then we'd have no time for any real work at all - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17sched: Remove stale power aware scheduling remnants and dysfunctional knobsPeter Zijlstra
It's been broken forever (i.e. it's not scheduling in a power aware fashion), as reported by Suresh and others sending patches, and nobody cares enough to fix it properly ... so remove it to make space free for something better. There's various problems with the code as it stands today, first and foremost the user interface which is bound to topology levels and has multiple values per level. This results in a state explosion which the administrator or distro needs to master and almost nobody does. Furthermore large configuration state spaces aren't good, it means the thing doesn't just work right because it's either under so many impossibe to meet constraints, or even if there's an achievable state workloads have to be aware of it precisely and can never meet it for dynamic workloads. So pushing this kind of decision to user-space was a bad idea even with a single knob - it's exponentially worse with knobs on every node of the topology. There is a proposal to replace the user interface with a single 3 state knob: sched_balance_policy := { performance, power, auto } where 'auto' would be the preferred default which looks at things like Battery/AC mode and possible cpufreq state or whatever the hw exposes to show us power use expectations - but there's been no progress on it in the past many months. Aside from that, the actual implementation of the various knobs is known to be broken. There have been sporadic attempts at fixing things but these always stop short of reaching a mergable state. Therefore this wholesale removal with the hopes of spurring people who care to come forward once again and work on a coherent replacement. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-09Merge 3.3-rc6 into driver-core-nextGreg Kroah-Hartman
This was done to resolve a conflict in the drivers/base/cpu.c file. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-08driver-core: cpu: fix kobject warning when hotplugging a cpuGreg Kroah-Hartman
Due to the sysdev conversion to struct device, the cpu objects get reused when adding a cpu after offlining it, which causes a big warning that the kobject portion is not properly initialized. So clear out the object before we register it again, so all is quiet. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>