summaryrefslogtreecommitdiff
path: root/drivers/thermal
AgeCommit message (Collapse)Author
2021-07-10Merge tag 'thermal-v5.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal updates from Daniel Lezcano: - Add rk3568 sensor support (Finley Xiao) - Add missing MODULE_DEVICE_TABLE for the Spreadtrum sensor (Chunyan Zhang) - Export additionnal attributes for the int340x thermal processor (Srinivas Pandruvada) - Add SC7280 compatible for the tsens driver (Rajeshwari Ravindra Kamble) - Fix kernel documentation for thermal_zone_device_unregister() and use devm_platform_get_and_ioremap_resource() (Yang Yingliang) - Fix coefficient calculations for the rcar_gen3 sensor driver (Niklas Söderlund) - Fix shadowing variable rcar_gen3_ths_tj_1 (Geert Uytterhoeven) - Add missing of_node_put() for the iMX and Spreadtrum sensors (Krzysztof Kozlowski) - Add tegra3 thermal sensor DT bindings (Dmitry Osipenko) - Stop the thermal zone monitoring when unregistering it to prevent a temperature update without the 'get_temp' callback (Dmitry Osipenko) - Add rk3568 DT bindings, convert bindings to yaml schemas and add the corresponding compatible in the Rockchip sensor (Ezequiel Garcia) - Add the sc8180x compatible for the Qualcomm tsensor (Bjorn Andersson) - Use the find_first_zero_bit() function instead of custom code (Andy Shevchenko) - Fix the kernel doc for the device cooling device (Yang Li) - Reorg the processor thermal int340x to set the scene for the PCI mmio driver (Srinivas Pandruvada) - Add PCI MMIO driver for the int340x processor thermal driver (Srinivas Pandruvada) - Add hwmon sensors for the mediatek sensor (Frank Wunderlich) - Fix warning for return value reported by Smatch for the int340x thermal processor (Srinivas Pandruvada) - Fix wrong register access and decoding for the int340x thermal processor (Srinivas Pandruvada) * tag 'thermal-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (23 commits) thermal/drivers/int340x/processor_thermal: Fix tcc setting thermal/drivers/int340x/processor_thermal: Fix warning for return value thermal/drivers/mediatek: Add sensors-support thermal/drivers/int340x/processor_thermal: Add PCI MMIO based thermal driver thermal/drivers/int340x/processor_thermal: Split enumeration and processing part thermal: devfreq_cooling: Fix kernel-doc thermal/drivers/intel/intel_soc_dts_iosf: Switch to use find_first_zero_bit() dt-bindings: thermal: tsens: Add sc8180x compatible dt-bindings: rockchip-thermal: Support the RK3568 SoC compatible dt-bindings: thermal: convert rockchip-thermal to json-schema thermal/core/thermal_of: Stop zone device before unregistering it dt-bindings: thermal: Add binding for Tegra30 thermal sensor thermal/drivers/sprd: Add missing of_node_put for loop iteration thermal/drivers/imx_sc: Add missing of_node_put for loop iteration thermal/drivers/rcar_gen3_thermal: Do not shadow rcar_gen3_ths_tj_1 thermal/drivers/rcar_gen3_thermal: Fix coefficient calculations thermal/drivers/st: Use devm_platform_get_and_ioremap_resource() thermal/core: Correct function name thermal_zone_device_unregister() dt-bindings: thermal: tsens: Add compatible string to TSENS binding for SC7280 thermal/drivers/int340x: processor_thermal: Export additional attributes ...
2021-07-04thermal/drivers/int340x/processor_thermal: Fix tcc settingSrinivas Pandruvada
The following fixes are done for tcc sysfs interface: - TCC is 6 bits only from bit 29-24 - TCC of 0 is valid - When BIT(31) is set, this register is read only - Check for invalid tcc value - Error for negative values Fixes: fdf4f2fb8e899 ("drivers: thermal: processor_thermal_device: Export sysfs interface for TCC offset") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: stable@vger.kernel.org Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210628215803.75038-1-srinivas.pandruvada@linux.intel.com
2021-07-04thermal/drivers/int340x/processor_thermal: Fix warning for return valueSrinivas Pandruvada
Fix smatch warnings: drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c:258 proc_thermal_pci_probe() warn: missing error code 'ret' Use PTR_ERR to return failure of thermal_zone_device_register(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210628183232.62877-1-srinivas.pandruvada@linux.intel.com
2021-07-04thermal/drivers/mediatek: Add sensors-supportFrank Wunderlich
Add HWMON-support to mediateks thermal driver to allow lm-sensors userspace tools read soc temperature Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210608154530.70074-1-linux@fw-web.de
2021-07-04thermal/drivers/int340x/processor_thermal: Add PCI MMIO based thermal driverSrinivas Pandruvada
Add a new PCI driver which register a thermal zone and allows to get notification for threshold violation by a RW trip point. These notifications are delivered from the device using MSI based interrupt. The main difference between this new PCI driver and the existing one is that the temperature and trip points directly use PCI MMIO instead of using ACPI methods. This driver registers a thermal zone "TCPU_PCI" in addition to the legacy processor thermal device, which uses ACPI companion device to set name, temperature and trips. This driver is enabled for AlderLake. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210525204811.3793651-3-srinivas.pandruvada@linux.intel.com
2021-07-04thermal/drivers/int340x/processor_thermal: Split enumeration and processing partSrinivas Pandruvada
Remove enumeration part from the processor_thermal_device to two different modules. One for ACPI and one for PCI: ACPI enumeration: int3401_thermal PCI part: processor_thermal_device_pci_legacy The current processor_thermal_device now just implements interface functions to be used by the ACPI and PCI enumeration module. This is done by: 1. Make functions proc_thermal_add() and proc_thermal_remove() non static and export them for usage in other processor_thermal_device_pci_legacy.c and in int3401_thermal.c. 2. Move the sysfs file creation for TCC offset and power limit attribute group to the proc_thermal_add() from the individual enumeration callbacks for PCI and ACPI. 3. Create new interface functions proc_thermal_mmio_add() and proc_thermal_mmio_remove() which will be called from the processor_thermal_device_pci_legacy module. 4. Export proc_thermal_resume(), so that it can be used by power management callbacks. 5. Remove special check for double enumeration as it never happens. While here, fix some cleanup on error conditions in proc_thermal_add(). No functional changes are expected with this change. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210525204811.3793651-2-srinivas.pandruvada@linux.intel.com
2021-07-04thermal: devfreq_cooling: Fix kernel-docYang Li
Fix function name in devfreq_cooling.c comment to remove a warning found by kernel-doc. drivers/thermal/devfreq_cooling.c:479: warning: expecting prototype for devfreq_cooling_em_register_power(). Prototype was for devfreq_cooling_em_register() instead. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1623223350-128104-1-git-send-email-yang.lee@linux.alibaba.com
2021-07-04thermal/drivers/intel/intel_soc_dts_iosf: Switch to use find_first_zero_bit()Andy Shevchenko
Switch to use find_first_zero_bit() instead of open-coded variant. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210618153451.89246-1-andriy.shevchenko@linux.intel.com
2021-07-04thermal/core/thermal_of: Stop zone device before unregistering itDmitry Osipenko
Zone device is enabled after thermal_zone_of_sensor_register() completion, but it's not disabled before senor is unregistered, leaving temperature polling active. This results in accessing a disabled zone device and produces a warning about this problem. Stop zone device before unregistering it in order to fix this "use-after-free" problem. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210616190417.32214-3-digetx@gmail.com
2021-06-28Merge tag 'sched-core-2021-06-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler udpates from Ingo Molnar: - Changes to core scheduling facilities: - Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables coordinated scheduling across SMT siblings. This is a much requested feature for cloud computing platforms, to allow the flexible utilization of SMT siblings, without exposing untrusted domains to information leaks & side channels, plus to ensure more deterministic computing performance on SMT systems used by heterogenous workloads. There are new prctls to set core scheduling groups, which allows more flexible management of workloads that can share siblings. - Fix task->state access anti-patterns that may result in missed wakeups and rename it to ->__state in the process to catch new abuses. - Load-balancing changes: - Tweak newidle_balance for fair-sched, to improve 'memcache'-like workloads. - "Age" (decay) average idle time, to better track & improve workloads such as 'tbench'. - Fix & improve energy-aware (EAS) balancing logic & metrics. - Fix & improve the uclamp metrics. - Fix task migration (taskset) corner case on !CONFIG_CPUSET. - Fix RT and deadline utilization tracking across policy changes - Introduce a "burstable" CFS controller via cgroups, which allows bursty CPU-bound workloads to borrow a bit against their future quota to improve overall latencies & batching. Can be tweaked via /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us. - Rework assymetric topology/capacity detection & handling. - Scheduler statistics & tooling: - Disable delayacct by default, but add a sysctl to enable it at runtime if tooling needs it. Use static keys and other optimizations to make it more palatable. - Use sched_clock() in delayacct, instead of ktime_get_ns(). - Misc cleanups and fixes. * tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits) sched/doc: Update the CPU capacity asymmetry bits sched/topology: Rework CPU capacity asymmetry detection sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag psi: Fix race between psi_trigger_create/destroy sched/fair: Introduce the burstable CFS controller sched/uclamp: Fix uclamp_tg_restrict() sched/rt: Fix Deadline utilization tracking during policy change sched/rt: Fix RT utilization tracking during policy change sched: Change task_struct::state sched,arch: Remove unused TASK_STATE offsets sched,timer: Use __set_current_state() sched: Add get_current_state() sched,perf,kvm: Fix preemption condition sched: Introduce task_is_running() sched: Unbreak wakeups sched/fair: Age the average idle time sched/cpufreq: Consider reduced CPU capacity in energy calculation sched/fair: Take thermal pressure into account while estimating energy thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure sched/fair: Return early from update_tg_cfs_load() if delta == 0 ...
2021-06-23Merge remote-tracking branch 'regulator/for-5.14' into regulator-nextMark Brown
2021-06-21thermal: Use generic HW-protection shutdown APIMatti Vaittinen
The hardware shutdown function was exported from kernel/reboot for other subsystems to use. Logic is copied from the thermal_core. The protection mutex is replaced by an atomic_t to allow calls also from an IRQ context. Also the WARN() was replaced by pr_emerg() based on discussions here: https://lore.kernel.org/lkml/YJuPwAZroVZ%2Fw633@alley/ and here: https://lore.kernel.org/linux-iommu/20210331093104.383705-4-geert+renesas@glider.be/ Use the exported API instead of implementing own just for the thermal_core. Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/5531e89d9e710f5d10e7cdce3ee58957335b9e03.1622628333.git.matti.vaittinen@fi.rohmeurope.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-06-18Merge branch 'sched/urgent' into sched/core, to resolve conflictsIngo Molnar
This commit in sched/urgent moved the cfs_rq_is_decayed() function: a7b359fc6a37: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") and this fresh commit in sched/core modified it in the old location: 9e077b52d86a: ("sched/pelt: Check that *_avg are null when *_sum are") Merge the two variants. Conflicts: kernel/sched/fair.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-17thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressureLukasz Luba
The thermal pressure signal gives information to the scheduler about reduced CPU capacity due to thermal. It is based on a value stored in a per-cpu 'thermal_pressure' variable. The online CPUs will get the new value there, while the offline won't. Unfortunately, when the CPU is back online, the value read from per-cpu variable might be wrong (stale data). This might affect the scheduler decisions, since it sees the CPU capacity differently than what is actually available. Fix it by making sure that all online+offline CPUs would get the proper value in their per-cpu variable when thermal framework sets capping. Fixes: f12e4f66ab6a3 ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping") Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lore.kernel.org/r/20210614191030.22241-1-lukasz.luba@arm.com
2021-06-14thermal/drivers/sprd: Add missing of_node_put for loop iterationKrzysztof Kozlowski
Early exits from for_each_available_child_of_node() should decrement the node reference counter. Reported by Coccinelle: drivers/thermal/sprd_thermal.c:387:1-23: WARNING: Function "for_each_child_of_node" should have of_node_put() before goto around lines 391. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Acked-by: Chunyan Zhang <zhang.lyra@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210614192230.19248-2-krzysztof.kozlowski@canonical.com
2021-06-14thermal/drivers/imx_sc: Add missing of_node_put for loop iterationKrzysztof Kozlowski
Early exits from for_each_available_child_of_node() should decrement the node reference counter. Reported by Coccinelle: drivers/thermal/imx_sc_thermal.c:93:1-33: WARNING: Function "for_each_available_child_of_node" should have of_node_put() before return around line 97. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Reviewed-by: Jacky Bai <ping.bai@nxp.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210614192230.19248-1-krzysztof.kozlowski@canonical.com
2021-06-14thermal/drivers/rcar_gen3_thermal: Do not shadow rcar_gen3_ths_tj_1Geert Uytterhoeven
With -Wshadow: drivers/thermal/rcar_gen3_thermal.c: In function ‘rcar_gen3_thermal_probe’: drivers/thermal/rcar_gen3_thermal.c:310:13: warning: declaration of ‘rcar_gen3_ths_tj_1’ shadows a global declaration [-Wshadow] 310 | const int *rcar_gen3_ths_tj_1 = of_device_get_match_data(dev); | ^~~~~~~~~~~~~~~~~~ drivers/thermal/rcar_gen3_thermal.c:246:18: note: shadowed declaration is here 246 | static const int rcar_gen3_ths_tj_1 = 126; | ^~~~~~~~~~~~~~~~~~ To add to the confusion, the local variable has a different type. Fix the shadowing by renaming the local variable to ths_tj_1. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/9ea7e65d0331daba96f9a7925cb3d12d2170efb1.1623076804.git.geert+renesas@glider.be
2021-06-14thermal/drivers/rcar_gen3_thermal: Fix coefficient calculationsNiklas Söderlund
The fixed value of 157 used in the calculations are only correct for M3-W, on other Gen3 SoC it should be 167. The constant can be derived correctly from the static TJ_3 constant and the SoC specific TJ_1 value. Update the calculation be correct on all Gen3 SoCs. Fixes: 4eb39f79ef44 ("thermal: rcar_gen3_thermal: Update value of Tj_1") Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210605085211.564909-1-niklas.soderlund+renesas@ragnatech.se
2021-06-14thermal/drivers/st: Use devm_platform_get_and_ioremap_resource()Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify code and remove error message which within devm_platform_get_and_ioremap_resource() already. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210605120205.2459578-1-yangyingliang@huawei.com
2021-06-12thermal/core: Correct function name thermal_zone_device_unregister()Yang Yingliang
Fix the following make W=1 kernel build warning: drivers/thermal/thermal_core.c:1376: warning: expecting prototype for thermal_device_unregister(). Prototype was for thermal_zone_device_unregister() instead Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210517051020.3463536-1-yangyingliang@huawei.com
2021-06-11thermal/drivers/int340x: processor_thermal: Export additional attributesSrinivas Pandruvada
Export additional attributes: ddr_data_rate (RO) : Show current DDR (Double Data Rate) data rate. rfi_restriction (RW) : Show or set current state for RFI (Radio Frequency Interference) protection. These attributes use mailbox commands to get/set information. Here command codes are: 0x0007: Read RFI restriction 0x0107: Read DDR data rate 0x0008: Write RFI restriction Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210517061441.1921901-3-srinivas.pandruvada@linux.intel.com
2021-06-11thermal/drivers/int340x: processor_thermal: Export mailbox interfaceSrinivas Pandruvada
Export the mailbox interface to be used by other modules. Also change command id and response from u8 to u32 data type. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210517061441.1921901-2-srinivas.pandruvada@linux.intel.com
2021-06-11thermal/drivers/sprd: Add missing MODULE_DEVICE_TABLEChunyan Zhang
MODULE_DEVICE_TABLE is used to extract the device information out of the driver and builds a table when being compiled. If using this macro, kernel can find the driver if available when the device is plugged in, and then loads that driver and initializes the device. Fixes: 554fdbaf19b18 ("thermal: sprd: Add Spreadtrum thermal driver support") Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210512093752.243168-1-zhang.lyra@gmail.com
2021-06-11thermal/drivers/rockchip: Support RK3568 SoCs in the thermal driverFinley Xiao
The RK3568 SoCs have two Temperature Sensors, channel 0 is for CPU, channel 1 is for GPU. Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com> Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210506175514.168365-5-ezequiel@collabora.com
2021-06-06Merge tag 'x86_urgent_for_v5.13-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "A bunch of x86/urgent stuff accumulated for the last two weeks so lemme unload it to you. It should be all totally risk-free, of course. :-) - Fix out-of-spec hardware (1st gen Hygon) which does not implement MSR_AMD64_SEV even though the spec clearly states so, and check CPUID bits first. - Send only one signal to a task when it is a SEGV_PKUERR si_code type. - Do away with all the wankery of reserving X amount of memory in the first megabyte to prevent BIOS corrupting it and simply and unconditionally reserve the whole first megabyte. - Make alternatives NOP optimization work at an arbitrary position within the patched sequence because the compiler can put single-byte NOPs for alignment anywhere in the sequence (32-bit retpoline), vs our previous assumption that the NOPs are only appended. - Force-disable ENQCMD[S] instructions support and remove update_pasid() because of insufficient protection against FPU state modification in an interrupt context, among other xstate horrors which are being addressed at the moment. This one limits the fallout until proper enablement. - Use cpu_feature_enabled() in the idxd driver so that it can be build-time disabled through the defines in disabled-features.h. - Fix LVT thermal setup for SMI delivery mode by making sure the APIC LVT value is read before APIC initialization so that softlockups during boot do not happen at least on one machine. - Mark all legacy interrupts as legacy vectors when the IO-APIC is disabled and when all legacy interrupts are routed through the PIC" * tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Check SME/SEV support in CPUID first x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR x86/setup: Always reserve the first 1M of RAM x86/alternative: Optimize single-byte NOPs at an arbitrary position x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid() dmaengine: idxd: Use cpu_feature_enabled() x86/thermal: Fix LVT thermal setup for SMI delivery mode x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
2021-05-31x86/thermal: Fix LVT thermal setup for SMI delivery modeBorislav Petkov
There are machines out there with added value crap^WBIOS which provide an SMI handler for the local APIC thermal sensor interrupt. Out of reset, the BSP on those machines has something like 0x200 in that APIC register (timestamps left in because this whole issue is timing sensitive): [ 0.033858] read lvtthmr: 0x330, val: 0x200 which means: - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled - bits [10:8] have 010b which means SMI delivery mode. Now, later during boot, when the kernel programs the local APIC, it soft-disables it temporarily through the spurious vector register: setup_local_APIC: ... /* * If this comes from kexec/kcrash the APIC might be enabled in * SPIV. Soft disable it before doing further initialization. */ value = apic_read(APIC_SPIV); value &= ~APIC_SPIV_APIC_ENABLED; apic_write(APIC_SPIV, value); which means (from the SDM): "10.4.7.2 Local APIC State After It Has Been Software Disabled ... * The mask bits for all the LVT entries are set. Attempts to reset these bits will be ignored." And this happens too: [ 0.124111] APIC: Switch to symmetric I/O mode setup [ 0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0 [ 0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0 This results in CPU 0 soft lockups depending on the placement in time when the APIC soft-disable happens. Those soft lockups are not 100% reproducible and the reason for that can only be speculated as no one tells you what SMM does. Likely, it confuses the SMM code that the APIC is disabled and the thermal interrupt doesn't doesn't fire at all, leading to CPU 0 stuck in SMM forever... Now, before 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()") due to how the APIC_LVTTHMR was read before APIC initialization in mcheck_intel_therm_init(), it would read the value with the mask bit 16 clear and then intel_init_thermal() would replicate it onto the APs and all would be peachy - the thermal interrupt would remain enabled. But that commit moved that reading to a later moment in intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too late and with its interrupt mask bit set. Thus, revert back to the old behavior of reading the thermal LVT register before the APIC gets initialized. Fixes: 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()") Reported-by: James Feeney <james@nurealm.net> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
2021-05-27thermal/drivers/qcom: Fix error code in adc_tm5_get_dt_channel_data()Yang Yingliang
Return -EINVAL when args is invalid instead of 'ret' which is set to zero by a previous successful call to a function. Fixes: ca66dca5eda6 ("thermal: qcom: add support for adc-tm5 PMIC thermal monitor") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210527092640.2070555-1-yangyingliang@huawei.com
2021-05-24thermal/ti-soc-thermal: Fix kernel-docYang Li
Fix function name in ti-bandgap.c kernel-doc comment to remove a warning. drivers/thermal/ti-soc-thermal/ti-bandgap.c:787: warning: expecting prototype for ti_bandgap_alert_init(). Prototype was for ti_bandgap_talert_init() instead. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Suman Anna <s-anna@ti.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1621851963-36548-1-git-send-email-yang.lee@linux.alibaba.com
2021-05-14thermal/drivers/intel: Initialize RW trip to THERMAL_TEMP_INVALIDSrinivas Pandruvada
After commit 81ad4276b505 ("Thermal: Ignore invalid trip points") all user_space governor notifications via RW trip point is broken in intel thermal drivers. This commits marks trip_points with value of 0 during call to thermal_zone_device_register() as invalid. RW trip points can be 0 as user space will set the correct trip temperature later. During driver init, x86_package_temp and all int340x drivers sets RW trip temperature as 0. This results in all these trips marked as invalid by the thermal core. To fix this initialize RW trips to THERMAL_TEMP_INVALID instead of 0. Cc: <stable@vger.kernel.org> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210430122343.1789899-1-srinivas.pandruvada@linux.intel.com
2021-05-05Merge tag 'thermal-v5.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal updates from Daniel Lezcano: - Remove duplicate error message for the amlogic driver (Tang Bin) - Fix spellos in comments for the tegra and sun8i (Bhaskar Chowdhury) - Add the missing fifth node on the rcar_gen3 sensor (Niklas Söderlund) - Remove duplicate include in ti-bandgap (Zhang Yunkai) - Assign error code in the error path in the function thermal_of_populate_bind_params() (Jia-Ju Bai) - Fix spelling mistake in a comment 'disabed' -> 'disabled' (Colin Ian King) - Use the device name instead of auto-numbering for a better identification of the cooling device (Daniel Lezcano) - Improve a bit the division accuracy in the power allocator governor (Jeson Gao) - Enable the missing third sensor on msm8976 (Konrad Dybcio) - Add QCom tsens driver co-maintainer (Thara Gopinath) - Fix memory leak and use after free errors in the core code (Daniel Lezcano) - Add the MDM9607 compatible bindings (Konrad Dybcio) - Fix trivial spello in the copyright name for Hisilicon (Hao Fang) - Fix negative index array access when converting the frequency to power in the energy model (Brian-sy Yang) - Add support for Gen2 new PMIC support for Qcom SPMI (David Collins) - Update maintainer file for CPU cooling device section (Lukasz Luba) - Fix missing put_device on error in the Qcom tsens driver (Guangqing Zhu) - Add compatible DT binding for sm8350 (Robert Foss) - Add support for the MDM9607's tsens driver (Konrad Dybcio) - Remove duplicate error messages in thermal_mmio and the bcm2835 driver (Ruiqi Gong) - Add the Thermal Temperature Cooling driver (Zhang Rui) - Remove duplicate error messages in the Hisilicon sensor driver (Ye Bin) - Use the devm_platform_ioremap_resource_byname() function instead of a couple of corresponding calls (dingsenjie) - Sort the headers alphabetically in the ti-bandgap driver (Zhen Lei) - Add missing property in the DT thermal sensor binding (Rafał Miłecki) - Remove dead code in the ti-bandgap sensor driver (Lin Ruizhe) - Convert the BRCM DT bindings to the yaml schema (Rafał Miłecki) - Replace the thermal_notify_framework() call by a call to the thermal_zone_device_update() function. Remove the function as well as the corresponding documentation (Thara Gopinath) - Add support for the ipq8064-tsens sensor along with a set of cleanups and code preparation (Ansuel Smith) - Add a lockless __thermal_cdev_update() function to improve the locking scheme in the core code and governors (Lukasz Luba) - Fix multiple cooling device notification changes (Lukasz Luba) - Remove unneeded variable initialization (Colin Ian King) * tag 'thermal-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (55 commits) thermal/drivers/mtk_thermal: Remove redundant initializations of several variables thermal/core/power allocator: Use the lockless __thermal_cdev_update() function thermal/core/fair share: Use the lockless __thermal_cdev_update() function thermal/core/fair share: Lock the thermal zone while looping over instances thermal/core/power_allocator: Update once cooling devices when temp is low thermal/core/power_allocator: Maintain the device statistics from going stale thermal/core: Create a helper __thermal_cdev_update() without a lock dt-bindings: thermal: tsens: Document ipq8064 bindings thermal/drivers/tsens: Add support for ipq8064-tsens thermal/drivers/tsens: Drop unused define for msm8960 thermal/drivers/tsens: Replace custom 8960 apis with generic apis thermal/drivers/tsens: Fix bug in sensor enable for msm8960 thermal/drivers/tsens: Use init_common for msm8960 thermal/drivers/tsens: Add VER_0 tsens version thermal/drivers/tsens: Convert msm8960 to reg_field thermal/drivers/tsens: Don't hardcode sensor slope Documentation: driver-api: thermal: Remove thermal_notify_framework from documentation thermal/core: Remove thermal_notify_framework iwlwifi: mvm: tt: Replace thermal_notify_framework dt-bindings: thermal: brcm,ns-thermal: Convert to the json-schema ...
2021-04-22thermal/drivers/mtk_thermal: Remove redundant initializations of several ↵Colin Ian King
variables Several variables are being initialized with values that is never read and being updated later with a new value. The initializations are redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210422120412.246291-1-colin.king@canonical.com
2021-04-22thermal/core/power allocator: Use the lockless __thermal_cdev_update() functionLukasz Luba
Use the new helper function and avoid unnecessery second lock/unlock, which was present in old approach with thermal_cdev_update(). Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210422153624.6074-4-lukasz.luba@arm.com
2021-04-22thermal/core/fair share: Use the lockless __thermal_cdev_update() functionLukasz Luba
Use the new helper function and avoid unnecessery second lock/unlock, which was present in old approach with thermal_cdev_update(). Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210422153624.6074-3-lukasz.luba@arm.com
2021-04-22thermal/core/fair share: Lock the thermal zone while looping over instancesLukasz Luba
The tz->lock must be hold during the looping over the instances in that thermal zone. This lock was missing in the governor code since the beginning, so it's hard to point into a particular commit. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210422153624.6074-2-lukasz.luba@arm.com
2021-04-22thermal/core/power_allocator: Update once cooling devices when temp is lowLukasz Luba
The cooling device state change generates an event, also when there is no need, because temperature is low and device is not throttled. Avoid to unnecessary update the cooling device which means also not sending event. The cooling device state has not changed because the temperature is still below the first activation trip point value, so we can do this. Add a tracking mechanism to make sure it updates cooling devices only once - when the temperature dropps below first trip point. Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210422114308.29684-4-lukasz.luba@arm.com
2021-04-22thermal/core/power_allocator: Maintain the device statistics from going staleLukasz Luba
When the temperature is below the first activation trip point the cooling devices are not checked, so they cannot maintain fresh statistics. It leads into the situation, when temperature crosses first trip point, the statistics are stale and show state for very long period. This has impact on IPA algorithm calculation and wrong decisions. Thus, check the cooling devices even when the temperature is low, to refresh these statistics. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210422114308.29684-3-lukasz.luba@arm.com
2021-04-22thermal/core: Create a helper __thermal_cdev_update() without a lockLukasz Luba
There is a need to have a helper function which updates cooling device state from the governors code. With this change governor can use lock and unlock while calling helper function. This avoid unnecessary second time lock/unlock which was in previous solution present in governor implementation. This new helper function must be called with mutex 'cdev->lock' hold. The changed been discussed and part of code presented in thread: https://lore.kernel.org/linux-pm/20210419084536.25000-1-lukasz.luba@arm.com/ Co-developed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://lore.kernel.org/r/20210422114308.29684-2-lukasz.luba@arm.com
2021-04-22thermal/drivers/tsens: Add support for ipq8064-tsensAnsuel Smith
Add support for tsens present in ipq806x SoCs based on generic msm8960 tsens driver. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210420183343.2272-9-ansuelsmth@gmail.com
2021-04-22thermal/drivers/tsens: Drop unused define for msm8960Ansuel Smith
Drop unused define for msm8960 replaced by generic api and reg_field. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210420183343.2272-8-ansuelsmth@gmail.com
2021-04-22thermal/drivers/tsens: Replace custom 8960 apis with generic apisAnsuel Smith
Rework calibrate function to use common function. Derive the offset from a missing hardcoded slope table and the data from the nvmem calib efuses. Drop custom get_temp function and use generic api. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Acked-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210420183343.2272-7-ansuelsmth@gmail.com
2021-04-22thermal/drivers/tsens: Fix bug in sensor enable for msm8960Ansuel Smith
Device based on tsens VER_0 contains a hardware bug that results in some problem with sensor enablement. Sensor id 6-11 can't be enabled selectively and all of them must be enabled in one step. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Acked-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210420183343.2272-6-ansuelsmth@gmail.com
2021-04-22thermal/drivers/tsens: Use init_common for msm8960Ansuel Smith
Use init_common and drop custom init for msm8960. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210420183343.2272-5-ansuelsmth@gmail.com
2021-04-22thermal/drivers/tsens: Add VER_0 tsens versionAnsuel Smith
VER_0 is used to describe device based on tsens version before v0.1. These device are devices based on msm8960 for example apq8064 or ipq806x. Add support for VER_0 in tsens.c and set the right tsens feat in tsens-8960.c file. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210420183343.2272-4-ansuelsmth@gmail.com
2021-04-22thermal/drivers/tsens: Convert msm8960 to reg_fieldAnsuel Smith
Convert msm9860 driver to reg_field to use the init_common function. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Acked-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210420183343.2272-3-ansuelsmth@gmail.com
2021-04-22thermal/drivers/tsens: Don't hardcode sensor slopeAnsuel Smith
Function compute_intercept_slope hardcode the sensor slope to SLOPE_DEFAULT. Change this and use the default value only if a slope is not defined. This is needed for tsens VER_0 that has a hardcoded slope table. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210420183343.2272-2-ansuelsmth@gmail.com
2021-04-22thermal/core: Remove thermal_notify_frameworkThara Gopinath
thermal_notify_framework just updates for a single trip point where as thermal_zone_device_update does other bookkeeping like updating the temperature of the thermal zone and setting the next trip point. The only driver that was using thermal_notify_framework was updated in the previous patch to use thermal_zone_device_update instead. Since there are no users for thermal_notify_framework remove it. Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210122023406.3500424-3-thara.gopinath@linaro.org
2021-04-21thermal/drivers/ti-soc-thermal/bandgap Remove unused variable 'val'Lin Ruizhe
The function ti_bandgap_restore_ctxt() restores the context at resume time. It checks if the sensor has a counter, reads the register but does nothing with the value. The block was probably omitted by the commit b87ea759a4cc. Remove the unused variable as well as the block using it as we can consider it as dead code. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: b87ea759a4cc ("staging: omap-thermal: fix context restore function") Signed-off-by: Lin Ruizhe <linruizhe@huawei.com> Reviewed-by: Tony Lindgren <tony@atomide.com> Tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210421084256.57591-1-linruizhe@huawei.com
2021-04-20thermal/drivers/ti-soc-thermal/ti-bandgap: Rearrange all the included header ↵Zhen Lei
files alphabetically For the sake of lisibility, reorder the header files alphabetically. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210406091912.2583-2-thunder.leizhen@huawei.com
2021-04-20thermal/drivers/tegra: Use devm_platform_ioremap_resource_bynamedingsenjie
Use the devm_platform_ioremap_resource_byname() helper instead of calling platform_get_resource_byname() and devm_ioremap_resource() separately. Signed-off-by: dingsenjie <dingsenjie@yulong.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210414063943.96244-1-dingsenjie@163.com
2021-04-20thermal/drivers/hisi: Remove redundant dev_err call in hisi_thermal_probe()Ye Bin
There is a error message within devm_ioremap_resource already, so remove the dev_err call to avoid redundant error message. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210409075224.2109503-1-yebin10@huawei.com