path: root/arch
Age | Commit message | Author
2020-09-03  MIPS: SNI: Fix SCSI interrupt  (Thomas Bogendoerfer)
On RM400 (a20r) machines, ISA and SCSI interrupts share the same interrupt line. Commit 49e6e07e3c80 ("MIPS: pass non-NULL dev_id on shared request_irq()") accidentally dropped the IRQF_SHARED bit, which breaks registration of the SCSI interrupt. Put back IRQF_SHARED and add a dev_id for the ISA interrupt. Fixes: 49e6e07e3c80 ("MIPS: pass non-NULL dev_id on shared request_irq()") Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
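A minimal sketch of the pattern the fix restores is shown below. The handler and setup function names are hypothetical; only request_irq(), IRQF_SHARED and the non-NULL dev_id argument come from the commit message.

    #include <linux/interrupt.h>

    /* Hypothetical handler: on a shared line it must return IRQ_NONE
     * when the interrupt did not come from its own device. */
    static irqreturn_t sni_scsi_isr(int irq, void *dev_id)
    {
            /* ... poll the controller, bail out if it was not ours ... */
            return IRQ_HANDLED;
    }

    static int sni_setup_shared_irq(int irq, void *priv)
    {
            /* Both pieces matter on a shared line: IRQF_SHARED so other
             * users (here the ISA interrupt) can attach to the same IRQ,
             * and a non-NULL dev_id so the handlers can be told apart. */
            return request_irq(irq, sni_scsi_isr, IRQF_SHARED,
                               "sni_scsi", priv);
    }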
2020-09-03  MIPS: add missing MSACSR and upper MSA initialization  (Huang Pei)
In commit cc97ab235f3f ("MIPS: Simplify FP context initialization"), init_fp_ctx() only initializes the FP/MSA context, and own_fp_inatomic() only restores FCSR and the 64-bit FP registers from it, but misses MSACSR and the upper MSA registers. As a result, the MSACSR and upper MSA register values from the previous task on the current CPU can leak into the current task and cause unpredictable behaviour when the MSA context has not been initialized. Fixes: cc97ab235f3f ("MIPS: Simplify FP context initialization") Signed-off-by: Huang Pei <huangpei@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-09-03  x86/mm/32: Bring back vmalloc faulting on x86_32  (Joerg Roedel)
One can not simply remove vmalloc faulting on x86-32. Upstream commit 7f0a002b5a21 ("x86/mm: remove vmalloc faulting") removed it on x86 altogether because the arch_sync_kernel_mappings() interface had been introduced beforehand. This interface added synchronization of vmalloc/ioremap page-table updates to all page-tables in the system at creation time and was thought to make vmalloc faulting obsolete. But that assumption was incredibly naive.

It turned out that there is a race window between the time the vmalloc or ioremap code establishes a mapping and the time it synchronizes this change to other page-tables in the system. During this race window another CPU or thread can establish a vmalloc mapping which uses the same intermediate page-table entries (e.g. PMD or PUD) and does no synchronization in the end, because it found all necessary mappings already present in the kernel reference page-table. But when these intermediate page-table entries are not yet synchronized, the other CPU or thread will continue with a vmalloc address that is not yet mapped in the page-table it currently uses, causing an unhandled page fault and an oops like the one below:

    BUG: unable to handle page fault for address: fe80c000
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    *pde = 33183067 *pte = a8648163
    Oops: 0002 [#1] SMP
    CPU: 1 PID: 13514 Comm: cve-2017-17053 Tainted: G ...
    Call Trace:
     ldt_dup_context+0x66/0x80
     dup_mm+0x2b3/0x480
     copy_process+0x133b/0x15c0
     _do_fork+0x94/0x3e0
     __ia32_sys_clone+0x67/0x80
     __do_fast_syscall_32+0x3f/0x70
     do_fast_syscall_32+0x29/0x60
     do_SYSENTER_32+0x15/0x20
     entry_SYSENTER_32+0x9f/0xf2
    EIP: 0xb7eef549

So the arch_sync_kernel_mappings() interface is racy, but removing it would mean re-introducing the vmalloc_sync_all() interface, which is even more awful. Keep arch_sync_kernel_mappings() in place and catch the race condition in the page-fault handler instead. Do a partial revert of the above commit to get vmalloc faulting on x86-32 back in place. Fixes: 7f0a002b5a21 ("x86/mm: remove vmalloc faulting") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200902155904.17544-1-joro@8bytes.org
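For readers unfamiliar with vmalloc faulting, the simplified sketch below illustrates the idea being restored. It is an assumption-laden reconstruction, not the actual code from arch/x86/mm/fault.c, and the function name is made up.

    /* Sketch: on a fault at a vmalloc address, lazily copy the missing PMD
     * entry from the init_mm reference page table into the page table the
     * CPU is currently using, closing the race described above. */
    static noinline int vmalloc_fault_sketch(unsigned long address)
    {
            pgd_t *pgd   = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address);
            pgd_t *pgd_k = init_mm.pgd + pgd_index(address);
            pmd_t *pmd, *pmd_k;

            if (address < VMALLOC_START || address >= VMALLOC_END)
                    return -1;

            pmd_k = pmd_offset(pud_offset(p4d_offset(pgd_k, address), address), address);
            if (pmd_none(*pmd_k))
                    return -1;                      /* genuinely unmapped */

            pmd = pmd_offset(pud_offset(p4d_offset(pgd, address), address), address);
            if (pmd_none(*pmd))
                    set_pmd(pmd, *pmd_k);           /* sync the missing entry */

            return 0;
    }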
2020-09-03  x86/cmdline: Disable jump tables for cmdline.c  (Arvind Sankar)
When CONFIG_RETPOLINE is disabled, Clang uses a jump table for the switch statement in cmdline_find_option (jump tables are disabled when CONFIG_RETPOLINE is enabled). This function is called very early in boot from sme_enable() if CONFIG_AMD_MEM_ENCRYPT is enabled. At this time, the kernel is still executing out of the identity mapping, but the jump table will contain virtual addresses. Fix this by disabling jump tables for cmdline.c when AMD_MEM_ENCRYPT is enabled. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200903023056.3914690-1-nivedita@alum.mit.edu
2020-09-03  KVM: PPC: Book3S HV: XICS: Replace the 'destroy' method by a 'release' method  (Greg Kurz)
Similarly to what was done with the XICS-on-XIVE and XIVE native KVM devices in commit 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method"), convert the historical XICS KVM device to implement the 'release' method. This is needed to run nested guests with an in-kernel IRQ chip. A typical POWER9 guest can select XICS or XIVE during boot, which requires being able to destroy and re-create the KVM device. Only the historical XICS KVM device is available under pseries at the current time and it still uses the legacy 'destroy' method. Switching to 'release' means that vCPUs might still be running when the device is destroyed. In order to avoid potential use-after-free, the kvmppc_xics structure is allocated on first usage and kept around until the VM exits. The same pointer is used each time a KVM XICS device is being created, but this is okay since we only have one per VM. Clear the ICP of each vCPU with vcpu->mutex held. This ensures that the next time the vCPU resumes execution, it won't be going into the XICS code anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-09-02  s390: update defconfigs  (Heiko Carstens)
Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-02  s390: fix GENERIC_LOCKBREAK dependency typo in Kconfig  (Eric Farman)
Commit fa686453053b ("sched/rt, s390: Use CONFIG_PREEMPTION") changed a bunch of uses of CONFIG_PREEMPT to _PREEMPTION. Except in the Kconfig it used two T's. That's the only place in the system where that spelling exists, so let's fix that. Fixes: fa686453053b ("sched/rt, s390: Use CONFIG_PREEMPTION") Cc: <stable@vger.kernel.org> # 5.6 Signed-off-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-02  arm64/module: set trampoline section flags regardless of CONFIG_DYNAMIC_FTRACE  (Jessica Yu)
In the arm64 module linker script, the section .text.ftrace_trampoline is specified unconditionally, regardless of whether CONFIG_DYNAMIC_FTRACE is enabled (this is simply due to the limitation that module linker scripts are not preprocessed like the vmlinux one). Normally, for .plt and .text.ftrace_trampoline, the section flags present in the module binary wouldn't matter since module_frob_arch_sections() would assign them manually anyway. However, the arm64 module loader only sets the section flags for .text.ftrace_trampoline when CONFIG_DYNAMIC_FTRACE=y. That has only become problematic recently, with a change in binutils-2.35 where the .text.ftrace_trampoline section (along with the .plt section) is now marked writable and executable (WAX). We no longer allow writable and executable sections to be loaded due to commit 5c3a7db0c7ec ("module: Harden STRICT_MODULE_RWX"), so this is causing all modules linked with binutils-2.35 to be rejected under arm64. Drop the IS_ENABLED(CONFIG_DYNAMIC_FTRACE) check in module_frob_arch_sections() so that the section flags for .text.ftrace_trampoline get properly set to SHF_EXECINSTR|SHF_ALLOC, without SHF_WRITE. Signed-off-by: Jessica Yu <jeyu@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: http://lore.kernel.org/r/20200831094651.GA16385@linux-8ccs Link: https://lore.kernel.org/r/20200901160016.3646-1-jeyu@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
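A hedged sketch of the resulting behaviour in module_frob_arch_sections() is shown below; the helper name and simplified shape are assumptions, only the flag values and the dropped IS_ENABLED() check come from the commit message.

    /* Sketch: always force sane flags on .text.ftrace_trampoline instead of
     * doing it only when CONFIG_DYNAMIC_FTRACE is enabled, so a WAX-flagged
     * section emitted by binutils-2.35 never reaches the module loader. */
    static void frob_trampoline_section(Elf_Shdr *tramp)
    {
            /* was: if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE) && ...) */
            tramp->sh_flags = SHF_EXECINSTR | SHF_ALLOC;    /* no SHF_WRITE */
    }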
2020-09-02  arm64: Remove exporting cpu_logical_map symbol  (Sudeep Holla)
Commit eaecca9e7710 ("arm64: Fix __cpu_logical_map undefined issue") exported cpu_logical_map in order to fix a tegra194-cpufreq module build failure. As this might potentially cause problems while supporting physical CPU hotplug, the tegra194-cpufreq module was reworked to avoid the use of cpu_logical_map() via commit 93d0c1ab2328 ("cpufreq: replace cpu_logical_map() with read_cpuid_mpir()"). Since cpu_logical_map was only exported temporarily to fix the module build, remove the export again before it gains any users. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20200901095229.56793-1-sudeep.holla@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-09-01  ARC: [plat-hsdk]: Switch ethernet phy-mode to rgmii-id  (Evgeniy Didin)
The HSDK board has a Micrel KSZ9031 PHY; recent commit bcf3440c6dd ("net: phy: micrel: add phy-mode support for the KSZ9031 PHY") caused a breakdown of Ethernet. Using 'phy-mode = "rgmii"' is not correct because, according to the RGMII specification, it is necessary to have a delay on RX (PHY to MAC) which is not generated in the "rgmii" case. Using "rgmii-id" adds the necessary delay and solves the issue. Also add the name of the PHY placed on the HSDK board. Signed-off-by: Evgeniy Didin <Evgeniy.Didin@synopsys.com> Cc: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Cc: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2020-09-01  arc: fix memory initialization for systems with two memory banks  (Mike Rapoport)
The rework of memory map initialization broke initialization of ARC systems with two memory banks. Before these changes, memblock was not aware of the node configuration and the memory map was always allocated from the "lowmem" bank. After the addition of node information to memblock, the core mm attempts to allocate the memory map for the "highmem" bank from its node. The access to this memory using __va() fails because it can only be accessed using kmap. Another problem that was uncovered is that {min,max}_high_pfn are calculated from the u64 high_mem_start variable, which prevents truncation to a 32-bit physical address, and the PFN values are above the node and zone boundaries. Use the phys_addr_t type for high_mem_start and high_mem_size to ensure correspondence between PFNs and highmem zone boundaries, and reserve the entire highmem bank until mem_init() to avoid accesses to it before highmem is enabled. To test this:
 1. Enable HIGHMEM in the ARC config
 2. Enable 2 memory banks in haps_hs.dts (uncomment the 2nd bank)
Fixes: 51930df5801e ("mm: free_area_init: allow defining max_zone_pfn in descending order") Cc: stable@vger.kernel.org [5.8] Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: added instructions to test highmem]
2020-09-01  ia64: fix min_low_pfn/max_low_pfn build errors  (Randy Dunlap)
Fix min_low_pfn/max_low_pfn build errors for arch/ia64/: (e.g.)
  ERROR: "max_low_pfn" [drivers/rpmsg/virtio_rpmsg_bus.ko] undefined!
  ERROR: "min_low_pfn" [drivers/rpmsg/virtio_rpmsg_bus.ko] undefined!
  ERROR: "min_low_pfn" [drivers/hwtracing/intel_th/intel_th_msu.ko] undefined!
  ERROR: "max_low_pfn" [drivers/hwtracing/intel_th/intel_th_msu.ko] undefined!
  ERROR: "min_low_pfn" [drivers/crypto/cavium/nitrox/n5pf.ko] undefined!
  ERROR: "max_low_pfn" [drivers/crypto/cavium/nitrox/n5pf.ko] undefined!
  ERROR: "max_low_pfn" [drivers/md/dm-integrity.ko] undefined!
  ERROR: "min_low_pfn" [drivers/md/dm-integrity.ko] undefined!
  ERROR: "max_low_pfn" [crypto/tcrypt.ko] undefined!
  ERROR: "min_low_pfn" [crypto/tcrypt.ko] undefined!
  ERROR: "min_low_pfn" [security/keys/encrypted-keys/encrypted-keys.ko] undefined!
  ERROR: "max_low_pfn" [security/keys/encrypted-keys/encrypted-keys.ko] undefined!
  ERROR: "min_low_pfn" [arch/ia64/kernel/mca_recovery.ko] undefined!
  ERROR: "max_low_pfn" [arch/ia64/kernel/mca_recovery.ko] undefined!
David suggested just exporting min_low_pfn & max_low_pfn in mm/memblock.c: https://lore.kernel.org/lkml/alpine.DEB.2.22.394.2006291911220.1118534@chino.kir.corp.google.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: linux-ia64@vger.kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
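The suggested fix amounts to exporting the two symbols from mm/memblock.c; a minimal sketch of what that looks like is below (the exact placement next to the existing definitions is assumed).

    /* mm/memblock.c (sketch): the variables already live here; the fix is
     * the EXPORT_SYMBOL() lines so that modules can link against them. */
    #include <linux/export.h>

    unsigned long min_low_pfn;
    EXPORT_SYMBOL(min_low_pfn);

    unsigned long max_low_pfn;
    EXPORT_SYMBOL(max_low_pfn);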
2020-09-01  MIPS: perf: Fix wrong check condition of Loongson event IDs  (Tiezhu Yang)
According to chapter 8.2.1 of the user's manuals of the Loongson 3A2000 CPU [1] and 3A3000 CPU [2], event IDs such as 274, 358, 359 and 360 should be treated as valid in the check condition; otherwise they are recognized as "not supported". Fix the check accordingly. [1] http://www.loongson.cn/uploadfile/cpu/3A2000/Loongson3A2000_user2.pdf [2] http://www.loongson.cn/uploadfile/cpu/3A3000/Loongson3A3000_3B3000user2.pdf Fixes: e9dfbaaeef1c ("MIPS: perf: Add hardware perf events support for new Loongson-3") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Huang Pei <huangpei@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
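The sketch below illustrates the kind of validity check the fix adjusts; the helper name and the elided surrounding range checks are assumptions for illustration, only the four event IDs come from the commit message.

    /* Hypothetical helper: accept the documented-but-previously-rejected
     * Loongson-3 event IDs instead of reporting them as unsupported. */
    static bool loongson3_event_id_valid(unsigned int event_id)
    {
            /* IDs that the old condition wrongly rejected */
            if (event_id == 274 || event_id == 358 ||
                event_id == 359 || event_id == 360)
                    return true;

            /* ... the remaining checks against the ranges documented in the
             *     CPU user's manuals are unchanged and elided in this stub ... */
            return false;
    }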
2020-08-31  Merge tag 'mmc-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc  (Linus Torvalds)
Pull MMC fixes from Ulf Hansson:
 - Fix HS400 tuning for ACPI ID AMDI0040
 - Fix reset of CQHCI for Intel GLK-based controllers
 - Use correct timeout clock for Tegra186/194/210
 - Fix eMMC mounting on mt7622/Bpi-64

* tag 'mmc-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  sdhci: tegra: Add missing TMCLK for data timeout
  arm64: tegra: Add missing timeout clock to Tegra194 SDMMC nodes
  arm64: tegra: Add missing timeout clock to Tegra186 SDMMC nodes
  arm64: tegra: Add missing timeout clock to Tegra210 SDMMC
  dt-bindings: mmc: tegra: Add tmclk for Tegra210 and later
  sdhci: tegra: Remove SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK for Tegra186
  sdhci: tegra: Remove SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK for Tegra210
  arm64: dts: mt7622: add reset node for mmc device
  dt-bindings: mmc: Add missing description for clk_in/out_sd1
  mmc: mediatek: add optional module reset property
  mmc: dt-bindings: Add resets/reset-names for Mediatek MMC bindings
  mmc: sdhci-pci: Fix SDHCI_RESET_ALL for CQHCI for Intel GLK-based controllers
  mmc: sdhci-acpi: Fix HS400 tuning for AMDI0040
2020-08-31  microblaze: fix min_low_pfn/max_low_pfn build errors  (Randy Dunlap)
Fix min_low_pfn/max_low_pfn build errors for arch/microblaze/: (e.g.)
  ERROR: "min_low_pfn" [drivers/rpmsg/virtio_rpmsg_bus.ko] undefined!
  ERROR: "min_low_pfn" [drivers/hwtracing/intel_th/intel_th_msu_sink.ko] undefined!
  ERROR: "min_low_pfn" [drivers/hwtracing/intel_th/intel_th_msu.ko] undefined!
  ERROR: "min_low_pfn" [drivers/mmc/core/mmc_core.ko] undefined!
  ERROR: "min_low_pfn" [drivers/md/dm-crypt.ko] undefined!
  ERROR: "min_low_pfn" [drivers/net/wireless/ath/ath6kl/ath6kl_sdio.ko] undefined!
  ERROR: "min_low_pfn" [crypto/tcrypt.ko] undefined!
  ERROR: "min_low_pfn" [crypto/asymmetric_keys/asym_tpm.ko] undefined!
Mike had/has an alternate patch for Microblaze: https://lore.kernel.org/lkml/20200630111519.GA1951986@linux.ibm.com/ David suggested just exporting min_low_pfn & max_low_pfn in mm/memblock.c: https://lore.kernel.org/lkml/alpine.DEB.2.22.394.2006291911220.1118534@chino.kir.corp.google.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Michal Simek <monstr@monstr.eu> Acked-by: David Rientjes <rientjes@google.com> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-08-30  x86/kvm: Expose TSX Suspend Load Tracking feature  (Cathy Zhang)
The TSX suspend load tracking instructions are supported by the Intel Sapphire Rapids uarch. They give a way to choose which memory accesses do not need to be tracked in the TSX read set. Their availability is indicated by CPUID.(EAX=7,ECX=0):EDX[bit 16]. Expose the TSX Suspend Load Address Tracking feature in KVM CPUID, so KVM can pass this information to guests and they can make use of this feature accordingly. Signed-off-by: Cathy Zhang <cathy.zhang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1598316478-23337-3-git-send-email-cathy.zhang@intel.com
2020-08-30  Merge tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull x86 fixes from Thomas Gleixner:
 "Three interrupt related fixes for X86:
 - Move disabling of the local APIC after invoking fixup_irqs() to ensure that interrupts which are incoming are noted in the IRR and not ignored.
 - Unbreak affinity setting. The rework of the entry code reused the regular exception entry code for device interrupts. The vector number is pushed into the errorcode slot on the stack which is then lifted into an argument and set to -1 because that's regs->orig_ax which is used in quite some places to check whether the entry came from a syscall. But it was overlooked that orig_ax is used in the affinity cleanup code to validate whether the interrupt has arrived on the new target. It turned out that this vector check is pointless because interrupts are never moved from one vector to another on the same CPU. That check is a historical leftover from the time where x86 supported multi-CPU affinities, but it is no longer needed with the now strict single CPU affinity. Famous last words ...
 - Add a missing check for an empty cpumask into the matrix allocator. The affinity change added a warning to catch the case where an interrupt is moved on the same CPU to a different vector. This triggers because a condition with an empty cpumask returns an assignment from the allocator as the allocator uses for_each_cpu() without checking the cpumask for being empty. The historical inconsistent for_each_cpu() behaviour of ignoring the cpumask and unconditionally claiming that CPU0 is in the mask struck again. Sigh.
 plus a new entry into the MAINTAINER file for the HPE/UV platform"

* tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/matrix: Deal with the sillyness of for_each_cpu() on UP
  x86/irq: Unbreak interrupt affinity setting
  x86/hotplug: Silence APIC only after all interrupts are migrated
  MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers
2020-08-30  Merge tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull irq fixes from Thomas Gleixner:
 "A set of fixes for interrupt chip drivers:
 - Revert the platform driver conversion of interrupt chip drivers as it turned out to create more problems than it solves.
 - Fix a trivial typo in the new module helpers which made probing reliably fail.
 - Small fixes in the STM32 and MIPS Ingenic drivers
 - The TI firmware rework which had badly managed dependencies and had to wait post rc1"

* tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/ingenic: Leave parent IRQ unmasked on suspend
  irqchip/stm32-exti: Avoid losing interrupts due to clearing pending bits by mistake
  irqchip: Revert modular support for drivers using IRQCHIP_PLATFORM_DRIVER helperse
  irqchip: Fix probing deferal when using IRQCHIP_PLATFORM_DRIVER helpers
  arm64: dts: k3-am65: Update the RM resource types
  arm64: dts: k3-am65: ti-sci-inta/intr: Update to latest bindings
  arm64: dts: k3-j721e: ti-sci-inta/intr: Update to latest bindings
  irqchip/ti-sci-inta: Add support for INTA directly connecting to GIC
  irqchip/ti-sci-inta: Do not store TISCI device id in platform device id field
  dt-bindings: irqchip: Convert ti, sci-inta bindings to yaml
  dt-bindings: irqchip: ti, sci-inta: Update docs to support different parent.
  irqchip/ti-sci-intr: Add support for INTR being a parent to INTR
  dt-bindings: irqchip: Convert ti, sci-intr bindings to yaml
  dt-bindings: irqchip: ti, sci-intr: Update bindings to drop the usage of gic as parent
  firmware: ti_sci: Add support for getting resource with subtype
  firmware: ti_sci: Drop unused structure ti_sci_rm_type_map
  firmware: ti_sci: Drop the device id to resource type translation
2020-08-30  Merge tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull locking fixes from Thomas Gleixner:
 "A set of fixes for lockdep, tracing and RCU:
 - Prevent recursion by using raw_cpu_* operations
 - Fixup the interrupt state in the cpu idle code to be consistent
 - Push rcu_idle_enter/exit() invocations deeper into the idle path so that the lock operations are inside the RCU watching sections
 - Move trace_cpu_idle() into generic code so it's called before RCU goes idle.
 - Handle raw_local_irq* vs. local_irq* operations correctly
 - Move the tracepoints out from under the lockdep recursion handling which turned out to be fragile and inconsistent"

* tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep,trace: Expose tracepoints
  lockdep: Only trace IRQ edges
  mips: Implement arch_irqs_disabled()
  arm64: Implement arch_irqs_disabled()
  nds32: Implement arch_irqs_disabled()
  locking/lockdep: Cleanup
  x86/entry: Remove unused THUNKs
  cpuidle: Move trace_cpu_idle() into generic code
  cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic
  sched,idle,rcu: Push rcu_idle deeper into the idle path
  cpuidle: Fixup IRQ state
  lockdep: Use raw_cpu_*() for per-cpu variables
2020-08-30  Merge tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds)
Pull powerpc fixes from Michael Ellerman:
 - Revert our removal of PROT_SAO, at least one user expressed an interest in using it on Power9. Instead don't allow it to be used in guests unless enabled explicitly at compile time.
 - A fix for a crash introduced by a recent change to FP handling.
 - Revert a change to our idle code that left Power10 with no idle support.
 - One minor fix for the new scv system call path to set PPR.
 - Fix a crash in our "generic" PMU if branch stack events were enabled.
 - A fix for the IMC PMU, to correctly identify host kernel samples.
 - The ADB_PMU powermac code was found to be incompatible with VMAP_STACK, so make them incompatible in Kconfig until the code can be fixed.
 - A build fix in drivers/video/fbdev/controlfb.c, and a documentation fix.

Thanks to Alexey Kardashevskiy, Athira Rajeev, Christophe Leroy, Giuseppe Sacco, Madhavan Srinivasan, Milton Miller, Nicholas Piggin, Pratik Rajesh Sampat, Randy Dunlap, Shawn Anastasio, Vaidyanathan Srinivasan.

* tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32s: Disable VMAP stack which CONFIG_ADB_PMU
  Revert "powerpc/powernv/idle: Replace CPU feature check with PVR check"
  powerpc/perf: Fix reading of MSR[HV/PR] bits in trace-imc
  powerpc/perf: Fix crashes with generic_compat_pmu & BHRB
  powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode
  powerpc/64s: scv entry should set PPR
  Documentation/powerpc: fix malformed table in syscall64-abi
  video: fbdev: controlfb: Fix build for COMPILE_TEST=y && PPC_PMAC=n
  selftests/powerpc: Update PROT_SAO test to skip ISA 3.1
  powerpc/64s: Disallow PROT_SAO in LPARs by default
  Revert "powerpc/64s: Remove PROT_SAO support"
2020-08-30  x86/cpufeatures: Enumerate TSX suspend load address tracking instructions  (Kyung Min Park)
Intel TSX suspend load tracking instructions aim to give a way to choose which memory accesses do not need to be tracked in the TSX read set. Add the TSX suspend load tracking CPUID feature flag TSXLDTRK for enumeration. A processor supports Intel TSX suspend load address tracking if CPUID.0x07.0x0:EDX[16] is set. The two instructions XSUSLDTRK and XRESLDTRK are available when this feature is present. The CPU feature flag is shown as "tsxldtrk" in /proc/cpuinfo. Signed-off-by: Kyung Min Park <kyung.min.park@intel.com> Signed-off-by: Cathy Zhang <cathy.zhang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1598316478-23337-2-git-send-email-cathy.zhang@intel.com
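The enumerated CPUID bit can also be checked from user space; the following small stand-alone program (my own illustration, not part of either patch) reads CPUID.(EAX=7,ECX=0):EDX and tests bit 16.

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
            unsigned int eax, ebx, ecx, edx;

            /* Leaf 7, subleaf 0: structured extended feature flags */
            if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                    return 1;

            printf("TSX suspend load address tracking (TSXLDTRK): %s\n",
                   (edx & (1u << 16)) ? "supported" : "not supported");
            return 0;
    }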
2020-08-29  Merge tag 'fallthrough-fixes-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux  (Linus Torvalds)
Pull fallthrough fixes from Gustavo A. R. Silva:
 "Fix some minor issues introduced by the recent treewide fallthrough conversions:
 - Fix indentation issue
 - Fix erroneous fallthrough annotation
 - Remove unnecessary fallthrough annotation
 - Fix code comment changed by fallthrough conversion"

* tag 'fallthrough-fixes-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  arm64/cpuinfo: Remove unnecessary fallthrough annotation
  media: dib0700: Fix identation issue in dib8096_set_param_override()
  afs: Remove erroneous fallthough annotation
  iio: dpot-dac: fix code comment in dpot_dac_read_raw()
2020-08-29  Merge tag 's390-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux  (Linus Torvalds)
Pull s390 fixes from Vasily Gorbik:
 - Disable preemption trace in percpu macros since the lockdep code itself uses percpu variables now and it causes recursions.
 - Fix kernel space 4-level paging broken by recent vmem rework.

* tag 's390-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vmem: fix vmem_add_range for 4-level paging
  s390: don't trace preemption in percpu macros
2020-08-28  Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds)
Pull arm64 fixes from Catalin Marinas:
 - Fix kernel build with the integrated LLVM assembler which doesn't see the -Wa,-march option.
 - Fix "make vdso_install" when COMPAT_VDSO is disabled.
 - Make KVM more robust if the AT S1E1R instruction triggers an exception (architecture corner cases).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  KVM: arm64: Set HCR_EL2.PTW to prevent AT taking synchronous exception
  KVM: arm64: Survive synchronous exceptions caused by AT instructions
  KVM: arm64: Add kvm_extable for vaxorcism code
  arm64: vdso32: make vdso32 install conditional
  arm64: use a common .arch preamble for inline assembly
2020-08-28  KVM: arm64: Set HCR_EL2.PTW to prevent AT taking synchronous exception  (James Morse)
AT instructions do a translation table walk and return the result, or the fault in PAR_EL1. KVM uses these to find the IPA when the value is not provided by the CPU in HPFAR_EL1. If a translation table walk causes an external abort it is taken as an exception, even if it was due to an AT instruction. (DDI0487F.a's D5.2.11 "Synchronous faults generated by address translation instructions") While we previously made KVM resilient to exceptions taken due to AT instructions, the device access causes mismatched attributes, and may occur speculatively. Prevent this, by forbidding a walk through memory described as device at stage2. Now such AT instructions will report a stage2 fault. Such a fault will cause KVM to restart the guest. If the AT instructions always walk the page tables, but guest execution uses the translation cached in the TLB, the guest can't make forward progress until the TLB entry is evicted. This isn't a problem, as since commit 5dcd0fdbb492 ("KVM: arm64: Defer guest entry when an asynchronous exception is pending"), KVM will return to the host to process IRQs allowing the rest of the system to keep running. Cc: stable@vger.kernel.org # <v5.3: 5dcd0fdbb492 ("KVM: arm64: Defer guest entry when an asynchronous exception is pending") Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-28  KVM: arm64: Survive synchronous exceptions caused by AT instructions  (James Morse)
KVM doesn't expect any synchronous exceptions when executing; any such exception leads to a panic(). AT instructions access the guest page tables, and can cause a synchronous external abort to be taken. The arm-arm is unclear on what should happen if the guest has configured the hardware update of the access-flag, and a memory type in TCR_EL1 that does not support atomic operations. B2.2.6 "Possible implementation restrictions on using atomic instructions" from DDI0487F.a lists synchronous external abort as a possible behaviour of atomic instructions that target memory that isn't writeback cacheable, but the page table walker may behave differently. Make KVM robust to synchronous exceptions caused by AT instructions. Add a get_user() style helper for AT instructions that returns -EFAULT if an exception was generated. While KVM's version of the exception table mixes synchronous and asynchronous exceptions, only one of these can occur at each location. Re-enter the guest when the AT instructions take an exception on the assumption the guest will take the same exception. This isn't guaranteed to make forward progress, as the AT instructions may always walk the page tables, but guest execution may use the translation cached in the TLB. This isn't a problem, as since commit 5dcd0fdbb492 ("KVM: arm64: Defer guest entry when an asynchronous exception is pending"), KVM will return to the host to process IRQs allowing the rest of the system to keep running. Cc: stable@vger.kernel.org # <v5.3: 5dcd0fdbb492 ("KVM: arm64: Defer guest entry when an asynchronous exception is pending") Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-28  KVM: arm64: Add kvm_extable for vaxorcism code  (James Morse)
KVM has a one instruction window where it will allow an SError exception to be consumed by the hypervisor without treating it as a hypervisor bug. This is used to consume asynchronous external aborts that were caused by the guest. As we are about to add another location that survives unexpected exceptions, generalise this code to make it behave like the host's extable. KVM's version has to be mapped to EL2 to be accessible on nVHE systems. The SError vaxorcism code is a one instruction window, so has two entries in the extable. Because the KVM code is copied for VHE and nVHE, we end up with four entries, half of which correspond with code that isn't mapped. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-28  arm64: vdso32: make vdso32 install conditional  (Frank van der Linden)
vdso32 should only be installed if CONFIG_COMPAT_VDSO is enabled, since it's not even supposed to be compiled otherwise, and arm64 builds without a 32bit crosscompiler will fail. Fixes: 8d75785a8142 ("ARM64: vdso32: Install vdso32 from vdso_install") Signed-off-by: Frank van der Linden <fllinden@amazon.com> Cc: stable@vger.kernel.org [5.4+] Link: https://lore.kernel.org/r/20200827234012.19757-1-fllinden@amazon.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-28  arm64: use a common .arch preamble for inline assembly  (Sami Tolvanen)
Commit 7c78f67e9bd9 ("arm64: enable tlbi range instructions") breaks LLVM's integrated assembler, because -Wa,-march is only passed to external assemblers and therefore, the new instructions are not enabled when IAS is used. This change adds a common architecture version preamble, which can be used in inline assembly blocks that contain instructions that require a newer architecture version, and uses it to fix __TLBI_0 and __TLBI_1 with ARM64_TLB_RANGE. Fixes: 7c78f67e9bd9 ("arm64: enable tlbi range instructions") Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1106 Link: https://lore.kernel.org/r/20200827203608.1225689-1-samitolvanen@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
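Below is a hedged illustration of the approach: a shared .arch preamble string prepended to inline-asm blocks that need newer instructions. The macro and function names and the exact architecture level are assumptions for illustration, not a copy of the arm64 header.

    /* Illustrative only: emit an .arch directive ahead of instructions that
     * the assembler would otherwise reject when -Wa,-march is not passed
     * (e.g. with Clang's integrated assembler). */
    #define ASM_ARCH_PREAMBLE      ".arch armv8.4-a\n"

    static inline void tlbi_range_example(unsigned long arg)
    {
            asm volatile(ASM_ARCH_PREAMBLE
                         "tlbi rvae1is, %0\n"
                         : : "r" (arg) : "memory");
    }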
2020-08-28  arm64: tegra: Add missing timeout clock to Tegra194 SDMMC nodes  (Sowjanya Komatineni)
Since commit 5425fb15d8ee ("arm64: tegra: Add Tegra194 chip device tree"), Tegra194 uses a separate SDMMC_LEGACY_TM clock for data timeout, and this clock is currently not enabled, which is not recommended. Tegra194 SDMMC advertises 12 MHz as the timeout clock frequency in the host capability register, so this clock should be kept enabled by the SDMMC driver. Fixes: 5425fb15d8ee ("arm64: tegra: Add Tegra194 chip device tree") Cc: stable <stable@vger.kernel.org> # 5.4 Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com> Link: https://lore.kernel.org/r/1598548861-32373-7-git-send-email-skomatineni@nvidia.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-08-28  arm64: tegra: Add missing timeout clock to Tegra186 SDMMC nodes  (Sowjanya Komatineni)
Since commit 39cb62cb8973 ("arm64: tegra: Add Tegra186 support"), Tegra186 uses a separate SDMMC_LEGACY_TM clock for data timeout, and this clock is currently not enabled, which is not recommended. Tegra186 SDMMC advertises 12 MHz as the timeout clock frequency in the host capability register and uses it by default, so this clock should be kept enabled by the SDMMC driver. Fixes: 39cb62cb8973 ("arm64: tegra: Add Tegra186 support") Cc: stable <stable@vger.kernel.org> # 5.4 Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com> Link: https://lore.kernel.org/r/1598548861-32373-6-git-send-email-skomatineni@nvidia.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-08-28  arm64: tegra: Add missing timeout clock to Tegra210 SDMMC  (Sowjanya Komatineni)
Since commit 742af7e7a0a1 ("arm64: tegra: Add Tegra210 support"), Tegra210 uses a separate SDMMC_LEGACY_TM clock for data timeout, and this clock is currently not enabled, which is not recommended. Tegra SDMMC advertises 12 MHz as the timeout clock frequency in the host capability register, so this clock should be kept enabled by the SDMMC driver. Fixes: 742af7e7a0a1 ("arm64: tegra: Add Tegra210 support") Cc: stable <stable@vger.kernel.org> # 5.4 Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com> Link: https://lore.kernel.org/r/1598548861-32373-5-git-send-email-skomatineni@nvidia.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-08-28  arm64: dts: mt7622: add reset node for mmc device  (Wenbin Mei)
This commit adds a reset node for the mmc device. Cc: <stable@vger.kernel.org> # v5.4+ Fixes: 966580ad236e ("mmc: mediatek: add support for MT7622 SoC") Signed-off-by: Wenbin Mei <wenbin.mei@mediatek.com> Tested-by: Frank Wunderlich <frank-w@public-files.de> Acked-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20200814014346.6496-3-wenbin.mei@mediatek.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-08-28  powerpc/32s: Disable VMAP stack which CONFIG_ADB_PMU  (Christophe Leroy)
low_sleep_handler() can't restore the context from virtual stack because the stack can hardly be accessed with MMU OFF. For now, disable VMAP stack when CONFIG_ADB_PMU is selected. Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK") Cc: stable@vger.kernel.org # v5.6+ Reported-by: Giuseppe Sacco <giuseppe@sguazz.it> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ec96c15bfa1a7415ab604ee1c98cd45779c08be0.1598553015.git.christophe.leroy@csgroup.eu
2020-08-27  arm64/cpuinfo: Remove unnecessary fallthrough annotation  (Gustavo A. R. Silva)
Fallthrough annotations for consecutive default and case labels are not necessary. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-27  irqchip/eznps: Fix build error for !ARC700 builds  (Vineet Gupta)
The eznps driver is supposed to be platform independent, however it ends up including stuff from inside arch/arc headers, leading to randconfig build errors. The quick hack to fix this (the proper fix is too much churn for a non-active user-base) is to add the following to the nps platform-agnostic header:
 - copy AUX_IENABLE from the arch/arc header
 - move CTOP_AUX_IACK from arch/arc/plat-eznps/*/**
Reported-by: kernel test robot <lkp@intel.com> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/20200824095831.5lpkmkafelnvlpi2@linutronix.de Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2020-08-27  ARC: show_regs: fix r12 printing and simplify  (Vineet Gupta)
While working on ARC64, an issue was spotted in ARCv2 reg file printing: print_reg_file() assumes a contiguous reg file, whereas in ARCv2 it is not: r12 comes before r0-r11 due to hardware auto-save. Apparently this issue has been present since the v2 port submission. To avoid bolting hacks for this discontinuity while looping through pt_regs, just ditch the loop and print pt_regs directly. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2020-08-27Revert "powerpc/powernv/idle: Replace CPU feature check with PVR check"Pratik Rajesh Sampat
cpuidle stop state implementation has minor optimizations for P10 where hardware preserves more SPR registers compared to P9. The current P9 driver works for P10, although does few extra save-restores. P9 driver can provide the required power management features like SMT thread folding and core level power savings on a P10 platform. Until the P10 stop driver is available, revert the commit which allows for only P9 systems to utilize cpuidle and blocks all idle stop states for P10. CPU idle states are enabled and tested on the P10 platform with this fix. This reverts commit 8747bf36f312356f8a295a0c39ff092d65ce75ae. Fixes: 8747bf36f312 ("powerpc/powernv/idle: Replace CPU feature check with PVR check") Signed-off-by: Pratik Rajesh Sampat <psampat@linux.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200826082918.89306-1-psampat@linux.ibm.com
2020-08-27  powerpc/perf: Fix reading of MSR[HV/PR] bits in trace-imc  (Athira Rajeev)
IMC trace-mode uses the MSR[HV/PR] bits to set the cpumode for the instruction pointer captured in each sample. The bits are fetched from the third double word of the trace record. Reading the third double word from the IMC trace record should use be64_to_cpu() along with READ_ONCE() in order to fetch the correct MSR[HV/PR] bits; this patch addresses that. Currently we are using PERF_RECORD_MISC_HYPERVISOR as cpumode if MSR HV is 1 and PR is 0, which means the address is from a host counter. But using PERF_RECORD_MISC_HYPERVISOR for host counter data will fail to resolve the address -> symbol during "perf report", because the perf tools side uses PERF_RECORD_MISC_KERNEL to represent the host counter data. Therefore, fix the trace imc sample data to use PERF_RECORD_MISC_KERNEL as cpumode for host kernel information. Fixes: 77ca3951cc37 ("powerpc/perf: Add kernel support for new MSR[HV PR] bits in trace-imc") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1598424029-1662-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-08-27  powerpc/perf: Fix crashes with generic_compat_pmu & BHRB  (Alexey Kardashevskiy)
The bhrb_filter_map ("The Branch History Rolling Buffer") callback is only defined in the raw CPUs' power_pmu structs. The "architected" CPUs use generic_compat_pmu, which does not have this callback, and crashes occur if a user tries to enable branch stack for an event. This adds a NULL pointer check for bhrb_filter_map() which behaves as if the callback returned an error. This does not add the same check for config_bhrb(), as its only caller checks for cpuhw->bhrb_users, which remains zero if bhrb_filter_map == 0. Fixes: be80e758d0c2 ("powerpc/perf: Add generic compat mode pmu driver") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200602025612.62707-1-aik@ozlabs.ru
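A hedged sketch of the guard this describes is shown below; the surrounding function shape and names are assumptions, only the NULL test on the bhrb_filter_map callback reflects the commit message.

    /* Sketch: bail out as if the filter mapping failed when the PMU
     * (e.g. generic_compat_pmu) provides no bhrb_filter_map callback. */
    static int map_bhrb_filter(struct power_pmu *ppmu, u64 branch_sample_type)
    {
            s64 filter;

            if (!ppmu->bhrb_filter_map)
                    return -EOPNOTSUPP;     /* behave like a filter-map error */

            filter = ppmu->bhrb_filter_map(branch_sample_type);
            if (filter == -1)
                    return -EOPNOTSUPP;

            return 0;
    }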
2020-08-27  powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode  (Michael Ellerman)
The recent commit 01eb01877f33 ("powerpc/64s: Fix restore_math unnecessarily changing MSR") changed some of the handling of floating point/vector restore. In particular it caused current->thread.fpexc_mode to be copied into the current MSR (via msr_check_and_set()), rather than just into regs->msr (which is moved into MSR on return to userspace). This can lead to a crash in the kernel if we take a floating point exception when restoring FPSCR:

    Oops: Exception in kernel mode, sig: 8 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
    Modules linked in:
    CPU: 3 PID: 101213 Comm: ld64.so.2 Not tainted 5.9.0-rc1-00098-g18445bf405cb-dirty #9
    NIP: c00000000000fbb4 LR: c00000000001a7ac CTR: c000000000183570
    REGS: c0000016b7cfb3b0 TRAP: 0700 Not tainted (5.9.0-rc1-00098-g18445bf405cb-dirty)
    MSR: 900000000290b933 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44002444 XER: 00000000
    CFAR: c00000000001a7a8 IRQMASK: 1
    GPR00: c00000000001ae40 c0000016b7cfb640 c0000000011b7f00 c000001542a0f740
    GPR04: c000001542a0f720 c000001542a0eb00 0000000000000900 c000001542a0eb00
    GPR08: 000000000000000a 0000000000002000 9000000000009033 0000000000000000
    GPR12: 0000000000004000 c0000017ffffd900 0000000000000001 c000000000df5a58
    GPR16: c000000000e19c18 c0000000010e1123 0000000000000001 c000000000e1a638
    GPR20: 0000000000000000 c0000000044b1d00 0000000000000000 c000001542a0f2a0
    GPR24: 00000016c7fe0000 c000001542a0f720 c000000001c93da0 c000000000fe5f28
    GPR28: c000001542a0f720 0000000000800000 c0000016b7cfbe90 0000000002802900
    NIP load_fp_state+0x4/0x214
    LR  restore_math+0x17c/0x1f0
    Call Trace:
     0xc0000016b7cfb680 (unreliable)
     __switch_to+0x330/0x460
     __schedule+0x318/0x920
     schedule+0x74/0x140
     schedule_timeout+0x318/0x3f0
     wait_for_completion+0xc8/0x210
     call_usermodehelper_exec+0x234/0x280
     do_coredump+0xedc/0x13c0
     get_signal+0x1d4/0xbe0
     do_notify_resume+0x1a0/0x490
     interrupt_exit_user_prepare+0x1c4/0x230
     interrupt_return+0x14/0x1c0
    Instruction dump:
    ebe10168 e88101a0 7c8ff120 382101e0 e8010010 7c0803a6 4e800020 790605c4
    782905c4 7c0008a8 7c0008a8 c8030200 <fffe058e> 48000088 c8030000 c8230010

Fix it by only loading the fpexc_mode value into regs->msr. Also add a comment to explain that although VSX is subject to the value of fpexc_mode, we don't have to handle that separately because we only allow VSX to be enabled if FP is also enabled. Fixes: 01eb01877f33 ("powerpc/64s: Fix restore_math unnecessarily changing MSR") Reported-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20200825093424.3967813-1-mpe@ellerman.id.au
2020-08-27  powerpc/64s: scv entry should set PPR  (Nicholas Piggin)
Kernel entry sets PPR to HMT_MEDIUM by convention. The scv entry path missed this. Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200825075309.224184-1-npiggin@gmail.com
2020-08-27  x86/irq: Unbreak interrupt affinity setting  (Thomas Gleixner)
Several people reported that 5.8 broke the interrupt affinity setting mechanism. The consolidation of the entry code reused the regular exception entry code for device interrupts and changed the way how the vector number is conveyed from ptregs->orig_ax to a function argument.

The low level entry uses the hardware error code slot to push the vector number onto the stack which is retrieved from there into a function argument and the slot on stack is set to -1. The reason for setting it to -1 is that the error code slot is at the position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax indicates that the entry came via a syscall. If it's not set to a negative value then a signal delivery on return to userspace would try to restart a syscall. But there are other places which rely on pt_regs::orig_ax being a valid indicator for syscall entry.

But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the interrupt affinity setting mechanism, which was overlooked when this change was made. Moving interrupts on x86 happens in several steps. A new vector on a different CPU is allocated and the relevant interrupt source is reprogrammed to that. But that's racy and there might be an interrupt already in flight to the old vector. So the old vector is preserved until the first interrupt arrives on the new vector and the new target CPU. Once that happens the old vector is cleaned up, but this cleanup still depends on the vector number being stored in pt_regs::orig_ax, which is now -1. That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector always false. As a consequence the interrupt is moved once, but then it cannot be moved anymore because the cleanup of the old vector never happens.

There would be several ways to convey the vector information to that place in the guts of the interrupt handling, but on deeper inspection it turned out that this check is pointless and a leftover from the old affinity model of X86 which supported multi-CPU affinities. Under this model it was possible that an interrupt had an old and a new vector on the same CPU, so the vector match was required. Under the new model the effective affinity of an interrupt is always a single CPU from the requested affinity mask. If the affinity mask changes then either the interrupt stays on the CPU and on the same vector when that CPU is still in the new affinity mask or it is moved to a different CPU, but it is never moved to a different vector on the same CPU.

Ergo the cleanup check for the matching vector number is not required and can be removed, which makes the dependency on pt_regs:orig_ax go away. The remaining check for new_cpu == smp_processor_id() is completely sufficient. If it matches then the interrupt was successfully migrated and the cleanup can proceed. For paranoia sake add a warning into the vector assignment code to validate that the assumption of never moving to a different vector on the same CPU holds.

Fixes: 633260fa143 ("x86/irq: Convey vector as argument and not in ptregs") Reported-by: Alex bykov <alex.bykov@scylladb.com> Reported-by: Avi Kivity <avi@scylladb.com> Reported-by: Alexander Graf <graf@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Alexander Graf <graf@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de
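The shape of the simplified cleanup condition described above can be sketched as follows; this is an illustration with an invented helper name, not the code from arch/x86/kernel/apic/vector.c.

    /* Sketch: with strict single-CPU effective affinity it is enough to see
     * that the interrupt arrived on the new target CPU; comparing against
     * the vector number stored in pt_regs::orig_ax is no longer needed. */
    static bool irq_move_completed(unsigned int new_cpu)
    {
            /* old check: regs->orig_ax == new_vector &&
             *            new_cpu == smp_processor_id()   (now always false) */
            return new_cpu == smp_processor_id();
    }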
2020-08-27  x86/hotplug: Silence APIC only after all interrupts are migrated  (Ashok Raj)
There is a race when taking a CPU offline. Current code looks like this:

    native_cpu_disable()
    {
        ...
        apic_soft_disable();
        /*
         * Any existing set bits for pending interrupt to
         * this CPU are preserved and will be sent via IPI
         * to another CPU by fixup_irqs().
         */
        cpu_disable_common();
        {
            ....
            /*
             * Race window happens here. Once local APIC has been
             * disabled any new interrupts from the device to
             * the old CPU are lost
             */
            fixup_irqs(); // Too late to capture anything in IRR.
            ...
        }
    }

The fix is to disable the APIC *after* cpu_disable_common(). Testing was done with a USB NIC that provided a source of frequent interrupts. A script migrated interrupts to a specific CPU and then took that CPU offline. Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead") Reported-by: Evan Green <evgreen@chromium.org> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mathias Nyman <mathias.nyman@linux.intel.com> Tested-by: Evan Green <evgreen@chromium.org> Reviewed-by: Evan Green <evgreen@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/ Link: https://lore.kernel.org/r/1598501530-45821-1-git-send-email-ashok.raj@intel.com
2020-08-26  s390/vmem: fix vmem_add_range for 4-level paging  (Vasily Gorbik)
The kernel currently crashes if 4-level paging is used. Add missing p4d_populate for just allocated pud entry. Fixes: 3e0d3e408e63 ("s390/vmem: consolidate vmem_add_range() and vmem_remove_range()") Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
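A hedged sketch of the missing step is shown below; the allocation helper and overall function shape are assumptions, only the p4d_populate() call on a freshly allocated pud table reflects the commit message.

    /* Sketch: when the p4d entry is still empty, a new pud table must be
     * hooked up with p4d_populate(), otherwise 4-level walks never reach it. */
    static int populate_p4d_entry(p4d_t *p4d, unsigned long addr)
    {
            pud_t *pud;

            if (!p4d_none(*p4d))
                    return 0;                       /* already populated */

            pud = pud_alloc_one(&init_mm, addr);    /* assumed allocator */
            if (!pud)
                    return -ENOMEM;

            p4d_populate(&init_mm, p4d, pud);       /* the missing call */
            return 0;
    }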
2020-08-26  s390: don't trace preemption in percpu macros  (Sven Schnelle)
Since commit a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables") the lockdep code itself uses percpu variables. This leads to recursions because the percpu macros are calling preempt_enable() which might call trace_preempt_on(). Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
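To illustrate the kind of change this implies, here is a hedged sketch of a percpu accessor built on the non-tracing preemption helpers; the macro name is invented and this is not the actual s390 implementation.

    /* Sketch: use preempt_{disable,enable}_notrace() so that a percpu access
     * performed from within lockdep/tracing code cannot recurse back into
     * the preemption tracepoints. */
    #define this_cpu_read_notrace(pcp)                      \
    ({                                                      \
            typeof(pcp) __val;                              \
            preempt_disable_notrace();                      \
            __val = raw_cpu_read(pcp);                      \
            preempt_enable_notrace();                       \
            __val;                                          \
    })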
2020-08-26  lockdep: Only trace IRQ edges  (Nicholas Piggin)
Problem:

    raw_local_irq_save();   // software state on
    local_irq_save();       // software state off
    ...
    local_irq_restore();    // software state still off, because we don't enable IRQs
    raw_local_irq_restore(); // software state still off, *whoopsie*

existing instances:

 - lock_acquire()
     raw_local_irq_save()
     __lock_acquire()
       arch_spin_lock(&graph_lock)
         pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
           local_irq_save()

 - trace_clock_global()
     raw_local_irq_save()
     arch_spin_lock()
       pv_wait() := kvm_wait()
         local_irq_save()

 - apic_retrigger_irq()
     raw_local_irq_save()
     apic->send_IPI() := default_send_IPI_single_phys()
       local_irq_save()

Possible solutions:

 A) make it work by enabling the tracing inside raw_*()
 B) make it work by keeping tracing disabled inside raw_*()
 C) call it broken and clean it up now

Now, given that the only reason to use the raw_* variant is because you don't want tracing, A) seems like a weird option (although it can be done). C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just because there is one raw user; this strips the validation/tracing off for all the other users.

So we pick B) and declare any code that ends up doing:

    raw_local_irq_save()
    local_irq_save()
    lockdep_assert_irqs_disabled();

broken. AFAICT this problem has existed forever; the only reason it came up is because commit 859d069ee1dd ("lockdep: Prepare for NMI IRQ state tracking") changed IRQ tracing vs lockdep recursion, and the first instance is fairly common while the other cases hardly ever happen.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [rewrote changelog] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200723105615.1268126-1-npiggin@gmail.com
2020-08-26  mips: Implement arch_irqs_disabled()  (Peter Zijlstra)
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Paul Burton <paulburton@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200826101653.GE1362448@hirez.programming.kicks-ass.net
2020-08-26  arm64: Implement arch_irqs_disabled()  (Peter Zijlstra)
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200821085348.664425120@infradead.org
2020-08-26  nds32: Implement arch_irqs_disabled()  (Peter Zijlstra)
Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.604899379@infradead.org