summaryrefslogtreecommitdiff
path: root/arch/arm64/include
AgeCommit message (Collapse)Author
2020-12-09Merge branches 'for-next/kvm-build-fix', 'for-next/va-refactor', ↵Catalin Marinas
'for-next/lto', 'for-next/mem-hotplug', 'for-next/cppc-ffh', 'for-next/pad-image-header', 'for-next/zone-dma-default-32-bit', 'for-next/signal-tag-bits' and 'for-next/cmdline-extended' into for-next/core * for-next/kvm-build-fix: : Fix KVM build issues with 64K pages KVM: arm64: Fix build error in user_mem_abort() * for-next/va-refactor: : VA layout changes arm64: mm: don't assume struct page is always 64 bytes Documentation/arm64: fix RST layout of memory.rst arm64: mm: tidy up top of kernel VA space arm64: mm: make vmemmap region a projection of the linear region arm64: mm: extend linear region for 52-bit VA configurations * for-next/lto: : Upgrade READ_ONCE() to RCpc acquire on arm64 with LTO arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=y arm64: alternatives: Remove READ_ONCE() usage during patch operation arm64: cpufeatures: Add capability for LDAPR instruction arm64: alternatives: Split up alternative.h arm64: uaccess: move uao_* alternatives to asm-uaccess.h * for-next/mem-hotplug: : Memory hotplug improvements arm64/mm/hotplug: Ensure early memory sections are all online arm64/mm/hotplug: Enable MEM_OFFLINE event handling arm64/mm/hotplug: Register boot memory hot remove notifier earlier arm64: mm: account for hotplug memory when randomizing the linear region * for-next/cppc-ffh: : Add CPPC FFH support using arm64 AMU counters arm64: abort counter_read_on_cpu() when irqs_disabled() arm64: implement CPPC FFH support using AMUs arm64: split counter validation function arm64: wrap and generalise counter read functions * for-next/pad-image-header: : Pad Image header to 64KB and unmap it arm64: head: tidy up the Image header definition arm64/head: avoid symbol names pointing into first 64 KB of kernel image arm64: omit [_text, _stext) from permanent kernel mapping * for-next/zone-dma-default-32-bit: : Default to 32-bit wide ZONE_DMA (previously reduced to 1GB for RPi4) of: unittest: Fix build on architectures without CONFIG_OF_ADDRESS mm: Remove examples from enum zone_type comment arm64: mm: Set ZONE_DMA size based on early IORT scan arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges of: unittest: Add test for of_dma_get_max_cpu_address() of/address: Introduce of_dma_get_max_cpu_address() arm64: mm: Move zone_dma_bits initialization into zone_sizes_init() arm64: mm: Move reserve_crashkernel() into mem_init() arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required arm64: Ignore any DMA offsets in the max_zone_phys() calculation * for-next/signal-tag-bits: : Expose the FAR_EL1 tag bits in siginfo arm64: expose FAR_EL1 tag bits in siginfo signal: define the SA_EXPOSE_TAGBITS bit in sa_flags signal: define the SA_UNSUPPORTED bit in sa_flags arch: provide better documentation for the arch-specific SA_* flags signal: clear non-uapi flag bits when passing/returning sa_flags arch: move SA_* definitions to generic headers parisc: start using signal-defs.h parisc: Drop parisc special case for __sighandler_t * for-next/cmdline-extended: : Add support for CONFIG_CMDLINE_EXTENDED arm64: Extend the kernel command line from the bootloader arm64: kaslr: Refactor early init command line parsing
2020-11-23arm64: expose FAR_EL1 tag bits in siginfoPeter Collingbourne
The kernel currently clears the tag bits (i.e. bits 56-63) in the fault address exposed via siginfo.si_addr and sigcontext.fault_address. However, the tag bits may be needed by tools in order to accurately diagnose memory errors, such as HWASan [1] or future tools based on the Memory Tagging Extension (MTE). Expose these bits via the arch_untagged_si_addr mechanism, so that they are only exposed to signal handlers with the SA_EXPOSE_TAGBITS flag set. [1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://linux-review.googlesource.com/id/Ia8876bad8c798e0a32df7c2ce1256c4771c81446 Link: https://lore.kernel.org/r/0010296597784267472fa13b39f8238d87a72cf8.1605904350.git.pcc@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-13arm64: implement CPPC FFH support using AMUsIonela Voinescu
If Activity Monitors (AMUs) are present, two of the counters can be used to implement support for CPPC's (Collaborative Processor Performance Control) delivered and reference performance monitoring functionality using FFH (Functional Fixed Hardware). Given that counters for a certain CPU can only be read from that CPU, while FFH operations can be called from any CPU for any of the CPUs, use smp_call_function_single() to provide the requested values. Therefore, depending on the register addresses, the following values are returned: - 0x0 (DeliveredPerformanceCounterRegister): AMU core counter - 0x1 (ReferencePerformanceCounterRegister): AMU constant counter The use of Activity Monitors is hidden behind the generic cpu_read_{corecnt,constcnt}() functions. Read functionality for these two registers represents the only current FFH support for CPPC. Read operations for other register values or write operation for all registers are unsupported. Therefore, keep CPPC's FFH unsupported if no CPUs have valid AMU frequency counters. For this purpose, the get_cpu_with_amu_feat() is introduced. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201106125334.21570-4-ionela.voinescu@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-13arm64: wrap and generalise counter read functionsIonela Voinescu
In preparation for other uses of Activity Monitors (AMU) cycle counters, place counter read functionality in generic functions that can reused: read_corecnt() and read_constcnt(). As a result, implement update_freq_counters_refs() to replace init_cpu_freq_invariance_counters() and both initialise and update the per-cpu reference variables. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201106125334.21570-2-ionela.voinescu@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-12arm64: mm: don't assume struct page is always 64 bytesArd Biesheuvel
Commit 8c96400d6a39be7 simplified the page-to-virt and virt-to-page conversions, based on the assumption that struct page is always 64 bytes in size, in which case we can use a single signed shift to perform the conversion (provided that the vmemmap array is placed appropriately in the kernel VA space) Unfortunately, this assumption turns out not to hold, and so we need to revert part of this commit, and go back to an affine transformation. Given that all the quantities involved are compile time constants, this should not make any practical difference. Fixes: 8c96400d6a39 ("arm64: mm: make vmemmap region a projection of the linear region") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20201110180511.29083-1-ardb@kernel.org Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-09arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=yWill Deacon
When building with LTO, there is an increased risk of the compiler converting an address dependency headed by a READ_ONCE() invocation into a control dependency and consequently allowing for harmful reordering by the CPU. Ensure that such transformations are harmless by overriding the generic READ_ONCE() definition with one that provides acquire semantics when building with LTO. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-11-09arm64: cpufeatures: Add capability for LDAPR instructionWill Deacon
Armv8.3 introduced the LDAPR instruction, which provides weaker memory ordering semantics than LDARi (RCpc vs RCsc). Generally, we provide an RCsc implementation when implementing the Linux memory model, but LDAPR can be used as a useful alternative to dependency ordering, particularly when the compiler is capable of breaking the dependencies. Since LDAPR is not available on all CPUs, add a cpufeature to detect it at runtime and allow the instruction to be used with alternative code patching. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-11-09arm64: alternatives: Split up alternative.hWill Deacon
asm/alternative.h contains both the macros needed to use alternatives, as well the type definitions and function prototypes for applying them. Split the header in two, so that alternatives can be used from core header files such as linux/compiler.h without the risk of circular includes Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-11-09arm64: uaccess: move uao_* alternatives to asm-uaccess.hMark Rutland
The uao_* alternative asm macros are only used by the uaccess assembly routines in arch/arm64/lib/, where they are included indirectly via asm-uaccess.h. Since they're specific to the uaccess assembly (and will lose the alternatives in subsequent patches), let's move them into asm-uaccess.h. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> [will: update #include in mte.S to pull in uao asm macros] Signed-off-by: Will Deacon <will@kernel.org>
2020-11-09arm64: mm: tidy up top of kernel VA spaceArd Biesheuvel
Tidy up the way the top of the kernel VA space is organized, by mirroring the 256 MB region we have below the vmalloc space, and populating it top down with the PCI I/O space, some guard regions, and the fixmap region. The latter region is itself populated top down, and today only covers about 4 MB, and so 224 MB is ample, and no guard region is therefore required. The resulting layout is identical between 48-bit/4k and 52-bit/64k configurations. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Steve Capper <steve.capper@arm.com> Link: https://lore.kernel.org/r/20201008153602.9467-5-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-09arm64: mm: make vmemmap region a projection of the linear regionArd Biesheuvel
Now that we have reverted the introduction of the vmemmap struct page pointer and the separate physvirt_offset, we can simplify things further, and place the vmemmap region in the VA space in such a way that virtual to page translations and vice versa can be implemented using a single arithmetic shift. One happy coincidence resulting from this is that the 48-bit/4k and 52-bit/64k configurations (which are assumed to be the two most prevalent) end up with the same placement of the vmemmap region. In a subsequent patch, we will take advantage of this, and unify the memory maps even more. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Steve Capper <steve.capper@arm.com> Link: https://lore.kernel.org/r/20201008153602.9467-4-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-09arm64: mm: extend linear region for 52-bit VA configurationsArd Biesheuvel
For historical reasons, the arm64 kernel VA space is configured as two equally sized halves, i.e., on a 48-bit VA build, the VA space is split into a 47-bit vmalloc region and a 47-bit linear region. When support for 52-bit virtual addressing was added, this equal split was kept, resulting in a substantial waste of virtual address space in the linear region: 48-bit VA 52-bit VA 0xffff_ffff_ffff_ffff +-------------+ +-------------+ | vmalloc | | vmalloc | 0xffff_8000_0000_0000 +-------------+ _PAGE_END(48) +-------------+ | linear | : : 0xffff_0000_0000_0000 +-------------+ : : : : : : : : : : : : : : : : : currently : : unusable : : : : : : unused : : by : : : : : : : : hardware : : : : : : : 0xfff8_0000_0000_0000 : : _PAGE_END(52) +-------------+ : : | | : : | | : : | | : : | | : : | | : unusable : | | : : | linear | : by : | | : : | region | : hardware : | | : : | | : : | | : : | | : : | | : : | | : : | | 0xfff0_0000_0000_0000 +-------------+ PAGE_OFFSET +-------------+ As illustrated above, the 52-bit VA kernel uses 47 bits for the vmalloc space (as before), to ensure that a single 64k granule kernel image can support any 64k granule capable system, regardless of whether it supports the 52-bit virtual addressing extension. However, due to the fact that the VA space is still split in equal halves, the linear region is only 2^51 bytes in size, wasting almost half of the 52-bit VA space. Let's fix this, by abandoning the equal split, and simply assigning all VA space outside of the vmalloc region to the linear region. The KASAN shadow region is reconfigured so that it ends at the start of the vmalloc region, and grows downwards. That way, the arrangement of the vmalloc space (which contains kernel mappings, modules, BPF region, the vmemmap array etc) is identical between non-KASAN and KASAN builds, which aids debugging. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Steve Capper <steve.capper@arm.com> Link: https://lore.kernel.org/r/20201008153602.9467-3-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-06Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Here's the weekly batch of fixes for arm64. Not an awful lot here, but there are still a few unresolved issues relating to CPU hotplug, RCU and IRQ tracing that I hope to queue fixes for next week. Summary: - Fix early use of kprobes - Fix kernel placement in kexec_file_load() - Bump maximum number of NUMA nodes" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kexec_file: try more regions if loading segments fails arm64: kprobes: Use BRK instead of single-step when executing instructions out-of-line arm64: NUMA: Kconfig: Increase NODES_SHIFT to 4
2020-11-03arm64: kprobes: Use BRK instead of single-step when executing instructions ↵Jean-Philippe Brucker
out-of-line Commit 36dadef23fcc ("kprobes: Init kprobes in early_initcall") enabled using kprobes from early_initcall. Unfortunately at this point the hardware debug infrastructure is not operational. The OS lock may still be locked, and the hardware watchpoints may have unknown values when kprobe enables debug monitors to single-step instructions. Rather than using hardware single-step, append a BRK instruction after the instruction to be executed out-of-line. Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall") Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20201103134900.337243-1-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2020-11-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - selftest fix - force PTE mapping on device pages provided via VFIO - fix detection of cacheable mapping at S2 - fallback to PMD/PTE mappings for composite huge pages - fix accounting of Stage-2 PGD allocation - fix AArch32 handling of some of the debug registers - simplify host HYP entry - fix stray pointer conversion on nVHE TLB invalidation - fix initialization of the nVHE code - simplify handling of capabilities exposed to HYP - nuke VCPUs caught using a forbidden AArch32 EL0 x86: - new nested virtualization selftest - miscellaneous fixes - make W=1 fixes - reserve new CPUID bit in the KVM leaves" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: vmx: remove unused variable KVM: selftests: Don't require THP to run tests KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again KVM: selftests: test behavior of unmapped L2 APIC-access address KVM: x86: Fix NULL dereference at kvm_msr_ignored_check() KVM: x86: replace static const variables with macros KVM: arm64: Handle Asymmetric AArch32 systems arm64: cpufeature: upgrade hyp caps to final arm64: cpufeature: reorder cpus_have_{const, final}_cap() KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code() KVM: arm64: Force PTE mapping on fault resulting in a device mapping KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes KVM: arm64: Fix masks in stage2_pte_cacheable() KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID
2020-10-30Merge tag 'kvmarm-fixes-5.10-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.10, take #1 - Force PTE mapping on device pages provided via VFIO - Fix detection of cacheable mapping at S2 - Fallback to PMD/PTE mappings for composite huge pages - Fix accounting of Stage-2 PGD allocation - Fix AArch32 handling of some of the debug registers - Simplify host HYP entry - Fix stray pointer conversion on nVHE TLB invalidation - Fix initialization of the nVHE code - Simplify handling of capabilities exposed to HYP - Nuke VCPUs caught using a forbidden AArch32 EL0
2020-10-30arm64: cpufeature: upgrade hyp caps to finalMark Rutland
We finalize caps before initializing kvm hyp code, and any use of cpus_have_const_cap() in kvm hyp code generates redundant and potentially unsound code to read the cpu_hwcaps array. A number of helper functions used in both hyp context and regular kernel context use cpus_have_const_cap(), as some regular kernel code runs before the capabilities are finalized. It's tedious and error-prone to write separate copies of these for hyp and non-hyp code. So that we can avoid the redundant code, let's automatically upgrade cpus_have_const_cap() to cpus_have_final_cap() when used in hyp context. With this change, there's never a reason to access to cpu_hwcaps array from hyp code, and we don't need to create an NVHE alias for this. This should have no effect on non-hyp code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: David Brazdil <dbrazdil@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201026134931.28246-4-mark.rutland@arm.com
2020-10-30arm64: cpufeature: reorder cpus_have_{const, final}_cap()Mark Rutland
In a subsequent patch we'll modify cpus_have_const_cap() to call cpus_have_final_cap(), and hence we need to define cpus_have_final_cap() first. To make subsequent changes easier to follow, this patch reorders the two without making any other changes. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: David Brazdil <dbrazdil@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201026134931.28246-3-mark.rutland@arm.com
2020-10-30KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code()Mark Rutland
Currently has_vhe() detects whether it is being compiled for VHE/NVHE hyp code based on preprocessor definitions, and uses this knowledge to avoid redundant runtime checks. There are other cases where we'd like to use this knowledge, so let's factor the preprocessor checks out into separate helpers. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: David Brazdil <dbrazdil@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201026134931.28246-2-mark.rutland@arm.com
2020-10-29KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCRMarc Zyngier
The DBGD{CCINT,SCRext} and DBGVCR register entries in the cp14 array are missing their target register, resulting in all accesses being targetted at the guard sysreg (indexed by __INVALID_SYSREG__). Point the emulation code at the actual register entries. Fixes: bdfb4b389c8d ("arm64: KVM: add trap handlers for AArch32 debug registers") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201029172409.2768336-1-maz@kernel.org
2020-10-29arm64: Add workaround for Arm Cortex-A77 erratum 1508412Rob Herring
On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load and a store exclusive or PAR_EL1 read can cause a deadlock. The workaround requires a DMB SY before and after a PAR_EL1 register read. In addition, it's possible an interrupt (doing a device read) or KVM guest exit could be taken between the DMB and PAR read, so we also need a DMB before returning from interrupt and before returning to a guest. A deadlock is still possible with the workaround as KVM guests must also have the workaround. IOW, a malicious guest can deadlock an affected systems. This workaround also depends on a firmware counterpart to enable the h/w to insert DMB SY after load and store exclusive instructions. See the errata document SDEN-1152370 v10 [1] for more information. [1] https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20201028182839.166037-2-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-10-29arm64: Add part number for Arm Cortex-A77Rob Herring
Add the MIDR part number info for the Arm Cortex-A77. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201028182839.166037-1-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-10-28arm64: avoid -Woverride-init warningArnd Bergmann
The icache_policy_str[] definition causes a warning when extra warning flags are enabled: arch/arm64/kernel/cpuinfo.c:38:26: warning: initialized field overwritten [-Woverride-init] 38 | [ICACHE_POLICY_VIPT] = "VIPT", | ^~~~~~ arch/arm64/kernel/cpuinfo.c:38:26: note: (near initialization for 'icache_policy_str[2]') arch/arm64/kernel/cpuinfo.c:39:26: warning: initialized field overwritten [-Woverride-init] 39 | [ICACHE_POLICY_PIPT] = "PIPT", | ^~~~~~ arch/arm64/kernel/cpuinfo.c:39:26: note: (near initialization for 'icache_policy_str[3]') arch/arm64/kernel/cpuinfo.c:40:27: warning: initialized field overwritten [-Woverride-init] 40 | [ICACHE_POLICY_VPIPT] = "VPIPT", | ^~~~~~~ arch/arm64/kernel/cpuinfo.c:40:27: note: (near initialization for 'icache_policy_str[0]') There is no real need for the default initializer here, as printing a NULL string is harmless. Rewrite the logic to have an explicit reserved value for the only one that uses the default value. This partially reverts the commit that removed ICACHE_POLICY_AIVIVT. Fixes: 155433cb365e ("arm64: cache: Remove support for ASID-tagged VIVT I-caches") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20201026193807.3816388-1-arnd@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-10-25treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "For x86, there is a new alternative and (in the future) more scalable implementation of extended page tables that does not need a reverse map from guest physical addresses to host physical addresses. For now it is disabled by default because it is still lacking a few of the existing MMU's bells and whistles. However it is a very solid piece of work and it is already available for people to hammer on it. Other updates: ARM: - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation PPC: - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes x86: - allow trapping unknown MSRs to userspace - allow userspace to force #GP on specific MSRs - INVPCID support on AMD - nested AMD cleanup, on demand allocation of nested SVM state - hide PV MSRs and hypercalls for features not enabled in CPUID - new test for MSR_IA32_TSC writes from host and guest - cleanups: MMU, CPUID, shared MSRs - LAPIC latency optimizations ad bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits) kvm: x86/mmu: NX largepage recovery for TDP MMU kvm: x86/mmu: Don't clear write flooding count for direct roots kvm: x86/mmu: Support MMIO in the TDP MMU kvm: x86/mmu: Support write protection for nesting in tdp MMU kvm: x86/mmu: Support disabling dirty logging for the tdp MMU kvm: x86/mmu: Support dirty logging for the TDP MMU kvm: x86/mmu: Support changed pte notifier in tdp MMU kvm: x86/mmu: Add access tracking for tdp_mmu kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU kvm: x86/mmu: Add TDP MMU PF handler kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg kvm: x86/mmu: Support zapping SPTEs in the TDP MMU KVM: Cache as_id in kvm_memory_slot kvm: x86/mmu: Add functions to handle changed TDP SPTEs kvm: x86/mmu: Allocate and free TDP MMU roots kvm: x86/mmu: Init / Uninit the TDP MMU kvm: x86/mmu: Introduce tdp_iter KVM: mmu: extract spte.h and spte.c KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp ...
2020-10-23Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull more arm64 updates from Will Deacon: "A small selection of further arm64 fixes and updates. Most of these are fixes that came in during the merge window, with the exception of the HAVE_MOVE_PMD mremap() speed-up which we discussed back in 2018 and somehow forgot to enable upstream. - Improve performance of Spectre-v2 mitigation on Falkor CPUs (if you're lucky enough to have one) - Select HAVE_MOVE_PMD. This has been shown to improve mremap() performance, which is used heavily by the Android runtime GC, and it seems we forgot to enable this upstream back in 2018. - Ensure linker flags are consistent between LLVM and BFD - Fix stale comment in Spectre mitigation rework - Fix broken copyright header - Fix KASLR randomisation of the linear map - Prevent arm64-specific prctl()s from compat tasks (return -EINVAL)" Link: https://lore.kernel.org/kvmarm/20181108181201.88826-3-joelaf@google.com/ * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: proton-pack: Update comment to reflect new function name arm64: spectre-v2: Favour CPU-specific mitigation at EL2 arm64: link with -z norelro regardless of CONFIG_RELOCATABLE arm64: Fix a broken copyright header in gen_vdso_offsets.sh arm64: mremap speedup - Enable HAVE_MOVE_PMD arm64: mm: use single quantity to represent the PA to VA translation arm64: reject prctl(PR_PAC_RESET_KEYS) on compat tasks
2020-10-22Merge tag 'kbuild-v5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Support 'make compile_commands.json' to generate the compilation database more easily, avoiding stale entries - Support 'make clang-analyzer' and 'make clang-tidy' for static checks using clang-tidy - Preprocess scripts/modules.lds.S to allow CONFIG options in the module linker script - Drop cc-option tests from compiler flags supported by our minimal GCC/Clang versions - Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y - Use sha1 build id for both BFD linker and LLD - Improve deb-pkg for reproducible builds and rootless builds - Remove stale, useless scripts/namespace.pl - Turn -Wreturn-type warning into error - Fix build error of deb-pkg when CONFIG_MODULES=n - Replace 'hostname' command with more portable 'uname -n' - Various Makefile cleanups * tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits) kbuild: Use uname for LINUX_COMPILE_HOST detection kbuild: Only add -fno-var-tracking-assignments for old GCC versions kbuild: remove leftover comment for filechk utility treewide: remove DISABLE_LTO kbuild: deb-pkg: clean up package name variables kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n kbuild: enforce -Werror=return-type scripts: remove namespace.pl builddeb: Add support for all required debian/rules targets builddeb: Enable rootless builds builddeb: Pass -n to gzip for reproducible packages kbuild: split the build log of kallsyms kbuild: explicitly specify the build id style scripts/setlocalversion: make git describe output more reliable kbuild: remove cc-option test of -Werror=date-time kbuild: remove cc-option test of -fno-stack-check kbuild: remove cc-option test of -fno-strict-overflow kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan kbuild: do not create built-in objects for external module builds ...
2020-10-20Merge tag 'kvmarm-5.10' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.10 - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation
2020-10-18mm/madvise: introduce process_madvise() syscall: an external memory hinting APIMinchan Kim
There is usecase that System Management Software(SMS) want to give a memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the case of Android, it is the ActivityManagerService. The information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon(ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall process_madvise(2). It uses pidfd of an external process to give the hint. It also supports vector address range because Android app has thousands of vmas due to zygote so it's totally waste of CPU and power if we should call the syscall one by one for each vma.(With testing 2000-vma syscall vs 1-vector syscall, it showed 15% performance improvement. I think it would be bigger in real practice because the testing ran very cache friendly environment). Another potential use case for the vector range is to amortize the cost ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could benefit users like TCP receive zerocopy and malloc implementations. In future, we could find more usecases for other advises so let's make it happens as API since we introduce a new syscall at this moment. With that, existing madvise(2) user could replace it with process_madvise(2) with their own pid if they want to have batch address ranges support feature. ince it could affect other process's address range, only privileged process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same UID) gives it the right to ptrace the process could use it successfully. The flag argument is reserved for future use if we need to extend the API. I think supporting all hints madvise has/will supported/support to process_madvise is rather risky. Because we are not sure all hints make sense from external process and implementation for the hint may rely on the caller being in the current context so it could be error-prone. Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch. If someone want to add other hints, we could hear the usecase and review it for each hint. It's safer for maintenance rather than introducing a buggy syscall but hard to fix it later. So finally, the API is as follows, ssize_t process_madvise(int pidfd, const struct iovec *iovec, unsigned long vlen, int advice, unsigned int flags); DESCRIPTION The process_madvise() system call is used to give advice or directions to the kernel about the address ranges from external process as well as local process. It provides the advice to address ranges of process described by iovec and vlen. The goal of such advice is to improve system or application performance. The pidfd selects the process referred to by the PID file descriptor specified in pidfd. (See pidofd_open(2) for further information) The pointer iovec points to an array of iovec structures, defined in <sys/uio.h> as: struct iovec { void *iov_base; /* starting address */ size_t iov_len; /* number of bytes to be advised */ }; The iovec describes address ranges beginning at address(iov_base) and with size length of bytes(iov_len). The vlen represents the number of elements in iovec. The advice is indicated in the advice argument, which is one of the following at this moment if the target process specified by pidfd is external. MADV_COLD MADV_PAGEOUT Permission to provide a hint to external process is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). The process_madvise supports every advice madvise(2) has if target process is in same thread group with calling process so user could use process_madvise(2) to extend existing madvise(2) to support vector address ranges. RETURN VALUE On success, process_madvise() returns the number of bytes advised. This return value may be less than the total number of requested bytes, if an error occurred. The caller should check return value to determine whether a partial advice occurred. FAQ: Q.1 - Why does any external entity have better knowledge? Quote from Sandeep "For Android, every application (including the special SystemServer) are forked from Zygote. The reason of course is to share as many libraries and classes between the two as possible to benefit from the preloading during boot. After applications start, (almost) all of the APIs end up calling into this SystemServer process over IPC (binder) and back to the application. In a fully running system, the SystemServer monitors every single process periodically to calculate their PSS / RSS and also decides which process is "important" to the user for interactivity. So, because of how these processes start _and_ the fact that the SystemServer is looping to monitor each process, it does tend to *know* which address range of the application is not used / useful. Besides, we can never rely on applications to clean things up themselves. We've had the "hey app1, the system is low on memory, please trim your memory usage down" notifications for a long time[1]. They rely on applications honoring the broadcasts and very few do. So, if we want to avoid the inevitable killing of the application and restarting it, some way to be able to tell the OS about unimportant memory in these applications will be useful. - ssp Q.2 - How to guarantee the race(i.e., object validation) between when giving a hint from an external process and get the hint from the target process? process_madvise operates on the target process's address space as it exists at the instant that process_madvise is called. If the space target process can run between the time the process_madvise process inspects the target process address space and the time that process_madvise is actually called, process_madvise may operate on memory regions that the calling process does not expect. It's the responsibility of the process calling process_madvise to close this race condition. For example, the calling process can suspend the target process with ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an opportunity to change its own address space before process_madvise is called. Another option is to operate on memory regions that the caller knows a priori will be unchanged in the target process. Yet another option is to accept the race for certain process_madvise calls after reasoning that mistargeting will do no harm. The suggested API itself does not provide synchronization. It also apply other APIs like move_pages, process_vm_write. The race isn't really a problem though. Why is it so wrong to require that callers do their own synchronization in some manner? Nobody objects to write(2) merely because it's possible for two processes to open the same file and clobber each other's writes --- instead, we tell people to use flock or something. Think about mmap. It never guarantees newly allocated address space is still valid when the user tries to access it because other threads could unmap the memory right before. That's where we need synchronization by using other API or design from userside. It shouldn't be part of API itself. If someone needs more fine-grained synchronization rather than process level, there were two ideas suggested - cookie[2] and anon-fd[3]. Both are applicable via using last reserved argument of the API but I don't think it's necessary right now since we have already ways to prevent the race so don't want to add additional complexity with more fine-grained optimization model. To make the API extend, it reserved an unsigned long as last argument so we could support it in future if someone really needs it. Q.3 - Why doesn't ptrace work? Injecting an madvise in the target process using ptrace would not work for us because such injected madvise would have to be executed by the target process, which means that process would have to be runnable and that creates the risk of the abovementioned race and hinting a wrong VMA. Furthermore, we want to act the hint in caller's context, not the callee's, because the callee is usually limited in cpuset/cgroups or even freezed state so they can't act by themselves quick enough, which causes more thrashing/kill. It doesn't work if the target process are ptraced(e.g., strace, debugger, minidump) because a process can have at most one ptracer. [1] https://developer.android.com/topic/performance/memory" [2] process_getinfo for getting the cookie which is updated whenever vma of process address layout are changed - Daniel Colascione - https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224 [3] anonymous fd which is used for the object(i.e., address range) validation - Michal Hocko - https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/ [minchan@kernel.org: fix process_madvise build break for arm64] Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com [minchan@kernel.org: fix build error for mips of process_madvise] Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com [akpm@linux-foundation.org: fix patch ordering issue] [akpm@linux-foundation.org: fix arm64 whoops] [minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian] [akpm@linux-foundation.org: fix i386 build] [sfr@canb.auug.org.au: fix syscall numbering] Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au [sfr@canb.auug.org.au: madvise.c needs compat.h] Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au [minchan@kernel.org: fix mips build] Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com [yuehaibing@huawei.com: remove duplicate header which is included twice] Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com [minchan@kernel.org: do not use helper functions for process_madvise] Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com [akpm@linux-foundation.org: pidfd_get_pid() gained an argument] [sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"] Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: <linux-man@vger.kernel.org> Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-15arm64: mm: use single quantity to represent the PA to VA translationArd Biesheuvel
On arm64, the global variable memstart_addr represents the physical address of PAGE_OFFSET, and so physical to virtual translations or vice versa used to come down to simple additions or subtractions involving the values of PAGE_OFFSET and memstart_addr. When support for 52-bit virtual addressing was introduced, we had to deal with PAGE_OFFSET potentially being outside of the region that can be covered by the virtual range (as the 52-bit VA capable build needs to be able to run on systems that are only 48-bit VA capable), and for this reason, another translation was introduced, and recorded in the global variable physvirt_offset. However, if we go back to the original definition of memstart_addr, i.e., the physical address of PAGE_OFFSET, it turns out that there is no need for two separate translations: instead, we can simply subtract the size of the unaddressable VA space from memstart_addr to make the available physical memory appear in the 48-bit addressable VA region. This simplifies things, but also fixes a bug on KASLR builds, which may update memstart_addr later on in arm64_memblock_init(), but fails to update vmemmap and physvirt_offset accordingly. Fixes: 5383cc6efed1 ("arm64: mm: Introduce vabits_actual") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Steve Capper <steve.capper@arm.com> Link: https://lore.kernel.org/r/20201008153602.9467-2-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-10-14Merge tag 'iommu-updates-v5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - ARM-SMMU Updates from Will: - Continued SVM enablement, where page-table is shared with CPU - Groundwork to support integrated SMMU with Adreno GPU - Allow disabling of MSI-based polling on the kernel command-line - Minor driver fixes and cleanups (octal permissions, error messages, ...) - Secure Nested Paging Support for AMD IOMMU. The IOMMU will fault when a device tries DMA on memory owned by a guest. This needs new fault-types as well as a rewrite of the IOMMU memory semaphore for command completions. - Allow broken Intel IOMMUs (wrong address widths reported) to still be used for interrupt remapping. - IOMMU UAPI updates for supporting vSVA, where the IOMMU can access address spaces of processes running in a VM. - Support for the MT8167 IOMMU in the Mediatek IOMMU driver. - Device-tree updates for the Renesas driver to support r8a7742. - Several smaller fixes and cleanups all over the place. * tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (57 commits) iommu/vt-d: Gracefully handle DMAR units with no supported address widths iommu/vt-d: Check UAPI data processed by IOMMU core iommu/uapi: Handle data and argsz filled by users iommu/uapi: Rename uapi functions iommu/uapi: Use named union for user data iommu/uapi: Add argsz for user filled data docs: IOMMU user API iommu/qcom: add missing put_device() call in qcom_iommu_of_xlate() iommu/arm-smmu-v3: Add SVA device feature iommu/arm-smmu-v3: Check for SVA features iommu/arm-smmu-v3: Seize private ASID iommu/arm-smmu-v3: Share process page tables iommu/arm-smmu-v3: Move definitions to a header iommu/io-pgtable-arm: Move some definitions to a header iommu/arm-smmu-v3: Ensure queue is read after updating prod pointer iommu/amd: Re-purpose Exclusion range registers to support SNP CWWB iommu/amd: Add support for RMP_PAGE_FAULT and RMP_HW_ERR iommu/amd: Use 4K page for completion wait write-back semaphore iommu/tegra-smmu: Allow to group clients in same swgroup iommu/tegra-smmu: Fix iova->phys translation ...
2020-10-14Merge tag 'pm-5.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These rework the collection of cpufreq statistics to allow it to take place if fast frequency switching is enabled in the governor, rework the frequency invariance handling in the cpufreq core and drivers, add new hardware support to a couple of cpufreq drivers, fix a number of assorted issues and clean up the code all over. Specifics: - Rework cpufreq statistics collection to allow it to take place when fast frequency switching is enabled in the governor (Viresh Kumar). - Make the cpufreq core set the frequency scale on behalf of the driver and update several cpufreq drivers accordingly (Ionela Voinescu, Valentin Schneider). - Add new hardware support to the STI and qcom cpufreq drivers and improve them (Alain Volmat, Manivannan Sadhasivam). - Fix multiple assorted issues in cpufreq drivers (Jon Hunter, Krzysztof Kozlowski, Matthias Kaehlcke, Pali Rohár, Stephan Gerhold, Viresh Kumar). - Fix several assorted issues in the operating performance points (OPP) framework (Stephan Gerhold, Viresh Kumar). - Allow devfreq drivers to fetch devfreq instances by DT enumeration instead of using explicit phandles and modify the devfreq core code to support driver-specific devfreq DT bindings (Leonard Crestez, Chanwoo Choi). - Improve initial hardware resetting in the tegra30 devfreq driver and clean up the tegra cpuidle driver (Dmitry Osipenko). - Update the cpuidle core to collect state entry rejection statistics and expose them via sysfs (Lina Iyer). - Improve the ACPI _CST code handling diagnostics (Chen Yu). - Update the PSCI cpuidle driver to allow the PM domain initialization to occur in the OSI mode as well as in the PC mode (Ulf Hansson). - Rework the generic power domains (genpd) core code to allow domain power off transition to be aborted in the absence of the "power off" domain callback (Ulf Hansson). - Fix two suspend-to-idle issues in the ACPI EC driver (Rafael Wysocki). - Fix the handling of timer_expires in the PM-runtime framework on 32-bit systems and the handling of device links in it (Grygorii Strashko, Xiang Chen). - Add IO requests batching support to the hibernate image saving and reading code and drop a bogus get_gendisk() from there (Xiaoyi Chen, Christoph Hellwig). - Allow PCIe ports to be put into the D3cold power state if they are power-manageable via ACPI (Lukas Wunner). - Add missing header file include to a power capping driver (Pujin Shi). - Clean up the qcom-cpr AVS driver a bit (Liu Shixin). - Kevin Hilman steps down as designated reviwer of adaptive voltage scaling (AVS) drivers (Kevin Hilman)" * tag 'pm-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits) cpufreq: stats: Fix string format specifier mismatch arm: disable frequency invariance for CONFIG_BL_SWITCHER cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale() cpufreq: stats: Add memory barrier to store_reset() cpufreq: schedutil: Simplify sugov_fast_switch() ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe() ACPI: EC: PM: Flush EC work unconditionally after wakeup PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI PM: hibernate: remove the bogus call to get_gendisk() in software_resume() cpufreq: Move traces and update to policy->cur to cpufreq core cpufreq: stats: Enable stats for fast-switch as well cpufreq: stats: Mark few conditionals with unlikely() cpufreq: stats: Remove locking cpufreq: stats: Defer stats update to cpufreq_stats_record_transition() PM: domains: Allow to abort power off when no ->power_off() callback PM: domains: Rename power state enums for genpd PM / devfreq: tegra30: Improve initial hardware resetting PM / devfreq: event: Change prototype of devfreq_event_get_edev_by_phandle function PM / devfreq: Change prototype of devfreq_get_devfreq_by_phandle function PM / devfreq: Add devfreq_get_devfreq_by_node function ...
2020-10-14Merge tag 'for-linus-5.10b-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - two small cleanup patches - avoid error messages when initializing MCA banks in a Xen dom0 - a small series for converting the Xen gntdev driver to use pin_user_pages*() instead of get_user_pages*() - intermediate fix for running as a Xen guest on Arm with KPTI enabled (the final solution will need new Xen functionality) * tag 'for-linus-5.10b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: Fix typo in xen_pagetable_p2m_free() x86/xen: disable Firmware First mode for correctable memory errors xen/arm: do not setup the runstate info page if kpti is enabled xen: remove redundant initialization of variable ret xen/gntdev.c: Convert get_user_pages*() to pin_user_pages*() xen/gntdev.c: Mark pages as dirty
2020-10-12Merge branch 'compat.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull compat mount cleanups from Al Viro: "The last remnants of mount(2) compat buried by Christoph. Buried into NFS, that is. Generally I'm less enthusiastic about "let's use in_compat_syscall() deep in call chain" kind of approach than Christoph seems to be, but in this case it's warranted - that had been an NFS-specific wart, hopefully not to be repeated in any other filesystems (read: any new filesystem introducing non-text mount options will get NAKed even if it doesn't mess the layout up). IOW, not worth trying to grow an infrastructure that would avoid that use of in_compat_syscall()..." * 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: remove compat_sys_mount fs,nfs: lift compat nfs4 mount data handling into the nfs code nfs: simplify nfs4_parse_monolithic
2020-10-12Merge branch 'work.quota-compat' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull compat quotactl cleanups from Al Viro: "More Christoph's compat cleanups: quotactl(2)" * 'work.quota-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: quota: simplify the quotactl compat handling compat: add a compat_need_64bit_alignment_fixup() helper compat: lift compat_s64 and compat_u64 to <asm-generic/compat.h>
2020-10-12Merge branch 'work.iov_iter' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull compat iovec cleanups from Al Viro: "Christoph's series around import_iovec() and compat variant thereof" * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: security/keys: remove compat_keyctl_instantiate_key_iov mm: remove compat_process_vm_{readv,writev} fs: remove compat_sys_vmsplice fs: remove the compat readv/writev syscalls fs: remove various compat readv/writev helpers iov_iter: transparently handle compat iovecs in import_iovec iov_iter: refactor rw_copy_check_uvector and import_iovec iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c compat.h: fix a spelling error in <linux/compat.h>
2020-10-12Merge tag 'efi-core-2020-10-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI changes from Ingo Molnar: - Preliminary RISC-V enablement - the bulk of it will arrive via the RISCV tree. - Relax decompressed image placement rules for 32-bit ARM - Add support for passing MOK certificate table contents via a config table rather than a EFI variable. - Add support for 18 bit DIMM row IDs in the CPER records. - Work around broken Dell firmware that passes the entire Boot#### variable contents as the command line - Add definition of the EFI_MEMORY_CPU_CRYPTO memory attribute so we can identify it in the memory map listings. - Don't abort the boot on arm64 if the EFI RNG protocol is available but returns with an error - Replace slashes with exclamation marks in efivarfs file names - Split efi-pstore from the deprecated efivars sysfs code, so we can disable the latter on !x86. - Misc fixes, cleanups and updates. * tag 'efi-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) efi: mokvar: add missing include of asm/early_ioremap.h efi: efivars: limit availability to X86 builds efi: remove some false dependencies on CONFIG_EFI_VARS efi: gsmi: fix false dependency on CONFIG_EFI_VARS efi: efivars: un-export efivars_sysfs_init() efi: pstore: move workqueue handling out of efivars efi: pstore: disentangle from deprecated efivars module efi: mokvar-table: fix some issues in new code efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure efivarfs: Replace invalid slashes with exclamation marks in dentries. efi: Delete deprecated parameter comments efi/libstub: Fix missing-prototypes in string.c efi: Add definition of EFI_MEMORY_CPU_CRYPTO and ability to report it cper,edac,efi: Memory Error Record: bank group/address and chip id edac,ghes,cper: Add Row Extension to Memory Error Record efi/x86: Add a quirk to support command line arguments on Dell EFI firmware efi/libstub: Add efi_warn and *_once logging helpers integrity: Load certs from the EFI MOK config table integrity: Move import of MokListRT certs to a separate routine efi: Support for MOK variable config table ...
2020-10-12Merge tag 'irq-core-2020-10-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core: - Allow trimming of interrupt hierarchy to support odd hardware setups where only a subset of the interrupts requires the full hierarchy. - Allow the retrigger mechanism to follow a hierarchy to simplify driver code. - Provide a mechanism to force enable wakeup interrrupts on suspend. - More infrastructure to handle IPIs in the core code Architectures: - Convert ARM/ARM64 IPI handling to utilize the interrupt core code. Drivers: - The usual pile of new interrupt chips (MStar, Actions Owl, TI PRUSS, Designware ICTL) - ARM(64) IPI related conversions - Wakeup support for Qualcom PDC - Prevent hierarchy corruption in the NVIDIA Tegra driver - The usual small fixes, improvements and cleanups all over the place" * tag 'irq-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits) dt-bindings: interrupt-controller: Add MStar interrupt controller irqchip/irq-mst: Add MStar interrupt controller support soc/tegra: pmc: Don't create fake interrupt hierarchy levels soc/tegra: pmc: Allow optional irq parent callbacks gpio: tegra186: Allow optional irq parent callbacks genirq/irqdomain: Allow partial trimming of irq_data hierarchy irqchip/qcom-pdc: Reset PDC interrupts during init irqchip/qcom-pdc: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag pinctrl: qcom: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag genirq/PM: Introduce IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag pinctrl: qcom: Use return value from irq_set_wake() call pinctrl: qcom: Set IRQCHIP_SET_TYPE_MASKED and IRQCHIP_MASK_ON_SUSPEND flags ARM: Handle no IPI being registered in show_ipi_list() MAINTAINERS: Add entries for Actions Semi Owl SIRQ controller irqchip: Add Actions Semi Owl SIRQ controller dt-bindings: interrupt-controller: Add Actions SIRQ controller binding dt-bindings: dw-apb-ictl: Update binding to describe use as primary interrupt controller irqchip/dw-apb-ictl: Add primary interrupt controller support irqchip/dw-apb-ictl: Refactor priot to introducing hierarchical irq domains genirq: Add stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER ...
2020-10-12Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "There's quite a lot of code here, but much of it is due to the addition of a new PMU driver as well as some arm64-specific selftests which is an area where we've traditionally been lagging a bit. In terms of exciting features, this includes support for the Memory Tagging Extension which narrowly missed 5.9, hopefully allowing userspace to run with use-after-free detection in production on CPUs that support it. Work is ongoing to integrate the feature with KASAN for 5.11. Another change that I'm excited about (assuming they get the hardware right) is preparing the ASID allocator for sharing the CPU page-table with the SMMU. Those changes will also come in via Joerg with the IOMMU pull. We do stray outside of our usual directories in a few places, mostly due to core changes required by MTE. Although much of this has been Acked, there were a couple of places where we unfortunately didn't get any review feedback. Other than that, we ran into a handful of minor conflicts in -next, but nothing that should post any issues. Summary: - Userspace support for the Memory Tagging Extension introduced by Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11. - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context switching. - Fix and subsequent rewrite of our Spectre mitigations, including the addition of support for PR_SPEC_DISABLE_NOEXEC. - Support for the Armv8.3 Pointer Authentication enhancements. - Support for ASID pinning, which is required when sharing page-tables with the SMMU. - MM updates, including treating flush_tlb_fix_spurious_fault() as a no-op. - Perf/PMU driver updates, including addition of the ARM CMN PMU driver and also support to handle CPU PMU IRQs as NMIs. - Allow prefetchable PCI BARs to be exposed to userspace using normal non-cacheable mappings. - Implementation of ARCH_STACKWALK for unwinding. - Improve reporting of unexpected kernel traps due to BPF JIT failure. - Improve robustness of user-visible HWCAP strings and their corresponding numerical constants. - Removal of TEXT_OFFSET. - Removal of some unused functions, parameters and prototypes. - Removal of MPIDR-based topology detection in favour of firmware description. - Cleanups to handling of SVE and FPSIMD register state in preparation for potential future optimisation of handling across syscalls. - Cleanups to the SDEI driver in preparation for support in KVM. - Miscellaneous cleanups and refactoring work" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits) Revert "arm64: initialize per-cpu offsets earlier" arm64: random: Remove no longer needed prototypes arm64: initialize per-cpu offsets earlier kselftest/arm64: Check mte tagged user address in kernel kselftest/arm64: Verify KSM page merge for MTE pages kselftest/arm64: Verify all different mmap MTE options kselftest/arm64: Check forked child mte memory accessibility kselftest/arm64: Verify mte tag inclusion via prctl kselftest/arm64: Add utilities and a test to validate mte memory perf: arm-cmn: Fix conversion specifiers for node type perf: arm-cmn: Fix unsigned comparison to less than zero arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option arm64: Pull in task_stack_page() to Spectre-v4 mitigation code KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled arm64: Get rid of arm64_ssbd_state KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state() KVM: arm64: Get rid of kvm_arm_have_ssbd() KVM: arm64: Simplify handling of ARCH_WORKAROUND_2 ...
2020-10-11Merge tag 'irqchip-5.10' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: Core changes: - Allow irq retriggering to follow a hierarchy - Allow interrupt hierarchies to be trimmed at allocation time - Allow interrupts to be hidden from /proc/interrupts (IPIs) - Introduce stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER - New per-cpu IPI handling flow Architecture changes: - Move arm/arm64 IPI handling to the core interrupt code, removing the home brewed accounting Driver updates: - New driver for the MStar (and more recently Mediatek) platforms - New driver for the Actions Owl SIRQ controller - New driver for the TI PRUSS infrastructure - Wake-up support for the Qualcomm PDC controller - Primary interrupt controller support for the Designware APB ICTL - Convert the IPI code for GIC, GICv3, hip04, armada-270-xp and bcm2836 to using standard interrupts - Improve GICv3 pseudo-NMI support to deal with both non-secure and secure priorities on arm64 - Convert the GIC/GICv3 drivers to using HW-based irq retrigger - A sprinkling of dev_err_probe() conversion - A set of NVIDIA Tegra fixes for interrupt hierarchy corruption - A reset fix for the Loongson HTVEC driver - A couple of error handling fixes in the TI SCI drivers
2020-10-09Revert "arm64: initialize per-cpu offsets earlier"Will Deacon
This reverts commit 353e228eb355be5a65a3c0996c774a0f46737fda. Qian Cai reports that TX2 no longer boots with his .config as it appears that task_cpu() gets instrumented and used before KASAN has been initialised. Although Mark has a proposed fix, let's take the safe option of reverting this for now and sorting it out properly later. Link: https://lore.kernel.org/r/711bc57a314d8d646b41307008db2845b7537b3d.camel@redhat.com Reported-by: Qian Cai <cai@redhat.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-10-08cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale()Ionela Voinescu
Compared to other arch_* functions, arch_set_freq_scale() has an atypical weak definition that can be replaced by a strong architecture specific implementation. The more typical support for architectural functions involves defining an empty stub in a header file if the symbol is not already defined in architecture code. Some examples involve: - #define arch_scale_freq_capacity topology_get_freq_scale - #define arch_scale_freq_invariant topology_scale_freq_invariant - #define arch_scale_cpu_capacity topology_get_cpu_scale - #define arch_update_cpu_topology topology_update_cpu_topology - #define arch_scale_thermal_pressure topology_get_thermal_pressure - #define arch_set_thermal_pressure topology_set_thermal_pressure Bring arch_set_freq_scale() in line with these functions by renaming it to topology_set_freq_scale() in the arch topology driver, and by defining the arch_set_freq_scale symbol to point to the new function for arm and arm64. While there are other users of the arch_topology driver, this patch defines arch_set_freq_scale for arm and arm64 only, due to their existing definitions of arch_scale_freq_capacity. This is the getter function of the frequency invariance scale factor and without a getter function, the setter function - arch_set_freq_scale() has not purpose. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> (BL_SWITCHER and topology parts) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-07Merge branch 'for-next/late-arrivals' into for-next/coreWill Deacon
Late patches for 5.10: MTE selftests, minor KCSAN preparation and removal of some unused prototypes. (Amit Daniel Kachhap and others) * for-next/late-arrivals: arm64: random: Remove no longer needed prototypes arm64: initialize per-cpu offsets earlier kselftest/arm64: Check mte tagged user address in kernel kselftest/arm64: Verify KSM page merge for MTE pages kselftest/arm64: Verify all different mmap MTE options kselftest/arm64: Check forked child mte memory accessibility kselftest/arm64: Verify mte tag inclusion via prctl kselftest/arm64: Add utilities and a test to validate mte memory
2020-10-07arm64: random: Remove no longer needed prototypesAndre Przywara
Commit 9bceb80b3cc4 ("arm64: kaslr: Use standard early random function") removed the direct calls of the __arm64_rndr() and __early_cpu_has_rndr() functions, but left the dummy prototypes in the #else branch of the #ifdef CONFIG_ARCH_RANDOM guard. Remove the redundant prototypes, as they have no users outside of this header file. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20201006194453.36519-1-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-10-07Merge branches 'arm/allwinner', 'arm/mediatek', 'arm/renesas', 'arm/tegra', ↵Joerg Roedel
'arm/qcom', 'arm/smmu', 'ppc/pamu', 'x86/amd', 'x86/vt-d' and 'core' into next
2020-10-05arm64: initialize per-cpu offsets earlierMark Rutland
The current initialization of the per-cpu offset register is difficult to follow and this initialization is not always early enough for upcoming instrumentation with KCSAN, where the instrumentation callbacks use the per-cpu offset. To make it possible to support KCSAN, and to simplify reasoning about early bringup code, let's initialize the per-cpu offset earlier, before we run any C code that may consume it. To do so, this patch adds a new init_this_cpu_offset() helper that's called before the usual primary/secondary start functions. For consistency, this is also used to re-initialize the per-cpu offset after the runtime per-cpu areas have been allocated (which can change CPU0's offset). So that init_this_cpu_offset() isn't subject to any instrumentation that might consume the per-cpu offset, it is marked with noinstr, preventing instrumentation. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201005164303.21389-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-10-05Merge back cpufreq material for 5.10.Rafael J. Wysocki
2020-10-04xen/arm: do not setup the runstate info page if kpti is enabledStefano Stabellini
The VCPUOP_register_runstate_memory_area hypercall takes a virtual address of a buffer as a parameter. The semantics of the hypercall are such that the virtual address should always be valid. When KPTI is enabled and we are running userspace code, the virtual address is not valid, thus, Linux is violating the semantics of VCPUOP_register_runstate_memory_area. Do not call VCPUOP_register_runstate_memory_area when KPTI is enabled. Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> CC: Bertrand Marquis <Bertrand.Marquis@arm.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Link: https://lore.kernel.org/r/20200924234955.15455-1-sstabellini@kernel.org Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2020-10-03mm: remove compat_process_vm_{readv,writev}Christoph Hellwig
Now that import_iovec handles compat iovecs, the native syscalls can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-10-03fs: remove compat_sys_vmspliceChristoph Hellwig
Now that import_iovec handles compat iovecs, the native vmsplice syscall can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>