summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2018-06-21x86/platform/UV: Use new set memory block size functionmike.travis@hpe.com
Add a call to the new function to "adjust" the current fixed UV memory block size of 2GB so it can be changed to a different physical boundary. This accommodates changes in the Intel BIOS, and therefore UV BIOS, which now can align boundaries different than the previous UV standard of 2GB. It also flags any UV Global Address boundaries from BIOS that cause a change in the mem block size (boundary). The current boundary of 2GB has been used on UV since the first system release in 2009 with Linux 2.6 and has worked fine. But the new NVDIMM persistent memory modules (PMEM), along with the Intel BIOS changes to support these modules caused the memory block size boundary to be set to a lower limit. Intel only guarantees that this minimum boundary at 64MB though the current Linux limit is 128MB. Note that the default remains 2GB if no changes occur. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Andrew Banman <andrew.banman@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Cc: jgross@suse.com Cc: kirill.shutemov@linux.intel.com Cc: mhocko@suse.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/lkml/20180524201711.732785782@stormcage.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/platform/UV: Add adjustable set memory block size functionmike.travis@hpe.com
Add a new function to "adjust" the current fixed UV memory block size of 2GB so it can be changed to a different physical boundary. This is out of necessity so arch dependent code can accommodate specific BIOS requirements which can align these new PMEM modules at less than the default boundaries. A "set order" type of function was used to insure that the memory block size will be a power of two value without requiring a validity check. 64GB was chosen as the upper limit for memory block size values to accommodate upcoming 4PB systems which have 6 more bits of physical address space (46 becoming 52). Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Andrew Banman <andrew.banman@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Cc: jgross@suse.com Cc: kirill.shutemov@linux.intel.com Cc: mhocko@suse.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/lkml/20180524201711.609546602@stormcage.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/build: Remove unnecessary preparation for purgatoryMasahiro Yamada
kexec-purgatory.c is properly generated when Kbuild descend into the arch/x86/purgatory/. Thus the 'archprepare' target is redundant. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/1529401422-28838-3-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21Revert "kexec/purgatory: Add clean-up for purgatory directory"Masahiro Yamada
Reverts the following commit: b0108f9e93d0 ("kexec: purgatory: add clean-up for purgatory directory") ... which incorrectly stated that the kexec-purgatory.c and purgatory.ro files were not removed after 'make mrproper'. In fact, they are. You can confirm it after reverting it. $ make mrproper $ touch arch/x86/purgatory/kexec-purgatory.c $ touch arch/x86/purgatory/purgatory.ro $ make mrproper CLEAN arch/x86/purgatory $ ls arch/x86/purgatory/ entry64.S Makefile purgatory.c setup-x86_64.S stack.S string.c This is obvious from the build system point of view. arch/x86/Makefile adds 'arch/x86' to core-y. Hence 'make clean' descends like this: arch/x86/Kbuild -> arch/x86/purgatory/Makefile Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/1529401422-28838-2-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/xen: Add call of speculative_store_bypass_ht_init() to PV pathsJuergen Gross
Commit: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") ... added speculative_store_bypass_ht_init() to the per-CPU initialization sequence. speculative_store_bypass_ht_init() needs to be called on each CPU for PV guests, too. Reported-by: Brian Woods <brian.woods@amd.com> Tested-by: Brian Woods <brian.woods@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: xen-devel@lists.xenproject.org Fixes: 1f50ddb4f4189243c05926b842dc1a0332195f31 ("x86/speculation: Handle HT correctly on AMD") Link: https://lore.kernel.org/lkml/20180621084331.21228-1-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-20x86: Call fixup_exception() before notify_die() in math_error()Siarhei Liakh
fpu__drop() has an explicit fwait which under some conditions can trigger a fixable FPU exception while in kernel. Thus, we should attempt to fixup the exception first, and only call notify_die() if the fixup failed just like in do_general_protection(). The original call sequence incorrectly triggers KDB entry on debug kernels under particular FPU-intensive workloads. Andy noted, that this makes the whole conditional irq enable thing even more inconsistent, but fixing that it outside the scope of this. Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Borislav Petkov" <bpetkov@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/DM5PR11MB201156F1CAB2592B07C79A03B17D0@DM5PR11MB2011.namprd11.prod.outlook.com
2018-06-09x86/intel_rdt: Enable CMT and MBM on new Skylake steppingTony Luck
New stepping of Skylake has fixes for cache occupancy and memory bandwidth monitoring. Update the code to enable these by default on newer steppings. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: stable@vger.kernel.org # v4.14 Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: https://lkml.kernel.org/r/20180608160732.9842-1-tony.luck@intel.com
2018-06-06x86/apic/vector: Print APIC control bits in debugfsThomas Gleixner
Extend the debugability of the vector management by adding the state bits to the debugfs output. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.908136099@linutronix.de
2018-06-06x86/platform/uv: Use apic_ack_irq()Thomas Gleixner
To address the EBUSY fail of interrupt affinity settings in case that the previous setting has not been cleaned up yet, use the new apic_ack_irq() function instead of the special uv_ack_apic() implementation which is merily a wrapper around ack_APIC_irq(). Preparatory change for the real fix Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.721691398@linutronix.de
2018-06-06x86/ioapic: Use apic_ack_irq()Thomas Gleixner
To address the EBUSY fail of interrupt affinity settings in case that the previous setting has not been cleaned up yet, use the new apic_ack_irq() function instead of directly invoking ack_APIC_irq(). Preparatory change for the real fix Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.639011135@linutronix.de
2018-06-06x86/apic: Provide apic_ack_irq()Thomas Gleixner
apic_ack_edge() is explicitely for handling interrupt affinity cleanup when interrupt remapping is not available or disable. Remapped interrupts and also some of the platform specific special interrupts, e.g. UV, invoke ack_APIC_irq() directly. To address the issue of failing an affinity update with -EBUSY the delayed affinity mechanism can be reused, but ack_APIC_irq() does not handle that. Adding this to ack_APIC_irq() is not possible, because that function is also used for exceptions and directly handled interrupts like IPIs. Create a new function, which just contains the conditional invocation of irq_move_irq() and the final ack_APIC_irq(). Reuse the new function in apic_ack_edge(). Preparatory change for the real fix. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
2018-06-06x86/apic/vector: Prevent hlist corruption and leaksThomas Gleixner
Several people observed the WARN_ON() in irq_matrix_free() which triggers when the caller tries to free an vector which is not in the allocation range. Song provided the trace information which allowed to decode the root cause. The rework of the vector allocation mechanism failed to preserve a sanity check, which prevents setting a new target vector/CPU when the previous affinity change has not fully completed. As a result a half finished affinity change can be overwritten, which can cause the leak of a irq descriptor pointer on the previous target CPU and double enqueue of the hlist head into the cleanup lists of two or more CPUs. After one CPU cleaned up its vector the next CPU will invoke the cleanup handler with vector 0, which triggers the out of range warning in the matrix allocator. Prevent this by checking the apic_data of the interrupt whether the move_in_progress flag is false and the hlist node is not hashed. Return -EBUSY if not. This prevents the damage and restores the behaviour before the vector allocation rework, but due to other changes in that area it also widens the chance that user space can observe -EBUSY. In theory this should be fine, but actually not all user space tools handle -EBUSY correctly. Addressing that is not part of this fix, but will be addressed in follow up patches. Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Dmitry Safonov <0x7f454c46@gmail.com> Reported-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180604162224.303870257@linutronix.de
2018-06-06x86/vector: Fix the args of vector_alloc tracepointDou Liyang
The vector_alloc tracepont reversed the reserved and ret aggs, that made the trace print wrong. Exchange them. Fixes: 8d1e3dca7de6 ("x86/vector: Add tracepoints for vector management") Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180601065031.21872-1-douly.fnst@cn.fujitsu.com
2018-06-06x86/idt: Simplify the idt_setup_apic_and_irq_gates()Dou Liyang
The idt_setup_apic_and_irq_gates() sets the gates from FIRST_EXTERNAL_VECTOR up to FIRST_SYSTEM_VECTOR first. then secondly, from FIRST_SYSTEM_VECTOR to NR_VECTORS, it takes both APIC=y and APIC=n into account. But for APIC=n, the FIRST_SYSTEM_VECTOR is equal to NR_VECTORS, all vectors has been set at the first step. Simplify the second step, make it just work for APIC=y. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180523023555.2933-1-douly.fnst@cn.fujitsu.com
2018-06-06x86/platform/uv: Remove extra parenthesesVarsha Rao
Remove extra parentheses to fix the extraneous parentheses clang warning. Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Varsha Rao <rvarsha016@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Nicholas Mc Guire <der.herr@hofr.at> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180520080012.8215-1-rvarsha016@gmail.com
2018-06-06x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SMEKirill A. Shutemov
AMD SME claims one bit from physical address to indicate whether the page is encrypted or not. To achieve that we clear out the bit from __PHYSICAL_MASK. The capability to adjust __PHYSICAL_MASK is required beyond AMD SME. For instance for upcoming Intel Multi-Key Total Memory Encryption. Factor it out into a separate feature with own Kconfig handle. It also helps with overhead of AMD SME. It saves more than 3k in .text on defconfig + AMD_MEM_ENCRYPT: add/remove: 3/2 grow/shrink: 5/110 up/down: 189/-3753 (-3564) We would need to return to this once we have infrastructure to patch constants in code. That's good candidate for it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180518113028.79825-1-kirill.shutemov@linux.intel.com
2018-06-06x86: Mark native_set_p4d() as __always_inlineArnd Bergmann
When CONFIG_OPTIMIZE_INLINING is enabled, the function native_set_p4d() may not be fully inlined into the caller, resulting in a false-positive warning about an access to the __pgtable_l5_enabled variable from a non-__init function, despite the original caller being an __init function: WARNING: vmlinux.o(.text.unlikely+0x1429): Section mismatch in reference from the function native_set_p4d() to the variable .init.data:__pgtable_l5_enabled WARNING: vmlinux.o(.text.unlikely+0x1429): Section mismatch in reference from the function native_p4d_clear() to the variable .init.data:__pgtable_l5_enabled The function native_set_p4d() references the variable __initdata __pgtable_l5_enabled. This is often because native_set_p4d lacks a __initdata annotation or the annotation of __pgtable_l5_enabled is wrong. Marking the native_set_p4d function and its caller native_p4d_clear() avoids this problem. I did not bisect the original cause, but I assume this is related to the recent rework that turned pgtable_l5_enabled() into an inline function, which in turn caused the compiler to make different inlining decisions. Fixes: ad3fe525b950 ("x86/mm: Unify pgtable_l5_enabled usage in early boot code") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Link: https://lkml.kernel.org/r/20180605113715.1133726-1-arnd@arndb.de
2018-06-04Merge branches 'x86/dma', 'x86/microcode', 'x86/mm' and 'x86/vdso' into ↵Ingo Molnar
x86/urgent Merge these small and simple 1-2 commit branches into the urgent branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-27x86/microcode: Make the late update update_lock a raw lock for RTScott Wood
__reload_late() is called from stop_machine context and thus cannot acquire a non-raw spinlock on PREEMPT_RT. Signed-off-by: Scott Wood <swood@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Pei Zhang <pezhang@redhat.com> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20180524154420.24455-1-swood@redhat.com
2018-05-26Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 store buffer fixes from Thomas Gleixner: "Two fixes for the SSBD mitigation code: - expose SSBD properly to guests. This got broken when the CPU feature flags got reshuffled. - simplify the CPU detection logic to avoid duplicate entries in the tables" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Simplify the CPU bug detection logic KVM/VMX: Expose SSBD properly to guests
2018-05-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "PPC: - Close a hole which could possibly lead to the host timebase getting out of sync. - Three fixes relating to PTEs and TLB entries for radix guests. - Fix a bug which could lead to an interrupt never getting delivered to the guest, if it is pending for a guest vCPU when the vCPU gets offlined. s390: - Fix false negatives in VSIE validity check (Cc stable) x86: - Fix time drift of VMX preemption timer when a guest uses LAPIC timer in periodic mode (Cc stable) - Unconditionally expose CPUID.IA32_ARCH_CAPABILITIES to allow migration from hosts that don't need retpoline mitigation (Cc stable) - Fix guest crashes on reboot by properly coupling CR4.OSXSAVE and CPUID.OSXSAVE (Cc stable) - Report correct RIP after Hyper-V hypercall #UD (introduced in -rc6)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix #UD address of failed Hyper-V hypercalls kvm: x86: IA32_ARCH_CAPABILITIES is always supported KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed x86/kvm: fix LAPIC timer drift when guest uses periodic mode KVM: s390: vsie: fix < 8k check for the itdba KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change KVM: PPC: Book3S HV: Make radix clear pte when unmapping KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry
2018-05-25KVM: x86: fix #UD address of failed Hyper-V hypercallsRadim Krčmář
If the hypercall was called from userspace or real mode, KVM injects #UD and then advances RIP, so it looks like #UD was caused by the following instruction. This probably won't cause more than confusion, but could give an unexpected access to guest OS' instruction emulator. Also, refactor the code to count hv hypercalls that were handled by the virt userspace. Fixes: 6356ee0c9602 ("x86: Delay skip of emulated hypercall instruction") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-24kvm: x86: IA32_ARCH_CAPABILITIES is always supportedJim Mattson
If there is a possibility that a VM may migrate to a Skylake host, then the hypervisor should report IA32_ARCH_CAPABILITIES.RSBA[bit 2] as being set (future work, of course). This implies that CPUID.(EAX=7,ECX=0):EDX.ARCH_CAPABILITIES[bit 29] should be set. Therefore, kvm should report this CPUID bit as being supported whether or not the host supports it. Userspace is still free to clear the bit if it chooses. For more information on RSBA, see Intel's white paper, "Retpoline: A Branch Target Injection Mitigation" (Document Number 337131-001), currently available at https://bugzilla.kernel.org/show_bug.cgi?id=199511. Since the IA32_ARCH_CAPABILITIES MSR is emulated in kvm, there is no dependency on hardware support for this feature. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Fixes: 28c1c9fabf48 ("KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES") Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-24KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changedWei Huang
The CPUID bits of OSXSAVE (function=0x1) and OSPKE (func=0x7, leaf=0x0) allows user apps to detect if OS has set CR4.OSXSAVE or CR4.PKE. KVM is supposed to update these CPUID bits when CR4 is updated. Current KVM code doesn't handle some special cases when updates come from emulator. Here is one example: Step 1: guest boots Step 2: guest OS enables XSAVE ==> CR4.OSXSAVE=1 and CPUID.OSXSAVE=1 Step 3: guest hot reboot ==> QEMU reset CR4 to 0, but CPUID.OSXAVE==1 Step 4: guest os checks CPUID.OSXAVE, detects 1, then executes xgetbv Step 4 above will cause an #UD and guest crash because guest OS hasn't turned on OSXAVE yet. This patch solves the problem by comparing the the old_cr4 with cr4. If the related bits have been changed, kvm_update_cpuid() needs to be called. Signed-off-by: Wei Huang <wei@redhat.com> Reviewed-by: Bandan Das <bsd@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-24x86/kvm: fix LAPIC timer drift when guest uses periodic modeDavid Vrabel
Since 4.10, commit 8003c9ae204e (KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support), guests using periodic LAPIC timers (such as FreeBSD 8.4) would see their timers drift significantly over time. Differences in the underlying clocks and numerical errors means the periods of the two timers (hv and sw) are not the same. This difference will accumulate with every expiry resulting in a large error between the hv and sw timer. This means the sw timer may be running slow when compared to the hv timer. When the timer is switched from hv to sw, the now active sw timer will expire late. The guest VCPU is reentered and it switches to using the hv timer. This timer catches up, injecting multiple IRQs into the guest (of which the guest only sees one as it does not get to run until the hv timer has caught up) and thus the guest's timer rate is low (and becomes increasing slower over time as the sw timer lags further and further behind). I believe a similar problem would occur if the hv timer is the slower one, but I have not observed this. Fix this by synchronizing the deadlines for both timers to the same time source on every tick. This prevents the errors from accumulating. Fixes: 8003c9ae204e21204e49816c5ea629357e283b06 Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: David Vrabel <david.vrabel@nutanix.com> Cc: stable@vger.kernel.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-23x86/speculation: Simplify the CPU bug detection logicDominik Brodowski
Only CPUs which speculate can speculate. Therefore, it seems prudent to test for cpu_no_speculation first and only then determine whether a specific speculating CPU is susceptible to store bypass speculation. This is underlined by all CPUs currently listed in cpu_no_speculation were present in cpu_no_spec_store_bypass as well. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Cc: konrad.wilk@oracle.com Link: https://lkml.kernel.org/r/20180522090539.GA24668@light.dominikbrodowski.net
2018-05-23KVM/VMX: Expose SSBD properly to guestsKonrad Rzeszutek Wilk
The X86_FEATURE_SSBD is an synthetic CPU feature - that is it bit location has no relevance to the real CPUID 0x7.EBX[31] bit position. For that we need the new CPU feature name. Fixes: 52817587e706 ("x86/cpufeatures: Disentangle SSBD enumeration") Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/20180521215449.26423-2-konrad.wilk@oracle.com
2018-05-21Merge branch 'speck-v20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Merge speculative store buffer bypass fixes from Thomas Gleixner: - rework of the SPEC_CTRL MSR management to accomodate the new fancy SSBD (Speculative Store Bypass Disable) bit handling. - the CPU bug and sysfs infrastructure for the exciting new Speculative Store Bypass 'feature'. - support for disabling SSB via LS_CFG MSR on AMD CPUs including Hyperthread synchronization on ZEN. - PRCTL support for dynamic runtime control of SSB - SECCOMP integration to automatically disable SSB for sandboxed processes with a filter flag for opt-out. - KVM integration to allow guests fiddling with SSBD including the new software MSR VIRT_SPEC_CTRL to handle the LS_CFG based oddities on AMD. - BPF protection against SSB .. this is just the core and x86 side, other architecture support will come separately. * 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) bpf: Prevent memory disambiguation attack x86/bugs: Rename SSBD_NO to SSB_NO KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG x86/bugs: Rework spec_ctrl base and mask logic x86/bugs: Remove x86_spec_ctrl_set() x86/bugs: Expose x86_spec_ctrl_base directly x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host} x86/speculation: Rework speculative_store_bypass_update() x86/speculation: Add virtualized speculative store bypass disable support x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL x86/speculation: Handle HT correctly on AMD x86/cpufeatures: Add FEATURE_ZEN x86/cpufeatures: Disentangle SSBD enumeration x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP KVM: SVM: Move spec control call after restore of GS x86/cpu: Make alternative_msr_write work for 32-bit code x86/bugs: Fix the parameters alignment and missing void x86/bugs: Make cpu_show_common() static ...
2018-05-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "An unfortunately larger set of fixes, but a large portion is selftests: - Fix the missing clusterid initializaiton for x2apic cluster management which caused boot failures due to IPIs being sent to the wrong cluster - Drop TX_COMPAT when a 64bit executable is exec()'ed from a compat task - Wrap access to __supported_pte_mask in __startup_64() where clang compile fails due to a non PC relative access being generated. - Two fixes for 5 level paging fallout in the decompressor: - Handle GOT correctly for paging_prepare() and cleanup_trampoline() - Fix the page table handling in cleanup_trampoline() to avoid page table corruption. - Stop special casing protection key 0 as this is inconsistent with the manpage and also inconsistent with the allocation map handling. - Override the protection key wen moving away from PROT_EXEC to prevent inaccessible memory. - Fix and update the protection key selftests to address breakage and to cover the above issue - Add a MOV SS self test" [ Part of the x86 fixes were in the earlier core pull due to dependencies ] * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/mm: Drop TS_COMPAT on 64-bit exec() syscall x86/apic/x2apic: Initialize cluster ID properly x86/boot/compressed/64: Fix moving page table out of trampoline memory x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline() x86/pkeys: Do not special case protection key 0 x86/pkeys/selftests: Add a test for pkey 0 x86/pkeys/selftests: Save off 'prot' for allocations x86/pkeys/selftests: Fix pointer math x86/pkeys: Override pkey when moving away from PROT_EXEC x86/pkeys/selftests: Fix pkey exhaustion test off-by-one x86/pkeys/selftests: Add PROT_EXEC test x86/pkeys/selftests: Factor out "instruction page" x86/pkeys/selftests: Allow faults on unknown keys x86/pkeys/selftests: Avoid printf-in-signal deadlocks x86/pkeys/selftests: Remove dead debugging code, fix dprint_in_signal x86/pkeys/selftests: Stop using assert() x86/pkeys/selftests: Give better unexpected fault error messages x86/selftests: Add mov_to_ss test x86/mpx/selftests: Adjust the self-test to fresh distros that export the MPX ABI x86/pkeys/selftests: Adjust the self-test to fresh distros that export the pkeys ABI ...
2018-05-20Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fix from Thomas Gleixner: "Fix a regression in the new AMD SMCA code which issues an SMP function call from the early interrupt disabled region of CPU hotplug. To avoid that, use cached block addresses which can be used directly" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/MCE/AMD: Cache SMCA MISC block addresses
2018-05-20Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Thomas Gleixner: - Use explicitely sized type for the romimage pointer in the 32bit EFI protocol struct so a 64bit kernel does not expand it to 64bit. Ditto for the 64bit struct to avoid the reverse issue on 32bit kernels. - Handle randomized tex offset correctly in the ARM64 EFI stub to avoid unaligned data resulting in stack corruption and other hard to diagnose wreckage. * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub/arm64: Handle randomized TEXT_OFFSET efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode
2018-05-20Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: - Unbreak the BPF compilation which got broken by the unconditional requirement of asm-goto, which is not supported by clang. - Prevent probing on exception masking instructions in uprobes and kprobes to avoid the issues of the delayed exceptions instead of having an ugly workaround. - Prevent a double free_page() in the error path of do_kexec_load() - A set of objtool updates addressing various issues mostly related to switch tables and the noreturn detection for recursive sibling calls - Header sync for tools. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Detect RIP-relative switch table references, part 2 objtool: Detect RIP-relative switch table references objtool: Support GCC 8 switch tables objtool: Support GCC 8's cold subfunctions objtool: Fix "noreturn" detection for recursive sibling calls objtool, kprobes/x86: Sync the latest <asm/insn.h> header with tools/objtool/arch/x86/include/asm/insn.h x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation uprobes/x86: Prohibit probing on MOV SS instruction kprobes/x86: Prohibit probing on exception masking instructions x86/kexec: Avoid double free_page() upon do_kexec_load() failure
2018-05-19x86/MCE/AMD: Cache SMCA MISC block addressesBorislav Petkov
... into a global, two-dimensional array and service subsequent reads from that cache to avoid rdmsr_on_cpu() calls during CPU hotplug (IPIs with IRQs disabled). In addition, this fixes a KASAN slab-out-of-bounds read due to wrong usage of the bank->blocks pointer. Fixes: 27bd59502702 ("x86/mce/AMD: Get address from already initialized block") Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de> Tested-by: Johannes Hirte <johannes.hirte@datenkhaos.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20180414004230.GA2033@probook
2018-05-19x86/mm: Drop TS_COMPAT on 64-bit exec() syscallDmitry Safonov
The x86 mmap() code selects the mmap base for an allocation depending on the bitness of the syscall. For 64bit sycalls it select mm->mmap_base and for 32bit mm->mmap_compat_base. exec() calls mmap() which in turn uses in_compat_syscall() to check whether the mapping is for a 32bit or a 64bit task. The decision is made on the following criteria: ia32 child->thread.status & TS_COMPAT x32 child->pt_regs.orig_ax & __X32_SYSCALL_BIT ia64 !ia32 && !x32 __set_personality_x32() was dropping TS_COMPAT flag, but set_personality_64bit() has kept compat syscall flag making in_compat_syscall() return true during the first exec() syscall. Which in result has user-visible effects, mentioned by Alexey: 1) It breaks ASAN $ gcc -fsanitize=address wrap.c -o wrap-asan $ ./wrap32 ./wrap-asan true ==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING. ==1217==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range. ==1217==Process memory map follows: 0x000000400000-0x000000401000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan 0x000000600000-0x000000601000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan 0x000000601000-0x000000602000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan 0x0000f7dbd000-0x0000f7de2000 /lib64/ld-2.27.so 0x0000f7fe2000-0x0000f7fe3000 /lib64/ld-2.27.so 0x0000f7fe3000-0x0000f7fe4000 /lib64/ld-2.27.so 0x0000f7fe4000-0x0000f7fe5000 0x7fed9abff000-0x7fed9af54000 0x7fed9af54000-0x7fed9af6b000 /lib64/libgcc_s.so.1 [snip] 2) It doesn't seem to be great for security if an attacker always knows that ld.so is going to be mapped into the first 4GB in this case (the same thing happens for PIEs as well). The testcase: $ cat wrap.c int main(int argc, char *argv[]) { execvp(argv[1], &argv[1]); return 127; } $ gcc wrap.c -o wrap $ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE AT_BASE: 0x7f63b8309000 AT_BASE: 0x7faec143c000 AT_BASE: 0x7fbdb25fa000 $ gcc -m32 wrap.c -o wrap32 $ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE AT_BASE: 0xf7eff000 AT_BASE: 0xf7cee000 AT_BASE: 0x7f8b9774e000 Fixes: 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()") Fixes: ada26481dfe6 ("x86/mm: Make in_compat_syscall() work during exec") Reported-by: Alexey Izbyshev <izbyshev@ispras.ru> Bisected-by: Alexander Monakov <amonakov@ispras.ru> Investigated-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Borislav Petkov <bp@suse.de> Cc: Alexander Monakov <amonakov@ispras.ru> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: linux-mm@kvack.org Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20180517233510.24996-1-dima@arista.com
2018-05-18x86/bugs: Rename SSBD_NO to SSB_NOKonrad Rzeszutek Wilk
The "336996 Speculative Execution Side Channel Mitigations" from May defines this as SSB_NO, hence lets sync-up. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-17Merge tag 'hwmon-for-linus-v4.17-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: "Two k10temp fixes: - fix race condition when accessing System Management Network registers - fix reading critical temperatures on F15h M60h and M70h Also add PCI ID's for the AMD Raven Ridge root bridge" * tag 'hwmon-for-linus-v4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (k10temp) Use API function to access System Management Network x86/amd_nb: Add support for Raven Ridge CPUs hwmon: (k10temp) Fix reading critical temperature register
2018-05-17x86/apic/x2apic: Initialize cluster ID properlyThomas Gleixner
Rick bisected a regression on large systems which use the x2apic cluster mode for interrupt delivery to the commit wich reworked the cluster management. The problem is caused by a missing initialization of the clusterid field in the shared cluster data structures. So all structures end up with cluster ID 0 which only allows sharing between all CPUs which belong to cluster 0. All other CPUs with a cluster ID > 0 cannot share the data structure because they cannot find existing data with their cluster ID. This causes malfunction with IPIs because IPIs are sent to the wrong cluster and the caller waits for ever that the target CPU handles the IPI. Add the missing initialization when a upcoming CPU is the first in a cluster so that the later booting CPUs can find the data and share it for proper operation. Fixes: 023a611748fd ("x86/apic/x2apic: Simplify cluster management") Reported-by: Rick Warner <rick@microway.com> Bisected-by: Rick Warner <rick@microway.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Rick Warner <rick@microway.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805171418210.1947@nanos.tec.linutronix.de
2018-05-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: - ARM/ARM64 locking fixes - x86 fixes: PCID, UMIP, locking - improved support for recent Windows version that have a 2048 Hz APIC timer - rename KVM_HINTS_DEDICATED CPUID bit to KVM_HINTS_REALTIME - better behaved selftests * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity KVM: arm/arm64: Properly protect VGIC locks from IRQs KVM: X86: Lower the default timer frequency limit to 200us KVM: vmx: update sec exec controls for UMIP iff emulating UMIP kvm: x86: Suppress CR3_PCID_INVD bit only when PCIDs are enabled KVM: selftests: exit with 0 status code when tests cannot be run KVM: hyperv: idr_find needs RCU protection x86: Delay skip of emulated hypercall instruction KVM: Extend MAX_IRQ_ROUTES to 4096 for all archs
2018-05-17kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIMEMichael S. Tsirkin
KVM_HINTS_DEDICATED seems to be somewhat confusing: Guest doesn't really care whether it's the only task running on a host CPU as long as it's not preempted. And there are more reasons for Guest to be preempted than host CPU sharing, for example, with memory overcommit it can get preempted on a memory access, post copy migration can cause preemption, etc. Let's call it KVM_HINTS_REALTIME which seems to better match what guests expect. Also, the flag most be set on all vCPUs - current guests assume this. Note so in the documentation. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-05-17KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBDTom Lendacky
Expose the new virtualized architectural mechanism, VIRT_SSBD, for using speculative store bypass disable (SSBD) under SVM. This will allow guests to use SSBD on hardware that uses non-architectural mechanisms for enabling SSBD. [ tglx: Folded the migration fixup from Paolo Bonzini ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-17x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFGThomas Gleixner
Add the necessary logic for supporting the emulated VIRT_SPEC_CTRL MSR to x86_virt_spec_ctrl(). If either X86_FEATURE_LS_CFG_SSBD or X86_FEATURE_VIRT_SPEC_CTRL is set then use the new guest_virt_spec_ctrl argument to check whether the state must be modified on the host. The update reuses speculative_store_bypass_update() so the ZEN-specific sibling coordination can be reused. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-17x86/bugs: Rework spec_ctrl base and mask logicThomas Gleixner
x86_spec_ctrL_mask is intended to mask out bits from a MSR_SPEC_CTRL value which are not to be modified. However the implementation is not really used and the bitmask was inverted to make a check easier, which was removed in "x86/bugs: Remove x86_spec_ctrl_set()" Aside of that it is missing the STIBP bit if it is supported by the platform, so if the mask would be used in x86_virt_spec_ctrl() then it would prevent a guest from setting STIBP. Add the STIBP bit if supported and use the mask in x86_virt_spec_ctrl() to sanitize the value which is supplied by the guest. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de>
2018-05-17x86/bugs: Remove x86_spec_ctrl_set()Thomas Gleixner
x86_spec_ctrl_set() is only used in bugs.c and the extra mask checks there provide no real value as both call sites can just write x86_spec_ctrl_base to MSR_SPEC_CTRL. x86_spec_ctrl_base is valid and does not need any extra masking or checking. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17x86/bugs: Expose x86_spec_ctrl_base directlyThomas Gleixner
x86_spec_ctrl_base is the system wide default value for the SPEC_CTRL MSR. x86_spec_ctrl_get_default() returns x86_spec_ctrl_base and was intended to prevent modification to that variable. Though the variable is read only after init and globaly visible already. Remove the function and export the variable instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}Borislav Petkov
Function bodies are very similar and are going to grow more almost identical code. Add a bool arg to determine whether SPEC_CTRL is being set for the guest or restored to the host. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17x86/speculation: Rework speculative_store_bypass_update()Thomas Gleixner
The upcoming support for the virtual SPEC_CTRL MSR on AMD needs to reuse speculative_store_bypass_update() to avoid code duplication. Add an argument for supplying a thread info (TIF) value and create a wrapper speculative_store_bypass_update_current() which is used at the existing call site. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17x86/speculation: Add virtualized speculative store bypass disable supportTom Lendacky
Some AMD processors only support a non-architectural means of enabling speculative store bypass disable (SSBD). To allow a simplified view of this to a guest, an architectural definition has been created through a new CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a hypervisor can virtualize the existence of this definition and provide an architectural method for using SSBD to a guest. Add the new CPUID feature, the new MSR and update the existing SSBD support to use this MSR when present. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de>
2018-05-17x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRLThomas Gleixner
AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care about the bit position of the SSBD bit and thus facilitate migration. Also, the sibling coordination on Family 17H CPUs can only be done on the host. Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an extra argument for the VIRT_SPEC_CTRL MSR. Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU data structure which is going to be used in later patches for the actual implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17x86/speculation: Handle HT correctly on AMDThomas Gleixner
The AMD64_LS_CFG MSR is a per core MSR on Family 17H CPUs. That means when hyperthreading is enabled the SSBD bit toggle needs to take both cores into account. Otherwise the following situation can happen: CPU0 CPU1 disable SSB disable SSB enable SSB <- Enables it for the Core, i.e. for CPU0 as well So after the SSB enable on CPU1 the task on CPU0 runs with SSB enabled again. On Intel the SSBD control is per core as well, but the synchronization logic is implemented behind the per thread SPEC_CTRL MSR. It works like this: CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL i.e. if one of the threads enables a mitigation then this affects both and the mitigation is only disabled in the core when both threads disabled it. Add the necessary synchronization logic for AMD family 17H. Unfortunately that requires a spinlock to serialize the access to the MSR, but the locks are only shared between siblings. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17x86/cpufeatures: Add FEATURE_ZENThomas Gleixner
Add a ZEN feature bit so family-dependent static_cpu_has() optimizations can be built for ZEN. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>