summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2017-12-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - A number of issues in the vgic discovered using SMATCH - A bit one-off calculation in out stage base address mask (32-bit and 64-bit) - Fixes to single-step debugging instructions that trap for other reasons such as MMMIO aborts - Printing unavailable hyp mode as error - Potential spinlock deadlock in the vgic - Avoid calling vgic vcpu free more than once - Broken bit calculation for big endian systems s390: - SPDX tags - Fence storage key accesses from problem state - Make sure that irq_state.flags is not used in the future x86: - Intercept port 0x80 accesses to prevent host instability (CVE) - Use userspace FPU context for guest FPU (mainly an optimization that fixes a double use of kernel FPU) - Do not leak one page per module load - Flush APIC page address cache from MMU invalidation notifiers" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) KVM: x86: fix APIC page invalidation KVM: s390: Fix skey emulation permission check KVM: s390: mark irq_state.flags as non-usable KVM: s390: Remove redundant license text KVM: s390: add SPDX identifiers to the remaining files KVM: VMX: fix page leak in hardware_setup() KVM: VMX: remove I/O port 0x80 bypass on Intel hosts x86,kvm: remove KVM emulator get_fpu / put_fpu x86,kvm: move qemu/guest FPU switching out to vcpu_run KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion KVM: arm/arm64: kvm_arch_destroy_vm cleanups KVM: arm/arm64: Fix spinlock acquisition in vgic_set_owner kvm: arm: don't treat unavailable HYP mode as an error KVM: arm/arm64: Avoid attempting to load timer vgic state without a vgic kvm: arm64: handle single-step of hyp emulated mmio instructions kvm: arm64: handle single-step during SError exceptions kvm: arm64: handle single-step of userspace mmio instructions kvm: arm64: handle single-stepping trapped instructions KVM: arm/arm64: debug: Introduce helper for single-step arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one ...
2017-12-08kmemcheck: rip it out for realMichal Hocko
Commit 4675ff05de2d ("kmemcheck: rip it out") has removed the code but for some reason SPDX header stayed in place. This looks like a rebase mistake in the mmotm tree or the merge mistake. Let's drop those leftovers as well. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb drivers). 2) Revert returning -EEXIST from __dev_alloc_name() as this propagates to userspace and broke some apps. From Johannes Berg. 3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong Wang. 4) Gianfar MAC can't do EEE so don't advertise it by default, from Claudiu Manoil. 5) Relax strict netlink attribute validation, but emit a warning. From David Ahern. 6) Fix regression in checksum offload of thunderx driver, from Florian Westphal. 7) Fix UAPI bpf issues on s390, from Hendrik Brueckner. 8) New card support in iwlwifi, from Ihab Zhaika. 9) BBR congestion control bug fixes from Neal Cardwell. 10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren. 11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan. 12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni. 13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits) net: mvpp2: fix the RSS table entry offset tcp: evaluate packet losses upon RTT change tcp: fix off-by-one bug in RACK tcp: always evaluate losses in RACK upon undo tcp: correctly test congestion state in RACK bnxt_en: Fix sources of spurious netpoll warnings tcp_bbr: reset long-term bandwidth sampling on loss recovery undo tcp_bbr: reset full pipe detection on loss recovery undo tcp_bbr: record "full bw reached" decision in new full_bw_reached bit sfc: pass valid pointers from efx_enqueue_unwind gianfar: Disable EEE autoneg by default tcp: invalidate rate samples during SACK reneging can: peak/pcie_fd: fix potential bug in restarting tx queue can: usb_8dev: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: cancel urb on -EPIPE and -EPROTO can: esd_usb2: cancel urb on -EPIPE and -EPROTO can: ems_usb: cancel urb on -EPIPE and -EPROTO can: mcba_usb: cancel urb on -EPROTO usbnet: fix alignment for frames with no ethernet header tcp: use current time in tcp_rcv_space_adjust() ...
2017-12-06Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - make CR4 handling irq-safe, which bug vmware guests ran into - don't crash on early IRQs in Xen guests - don't crash secondary CPU bringup if #UD assisted WARN()ings are triggered - make X86_BUG_FXSAVE_LEAK optional on newer AMD CPUs that have the fix - fix AMD Fam17h microcode loading - fix broadcom_postcore_init() if ACPI is disabled - fix resume regression in __restore_processor_context() - fix Sparse warnings - fix a GCC-8 warning * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Change time() prototype to match __vdso_time() x86: Fix Sparse warnings about non-static functions x86/power: Fix some ordering bugs in __restore_processor_context() x86/PCI: Make broadcom_postcore_init() check acpi_disabled x86/microcode/AMD: Add support for fam17h microcode loading x86/cpufeatures: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD x86/idt: Load idt early in start_secondary x86/xen: Support early interrupts in xen pv guests x86/tlb: Disable interrupts when changing CR4 x86/tlb: Refactor CR4 setting and shadow write
2017-12-06x86/vdso: Change time() prototype to match __vdso_time()Arnd Bergmann
gcc-8 warns that time() is an alias for __vdso_time() but the two have different prototypes: arch/x86/entry/vdso/vclock_gettime.c:327:5: error: 'time' alias between functions of incompatible types 'int(time_t *)' {aka 'int(long int *)'} and 'time_t(time_t *)' {aka 'long int(long int *)'} [-Werror=attribute-alias] int time(time_t *t) ^~~~ arch/x86/entry/vdso/vclock_gettime.c:318:16: note: aliased declaration here I could not figure out whether this is intentional, but I see that changing it to return time_t avoids the warning. Returning 'int' from time() is also a bit questionable, as it causes an overflow in y2038 even on 64-bit architectures that use a 64-bit time_t type. On 32-bit architecture with 64-bit time_t, time() should always be implement by the C library by calling a (to be added) clock_gettime() variant that takes a sufficiently wide argument. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: http://lkml.kernel.org/r/20171204150203.852959-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-06x86: Fix Sparse warnings about non-static functionsColin Ian King
Functions x86_vector_debug_show(), uv_handle_nmi() and uv_nmi_setup_common() are local to the source and do not need to be in global scope, so make them static. Fixes up various sparse warnings. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Mike Travis <mike.travis@hpe.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Kosina <trivial@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Cc: travis@sgi.com Link: http://lkml.kernel.org/r/20171206173358.24388-1-colin.king@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-06KVM: x86: fix APIC page invalidationRadim Krčmář
Implementation of the unpinned APIC page didn't update the VMCS address cache when invalidation was done through range mmu notifiers. This became a problem when the page notifier was removed. Re-introduce the arch-specific helper and call it from ...range_start. Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Fixes: 38b9917350cb ("kvm: vmx: Implement set_apic_access_page_addr") Fixes: 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2") Cc: <stable@vger.kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Wanpeng Li <wanpeng.li@hotmail.com> Tested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-12-06x86/power: Fix some ordering bugs in __restore_processor_context()Andy Lutomirski
__restore_processor_context() had a couple of ordering bugs. It restored GSBASE after calling load_gs_index(), and the latter can call into tracing code. It also tried to restore segment registers before restoring the LDT, which is straight-up wrong. Reorder the code so that we restore GSBASE, then the descriptor tables, then the segments. This fixes two bugs. First, it fixes a regression that broke resume under certain configurations due to irqflag tracing in native_load_gs_index(). Second, it fixes resume when the userspace process that initiated suspect had funny segments. The latter can be reproduced by compiling this: // SPDX-License-Identifier: GPL-2.0 /* * ldt_echo.c - Echo argv[1] while using an LDT segment */ int main(int argc, char **argv) { int ret; size_t len; char *buf; const struct user_desc desc = { .entry_number = 0, .base_addr = 0, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0 }; if (argc != 2) errx(1, "Usage: %s STRING", argv[0]); len = asprintf(&buf, "%s\n", argv[1]); if (len < 0) errx(1, "Out of memory"); ret = syscall(SYS_modify_ldt, 1, &desc, sizeof(desc)); if (ret < -1) errno = -ret; if (ret) err(1, "modify_ldt"); asm volatile ("movw %0, %%es" :: "rm" ((unsigned short)7)); write(1, buf, len); return 0; } and running ldt_echo >/sys/power/mem Without the fix, the latter causes a triple fault on resume. Fixes: ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to native_load_gs_index()") Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/6b31721ea92f51ea839e79bd97ade4a75b1eeea2.1512057304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-06x86/PCI: Make broadcom_postcore_init() check acpi_disabledRafael J. Wysocki
acpi_os_get_root_pointer() may return a valid address even if acpi_disabled is set, but the host bridge information from the ACPI tables is not going to be used in that case and the Broadcom host bridge initialization should not be skipped then, So make broadcom_postcore_init() check acpi_disabled too to avoid this issue. Fixes: 6361d72b04d1 (x86/PCI: read Broadcom CNB20LE host bridge info before PCI scan) Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Linux PCI <linux-pci@vger.kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/3186627.pxZj1QbYNg@aspire.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-06x86/microcode/AMD: Add support for fam17h microcode loadingTom Lendacky
The size for the Microcode Patch Block (MPB) for an AMD family 17h processor is 3200 bytes. Add a #define for fam17h so that it does not default to 2048 bytes and fail a microcode load/update. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-06x86/cpufeatures: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMDRudolf Marek
The latest AMD AMD64 Architecture Programmer's Manual adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]). If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES / FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers, thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs. Signed-off-by: Rudolf Marek <r.marek@assembler.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-05x86: don't hash faulting address in oops printoutLinus Torvalds
Things like this will probably keep showing up for other architectures and other special cases. I actually thought we already used %lx for this, and that is indeed _historically_ the case, but we moved to %p when merging the 32-bit and 64-bit cases as a convenient way to get the formatting right (ie automatically picking "%08lx" vs "%016lx" based on register size). So just turn this %p into %px. Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-05locking/refcounts: Do not force refcount_t usage as GPL-only exportKees Cook
The refcount_t protection on x86 was not intended to use the stricter GPL export. This adjusts the linkage again to avoid a regression in the availability of the refcount API. Reported-by: Dave Airlie <airlied@gmail.com> Fixes: 7a46ec0e2f48 ("locking/refcounts, x86/asm: Implement fast refcount overflow protection") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-05KVM: VMX: fix page leak in hardware_setup()Jim Mattson
vmx_io_bitmap_b should not be allocated twice. Fixes: 23611332938d ("KVM: VMX: refactor setup of global page-sized bitmaps") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-12-05KVM: VMX: remove I/O port 0x80 bypass on Intel hostsAndrew Honig
This fixes CVE-2017-1000407. KVM allows guests to directly access I/O port 0x80 on Intel hosts. If the guest floods this port with writes it generates exceptions and instability in the host kernel, leading to a crash. With this change guest writes to port 0x80 on Intel will behave the same as they currently behave on AMD systems. Prevent the flooding by removing the code that sets port 0x80 as a passthrough port. This is essentially the same as upstream patch 99f85a28a78e96d28907fe036e1671a218fee597, except that patch was for AMD chipsets and this patch is for Intel. Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Fixes: fdef3ad1b386 ("KVM: VMX: Enable io bitmaps to avoid IO port 0x80 VMEXITs") Cc: <stable@vger.kernel.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-12-05x86,kvm: remove KVM emulator get_fpu / put_fpuRik van Riel
Now that get_fpu and put_fpu do nothing, because the scheduler will automatically load and restore the guest FPU context for us while we are in this code (deep inside the vcpu_run main loop), we can get rid of the get_fpu and put_fpu hooks. Signed-off-by: Rik van Riel <riel@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-05x86,kvm: move qemu/guest FPU switching out to vcpu_runRik van Riel
Currently, every time a VCPU is scheduled out, the host kernel will first save the guest FPU/xstate context, then load the qemu userspace FPU context, only to then immediately save the qemu userspace FPU context back to memory. When scheduling in a VCPU, the same extraneous FPU loads and saves are done. This could be avoided by moving from a model where the guest FPU is loaded and stored with preemption disabled, to a model where the qemu userspace FPU is swapped out for the guest FPU context for the duration of the KVM_RUN ioctl. This is done under the VCPU mutex, which is also taken when other tasks inspect the VCPU FPU context, so the code should already be safe for this change. That should come as no surprise, given that s390 already has this optimization. This can fix a bug where KVM calls get_user_pages while owning the FPU, and the file system ends up requesting the FPU again: [258270.527947] __warn+0xcb/0xf0 [258270.527948] warn_slowpath_null+0x1d/0x20 [258270.527951] kernel_fpu_disable+0x3f/0x50 [258270.527953] __kernel_fpu_begin+0x49/0x100 [258270.527955] kernel_fpu_begin+0xe/0x10 [258270.527958] crc32c_pcl_intel_update+0x84/0xb0 [258270.527961] crypto_shash_update+0x3f/0x110 [258270.527968] crc32c+0x63/0x8a [libcrc32c] [258270.527975] dm_bm_checksum+0x1b/0x20 [dm_persistent_data] [258270.527978] node_prepare_for_write+0x44/0x70 [dm_persistent_data] [258270.527985] dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data] [258270.527988] submit_io+0x170/0x1b0 [dm_bufio] [258270.527992] __write_dirty_buffer+0x89/0x90 [dm_bufio] [258270.527994] __make_buffer_clean+0x4f/0x80 [dm_bufio] [258270.527996] __try_evict_buffer+0x42/0x60 [dm_bufio] [258270.527998] dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio] [258270.528002] shrink_slab.part.40+0x1f5/0x420 [258270.528004] shrink_node+0x22c/0x320 [258270.528006] do_try_to_free_pages+0xf5/0x330 [258270.528008] try_to_free_pages+0xe9/0x190 [258270.528009] __alloc_pages_slowpath+0x40f/0xba0 [258270.528011] __alloc_pages_nodemask+0x209/0x260 [258270.528014] alloc_pages_vma+0x1f1/0x250 [258270.528017] do_huge_pmd_anonymous_page+0x123/0x660 [258270.528021] handle_mm_fault+0xfd3/0x1330 [258270.528025] __get_user_pages+0x113/0x640 [258270.528027] get_user_pages+0x4f/0x60 [258270.528063] __gfn_to_pfn_memslot+0x120/0x3f0 [kvm] [258270.528108] try_async_pf+0x66/0x230 [kvm] [258270.528135] tdp_page_fault+0x130/0x280 [kvm] [258270.528149] kvm_mmu_page_fault+0x60/0x120 [kvm] [258270.528158] handle_ept_violation+0x91/0x170 [kvm_intel] [258270.528162] vmx_handle_exit+0x1ca/0x1400 [kvm_intel] No performance changes were detected in quick ping-pong tests on my 4 socket system, which is expected since an FPU+xstate load is on the order of 0.1us, while ping-ponging between CPUs is on the order of 20us, and somewhat noisy. Cc: stable@vger.kernel.org Signed-off-by: Rik van Riel <riel@redhat.com> Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu, which happened inside from KVM_CREATE_VCPU ioctl. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-12-05bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program typeHendrik Brueckner
Commit 0515e5999a466dfe ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type") introduced the bpf_perf_event_data structure which exports the pt_regs structure. This is OK for multiple architectures but fail for s390 and arm64 which do not export pt_regs. Programs using them, for example, the bpf selftest fail to compile on these architectures. For s390, exporting the pt_regs is not an option because s390 wants to allow changes to it. For arm64, there is a user_pt_regs structure that covers parts of the pt_regs structure for use by user space. To solve the broken uapi for s390 and arm64, introduce an abstract type for pt_regs and add an asm/bpf_perf_event.h file that concretes the type. An asm-generic header file covers the architectures that export pt_regs today. The arch-specific enablement for s390 and arm64 follows in separate commits. Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Fixes: 0515e5999a466dfe ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type") Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: - x86 bugfixes: APIC, nested virtualization, IOAPIC - PPC bugfix: HPT guests on a POWER9 radix host * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits) KVM: Let KVM_SET_SIGNAL_MASK work as advertised KVM: VMX: Fix vmx->nested freeing when no SMI handler KVM: VMX: Fix rflags cache during vCPU reset KVM: X86: Fix softlockup when get the current kvmclock KVM: lapic: Fixup LDR on load in x2apic KVM: lapic: Split out x2apic ldr calculation KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP KVM: x86: Fix CPUID function for word 6 (80000001_ECX) KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2 KVM: x86: ioapic: Preserve read-only values in the redirection table KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq KVM: x86: ioapic: Don't fire level irq when Remote IRR set KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race KVM: x86: inject exceptions produced by x86_decode_insn KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs KVM: x86: fix em_fxstor() sleeping while in atomic KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry ...
2017-11-29mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITEDan Williams
In response to compile breakage introduced by a series that added the pud_write helper to x86, Stephen notes: did you consider using the other paradigm: In arch include files: #define pud_write pud_write static inline int pud_write(pud_t pud) ..... Then in include/asm-generic/pgtable.h: #ifndef pud_write tatic inline int pud_write(pud_t pud) { .... } #endif If you had, then the powerpc code would have worked ... ;-) and many of the other interfaces in include/asm-generic/pgtable.h are protected that way ... Given that some architecture already define pmd_write() as a macro, it's a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE. Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Oliver OHalloran <oliveroh@au1.ibm.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm: fix device-dax pud write-faults triggered by get_user_pages()Dan Williams
Currently only get_user_pages_fast() can safely handle the writable gup case due to its use of pud_access_permitted() to check whether the pud entry is writable. In the gup slow path pud_write() is used instead of pud_access_permitted() and to date it has been unimplemented, just calls BUG_ON(). kernel BUG at ./include/linux/hugetlb.h:244! [..] RIP: 0010:follow_devmap_pud+0x482/0x490 [..] Call Trace: follow_page_mask+0x28c/0x6e0 __get_user_pages+0xe4/0x6c0 get_user_pages_unlocked+0x130/0x1b0 get_user_pages_fast+0x89/0xb0 iov_iter_get_pages_alloc+0x114/0x4a0 nfs_direct_read_schedule_iovec+0xd2/0x350 ? nfs_start_io_direct+0x63/0x70 nfs_file_direct_read+0x1e0/0x250 nfs_file_read+0x90/0xc0 For now this just implements a simple check for the _PAGE_RW bit similar to pmd_write. However, this implies that the gup-slow-path check is missing the extra checks that the gup-fast-path performs with pud_access_permitted. Later patches will align all checks to use the 'access_permitted' helper if the architecture provides it. Note that the generic 'access_permitted' helper fallback is the simple _PAGE_RW check on architectures that do not define the 'access_permitted' helper(s). [dan.j.williams@intel.com: fix powerpc compile error] Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-28x86/idt: Load idt early in start_secondaryChunyu Hu
On a secondary, idt is first loaded in cpu_init() with load_current_idt(), i.e. no exceptions can be handled before that point. The conversion of WARN() to use UD requires the IDT being loaded earlier as any warning between start_secondary() and load_curren_idt() in cpu_init() will result in an unhandled @UD exception and therefore fail the bringup of the CPU. Install the IDT handlers right in start_secondary() before calling cpu_init(). [ tglx: Massaged changelog ] Fixes: 9a93848fe787 ("x86/debug: Implement __WARN() using UD0") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: peterz@infradead.org Cc: bp@alien8.de Cc: rostedt@goodmis.org Cc: luto@kernel.org Link: https://lkml.kernel.org/r/1511792499-4073-1-git-send-email-chuhu@redhat.com
2017-11-28x86/xen: Support early interrupts in xen pv guestsJuergen Gross
Add early interrupt handlers activated by idt_setup_early_handler() to the handlers supported by Xen pv guests. This will allow for early WARN() calls not crashing the guest. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: boris.ostrovsky@oracle.com Link: https://lkml.kernel.org/r/20171124084221.30172-1-jgross@suse.com
2017-11-27KVM: Let KVM_SET_SIGNAL_MASK work as advertisedJan H. Schönherr
KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that "any unblocked signal received [...] will cause KVM_RUN to return with -EINTR" and that "the signal will only be delivered if not blocked by the original signal mask". This, however, is only true, when the calling task has a signal handler registered for a signal. If not, signal evaluation is short-circuited for SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN returning or the whole process is terminated. Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar to that in do_sigtimedwait() to avoid short-circuiting of signals. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27KVM: VMX: Fix vmx->nested freeing when no SMI handlerWanpeng Li
Reported by syzkaller: ------------[ cut here ]------------ WARNING: CPU: 5 PID: 2939 at arch/x86/kvm/vmx.c:3844 free_loaded_vmcs+0x77/0x80 [kvm_intel] CPU: 5 PID: 2939 Comm: repro Not tainted 4.14.0+ #26 RIP: 0010:free_loaded_vmcs+0x77/0x80 [kvm_intel] Call Trace: vmx_free_vcpu+0xda/0x130 [kvm_intel] kvm_arch_destroy_vm+0x192/0x290 [kvm] kvm_put_kvm+0x262/0x560 [kvm] kvm_vm_release+0x2c/0x30 [kvm] __fput+0x190/0x370 task_work_run+0xa1/0xd0 do_exit+0x4d2/0x13e0 do_group_exit+0x89/0x140 get_signal+0x318/0xb80 do_signal+0x8c/0xb40 exit_to_usermode_loop+0xe4/0x140 syscall_return_slowpath+0x206/0x230 entry_SYSCALL_64_fastpath+0x98/0x9a The syzkaller testcase will execute VMXON/VMLAUCH instructions, so the vmx->nested stuff is populated, it will also issue KVM_SMI ioctl. However, the testcase is just a simple c program and not be lauched by something like seabios which implements smi_handler. Commit 05cade71cf (KVM: nSVM: fix SMI injection in guest mode) gets out of guest mode and set nested.vmxon to false for the duration of SMM according to SDM 34.14.1 "leave VMX operation" upon entering SMM. We can't alloc/free the vmx->nested stuff each time when entering/exiting SMM since it will induce more overhead. So the function vmx_pre_enter_smm() marks nested.vmxon false even if vmx->nested stuff is still populated. What it expected is em_rsm() can mark nested.vmxon to be true again. However, the smi_handler/rsm will not execute since there is no something like seabios in this scenario. The function free_nested() fails to free the vmx->nested stuff since the vmx->nested.vmxon is false which results in the above warning. This patch fixes it by also considering the no SMI handler case, luckily vmx->nested.smm.vmxon is marked according to the value of vmx->nested.vmxon in vmx_pre_enter_smm(), we can take advantage of it and free vmx->nested stuff when L1 goes down. Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Fixes: 05cade71cf (KVM: nSVM: fix SMI injection in guest mode) Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27KVM: VMX: Fix rflags cache during vCPU resetWanpeng Li
Reported by syzkaller: *** Guest State *** CR0: actual=0x0000000080010031, shadow=0x0000000060000010, gh_mask=fffffffffffffff7 CR4: actual=0x0000000000002061, shadow=0x0000000000000000, gh_mask=ffffffffffffe8f1 CR3 = 0x000000002081e000 RSP = 0x000000000000fffa RIP = 0x0000000000000000 RFLAGS=0x00023000 DR7 = 0x00000000000000 ^^^^^^^^^^ ------------[ cut here ]------------ WARNING: CPU: 6 PID: 24431 at /home/kernel/linux/arch/x86/kvm//x86.c:7302 kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm] CPU: 6 PID: 24431 Comm: reprotest Tainted: G W OE 4.14.0+ #26 RIP: 0010:kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm] RSP: 0018:ffff880291d179e0 EFLAGS: 00010202 Call Trace: kvm_vcpu_ioctl+0x479/0x880 [kvm] do_vfs_ioctl+0x142/0x9a0 SyS_ioctl+0x74/0x80 entry_SYSCALL_64_fastpath+0x23/0x9a The failed vmentry is triggered by the following beautified testcase: #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <linux/kvm.h> #include <fcntl.h> #include <sys/ioctl.h> long r[5]; int main() { struct kvm_debugregs dr = { 0 }; r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7); struct kvm_guest_debug debug = { .control = 0xf0403, .arch = { .debugreg[6] = 0x2, .debugreg[7] = 0x2 } }; ioctl(r[4], KVM_SET_GUEST_DEBUG, &debug); ioctl(r[4], KVM_RUN, 0); } which testcase tries to setup the processor specific debug registers and configure vCPU for handling guest debug events through KVM_SET_GUEST_DEBUG. The KVM_SET_GUEST_DEBUG ioctl will get and set rflags in order to set TF bit if single step is needed. All regs' caches are reset to avail and GUEST_RFLAGS vmcs field is reset to 0x2 during vCPU reset. However, the cache of rflags is not reset during vCPU reset. The function vmx_get_rflags() returns an unreset rflags cache value since the cache is marked avail, it is 0 after boot. Vmentry fails if the rflags reserved bit 1 is 0. This patch fixes it by resetting both the GUEST_RFLAGS vmcs field and its cache to 0x2 during vCPU reset. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27KVM: X86: Fix softlockup when get the current kvmclockWanpeng Li
watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [qemu-system-x86:10185] CPU: 6 PID: 10185 Comm: qemu-system-x86 Tainted: G OE 4.14.0-rc4+ #4 RIP: 0010:kvm_get_time_scale+0x4e/0xa0 [kvm] Call Trace: get_time_ref_counter+0x5a/0x80 [kvm] kvm_hv_process_stimers+0x120/0x5f0 [kvm] kvm_arch_vcpu_ioctl_run+0x4b4/0x1690 [kvm] kvm_vcpu_ioctl+0x33a/0x620 [kvm] do_vfs_ioctl+0xa1/0x5d0 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x1e/0xa9 This can be reproduced when running kvm-unit-tests/hyperv_stimer.flat and cpu-hotplug stress simultaneously. __this_cpu_read(cpu_tsc_khz) returns 0 (set in kvmclock_cpu_down_prep()) when the pCPU is unhotplug which results in kvm_get_time_scale() gets into an infinite loop. This patch fixes it by treating the unhotplug pCPU as not using master clock. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27KVM: lapic: Fixup LDR on load in x2apicDr. David Alan Gilbert
In x2apic mode the LDR is fixed based on the ID rather than separately loadable like it was before x2. When kvm_apic_set_state is called, the base is set, and if it has the X2APIC_ENABLE flag set then the LDR is calculated; however that value gets overwritten by the memcpy a few lines below overwriting it with the value that came from userland. The symptom is a lack of EOI after loading the state (e.g. after a QEMU migration) and is due to the EOI bitmap being wrong due to the incorrect LDR. This was seen with a Win2016 guest under Qemu with irqchip=split whose USB mouse didn't work after a VM migration. This corresponds to RH bug: https://bugzilla.redhat.com/show_bug.cgi?id=1502591 Reported-by: Yiqian Wei <yiwei@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: stable@vger.kernel.org [Applied fixup from Liran Alon. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27KVM: lapic: Split out x2apic ldr calculationDr. David Alan Gilbert
Split out the ldr calculation from kvm_apic_set_x2apic_id since we're about to reuse it in the following patch. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-26Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - topology enumeration fixes - KASAN fix - two entry fixes (not yet the big series related to KASLR) - remove obsolete code - instruction decoder fix - better /dev/mem sanity checks, hopefully working better this time - pkeys fixes - two ACPI fixes - 5-level paging related fixes - UMIP fixes that should make application visible faults more debuggable - boot fix for weird virtualization environment * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/decoder: Add new TEST instruction pattern x86/PCI: Remove unused HyperTransport interrupt support x86/umip: Fix insn_get_code_seg_params()'s return value x86/boot/KASLR: Remove unused variable x86/entry/64: Add missing irqflags tracing to native_load_gs_index() x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing x86/pkeys/selftests: Fix protection keys write() warning x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey' x86/mpx/selftests: Fix up weird arrays x86/pkeys: Update documentation about availability x86/umip: Print a warning into the syslog if UMIP-protected instructions are used x86/smpboot: Fix __max_logical_packages estimate x86/topology: Avoid wasting 128k for package id array perf/x86/intel/uncore: Cache logical pkg id in uncore driver x86/acpi: Reduce code duplication in mp_override_legacy_irq() x86/acpi: Handle SCI interrupts above legacy space gracefully x86/boot: Fix boot failure when SMP MP-table is based at 0 x86/mm: Limit mmap() of /dev/mem to valid physical addresses x86/selftests: Add test for mapping placement for 5-level paging ...
2017-11-26Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: two PMU driver fixes and a memory leak fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix memory leak triggered by perf --namespace perf/x86/intel/uncore: Add event constraint for BDX PCU perf/x86/intel: Hide TSX events when RTM is not supported
2017-11-25x86/tlb: Disable interrupts when changing CR4Nadav Amit
CR4 modifications are implemented as RMW operations which update a shadow variable and write the result to CR4. The RMW operation is protected by preemption disable, but there is no enforcement or debugging mechanism. CR4 modifications happen also in interrupt context via __native_flush_tlb_global(). This implementation does not affect a interrupted thread context CR4 operation, because the CR4 toggle restores the original content and does not modify the shadow variable. So the current situation seems to be safe, but a recent patch tried to add an actual RMW operation in interrupt context, which will cause subtle corruptions. To prevent that and make the CR4 handling future proof: - Add a lockdep assertion to __cr4_set() which will catch interrupt enabled invocations - Disable interrupts in the cr4 manipulator inlines - Rename cr4_toggle_bits() to cr4_toggle_bits_irqsoff(). This is called from __switch_to_xtra() where interrupts are already disabled and performance matters. All other call sites are not performance critical, so the extra overhead of an additional local_irq_save/restore() pair is not a problem. If new call sites care about performance then the necessary _irqsoff() variants can be added. [ tglx: Condensed the patch by moving the irq protection inside the manipulator functions. Updated changelog ] Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Luck <tony.luck@intel.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: nadav.amit@gmail.com Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20171125032907.2241-3-namit@vmware.com
2017-11-25x86/tlb: Refactor CR4 setting and shadow writeNadav Amit
Refactor the write to CR4 and its shadow value. This is done in preparation for the addition of an assertion to check that IRQs are disabled during CR4 update. No functional change. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: nadav.amit@gmail.com Cc: Andy Lutomirski <luto@kernel.org> Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20171125032907.2241-2-namit@vmware.com
2017-11-24Merge tag 'kvm-4.15-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Radim Krčmář: "Trimmed second batch of KVM changes for Linux 4.15: - GICv4 Support for KVM/ARM - re-introduce support for CPUs without virtual NMI (cc stable) and allow testing of KVM without virtual NMI on available CPUs - fix long-standing performance issues with assigned devices on AMD (cc stable)" * tag 'kvm-4.15-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits) kvm: vmx: Allow disabling virtual NMI support kvm: vmx: Reinstate support for CPUs without virtual NMI KVM: SVM: obey guest PAT KVM: arm/arm64: Don't queue VLPIs on INV/INVALL KVM: arm/arm64: Fix GICv4 ITS initialization issues KVM: arm/arm64: GICv4: Theory of operations KVM: arm/arm64: GICv4: Enable VLPI support KVM: arm/arm64: GICv4: Prevent userspace from changing doorbell affinity KVM: arm/arm64: GICv4: Prevent a VM using GICv4 from being saved KVM: arm/arm64: GICv4: Enable virtual cpuif if VLPIs can be delivered KVM: arm/arm64: GICv4: Hook vPE scheduling into vgic flush/sync KVM: arm/arm64: GICv4: Use the doorbell interrupt as an unblocking source KVM: arm/arm64: GICv4: Add doorbell interrupt handling KVM: arm/arm64: GICv4: Use pending_last as a scheduling hint KVM: arm/arm64: GICv4: Handle INVALL applied to a vPE KVM: arm/arm64: GICv4: Propagate property updates to VLPIs KVM: arm/arm64: GICv4: Handle MOVALL applied to a vPE KVM: arm/arm64: GICv4: Handle CLEAR applied to a VLPI KVM: arm/arm64: GICv4: Propagate affinity changes to the physical ITS KVM: arm/arm64: GICv4: Unmap VLPI when freeing an LPI ...
2017-11-24x86/decoder: Add new TEST instruction patternMasami Hiramatsu
The kbuild test robot reported this build warning: Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx) Warning: objdump says 3 bytes, but insn_get_length() says 2 Warning: decoded and checked 1569014 instructions with 1 warnings This sequence seems to be a new instruction not in the opcode map in the Intel SDM. The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8. Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of the ModR/M Byte (bits 2,1,0 in parenthesis)" In that table, opcodes listed by the index REG bits as: 000 001 010 011 100 101 110 111 TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX So, it seems TEST Ib is assigned to 001. Add the new pattern. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-23x86/PCI: Remove unused HyperTransport interrupt supportBjorn Helgaas
There are no in-tree callers of ht_create_irq(), the driver interface for HyperTransport interrupts, left. Remove the unused entry point and all the supporting code. See 8b955b0dddb3 ("[PATCH] Initial generic hypertransport interrupt support"). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-pci@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
2017-11-23x86/umip: Fix insn_get_code_seg_params()'s return valueBorislav Petkov
In order to save on redundant structs definitions insn_get_code_seg_params() was made to return two 4-bit values in a char but clang complains: arch/x86/lib/insn-eval.c:780:10: warning: implicit conversion from 'int' to 'char' changes value from 132 to -124 [-Wconstant-conversion] return INSN_CODE_SEG_PARAMS(4, 8); ~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~ ./arch/x86/include/asm/insn-eval.h:16:57: note: expanded from macro 'INSN_CODE_SEG_PARAMS' #define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4)) Those two values do get picked apart afterwards the opposite way of how they were ORed so wrt to the LSByte, the return value is the same. But this function returns -EINVAL in the error case, which is an int. So make it return an int which is the native word size anyway and thus fix the clang warning. Reported-by: Kees Cook <keescook@google.com> Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ricardo.neri-calderon@linux.intel.com Link: https://lkml.kernel.org/r/20171123091951.1462-1-bp@alien8.de
2017-11-23x86/boot/KASLR: Remove unused variableChao Fan
There are two variables "rc" in mem_avoid_memmap. One at the top of the function and another one inside the while() loop. Drop the outer one as it is unused. Cleanup some whitespace damage while at it. Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gregkh@linuxfoundation.org Cc: n-horiguchi@ah.jp.nec.com Cc: keescook@chromium.org Link: https://lkml.kernel.org/r/20171123090847.15293-1-fanc.fnst@cn.fujitsu.com
2017-11-23x86/entry/64: Add missing irqflags tracing to native_load_gs_index()Andy Lutomirski
Running this code with IRQs enabled (where dummy_lock is a spinlock): static void check_load_gs_index(void) { /* This will fail. */ load_gs_index(0xffff); spin_lock(&dummy_lock); spin_unlock(&dummy_lock); } Will generate a lockdep warning. The issue is that the actual write to %gs would cause an exception with IRQs disabled, and the exception handler would, as an inadvertent side effect, update irqflag tracing to reflect the IRQs-off status. native_load_gs_index() would then turn IRQs back on and return with irqflag tracing still thinking that IRQs were off. The dummy lock-and-unlock causes lockdep to notice the error and warn. Fix it by adding the missing tracing. Apparently nothing did this in a context where it mattered. I haven't tried to find a code path that would actually exhibit the warning if appropriately nasty user code were running. I suspect that the security impact of this bug is very, very low -- production systems don't run with lockdep enabled, and the warning is mostly harmless anyway. Found during a quick audit of the entry code to try to track down an unrelated bug that Ingo found in some still-in-development code. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/e1aeb0e6ba8dd430ec36c8a35e63b429698b4132.1511411918.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-22x86/mm/kasan: Don't use vmemmap_populate() to initialize shadowAndrey Ryabinin
[ Note, this commit is a cherry-picked version of: d17a1d97dc20: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow") ... for easier x86 entry code testing and back-porting. ] The KASAN shadow is currently mapped using vmemmap_populate() since that provides a semi-convenient way to map pages into init_top_pgt. However, since that no longer zeroes the mapped pages, it is not suitable for KASAN, which requires zeroed shadow memory. Add kasan_populate_shadow() interface and use it instead of vmemmap_populate(). Besides, this allows us to take advantage of gigantic pages and use them to populate the shadow, which should save us some memory wasted on page tables and reduce TLB pressure. Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Bob Picco <bob.picco@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-22x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracingAndy Lutomirski
When I added entry_SYSCALL_64_after_hwframe(), I left TRACE_IRQS_OFF before it. This means that users of entry_SYSCALL_64_after_hwframe() were responsible for invoking TRACE_IRQS_OFF, and the one and only user (Xen, added in the same commit) got it wrong. I think this would manifest as a warning if a Xen PV guest with CONFIG_DEBUG_LOCKDEP=y were used with context tracking. (The context tracking bit is to cause lockdep to get invoked before we turn IRQs back on.) I haven't tested that for real yet because I can't get a kernel configured like that to boot at all on Xen PV. Move TRACE_IRQS_OFF below the label. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 8a9949bc71a7 ("x86/xen/64: Rearrange the SYSCALL entries") Link: http://lkml.kernel.org/r/9150aac013b7b95d62c2336751d5b6e91d2722aa.1511325444.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-21x86/umip: Print a warning into the syslog if UMIP-protected instructions are ↵Ricardo Neri
used Print a rate-limited warning when a user-space program attempts to execute any of the instructions that UMIP protects (i.e., SGDT, SIDT, SLDT, STR and SMSW). This is useful, because when CONFIG_X86_INTEL_UMIP=y is selected and supported by the hardware, user space programs that try to execute such instructions will receive a SIGSEGV signal that they might not expect. In the specific cases for which emulation is provided (instructions SGDT, SIDT and SMSW in protected and virtual-8086 modes), no signal is generated. However, a warning is helpful to encourage updates in such programs to avoid the use of such instructions. Warnings are printed via a customized printk() function that also provides information about the program that attempted to use the affected instructions. Utility macros are defined to wrap umip_printk() for the error and warning kernel log levels. While here, replace an existing call to the generic rate-limited pr_err() with the new umip_pr_err(). Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1511233476-17088-1-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-17Merge tag 'kbuild-v4.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: "One of the most remarkable improvements in this cycle is, Kbuild is now able to cache the result of shell commands. Some variables are expensive to compute, for example, $(call cc-option,...) invokes the compiler. It is not efficient to redo this computation every time, even when we are not actually building anything. Kbuild creates a hidden file ".cache.mk" that contains invoked shell commands and their results. The speed-up should be noticeable. Summary: - Fix arch build issues (hexagon, sh) - Clean up various Makefiles and scripts - Fix wrong usage of {CFLAGS,LDFLAGS}_MODULE in arch Makefiles - Cache variables that are expensive to compute - Improve cc-ldopton and ld-option for Clang - Optimize output directory creation" * tag 'kbuild-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (30 commits) kbuild: move coccicheck help from scripts/Makefile.help to top Makefile sh: decompressor: add shipped files to .gitignore frv: .gitignore: ignore vmlinux.lds selinux: remove unnecessary assignment to subdir- kbuild: specify FORCE in Makefile.headersinst as .PHONY target kbuild: remove redundant mkdir from ./Kbuild kbuild: optimize object directory creation for incremental build kbuild: create object directories simpler and faster kbuild: filter-out PHONY targets from "targets" kbuild: remove redundant $(wildcard ...) for cmd_files calculation kbuild: create directory for make cache only when necessary sh: select KBUILD_DEFCONFIG depending on ARCH kbuild: fix linker feature test macros when cross compiling with Clang kbuild: shrink .cache.mk when it exceeds 1000 lines kbuild: do not call cc-option before KBUILD_CFLAGS initialization kbuild: Cache a few more calls to the compiler kbuild: Add a cache for generated variables kbuild: add forward declaration of default target to Makefile.asm-generic kbuild: remove KBUILD_SUBDIR_ASFLAGS and KBUILD_SUBDIR_CCFLAGS hexagon/kbuild: replace CFLAGS_MODULE with KBUILD_CFLAGS_MODULE ...
2017-11-17Merge tag 'pm-fixes-4.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull two power management fixes from Rafael Wysocki: "This is the change making /proc/cpuinfo on x86 report current CPU frequency in "cpu MHz" again in all cases and an additional one dealing with an overzealous check in one of the helper routines in the runtime PM framework" * tag 'pm-fixes-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / runtime: Drop children check from __pm_runtime_set_status() x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
2017-11-17Merge tag 'locks-v4.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking update from Jeff Layton: "A couple of fixes for a patch that went into v4.14, and the bug report just came in a few days ago.. It passes my (minimal) testing, and has been in linux-next for a few days now. I also would like to get my address changed in MAINTAINERS to clear that hurdle" * tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall fcntl: don't leak fd reference when fixup_compat_flock fails MAINTAINERS: s/jlayton@poochiereds.net/jlayton@kernel.org/
2017-11-17x86/smpboot: Fix __max_logical_packages estimatePrarit Bhargava
A system booted with a small number of cores enabled per package panics because the estimate of __max_logical_packages is too low. This occurs when the total number of active cores across all packages is less than the maximum core count for a single package. e.g.: On a 4 package system with 20 cores/package where only 4 cores are enabled on each package, the value of __max_logical_packages is calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4. Calculate __max_logical_packages after the cpu enumeration has completed. Use the boot cpu's data to extrapolate the number of packages. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: He Chen <he.chen@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Piotr Luc <piotr.luc@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Mathias Krause <minipli@googlemail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com
2017-11-17x86/topology: Avoid wasting 128k for package id arrayAndi Kleen
Analyzing large early boot allocations unveiled the logical package id storage as a prominent memory waste. Since commit 1f12e32f4cd5 ("x86/topology: Create logical package id") every 64-bit system allocates a 128k array to convert logical package ids. This happens because the array is sized for MAX_LOCAL_APIC which is always 32k on 64bit systems, and it needs 4 bytes for each entry. This is fairly wasteful, especially for the common case of having only one socket, which uses exactly 4 byte out of 128K. There is no user of the package id map which is performance critical, so the lookup is not required to be O(1). Store the logical processor id in cpu_data and use a loop based lookup. To keep the mapping stable accross cpu hotplug operations, add a flag to cpu_data which is set when the CPU is brought up the first time. When the flag is set, then cpu_data is not reinitialized by copying boot_cpu_data on subsequent bringups. [ tglx: Rename the flag to 'initialized', use proper pointers instead of repeated cpu_data(x) evaluation and massage changelog. ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: He Chen <he.chen@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Piotr Luc <piotr.luc@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Mathias Krause <minipli@googlemail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171114124257.22013-3-prarit@redhat.com
2017-11-17perf/x86/intel/uncore: Cache logical pkg id in uncore driverAndi Kleen
The SNB-EP uncore driver is the only user of topology_phys_to_logical_pkg in a performance critical path. Change it query the logical pkg ID only once at initialization time and then cache it in box structure. This allows to change the logical package management without affecting the performance critical path. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: He Chen <he.chen@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Piotr Luc <piotr.luc@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Mathias Krause <minipli@googlemail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171114124257.22013-2-prarit@redhat.com
2017-11-17x86/acpi: Reduce code duplication in mp_override_legacy_irq()Vikas C Sajjan
The new function mp_register_ioapic_irq() is a subset of the code in mp_override_legacy_irq(). Replace the code duplication by invoking mp_register_ioapic_irq() from mp_override_legacy_irq(). Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: linux-pm@vger.kernel.org Cc: kkamagui@gmail.com Cc: linux-acpi@vger.kernel.org Link: https://lkml.kernel.org/r/1510848825-21965-3-git-send-email-vikas.cha.sajjan@hpe.com
2017-11-17x86/acpi: Handle SCI interrupts above legacy space gracefullyVikas C Sajjan
Platforms which support only IOAPIC mode, pass the SCI information above the legacy space (0-15) via the FADT mechanism and not via MADT. In such cases mp_override_legacy_irq() which is invoked from acpi_sci_ioapic_setup() to register SCI interrupts fails for interrupts greater equal 16, since it is meant to handle only the legacy space and emits error "Invalid bus_irq %u for legacy override". Add a new function to handle SCI interrupts >= 16 and invoke it conditionally in acpi_sci_ioapic_setup(). The code duplication due to this new function will be cleaned up in a separate patch. Co-developed-by: Sunil V L <sunil.vl@hpe.com> Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com> Signed-off-by: Sunil V L <sunil.vl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Abdul Lateef Attar <abdul-lateef.attar@hpe.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: linux-pm@vger.kernel.org Cc: kkamagui@gmail.com Cc: linux-acpi@vger.kernel.org Link: https://lkml.kernel.org/r/1510848825-21965-2-git-send-email-vikas.cha.sajjan@hpe.com