summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2016-11-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "Four fixes for bugs found by syzkaller on x86, all for stable" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: check for pic and ioapic presence before use KVM: x86: fix out-of-bounds accesses of rtc_eoi map KVM: x86: drop error recovery in em_jmp_far and em_ret_far KVM: x86: fix out-of-bounds access in lapic
2016-11-24KVM: x86: check for pic and ioapic presence before useRadim Krčmář
Split irqchip allows pic and ioapic routes to be used without them being created, which results in NULL access. Check for NULL and avoid it. (The setup is too racy for a nicer solutions.) Found by syzkaller: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 3 PID: 11923 Comm: kworker/3:2 Not tainted 4.9.0-rc5+ #27 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events irqfd_inject task: ffff88006a06c7c0 task.stack: ffff880068638000 RIP: 0010:[...] [...] __lock_acquire+0xb35/0x3380 kernel/locking/lockdep.c:3221 RSP: 0000:ffff88006863ea20 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000000 RDX: 0000000000000039 RSI: 0000000000000000 RDI: 1ffff1000d0c7d9e RBP: ffff88006863ef58 R08: 0000000000000001 R09: 0000000000000000 R10: 00000000000001c8 R11: 0000000000000000 R12: ffff88006a06c7c0 R13: 0000000000000001 R14: ffffffff8baab1a0 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004abdd0 CR3: 000000003e2f2000 CR4: 00000000000026e0 Stack: ffffffff894d0098 1ffff1000d0c7d56 ffff88006863ecd0 dffffc0000000000 ffff88006a06c7c0 0000000000000000 ffff88006863ecf8 0000000000000082 0000000000000000 ffffffff815dd7c1 ffffffff00000000 ffffffff00000000 Call Trace: [...] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746 [...] __raw_spin_lock include/linux/spinlock_api_smp.h:144 [...] _raw_spin_lock+0x38/0x50 kernel/locking/spinlock.c:151 [...] spin_lock include/linux/spinlock.h:302 [...] kvm_ioapic_set_irq+0x4c/0x100 arch/x86/kvm/ioapic.c:379 [...] kvm_set_ioapic_irq+0x8f/0xc0 arch/x86/kvm/irq_comm.c:52 [...] kvm_set_irq+0x239/0x640 arch/x86/kvm/../../../virt/kvm/irqchip.c:101 [...] irqfd_inject+0xb4/0x150 arch/x86/kvm/../../../virt/kvm/eventfd.c:60 [...] process_one_work+0xb40/0x1ba0 kernel/workqueue.c:2096 [...] worker_thread+0x214/0x18a0 kernel/workqueue.c:2230 [...] kthread+0x328/0x3e0 kernel/kthread.c:209 [...] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: 49df6397edfc ("KVM: x86: Split the APIC from the rest of IRQCHIP.") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-24KVM: x86: fix out-of-bounds accesses of rtc_eoi mapRadim Krčmář
KVM was using arrays of size KVM_MAX_VCPUS with vcpu_id, but ID can be bigger that the maximal number of VCPUs, resulting in out-of-bounds access. Found by syzkaller: BUG: KASAN: slab-out-of-bounds in __apic_accept_irq+0xb33/0xb50 at addr [...] Write of size 1 by task a.out/27101 CPU: 1 PID: 27101 Comm: a.out Not tainted 4.9.0-rc5+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [...] Call Trace: [...] __apic_accept_irq+0xb33/0xb50 arch/x86/kvm/lapic.c:905 [...] kvm_apic_set_irq+0x10e/0x180 arch/x86/kvm/lapic.c:495 [...] kvm_irq_delivery_to_apic+0x732/0xc10 arch/x86/kvm/irq_comm.c:86 [...] ioapic_service+0x41d/0x760 arch/x86/kvm/ioapic.c:360 [...] ioapic_set_irq+0x275/0x6c0 arch/x86/kvm/ioapic.c:222 [...] kvm_ioapic_inject_all arch/x86/kvm/ioapic.c:235 [...] kvm_set_ioapic+0x223/0x310 arch/x86/kvm/ioapic.c:670 [...] kvm_vm_ioctl_set_irqchip arch/x86/kvm/x86.c:3668 [...] kvm_arch_vm_ioctl+0x1a08/0x23c0 arch/x86/kvm/x86.c:3999 [...] kvm_vm_ioctl+0x1fa/0x1a70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3099 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: af1bae5497b9 ("KVM: x86: bump KVM_MAX_VCPU_ID to 1023") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-24KVM: x86: drop error recovery in em_jmp_far and em_ret_farRadim Krčmář
em_jmp_far and em_ret_far assumed that setting IP can only fail in 64 bit mode, but syzkaller proved otherwise (and SDM agrees). Code segment was restored upon failure, but it was left uninitialized outside of long mode, which could lead to a leak of host kernel stack. We could have fixed that by always saving and restoring the CS, but we take a simpler approach and just break any guest that manages to fail as the error recovery is error-prone and modern CPUs don't need emulator for this. Found by syzkaller: WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480 Kernel panic - not syncing: panic_on_warn set ... CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [...] Call Trace: [...] __dump_stack lib/dump_stack.c:15 [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51 [...] panic+0x1b7/0x3a3 kernel/panic.c:179 [...] __warn+0x1c4/0x1e0 kernel/panic.c:542 [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217 [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227 [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294 [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545 [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116 [...] complete_emulated_io arch/x86/kvm/x86.c:6870 [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934 [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978 [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557 [...] vfs_ioctl fs/ioctl.c:43 [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679 [...] SYSC_ioctl fs/ioctl.c:694 [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685 [...] entry_SYSCALL_64_fastpath+0x1f/0xc2 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-24KVM: x86: fix out-of-bounds access in lapicRadim Krčmář
Cluster xAPIC delivery incorrectly assumed that dest_id <= 0xff. With enabled KVM_X2APIC_API_USE_32BIT_IDS in KVM_CAP_X2APIC_API, a userspace can send an interrupt with dest_id that results in out-of-bounds access. Found by syzkaller: BUG: KASAN: slab-out-of-bounds in kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 at addr ffff88003d9ca750 Read of size 8 by task syz-executor/22923 CPU: 0 PID: 22923 Comm: syz-executor Not tainted 4.9.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [...] Call Trace: [...] __dump_stack lib/dump_stack.c:15 [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51 [...] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156 [...] print_address_description mm/kasan/report.c:194 [...] kasan_report_error mm/kasan/report.c:283 [...] kasan_report+0x231/0x500 mm/kasan/report.c:303 [...] __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:329 [...] kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 arch/x86/kvm/lapic.c:824 [...] kvm_irq_delivery_to_apic+0x132/0x9a0 arch/x86/kvm/irq_comm.c:72 [...] kvm_set_msi+0x111/0x160 arch/x86/kvm/irq_comm.c:157 [...] kvm_send_userspace_msi+0x201/0x280 arch/x86/kvm/../../../virt/kvm/irqchip.c:74 [...] kvm_vm_ioctl+0xba5/0x1670 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3015 [...] vfs_ioctl fs/ioctl.c:43 [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679 [...] SYSC_ioctl fs/ioctl.c:694 [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685 [...] entry_SYSCALL_64_fastpath+0x1f/0xc2 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: e45115b62f9a ("KVM: x86: use physical LAPIC array for logical x2APIC") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-23Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Six fixes for bugs that were found via fuzzing, and a trivial hw-enablement patch for AMD Family-17h CPU PMUs" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Allow only a single PMU/box within an events group perf/x86/intel: Cure bogus unwind from PEBS entries perf/x86: Restore TASK_SIZE check on frame pointer perf/core: Fix address filter parser perf/x86: Add perf support for AMD family-17h processors perf/x86/uncore: Fix crash by removing bogus event_list[] handling for SNB client uncore IMC perf/core: Do not set cpuctx->cgrp for unscheduled cgroups
2016-11-22Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - two fixes to make (very) old Intel CPUs boot reliably - fix the intel-mid driver and rename it - two KASAN false positive fixes - an FPU fix - two sysfb fixes - two build fixes related to new toolchain versions" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdt x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well x86/platform/intel-mid: Register watchdog device after SCU x86/fpu: Fix invalid FPU ptrace state after execve() x86/boot: Fail the boot if !M486 and CPUID is missing x86/traps: Ignore high word of regs->cs in early_fixup_exception() x86/dumpstack: Prevent KASAN false positive warnings x86/unwind: Prevent KASAN false positive warnings in guess unwinder x86/boot: Avoid warning for zero-filling .bss x86/sysfb: Fix lfb_size calculation x86/sysfb: Add support for 64bit EFI lfb_base
2016-11-22perf/x86/intel/uncore: Allow only a single PMU/box within an events groupPeter Zijlstra
Group validation expects all events to be of the same PMU; however is_uncore_pmu() is too wide, it matches _all_ uncore events, even across PMUs. This triggers failure when we group different events from different uncore PMUs, like: perf stat -vv -e '{uncore_cbox_0/config=0x0334/,uncore_qpi_0/event=1/}' -a sleep 1 Fix is_uncore_pmu() by only matching events to the box at hand. Note that generic code; ran after this step; will disallow this mixture of PMU events. Reported-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20161118125354.GQ3117@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22perf/x86/intel: Cure bogus unwind from PEBS entriesPeter Zijlstra
Vince Weaver reported that perf_fuzzer + KASAN detects that PEBS event unwinds sometimes do 'weird' things. In particular, we seemed to be ending up unwinding from random places on the NMI stack. While it was somewhat expected that the event record BP,SP would not match the interrupt BP,SP in that the interrupt is strictly later than the record event, it was overlooked that it could be on an already overwritten stack. Therefore, don't copy the recorded BP,SP over the interrupted BP,SP when we need stack unwinds. Note that its still possible the unwind doesn't full match the actual event, as its entirely possible to have done an (I)RET between record and interrupt, but on average it should still point in the general direction of where the event came from. Also, it's the best we can do, considering. The particular scenario that triggered the bogus NMI stack unwind was a PEBS event with very short period, upon enabling the event at the tail of the PMI handler (FREEZE_ON_PMI is not used), it instantly triggers a record (while still on the NMI stack) which in turn triggers the next PMI. This then causes back-to-back NMIs and we'll try and unwind the stack-frame from the last NMI, which obviously is now overwritten by our own. Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davej@codemonkey.org.uk <davej@codemonkey.org.uk> Cc: dvyukov@google.com <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: ca037701a025 ("perf, x86: Add PEBS infrastructure") Link: http://lkml.kernel.org/r/20161117171731.GV3157@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22perf/x86: Restore TASK_SIZE check on frame pointerJohannes Weiner
The following commit: 75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses") ... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual access_ok() check. Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE, whereas the access_ok() uses whatever the current address limit of the task is. We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and then see vmalloc faults when we access what looks like pointers into vmalloc space: [] WARNING: CPU: 3 PID: 3685731 at arch/x86/mm/fault.c:435 vmalloc_fault+0x289/0x290 [] CPU: 3 PID: 3685731 Comm: sh Tainted: G W 4.6.0-5_fbk1_223_gdbf0f40 #1 [] Call Trace: [] <NMI> [<ffffffff814717d1>] dump_stack+0x4d/0x6c [] [<ffffffff81076e43>] __warn+0xd3/0xf0 [] [<ffffffff81076f2d>] warn_slowpath_null+0x1d/0x20 [] [<ffffffff8104a899>] vmalloc_fault+0x289/0x290 [] [<ffffffff8104b5a0>] __do_page_fault+0x330/0x490 [] [<ffffffff8104b70c>] do_page_fault+0xc/0x10 [] [<ffffffff81794e82>] page_fault+0x22/0x30 [] [<ffffffff81006280>] ? perf_callchain_user+0x100/0x2a0 [] [<ffffffff8115124f>] get_perf_callchain+0x17f/0x190 [] [<ffffffff811512c7>] perf_callchain+0x67/0x80 [] [<ffffffff8114e750>] perf_prepare_sample+0x2a0/0x370 [] [<ffffffff8114e840>] perf_event_output+0x20/0x60 [] [<ffffffff8114aee7>] ? perf_event_update_userpage+0xc7/0x130 [] [<ffffffff8114ea01>] __perf_event_overflow+0x181/0x1d0 [] [<ffffffff8114f484>] perf_event_overflow+0x14/0x20 [] [<ffffffff8100a6e3>] intel_pmu_handle_irq+0x1d3/0x490 [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] [<ffffffff81197191>] ? vunmap_page_range+0x1a1/0x2f0 [] [<ffffffff811972f1>] ? unmap_kernel_range_noflush+0x11/0x20 [] [<ffffffff814f2056>] ? ghes_copy_tofrom_phys+0x116/0x1f0 [] [<ffffffff81040d1d>] ? x2apic_send_IPI_self+0x1d/0x20 [] [<ffffffff8100411d>] perf_event_nmi_handler+0x2d/0x50 [] [<ffffffff8101ea31>] nmi_handle+0x61/0x110 [] [<ffffffff8101ef94>] default_do_nmi+0x44/0x110 [] [<ffffffff8101f13b>] do_nmi+0xdb/0x150 [] [<ffffffff81795187>] end_repeat_nmi+0x1a/0x1e [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] <<EOE>> <IRQ> [<ffffffff8115d05e>] ? __probe_kernel_read+0x3e/0xa0 Fix this by moving the valid_user_frame() check to before the uaccess that loads the return address and the pointer to the next frame. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Fixes: 75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdtAndy Shevchenko
Rename the watchdog platform library file to explicitly show that is used only on Intel Merrifield platforms. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161118172723.179761-1-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as wellH.J. Lu
Since the bootloader may load the compressed x86 kernel at any address, it should always be built as PIE, not just when CONFIG_RELOCATABLE=y. Otherwise, linker in binutils 2.27 will optimize GOT load into the absolute address when building the compressed x86 kernel as a non-PIE executable. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org [ Small wording changes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21x86/platform/intel-mid: Register watchdog device after SCUAndy Shevchenko
Watchdog device in Intel Tangier relies on SCU to be present. It uses the SCU IPC channel to send commands and receive responses. If watchdog driver is initialized quite before SCU and a command has been sent the result is always an error like the following: intel_mid_wdt: Error stopping watchdog: 0xffffffed Register watchdog device whne SCU is ready to avoid described issue. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161118165224.175514-1-andriy.shevchenko@linux.intel.com [ Small cleanups. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21x86/fpu: Fix invalid FPU ptrace state after execve()Yu-cheng Yu
Robert O'Callahan reported that after an execve PTRACE_GETREGSET NT_X86_XSTATE continues to return the pre-exec register values until the exec'ed task modifies FPU state. The test code is at: https://bugzilla.redhat.com/attachment.cgi?id=1164286. What is happening is fpu__clear() does not properly clear fpstate. Fix it by doing just that. Reported-by: Robert O'Callahan <robert@ocallahan.org> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: David Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1479402695-6553-1-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21x86/boot: Fail the boot if !M486 and CPUID is missingAndy Lutomirski
Linux will have all kinds of sporadic problems on systems that don't have the CPUID instruction unless CONFIG_M486=y. In particular, sync_core() will explode. I believe that these kernels had a better chance of working before commit 05fb3c199bb0 ("x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID"). That commit inadvertently fixed a serious bug: we used to fail to detect the FPU if CPUID wasn't present. Because we also used to forget to set X86_FEATURE_ALWAYS, we end up with no cpu feature bits set at all. This meant that alternative patching didn't do anything and, if paravirt was disabled, we could plausibly finish the entire boot process without calling sync_core(). Rather than trying to work around these issues, just have the kernel fail loudly if it's running on a CPUID-less 486, doesn't have CPUID, and doesn't have CONFIG_M486 set. Reported-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/70eac6639f23df8be5fe03fa1984aedd5d40077a.1479598603.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21x86/traps: Ignore high word of regs->cs in early_fixup_exception()Andy Lutomirski
On the 80486 DX, it seems that some exceptions may leave garbage in the high bits of CS. This causes sporadic failures in which early_fixup_exception() refuses to fix up an exception. As far as I can tell, this has been buggy for a long time, but the problem seems to have been exacerbated by commits: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4") e1bfc11c5a6f ("x86/init: Fix cr4_init_shadow() on CR4-less machines") This appears to have broken for as long as we've had early exception handling. [ Note to stable maintainers: This patch is needed all the way back to 3.4, but it will only apply to 4.6 and up, as it depends on commit: 0e861fbb5bda ("x86/head: Move early exception panic code into early_fixup_exception()") If you want to backport to kernels before 4.6, please don't backport the prerequisites (there was a big chain of them that rewrote a lot of the early exception machinery); instead, ask me and I can send you a one-liner that will apply. ] Reported-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 4c5023a3fa2e ("x86-32: Handle exception table entries during early boot") Link: http://lkml.kernel.org/r/cb32c69920e58a1a58e7b5cad975038a69c0ce7d.1479609510.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - Fix handling of the 32bit cycle counter - Fix cycle counter filtering x86: - Fix a race leading to double unregistering of user notifiers - Amend oversight in kvm_arch_set_irq that turned Hyper-V code dead - Use SRCU around kvm_lapic_set_vapic_addr - Avoid recursive flushing of asynchronous page faults - Do not rely on deferred update in KVM_GET_CLOCK, which fixes #GP - Let userspace know that KVM_GET_CLOCK is useful with master clock; 4.9 changed the return value to better match the guest clock, but didn't provide means to let guests take advantage of it" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr KVM: async_pf: avoid recursive flushing of work items kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use KVM: Disable irq while unregistering user notifier KVM: x86: do not go through vcpu in __get_kvmclock_ns KVM: arm64: Fix the issues when guest PMCCFILTR is configured arm64: KVM: pmu: Fix AArch32 cycle counter access
2016-11-19kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomicPaolo Bonzini
kvm_arch_set_irq is unused since commit b97e6de9c96. Merge its functionality with kvm_arch_set_irq_inatomic. Reported-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addrPaolo Bonzini
Reported by syzkaller: [ INFO: suspicious RCU usage. ] 4.9.0-rc4+ #47 Not tainted ------------------------------- ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage! stack backtrace: CPU: 1 PID: 6679 Comm: syz-executor Not tainted 4.9.0-rc4+ #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff880039e2f6d0 ffffffff81c2e46b ffff88003e3a5b40 0000000000000000 0000000000000001 ffffffff83215600 ffff880039e2f700 ffffffff81334ea9 ffffc9000730b000 0000000000000004 ffff88003c4f8420 ffff88003d3f8000 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff81c2e46b>] dump_stack+0xb3/0x118 lib/dump_stack.c:51 [<ffffffff81334ea9>] lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4445 [< inline >] __kvm_memslots include/linux/kvm_host.h:534 [< inline >] kvm_memslots include/linux/kvm_host.h:541 [<ffffffff8105d6ae>] kvm_gfn_to_hva_cache_init+0xa1e/0xce0 virt/kvm/kvm_main.c:1941 [<ffffffff8112685d>] kvm_lapic_set_vapic_addr+0xed/0x140 arch/x86/kvm/lapic.c:2217 Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: fda4e2e85589191b123d31cdc21fd33ee70f50fd Cc: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in usePaolo Bonzini
Userspace can read the exact value of kvmclock by reading the TSC and fetching the timekeeping parameters out of guest memory. This however is brittle and not necessary anymore with KVM 4.11. Provide a mechanism that lets userspace know if the new KVM_GET_CLOCK semantics are in effect, and---since we are at it---if the clock is stable across all VCPUs. Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19KVM: Disable irq while unregistering user notifierIgnacio Alvarado
Function user_notifier_unregister should be called only once for each registered user notifier. Function kvm_arch_hardware_disable can be executed from an IPI context which could cause a race condition with a VCPU returning to user mode and attempting to unregister the notifier. Signed-off-by: Ignacio Alvarado <ikalvarado@google.com> Cc: stable@vger.kernel.org Fixes: 18863bdd60f8 ("KVM: x86 shared msr infrastructure") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19KVM: x86: do not go through vcpu in __get_kvmclock_nsPaolo Bonzini
Going through the first VCPU is wrong if you follow a KVM_SET_CLOCK with a KVM_GET_CLOCK immediately after, without letting the VCPU run and call kvm_guest_time_update. To fix this, compute the kvmclock value ourselves, using the master clock (tsc, nsec) pair as the base and the host CPU frequency as the scale. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-18Merge branch 'rc-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild fixes from Michal Marek: "Here are some regression fixes for kbuild: - modversion support for exported asm symbols (Nick Piggin). The affected architectures need separate patches adding asm-prototypes.h. - fix rebuilds of lib-ksyms.o (Nick Piggin) - -fno-PIE builds (Sebastian Siewior and Borislav Petkov). This is not a kernel regression, but one of the Debian gcc package. Nevertheless, it's quite annoying, so I think it should go into mainline and stable now" * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: kbuild: Steal gcc's pie from the very beginning kbuild: be more careful about matching preprocessed asm ___EXPORT_SYMBOL x86/kexec: add -fno-PIE scripts/has-stack-protector: add -fno-PIE kbuild: add -fno-PIE kbuild: modversions for EXPORT_SYMBOL() for asm kbuild: prevent lib-ksyms.o rebuilds
2016-11-18perf/x86: Add perf support for AMD family-17h processorsJanakarajan Natarajan
This patch enables perf core PMU support for the new AMD family-17h processors. In family-17h, there is no PMC-event constraint. All events, irrespective of the type, can be measured using any of the six generic performance counters. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1479399306-13375-1-git-send-email-Janakarajan.Natarajan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-18x86/dumpstack: Prevent KASAN false positive warningsJosh Poimboeuf
The oops stack dump code scans the entire stack, which can cause KASAN "stack-out-of-bounds" false positive warnings. Tell KASAN to ignore it. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: davej@codemonkey.org.uk Cc: dvyukov@google.com Link: http://lkml.kernel.org/r/5f6e80c4b0c7f7f0b6211900847a247cdaad753c.1479398226.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-18x86/unwind: Prevent KASAN false positive warnings in guess unwinderJosh Poimboeuf
The guess unwinder scans the entire stack, which can cause KASAN "stack-out-of-bounds" false positive warnings. Tell KASAN to ignore it. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: davej@codemonkey.org.uk Cc: dvyukov@google.com Link: http://lkml.kernel.org/r/61939c0b2b2d63ce97ba59cba3b00fd47c2962cf.1479398226.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-17x86/boot: Avoid warning for zero-filling .bssArnd Bergmann
The latest binutils are warning about a .fill directive with an explicit value in a .bss section: arch/x86/kernel/head_32.S: Assembler messages: arch/x86/kernel/head_32.S:677: Warning: ignoring fill value in section `.bss..page_aligned' arch/x86/kernel/head_32.S:679: Warning: ignoring fill value in section `.bss..page_aligned' This comes from the 'ENTRY()' macro padding the space between the symbols with 'nop' via: .align 4,0x90 Open-coding the .globl directive without the padding avoids that warning, as all the symbols are already page aligned. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161116141726.2013389-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16perf/x86/uncore: Fix crash by removing bogus event_list[] handling for SNB ↵Kan Liang
client uncore IMC Vince Weaver reported the following bug when KASAN is enabled: [ 205.748005] BUG: KASAN: slab-out-of-bounds in snb_uncore_imc_event_del+0x6c/0xa0 at addr ffff8800caa43768 [ 205.758324] Read of size 8 by task perf_fuzzer/6618 It's caused by accessing box->event_list. For client IMC, there are no generic counters. It defines its own fixed free running counters. So event_list and n_events are unused. They can be removed safely, which fixes the bug. ( There's still the separate question of how uninitialized state snuck into this data structure - but that's a separate fix. ) Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: davej@codemonkey.org.uk Cc: dvyukov@google.com Cc: eranian@gmail.com Link: http://lkml.kernel.org/r/1479235210-29090-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16x86/sysfb: Fix lfb_size calculationDavid Herrmann
The screen_info.lfb_size field is shifted by 16 bits *only* in case of VBE. This has historical reasons since VBE advertised it similarly. However, in case of EFI framebuffers, the size is no longer shifted. Fix the x86 simple-framebuffer setup code to use the correct size in the non-VBE case. While at it, avoid variable abbreviations and rename 'len' to 'length', and use the correct types matching the screen_info definition. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Gundersen <teg@jklm.no> Link: http://lkml.kernel.org/r/20161115120158.15388-3-dh.herrmann@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16x86/sysfb: Add support for 64bit EFI lfb_baseDavid Herrmann
The screen_info object was extended to support 64-bit lfb_base addresses in: ae2ee627dc87 ("efifb: Add support for 64-bit frame buffer addresses") However, the x86 simple-framebuffer setup code never made use of it. Fix it to properly assemble and verify the lfb_base before advertising simple-framebuffer devices. In particular, this means if VIDEO_CAPABILITY_64BIT_BASE is set, the screen_info->ext_lfb_base field will contain the upper 32bit of the actual lfb_base. Make sure the address is not 0 (i.e., unset), as well as does not overflow the physical address type. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Gundersen <teg@jklm.no> Link: http://lkml.kernel.org/r/20161115120158.15388-2-dh.herrmann@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-14Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - fix an Intel/MID boot crash/hang bug - fix a cache topology mis-parsing bug on certain AMD CPUs - fix a virtualization firmware bug by adding a check+quirk workaround on the kernel side" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Deal with broken firmware (VMWare/XEN) x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems x86/platform/intel-mid: Retrofit pci_platform_pm_ops ->get_state hook
2016-11-14Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "An uncore PMU driver hardware enablement change for Intel SkyLake uncore PMUs (Skylake Y, U, H and S platforms), plus a number of tooling fixes for the histogram handling/displaying code" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Add more Intel uncore IMC PCI IDs for SkyLake perf hists: Fix column length on --hierarchy perf hists browser: Fix column indentation on --hierarchy perf hists browser: Show folded sign properly on --hierarchy perf hists browser: Fix indentation of folded sign on --hierarchy perf hist browser: Fix hierarchy column counts
2016-11-13x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=yMatt Fleming
Booting an EFI mixed mode kernel has been crashing since commit: e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)") The user-visible effect in my test setup was the kernel being unable to find the root file system ramdisk. This was likely caused by silent memory or page table corruption. Enabling CONFIG_DEBUG_VIRTUAL=y immediately flagged the thunking code as abusing virt_to_phys() because it was passing addresses that were not part of the kernel direct mapping. Use the slow version instead, which correctly handles all memory regions by performing a page table walk. Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20161112210424.5157-3-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-13x86/efi: Fix EFI memmap pointer size warningBorislav Petkov
Fix this when building on 32-bit: arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’: arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] (efi_memory_desc_t *)pa); ^ arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] (efi_memory_desc_t *)pa); ^ The @pa local variable is declared as phys_addr_t and that is a u64 when CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE build.) However, its value comes from __pa() which is basically doing pointer arithmetic and checking, and returns unsigned long as it is the native pointer width. So let's use an unsigned long too. It should be fine to do so because the later users cast it to a pointer too. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20161112210424.5157-2-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-11crypto: aesni: shut up -Wmaybe-uninitialized warningArnd Bergmann
The rfc4106 encrypy/decrypt helper functions cause an annoying false-positive warning in allmodconfig if we turn on -Wmaybe-uninitialized warnings again: arch/x86/crypto/aesni-intel_glue.c: In function ‘helper_rfc4106_decrypt’: include/linux/scatterlist.h:67:31: warning: ‘dst_sg_walk.sg’ may be used uninitialized in this function [-Wmaybe-uninitialized] The problem seems to be that the compiler doesn't track the state of the 'one_entry_in_sg' variable across the kernel_fpu_begin/kernel_fpu_end section. This takes the easy way out by adding a bogus initialization, which should be harmless enough to get the patch into v4.9 so we can turn on this warning again by default without producing useless output. A follow-up patch for v4.10 rearranges the code to make the warning go away. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11x86: apm: avoid uninitialized dataArnd Bergmann
apm_bios_call() can fail, and return a status in its argument structure. If that status however is zero during a call from apm_get_power_status(), we end up using data that may have never been set, as reported by "gcc -Wmaybe-uninitialized": arch/x86/kernel/apm_32.c: In function ‘apm’: arch/x86/kernel/apm_32.c:1729:17: error: ‘bx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1835:5: error: ‘cx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1730:17: note: ‘cx’ was declared here arch/x86/kernel/apm_32.c:1842:27: error: ‘dx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1731:17: note: ‘dx’ was declared here This changes the function to return "APM_NO_ERROR" here, which makes the code more robust to broken BIOS versions, and avoids the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11perf/x86/intel/uncore: Add more Intel uncore IMC PCI IDs for SkyLakeKan Liang
Several uncore IMC PCI IDs are missed for Intel SkyLake. Add the PCI IDs for SkyLake Y, U, H and S platforms. Rename the ID macros for 0x191f and 0x190c. The corresponding bug: https://bugzilla.kernel.org/show_bug.cgi?id=187301 The related datasheets are also attached in the bug entry for permanent reference. Reported-by: Ben Widawsky <benjamin.widawsky@intel.com> Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1478631281-5061-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-09x86/kexec: add -fno-PIESebastian Andrzej Siewior
If the gcc is configured to do -fPIE by default then the build aborts later with: | Unsupported relocation type: unknown type rel type name (29) Tagging it stable so it is possible to compile recent stable kernels as well. Cc: stable@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Michal Marek <mmarek@suse.com>
2016-11-09x86/cpu: Deal with broken firmware (VMWare/XEN)Thomas Gleixner
Both ACPI and MP specifications require that the APIC id in the respective tables must be the same as the APIC id in CPUID. The kernel retrieves the physical package id from the APIC id during the ACPI/MP table scan and builds the physical to logical package map. The physical package id which is used after a CPU comes up is retrieved from CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect. There exist VMware and XEN implementations which violate the spec. As a result the physical to logical package map, which relies on the ACPI/MP tables does not work on those systems, because the CPUID initialized physical package id does not match the firmware id. This causes system crashes and malfunction due to invalid package mappings. The only way to cure this is to sanitize the physical package id after the CPUID enumeration and yell when the APIC ids are different. Fix up the initial APIC id, which is fine as it is only used printout purposes. If the physical package IDs differ yell and use the package information from the ACPI/MP tables so the existing logical package map just works. Chas provided the resulting dmesg output for his affected 4 virtual sockets, 1 core per socket VM: [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2 [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2 .... Reported-and-tested-by: "Charles (Chas) Williams" <ciwillia@brocade.com>, Reported-by: M. Vefa Bicakci <m.v.b@runbox.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Alok Kataria <akataria@vmware.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: #4.6+ <stable@vger,kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systemsYazen Ghannam
cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an underflow bug when extracting the socket_id value. It starts from 0 so subtracting 1 from it will result in an invalid value. This breaks scheduling topology later on since the cpu_llc_id will be incorrect. For example, the the cpu_llc_id of the *other* CPU in the loops in set_cpu_sibling_map() underflows and we're generating the funniest thread_siblings masks and then when I run 8 threads of nbench, they get spread around the LLC domains in a very strange pattern which doesn't give you the normal scheduling spread one would expect for performance. Other things like EDAC use cpu_llc_id so they will be b0rked too. So, the APIC ID is preset in APICx020 for bits 3 and above: they contain the core complex, node and socket IDs. The LLC is at the core complex level so we can find a unique cpu_llc_id by right shifting the APICID by 3 because then the least significant bit will be the Core Complex ID. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> [ Cleaned up and extended the commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> # v4.4.. Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 3849e91f571d ("x86/AMD: Fix last level cache topology for AMD Fam17h systems") Link: http://lkml.kernel.org/r/20161108083506.rvqb5h4chrcptj7d@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-07x86/platform/intel-mid: Retrofit pci_platform_pm_ops ->get_state hookLukas Wunner
Commit cc7cc02bada8 ("PCI: Query platform firmware for device power state") augmented struct pci_platform_pm_ops with a ->get_state hook and implemented it for acpi_pci_platform_pm, the only pci_platform_pm_ops existing till v4.7. However v4.8 introduced another pci_platform_pm_ops for Intel Mobile Internet Devices with commit 5823d0893ec2 ("x86/platform/intel-mid: Add Power Management Unit driver"). It is missing the ->get_state hook, which is fatal since pci_set_platform_pm() enforces its presence. Andy Shevchenko reports that without the present commit, such a device "crashes without even a character printed out on serial console and reboots (since watchdog)". Retrofit mid_pci_platform_pm with the missing callback to fix the breakage. Acked-and-tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Fixes: cc7cc02bada8 ("PCI: Query platform firmware for device power state") Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: http://lkml.kernel.org/r/7c1567d4c49303a4aada94ba16275cbf56b8976b.1477221514.git.lukas@wunner.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "One NULL pointer dereference, and two fixes for regressions introduced during the merge window. The rest are fixes for MIPS, s390 and nested VMX" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: Check memopp before dereference (CVE-2016-8630) kvm: nVMX: VMCLEAR an active shadow VMCS after last use KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK KVM: x86: fix wbinvd_dirty_mask use-after-free kvm/x86: Show WRMSR data is in hex kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types KVM: document lock orders KVM: fix OOPS on flush_work KVM: s390: Fix STHYI buffer alignment for diag224 KVM: MIPS: Precalculate MMIO load resume PC KVM: MIPS: Make ERET handle ERL before EXL KVM: MIPS: Fix lazy user ASID regenerate for SMP
2016-11-02kvm: x86: Check memopp before dereference (CVE-2016-8630)Owen Hofmann
Commit 41061cdb98 ("KVM: emulate: do not initialize memopp") removes a check for non-NULL under incorrect assumptions. An undefined instruction with a ModR/M byte with Mod=0 and R/M-5 (e.g. 0xc7 0x15) will attempt to dereference a null pointer here. Fixes: 41061cdb98a0bec464278b4db8e894a3121671f5 Message-Id: <1477592752-126650-2-git-send-email-osh@google.com> Signed-off-by: Owen Hofmann <osh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02kvm: nVMX: VMCLEAR an active shadow VMCS after last useJim Mattson
After a successful VM-entry with the "VMCS shadowing" VM-execution control set, the shadow VMCS referenced by the VMCS link pointer field in the current VMCS becomes active on the logical processor. A VMCS that is made active on more than one logical processor may become corrupted. Therefore, before an active VMCS can be migrated to another logical processor, the first logical processor must execute a VMCLEAR for the active VMCS. VMCLEAR both ensures that all VMCS data are written to memory and makes the VMCS inactive. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-By: David Matlack <dmatlack@google.com> Message-Id: <1477668579-22555-1-git-send-email-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCKPaolo Bonzini
Since commit a545ab6a0085 ("kvm: x86: add tsc_offset field to struct kvm_vcpu_arch", 2016-09-07) the offset between host and L1 TSC is cached and need not be fished out of the VMCS or VMCB. This means that we can implement adjust_tsc_offset_guest and read_l1_tsc entirely in generic code. The simplification is particularly significant for VMX code, where vmx->nested.vmcs01_tsc_offset was duplicating what is now in vcpu->arch.tsc_offset. Therefore the vmcs01_tsc_offset can be dropped completely. More importantly, this fixes KVM_GET_CLOCK/KVM_SET_CLOCK which, after commit 108b249c453d ("KVM: x86: introduce get_kvmclock_ns", 2016-09-01) called read_l1_tsc while the VMCS was not loaded. It thus returned bogus values on Intel CPUs. Fixes: 108b249c453dd7132599ab6dc7e435a7036c193f Reported-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-29x86/smpboot: Init apic mapping before usageThomas Gleixner
The recent changes, which forced the registration of the boot cpu on UP systems, which do not have ACPI tables, have been fixed for systems w/o local APIC, but left a wreckage for systems which have neither ACPI nor mptables, but the CPU has an APIC, e.g. virtualbox. The boot process crashes in prefill_possible_map() as it wants to register the boot cpu, which needs to access the local apic, but the local APIC is not yet mapped. There is no reason why init_apic_mapping() can't be invoked before prefill_possible_map(). So instead of playing another silly early mapping game, as the ACPI/mptables code does, we just move init_apic_mapping() before the call to prefill_possible_map(). In hindsight, I should have noticed that combination earlier. Sorry for the churn (also in stable)! Fixes: ff8560512b8d ("x86/boot/smp: Don't try to poke disabled/non-existent APIC") Reported-and-debugged-by: Michal Necasek <michal.necasek@oracle.com> Reported-and-tested-by: Wolfgang Bauer <wbauer@tmo.at> Cc: prarit@redhat.com Cc: ville.syrjala@linux.intel.com Cc: michael.thayer@oracle.com Cc: knut.osmundsen@oracle.com Cc: frank.mehnert@oracle.com Cc: Borislav Petkov <bp@alien8.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610282114380.5053@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-28Merge tag 'acpi-4.9-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix recent ACPICA regressions, an older PCI IRQ management regression, and an incorrect return value of a function in the APEI code. Specifics: - Fix three ACPICA issues related to the interpreter locking and introduced by recent changes in that area (Lv Zheng). - Fix a PCI IRQ management regression introduced during the 4.7 cycle and related to the configuration of shared IRQs on systems with an ISA bus (Sinan Kaya). - Fix up a return value of one function in the APEI code (Punit Agrawal)" * tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region() ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method() ACPICA: Dispatcher: Fix order issue of method termination ACPI / APEI: Fix incorrect return value of ghes_proc() ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs ACPI/PCI: pci_link: penalize SCI correctly ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages
2016-10-28Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc kernel fixes: a virtualization environment related fix, an uncore PMU driver removal handling fix, a PowerPC fix and new events for Knights Landing" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors perf/powerpc: Don't call perf_event_disable() from atomic context perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic perf/x86/intel/cstate: Add C-state residency events for Knights Landing
2016-10-28Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: three build fixes, an unwinder fix and a microcode loader fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y x86: Fix export for mcount and __fentry__ x86/quirks: Hide maybe-uninitialized warning x86/build: Fix build with older GCC versions x86/unwind: Fix empty stack dereference in guess unwinder
2016-10-28Merge tag 'drm-x86-pat-regression-fix' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm x86/pat regression fixes from Dave Airlie: "This is a standalone pull request for the fix for a regression introduced in -rc1 by a change to vm_insert_mixed to start using the PAT range tracking to validate page protections. With this fix in place, all the VRAM mappings for GPU drivers ended up at UC instead of WC. There are probably better ways to fix this long term, but nothing I'd considered for -fixes that wouldn't need more settling in time. So I've just created a new arch API that the drivers can reserve all their VRAM aperture ranges as WC" * tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux: drm/drivers: add support for using the arch wc mapping API. x86/io: add interface to reserve io memtype for a resource range. (v1.1)