summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2017-08-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Another pile of small fixes and updates for x86: - Plug a hole in the SMAP implementation which misses to clear AC on NMI entry - Fix the norandmaps/ADDR_NO_RANDOMIZE logic so the command line parameter works correctly again - Use the proper accessor in the startup64 code for next_early_pgt to prevent accessing of invalid addresses and faulting in the early boot code. - Prevent CPU hotplug lock recursion in the MTRR code - Unbreak CPU0 hotplugging - Rename overly long CPUID bits which got introduced in this cycle - Two commits which mark data 'const' and restrict the scope of data and functions to file scope by making them 'static'" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Constify attribute_group structures x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt' x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks x86: Fix norandmaps/ADDR_NO_RANDOMIZE x86/mtrr: Prevent CPU hotplug lock recursion x86: Mark various structures and functions as 'static' x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag x86/smpboot: Unbreak CPU0 hotplug x86/asm/64: Clear AC on NMI entries
2017-08-20Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Two fixes for the perf subsystem: - Fix an inconsistency of RDPMC mm struct tagging across exec() which causes RDPMC to fault. - Correct the timestamp mechanics across IOC_DISABLE/ENABLE which causes incorrect timestamps and total time calculations" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix time on IOC_ENABLE perf/x86: Fix RDPMC vs. mm_struct tracking
2017-08-20Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull watchdog fix from Thomas Gleixner: "A fix for the hardlockup watchdog to prevent false positives with extreme Turbo-Modes which make the perf/NMI watchdog fire faster than the hrtimer which is used to verify. Slightly larger than the minimal fix, which just would increase the hrtimer frequency, but comes with extra overhead of more watchdog timer interrupts and thread wakeups for all users. With this change we restrict the overhead to the extreme Turbo-Mode systems" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kernel/watchdog: Prevent false positives with turbo modes
2017-08-18mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changesKees Cook
Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000 broke AddressSanitizer. This is a partial revert of: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB") The AddressSanitizer tool has hard-coded expectations about where executable mappings are loaded. The motivation for changing the PIE base in the above commits was to avoid the Stack-Clash CVEs that allowed executable mappings to get too close to heap and stack. This was mainly a problem on 32-bit, but the 64-bit bases were moved too, in an effort to proactively protect those systems (proofs of concept do exist that show 64-bit collisions, but other recent changes to fix stack accounting and setuid behaviors will minimize the impact). The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC base), so only the 64-bit PIE base needs to be reverted to let x86 and arm64 ASan binaries run again. Future changes to the 64-bit PIE base on these architectures can be made optional once a more dynamic method for dealing with AddressSanitizer is found. (e.g. always loading PIE into the mmap region for marked binaries.) Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") Fixes: 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Kostya Serebryany <kcc@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdogNicholas Piggin
Commit 05a4a9527931 ("kernel/watchdog: split up config options") lost the perf-based hardlockup detector's dependency on PERF_EVENTS, which can result in broken builds with some powerpc configurations. Restore the dependency. Add it in for x86 too, despite x86 always selecting PERF_EVENTS it seems reasonable to make the dependency explicit. Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com Fixes: 05a4a9527931 ("kernel/watchdog: split up config options") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18kernel/watchdog: Prevent false positives with turbo modesThomas Gleixner
The hardlockup detector on x86 uses a performance counter based on unhalted CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the performance counter period, so the hrtimer should fire 2-3 times before the performance counter NMI fires. The NMI code checks whether the hrtimer fired since the last invocation. If not, it assumess a hard lockup. The calculation of those periods is based on the nominal CPU frequency. Turbo modes increase the CPU clock frequency and therefore shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x nominal frequency) the perf/NMI period is shorter than the hrtimer period which leads to false positives. A simple fix would be to shorten the hrtimer period, but that comes with the side effect of more frequent hrtimer and softlockup thread wakeups, which is not desired. Implement a low pass filter, which checks the perf/NMI period against kernel time. If the perf/NMI fires before 4/5 of the watchdog period has elapsed then the event is ignored and postponed to the next perf/NMI. That solves the problem and avoids the overhead of shorter hrtimer periods and more frequent softlockup thread wakeups. Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and softlockup detector") Reported-and-tested-by: Kan Liang <Kan.liang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: dzickus@redhat.com Cc: prarit@redhat.com Cc: ak@linux.intel.com Cc: babu.moger@oracle.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: acme@redhat.com Cc: stable@vger.kernel.org Cc: atomlin@redhat.com Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
2017-08-18x86: Constify attribute_group structuresArvind Yadav
attribute_groups are not supposed to change at runtime and none of the groups is modified. Mark the non-const structs as const. [ tglx: Folded into one big patch ] Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: tony.luck@intel.com Cc: bp@alien8.de Link: http://lkml.kernel.org/r/1500550238-15655-2-git-send-email-arvind.yadav.cs@gmail.com
2017-08-17Merge tag 'pm-4.13-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two issues related to exposing the current CPU frequency to user space on x86. Specifics: - Disable interrupts around reading IA32_APERF and IA32_MPERF in aperfmperf_snapshot_khz() (introduced recently) to avoid excessive delays between the reads that may result from interrupt handling (Doug Smythies). - Fix the computation of the CPU frequency to be reported through the pstate_sample tracepoint in intel_pstate (Doug Smythies)" * tag 'pm-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: x86: Disable interrupts during MSRs reading cpufreq: intel_pstate: report correct CPU frequencies during trace
2017-08-17Merge branches 'intel_pstate-fix' and 'cpufreq-x86-fix'Rafael J. Wysocki
* intel_pstate-fix: cpufreq: intel_pstate: report correct CPU frequencies during trace * cpufreq-x86-fix: cpufreq: x86: Disable interrupts during MSRs reading
2017-08-17x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'Alexander Potapenko
__startup_64() is normally using fixup_pointer() to access globals in a position-independent fashion. However 'next_early_pgt' was accessed directly, which wasn't guaranteed to work. Luckily GCC was generating a R_X86_64_PC32 PC-relative relocation for 'next_early_pgt', but Clang emitted a R_X86_64_32S, which led to accessing invalid memory and rebooting the kernel. Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Davidson <md@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") Link: http://lkml.kernel.org/r/20170816190808.131748-1-glider@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-16x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checksOleg Nesterov
The ADDR_NO_RANDOMIZE checks in stack_maxrandom_size() and randomize_stack_top() are not required. PF_RANDOMIZE is set by load_elf_binary() only if ADDR_NO_RANDOMIZE is not set, no need to re-check after that. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170815154011.GB1076@redhat.com
2017-08-16x86: Fix norandmaps/ADDR_NO_RANDOMIZEOleg Nesterov
Documentation/admin-guide/kernel-parameters.txt says: norandmaps Don't use address space randomization. Equivalent to echo 0 > /proc/sys/kernel/randomize_va_space but it doesn't work because arch_rnd() which is used to randomize mm->mmap_base returns a random value unconditionally. And as Kirill pointed out, ADDR_NO_RANDOMIZE is broken by the same reason. Just shift the PF_RANDOMIZE check from arch_mmap_rnd() to arch_rnd(). Fixes: 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20170815153952.GA1076@redhat.com
2017-08-15x86/mtrr: Prevent CPU hotplug lock recursionThomas Gleixner
Larry reported a CPU hotplug lock recursion in the MTRR code. ============================================ WARNING: possible recursive locking detected systemd-udevd/153 is trying to acquire lock: (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c030fc26>] stop_machine+0x16/0x30 but task is already holding lock: (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c0234353>] mtrr_add_page+0x83/0x470 .... cpus_read_lock+0x48/0x90 stop_machine+0x16/0x30 mtrr_add_page+0x18b/0x470 mtrr_add+0x3e/0x70 mtrr_add_page() holds the hotplug rwsem already and calls stop_machine() which acquires it again. Call stop_machine_cpuslocked() instead. Reported-and-tested-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708140920250.1865@nanos Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@suse.de>
2017-08-14Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "Fix an error path bug in ixp4xx as well as a read overrun in sha1-avx2" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/sha1 - Fix reads beyond the number of blocks passed crypto: ixp4xx - Fix error handling path in 'aead_perform()'
2017-08-12Merge tag 'for-linus-4.13b-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Some fixes for Xen: - a fix for a regression introduced in 4.13 for a Xen HVM-guest configured with KASLR - a fix for a possible deadlock in the xenbus driver when booting the system - a fix for lost interrupts in Xen guests" * tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: Fix interrupt lost during irq_disable and irq_enable xen: avoid deadlock in xenbus xen: fix hvm guest with kaslr enabled xen: split up xen_hvm_init_shared_info() x86: provide an init_mem_mapping hypervisor hook
2017-08-11xen: fix hvm guest with kaslr enabledJuergen Gross
A Xen HVM guest running with KASLR enabled will die rather soon today because the shared info page mapping is using va() too early. This was introduced by commit a5d5f328b0e2baa5ee7c119fd66324eb79eeeb66 ("xen: allocate page for shared info page from low memory"). In order to fix this use early_memremap() to get a temporary virtual address for shared info until va() can be used safely. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-08-11xen: split up xen_hvm_init_shared_info()Juergen Gross
Instead of calling xen_hvm_init_shared_info() on boot and resume split it up into a boot time function searching for the pfn to use and a mapping function doing the hypervisor mapping call. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-08-11x86: provide an init_mem_mapping hypervisor hookJuergen Gross
Provide a hook in hypervisor_x86 called after setting up initial memory mapping. This is needed e.g. by Xen HVM guests to map the hypervisor shared info page. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-08-11x86: Mark various structures and functions as 'static'Colin Ian King
Mark a couple of structures and functions as 'static', pointed out by Sparse: warning: symbol 'bts_pmu' was not declared. Should it be static? warning: symbol 'p4_event_aliases' was not declared. Should it be static? warning: symbol 'rapl_attr_groups' was not declared. Should it be static? symbol 'process_uv2_message' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Andrew Banman <abanman@hpe.com> # for the UV change Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170810155709.7094-1-colin.king@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-11x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized ↵Borislav Petkov
VMSAVE/VMLOAD" CPUID flag "virtual_vmload_vmsave" is what is going to land in /proc/cpuinfo now as per v4.13-rc4, for a single feature bit which is clearly too long. So rename it to what it is called in the processor manual. "v_vmsave_vmload" is a bit shorter, after all. We could go more aggressively here but having it the same as in the processor manual is advantageous. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Radim Krčmář <rkrcmar@redhat.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm-ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/20170801185552.GA3743@nazgul.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-11cpufreq: x86: Disable interrupts during MSRs readingDoug Smythies
According to Intel 64 and IA-32 Architectures SDM, Volume 3, Chapter 14.2, "Software needs to exercise care to avoid delays between the two RDMSRs (for example interrupts)". So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF. See also: commit 4ab60c3f32c7 (cpufreq: intel_pstate: Disable interrupts during MSRs reading). Signed-off-by: Doug Smythies <dsmythies@telus.net> Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-10x86/smpboot: Unbreak CPU0 hotplugVitaly Kuznetsov
A hang on CPU0 onlining after a preceding offlining is observed. Trace shows that CPU0 is stuck in check_tsc_sync_target() waiting for source CPU to run check_tsc_sync_source() but this never happens. Source CPU, in its turn, is stuck on synchronize_sched() which is called from native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler(). So it's a classic ABBA deadlock, due to the use of synchronize_sched() in unregister_nmi_handler(). Fix the bug by moving unregister_nmi_handler() from do_boot_cpu() to native_cpu_up() after cpu onlining is done. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170803105818.9934-1-vkuznets@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10x86/asm/64: Clear AC on NMI entriesAndy Lutomirski
This closes a hole in our SMAP implementation. This patch comes from grsecurity. Good catch! Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/314cc9f294e8f14ed85485727556ad4f15bb1659.1502159503.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10perf/x86: Fix RDPMC vs. mm_struct trackingPeter Zijlstra
Vince reported the following rdpmc() testcase failure: > Failing test case: > > fd=perf_event_open(); > addr=mmap(fd); > exec() // without closing or unmapping the event > fd=perf_event_open(); > addr=mmap(fd); > rdpmc() // GPFs due to rdpmc being disabled The problem is of course that exec() plays tricks with what is current->mm, only destroying the old mappings after having installed the new mm. Fix this confusion by passing along vma->vm_mm instead of relying on current->mm. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 1e0fb9ec679c ("perf: Add pmu callbacks to track event mapping and unmapping") Link: http://lkml.kernel.org/r/20170802173930.cstykcqefmqt7jau@hirez.programming.kicks-ass.net [ Minor cleanups. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-09crypto: x86/sha1 - Fix reads beyond the number of blocks passedmegha.dey@linux.intel.com
It was reported that the sha1 AVX2 function(sha1_transform_avx2) is reading ahead beyond its intended data, and causing a crash if the next block is beyond page boundary: http://marc.info/?l=linux-crypto-vger&m=149373371023377 This patch makes sure that there is no overflow for any buffer length. It passes the tests written by Jan Stancek that revealed this problem: https://github.com/jstancek/sha1-avx2-crash I have re-enabled sha1-avx2 by reverting commit b82ce24426a4071da9529d726057e4e642948667 Cc: <stable@vger.kernel.org> Fixes: b82ce24426a4 ("crypto: sha1-ssse3 - Disable avx2") Originally-by: Ilya Albrekht <ilya.albrekht@intel.com> Tested-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - Yet another race with VM destruction plugged - A set of small vgic fixes x86: - Preserve pending INIT - RCU fixes in paravirtual async pf, VM teardown, and VMXOFF emulation - nVMX interrupt injection and dirty tracking fixes - initialize to make UBSAN happy" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: arm/arm64: vgic: Use READ_ONCE fo cmpxchg KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit" KVM: nVMX: mark vmcs12 pages dirty on L2 exit kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown KVM: nVMX: do not pin the VMCS12 KVM: avoid using rcu_dereference_protected KVM: X86: init irq->level in kvm_pv_kick_cpu_op KVM: X86: Fix loss of pending INIT due to race KVM: async_pf: make rcu irq exit if not triggered from idle task KVM: nVMX: fixes to nested virt interrupt injection KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12 KVM: arm/arm64: Handle hva aging while destroying the vm KVM: arm/arm64: PMU: Fix overflow interrupt injection KVM: arm/arm64: Fix bug in advertising KVM_CAP_MSI_DEVID capability
2017-08-04Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "The recent irq core changes unearthed API abuse in the HPET code, which manifested itself in a suspend/resume regression. The fix replaces the cruft with the proper function calls and cures the regression" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hpet: Cure interface abuse in the resume path
2017-08-03Merge branches 'pm-cpufreq-x86', 'pm-cpufreq-docs' and 'intel_pstate'Rafael J. Wysocki
* pm-cpufreq-x86: cpufreq: x86: Make scaling_cur_freq behave more as expected * pm-cpufreq-docs: cpufreq: docs: Add missing cpuinfo_cur_freq description * intel_pstate: cpufreq: intel_pstate: Drop ->get from intel_pstate structure
2017-08-03KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"Wanpeng Li
------------[ cut here ]------------ WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7 RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] Call Trace: vmx_check_nested_events+0x131/0x1f0 [kvm_intel] ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm] ? vmx_vcpu_load+0x1be/0x220 [kvm_intel] ? kvm_arch_vcpu_load+0x62/0x230 [kvm] kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x8f/0x750 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which means that tells the kernel to not make use of any IOAPICs that may be present in the system. Actually external_intr variable in nested_vmx_vmexit() is the req_int_win variable passed from vcpu_enter_guest() which means that the L0's userspace requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) && L0's userspace reqeusts an irq window) is true, so there is no interrupt which L1 requires to inject to L2, we should not attempt to emualte "Acknowledge interrupt on exit" for the irq window requirement in this scenario. This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit" if there is no L1 requirement to inject an interrupt to L2. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> [Added code comment to make it obvious that the behavior is not correct. We should do a userspace exit with open interrupt window instead of the nested VM exit. This patch still improves the behavior, so it was accepted as a (temporary) workaround.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02KVM: nVMX: mark vmcs12 pages dirty on L2 exitDavid Matlack
The host physical addresses of L1's Virtual APIC Page and Posted Interrupt descriptor are loaded into the VMCS02. The CPU may write to these pages via their host physical address while L2 is running, bypassing address-translation-based dirty tracking (e.g. EPT write protection). Mark them dirty on every exit from L2 to prevent them from getting out of sync with dirty tracking. Also mark the virtual APIC page and the posted interrupt descriptor dirty when KVM is virtualizing posted interrupt processing. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardownDavid Matlack
According to the Intel SDM, software cannot rely on the current VMCS to be coherent after a VMXOFF or shutdown. So this is a valid way to handle VMCS12 flushes. 24.11.1 Software Use of Virtual-Machine Control Structures ... If a logical processor leaves VMX operation, any VMCSs active on that logical processor may be corrupted (see below). To prevent such corruption of a VMCS that may be used either after a return to VMX operation or on another logical processor, software should execute VMCLEAR for that VMCS before executing the VMXOFF instruction or removing power from the processor (e.g., as part of a transition to the S3 and S4 power states). ... This fixes a "suspicious rcu_dereference_check() usage!" warning during kvm_vm_release() because nested_release_vmcs12() calls kvm_vcpu_write_guest_page() without holding kvm->srcu. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02KVM: nVMX: do not pin the VMCS12Paolo Bonzini
Since the current implementation of VMCS12 does a memcpy in and out of guest memory, we do not need current_vmcs12 and current_vmcs12_page anymore. current_vmptr is enough to read and write the VMCS12. And David Matlack noted: This patch also fixes dirty tracking (memslot->dirty_bitmap) of the VMCS12 page by using kvm_write_guest. nested_release_page() only marks the struct page dirty. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Added David Matlack's note and nested_release_page_clean() fix.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02KVM: X86: init irq->level in kvm_pv_kick_cpu_opLongpeng(Mike)
'lapic_irq' is a local variable and its 'level' field isn't initialized, so 'level' is random, it doesn't matter but makes UBSAN unhappy: UBSAN: Undefined behaviour in .../lapic.c:... load of value 10 is not a valid value for type '_Bool' ... Call Trace: [<ffffffff81f030b6>] dump_stack+0x1e/0x20 [<ffffffff81f03173>] ubsan_epilogue+0x12/0x55 [<ffffffff81f03b96>] __ubsan_handle_load_invalid_value+0x118/0x162 [<ffffffffa1575173>] kvm_apic_set_irq+0xc3/0xf0 [kvm] [<ffffffffa1575b20>] kvm_irq_delivery_to_apic_fast+0x450/0x910 [kvm] [<ffffffffa15858ea>] kvm_irq_delivery_to_apic+0xfa/0x7a0 [kvm] [<ffffffffa1517f4e>] kvm_emulate_hypercall+0x62e/0x760 [kvm] [<ffffffffa113141a>] handle_vmcall+0x1a/0x30 [kvm_intel] [<ffffffffa114e592>] vmx_handle_exit+0x7a2/0x1fa0 [kvm_intel] ... Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02KVM: X86: Fix loss of pending INIT due to raceWanpeng Li
When SMP VM start, AP may lost INIT because of receiving INIT between kvm_vcpu_ioctl_x86_get/set_vcpu_events. vcpu 0 vcpu 1 kvm_vcpu_ioctl_x86_get_vcpu_events events->smi.latched_init = 0 send INIT to vcpu1 set vcpu1's pending_events kvm_vcpu_ioctl_x86_set_vcpu_events if (events->smi.latched_init == 0) clear INIT in pending_events This patch fixes it by just update SMM related flags if we are in SMM. Thanks Peng Hao for the report and original commit message. Reported-by: Peng Hao <peng.hao2@zte.com.cn> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-01KVM: async_pf: make rcu irq exit if not triggered from idle taskWanpeng Li
WARNING: CPU: 5 PID: 1242 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x207/0x6b0 CPU: 5 PID: 1242 Comm: unity-settings- Not tainted 4.13.0-rc2+ #1 RIP: 0010:rcu_note_context_switch+0x207/0x6b0 Call Trace: __schedule+0xda/0xba0 ? kvm_async_pf_task_wait+0x1b2/0x270 schedule+0x40/0x90 kvm_async_pf_task_wait+0x1cc/0x270 ? prepare_to_swait+0x22/0x70 do_async_page_fault+0x77/0xb0 ? do_async_page_fault+0x77/0xb0 async_page_fault+0x28/0x30 RIP: 0010:__d_lookup_rcu+0x90/0x1e0 I encounter this when trying to stress the async page fault in L1 guest w/ L2 guests running. Commit 9b132fbe5419 (Add rcu user eqs exception hooks for async page fault) adds rcu_irq_enter/exit() to kvm_async_pf_task_wait() to exit cpu idle eqs when needed, to protect the code that needs use rcu. However, we need to call the pair even if the function calls schedule(), as seen from the above backtrace. This patch fixes it by informing the RCU subsystem exit/enter the irq towards/away from idle for both n.halted and !n.halted. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-01KVM: nVMX: fixes to nested virt interrupt injectionPaolo Bonzini
There are three issues in nested_vmx_check_exception: 1) it is not taking PFEC_MATCH/PFEC_MASK into account, as reported by Wanpeng Li; 2) it should rebuild the interruption info and exit qualification fields from scratch, as reported by Jim Mattson, because the values from the L2->L0 vmexit may be invalid (e.g. if an emulated instruction causes a page fault, the EPT misconfig's exit qualification is incorrect). 3) CR2 and DR6 should not be written for exception intercept vmexits (CR2 only for AMD). This patch fixes the first two and adds a comment about the last, outlining the fix. Cc: Jim Mattson <jmattson@google.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-01KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12Paolo Bonzini
Do this in the caller of nested_vmx_vmexit instead. nested_vmx_check_exception was doing a vmwrite to the vmcs02's VM_EXIT_INTR_ERROR_CODE field, so that prepare_vmcs12 would move the field to vmcs12->vm_exit_intr_error_code. However that isn't possible on pre-Haswell machines. Moving the vmcs12 write to the callers fixes it. Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Changed nested_vmx_reflect_vmexit() return type to (int)1 from (bool)1, thanks to fengguang.wu@intel.com] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-01x86/hpet: Cure interface abuse in the resume pathThomas Gleixner
The HPET resume path abuses irq_domain_[de]activate_irq() to restore the MSI message in the HPET chip for the boot CPU on resume and it relies on an implementation detail of the interrupt core code, which magically makes the HPET unmask call invoked via a irq_disable/enable pair. This worked as long as the irq code did unconditionally invoke the unmask() callback. With the recent changes which keep track of the masked state to avoid expensive hardware access, this does not longer work. As a consequence the HPET timer interrupts are not unmasked which breaks resume as the boot CPU waits forever that a timer interrupt arrives. Make the restore of the MSI message explicit and invoke the unmask() function directly. While at it get rid of the pointless affinity setting as nothing can change the affinity of the interrupt and the vector across suspend/resume. The restore of the MSI message reestablishes the previous affinity setting which is the correct one. Fixes: bf22ff45bed6 ("genirq: Avoid unnecessary low level irq function calls") Reported-and-tested-by: Tomi Sarvela <tomi.p.sarvela@intel.com> Reported-by: Martin Peres <martin.peres@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: jeffy.chen@rock-chips.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707312158590.2287@nanos
2017-07-30Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of x86 fixes: - prevent the kernel from using the EFI reboot method when EFI is disabled. - two patches addressing clang issues" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Disable the address-of-packed-member compiler warning x86/efi: Fix reboot_mode when EFI runtime services are disabled x86/boot: #undef memcpy() et al in string.c
2017-07-30Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A couple of fixes for performance counters and kprobes: - a series of small patches which make the uncore performance counters on Skylake server systems work correctly - add a missing instruction slot release to the failure path of kprobes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes/x86: Release insn_slot in failure path perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs perf/x86/intel/uncore: Fix SKX CHA event extra regs perf/x86/intel/uncore: Remove invalid Skylake server CHA filter field perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask perf/x86/intel/uncore: Fix Skylake server PCU PMU event format perf/x86/intel/uncore: Fix Skylake UPI PMU event masks
2017-07-30cpufreq: x86: Make scaling_cur_freq behave more as expectedRafael J. Wysocki
After commit f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute in sysfs only behaves as expected on x86 with APERF/MPERF registers available when it is read from at least twice in a row. The value returned by the first read may not be meaningful, because the computations in there use cached values from the previous iteration of aperfmperf_snapshot_khz() which may be stale. To prevent that from happening, modify arch_freq_get_on_cpu() to call aperfmperf_snapshot_khz() twice, with a short delay between these calls, if the previous invocation of aperfmperf_snapshot_khz() was too far back in the past (specifically, more that 1s ago). Also, as pointed out by Doug Smythies, aperf_delta is limited now and the multiplication of it by cpu_khz won't overflow, so simplify the s->khz computations too. Fixes: f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF" Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-28x86/boot: Disable the address-of-packed-member compiler warningMatthias Kaehlcke
The clang warning 'address-of-packed-member' is disabled for the general kernel code, also disable it for the x86 boot code. This suppresses a bunch of warnings like this when building with clang: ./arch/x86/include/asm/processor.h:535:30: warning: taking address of packed member 'sp0' of class or structure 'x86_hw_tss' may result in an unaligned pointer value [-Waddress-of-packed-member] return this_cpu_read_stable(cpu_tss.x86_tss.sp0); ^~~~~~~~~~~~~~~~~~~ ./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro 'this_cpu_read_stable' #define this_cpu_read_stable(var) percpu_stable_op("mov", var) ^~~ ./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro 'percpu_stable_op' : "p" (&(var))); ^~~ Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Cc: Doug Anderson <dianders@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-26KVM: LAPIC: Fix reentrancy issues with preempt notifiersWanpeng Li
Preempt can occur in the preemption timer expiration handler: CPU0 CPU1 preemption timer vmexit handle_preemption_timer(vCPU0) kvm_lapic_expired_hv_timer hv_timer_is_use == true sched_out sched_in kvm_arch_vcpu_load kvm_lapic_restart_hv_timer restart_apic_timer start_hv_timer already-expired timer or sw timer triggerd in the window start_sw_timer cancel_hv_timer /* back in kvm_lapic_expired_hv_timer */ cancel_hv_timer WARN_ON(!apic->lapic_timer.hv_timer_in_use); ==> Oops This can be reproduced if CONFIG_PREEMPT is enabled. ------------[ cut here ]------------ WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1563 kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm] CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G OE 4.13.0-rc2+ #16 RIP: 0010:kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm] Call Trace: handle_preemption_timer+0xe/0x20 [kvm_intel] vmx_handle_exit+0xb8/0xd70 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm] ? kvm_arch_vcpu_load+0x47/0x230 [kvm] ? kvm_arch_vcpu_load+0x62/0x230 [kvm] kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x81/0x220 entry_SYSCALL64_slow_path+0x25/0x25 ------------[ cut here ]------------ WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1498 cancel_hv_timer.isra.40+0x4f/0x60 [kvm] CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G W OE 4.13.0-rc2+ #16 RIP: 0010:cancel_hv_timer.isra.40+0x4f/0x60 [kvm] Call Trace: kvm_lapic_expired_hv_timer+0x3e/0xb0 [kvm] handle_preemption_timer+0xe/0x20 [kvm_intel] vmx_handle_exit+0xb8/0xd70 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm] ? kvm_arch_vcpu_load+0x47/0x230 [kvm] ? kvm_arch_vcpu_load+0x62/0x230 [kvm] kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x81/0x220 entry_SYSCALL64_slow_path+0x25/0x25 This patch fixes it by making the caller of cancel_hv_timer, start_hv_timer and start_sw_timer be in preemption-disabled regions, which trivially avoid any reentrancy issue with preempt notifier. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> [Add more WARNs. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-26KVM: nVMX: Fix loss of L2's NMI blocking stateWanpeng Li
Run kvm-unit-tests/eventinj.flat in L1 w/ ept=0 on both L0 and L1: Before NMI IRET test Sending NMI to self NMI isr running stack 0x461000 Sending nested NMI to self After nested NMI to self Nested NMI isr running rip=40038e After iret After NMI to self FAIL: NMI Commit 4c4a6f790ee862 (KVM: nVMX: track NMI blocking state separately for each VMCS) tracks NMI blocking state separately for vmcs01 and vmcs02. However it is not enough: - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault on IRET, so the L2 can generate #PF which can be intercepted by L0. - L0 walks L1's guest page table and sees the mapping is invalid, it resumes the L1 guest and injects the #PF into L1. At this point the vmcs02 has nmi_known_unmasked=true. - L1 sets set bit 3 (blocking by NMI) in the interruptibility-state field of vmcs12 (and fixes the shadow page table) before resuming L2 guest. - L1 executes VMRESUME to resume L2, causing a vmexit to L0 - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the interruptibility-state field of vmcs02, but nmi_known_unmasked is still true. - L2 immediately exits to L0 with another page fault, because L0 still has not updated the NGVA->HPA page tables. However, nmi_known_unmasked is true so vmx_recover_nmi_blocking does not do anything. The fix is to update nmi_known_unmasked when preparing vmcs02 from vmcs12. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-26KVM: nVMX: Fix posted intr delivery when vcpu is in guest modeWincy Van
The PI vector for L0 and L1 must be different. If dest vcpu0 is in guest mode while vcpu1 is delivering a non-nested PI to vcpu0, there wont't be any vmexit so that the non-nested interrupt will be delayed. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-26x86: irq: Define a global vector for nested posted interruptsWincy Van
We are using the same vector for nested/non-nested posted interrupts delivery, this may cause interrupts latency in L1 since we can't kick the L2 vcpu out of vmx-nonroot mode. This patch introduces a new vector which is only for nested posted interrupts to solve the problems above. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-26KVM: x86: do mask out upper bits of PAE CR3Paolo Bonzini
This reverts the change of commit f85c758dbee54cc3612a6e873ef7cecdb66ebee5, as the behavior it modified was intended. The VM is running in 32-bit PAE mode, and Table 4-7 of the Intel manual says: Table 4-7. Use of CR3 with PAE Paging Bit Position(s) Contents 4:0 Ignored 31:5 Physical address of the 32-Byte aligned page-directory-pointer table used for linear-address translation 63:32 Ignored (these bits exist only on processors supporting the Intel-64 architecture) To placate the static checker, write the mask explicitly as an unsigned long constant instead of using a 32-bit unsigned constant. Cc: Dan Carpenter <dan.carpenter@oracle.com> Fixes: f85c758dbee54cc3612a6e873ef7cecdb66ebee5 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-25x86/efi: Fix reboot_mode when EFI runtime services are disabledStefan Assmann
When EFI runtime services are disabled, for example by the "noefi" kernel cmdline parameter, the reboot_type could still be set to BOOT_EFI causing reboot to fail. Fix this by checking if EFI runtime services are enabled. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170724122248.24006-1-sassmann@kpanic.de [ Fixed 'not disabled' double negation. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-25x86/boot: #undef memcpy() et al in string.cMichael Davidson
undef memcpy() and friends in boot/string.c so that the functions defined here will have the correct names, otherwise we end up up trying to redefine __builtin_memcpy() etc. Surprisingly, GCC allows this (and, helpfully, discards the __builtin_ prefix from the function name when compiling it), but clang does not. Adding these #undef's appears to preserve what I assume was the original intent of the code. Signed-off-by: Michael Davidson <md@google.com> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bernhard.Rosenkranzer@linaro.org Cc: Greg Hackmann <ghackmann@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170724235155.79255-1-mka@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-24kprobes/x86: Release insn_slot in failure pathMasami Hiramatsu
The following commit: 003002e04ed3 ("kprobes: Fix arch_prepare_kprobe to handle copy insn failures") returns an error if the copying of the instruction, but does not release the allocated insn_slot. Clean up correctly. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 003002e04ed3 ("kprobes: Fix arch_prepare_kprobe to handle copy insn failures") Link: http://lkml.kernel.org/r/150064834183.6172.11694375818447664416.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>