Age | Commit message (Collapse) | Author |
|
HV_X64_MSR_VP_RUNTIME msr used by guest to get
"the time the virtual processor consumes running guest code,
and the time the associated logical processor spends running
hypervisor code on behalf of that guest."
Calculation of this time is performed by task_cputime_adjusted()
for vcpu task.
Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Insert Hyper-V HV_X64_MSR_VP_INDEX into msr's emulated list,
so QEMU can set Hyper-V features cpuid HV_X64_MSR_VP_INDEX_AVAILABLE
bit correctly. KVM emulation part is in place already.
Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
HV_X64_MSR_RESET msr is used by Hyper-V based Windows guest
to reset guest VM by hypervisor.
Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In order to enable userspace PIC support, the userspace PIC needs to
be able to inject local interrupts even when the APICs are in the
kernel.
KVM_INTERRUPT now supports sending local interrupts to an APIC when
APICs are in the kernel.
The ready_for_interrupt_request flag is now only set when the CPU/APIC
will immediately accept and inject an interrupt (i.e. APIC has not
masked the PIC).
When the PIC wishes to initiate an INTA cycle with, say, CPU0, it
kicks CPU0 out of the guest, and renedezvous with CPU0 once it arrives
in userspace.
When the CPU/APIC unmasks the PIC, a KVM_EXIT_IRQ_WINDOW_OPEN is
triggered, so that userspace has a chance to inject a PIC interrupt
if it had been pending.
Overall, this design can lead to a small number of spurious userspace
renedezvous. In particular, whenever the PIC transistions from low to
high while it is masked and whenever the PIC becomes unmasked while
it is low.
Note: this does not buffer more than one local interrupt in the
kernel, so the VMM needs to enter the guest in order to complete
interrupt injection before injecting an additional interrupt.
Compiles for x86.
Can pass the KVM Unit Tests.
Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In order to support a userspace IOAPIC interacting with an in kernel
APIC, the EOI exit bitmaps need to be configurable.
If the IOAPIC is in userspace (i.e. the irqchip has been split), the
EOI exit bitmaps will be set whenever the GSI Routes are configured.
In particular, for the low MSI routes are reservable for userspace
IOAPICs. For these MSI routes, the EOI Exit bit corresponding to the
destination vector of the route will be set for the destination VCPU.
The intention is for the userspace IOAPICs to use the reservable MSI
routes to inject interrupts into the guest.
This is a slight abuse of the notion of an MSI Route, given that MSIs
classically bypass the IOAPIC. It might be worthwhile to add an
additional route type to improve clarity.
Compile tested for Intel x86.
Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Adds KVM_EXIT_IOAPIC_EOI which allows the kernel to EOI
level-triggered IOAPIC interrupts.
Uses a per VCPU exit bitmap to decide whether or not the IOAPIC needs
to be informed (which is identical to the EOI_EXIT_BITMAP field used
by modern x86 processors, but can also be used to elide kvm IOAPIC EOI
exits on older processors).
[Note: A prototype using ResampleFDs found that decoupling the EOI
from the VCPU's thread made it possible for the VCPU to not see a
recent EOI after reentering the guest. This does not match real
hardware.]
Compile tested for Intel x86.
Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
First patch in a series which enables the relocation of the
PIC/IOAPIC to userspace.
Adds capability KVM_CAP_SPLIT_IRQCHIP;
KVM_CAP_SPLIT_IRQCHIP enables the construction of LAPICs without the
rest of the irqchip.
Compile tested for x86.
Signed-off-by: Steve Rutherford <srutherford@google.com>
Suggested-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The interrupt window is currently checked twice, once in vmx.c/svm.c and
once in dm_request_for_irq_injection. The only difference is the extra
check for kvm_arch_interrupt_allowed in dm_request_for_irq_injection,
and the different return value (EINTR/KVM_EXIT_INTR for vmx.c/svm.c vs.
0/KVM_EXIT_IRQ_WINDOW_OPEN for dm_request_for_irq_injection).
However, dm_request_for_irq_injection is basically dead code! Revive it
by removing the checks in vmx.c and svm.c's vmexit handlers, and
fixing the returned values for the dm_request_for_irq_injection case.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Avoid pointer chasing and memory barriers, and simplify the code
when split irqchip (LAPIC in kernel, IOAPIC/PIC in userspace)
is introduced.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
We can reuse the algorithm that computes the EOI exit bitmap to figure
out which vectors are handled by the IOAPIC. The only difference
between the two is for edge-triggered interrupts other than IRQ8
that have no notifiers active; however, the IOAPIC does not have to
do anything special for these interrupts anyway.
This again limits the interactions between the IOAPIC and the LAPIC,
making it easier to move the former to userspace.
Inspired by a patch from Steve Rutherford.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Do not compute TMR in advance. Instead, set the TMR just before the interrupt
is accepted into the IRR. This limits the coupling between IOAPIC and LAPIC.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
system MSR"
Shifting pvclock_vcpu_time_info.system_time on write to KVM system time
MSR is a change of ABI. Probably only 2.6.16 based SLES 10 breaks due
to its custom enhancements to kvmclock, but KVM never declared the MSR
only for one-shot initialization. (Doc says that only one write is
needed.)
This reverts commit b7e60c5aedd2b63f16ef06fde4f81ca032211bc5.
And adds a note to the definition of PVCLOCK_COUNTS_FROM_ZERO.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
These have roughly the same purpose as the SMRR, which we do not need
to implement in KVM. However, Linux accesses MSR_K8_TSEG_ADDR at
boot, which causes problems when running a Xen dom0 under KVM.
Just return 0, meaning that processor protection of SMRAM is not
in effect.
Reported-by: M A Young <m.a.young@durham.ac.uk>
Cc: stable@vger.kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This new statistic can help diagnosing VCPUs that, for any reason,
trigger bad behavior of halt_poll_ns autotuning.
For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
10+20+40+80+160+320+480 = 1110 microseconds out of every
479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
is consuming about 30% more CPU than it would use without
polling. This would show as an abnormally high number of
attempted polling compared to the successful polls.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com<
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Pull more kvm updates from Paolo Bonzini:
"ARM:
- Full debug support for arm64
- Active state switching for timer interrupts
- Lazy FP/SIMD save/restore for arm64
- Generic ARMv8 target
PPC:
- Book3S: A few bug fixes
- Book3S: Allow micro-threading on POWER8
x86:
- Compiler warnings
Generic:
- Adaptive polling for guest halt"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (49 commits)
kvm: irqchip: fix memory leak
kvm: move new trace event outside #ifdef CONFIG_KVM_ASYNC_PF
KVM: trace kvm_halt_poll_ns grow/shrink
KVM: dynamic halt-polling
KVM: make halt_poll_ns per-vCPU
Silence compiler warning in arch/x86/kvm/emulate.c
kvm: compile process_smi_save_seg_64() only for x86_64
KVM: x86: avoid uninitialized variable warning
KVM: PPC: Book3S: Fix typo in top comment about locking
KVM: PPC: Book3S: Fix size of the PSPB register
KVM: PPC: Book3S HV: Exit on H_DOORBELL if HOST_IPI is set
KVM: PPC: Book3S HV: Fix race in starting secondary threads
KVM: PPC: Book3S: correct width in XER handling
KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation
KVM: PPC: Book3S HV: Fix preempted vcore list locking
KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
KVM: PPC: Book3S HV: Fix bug in dirty page tracking
KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE
KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
KVM: PPC: Book3S HV: Make use of unused threads when running guests
...
|
|
The process_smi_save_seg_64() function called only in the
process_smi_save_state_64() if the CONFIG_X86_64 is set. This
patch adds #ifdef CONFIG_X86_64 around process_smi_save_seg_64()
to prevent following warning message:
arch/x86/kvm/x86.c:5946:13: warning: ‘process_smi_save_seg_64’ defined but not used [-Wunused-function]
static void process_smi_save_seg_64(struct kvm_vcpu *vcpu, char *buf, int n)
^
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
"The biggest changes in this cycle were:
- Revamp, simplify (and in some cases fix) Time Stamp Counter (TSC)
primitives. (Andy Lutomirski)
- Add new, comprehensible entry and exit handlers written in C.
(Andy Lutomirski)
- vm86 mode cleanups and fixes. (Brian Gerst)
- 32-bit compat code cleanups. (Brian Gerst)
The amount of simplification in low level assembly code is already
palpable:
arch/x86/entry/entry_32.S | 130 +----
arch/x86/entry/entry_64.S | 197 ++-----
but more simplifications are planned.
There's also the usual laudry mix of low level changes - see the
changelog for details"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (83 commits)
x86/asm: Drop repeated macro of X86_EFLAGS_AC definition
x86/asm/msr: Make wrmsrl() a function
x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer
x86/asm: Add MONITORX/MWAITX instruction support
x86/traps: Weaken context tracking entry assertions
x86/asm/tsc: Add rdtscll() merge helper
selftests/x86: Add syscall_nt selftest
selftests/x86: Disable sigreturn_64
x86/vdso: Emit a GNU hash
x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masks
x86/entry/32: Migrate to C exit path
x86/entry/32: Remove 32-bit syscall audit optimizations
x86/vm86: Rename vm86->v86flags and v86mask
x86/vm86: Rename vm86->vm86_info to user_vm86
x86/vm86: Clean up vm86.h includes
x86/vm86: Move the vm86 IRQ definitions to vm86.h
x86/vm86: Use the normal pt_regs area for vm86
x86/vm86: Eliminate 'struct kernel_vm86_struct'
x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'
x86/vm86: Move vm86 fields out of 'thread_struct'
...
|
|
Pull kvm updates from Paolo Bonzini:
"A very small release for x86 and s390 KVM.
- s390: timekeeping changes, cleanups and fixes
- x86: support for Hyper-V MSRs to report crashes, and a bunch of
cleanups.
One interesting feature that was planned for 4.3 (emulating the local
APIC in kernel while keeping the IOAPIC and 8254 in userspace) had to
be delayed because Intel complained about my reading of the manual"
* tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
x86/kvm: Rename VMX's segment access rights defines
KVM: x86/vPMU: Fix unnecessary signed extension for AMD PERFCTRn
kvm: x86: Fix error handling in the function kvm_lapic_sync_from_vapic
KVM: s390: Fix assumption that kvm_set_irq_routing is always run successfully
KVM: VMX: drop ept misconfig check
KVM: MMU: fully check zero bits for sptes
KVM: MMU: introduce is_shadow_zero_bits_set()
KVM: MMU: introduce the framework to check zero bits on sptes
KVM: MMU: split reset_rsvds_bits_mask_ept
KVM: MMU: split reset_rsvds_bits_mask
KVM: MMU: introduce rsvd_bits_validate
KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c
KVM: MMU: fix validation of mmio page fault
KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON
KVM: s390: host STP toleration for VMs
KVM: x86: clean/fix memory barriers in irqchip_in_kernel
KVM: document memory barriers for kvm->vcpus/kvm->online_vcpus
KVM: x86: remove unnecessary memory barriers for shared MSRs
KVM: move code related to KVM_SET_BOOT_CPU_ID to x86
KVM: s390: log capability enablement and vm attribute changes
...
|
|
Conflicts:
arch/x86/entry/entry_64_compat.S
arch/x86/math-emu/get_address.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When kvm_set_msr_common() handles a guest's write to
MSR_IA32_TSC_ADJUST, it will calcuate an adjustment based on the data
written by guest and then use it to adjust TSC offset by calling a
call-back adjust_tsc_offset(). The 3rd parameter of adjust_tsc_offset()
indicates whether the adjustment is in host TSC cycles or in guest TSC
cycles. If SVM TSC scaling is enabled, adjust_tsc_offset()
[i.e. svm_adjust_tsc_offset()] will first scale the adjustment;
otherwise, it will just use the unscaled one. As the MSR write here
comes from the guest, the adjustment is in guest TSC cycles. However,
the current kvm_set_msr_common() uses it as a value in host TSC
cycles (by using true as the 3rd parameter of adjust_tsc_offset()),
which can result in an incorrect adjustment of TSC offset if SVM TSC
scaling is enabled. This patch fixes this problem.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: stable@vger.linux.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The recent BlackHat 2015 presentation "The Memory Sinkhole"
mentions that the IDT limit is zeroed on entry to SMM.
This is not documented, and must have changed some time after 2010
(see http://www.ssi.gouv.fr/uploads/IMG/pdf/IT_Defense_2010_final.pdf).
KVM was not doing it, but the fix is easy.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
These two fields, rsvd_bits_mask and bad_mt_xwr, in "struct kvm_mmu" are
used to check if reserved bits set on guest ptes, move them to a data
struct so that the approach can be applied to check host shadow page
table entries as well
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The memory barriers are trying to protect against concurrent RCU-based
interrupt injection, but the IRQ routing table is not valid at the time
kvm->arch.vpic is written. Fix this by writing kvm->arch.vpic last.
kvm_destroy_pic then need not set kvm->arch.vpic to NULL; modify it
to take a struct kvm_pic* and reuse it if the IOAPIC creation fails.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
There is no smp_rmb matching the smp_wmb. shared_msr_update is called from
hardware_enable, which in turn is called via on_each_cpu. on_each_cpu
and must imply a read memory barrier (on x86 the rmb is achieved simply
through asm volatile in native_apic_mem_write).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This is another remnant of ia64 support.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Sending of notification is done by exiting vcpu to user space
if KVM_REQ_HV_CRASH is enabled for vcpu. At exit to user space
the kvm_run structure contains system_event with type
KVM_SYSTEM_EVENT_CRASH to notify about guest crash occurred.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Added kvm Hyper-V context hv crash variables as storage
of Hyper-V crash msrs.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This patch introduce Hyper-V related source code file - hyperv.c and
per vm and per vcpu hyperv context structures.
All Hyper-V MSR's and hypercall code moved into hyperv.c.
All Hyper-V kvm/vcpu fields moved into appropriate hyperv context
structures. Copyrights and authors information copied from x86.c
to hyperv.c.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
[ 68.196974] WARNING: CPU: 1 PID: 2140 at arch/x86/kvm/x86.c:3161 kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]()
[ 68.196975] Modules linked in: snd_hda_codec_hdmi i915 rfcomm bnep bluetooth i2c_algo_bit rfkill nfsd drm_kms_helper nfs_acl nfs drm lockd grace sunrpc fscache snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_dummy snd_seq_oss x86_pkg_temp_thermal snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi kvm snd_seq ghash_clmulni_intel fuse snd_timer aesni_intel parport_pc ablk_helper snd_seq_device cryptd ppdev snd lp parport lrw dcdbas gf128mul i2c_core glue_helper lpc_ich video shpchp mfd_core soundcore serio_raw acpi_cpufreq ext4 mbcache jbd2 sd_mod crc32c_intel ahci libahci libata e1000e ptp pps_core
[ 68.197005] CPU: 1 PID: 2140 Comm: qemu-system-x86 Not tainted 4.2.0-rc1+ #2
[ 68.197006] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
[ 68.197007] ffffffffa03b0657 ffff8800d984bca8 ffffffff815915a2 0000000000000000
[ 68.197009] 0000000000000000 ffff8800d984bce8 ffffffff81057c0a 00007ff6d0001000
[ 68.197010] 0000000000000002 ffff880211c1a000 0000000000000004 ffff8800ce0288c0
[ 68.197012] Call Trace:
[ 68.197017] [<ffffffff815915a2>] dump_stack+0x45/0x57
[ 68.197020] [<ffffffff81057c0a>] warn_slowpath_common+0x8a/0xc0
[ 68.197022] [<ffffffff81057cfa>] warn_slowpath_null+0x1a/0x20
[ 68.197029] [<ffffffffa037bed8>] kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]
[ 68.197035] [<ffffffffa037aede>] ? kvm_arch_vcpu_load+0x4e/0x1c0 [kvm]
[ 68.197040] [<ffffffffa03696a6>] kvm_vcpu_ioctl+0xc6/0x5c0 [kvm]
[ 68.197043] [<ffffffff811252d2>] ? perf_pmu_enable+0x22/0x30
[ 68.197044] [<ffffffff8112663e>] ? perf_event_context_sched_in+0x7e/0xb0
[ 68.197048] [<ffffffff811a6882>] do_vfs_ioctl+0x2c2/0x4a0
[ 68.197050] [<ffffffff8107bf33>] ? finish_task_switch+0x173/0x220
[ 68.197053] [<ffffffff8123307f>] ? selinux_file_ioctl+0x4f/0xd0
[ 68.197055] [<ffffffff8122cac3>] ? security_file_ioctl+0x43/0x60
[ 68.197057] [<ffffffff811a6ad9>] SyS_ioctl+0x79/0x90
[ 68.197060] [<ffffffff81597e57>] entry_SYSCALL_64_fastpath+0x12/0x6a
[ 68.197061] ---[ end trace 558a5ebf9445fc80 ]---
After commit (0c4109bec0 'x86/fpu/xstate: Fix up bad get_xsave_addr()
assumptions'), there is no assumption an xsave bit is present in the
hardware (pcntxt_mask) that it is always present in a given xsave buffer.
An enabled state to be present on 'pcntxt_mask', but *not* in 'xstate_bv'
could happen when the last 'xsave' did not request that this feature be
saved (unlikely) or because the "init optimization" caused it to not be
saved. This patch kill the assumption.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If there are no assigned devices, the guest PAT are not providing
any useful information and can be overridden to writeback; VMX
always does this because it has the "IPAT" bit in its extended
page table entries, but SVM does not have anything similar.
Hook into VFIO and legacy device assignment so that they
provide this information to KVM.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
fpu_activate is called outside of vcpu_load(), which means it should not
touch VMCS, but fpu_activate needs to. Avoid the call by moving it to a
point where we know that the guest needs eager FPU and VMCS is loaded.
This will get rid of the following trace
vmwrite error: reg 6800 value 0 (err 1)
[<ffffffff8162035b>] dump_stack+0x19/0x1b
[<ffffffffa046c701>] vmwrite_error+0x2c/0x2e [kvm_intel]
[<ffffffffa045f26f>] vmcs_writel+0x1f/0x30 [kvm_intel]
[<ffffffffa04617e5>] vmx_fpu_activate.part.61+0x45/0xb0 [kvm_intel]
[<ffffffffa0461865>] vmx_fpu_activate+0x15/0x20 [kvm_intel]
[<ffffffffa0560b91>] kvm_arch_vcpu_create+0x51/0x70 [kvm]
[<ffffffffa0548011>] kvm_vm_ioctl+0x1c1/0x760 [kvm]
[<ffffffff8118b55a>] ? handle_mm_fault+0x49a/0xec0
[<ffffffff811e47d5>] do_vfs_ioctl+0x2e5/0x4c0
[<ffffffff8127abbe>] ? file_has_perm+0xae/0xc0
[<ffffffff811e4a51>] SyS_ioctl+0xa1/0xc0
[<ffffffff81630949>] system_call_fastpath+0x16/0x1b
(Note: we also unconditionally activate FPU in vmx_vcpu_reset(), so the
removed code added nothing.)
Fixes: c447e76b4cab ("kvm/fpu: Enable eager restore kvm FPU for MPX")
Cc: <stable@vger.kernel.org>
Reported-by: Vlastimil Holer <vlastimil.holer@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
rdtsc_barrier(); rdtsc() is an unnecessary mouthful and requires
more thought than should be necessary. Add an rdtsc_ordered()
helper and replace the trivial call sites with it.
This should not change generated code. The duplication of the
fence asm is temporary.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/dddbf98a2af53312e9aa73a5a2b1622fe5d6f52b.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Now that there is no paravirt TSC, the "native" is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org
[ Ported it to v4.2-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The only caller was KVM's read_tsc(). The only difference
between vget_cycles() and native_read_tsc() was that
vget_cycles() returned zero instead of crashing on TSC-less
systems. KVM already checks vclock_mode() before calling that
function, so the extra check is unnecessary. Also, KVM
(host-side) requires the TSC to exist.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/20615df14ae2eb713ea7a5f5123c1dc4c7ca993d.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit 609e36d372ad ("KVM: x86: pass host_initiated to functions that
read MSRs") modified kvm_get_msr_common function to use msr_info->data
instead of data but missed one occurrence. Replace it and remove the
unused local variable.
Fixes: 609e36d372ad ("KVM: x86: pass host_initiated to functions that
read MSRs")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Pull first batch of KVM updates from Paolo Bonzini:
"The bulk of the changes here is for x86. And for once it's not for
silicon that no one owns: these are really new features for everyone.
Details:
- ARM:
several features are in progress but missed the 4.2 deadline.
So here is just a smattering of bug fixes, plus enabling the
VFIO integration.
- s390:
Some fixes/refactorings/optimizations, plus support for 2GB
pages.
- x86:
* host and guest support for marking kvmclock as a stable
scheduler clock.
* support for write combining.
* support for system management mode, needed for secure boot in
guests.
* a bunch of cleanups required for the above
* support for virtualized performance counters on AMD
* legacy PCI device assignment is deprecated and defaults to "n"
in Kconfig; VFIO replaces it
On top of this there are also bug fixes and eager FPU context
loading for FPU-heavy guests.
- Common code:
Support for multiple address spaces; for now it is used only for
x86 SMM but the s390 folks also have plans"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
KVM: s390: clear floating interrupt bitmap and parameters
KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs
KVM: x86/vPMU: Implement AMD vPMU code for KVM
KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch
KVM: x86/vPMU: introduce kvm_pmu_msr_idx_to_pmc
KVM: x86/vPMU: reorder PMU functions
KVM: x86/vPMU: whitespace and stylistic adjustments in PMU code
KVM: x86/vPMU: use the new macros to go between PMC, PMU and VCPU
KVM: x86/vPMU: introduce pmu.h header
KVM: x86/vPMU: rename a few PMU functions
KVM: MTRR: do not map huge page for non-consistent range
KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type
KVM: MTRR: introduce mtrr_for_each_mem_type
KVM: MTRR: introduce fixed_mtrr_addr_* functions
KVM: MTRR: sort variable MTRRs
KVM: MTRR: introduce var_mtrr_range
KVM: MTRR: introduce fixed_mtrr_segment table
KVM: MTRR: improve kvm_mtrr_get_guest_memory_type
KVM: MTRR: do not split 64 bits MSR content
KVM: MTRR: clean up mtrr default type
...
|
|
This patch enables AMD guest VM to access (R/W) PMU related MSRs, which
include PERFCTR[0..3] and EVNTSEL[0..3].
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This will be used for private function used by AMD- and Intel-specific
PMU implementations.
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Before introducing a pmu.h header for them, make the naming more
consistent.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Sort all valid variable MTRRs based on its base address, it will help us to
check a range to see if it's fully contained in variable MTRRs
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Fix list insertion sort, simplify var_mtrr_range_is_valid to just
test the V bit. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
vMTRR does not depend on any host MTRR feature and fixed MTRRs have always
been implemented, so drop this field
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
MSR_MTRRcap is a MTRR msr so move the handler to the common place, also
add some comments to make the hard code more readable
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
MTRR code locates in x86.c and mmu.c so that move them to a separate file to
make the organization more clearer and it will be the place where we fully
implement vMTRR
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently, CR0.CD is not checked when we virtualize memory cache type for
noncoherent_dma guests, this patch fixes it by :
- setting UC for all memory if CR0.CD = 1
- zapping all the last sptes in MMU if CR0.CD is changed
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
... and we're done. :)
Because SMBASE is usually relocated above 1M on modern chipsets, and
SMM handlers might indeed rely on 4G segment limits, we only expose it
if KVM is able to run the guest in big real mode. This includes any
of VMX+emulate_invalid_guest_state, VMX+unrestricted_guest, or SVM.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This is now very simple to do. The only interesting part is a simple
trick to find the right memslot in gfn_to_rmap, retrieving the address
space from the spte role word. The same trick is used in the auditing
code.
The comment on top of union kvm_mmu_page_role has been stale forever,
so remove it. Speaking of stale code, remove pad_for_nice_hex_output
too: it was splitting the "access" bitfield across two bytes and thus
had effectively turned into pad_for_ugly_hex_output.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This patch has no semantic change, but it prepares for the introduction
of a second address space for system management mode.
A new function x86_set_memory_region (and the "slots_lock taken"
counterpart __x86_set_memory_region) is introduced in order to
operate on all address spaces when adding or deleting private
memory slots.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
We need to hide SMRAM from guests not running in SMM. Therefore,
all uses of kvm_read_guest* and kvm_write_guest* must be changed to
check whether the VCPU is in system management mode and use a
different set of memslots. Switch from kvm_* to the newly-introduced
kvm_vcpu_*, which call into kvm_arch_vcpu_memslots_id.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|