Age | Commit message | Author
2018-08-06 | KVM: vmx: skip VMWRITE of HOST_{FS,GS}_SEL when possible | Sean Christopherson
On a 64-bit host, FS.sel and GS.sel are all but guaranteed to be 0, which in turn means they'll rarely change. Skip the VMWRITE for the associated VMCS fields when loading host state if the selector hasn't changed since the last VMWRITE. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
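A minimal C sketch of the skip-if-unchanged idea, for illustration only. The struct host_sel_cache, the field IDs and the counting vmcs_write16() stub are hypothetical stand-ins (no real VMWRITE is executed), and the cache is assumed to start out coherent with the VMCS, e.g. zeroed together with it:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-VMCS cache of the last selectors written to the VMCS. */
struct host_sel_cache {
	uint16_t fs_sel;
	uint16_t gs_sel;
};

enum host_sel_field { HOST_FS_SEL_ID, HOST_GS_SEL_ID }; /* illustrative IDs */

/* Stand-in for VMWRITE: it only counts calls so the savings are visible. */
static unsigned int vmwrite_count;

static void vmcs_write16(enum host_sel_field field, uint16_t value)
{
	(void)field;
	(void)value;
	vmwrite_count++;
}

/* Write each selector only if it differs from the cached (last written) value. */
static void load_host_selectors(struct host_sel_cache *cache,
				uint16_t fs_sel, uint16_t gs_sel)
{
	assert(cache);
	if (fs_sel != cache->fs_sel) {
		vmcs_write16(HOST_FS_SEL_ID, fs_sel);
		cache->fs_sel = fs_sel;
	}
	if (gs_sel != cache->gs_sel) {
		vmcs_write16(HOST_GS_SEL_ID, gs_sel);
		cache->gs_sel = gs_sel;
	}
}
```

With a zeroed cache, the common 64-bit case of FS.sel = GS.sel = 0 never reaches vmcs_write16(); a write happens only when a selector actually changes.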
2018-08-06 | KVM: vmx: always initialize HOST_{FS,GS}_BASE to zero during setup | Sean Christopherson
The HOST_{FS,GS}_BASE fields are guaranteed to be written prior to VMENTER, by way of vmx_prepare_switch_to_guest(). Initialize the fields to zero for 64-bit kernels instead of pulling the base values from their respective MSRs. In addition to eliminating two RDMSRs, vmx_prepare_switch_to_guest() can safely assume the initial value of the fields is zero in all cases. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: vmx: move struct host_state usage to struct loaded_vmcs | Sean Christopherson
Make host_state a property of a loaded_vmcs so that it can be used as a cache of the VMCS fields, e.g. to lazily VMWRITE the corresponding VMCS field. Treating host_state as a cache does not work if it's not VMCS-specific, as the cache would become incoherent when switching between vmcs01 and vmcs02. Move vmcs_host_cr3 and vmcs_host_cr4 into host_state. Explicitly zero out host_state when allocating a new VMCS for a loaded_vmcs. Unlike the pre-existing vmcs_host_cr{3,4} usage, the segment information is not guaranteed to be (re)initialized when running a new nested VMCS, e.g. HOST_FS_BASE is not written in vmx_set_constant_host_state(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: vmx: compute need to reload FS/GS/LDT on demand | Sean Christopherson
Remove fs_reload_needed and gs_ldt_reload_needed from host_state and instead compute whether we need to reload various state at the time we actually do the reload. The state that is tracked by the *_reload_needed variables is not any more volatile than the trackers themselves. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: nVMX: remove a misleading comment regarding vmcs02 fields | Sean Christopherson
prepare_vmcs02() has an odd comment that says certain fields are "not in vmcs02". AFAICT the intent of the comment is to document that various VMCS fields are not handled by prepare_vmcs02(), e.g. HOST_{FS,GS}_{BASE,SELECTOR}. While technically true, the comment is misleading, e.g. it can lead the reader to think that KVM never writes those fields to vmcs02. Remove the comment altogether, as the handling of FS and GS is not specific to nested VMX, and GUEST_PML_INDEX has been written by prepare_vmcs02() since commit 4e59516a12a6 ("kvm: vmx: ensure VMCS is current while enabling PML"). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state() | Sean Christopherson
Now that the vmx_load_host_state() wrapper is gone, i.e. the only time we call the core functions is when we're actually about to switch between guest and host, rename the functions that handle lazy state switching to vmx_prepare_switch_to_{guest,host}() to better document the full extent of their functionality. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: vmx: add dedicated utility to access guest's kernel_gs_base | Sean Christopherson
When lazy save/restore of MSR_KERNEL_GS_BASE was introduced[1], the MSR was intercepted in all modes and was only restored for the host when the guest is in 64-bit mode. So at the time, going through the full host restore prior to accessing MSR_KERNEL_GS_BASE was necessary to load host state and was not a significant waste of cycles. Later, MSR_KERNEL_GS_BASE interception was disabled for a 64-bit guest[2], and then unconditionally saved/restored for the host[3]. As a result, loading full host state is overkill for accesses to MSR_KERNEL_GS_BASE, and completely unnecessary when the guest is not in 64-bit mode. Add a dedicated utility to read/write the guest's MSR_KERNEL_GS_BASE (outside of the save/restore flow) to minimize the overhead incurred when accessing the MSR. When setting EFER, only decache the MSR if the new EFER will disable long mode. Removing out-of-band usage of vmx_load_host_state() also eliminates, or at least reduces, potential corner cases in its usage, which in turn will (hopefully) make it easier to reason about future changes to the save/restore flow, e.g. optimization of saving host state. [1] commit 44ea2b1758d8 ("KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area") [2] commit 5897297bc228 ("KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE") [3] commit c8770e7ba63b ("KVM: VMX: Fix host userspace gsbase corruption") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: vmx: track host_state.loaded using a loaded_vmcs pointer | Sean Christopherson
Using 'struct loaded_vmcs*' to track whether the CPU registers contain host or guest state kills two birds with one stone. 1. The (effective) boolean host_state.loaded is poorly named. It does not track whether or not host state is loaded into the CPU registers (which most readers would expect), but rather tracks if host state has been saved AND guest state is loaded. 2. Using a loaded_vmcs pointer provides a more robust framework for the optimized guest/host state switching, especially when considering per-VMCS enhancements. To that end, WARN_ONCE if we try to switch to host state with a different VMCS than was last used to save host state. Resolve an occurrence of the new WARN by setting loaded_vmcs after the call to vmx_vcpu_put() in vmx_switch_vmcs(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
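The pointer-as-flag pattern can be sketched in C as below. This is not KVM's code: the structs and function names are hypothetical, the save/restore work is elided, and the mismatch case prints a message and returns -1 where KVM uses WARN_ONCE:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for KVM's struct loaded_vmcs. */
struct loaded_vmcs {
	int id;
};

/* NULL: host state is live in the CPU registers. Non-NULL: host state
 * has been saved, guest state is loaded, and this is the VMCS in use. */
struct switch_tracker {
	struct loaded_vmcs *guest_state_loaded;
};

static void switch_to_guest_state(struct switch_tracker *t,
				  struct loaded_vmcs *vmcs)
{
	assert(t && vmcs);
	if (t->guest_state_loaded)
		return; /* host state already saved; nothing to do */
	/* ... save host FS/GS/MSR state and load guest state here ... */
	t->guest_state_loaded = vmcs;
}

/* Returns 0 normally, -1 if the VMCS differs from the one recorded at save time. */
static int switch_to_host_state(struct switch_tracker *t,
				struct loaded_vmcs *vmcs)
{
	int ret = 0;

	assert(t);
	if (!t->guest_state_loaded)
		return 0; /* host state is already live */
	if (t->guest_state_loaded != vmcs) {
		fprintf(stderr, "switching to host state with a different VMCS\n");
		ret = -1;
	}
	/* ... restore host state here ... */
	t->guest_state_loaded = NULL;
	return ret;
}
```

Compared with a bare boolean, the pointer both answers "is guest state loaded?" and records which VMCS that state belongs to, which is what makes the mismatch check possible.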
2018-08-06 | KVM: vmx: refactor segmentation code in vmx_save_host_state() | Sean Christopherson
Use local variables in vmx_save_host_state() to temporarily track the selector and base values for FS and GS, and reorganize the code so that the 64-bit vs 32-bit portions are contained within a single #ifdef. This refactoring paves the way for future patches to modify the updating of VMCS state with minimal changes to the code, and (hopefully) simplifies resolving a likely conflict with another in-flight patch[1] by being the whipping boy for future patches. [1] https://www.spinics.net/lists/kvm/msg171647.html Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: nVMX: Fix fault priority for VMX operations | Jim Mattson
When checking emulated VMX instructions for faults, the #UD for "IF (not in VMX operation)" should take precedence over the #GP for "ELSIF CPL > 0." Suggested-by: Eric Northup <digitaleric@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: nVMX: Fix fault vector for VMX operation at CPL > 0 | Jim Mattson
The fault that should be raised for a privilege level violation is #GP rather than #UD. Fixes: 727ba748e110b4 ("kvm: nVMX: Enforce cpl=0 for VMX instructions") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
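A simplified C sketch of the check order the two fixes above establish for VMX instructions other than VMXON, following the SDM pseudocode shape "IF (not in VMX operation) THEN #UD; ELSIF CPL > 0 THEN #GP(0)". It deliberately omits the other #UD conditions and the VM-exit case for VMX non-root operation, and struct vcpu_snapshot is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the vCPU state the checks depend on. */
struct vcpu_snapshot {
	bool in_vmx_operation; /* true after a successful VMXON */
	int cpl;               /* current privilege level, 0..3 */
};

enum vmx_fault { VMX_FAULT_NONE, VMX_FAULT_UD, VMX_FAULT_GP };

static enum vmx_fault check_vmx_insn_faults(const struct vcpu_snapshot *v)
{
	assert(v && v->cpl >= 0 && v->cpl <= 3);
	if (!v->in_vmx_operation)
		return VMX_FAULT_UD;   /* checked first: #UD wins over #GP */
	if (v->cpl > 0)
		return VMX_FAULT_GP;   /* privilege violation is #GP(0), not #UD */
	return VMX_FAULT_NONE;
}
```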
2018-08-06 | KVM: try __get_user_pages_fast even if not in atomic context | Paolo Bonzini
We are currently cutting hva_to_pfn_fast short if we do not want an immediate exit, which is represented by !async && !atomic. However, this is unnecessary, and __get_user_pages_fast is *much* faster because the regular get_user_pages takes pmd_lock/pte_lock. In fact, when many CPUs take a nested vmexit at the same time the contention on those locks is visible, and this patch removes about 25% (compared to 4.18) from vmexit.flat on a 16 vCPU nested guest. Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: vmx: Add tlb_remote_flush callback support | Tianyu Lan
Register the tlb_remote_flush callback for VMX when the Hyper-V capability for nested guest mapping flush is detected. The interface helps to reduce overhead when flushing EPT tables among vCPUs for a nested VM. The traditional way is to send IPIs to all affected vCPUs and execute INVEPT on each vCPU, which triggers several VM exits for IPI and INVEPT emulation. Hyper-V provides a hypercall that performs the flush for all vCPUs; KVM calls it when all EPT table pointers of a single VM are the same. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: x86: Add tlb remote flush callback in kvm_x86_ops. | Tianyu Lan
Provide a way for platforms to register a Hyper-V TLB remote flush callback, which helps to optimize TLB flushes among vCPUs in the nested virtualization case. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
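One way to picture an optional callback with a fallback, as a hypothetical C sketch: struct vm_sketch, struct x86_ops_sketch and hv_flush_guest_mapping() are invented for illustration, the hypercall is simulated by a counter, and the "all EPT pointers equal" condition is taken from the VMX commit above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VCPUS 4

struct vm_sketch {
	size_t nr_vcpus;
	uint64_t eptp[MAX_VCPUS];  /* per-vCPU EPT pointer */
	unsigned int hypercalls;   /* paravirtual flushes issued */
	unsigned int ipis;         /* fallback IPI + INVEPT rounds */
};

/* The hook stays NULL unless the Hyper-V capability was detected. */
struct x86_ops_sketch {
	int (*tlb_remote_flush)(struct vm_sketch *vm); /* 0 on success */
};

/* Simulated hypercall path: usable only if all vCPUs share one EPT pointer. */
static int hv_flush_guest_mapping(struct vm_sketch *vm)
{
	size_t i;

	for (i = 1; i < vm->nr_vcpus; i++)
		if (vm->eptp[i] != vm->eptp[0])
			return -1;
	vm->hypercalls++;
	return 0;
}

static void flush_remote_tlbs(struct vm_sketch *vm,
			      const struct x86_ops_sketch *ops)
{
	assert(vm && ops && vm->nr_vcpus <= MAX_VCPUS);
	if (ops->tlb_remote_flush && ops->tlb_remote_flush(vm) == 0)
		return;
	/* Traditional path: one IPI (and INVEPT) per vCPU. */
	vm->ipis += (unsigned int)vm->nr_vcpus;
}
```

Keeping the hook optional means platforms without the capability need no changes; the generic IPI path simply remains the default.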
2018-08-06 | X86/Hyper-V: Add hyperv_nested_flush_guest_mapping ftrace support | Tianyu Lan
Add hyperv_nested_flush_guest_mapping tracepoint support to trace the HvFlushGuestPhysicalAddressSpace hypercall. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | X86/Hyper-V: Add flush HvFlushGuestPhysicalAddressSpace hypercall support | Tianyu Lan
Hyper-V supports a PV hypercall, HvFlushGuestPhysicalAddressSpace, to flush the nested VM address space mapping in the L1 hypervisor; it reduces the overhead of flushing EPT TLBs among vCPUs. This patch implements it. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | x86/kvm: Don't use pvqspinlock code if only 1 vCPU | Waiman Long
On a VM with only 1 vCPU, the locking fast path will always be successful. In this case, there is no need to use the PV qspinlock code, which has higher overhead on the unlock side than the native qspinlock code. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM/MMU: Simplify __kvm_sync_page() function | Tianyu Lan
Merge the checks of "sp->role.cr4_pae != !!is_pae(vcpu)" and "vcpu->arch.mmu.sync_page(vcpu, sp) == 0", since kvm_mmu_prepare_zap_page() is called under both of these conditions. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Remove CR3_PCID_INVD flag | Junaid Shahid
It is a duplicate of X86_CR3_PCID_NOFLUSH, so just use that instead. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Add multi-entry LRU cache for previous CR3s | Junaid Shahid
Adds support for storing multiple previous CR3/root_hpa pairs maintained as an LRU cache, so that the lockless CR3 switch path can be used when switching back to any of them. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
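A minimal C sketch of an LRU of previous CR3/root pairs, under stated assumptions: NUM_PREV_ROOTS, the structs and switch_cr3() are hypothetical, root_hpa == 0 marks an empty slot, locking is omitted, and an evicted root would be freed in a real implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PREV_ROOTS 3 /* illustrative cache size */

struct cr3_root {
	uint64_t cr3;      /* guest CR3 value */
	uint64_t root_hpa; /* shadow root built for it; 0 = empty slot */
};

struct mmu_roots {
	struct cr3_root cur;                  /* active root */
	struct cr3_root prev[NUM_PREV_ROOTS]; /* most recently used first */
};

/* Make new_cr3 the active root. Returns true on a cache hit, in which case
 * the cached root is reused; on a miss, cur.root_hpa is left 0 for the
 * caller to fill in. The outgoing root always moves to the LRU front. */
static bool switch_cr3(struct mmu_roots *m, uint64_t new_cr3)
{
	struct cr3_root next = { new_cr3, 0 };
	int i, hit = -1;

	assert(m);
	for (i = 0; i < NUM_PREV_ROOTS; i++) {
		if (m->prev[i].root_hpa && m->prev[i].cr3 == new_cr3) {
			hit = i;
			next = m->prev[i];
			break;
		}
	}

	/* Shift entries down to close the hit's slot, or to evict the last one. */
	for (i = (hit >= 0) ? hit : NUM_PREV_ROOTS - 1; i > 0; i--)
		m->prev[i] = m->prev[i - 1];
	m->prev[0] = m->cur;
	m->cur = next;
	return hit >= 0;
}
```

Toggling back and forth between two CR3 values then always hits in prev[0], so no new root has to be built.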
2018-08-06 | kvm: x86: Flush only affected TLB entries in kvm_mmu_invlpg* | Junaid Shahid
kvm_mmu_invlpg() and kvm_mmu_invpcid_gva() only need to flush the TLB entries for the specific guest virtual address, instead of flushing all TLB entries associated with the VM. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Skip shadow page resync on CR3 switch when indicated by guest | Junaid Shahid
When the guest indicates that the TLB doesn't need to be flushed in a CR3 switch, we can also skip resyncing the shadow page tables since an out-of-sync shadow page table is equivalent to an out-of-sync TLB. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Support selectively freeing either current or previous MMU root | Junaid Shahid
kvm_mmu_free_roots() now takes a mask specifying which roots to free, so that either one of the roots (active/previous) can be individually freed when needed. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Add a root_hpa parameter to kvm_mmu->invlpg() | Junaid Shahid
This allows invlpg() to be called using either the active root_hpa or the prev_root_hpa. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Skip TLB flush on fast CR3 switch when indicated by guest | Junaid Shahid
When PCIDs are enabled, the MSb of the source operand for a MOV-to-CR3 instruction indicates that the TLB doesn't need to be flushed. This change enables this optimization for MOV-to-CR3s in the guest that have been intercepted by KVM for shadow paging and are handled within the fast CR3 switch path. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
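A small C sketch of decoding such an operand, assuming CR4.PCIDE = 1: bit 63 (the bit Linux names X86_CR3_PCID_NOFLUSH) requests that TLB entries for the new PCID be kept and is not loaded into CR3 itself, while the PCID occupies bits 11:0. The macro, struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR3_NOFLUSH_BIT (1ULL << 63) /* same bit as X86_CR3_PCID_NOFLUSH */
#define CR3_PCID_MASK   0xFFFULL     /* PCID in bits 11:0 */

struct mov_cr3 {
	bool skip_tlb_flush; /* guest asked to keep cached translations */
	uint16_t pcid;
	uint64_t cr3;        /* value that actually ends up in CR3 */
};

static struct mov_cr3 decode_mov_to_cr3(uint64_t operand)
{
	struct mov_cr3 d;

	d.skip_tlb_flush = (operand & CR3_NOFLUSH_BIT) != 0;
	d.pcid = (uint16_t)(operand & CR3_PCID_MASK);
	d.cr3 = operand & ~CR3_NOFLUSH_BIT; /* bit 63 is not written to CR3 */
	return d;
}
```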
2018-08-06 | kvm: vmx: Support INVPCID in shadow paging mode | Junaid Shahid
Implement support for INVPCID in shadow paging mode as well. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Propagate guest PCIDs to host PCIDs | Junaid Shahid
When using shadow paging mode, propagate the guest's PCID value to the shadow CR3 in the host instead of always using PCID 0. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Add ability to skip TLB flush when switching CR3 | Junaid Shahid
Remove the implicit flush from the set_cr3 handlers, so that the callers are able to decide whether to flush the TLB or not. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Use fast CR3 switch for nested VMX | Junaid Shahid
Use the fast CR3 switch mechanism to locklessly change the MMU root page when switching between L1 and L2. The switch from L2 to L1 should always go through the fast path, while the switch from L1 to L2 should go through the fast path if L1's CR3/EPTP for L2 hasn't changed since the last time. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Support resetting the MMU context without resetting roots | Junaid Shahid
This adds support for re-initializing the MMU context in a different mode while preserving the active root_hpa and the prev_root. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Add support for fast CR3 switch across different MMU modes | Junaid Shahid
This generalizes the lockless CR3 switch path to be able to work across different MMU modes (e.g. nested vs non-nested) by checking that the expected page role of the new root page matches the page role of the previously stored root page in addition to checking that the new CR3 matches the previous CR3. Furthermore, instead of loading the hardware CR3 in fast_cr3_switch(), it is now done in vcpu_enter_guest(), as by that time the MMU context would be up-to-date with the VCPU mode. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
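The extra role check can be sketched as below. KVM compares a packed role word (masked with mmu_base_role_mask, per the kvm_mmu_calc_root_page_role() commit); here struct root_role, with two explicit fields, and the other names are hypothetical simplifications:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, much-reduced page role. */
struct root_role {
	unsigned int level; /* paging levels of the root, e.g. 4 */
	bool nested;        /* root shadows an L2 (nested) address space */
};

struct cached_root {
	uint64_t cr3;
	struct root_role role;
	bool valid;
};

/* Reuse a cached root only if the CR3 matches AND the role the current MMU
 * mode would assign to a new root matches, so a root built for one mode is
 * never picked up in another. */
static bool can_fast_switch(const struct cached_root *prev, uint64_t new_cr3,
			    struct root_role new_role)
{
	assert(prev);
	return prev->valid && prev->cr3 == new_cr3 &&
	       prev->role.level == new_role.level &&
	       prev->role.nested == new_role.nested;
}
```

Matching on CR3 alone would not be enough across modes: the same CR3 value can be seen by both the nested and non-nested MMU, but the roots built for them are not interchangeable.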
2018-08-06 | kvm: x86: Introduce KVM_REQ_LOAD_CR3 | Junaid Shahid
The KVM_REQ_LOAD_CR3 request loads the hardware CR3 using the current root_hpa. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Introduce kvm_mmu_calc_root_page_role() | Junaid Shahid
These functions factor out the base role calculation from the corresponding kvm_init_*_mmu() functions. The new functions return what would be the role assigned to a root page in the current VCPU state. This can be masked with mmu_base_role_mask to derive the base role. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Add fast CR3 switch code path | Junaid Shahid
When using shadow paging, a CR3 switch in the guest results in a VM Exit. In the common case, that VM exit doesn't require much processing by KVM. However, it does acquire the MMU lock, which can start showing signs of contention under some workloads even on a 2 VCPU VM when the guest is using KPTI. Therefore, we add a fast path that avoids acquiring the MMU lock in the most common cases, e.g. when switching back and forth between the kernel and user mode CR3s used by KPTI with no guest page table changes in between. For now, this fast path is implemented only for 64-bit guests and hosts to avoid the handling of PDPTEs, but it can be extended later to 32-bit guests and/or hosts as well. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Avoid taking MMU lock in kvm_mmu_sync_roots if no sync is needed | Junaid Shahid
kvm_mmu_sync_roots() can locklessly check whether a sync is needed and just bail out if it isn't. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | kvm: x86: Make sync_page() flush remote TLBs once only | Junaid Shahid
sync_page() calls set_spte() from a loop across a page table. It would work better if set_spte() left the TLB flushing to its callers, so that sync_page() can aggregate into a single call. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
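A C sketch of the "return a flush flag, flush once" pattern. The set_spte() and sync_page() below are simplified stand-ins, not KVM's implementations: any changed entry counts as needing a flush, and the remote flush is simulated by a counter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_TABLE 512

static unsigned int remote_flushes; /* counts simulated TLB shootdowns */

static void flush_remote_tlbs(void)
{
	remote_flushes++;
}

/* Hypothetical set_spte(): updates one entry and reports whether a flush
 * is needed instead of performing it. */
static bool set_spte(unsigned long *spte, unsigned long new_val)
{
	bool need_flush = (*spte != new_val);

	*spte = new_val;
	return need_flush;
}

/* Hypothetical sync_page(): accumulate per-entry results, flush at most once. */
static void sync_page(unsigned long *table, const unsigned long *guest_ptes)
{
	bool flush = false;
	size_t i;

	assert(table && guest_ptes);
	for (i = 0; i < PTES_PER_TABLE; i++)
		flush |= set_spte(&table[i], guest_ptes[i]);
	if (flush)
		flush_remote_tlbs();
}
```

However many of the 512 entries change, the loop costs at most one remote flush instead of one per modified entry.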
2018-08-06 | KVM: MMU: drop vcpu param in gpte_access | Peter Xu
It's never used. Drop it. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: nVMX: Separate logic allocating shadow vmcs to a function | Liran Alon
No functionality change. This is done as a preparation for VMCS shadowing virtualization. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: VMX: Mark vmcs header as shadow in case alloc_vmcs_cpu() allocate shadow vmcs | Liran Alon
No functionality change. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: nVMX: Expose VMCS shadowing to L1 guest | Liran Alon
Expose VMCS shadowing to L1 as a VMX capability of the virtual CPU, whether or not VMCS shadowing is supported by the physical CPU. (VMCS shadowing emulation) Shadowed VMREADs and VMWRITEs from L2 are handled by L0, without a VM-exit to L1. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps | Liran Alon
This is done as a preparation for VMCS shadowing emulation. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
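A C sketch of the bitmap lookup L0 performs on vmcs12's VMREAD/VMWRITE bitmaps, based on the SDM's rule for 64-bit mode: the instruction exits if any bit above 14 of the field encoding is set, or if the bitmap bit indexed by bits 14:0 is 1. The function name is hypothetical, and the check that "VMCS shadowing" is enabled is assumed to have been done by the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMX_BITMAP_BYTES 4096 /* one 4-KiB page = 32768 bits */

/* True if a VMREAD/VMWRITE of 'field' executed by L2 must be reflected to
 * L1; false if it can be served from the shadow vmcs12 without an exit. */
static bool vmrw_causes_exit(const uint8_t bitmap[VMX_BITMAP_BYTES],
			     uint64_t field)
{
	assert(bitmap);
	if (field >> 15)
		return true; /* encodings beyond bit 14 always exit */
	return (bitmap[field >> 3] >> (field & 7)) & 1;
}
```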
2018-08-06 | KVM: nVMX: vmread/vmwrite: Use shadow vmcs12 if running L2 | Liran Alon
This is done as a preparation to VMCS shadowing emulation. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: selftests: add tests for shadow VMCS save/restore | Paolo Bonzini
This includes setting up the shadow VMCS and the secondary execution controls in lib/vmx.c. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: nVMX: include shadow vmcs12 in nested state | Paolo Bonzini
The shadow vmcs12 cannot be flushed on KVM_GET_NESTED_STATE, because at that point guest memory is assumed by userspace to be immutable. Capture the cache in vmx_get_nested_state, adding another page at the end if there is an active shadow vmcs12. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: nVMX: Cache shadow vmcs12 on VMEntry and flush to memory on VMExit | Liran Alon
This is done as a preparation for VMCS shadowing emulation. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: nVMX: Verify VMCS shadowing VMCS link pointer | Liran Alon
Intel SDM considers these checks to be part of "Checks on Guest Non-Register State". Note that it is legal for vmcs->vmcs_link_pointer to be -1ull when VMCS shadowing is enabled. In this case, any VMREAD/VMWRITE to a shadowed field sets the ALU flags for VMfailInvalid (i.e. CF=1). Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
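The link-pointer checks can be sketched in C as follows. The conditions follow the SDM's "Checks on Guest Non-Register State" as summarized here (4-KiB alignment, physical-address width, revision identifier in bits 30:0 and shadow-VMCS indicator in bit 31 of the referenced VMCS header); the function and its parameters are hypothetical, and reading the header from guest memory is left to the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMCS_LINK_NONE  UINT64_MAX /* -1ull: no linked (shadow) VMCS */
#define VMCS_SHADOW_BIT (1u << 31) /* bit 31 of the VMCS header */

/* header: first 32-bit word of the VMCS the link pointer refers to. */
static bool vmcs_link_pointer_valid(uint64_t link, uint32_t header,
				    uint32_t revision_id, bool shadowing_on,
				    unsigned int phys_addr_bits)
{
	assert(phys_addr_bits >= 32 && phys_addr_bits <= 52);
	if (link == VMCS_LINK_NONE)
		return true;  /* legal even with shadowing; VMREAD/VMWRITE then VMfailInvalid */
	if (link & 0xFFFull)
		return false; /* must be 4-KiB aligned */
	if (link >> phys_addr_bits)
		return false; /* beyond the physical-address width */
	if ((header & ~VMCS_SHADOW_BIT) != revision_id)
		return false; /* revision identifier mismatch */
	/* The header's shadow bit must match the "VMCS shadowing" control. */
	return ((header & VMCS_SHADOW_BIT) != 0) == shadowing_on;
}
```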
2018-08-06 | KVM: nVMX: Verify VMCS shadowing controls | Liran Alon
Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: nVMX: Introduce nested_cpu_has_shadow_vmcs() | Liran Alon
Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: nVMX: Fail VMLAUNCH and VMRESUME on shadow VMCS | Liran Alon
Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 | KVM: nVMX: Allow VMPTRLD for shadow VMCS if vCPU supports VMCS shadowing | Liran Alon
Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>