summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-09-03KVM: mmio: cleanup kvm_set_mmio_spte_maskTiejun Chen
Just reuse rsvd_bits() inside kvm_set_mmio_spte_mask() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-03kvm: x86: fix stale mmio cache bugDavid Matlack
The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. This patch fixes the issue by using the memslot generation number to validate the mmio cache. Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-03kvm: fix potentially corrupt mmio cacheDavid Matlack
vcpu exits and memslot mutations can run concurrently as long as the vcpu does not aquire the slots mutex. Thus it is theoretically possible for memslots to change underneath a vcpu that is handling an exit. If we increment the memslot generation number again after synchronize_srcu_expedited(), vcpus can safely cache memslot generation without maintaining a single rcu_dereference through an entire vm exit. And much of the x86/kvm code does not maintain a single rcu_dereference of the current memslots during each exit. We can prevent the following case: vcpu (CPU 0) | thread (CPU 1) --------------------------------------------+-------------------------- 1 vm exit | 2 srcu_read_unlock(&kvm->srcu) | 3 decide to cache something based on | old memslots | 4 | change memslots | (increments generation) 5 | synchronize_srcu(&kvm->srcu); 6 retrieve generation # from new memslots | 7 tag cache with new memslot generation | 8 srcu_read_unlock(&kvm->srcu) | ... | <action based on cache occurs even | though the caching decision was based | on the old memslots> | ... | <action *continues* to occur until next | memslot generation change, which may | be never> | | By incrementing the generation after synchronizing with kvm->srcu readers, we ensure that the generation retrieved in (6) will become invalid soon after (8). Keeping the existing increment is not strictly necessary, but we do keep it and just move it for consistency from update_memslots to install_new_memslots. It invalidates old cached MMIOs immediately, instead of having to wait for the end of synchronize_srcu_expedited, which makes the code more clearly correct in case CPU 1 is preempted right after synchronize_srcu() returns. To avoid halving the generation space in SPTEs, always presume that the low bit of the generation is zero when reconstructing a generation number out of an SPTE. This effectively disables MMIO caching in SPTEs during the call to synchronize_srcu_expedited. Using the low bit this way is somewhat like a seqcount---where the protected thing is a cache, and instead of retrying we can simply punt if we observe the low bit to be 1. Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-03KVM: do not bias the generation number in kvm_current_mmio_generationPaolo Bonzini
The next patch will give a meaning (a la seqcount) to the low bit of the generation number. Ensure that it matches between kvm->memslots->generation and kvm_current_mmio_generation(). Cc: stable@vger.kernel.org Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: x86: use guest maxphyaddr to check MTRR valuesPaolo Bonzini
The check introduced in commit d7a2a246a1b5 (KVM: x86: #GP when attempts to write reserved bits of Variable Range MTRRs, 2014-08-19) will break if the guest maxphyaddr is higher than the host's (which sometimes happens depending on your hardware and how QEMU is configured). To fix this, use cpuid_maxphyaddr similar to how the APIC_BASE MSR does already. Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Tested-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: remove garbage arg to *hardware_{en,dis}ableRadim Krčmář
In the beggining was on_each_cpu(), which required an unused argument to kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten. Remove unnecessary arguments that stem from this. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: static inline empty kvm_arch functionsRadim Krčmář
Using static inline is going to save few bytes and cycles. For example on powerpc, the difference is 700 B after stripping. (5 kB before) This patch also deals with two overlooked empty functions: kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c 2df72e9bc KVM: split kvm_arch_flush_shadow and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c. e790d9ef6 KVM: add kvm_arch_sched_in Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: forward declare structs in kvm_types.hPaolo Bonzini
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid "'struct foo' declared inside parameter list" warnings (and consequent breakage due to conflicting types). Move them from individual files to a generic place in linux/kvm_types.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: x86: remove Aligned bit from movntps/movntpdPaolo Bonzini
These are not explicitly aligned, and do not require alignment on AVX. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: x86 emulator: emulate MOVNTDQAlex Williamson
Windows 8.1 guest with NVIDIA driver and GPU fails to boot with an emulation failure. The KVM spew suggests the fault is with lack of movntdq emulation (courtesy of Paolo): Code=02 00 00 b8 08 00 00 00 f3 0f 6f 44 0a f0 f3 0f 6f 4c 0a e0 <66> 0f e7 41 f0 66 0f e7 49 e0 48 83 e9 40 f3 0f 6f 44 0a 10 f3 0f 6f 0c 0a 66 0f e7 41 10 $ as -o a.out .section .text .byte 0x66, 0x0f, 0xe7, 0x41, 0xf0 .byte 0x66, 0x0f, 0xe7, 0x49, 0xe0 $ objdump -d a.out 0: 66 0f e7 41 f0 movntdq %xmm0,-0x10(%rcx) 5: 66 0f e7 49 e0 movntdq %xmm1,-0x20(%rcx) Add the necessary emulation. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: vmx: VMXOFF emulation in vm86 should cause #UDNadav Amit
Unlike VMCALL, the instructions VMXOFF, VMLAUNCH and VMRESUME should cause a UD exception in real-mode or vm86. However, the emulator considers all these instructions the same for the matter of mode checks, and emulation upon exit due to #UD exception. As a result, the hypervisor behaves incorrectly on vm86 mode. VMXOFF, VMLAUNCH or VMRESUME cause on vm86 exit due to #UD. The hypervisor then emulates these instruction and inject #GP to the guest instead of #UD. This patch creates a new group for these instructions and mark only VMCALL as an instruction which can be emulated. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: x86: fix some sparse warningsPaolo Bonzini
Sparse reports the following easily fixed warnings: arch/x86/kvm/vmx.c:8795:48: sparse: Using plain integer as NULL pointer arch/x86/kvm/vmx.c:2138:5: sparse: symbol vmx_read_l1_tsc was not declared. Should it be static? arch/x86/kvm/vmx.c:6151:48: sparse: Using plain integer as NULL pointer arch/x86/kvm/vmx.c:8851:6: sparse: symbol vmx_sched_in was not declared. Should it be static? arch/x86/kvm/svm.c:2162:5: sparse: symbol svm_read_l1_tsc was not declared. Should it be static? Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: nVMX: nested TPR shadow/threshold emulationWanpeng Li
This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=61411 TPR shadow/threshold feature is important to speed up the Windows guest. Besides, it is a must feature for certain VMM. We map virtual APIC page address and TPR threshold from L1 VMCS. If TPR_BELOW_THRESHOLD VM exit is triggered by L2 guest and L1 interested in, we inject it into L1 VMM for handling. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> [Add PAGE_ALIGNED check, do not write useless virtual APIC page address if TPR shadowing is disabled. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: nVMX: introduce nested_get_vmcs12_pagesWanpeng Li
Introduce function nested_get_vmcs12_pages() to check the valid of nested apic access page and virtual apic page earlier. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: Unconditionally export KVM_CAP_USER_NMIChristoffer Dall
The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that userspace can, at run-time, determine if a feature is supported or not. This allows KVM to being supporting a new feature with a new kernel version without any need to update user space. Unfortunately, since the definition of KVM_CAP_USER_NMI was guarded by #ifdef __KVM_HAVE_USER_NMI, such discovery still required a user space update. Therefore, unconditionally export KVM_CAP_USER_NMI and change the the typo in the comment for the IOCTL number definition as well. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: Unconditionally export KVM_CAP_READONLY_MEMChristoffer Dall
The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that userspace can, at run-time, determine if a feature is supported or not. This allows KVM to being supporting a new feature with a new kernel version without any need to update user space. Unfortunately, since the definition of KVM_CAP_READONLY_MEM was guarded by #ifdef __KVM_HAVE_READONLY_MEM, such discovery still required a user space update. Therefore, unconditionally export KVM_CAP_READONLY_MEM and change the in-kernel conditional to rely on __KVM_HAVE_READONLY_MEM. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: s390/mm: fix up indentation of set_guest_storage_keyChristian Borntraeger
commit ab3f285f227f ("KVM: s390/mm: try a cow on read only pages for key ops")' misaligned a code block. Let's fixup the indentation. Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-26Merge tag 'kvm-s390-next-20140825' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes and features for 3.18 part 1 1. The usual cleanups: get rid of duplicate code, use defines, factor out the sync_reg handling, additional docs for sync_regs, better error handling on interrupt injection 2. We use KVM_REQ_TLB_FLUSH instead of open coding tlb flushes 3. Additional registers for kvm_run sync regs. This is usually not needed in the fast path due to eventfd/irqfd, but kvm stat claims that we reduced the overhead of console output by ~50% on my system 4. A rework of the gmap infrastructure. This is the 2nd step towards host large page support (after getting rid of the storage key dependency). We introduces two radix trees to store the guest-to-host and host-to-guest translations. This gets us rid of most of the page-table walks in the gmap code. Only one in __gmap_link is left, this one is required to link the shadow page table to the process page table. Finally this contains the plumbing to support gmap page tables with less than 5 levels.
2014-08-26KVM: s390/mm: remove outdated gmap data structuresMartin Schwidefsky
The radix tree rework removed all code that uses the gmap_rmap and gmap_pgtable data structures. Remove these outdated definitions. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-26KVM: s390/mm: support gmap page tables with less than 5 levelsMartin Schwidefsky
Add an addressing limit to the gmap address spaces and only allocate the page table levels that are needed for the given limit. The limit is fixed and can not be changed after a gmap has been created. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-26KVM: s390/mm: use radix trees for guest to host mappingsMartin Schwidefsky
Store the target address for the gmap segments in a radix tree instead of using invalid segment table entries. gmap_translate becomes a simple radix_tree_lookup, gmap_fault is split into the address translation with gmap_translate and the part that does the linking of the gmap shadow page table with the process page table. A second radix tree is used to keep the pointers to the segment table entries for segments that are mapped in the guest address space. On unmap of a segment the pointer is retrieved from the radix tree and is used to carry out the segment invalidation in the gmap shadow page table. As the radix tree can only store one pointer, each host segment may only be mapped to exactly one guest location. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25kvm: x86: fix tracing for 32-bitPaolo Bonzini
Fix commit 7b46268d29543e313e731606d845e65c17f232e4, which mistakenly included the new tracepoint under #ifdef CONFIG_X86_64. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-25Merge tag 'kvm-s390-20140825' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD Here are two fixes for s390 KVM code that prevent: 1. a malicious user to trigger a kernel BUG 2. a malicious user to change the storage key of read-only pages
2014-08-25KVM: s390/mm: cleanup gmap function arguments, variable namesMartin Schwidefsky
Make the order of arguments for the gmap calls more consistent, if the gmap pointer is passed it is always the first argument. In addition distinguish between guest address and user address by naming the variables gaddr for a guest address and vmaddr for a user address. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390/mm: readd address parameter to gmap_do_ipte_notifyMartin Schwidefsky
Revert git commit c3a23b9874c1 ("remove unnecessary parameter from gmap_do_ipte_notify"). Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390/mm: readd address parameter to pgste_ipte_notifyMartin Schwidefsky
Revert git commit 1b7fd6952063 ("remove unecessary parameter from pgste_ipte_notify") Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390: don't use kvm lock in interrupt injection codeJens Freimann
The kvm lock protects us against vcpus going away, but they only go away when the virtual machine is shut down. We don't need this mutex here, so let's get rid of it. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390: return -EFAULT if lowcore is not mapped during irq deliveryJens Freimann
Currently we just kill the userspace process and exit the thread immediatly without making sure that we don't hold any locks etc. Improve this by making KVM_RUN return -EFAULT if the lowcore is not mapped during interrupt delivery. To achieve this we need to pass the return code of guest memory access routines used in interrupt delivery all the way back to the KVM_RUN ioctl. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390: implement KVM_REQ_TLB_FLUSH and make use of itDavid Hildenbrand
Use the KVM_REQ_TLB_FLUSH request in order to trigger tlb flushes instead of manipulating the SIE control block whenever we need it. Also trigger it for a control register sync directly instead of (ab)using kvm_s390_set_prefix(). Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390: synchronize more registers with kvm_runDavid Hildenbrand
In order to reduce the number of syscalls when dropping to user space, this patch enables the synchronization of the following "registers" with kvm_run: - ARCH0: CPU timer, clock comparator, TOD programmable register, guest breaking-event register, program parameter - PFAULT: pfault parameters (token, select, compare) The registers are grouped to reduce the overhead when syncing. As this grows the number of sync registers quite a bit, let's move the code synchronizing registers with kvm_run from kvm_arch_vcpu_ioctl_run() into separate helper routines. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390: no special machine check deliveryChristian Borntraeger
The load PSW handler does not have to inject pending machine checks. This can wait until the CPU runs the generic interrupt injection code. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-08-25KVM: s390: clear kvm_dirty_regs when dropping to user spaceDavid Hildenbrand
We should make sure that all kvm_dirty_regs bits are cleared before dropping to user space. Until now, some would remain pending. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: clarify the idea of kvm_dirty_regsDavid Hildenbrand
This patch clarifies that kvm_dirty_regs are just a hint to the kernel and that the kernel might just ignore some flags and sync the values (like done for acrs and gprs now). Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390: factor out get_ilc() functionJens Freimann
Let's make this a reusable function. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390/mm: try a cow on read only pages for key opsChristian Borntraeger
The PFMF instruction handler blindly wrote the storage key even if the page was mapped R/O in the host. Lets try a COW before continuing and bail out in case of errors. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: stable@vger.kernel.org
2014-08-25KVM: s390: add defines for pfault init delivery codeJens Freimann
Get rid of open coded values for pfault init. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390: Fix user triggerable bug in dead codeChristian Borntraeger
In the early days, we had some special handling for the KVM_EXIT_S390_SIEIC exit, but this was gone in 2009 with commit d7b0b5eb3000 (KVM: s390: Make psw available on all exits, not just a subset). Now this switch statement is just a sanity check for userspace not messing with the kvm_run structure. Unfortunately, this allows userspace to trigger a kernel BUG. Let's just remove this switch statement. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: stable@vger.kernel.org
2014-08-21KVM: trace kvm_ple_window grow/shrinkRadim Krčmář
Tracepoint for dynamic PLE window, fired on every potential change. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21KVM: VMX: dynamise PLE windowRadim Krčmář
Window is increased on every PLE exit and decreased on every sched_in. The idea is that we don't want to PLE exit if there is no preemption going on. We do this with sched_in() because it does not hold rq lock. There are two new kernel parameters for changing the window: ple_window_grow and ple_window_shrink ple_window_grow affects the window on PLE exit and ple_window_shrink does it on sched_in; depending on their value, the window is modifier like this: (ple_window is kvm_intel's global) ple_window_shrink/ | ple_window_grow | PLE exit | sched_in -------------------+--------------------+--------------------- < 1 | = ple_window | = ple_window < ple_window | *= ple_window_grow | /= ple_window_shrink otherwise | += ple_window_grow | -= ple_window_shrink A third new parameter, ple_window_max, controls the maximal ple_window; it is internally rounded down to a closest multiple of ple_window_grow. VCPU's PLE window is never allowed below ple_window. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21KVM: VMX: make PLE window per-VCPURadim Krčmář
Change PLE window into per-VCPU variable, seeded from module parameter, to allow greater flexibility. Brings in a small overhead on every vmentry. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21KVM: x86: introduce sched_in to kvm_x86_opsRadim Krčmář
sched_in preempt notifier is available for x86, allow its use in specific virtualization technlogies as well. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21KVM: add kvm_arch_sched_inRadim Krčmář
Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21KVM: x86: Replace X86_FEATURE_NX offset with the definitionNadav Amit
Replace reference to X86_FEATURE_NX using bit shift with the defined X86_FEATURE_NX. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21KVM: avoid unnecessary synchronize_rcuChristian Borntraeger
We dont have to wait for a grace period if there is no oldpid that we are going to free. putpid also checks for NULL, so this patch only fences synchronize_rcu. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-20KVM: emulate: warn on invalid or uninitialized exception numbersPaolo Bonzini
These were reported when running Jailhouse on AMD processors. Initialize ctxt->exception.vector with an invalid exception number, and warn if it remained invalid even though the emulator got an X86EMUL_PROPAGATE_FAULT return code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-20KVM: emulate: do not return X86EMUL_PROPAGATE_FAULT explicitlyPaolo Bonzini
Always get it through emulate_exception or emulate_ts. This ensures that the ctxt->exception fields have been populated. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-20KVM: x86: Clarify PMU related features bit manipulationNadav Amit
kvm_pmu_cpuid_update makes a lot of bit manuiplation operations, when in fact there are already unions that can be used instead. Changing the bit manipulation to the union for clarity. This patch does not change the functionality. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-20KVM: vmx: fix ept reserved bits for 1-GByte pageWanpeng Li
EPT misconfig handler in kvm will check which reason lead to EPT misconfiguration after vmexit. One of the reasons is that an EPT paging-structure entry is configured with settings reserved for future functionality. However, the handler can't identify if paging-structure entry of reserved bits for 1-GByte page are configured, since PDPTE which point to 1-GByte page will reserve bits 29:12 instead of bits 7:3 which are reserved for PDPTE that references an EPT Page Directory. This patch fix it by reserve bits 29:12 for 1-GByte page. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19KVM: x86: recalculate_apic_map after enabling apicNadav Amit
Currently, recalculate_apic_map ignores vcpus whose lapic is software disabled through the spurious interrupt vector. However, once it is re-enabled, the map is not recalculated. Therefore, if the guest OS configured DFR while lapic is software-disabled, the map may be incorrect. This patch recalculates apic map after software enabling the lapic. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19KVM: x86: Clear apic tsc-deadline after deadlineNadav Amit
Intel SDM 10.5.4.1 says "When the timer generates an interrupt, it disarms itself and clears the IA32_TSC_DEADLINE MSR". This patch clears the MSR upon timer interrupt delivery which delivered on deadline mode. Since the MSR may be reconfigured while an interrupt is pending, causing the new value to be overriden, pending timer interrupts are checked before setting a new deadline. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>