summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-03-16KVM: Move memslot deletion to helper functionSean Christopherson
Move memslot deletion into its own routine so that the success path for other memslot updates does not need to use kvm_free_memslot(), i.e. can explicitly destroy the dirty bitmap when necessary. This paves the way for dropping @dont from kvm_free_memslot(), i.e. all callers now pass NULL for @dont. Add a comment above the code to make a copy of the existing memslot prior to deletion, it is not at all obvious that the pointer will become stale during sorting and/or installation of new memslots. Note, kvm_arch_commit_memory_region() allows an architecture to free resources when moving a memslot or changing its flags, e.g. x86 frees its arch specific memslot metadata during commit_memory_region(). Acked-by: Christoffer Dall <christoffer.dall@arm.com> Tested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Free arrays for old memslot when moving memslot's base gfnSean Christopherson
Explicitly free the metadata arrays (stored in slot->arch) in the old memslot structure when moving the memslot's base gfn is committed. This eliminates x86's dependency on kvm_free_memslot() being called when a memslot move is committed, and paves the way for removing the funky code in kvm_free_memslot() that conditionally frees structures based on its @dont param. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Drop "const" attribute from old memslot in commit_memory_region()Sean Christopherson
Drop the "const" attribute from @old in kvm_arch_commit_memory_region() to allow arch specific code to free arch specific resources in the old memslot without having to cast away the attribute. Freeing resources in kvm_arch_commit_memory_region() paves the way for simplifying kvm_free_memslot() by eliminating the last usage of its @dont param. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Move setting of memslot into helper routineSean Christopherson
Split out the core functionality of setting a memslot into a separate helper in preparation for moving memslot deletion into its own routine. Tested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Refactor error handling for setting memory regionSean Christopherson
Replace a big pile o' gotos with returns to make it more obvious what error code is being returned, and to prepare for refactoring the functional, i.e. post-checks, portion of __kvm_set_memory_region(). Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Explicitly free allocated-but-unused dirty bitmapSean Christopherson
Explicitly free an allocated-but-unused dirty bitmap instead of relying on kvm_free_memslot() if an error occurs in __kvm_set_memory_region(). There is no longer a need to abuse kvm_free_memslot() to free arch specific resources as arch specific code is now called only after the common flow is guaranteed to succeed. Arch code can still fail, but it's responsible for its own cleanup in that case. Eliminating the error path's abuse of kvm_free_memslot() paves the way for simplifying kvm_free_memslot(), i.e. dropping its @dont param. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Drop kvm_arch_create_memslot()Sean Christopherson
Remove kvm_arch_create_memslot() now that all arch implementations are effectively nops. Removing kvm_arch_create_memslot() eliminates the possibility for arch specific code to allocate memory prior to setting a memslot, which sets the stage for simplifying kvm_free_memslot(). Cc: Janosch Frank <frankja@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Allocate memslot resources during prepare_memory_region()Sean Christopherson
Allocate the various metadata structures associated with a new memslot during kvm_arch_prepare_memory_region(), which paves the way for removing kvm_arch_create_memslot() altogether. Moving x86's memory allocation only changes the order of kernel memory allocations between x86 and common KVM code. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: PPC: Move memslot memory allocation into prepare_memory_region()Sean Christopherson
Allocate the rmap array during kvm_arch_prepare_memory_region() to pave the way for removing kvm_arch_create_memslot() altogether. Moving PPC's memory allocation only changes the order of kernel memory allocations between PPC and common KVM code. No functional change intended. Acked-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Don't free new memslot if allocation of said memslot failsSean Christopherson
The two implementations of kvm_arch_create_memslot() in x86 and PPC are both good citizens and free up all local resources if creation fails. Return immediately (via a superfluous goto) instead of calling kvm_free_memslot(). Note, the call to kvm_free_memslot() is effectively an expensive nop in this case as there are no resources to be freed. No functional change intended. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Reinstall old memslots if arch preparation failsSean Christopherson
Reinstall the old memslots if preparing the new memory region fails after invalidating a to-be-{re}moved memslot. Remove the superfluous 'old_memslots' variable so that it's somewhat clear that the error handling path needs to free the unused memslots, not simply the 'old' memslots. Fixes: bc6678a33d9b9 ("KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update") Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Allocate new rmap and large page tracking when moving memslotSean Christopherson
Reallocate a rmap array and recalcuate large page compatibility when moving an existing memslot to correctly handle the alignment properties of the new memslot. The number of rmap entries required at each level is dependent on the alignment of the memslot's base gfn with respect to that level, e.g. moving a large-page aligned memslot so that it becomes unaligned will increase the number of rmap entries needed at the now unaligned level. Not updating the rmap array is the most obvious bug, as KVM accesses garbage data beyond the end of the rmap. KVM interprets the bad data as pointers, leading to non-canonical #GPs, unexpected #PFs, etc... general protection fault: 0000 [#1] SMP CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:rmap_get_first+0x37/0x50 [kvm] Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3 RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246 RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012 RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570 RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8 R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000 R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8 FS: 00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0 Call Trace: kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm] __kvm_set_memory_region.part.64+0x559/0x960 [kvm] kvm_set_memory_region+0x45/0x60 [kvm] kvm_vm_ioctl+0x30f/0x920 [kvm] do_vfs_ioctl+0xa1/0x620 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4c/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f7ed9911f47 Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47 RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004 RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700 R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000 R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000 Modules linked in: kvm_intel kvm irqbypass ---[ end trace 0c5f570b3358ca89 ]--- The disallow_lpage tracking is more subtle. Failure to update results in KVM creating large pages when it shouldn't, either due to stale data or again due to indexing beyond the end of the metadata arrays, which can lead to memory corruption and/or leaking data to guest/userspace. Note, the arrays for the old memslot are freed by the unconditional call to kvm_free_memslot() in __kvm_set_memory_region(). Fixes: 05da45583de9b ("KVM: MMU: large page support") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move gpa_val and gpa_available into the emulator contextSean Christopherson
Move the GPA tracking into the emulator context now that the context is guaranteed to be initialized via __init_emulate_ctxt() prior to dereferencing gpa_{available,val}, i.e. now that seeing a stale gpa_available will also trigger a WARN due to an invalid context. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Add EMULTYPE_PF when emulation is triggered by a page faultSean Christopherson
Add a new emulation type flag to explicitly mark emulation related to a page fault. Move the propation of the GPA into the emulator from the page fault handler into x86_emulate_instruction, using EMULTYPE_PF as an indicator that cr2 is valid. Similarly, don't propagate cr2 into the exception.address when it's *not* valid. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: apic: remove unused function apic_lvt_vector()Miaohe Lin
The function apic_lvt_vector() is unused now, remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: VMX: Add 'else' to split mutually exclusive caseMiaohe Lin
Each if branch in handle_external_interrupt_irqoff() is mutually exclusive. Add 'else' to make it clear and also avoid some unnecessary check. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: eliminate some unreachable codeMiaohe Lin
These code are unreachable, remove them. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Fix print format and coding styleMiaohe Lin
Use %u to print u32 var and correct some coding style. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: vmx: rewrite the comment in vmx_get_mt_maskChia-I Wu
Better reflect the structure of the code and metion why we could not always honor the guest. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Cc: Gurchetan Singh <gurchetansingh@chromium.org> Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: selftests: Convert some printf's to pr_info'sAndrew Jones
We leave some printf's because they inform the user the test is being skipped. QUIET should not disable those. We also leave the printf's used for help text. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: selftests: Rework debug message printingAndrew Jones
There were a few problems with the way we output "debug" messages. The first is that we used DEBUG() which is defined when NDEBUG is not defined, but NDEBUG will never be defined for kselftests because it relies too much on assert(). The next is that most of the DEBUG() messages were actually "info" messages, which users may want to turn off if they just want a silent test that either completes or asserts. Finally, a debug message output from a library function, and thus for all tests, was annoying when its information wasn't interesting for a test. Rework these messages so debug messages only output when DEBUG is defined and info messages output unless QUIET is defined. Also name the functions pr_debug and pr_info and make sure that when they're disabled we eat all the inputs. The later avoids unused variable warnings when the variables were only defined for the purpose of printing. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: selftests: Time guest demand pagingBen Gardon
In order to quantify demand paging performance, time guest execution during demand paging. Signed-off-by: Ben Gardon <bgardon@google.com> [Move timespec-diff to test_util.h] Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: selftests: Support multiple vCPUs in demand paging testBen Gardon
Most VMs have multiple vCPUs, the concurrent execution of which has a substantial impact on demand paging performance. Add an option to create multiple vCPUs to each access disjoint regions of memory. Signed-off-by: Ben Gardon <bgardon@google.com> [guest_code() can't return, use GUEST_ASSERT(). Ensure the number of guests pages is compatible with the host.] Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: selftests: Add support for vcpu_args_set to aarch64 and s390xBen Gardon
Currently vcpu_args_set is only implemented for x86. This makes writing tests with multiple vCPUs difficult as each guest vCPU must either a.) do the same thing or b.) derive some kind of unique token from it's registers or the architecture. To simplify the process of writing tests with multiple vCPUs for s390 and aarch64, add set args functions for those architectures. Signed-off-by: Ben Gardon <bgardon@google.com> [Fixed array index (num => i) and made some style changes.] Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: selftests: Pass args to vCPU in global vCPU args structBen Gardon
In preparation for supporting multiple vCPUs in the demand paging test, pass arguments to the vCPU in a consolidated global struct instead of syncing multiple globals. Signed-off-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: selftests: Add memory size parameter to the demand paging testBen Gardon
Add an argument to allow the demand paging test to work on larger and smaller guest sizes. Signed-off-by: Ben Gardon <bgardon@google.com> [Rewrote parse_size() to simplify and provide user more flexibility as to how sizes are input. Also fixed size overflow assert.] Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: selftests: Add configurable demand paging delayBen Gardon
When running the demand paging test with the -u option, the User Fault FD handler essentially adds an arbitrary delay to page fault resolution. To enable better simulation of a real demand paging scenario, add a configurable delay to the UFFD handler. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: selftests: Add demand paging content to the demand paging testBen Gardon
The demand paging test is currently a simple page access test which, while potentially useful, doesn't add much versus the existing dirty logging test. To improve the demand paging test, add a basic userfaultfd demand paging implementation. Signed-off-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-24KVM: selftests: Create a demand paging testBen Gardon
While userfaultfd, KVM's demand paging implementation, is not specific to KVM, having a benchmark for its performance will be useful for guiding performance improvements to KVM. As a first step towards creating a userfaultfd demand paging test, create a simple memory access test, based on dirty_log_test. Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-24KVM: selftests: Introduce num-pages conversion utilitiesAndrew Jones
Guests and hosts don't have to have the same page size. This means calculations are necessary when selecting the number of guest pages to allocate in order to ensure the number is compatible with the host. Provide utilities to help with those calculations and apply them where appropriate. We also revert commit bffed38d4fb5 ("kvm: selftests: aarch64: dirty_log_test: fix unaligned memslot size") and then use vm_adjust_num_guest_pages() there instead. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-24KVM: selftests: Introduce vm_guest_mode_paramsAndrew Jones
This array will allow us to easily translate modes to their parameter values. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-24KVM: selftests: Rename vm_guest_mode_paramsAndrew Jones
We're going to want this name in the library code, so use a shorter name in the tests. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-24KVM: selftests: aarch64: Remove unnecessary ifdefsAndrew Jones
Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-24KVM: selftests: Remove unnecessary definesAndrew Jones
BITS_PER_LONG and friends are provided by linux/bitops.h Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-24KVM: selftests: aarch64: Use stream when givenAndrew Jones
I'm not sure how we ended up using printf instead of fprintf in virt_dump(). Fix it. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-24KVM: s390: rstify new ioctls in api.rstChristian Borntraeger
We also need to rstify the new ioctls that we added in parallel to the rstification of the kvm docs. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Check IO instruction VM-exit conditionsOliver Upton
Consult the 'unconditional IO exiting' and 'use IO bitmaps' VM-execution controls when checking instruction interception. If the 'use IO bitmaps' VM-execution control is 1, check the instruction access against the IO bitmaps to determine if the instruction causes a VM-exit. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Refactor IO bitmap checks into helper functionOliver Upton
Checks against the IO bitmap are useful for both instruction emulation and VM-exit reflection. Refactor the IO bitmap checks into a helper function. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Don't emulate instructions in guest modePaolo Bonzini
vmx_check_intercept is not yet fully implemented. To avoid emulating instructions disallowed by the L1 hypervisor, refuse to emulate instructions by default. Cc: stable@vger.kernel.org [Made commit, added commit msg - Oliver] Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Emulate MTF when performing instruction emulationOliver Upton
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution. However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2. Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: fix error handling in svm_hardware_setupLi RongQing
rename svm_hardware_unsetup as svm_hardware_teardown, move it before svm_hardware_setup, and call it to free all memory if fail to setup in svm_hardware_setup, otherwise memory will be leaked remove __exit attribute for it since it is called in __init function Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: SVM: Fix potential memory leak in svm_cpu_init()Miaohe Lin
When kmalloc memory for sd->sev_vmcbs failed, we forget to free the page held by sd->save_area. Also get rid of the var r as '-ENOMEM' is actually the only possible outcome here. Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: apic: avoid calculating pending eoi from an uninitialized valMiaohe Lin
When pv_eoi_get_user() fails, 'val' may remain uninitialized and the return value of pv_eoi_get_pending() becomes random. Fix the issue by initializing the variable. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when ↵Vitaly Kuznetsov
apicv is globally disabled When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*), nothing happens to VMX MSRs on the already existing vCPUs, however, all new ones are created with PIN_BASED_POSTED_INTR filtered out. This is very confusing and results in the following picture inside the guest: $ rdmsr -ax 0x48d ff00000016 7f00000016 7f00000016 7f00000016 This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three. L1 hypervisor may only check CPU0's controls to find out what features are available and it will be very confused later. Switch to setting PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1Vitaly Kuznetsov
Even when APICv is disabled for L1 it can (and, actually, is) still available for L2, this means we need to always call vmx_deliver_nested_posted_interrupt() when attempting an interrupt delivery. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabledSuravee Suthikulpanit
Launching VM w/ AVIC disabled together with pass-through device results in NULL pointer dereference bug with the following call trace. RIP: 0010:svm_refresh_apicv_exec_ctrl+0x17e/0x1a0 [kvm_amd] Call Trace: kvm_vcpu_update_apicv+0x44/0x60 [kvm] kvm_arch_vcpu_ioctl_run+0x3f4/0x1c80 [kvm] kvm_vcpu_ioctl+0x3d8/0x650 [kvm] do_vfs_ioctl+0xaa/0x660 ? tomoyo_file_ioctl+0x19/0x20 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x57/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Investigation shows that this is due to the uninitialized usage of struct vapu_svm.ir_list in the svm_set_pi_irte_mode(), which is called from svm_refresh_apicv_exec_ctrl(). The ir_list is initialized only if AVIC is enabled. So, fixes by adding a check if AVIC is enabled in the svm_refresh_apicv_exec_ctrl(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206579 Fixes: 8937d762396d ("kvm: x86: svm: Add support to (de)activate posted interrupts.") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSEXiaoyao Li
Commit 159348784ff0 ("x86/vmx: Introduce VMX_FEATURES_*") missed bit 26 (enable user wait and pause) of Secondary Processor-based VM-Execution Controls. Add VMX_FEATURE_USR_WAIT_PAUSE flag so that it shows up in /proc/cpuinfo, and use it to define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE to make them uniform. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadowwanpeng li
For the duration of mapping eVMCS, it derefences ->memslots without holding ->srcu or ->slots_lock when accessing hv assist page. This patch fixes it by moving nested_sync_vmcs12_to_shadow to prepare_guest_switch, where the SRCU is already taken. It can be reproduced by running kvm's evmcs_test selftest. ============================= warning: suspicious rcu usage 5.6.0-rc1+ #53 tainted: g w ioe ----------------------------- ./include/linux/kvm_host.h:623 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by evmcs_test/8507: #0: ffff9ddd156d00d0 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680 [kvm] stack backtrace: cpu: 6 pid: 8507 comm: evmcs_test tainted: g w ioe 5.6.0-rc1+ #53 hardware name: dell inc. optiplex 7040/0jctf8, bios 1.4.9 09/12/2016 call trace: dump_stack+0x68/0x9b kvm_read_guest_cached+0x11d/0x150 [kvm] kvm_hv_get_assist_page+0x33/0x40 [kvm] nested_enlightened_vmentry+0x2c/0x60 [kvm_intel] nested_vmx_handle_enlightened_vmptrld.part.52+0x32/0x1c0 [kvm_intel] nested_sync_vmcs12_to_shadow+0x439/0x680 [kvm_intel] vmx_vcpu_run+0x67a/0xe60 [kvm_intel] vcpu_enter_guest+0x35e/0x1bc0 [kvm] kvm_arch_vcpu_ioctl_run+0x40b/0x670 [kvm] kvm_vcpu_ioctl+0x370/0x680 [kvm] ksys_ioctl+0x235/0x850 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x77/0x780 entry_syscall_64_after_hwframe+0x49/0xbe Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOIMiaohe Lin
Commit 13db77347db1 ("KVM: x86: don't notify userspace IOAPIC on edge EOI") said, edge-triggered interrupts don't set a bit in TMR, which means that IOAPIC isn't notified on EOI. And var level indicates level-triggered interrupt. But commit 3159d36ad799 ("KVM: x86: use generic function for MSI parsing") replace var level with irq.level by mistake. Fix it by changing irq.level to irq.trig_mode. Cc: stable@vger.kernel.org Fixes: 3159d36ad799 ("KVM: x86: use generic function for MSI parsing") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-20kvm/emulate: fix a -Werror=cast-function-typeQian Cai
arch/x86/kvm/emulate.c: In function 'x86_emulate_insn': arch/x86/kvm/emulate.c:5686:22: error: cast between incompatible function types from 'int (*)(struct x86_emulate_ctxt *)' to 'void (*)(struct fastop *)' [-Werror=cast-function-type] rc = fastop(ctxt, (fastop_t)ctxt->execute); Fix it by using an unnamed union of a (*execute) function pointer and a (*fastop) function pointer. Fixes: 3009afc6e39e ("KVM: x86: Use a typedef for fastop functions") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>