summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2015-11-10KVM: VMX: Dump TSC multiplier in dump_vmcs()Haozhong Zhang
This patch enhances dump_vmcs() to dump the value of TSC multiplier field in VMCS. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSCHaozhong Zhang
This patch makes kvm-intel to return a scaled host TSC plus the TSC offset when handling guest readings to MSR_IA32_TSC. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: VMX: Setup TSC scaling ratio when a vcpu is loadedHaozhong Zhang
This patch makes kvm-intel module to load TSC scaling ratio into TSC multiplier field of VMCS when a vcpu is loaded, so that TSC scaling ratio can take effect if VMX TSC scaling is enabled. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: VMX: Enable and initialize VMX TSC scalingHaozhong Zhang
This patch exhances kvm-intel module to enable VMX TSC scaling and collects information of TSC scaling ratio during initialization. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: x86: Use the correct vcpu's TSC rate to compute time scaleHaozhong Zhang
This patch makes KVM use virtual_tsc_khz rather than the host TSC rate as vcpu's TSC rate to compute the time scale if TSC scaling is enabled. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()Haozhong Zhang
Both VMX and SVM scales the host TSC in the same way in call-back read_l1_tsc(), so this patch moves the scaling logic from call-back read_l1_tsc() to a common function kvm_read_l1_tsc(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()Haozhong Zhang
For both VMX and SVM, if the 2nd argument of call-back adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale it first. This patch moves this common TSC scaling logic to its caller adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to adjust_tsc_offset_guest(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: x86: Replace call-back compute_tsc_offset() with a common functionHaozhong Zhang
Both VMX and SVM calculate the tsc-offset in the same way, so this patch removes the call-back compute_tsc_offset() and replaces it with a common function kvm_compute_tsc_offset(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: x86: Replace call-back set_tsc_khz() with a common functionHaozhong Zhang
Both VMX and SVM propagate virtual_tsc_khz in the same way, so this patch removes the call-back set_tsc_khz() and replaces it with a common function. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: x86: Add a common TSC scaling functionHaozhong Zhang
VMX and SVM calculate the TSC scaling ratio in a similar logic, so this patch generalizes it to a common TSC scaling function. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> [Inline the multiplication and shift steps into mul_u64_u64_shr. Remove BUG_ON. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_archHaozhong Zhang
This patch moves the field of TSC scaling ratio from the architecture struct vcpu_svm to the common struct kvm_vcpu_arch. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: x86: Collect information for setting TSC scaling ratioHaozhong Zhang
The number of bits of the fractional part of the 64-bit TSC scaling ratio in VMX and SVM is different. This patch makes the architecture code to collect the number of fractional bits and other related information into variables that can be accessed in the common code. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: x86: declare a few variables as __read_mostlyPaolo Bonzini
These include module parameters and variables that are set by kvm_x86_ops->hardware_setup. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_commonPaolo Bonzini
They are exactly the same, except that handle_mmio_page_fault has an unused argument and a call to WARN_ON. Remove the unused argument from the callers, and move the warning to (the former) handle_mmio_page_fault_common. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05KVM: VMX: Fix commit which broke PMLKai Huang
I found PML was broken since below commit: commit feda805fe7c4ed9cf78158e73b1218752e3b4314 Author: Xiao Guangrong <guangrong.xiao@linux.intel.com> Date: Wed Sep 9 14:05:55 2015 +0800 KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update Unify the update in vmx_cpuid_update() Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> [Rewrite to use vmcs_set_secondary_exec_control. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> The reason is in above commit vmx_cpuid_update calls vmx_secondary_exec_control, in which currently SECONDARY_EXEC_ENABLE_PML bit is cleared unconditionally (as PML is enabled in creating vcpu). Therefore if vcpu_cpuid_update is called after vcpu is created, PML will be disabled unexpectedly while log-dirty code still thinks PML is used. Fix this by clearing SECONDARY_EXEC_ENABLE_PML in vmx_secondary_exec_control only when PML is not supported or not enabled (!enable_pml). This is more reasonable as PML is currently either always enabled or disabled. With this explicit updating SECONDARY_EXEC_ENABLE_PML in vmx_enable{disable}_pml is not needed so also rename vmx_enable{disable}_pml to vmx_create{destroy}_pml_buffer. Fixes: feda805fe7c4ed9cf78158e73b1218752e3b4314 Signed-off-by: Kai Huang <kai.huang@linux.intel.com> [While at it, change a wrong ASSERT to an "if". The condition can happen if creating the VCPU fails with ENOMEM. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()Laszlo Ersek
Commit b18d5431acc7 ("KVM: x86: fix CR0.CD virtualization") was technically correct, but it broke OVMF guests by slowing down various parts of the firmware. Commit fb279950ba02 ("KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED") quirked the first function modified by b18d5431acc7, vmx_get_mt_mask(), for OVMF's sake. This restored the speed of the OVMF code that runs before PlatformPei (including the memory intensive LZMA decompression in SEC). This patch extends the quirk to the second function modified by b18d5431acc7, kvm_set_cr0(). It eliminates the intrusive slowdown that hits the EFI_MP_SERVICES_PROTOCOL implementation of edk2's UefiCpuPkg/CpuDxe -- which is built into OVMF --, when CpuDxe starts up all APs at once for initialization, in order to count them. We also carry over the kvm_arch_has_noncoherent_dma() sub-condition from the other half of the original commit b18d5431acc7. Fixes: b18d5431acc7a2fd22767925f3a6f597aa4bd29e Cc: stable@vger.kernel.org Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Tested-by: Janusz Mocek <januszmk6@gmail.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com># Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04KVM: x86: allow RSM from 64-bit modePaolo Bonzini
The SDM says that exiting system management mode from 64-bit mode is invalid, but that would be too good to be true. But actually, most of the code is already there to support exiting from compat mode (EFER.LME=1, EFER.LMA=0). Getting all the way from 64-bit mode to real mode only requires clearing CS.L and CR4.PCIDE. Cc: stable@vger.kernel.org Fixes: 660a5d517aaab9187f93854425c4c63f4a09195c Tested-by: Laszlo Ersek <lersek@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04KVM: VMX: fix SMEP and SMAP without EPTRadim Krčmář
The comment in code had it mostly right, but we enable paging for emulated real mode regardless of EPT. Without EPT (which implies emulated real mode), secondary VCPUs won't start unless we disable SM[AE]P when the guest doesn't use paging. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04KVM: x86: move kvm_set_irq_inatomic to legacy device assignmentPaolo Bonzini
The function is not used outside device assignment, and kvm_arch_set_irq_inatomic has a different prototype. Move it here and make it static to avoid confusion. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04KVM: device assignment: remove pointless #ifdefsPaolo Bonzini
The symbols are always defined. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomicPaolo Bonzini
We do not want to do too much work in atomic context, in particular not walking all the VCPUs of the virtual machine. So we want to distinguish the architecture-specific injection function for irqfd from kvm_set_msi. Since it's still empty, reuse the newly added kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04KVM: x86: zero apic_arb_prio on resetRadim Krčmář
BSP doesn't get INIT so its apic_arb_prio isn't zeroed after reboot. BSP won't get lowest priority interrupts until other VCPUs get enough interrupts to match their pre-reboot apic_arb_prio. That behavior doesn't fit into KVM's round-robin-like interpretation of lowest priority delivery ... userspace should KVM_SET_LAPIC on reset, so just zero apic_arb_prio there. Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04drivers/hv: share Hyper-V SynIC constants with userspaceAndrey Smetanin
Moved Hyper-V synic contants from guest Hyper-V drivers private header into x86 arch uapi Hyper-V header. Added Hyper-V synic msr's flags into x86 arch uapi Hyper-V header. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04KVM: x86: handle SMBASE as physical address in RSMRadim Krčmář
GET_SMSTATE depends on real mode to ensure that smbase+offset is treated as a physical address, which has already caused a bug after shuffling the code. Enforce physical addressing. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04KVM: x86: add read_phys to x86_emulate_opsRadim Krčmář
We want to read the physical memory when emulating RSM. X86EMUL_IO_NEEDED is returned on all errors for consistency with other helpers. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04KVM: x86: removing unused variableSaurabh Sengar
removing unused variables, found by coccinelle Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04Merge tag 'kvm-arm-for-4.4' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for v4.4-rc1 Includes a number of fixes for the arch-timer, introducing proper level-triggered semantics for the arch-timers, a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups getting rid of redundant state, and finally a stylistic change that gets rid of some ctags warnings. Conflicts: arch/x86/include/asm/kvm_host.h
2015-11-02Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD There's nothing much in the way of new features this time; it's mostly bug fixes, plus Nikunj has implemented support for KVM_CAP_NR_MEMSLOTS.
2015-11-02Merge tag 'kvm-s390-next-20151028' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Bugfix and cleanups There is one important bug fix for a potential memory corruption and/or guest errors for guests with 63 or 64 vCPUs. This fix would qualify for 4.3 but is some days too late giving that we are about to release 4.3. Given that this patch is cc stable >= 3.15 anyway, we can handle it via 4.4. merge window. This pull request also contains two cleanups.
2015-10-29KVM: s390: use simple switch statement as multiplexerChristian Borntraeger
We currently do some magic shifting (by exploiting that exit codes are always a multiple of 4) and a table lookup to jump into the exit handlers. This causes some calculations and checks, just to do an potentially expensive function call. Changing that to a switch statement gives the compiler the chance to inline and dynamically decide between jump tables or inline compare and branches. In addition it makes the code more readable. bloat-o-meter gives me a small reduction in code size: add/remove: 0/7 grow/shrink: 1/1 up/down: 986/-1334 (-348) function old new delta kvm_handle_sie_intercept 72 1058 +986 handle_prog 704 696 -8 handle_noop 54 - -54 handle_partial_execution 60 - -60 intercept_funcs 120 - -120 handle_instruction 198 - -198 handle_validity 210 - -210 handle_stop 316 - -316 handle_external_interrupt 368 - -368 Right now my gcc does conditional branches instead of jump tables. The inlining seems to give us enough cycles as some micro-benchmarking shows minimal improvements, but still in noise. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-29KVM: s390: drop useless newline in debugging dataChristian Borntraeger
the s390 debug feature does not need newlines. In fact it will result in empty lines. Get rid of 4 leftovers. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-29KVM: s390: SCA must not cross page boundariesDavid Hildenbrand
We seemed to have missed a few corner cases in commit f6c137ff00a4 ("KVM: s390: randomize sca address"). The SCA has a maximum size of 2112 bytes. By setting the sca_offset to some unlucky numbers, we exceed the page. 0x7c0 (1984) -> Fits exactly 0x7d0 (2000) -> 16 bytes out 0x7e0 (2016) -> 32 bytes out 0x7f0 (2032) -> 48 bytes out One VCPU entry is 32 bytes long. For the last two cases, we actually write data to the other page. 1. The address of the VCPU. 2. Injection/delivery/clearing of SIGP externall calls via SIGP IF. Especially the 2. happens regularly. So this could produce two problems: 1. The guest losing/getting external calls. 2. Random memory overwrites in the host. So this problem happens on every 127 + 128 created VM with 64 VCPUs. Cc: stable@vger.kernel.org # v3.15+ Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-22arm64: kvm: restore EL1N SP for panicMark Rutland
If we panic in hyp mode, we inject a call to panic() into the EL1N host kernel. If a guest context is active, we first attempt to restore the minimal amount of state necessary to execute the host kernel with restore_sysregs. However, the SP is restored as part of restore_common_regs, and so we may return to the host's panic() function with the SP of the guest. Any calculations based on the SP will be bogus, and any attempt to access the stack will result in recursive data aborts. When running Linux as a guest, the guest's EL1N SP is like to be some valid kernel address. In this case, the host kernel may use that region as a stack for panic(), corrupting it in the process. Avoid the problem by restoring the host SP prior to returning to the host. To prevent misleading backtraces in the host, the FP is zeroed at the same time. We don't need any of the other "common" registers in order to panic successfully. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: <kvmarm@lists.cs.columbia.edu> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22arm/arm64: KVM: Improve kvm_exit tracepointChristoffer Dall
The ARM architecture only saves the exit class to the HSR (ESR_EL2 for arm64) on synchronous exceptions, not on asynchronous exceptions like an IRQ. However, we only report the exception class on kvm_exit, which is confusing because an IRQ looks like it exited at some PC with the same reason as the previous exit. Add a lookup table for the exception index and prepend the kvm_exit tracepoint text with the exception type to clarify this situation. Also resolve the exception class (EC) to a human-friendly text version so the trace output becomes immediately usable for debugging this code. Cc: Wei Huang <wei@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22KVM: arm/arm64: implement kvm_arm_[halt,resume]_guestEric Auger
We introduce kvm_arm_halt_guest and resume functions. They will be used for IRQ forward state change. Halt is synchronous and prevents the guest from being re-entered. We use the same mechanism put in place for PSCI former pause, now renamed power_off. A new flag is introduced in arch vcpu state, pause, only meant to be used by those functions. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22KVM: arm/arm64: check power_off in critical section before VCPU runEric Auger
In case a vcpu off PSCI call is called just after we executed the vcpu_sleep check, we can enter the guest although power_off is set. Let's check the power_off state in the critical section, just before entering the guest. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reported-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22KVM: arm/arm64: check power_off in kvm_arch_vcpu_runnableEric Auger
kvm_arch_vcpu_runnable now also checks whether the power_off flag is set. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22KVM: arm/arm64: rename pause into power_offEric Auger
The kvm_vcpu_arch pause field is renamed into power_off to prepare for the introduction of a new pause field. Also vcpu_pause is renamed into vcpu_sleep since we will sleep until both power_off and pause are false. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22arm/arm64: KVM : Enable vhost device selection under KVM config menuWei Huang
vhost drivers provide guest VMs with better I/O performance and lower CPU utilization. This patch allows users to select vhost devices under KVM configuration menu on ARM. This makes vhost support on arm/arm64 on a par with other architectures (e.g. x86, ppc). Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22arm/arm64: KVM: Rework the arch timer to use level-triggered semanticsChristoffer Dall
The arch timer currently uses edge-triggered semantics in the sense that the line is never sampled by the vgic and lowering the line from the timer to the vgic doesn't have any effect on the pending state of virtual interrupts in the vgic. This means that we do not support a guest with the otherwise valid behavior of (1) disable interrupts (2) enable the timer (3) disable the timer (4) enable interrupts. Such a guest would validly not expect to see any interrupts on real hardware, but will see interrupts on KVM. This patch fixes this shortcoming through the following series of changes. First, we change the flow of the timer/vgic sync/flush operations. Now the timer is always flushed/synced before the vgic, because the vgic samples the state of the timer output. This has the implication that we move the timer operations in to non-preempible sections, but that is fine after the previous commit getting rid of hrtimer schedules on every entry/exit. Second, we change the internal behavior of the timer, letting the timer keep track of its previous output state, and only lower/raise the line to the vgic when the state changes. Note that in theory this could have been accomplished more simply by signalling the vgic every time the state *potentially* changed, but we don't want to be hitting the vgic more often than necessary. Third, we get rid of the use of the map->active field in the vgic and instead simply set the interrupt as active on the physical distributor whenever the input to the GIC is asserted and conversely clear the physical active state when the input to the GIC is deasserted. Fourth, and finally, we now initialize the timer PPIs (and all the other unused PPIs for now), to be level-triggered, and modify the sync code to sample the line state on HW sync and re-inject a new interrupt if it is still pending at that time. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_blockChristoffer Dall
We currently schedule a soft timer every time we exit the guest if the timer did not expire while running the guest. This is really not necessary, because the only work we do in the timer work function is to kick the vcpu. Kicking the vcpu does two things: (1) If the vpcu thread is on a waitqueue, make it runnable and remove it from the waitqueue. (2) If the vcpu is running on a different physical CPU from the one doing the kick, it sends a reschedule IPI. The second case cannot happen, because the soft timer is only ever scheduled when the vcpu is not running. The first case is only relevant when the vcpu thread is on a waitqueue, which is only the case when the vcpu thread has called kvm_vcpu_block(). Therefore, we only need to make sure a timer is scheduled for kvm_vcpu_block(), which we do by encapsulating all calls to kvm_vcpu_block() with kvm_timer_{un}schedule calls. Additionally, we only schedule a soft timer if the timer is enabled and unmasked, since it is useless otherwise. Note that theoretically userspace can use the SET_ONE_REG interface to change registers that should cause the timer to fire, even if the vcpu is blocked without a scheduled timer, but this case was not supported before this patch and we leave it for future work for now. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22KVM: Add kvm_arch_vcpu_{un}blocking callbacksChristoffer Dall
Some times it is useful for architecture implementations of KVM to know when the VCPU thread is about to block or when it comes back from blocking (arm/arm64 needs to know this to properly implement timers, for example). Therefore provide a generic architecture callback function in line with what we do elsewhere for KVM generic-arch interactions. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-21KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit pathGautham R. Shenoy
Currently a CPU running a guest can receive a H_DOORBELL in the following two cases: 1) When the CPU is napping due to CEDE or there not being a guest vcpu. 2) The CPU is running the guest vcpu. Case 1), the doorbell message is not cleared since we were waking up from nap. Hence when the EE bit gets set on transition from guest to host, the H_DOORBELL interrupt is delivered to the host and the corresponding handler is invoked. However in Case 2), the message gets cleared by the action of taking the H_DOORBELL interrupt. Since the CPU was running a guest, instead of invoking the doorbell handler, the code invokes the second-level interrupt handler to switch the context from the guest to the host. At this point the setting of the EE bit doesn't result in the CPU getting the doorbell interrupt since it has already been delivered once. So, the handler for this doorbell is never invoked! This causes softlockups if the missed DOORBELL was an IPI sent from a sibling subcore on the same CPU. This patch fixes it by explitly invoking the doorbell handler on the exit path if the exit reason is H_DOORBELL similar to the way an EXTERNAL interrupt is handled. Since this will also handle Case 1), we can unconditionally clear the doorbell message in kvmppc_check_wake_reason. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-21KVM: PPC: Implement extension to report number of memslotsNikunj A Dadhania
QEMU assumes 32 memslots if this extension is not implemented. Although, current value of KVM_USER_MEM_SLOTS is 32, once KVM_USER_MEM_SLOTS changes QEMU would take a wrong value. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-21KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEsPaul Mackerras
This fixes a bug where the old HPTE value returned by H_REMOVE has the valid bit clear if the HPTE was an absent HPTE, as happens for HPTEs for emulated MMIO pages and for RAM pages that have been paged out by the host. If the absent bit is set, we clear it and set the valid bit, because from the guest's point of view, the HPTE is valid. Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-21KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctlPaul Mackerras
Currently the KVM_PPC_ALLOCATE_HTAB will try to allocate the requested size of HPT, and if that is not possible, then try to allocate smaller sizes (by factors of 2) until either a minimum is reached or the allocation succeeds. This is not ideal for userspace, particularly in migration scenarios, where the destination VM really does require the size requested. Also, the minimum HPT size of 256kB may be insufficient for the guest to run successfully. This removes the fallback to smaller sizes on allocation failure for the KVM_PPC_ALLOCATE_HTAB ioctl. The fallback still exists for the case where the HPT is allocated at the time the first VCPU is run, if no HPT has been allocated by ioctl by that time. Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-20KVM: arm: use GIC support unconditionallyArnd Bergmann
The vgic code on ARM is built for all configurations that enable KVM, but the parent_data field that it references is only present when CONFIG_IRQ_DOMAIN_HIERARCHY is set: virt/kvm/arm/vgic.c: In function 'kvm_vgic_map_phys_irq': virt/kvm/arm/vgic.c:1781:13: error: 'struct irq_data' has no member named 'parent_data' This flag is implied by the GIC driver, and indeed the VGIC code only makes sense if a GIC is present. This changes the CONFIG_KVM symbol to always select GIC, which avoids the issue. Fixes: 662d9715840 ("arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-20KVM: arm/arm64: Fix memory leak if timer initialization failsPavel Fedin
Jump to correct label and free kvm_host_cpu_state Reviewed-by: Wei Huang <wei@redhat.com> Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-19KVM: x86: MMU: Initialize force_pt_level before calling mapping_level()Takuya Yoshikawa
Commit fd1369021878 ("KVM: x86: MMU: Move mapping_level_dirty_bitmap() call in mapping_level()") forgot to initialize force_pt_level to false in FNAME(page_fault)() before calling mapping_level() like nonpaging_map() does. This can sometimes result in forcing page table level mapping unnecessarily. Fix this and move the first *force_pt_level check in mapping_level() before kvm_vcpu_gfn_to_memslot() call to make it a bit clearer that the variable must be initialized before mapping_level() gets called. This change can also avoid calling kvm_vcpu_gfn_to_memslot() when !check_hugepage_cache_consistency() check in tdp_page_fault() forces page table level mapping. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-19kvm: x86: zero EFER on INITPaolo Bonzini
Not zeroing EFER means that a 32-bit firmware cannot enter paging mode without clearing EFER.LME first (which it should not know about). Yang Zhang from Intel confirmed that the manual is wrong and EFER is cleared to zero on INIT. Fixes: d28bc9dd25ce023270d2e039e7c98d38ecbf7758 Cc: stable@vger.kernel.org Cc: Yang Z Zhang <yang.z.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>