summaryrefslogtreecommitdiff
path: root/virt/kvm/kvm_main.c
AgeCommit message (Collapse)Author
2015-04-21KVM: PPC: Book3S HV: Create debugfs file for each guest's HPTPaul Mackerras
This creates a debugfs directory for each HV guest (assuming debugfs is enabled in the kernel config), and within that directory, a file by which the contents of the guest's HPT (hashed page table) can be read. The directory is named vmnnnn, where nnnn is the PID of the process that created the guest. The file is named "htab". This is intended to help in debugging problems in the host's management of guest memory. The contents of the file consist of a series of lines like this: 3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196 The first field is the index of the entry in the HPT, the second and third are the HPT entry, so the third entry contains the real page number that is mapped by the entry if the entry's valid bit is set. The fourth field is the guest's view of the second doubleword of the entry, so it contains the guest physical address. (The format of the second through fourth fields are described in the Power ISA and also in arch/powerpc/include/asm/mmu-hash64.h.) Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.1 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. Summary: ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) KVM: use slowpath for cross page cached accesses kvm: mmu: lazy collapse small sptes into large sptes KVM: x86: Clear CR2 on VCPU reset KVM: x86: DR0-DR3 are not clear on reset KVM: x86: BSP in MSR_IA32_APICBASE is writable KVM: x86: simplify kvm_apic_map KVM: x86: avoid logical_map when it is invalid KVM: x86: fix mixed APIC mode broadcast KVM: x86: use MDA for interrupt matching kvm/ppc/mpic: drop unused IRQ_testbit KVM: nVMX: remove unnecessary double caching of MAXPHYADDR KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu KVM: vmx: pass error code with internal error #2 x86: vdso: fix pvclock races with task migration KVM: remove kvm_read_hva and kvm_read_hva_atomic KVM: x86: optimize delivery of TSC deadline timer interrupt KVM: x86: extract blocking logic from __vcpu_run kvm: x86: fix x86 eflags fixed bit KVM: s390: migrate vcpu interrupt state ...
2015-04-10KVM: use slowpath for cross page cached accessesRadim Krčmář
kvm_write_guest_cached() does not mark all written pages as dirty and code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot with cross page accesses. Fix all the easy way. The check is '<= 1' to have the same result for 'len = 0' cache anywhere in the page. (nr_pages_needed is 0 on page boundary.) Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <20150408121648.GA3519@potion.brq.redhat.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: remove kvm_read_hva and kvm_read_hva_atomicPaolo Bonzini
The corresponding write functions just use __copy_to_user. Do the same on the read side. This reverts what's left of commit 86ab8cffb498 (KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic, 2012-08-21) Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1427976500-28533-1-git-send-email-pbonzini@redhat.com>
2015-04-07Merge tag 'kvm-s390-next-20150331' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD Features and fixes for 4.1 (kvm/next) 1. Assorted changes 1.1 allow more feature bits for the guest 1.2 Store breaking event address on program interrupts 2. Interrupt handling rework 2.1 Fix copy_to_user while holding a spinlock (cc stable) 2.2 Rework floating interrupts to follow the priorities 2.3 Allow to inject all local interrupts via new ioctl 2.4 allow to get/set the full local irq state, e.g. for migration and introspection
2015-04-07Merge tag 'kvm-arm-for-4.1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups
2015-03-31KVM: s390: add ioctl to inject local interruptsJens Freimann
We have introduced struct kvm_s390_irq a while ago which allows to inject all kinds of interrupts as defined in the Principles of Operation. Add ioctl to inject interrupts with the extended struct kvm_s390_irq Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-03-26KVM: move iodev.h from virt/kvm/ to include/kvmAndre Przywara
iodev.h contains definitions for the kvm_io_bus framework. This is needed both by the generic KVM code in virt/kvm as well as by architecture specific code under arch/. Putting the header file in virt/kvm and using local includes in the architecture part seems at least dodgy to me, so let's move the file into include/kvm, so that a more natural "#include <kvm/iodev.h>" can be used by all of the code. This also solves a problem later when using struct kvm_io_device in arm_vgic.h. Fixing up the FSF address in the GPL header and a wrong include path on the way. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-26KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks.Nikolay Nikolaev
This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access. Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-23kvm: avoid page allocation failure in kvm_set_memory_region()Igor Mammedov
KVM guest can fail to startup with following trace on host: qemu-system-x86: page allocation failure: order:4, mode:0x40d0 Call Trace: dump_stack+0x47/0x67 warn_alloc_failed+0xee/0x150 __alloc_pages_direct_compact+0x14a/0x150 __alloc_pages_nodemask+0x776/0xb80 alloc_kmem_pages+0x3a/0x110 kmalloc_order+0x13/0x50 kmemdup+0x1b/0x40 __kvm_set_memory_region+0x24a/0x9f0 [kvm] kvm_set_ioapic+0x130/0x130 [kvm] kvm_set_memory_region+0x21/0x40 [kvm] kvm_vm_ioctl+0x43f/0x750 [kvm] Failure happens when attempting to allocate pages for 'struct kvm_memslots', however it doesn't have to be present in physically contiguous (kmalloc-ed) address space, change allocation to kvm_kvzalloc() so that it will be vmalloc-ed when its size is more then a page. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-18KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()Takuya Yoshikawa
When all bits in mask are not set, kvm_arch_mmu_enable_log_dirty_pt_masked() has nothing to do. But since it needs to be called from the generic code, it cannot be inlined, and a few function calls, two when PML is enabled, are wasted. Since it is common to see many pages remain clean, e.g. framebuffers can stay calm for a long time, it is worth eliminating this overhead. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10kvm: move advertising of KVM_CAP_IRQFD to common codePaolo Bonzini
POWER supports irqfds but forgot to advertise them. Some userspace does not check for the capability, but others check it---thus they work on x86 and s390 but not POWER. To avoid that other architectures in the future make the same mistake, let common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE. Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 297e21053a52f060944e9f0de4c64fad9bcd72fc Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10KVM: Use pr_info/pr_err in kvm_main.cXiubo Li
WARNING: Prefer [subsystem eg: netdev]_info([subsystem]dev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... + printk(KERN_INFO "kvm: exiting hardware virtualization\n"); WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... + printk(KERN_ERR "kvm: misc device register failed\n"); Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10KVM: Fix indentation in kvm_main.cXiubo Li
ERROR: code indent should use tabs where possible + const struct kvm_io_range *r2)$ WARNING: please, no spaces at the start of a line + const struct kvm_io_range *r2)$ This patch fixes this ERROR & WARNING to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10KVM: no space before tabs in kvm_main.cXiubo Li
WARNING: please, no space before tabs + * ^I^Ikvm->lock --> kvm->slots_lock --> kvm->irq_lock$ WARNING: please, no space before tabs +^I^I * ^I- gfn_to_hva (kvm_read_guest, gfn_to_pfn)$ WARNING: please, no space before tabs +^I^I * ^I- kvm_is_visible_gfn (mmu_check_roots)$ This patch fixes these warnings to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10KVM: Missing blank line after declarations in kvm_main.cXiubo Li
There are many Warnings like this: WARNING: Missing a blank line after declarations + struct kvm_coalesced_mmio_zone zone; + r = -EFAULT; This patch fixes these warnings to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10KVM: EXPORT_SYMBOL should immediately follow its functionXiubo Li
WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable +EXPORT_SYMBOL_GPL(gfn_to_page); This patch fixes these warnings to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10KVM: Fix ERROR: do not initialise statics to 0 or NULL in kvm_main.cXiubo Li
ERROR: do not initialise statics to 0 or NULL +static int kvm_usage_count = 0; The kvm_usage_count will be placed to .bss segment when linking, so not need to set it to 0 here obviously. This patch fixes this ERROR to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10KVM: Fix WARNING: labels should not be indented in kvm_main.cXiubo Li
WARNING: labels should not be indented + out_free_irq_routing: This patch fixes this WARNING to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10KVM: Fix WARNINGs for 'sizeof(X)' instead of 'sizeof X' in kvm_main.cXiubo Li
There are many WARNINGs like this: WARNING: sizeof tr should be sizeof(tr) + if (copy_from_user(&tr, argp, sizeof tr)) In kvm_main.c many places are using 'sizeof(X)', and the other places are using 'sizeof X', while the kernel recommands to use 'sizeof(X)', so this patch will replace all 'sizeof X' to 'sizeof(X)' to make them consistent and at the same time to reduce the WARNINGs noise when we are checking new patches. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10KVM: Get rid of kvm_kvfree()Thomas Huth
kvm_kvfree() provides exactly the same functionality as the new common kvfree() function - so let's simply replace the kvm function with the common function. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10KVM: make halt_poll_ns staticChristian Borntraeger
halt_poll_ns is used only locally. Make it static. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10KVM: white space formatting in kvm_main.cKevin Mulvey
Better alignment of loop using tabs rather than spaces, this makes checkpatch.pl happier. Signed-off-by: Kevin Mulvey <kmulvey@linux.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-02-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...
2015-02-11mm: gup: kvm use get_user_pages_unlockedAndrea Arcangeli
Use the more generic get_user_pages_unlocked which has the additional benefit of passing FAULT_FLAG_ALLOW_RETRY at the very first page fault (which allows the first page fault in an unmapped area to be always able to block indefinitely by being allowed to release the mmap_sem). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Peter Feiner <pfeiner@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-09KVM: Disable compat ioctl for s390Christian Borntraeger
We never had a 31bit QEMU/kuli running. We would need to review several ioctls to check if this creates holes, bugs or whatever to make it work. Lets just disable compat support for KVM on s390. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-06kvm: add halt_poll_ns module parameterPaolo Bonzini
This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log ↵Kai Huang
dirty We don't have to write protect guest memory for dirty logging if architecture supports hardware dirty logging, such as PML on VMX, so rename it to be more generic. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-27kvm: update_memslots: clean flags for invalid memslotsTiejun Chen
Indeed, any invalid memslots should be new->npages = 0, new->base_gfn = 0 and new->flags = 0 at the same time. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-16KVM: Add generic support for dirty page loggingMario Smarduch
kvm_get_dirty_log() provides generic handling of dirty bitmap, currently reused by several architectures. Building on that we intrdoduce kvm_get_dirty_log_protect() adding write protection to mark these pages dirty for future write access, before next KVM_GET_DIRTY_LOG ioctl call from user space. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
2015-01-16KVM: Add architecture-defined TLB flush supportMario Smarduch
Allow architectures to override the generic kvm_flush_remote_tlbs() function via HAVE_KVM_ARCH_TLB_FLUSH_ALL. ARMv7 will need this to provide its own TLB flush interface. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
2015-01-08KVM: nVMX: Add nested msr load/restore algorithmWincy Van
Several hypervisors need MSR auto load/restore feature. We read MSRs from VM-entry MSR load area which specified by L1, and load them via kvm_set_msr in the nested entry. When nested exit occurs, we get MSRs via kvm_get_msr, writing them to L1`s MSR store area. After this, we read MSRs from VM-exit MSR load area, and load them via kvm_set_msr. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-28kvm: warn on more invariant breakagePaolo Bonzini
Modifying a non-existent slot is not allowed. Also check that the first loop doesn't move a deleted slot beyond the used part of the mslots array. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-28kvm: fix sorting of memslots with base_gfn == 0Paolo Bonzini
Before commit 0e60b0799fed (kvm: change memslot sorting rule from size to GFN, 2014-12-01), the memslots' sorting key was npages, meaning that a valid memslot couldn't have its sorting key equal to zero. On the other hand, a valid memslot can have base_gfn == 0, and invalid memslots are identified by base_gfn == npages == 0. Because of this, commit 0e60b0799fed broke the invariant that invalid memslots are at the end of the mslots array. When a memslot with base_gfn == 0 was created, any invalid memslot before it were left in place. This can be fixed by changing the insertion to use a ">=" comparison instead of "<=", but some care is needed to avoid breaking the case of deleting a memslot; see the comment in update_memslots. Thanks to Tiejun Chen for posting an initial patch for this bug. Reported-by: Jamie Heilman <jamie@audible.transient.net> Reported-by: Andy Lutomirski <luto@amacapital.net> Tested-by: Jamie Heilman <jamie@audible.transient.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-15Merge tag 'kvm-arm-for-3.19-take2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot problems, clarifies VCPU init, and fixes a regression concerning the VGIC init flow. Conflicts: arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]
2014-12-04KVM: track pid for VCPU only on KVM_RUN ioctlChristian Borntraeger
We currently track the pid of the task that runs the VCPU in vcpu_load. If a yield to that VCPU is triggered while the PID of the wrong thread is active, the wrong thread might receive a yield, but this will most likely not help the executing thread at all. Instead, if we only track the pid on the KVM_RUN ioctl, there are two possibilities: 1) the thread that did a non-KVM_RUN ioctl is holding a mutex that the VCPU thread is waiting for. In this case, the VCPU thread is not runnable, but we also do not do a wrong yield. 2) the thread that did a non-KVM_RUN ioctl is sleeping, or doing something that does not block the VCPU thread. In this case, the VCPU thread can receive the directed yield correctly. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> CC: Rik van Riel <riel@redhat.com> CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> CC: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04KVM: don't check for PF_VCPU when yieldingDavid Hildenbrand
kvm_enter_guest() has to be called with preemption disabled and will set PF_VCPU. Current code takes PF_VCPU as a hint that the VCPU thread is running and therefore needs no yield. However, the check on PF_VCPU is wrong on s390, where preemption has to stay enabled in order to correctly process page faults. Thus, s390 reenables preemption and starts to execute the guest. The thread might be scheduled out between kvm_enter_guest() and kvm_exit_guest(), resulting in PF_VCPU being set but not being run. When this happens, the opportunity for directed yield is missed. However, this check is done already in kvm_vcpu_on_spin before calling kvm_vcpu_yield_loop: if (!ACCESS_ONCE(vcpu->preempted)) continue; so the check on PF_VCPU is superfluous in general, and this patch removes it. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04kvm: optimize GFN to memslot lookup with large slots amountIgor Mammedov
Current linear search doesn't scale well when large amount of memslots is used and looked up slot is not in the beginning memslots array. Taking in account that memslots don't overlap, it's possible to switch sorting order of memslots array from 'npages' to 'base_gfn' and use binary search for memslot lookup by GFN. As result of switching to binary search lookup times are reduced with large amount of memslots. Following is a table of search_memslot() cycles during WS2008R2 guest boot. boot, boot + ~10 min mostly same of using it, slot lookup randomized lookup max average average cycles cycles cycles 13 slots : 1450 28 30 13 slots : 1400 30 40 binary search 117 slots : 13000 30 460 117 slots : 2000 35 180 binary search Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04kvm: change memslot sorting rule from size to GFNIgor Mammedov
it will allow to use binary search for GFN -> memslot lookups, reducing lookup cost with large slots amount. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04kvm: update_memslots: drop not needed check for the same slotIgor Mammedov
UP/DOWN shift loops will shift array in needed direction and stop at place where new slot should be placed regardless of old slot size. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04kvm: update_memslots: drop not needed check for the same number of pagesIgor Mammedov
if number of pages haven't changed sorting algorithm will do nothing, so there is no need to do extra check to avoid entering sorting logic. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-25kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()Ard Biesheuvel
This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. The problem being addressed by the patch above was that some ARM code based the memory mapping attributes of a pfn on the return value of kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should be mapped as device memory. However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, and the existing non-ARM users were already using it in a way which suggests that its name should probably have been 'kvm_is_reserved_pfn' from the beginning, e.g., whether or not to call get_page/put_page on it etc. This means that returning false for the zero page is a mistake and the patch above should be reverted. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-11-23kvm: x86: move assigned-dev.c and iommu.c to arch/x86/Radim Krčmář
Now that ia64 is gone, we can hide deprecated device assignment in x86. Notable changes: - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl() The easy parts were removed from generic kvm code, remaining - kvm_iommu_(un)map_pages() would require new code to be moved - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-21kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/Paolo Bonzini
ia64 does not need them anymore. Ack notifiers become x86-specific too. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17kvm: simplify update_memslots invocationPaolo Bonzini
The update_memslots invocation is only needed in one case. Make the code clearer by moving it to __kvm_set_memory_region, and removing the wrapper around insert_memslot. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17kvm: commonize allocation of the new memory slotsPaolo Bonzini
The two kmemdup invocations can be unified. I find that the new placement of the comment makes it easier to see what happens. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-14kvm: memslots: track id_to_index changes during the insertion sortPaolo Bonzini
This completes the optimization from the previous patch, by removing the KVM_MEM_SLOTS_NUM-iteration loop from insert_memslot. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-14kvm: memslots: replace heap sort with an insertion sort passIgor Mammedov
memslots is a sorted array. When a slot is changed, heapsort (lib/sort.c) would take O(n log n) time to update it; an optimized insertion sort will only cost O(n) on an array with just one item out of order. Replace sort() with a custom sort that takes advantage of memslots usage pattern and the known position of the changed slot. performance change of 128 memslots insertions with gradually increasing size (the worst case): heap sort custom sort max: 249747 2500 cycles with custom sort alg taking ~98% less then original update time. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03KVM: trivial fix comment regarding __kvm_set_memory_regionDominik Dingel
commit 72dc67a69690 ("KVM: remove the usage of the mmap_sem for the protection of the memory slots.") changed the lock which will be taken. This should be reflected in the function commentary. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24kvm: vfio: fix unregister kvm_device_ops of vfioWanpeng Li
After commit 80ce163 (KVM: VFIO: register kvm_device_ops dynamically), kvm_device_ops of vfio can be registered dynamically. Commit 3c3c29fd (kvm-vfio: do not use module_init) move the dynamic register invoked by kvm_init in order to fix broke unloading of the kvm module. However, kvm_device_ops of vfio is unregistered after rmmod kvm-intel module which lead to device type collision detection warning after kvm-intel module reinsmod. WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]() Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel] CPU: 1 PID: 10358 Comm: insmod Tainted: G W O 3.17.0-rc1 #2 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013 0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9 0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48 ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff Call Trace: [<ffffffff814a61d9>] dump_stack+0x49/0x60 [<ffffffff810417b7>] warn_slowpath_common+0x7c/0x96 [<ffffffffa045bcac>] ? kvm_init+0x234/0x282 [kvm] [<ffffffff810417e6>] warn_slowpath_null+0x15/0x17 [<ffffffffa045bcac>] kvm_init+0x234/0x282 [kvm] [<ffffffffa016e995>] vmx_init+0x1bf/0x42a [kvm_intel] [<ffffffffa016e7d6>] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel] [<ffffffff810002ab>] do_one_initcall+0xe3/0x170 [<ffffffff811168a9>] ? __vunmap+0xad/0xb8 [<ffffffff8109c58f>] do_init_module+0x2b/0x174 [<ffffffff8109d414>] load_module+0x43e/0x569 [<ffffffff8109c6d8>] ? do_init_module+0x174/0x174 [<ffffffff8109c75a>] ? copy_module_from_user+0x39/0x82 [<ffffffff8109b7dd>] ? module_sect_show+0x20/0x20 [<ffffffff8109d65f>] SyS_init_module+0x54/0x81 [<ffffffff814a9a12>] system_call_fastpath+0x16/0x1b ---[ end trace 0626f4a3ddea56f3 ]--- The bug can be reproduced by: rmmod kvm_intel.ko insmod kvm_intel.ko without rmmod/insmod kvm.ko This patch fixes the bug by unregistering kvm_device_ops of vfio when the kvm-intel module is removed. Reported-by: Liu Rongrong <rongrongx.liu@intel.com> Fixes: 3c3c29fd0d7cddc32862c350d0700ce69953e3bd Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>