summaryrefslogtreecommitdiff
path: root/arch/powerpc/kvm/book3s.c
AgeCommit message (Collapse)Author
2021-06-10KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 pathNicholas Piggin
In the interest of minimising the amount of code that is run in "real-mode", don't handle hcalls in real mode in the P9 path. This requires some new handlers for H_CEDE and xics-on-xive to be added before xive is pulled or cede logic is checked. This introduces a change in radix guest behaviour where radix guests that execute 'sc 1' in userspace now get a privilege fault whereas previously the 'sc 1' would be reflected as a syscall interrupt to the guest kernel. That reflection is only required for hash guests that run PR KVM. Background: In POWER8 and earlier processors, it is very expensive to exit from the HV real mode context of a guest hypervisor interrupt, and switch to host virtual mode. On those processors, guest->HV interrupts reach the hypervisor with the MMU off because the MMU is loaded with guest context (LPCR, SDR1, SLB), and the other threads in the sub-core need to be pulled out of the guest too. Then the primary must save off guest state, invalidate SLB and ERAT, and load up host state before the MMU can be enabled to run in host virtual mode (~= regular Linux mode). Hash guests also require a lot of hcalls to run due to the nature of the MMU architecture and paravirtualisation design. The XICS interrupt controller requires hcalls to run. So KVM traditionally tries hard to avoid the full exit, by handling hcalls and other interrupts in real mode as much as possible. By contrast, POWER9 has independent MMU context per-thread, and in radix mode the hypervisor is in host virtual memory mode when the HV interrupt is taken. Radix guests do not require significant hcalls to manage their translations, and xive guests don't need hcalls to handle interrupts. So it's much less important for performance to handle hcalls in real mode on POWER9. One caveat is that the TCE hcalls are performance critical, real-mode variants introduced for POWER8 in order to achieve 10GbE performance. Real mode TCE hcalls were found to be less important on POWER9, which was able to drive 40GBe networking without them (using the virt mode hcalls) but performance is still important. These hcalls will benefit from subsequent guest entry/exit optimisation including possibly a faster "partial exit" that does not entirely switch to host context to handle the hcall. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-14-npiggin@gmail.com
2021-04-17KVM: PPC: Convert to the gfn-based MMU notifier callbacksSean Christopherson
Move PPC to the gfn-base MMU notifier APIs, and update all 15 bajillion PPC-internal hooks to work with gfns instead of hvas. No meaningful functional change intended, though the exact order of operations is slightly different since the memslot lookups occur before calling into arch code. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210402005658.3024832-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-17KVM: PPC: Don't return -ENOTSUPP to userspace in ioctlsGreg Kurz
ENOTSUPP is a linux only thingy, the value of which is unknown to userspace, not to be confused with ENOTSUP which linux maps to EOPNOTSUPP, as permitted by POSIX [1]: [EOPNOTSUPP] Operation not supported on socket. The type of socket (address family or protocol) does not support the requested operation. A conforming implementation may assign the same values for [EOPNOTSUPP] and [ENOTSUP]. Return -EOPNOTSUPP instead of -ENOTSUPP for the following ioctls: - KVM_GET_FPU for Book3s and BookE - KVM_SET_FPU for Book3s and BookE - KVM_GET_DIRTY_LOG for BookE This doesn't affect QEMU which doesn't call the KVM_GET_FPU and KVM_SET_FPU ioctls on POWER anyway since they are not supported, and _buggily_ ignores anything but -EPERM for KVM_GET_DIRTY_LOG. [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-09-03KVM: PPC: Book3S HV: XICS: Replace the 'destroy' method by a 'release' methodGreg Kurz
Similarly to what was done with XICS-on-XIVE and XIVE native KVM devices with commit 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method"), convert the historical XICS KVM device to implement the 'release' method. This is needed to run nested guests with an in-kernel IRQ chip. A typical POWER9 guest can select XICS or XIVE during boot, which requires to be able to destroy and to re-create the KVM device. Only the historical XICS KVM device is available under pseries at the current time and it still uses the legacy 'destroy' method. Switching to 'release' means that vCPUs might still be running when the device is destroyed. In order to avoid potential use-after-free, the kvmppc_xics structure is allocated on first usage and kept around until the VM exits. The same pointer is used each time a KVM XICS device is being created, but this is okay since we only have one per VM. Clear the ICP of each vCPU with vcpu->mutex held. This ensures that the next time the vCPU resumes execution, it won't be going into the XICS code anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-08-21KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()Will Deacon
The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27KVM: PPC: Clean up redundant 'kvm_run' parametersTianjia Zhang
In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. For historical reasons, many kvm-related function parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-04-21kvm_host: unify VM_STAT and VCPU_STAT definitions in a single placeEmanuele Giuseppe Esposito
The macros VM_STAT and VCPU_STAT are redundantly implemented in multiple files, each used by a different architecure to initialize the debugfs entries for statistics. Since they all have the same purpose, they can be unified in a single common definition in include/linux/kvm_host.h Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20200414155625.20559-1-eesposit@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-19KVM: PPC: Kill kvmppc_ops::mmu_destroy() and kvmppc_mmu_destroy()Greg Kurz
These are only used by HV KVM and BookE, and in both cases they are nops. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-03-16KVM: Provide common implementation for generic dirty log functionsSean Christopherson
Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code. The arch specific implemenations are extremely similar, differing only in whether the dirty log needs to be sync'd from hardware (x86) and how the TLBs are flushed. Add new arch hooks to handle sync and TLB flush; the sync will also be used for non-generic dirty log support in a future patch (s390). The ulterior motive for providing a common implementation is to eliminate the dependency between arch and common code with respect to the memslot referenced by the dirty log, i.e. to make it obvious in the code that the validity of the memslot is guaranteed, as a future patch will rework memslot handling such that id_to_memslot() can return NULL. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Simplify kvm_free_memslot() and all its descendentsSean Christopherson
Now that all callers of kvm_free_memslot() pass NULL for @dont, remove the param from the top-level routine and all arch's implementations. No functional change intended. Tested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: PPC: Move memslot memory allocation into prepare_memory_region()Sean Christopherson
Allocate the rmap array during kvm_arch_prepare_memory_region() to pave the way for removing kvm_arch_create_memslot() altogether. Moving PPC's memory allocation only changes the order of kernel memory allocations between PPC and common KVM code. No functional change intended. Acked-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: Drop kvm_arch_vcpu_setup()Sean Christopherson
Remove kvm_arch_vcpu_setup() now that all arch specific implementations are nops. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24KVM: PPC: Move kvm_vcpu_init() invocation to common codeSean Christopherson
Move the kvm_cpu_{un}init() calls to common PPC code as an intermediate step towards removing kvm_cpu_{un}init() altogether. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24KVM: PPC: Allocate vcpu struct in common PPC codeSean Christopherson
Move allocation of all flavors of PPC vCPUs to common PPC code. All variants either allocate 'struct kvm_vcpu' directly, or require that the embedded 'struct kvm_vcpu' member be located at offset 0, i.e. guarantee that the allocation can be directly interpreted as a 'struct kvm_vcpu' object. Remove the message from the build-time assertion regarding placement of the struct, as compatibility with the arch usercopy region is no longer the sole dependent on 'struct kvm_vcpu' being at offset zero. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: PPC: Book3S: Replace reset_msr mmu op with inject_interrupt arch opNicholas Piggin
reset_msr sets the MSR for interrupt injection, but it's cleaner and more flexible to provide a single op to set both MSR and PC for the interrupt. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S: Define and use SRR1_MSR_BITSNicholas Piggin
Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-09-30kvm: x86, powerpc: do not allow clearing largepages debugfs entryPaolo Bonzini
The largepages debugfs entry is incremented/decremented as shadow pages are created or destroyed. Clearing it will result in an underflow, which is harmless to KVM but ugly (and could be misinterpreted by tools that use debugfs information), so make this particular statistic read-only. Cc: kvm-ppc@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-27KVM: PPC: Book3S: Enable XIVE native capability only if OPAL has required ↵Paul Mackerras
functions There are some POWER9 machines where the OPAL firmware does not support the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls. The impact of this is that a guest using XIVE natively will not be able to be migrated successfully. On the source side, the get_attr operation on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute will fail; on the destination side, the set_attr operation for the same attribute will fail. This adds tests for the existence of the OPAL get/set queue state functions, and if they are not supported, the XIVE-native KVM device is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false. Userspace can then either provide a software emulation of XIVE, or else tell the guest that it does not have a XIVE controller available to it. Cc: stable@vger.kernel.org # v5.2+ Fixes: 3fab2d10588e ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode") Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-29KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token listPaul Mackerras
Currently the Book 3S KVM code uses kvm->lock to synchronize access to the kvm->arch.rtas_tokens list. Because this list is scanned inside kvmppc_rtas_hcall(), which is called with the vcpu mutex held, taking kvm->lock cause a lock inversion problem, which could lead to a deadlock. To fix this, we add a new mutex, kvm->arch.rtas_token_lock, which nests inside the vcpu mutexes, and use that instead of kvm->lock when accessing the rtas token list. This removes the lockdep_assert_held() in kvmppc_rtas_tokens_free(). At this point we don't hold the new mutex, but that is OK because kvmppc_rtas_tokens_free() is only called when the whole VM is being destroyed, and at that point nothing can be looking up a token in the list. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-04-30KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' methodCédric Le Goater
When a P9 sPAPR VM boots, the CAS negotiation process determines which interrupt mode to use (XICS legacy or XIVE native) and invokes a machine reset to activate the chosen mode. We introduce 'release' methods for the XICS-on-XIVE and the XIVE native KVM devices which are called when the file descriptor of the device is closed after the TIMA and ESB pages have been unmapped. They perform the necessary cleanups : clear the vCPU interrupt presenters that could be attached and then destroy the device. The 'release' methods replace the 'destroy' methods as 'destroy' is not called anymore once 'release' is. Compatibility with older QEMU is nevertheless maintained. This is not considered as a safe operation as the vCPUs are still running and could be referencing the KVM device through their presenters. To protect the system from any breakage, the kvmppc_xive objects representing both KVM devices are now stored in an array under the VM. Allocation is performed on first usage and memory is freed only when the VM exits. [paulus@ozlabs.org - Moved freeing of xive structures to book3s.c, put it under #ifdef CONFIG_KVM_XICS.] Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-04-30KVM: PPC: Book3S HV: XIVE: Add get/set accessors for the VP XIVE stateCédric Le Goater
The state of the thread interrupt management registers needs to be collected for migration. These registers are cached under the 'xive_saved_state.w01' field of the VCPU when the VPCU context is pulled from the HW thread. An OPAL call retrieves the backup of the IPB register in the underlying XIVE NVT structure and merges it in the KVM state. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-04-30KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation modeCédric Le Goater
This is the basic framework for the new KVM device supporting the XIVE native exploitation mode. The user interface exposes a new KVM device to be created by QEMU, only available when running on a L0 hypervisor. Support for nested guests is not available yet. The XIVE device reuses the device structure of the XICS-on-XIVE device as they have a lot in common. That could possibly change in the future if the need arise. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-02-22Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras
This merges in the "ppc-kvm" topic branch of the powerpc tree to get a series of commits that touch both general arch/powerpc code and KVM code. These commits will be merged both via the KVM tree and the powerpc tree. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-02-21KVM: PPC: Book3S HV: Simplify machine check handlingPaul Mackerras
This makes the handling of machine check interrupts that occur inside a guest simpler and more robust, with less done in assembler code and in real mode. Now, when a machine check occurs inside a guest, we always get the machine check event struct and put a copy in the vcpu struct for the vcpu where the machine check occurred. We no longer call machine_check_queue_event() from kvmppc_realmode_mc_power7(), because on POWER8, when a vcpu is running on an offline secondary thread and we call machine_check_queue_event(), that calls irq_work_queue(), which doesn't work because the CPU is offline, but instead triggers the WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which fires again and again because nothing clears the condition). All that machine_check_queue_event() actually does is to cause the event to be printed to the console. For a machine check occurring in the guest, we now print the event in kvmppc_handle_exit_hv() instead. The assembly code at label machine_check_realmode now just calls C code and then continues exiting the guest. We no longer either synthesize a machine check for the guest in assembly code or return to the guest without a machine check. The code in kvmppc_handle_exit_hv() is extended to handle the case where the guest is not FWNMI-capable. In that case we now always synthesize a machine check interrupt for the guest. Previously, if the host thinks it has recovered the machine check fully, it would return to the guest without any notification that the machine check had occurred. If the machine check was caused by some action of the guest (such as creating duplicate SLB entries), it is much better to tell the guest that it has caused a problem. Therefore we now always generate a machine check interrupt for guests that are not FWNMI-capable. Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-19KVM: PPC: Book3S HV: Add KVM stat largepages_[2M/1G]Suraj Jitindar Singh
This adds an entry to the kvm_stats_debugfs directory which provides the number of large (2M or 1G) pages which have been used to setup the guest mappings, for radix guests. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-02-19KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVEPaul Mackerras
Currently, the KVM code assumes that if the host kernel is using the XIVE interrupt controller (the new interrupt controller that first appeared in POWER9 systems), then the in-kernel XICS emulation will use the XIVE hardware to deliver interrupts to the guest. However, this only works when the host is running in hypervisor mode and has full access to all of the XIVE functionality. It doesn't work in any nested virtualization scenario, either with PR KVM or nested-HV KVM, because the XICS-on-XIVE code calls directly into the native-XIVE routines, which are not initialized and cannot function correctly because they use OPAL calls, and OPAL is not available in a guest. This means that using the in-kernel XICS emulation in a nested hypervisor that is using XIVE as its interrupt controller will cause a (nested) host kernel crash. To fix this, we change most of the places where the current code calls xive_enabled() to select between the XICS-on-XIVE emulation and the plain XICS emulation to call a new function, xics_on_xive(), which returns false in a guest. However, there is a further twist. The plain XICS emulation has some functions which are used in real mode and access the underlying XICS controller (the interrupt controller of the host) directly. In the case of a nested hypervisor, this means doing XICS hypercalls directly. When the nested host is using XIVE as its interrupt controller, these hypercalls will fail. Therefore this also adds checks in the places where the XICS emulation wants to access the underlying interrupt controller directly, and if that is XIVE, makes the code use the virtual mode fallback paths, which call generic kernel infrastructure rather than doing direct XICS access. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-21KVM: Make kvm_set_spte_hva() return intLan Tianyu
The patch is to make kvm_set_spte_hva() return int and caller can check return value to determine flush tlb or not. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-17KVM: PPC: Pass change type down to memslot commit functionBharata B Rao
Currently, kvm_arch_commit_memory_region() gets called with a parameter indicating what type of change is being made to the memslot, but it doesn't pass it down to the platform-specific memslot commit functions. This adds the `change' parameter to the lower-level functions so that they can use it in future. [paulus@ozlabs.org - fix book E also.] Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-10-09Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras
This merges in the "ppc-kvm" topic branch of the powerpc tree to get a series of commits that touch both general arch/powerpc code and KVM code. These commits will be merged both via the KVM tree and the powerpc tree. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-10-09KVM: PPC: Book3S: Simplify external interrupt handlingPaul Mackerras
Currently we use two bits in the vcpu pending_exceptions bitmap to indicate that an external interrupt is pending for the guest, one for "one-shot" interrupts that are cleared when delivered, and one for interrupts that persist until cleared by an explicit action of the OS (e.g. an acknowledge to an interrupt controller). The BOOK3S_IRQPRIO_EXTERNAL bit is used for one-shot interrupt requests and BOOK3S_IRQPRIO_EXTERNAL_LEVEL is used for persisting interrupts. In practice BOOK3S_IRQPRIO_EXTERNAL never gets used, because our Book3S platforms generally, and pseries in particular, expect external interrupt requests to persist until they are acknowledged at the interrupt controller. That combined with the confusion introduced by having two bits for what is essentially the same thing makes it attractive to simplify things by only using one bit. This patch does that. With this patch there is only BOOK3S_IRQPRIO_EXTERNAL, and by default it has the semantics of a persisting interrupt. In order to avoid breaking the ABI, we introduce a new "external_oneshot" flag which preserves the behaviour of the KVM_INTERRUPT ioctl with the KVM_INTERRUPT_SET argument. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-05KVM: PPC: Book3S PR: Exiting split hack mode needs to fixup both PC and LRCameron Kaiser
When an OS (currently only classic Mac OS) is running in KVM-PR and makes a linked jump from code with split hack addressing enabled into code that does not, LR is not correctly updated and reflects the previously munged PC. To fix this, this patch undoes the address munge when exiting split hack mode so that code relying on LR being a proper address will now execute. This does not affect OS X or other operating systems running on KVM-PR. Signed-off-by: Cameron Kaiser <spectre@floodgap.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-07-30powerpc: remove unnecessary inclusion of asm/tlbflush.hChristophe Leroy
asm/tlbflush.h is only needed for: - using functions xxx_flush_tlb_xxx() - using MMU_NO_CONTEXT - including asm-generic/pgtable.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-13KVM: PPC: Book3S PR: Fix MSR setting when delivering interruptsPaul Mackerras
This makes sure that MSR "partial-function" bits are not transferred to SRR1 when delivering an interrupt. This was causing failures in guests running kernels that include commit f3d96e698ed0 ("powerpc/mm: Overhaul handling of bad page faults", 2017-07-19), which added code to check bits of SRR1 on instruction storage interrupts (ISIs) that indicate a bad page fault. The symptom was that a guest user program that handled a signal and attempted to return from the signal handler would get a SIGBUS signal and die. The code that generated ISIs and some other interrupts would previously set bits in the guest MSR to indicate the interrupt status and then call kvmppc_book3s_queue_irqprio(). This technique no longer works now that kvmppc_inject_interrupt() is masking off those bits. Instead we make kvmppc_core_queue_data_storage() and kvmppc_core_queue_inst_storage() call kvmppc_inject_interrupt() directly, and make sure that all the places that generate ISIs or DSIs call kvmppc_core_queue_{data,inst}_storage instead of kvmppc_book3s_queue_irqprio(). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S: Remove load/put vcpu for KVM_GET_REGS/KVM_SET_REGSSimon Guo
In both HV and PR KVM, the KVM_SET_REGS/KVM_GET_REGS ioctl should be able to perform without the vcpu loaded. Since the vcpu mutex locking/unlock has been moved out of vcpu_load() /vcpu_put(), KVM_SET_REGS/KVM_GET_REGS don't need to do ioctl with the vcpu loaded anymore. This patch removes vcpu_load()/vcpu_put() from KVM_SET_REGS/KVM_GET_REGS ioctl. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22KVM: PPC: Reimplement non-SIMD LOAD/STORE instruction mmio emulation with ↵Simon Guo
analyse_instr() input This patch reimplements non-SIMD LOAD/STORE instruction MMIO emulation with analyse_instr() input. It utilizes the BYTEREV/UPDATE/SIGNEXT properties exported by analyse_instr() and invokes kvmppc_handle_load(s)/kvmppc_handle_store() accordingly. It also moves CACHEOP type handling into the skeleton. instruction_type within kvm_ppc.h is renamed to avoid conflict with sstep.h. Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-03-19KVM: PPC: Remove unused kvm_unmap_hva callbackPaul Mackerras
Since commit fb1522e099f0 ("KVM: update to new mmu_notifier semantic v2", 2017-08-31), the MMU notifier code in KVM no longer calls the kvm_unmap_hva callback. This removes the PPC implementations of kvm_unmap_hva(). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-12-14KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_guest_debugChristoffer Dall
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_guest_debug(). Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_sregsChristoffer Dall
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_sregs(). Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_get_sregsChristoffer Dall
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_sregs(). Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_regsChristoffer Dall
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_regs(). Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_get_regsChristoffer Dall
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_regs(). Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-28Merge remote-tracking branch 'remotes/powerpc/topic/xive' into kvm-ppc-nextPaul Mackerras
This merges in the powerpc topic/xive branch to bring in the code for the in-kernel XICS interrupt controller emulation to use the new XIVE (eXternal Interrupt Virtualization Engine) hardware in the POWER9 chip directly, rather than via a XICS emulation in firmware. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-27KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controllerBenjamin Herrenschmidt
This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-20KVM: PPC: Provide functions for queueing up FP/VEC/VSX unavailable interruptsPaul Mackerras
This provides functions that can be used for generating interrupts indicating that a given functional unit (floating point, vector, or VSX) is unavailable. These functions will be used in instruction emulation code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-10powerpc/kvm: Massage order of #includeBenjamin Herrenschmidt
We traditionally have linux/ before asm/ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31KVM: PPC: Book3S HV: Page table construction and page faults for radix guestsPaul Mackerras
This adds the code to construct the second-level ("partition-scoped" in architecturese) page tables for guests using the radix MMU. Apart from the PGD level, which is allocated when the guest is created, the rest of the tree is all constructed in response to hypervisor page faults. As well as hypervisor page faults for missing pages, we also get faults for reference/change (RC) bits needing to be set, as well as various other error conditions. For now, we only set the R or C bit in the guest page table if the same bit is set in the host PTE for the backing page. This code can take advantage of the guest being backed with either transparent or ordinary 2MB huge pages, and insert 2MB page entries into the guest page tables. There is no support for 1GB huge pages yet. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-27KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-threadPaul Mackerras
POWER8 has one virtual timebase (VTB) register per subcore, not one per CPU thread. The HV KVM code currently treats VTB as a per-thread register, which can lead to spurious soft lockup messages from guests which use the VTB as the time source for the soft lockup detector. (CPUs before POWER8 did not have the VTB register.) For HV KVM, this fixes the problem by making only the primary thread in each virtual core save and restore the VTB value. With this, the VTB state becomes part of the kvmppc_vcore structure. This also means that "piggybacking" of multiple virtual cores onto one subcore is not possible on POWER8, because then the virtual cores would share a single VTB register. PR KVM emulates a VTB register, which is per-vcpu because PR KVM has no notion of CPU threads or SMT. For PR KVM we move the VTB state into the kvmppc_vcpu_book3s struct. Cc: stable@vger.kernel.org # v3.14+ Reported-by: Thomas Huth <thuth@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12KVM: PPC: Book3S HV: Counters for passthrough IRQ statsSuresh Warrier
Add VCPU stat counters to track affinity for passthrough interrupts. pthru_all: Counts all passthrough interrupts whose IRQ mappings are in the kvmppc_passthru_irq_map structure. pthru_host: Counts all cached passthrough interrupts that were injected from the host through kvm_set_irq (i.e. not handled in real mode). pthru_bad_aff: Counts how many cached passthrough interrupts have bad affinity (receiving CPU is not running VCPU that is the target of the virtual interrupt in the guest). Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>