summaryrefslogtreecommitdiff
path: root/arch/powerpc/kvm/book3s_xics.c
AgeCommit message (Collapse)Author
2018-12-14KVM: PPC: Book3S HV: Change to use DEFINE_SHOW_ATTRIBUTE macroYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-10-09KVM: PPC: Book3S HV: Handle hypercalls correctly when nestedPaul Mackerras
When we are running as a nested hypervisor, we use a hypercall to enter the guest rather than code in book3s_hv_rmhandlers.S. This means that the hypercall handlers listed in hcall_real_table never get called. There are some hypercalls that are handled there and not in kvmppc_pseries_do_hcall(), which therefore won't get processed for a nested guest. To fix this, we add cases to kvmppc_pseries_do_hcall() to handle those hypercalls, with the following exceptions: - The HPT hypercalls (H_ENTER, H_REMOVE, etc.) are not handled because we only support radix mode for nested guests. - H_CEDE has to be handled specially because the cede logic in kvmhv_run_single_vcpu assumes that it has been processed by the time that kvmhv_p9_guest_entry() returns. Therefore we put a special case for H_CEDE in kvmhv_p9_guest_entry(). For the XICS hypercalls, if real-mode processing is enabled, then the virtual-mode handlers assume that they are being called only to finish up the operation. Therefore we turn off the real-mode flag in the XICS code when running as a nested hypervisor. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-09KVM: PPC: Book3S: Simplify external interrupt handlingPaul Mackerras
Currently we use two bits in the vcpu pending_exceptions bitmap to indicate that an external interrupt is pending for the guest, one for "one-shot" interrupts that are cleared when delivered, and one for interrupts that persist until cleared by an explicit action of the OS (e.g. an acknowledge to an interrupt controller). The BOOK3S_IRQPRIO_EXTERNAL bit is used for one-shot interrupt requests and BOOK3S_IRQPRIO_EXTERNAL_LEVEL is used for persisting interrupts. In practice BOOK3S_IRQPRIO_EXTERNAL never gets used, because our Book3S platforms generally, and pseries in particular, expect external interrupt requests to persist until they are acknowledged at the interrupt controller. That combined with the confusion introduced by having two bits for what is essentially the same thing makes it attractive to simplify things by only using one bit. This patch does that. With this patch there is only BOOK3S_IRQPRIO_EXTERNAL, and by default it has the semantics of a persisting interrupt. In order to avoid breaking the ABI, we introduce a new "external_oneshot" flag which preserves the behaviour of the KVM_INTERRUPT ioctl with the KVM_INTERRUPT_SET argument. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22powerpc: Use octal numbers for file permissionsRussell Currey
Symbolic macros are unintuitive and hard to read, whereas octal constants are much easier to interpret. Replace macros for the basic permission flags (user/group/other read/write/execute) with numeric constants instead, across the whole powerpc tree. Introducing a significant number of changes across the tree for no runtime benefit isn't exactly desirable, but so long as these macros are still used in the tree people will keep sending patches that add them. Not only are they hard to parse at a glance, there are multiple ways of coming to the same value (as you can see with 0444 and 0644 in this patch) which hurts readability. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-09Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD The main thing here is a new implementation of the in-kernel XICS interrupt controller emulation for POWER9 machines, from Ben Herrenschmidt. POWER9 has a new interrupt controller called XIVE (eXternal Interrupt Virtualization Engine) which is able to deliver interrupts directly to guest virtual CPUs in hardware without hypervisor intervention. With this new code, the guest still sees the old XICS interface but performance is better because the XICS emulation in the host uses the XIVE directly rather than going through a XICS emulation in firmware. Conflicts: arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix] arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
2017-04-27KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controllerBenjamin Herrenschmidt
This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-12Merge branch 'topic/xive' (early part) into nextMichael Ellerman
This merges the arch part of the XIVE support, leaving the final commit with the KVM specific pieces dangling on the branch for Paul to merge via the kvm-ppc tree.
2017-04-11powerpc: Create asm/debugfs.h and move powerpc_debugfs_root thereMichael Ellerman
powerpc_debugfs_root is the dentry representing the root of the "powerpc" directory tree in debugfs. Currently it sits in asm/debug.h, a long with some other things that have "debug" in the name, but are otherwise unrelated. Pull it out into a separate header, which also includes linux/debugfs.h, and convert all the users to include debugfs.h instead of debug.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10powerpc/kvm: Make kvmppc_xics_create_icp staticBenjamin Herrenschmidt
It's only used within the same file it's defined Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-27KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resendLi Zhong
This patch improves the code that takes lock twice to check the resend flag and do the actual resending, by checking the resend flag locklessly, and add a boolean parameter check_resend to icp_[rm_]deliver_irq(), so the resend flag can be checked in the lock when doing the delivery. We need make sure when we clear the ics's bit in the icp's resend_map, we don't miss the resend flag of the irqs that set the bit. It could be ordered through the barrier in test_and_clear_bit(), and a newly added wmb between setting irq's resend flag, and icp's resend_map. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-27KVM: PPC: Book 3S: XICS: Implement ICS P/Q statesLi Zhong
This patch implements P(Presented)/Q(Queued) states for ICS irqs. When the interrupt is presented, set P. Present if P was not set. If P is already set, don't present again, set Q. When the interrupt is EOI'ed, move Q into P (and clear Q). If it is set, re-present. The asserted flag used by LSI is also incorporated into the P bit. When the irq state is saved, P/Q bits are also saved, they need some qemu modifications to be recognized and passed around to be restored. KVM_XICS_PENDING bit set and saved should also indicate KVM_XICS_PRESENTED bit set and saved. But it is possible some old code doesn't have/recognize the P bit, so when we restore, we set P for PENDING bit, too. The idea and much of the code come from Ben. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-27KVM: PPC: Book 3S: XICS: Fix potential issue with duplicate IRQ resendsLi Zhong
It is possible that in the following order, one irq is resent twice: CPU 1 CPU 2 ics_check_resend() lock ics_lock see resend set unlock ics_lock /* change affinity of the irq */ kvmppc_xics_set_xive() write_xive() lock ics_lock see resend set unlock ics_lock icp_deliver_irq() /* resend */ icp_deliver_irq() /* resend again */ It doesn't have any user-visible effect at present, but needs to be avoided when the following patch implementing the P/Q stuff is applied. This patch clears the resend flag before releasing the ics lock, when we know we will do a re-delivery after checking the flag, or setting the flag. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-27KVM: PPC: Book 3S: XICS cleanup: remove XICS_RM_REJECTLi Zhong
Commit b0221556dbd3 ("KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode") removed the setting of the XICS_RM_REJECT flag. And since that commit, nothing else sets the flag any more, so we can remove the flag and the remaining code that handles it, including the counter that counts how many times it get set. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-12KVM: PPC: Book3S HV: Set server for passed-through interruptsPaul Mackerras
When a guest has a PCI pass-through device with an interrupt, it will direct the interrupt to a particular guest VCPU. In fact the physical interrupt might arrive on any CPU, and then get delivered to the target VCPU in the emulated XICS (guest interrupt controller), and eventually delivered to the target VCPU. Now that we have code to handle device interrupts in real mode without exiting to the host kernel, there is an advantage to having the device interrupt arrive on the same sub(core) as the target VCPU is running on. In this situation, the interrupt can be delivered to the target VCPU without any exit to the host kernel (using a hypervisor doorbell interrupt between threads if necessary). This patch aims to get passed-through device interrupts arriving on the correct core by setting the interrupt server in the real hardware XICS for the interrupt to the first thread in the (sub)core where its target VCPU is running. We do this in the real-mode H_EOI code because the H_EOI handler already needs to look at the emulated ICS state for the interrupt (whereas the H_XIRR handler doesn't), and we know we are running in the target VCPU context at that point. We set the server CPU in hardware using an OPAL call, regardless of what the IRQ affinity mask for the interrupt says, and without updating the affinity mask. This amounts to saying that when an interrupt is passed through to a guest, as a matter of policy we allow the guest's affinity for the interrupt to override the host's. This is inspired by an earlier patch from Suresh Warrier, although none of this code came from that earlier patch. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12KVM: PPC: Book3S HV: Dump irqmap in debugfsSuresh Warrier
Dump the passthrough irqmap structure associated with a guest as part of /sys/kernel/debug/powerpc/kvm-xics-*. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12KVM: PPC: Book3S HV: Complete passthrough interrupt in hostSuresh Warrier
In existing real mode ICP code, when updating the virtual ICP state, if there is a required action that cannot be completely handled in real mode, as for instance, a VCPU needs to be woken up, flags are set in the ICP to indicate the required action. This is checked when returning from hypercalls to decide whether the call needs switch back to the host where the action can be performed in virtual mode. Note that if h_ipi_redirect is enabled, real mode code will first try to message a free host CPU to complete this job instead of returning the host to do it ourselves. Currently, the real mode PCI passthrough interrupt handling code checks if any of these flags are set and simply returns to the host. This is not good enough as the trap value (0x500) is treated as an external interrupt by the host code. It is only when the trap value is a hypercall that the host code searches for and acts on unfinished work by calling kvmppc_xics_rm_complete. This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD which is returned by KVM if there is unfinished business to be completed in host virtual mode after handling a PCI passthrough interrupt. The host checks for this special interrupt condition and calls into the kvmppc_xics_rm_complete, which is made an exported function for this reason. [paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-08-19KVM: PPC: Book3S: Don't crash if irqfd used with no in-kernel XICS emulationPaul Mackerras
It turns out that if userspace creates a pseries-type VM without in-kernel XICS (interrupt controller) emulation, and then connects an eventfd to the VM as an irqfd, and the eventfd gets signalled, that the code will try to deliver an interrupt via the non-existent XICS object and crash the host kernel with a NULL pointer dereference. To fix this, we check for the presence of the XICS object before trying to deliver the interrupt, and return with an error if not. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-08-12KVM: Protect device ops->create and list_add with kvm->lockChristoffer Dall
KVM devices were manipulating list data structures without any form of synchronization, and some implementations of the create operations also suffered from a lack of synchronization. Now when we've split the xics create operation into create and init, we can hold the kvm->lock mutex while calling the create operation and when manipulating the devices list. The error path in the generic code gets slightly ugly because we have to take the mutex again and delete the device from the list, but holding the mutex during anon_inode_getfd or releasing/locking the mutex in the common non-error path seemed wrong. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-12KVM: PPC: Move xics_debugfs_init out of createChristoffer Dall
As we are about to hold the kvm->lock during the create operation on KVM devices, we should move the call to xics_debugfs_init into its own function, since holding a mutex over extended amounts of time might not be a good idea. Introduce an init operation on the kvm_device_ops struct which cannot fail and call this, if configured, after the device has been created. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-05-12KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interruptsPaul Mackerras
Commit c9a5eccac1ab ("kvm/eventfd: add arch-specific set_irq", 2015-10-16) added the possibility for architecture-specific code to handle the generation of virtual interrupts in atomic context where possible, without having to schedule a work function. Since we can easily generate virtual interrupts on XICS without having to do anything worse than take a spinlock, we define a kvm_arch_set_irq_inatomic() for XICS. We also remove kvm_set_msi() since it is not used any more. The one slightly tricky thing is that with the new interface, we don't get told whether the interrupt is an MSI (or other edge sensitive interrupt) vs. level-sensitive. The difference as far as interrupt generation is concerned is that for LSIs we have to set the asserted flag so it will continue to fire until it is explicitly cleared. In fact the XICS code gets told which interrupts are LSIs by userspace when it configures the interrupt via the KVM_DEV_XICS_GRP_SOURCES attribute group on the XICS device. To store this information, we add a new "lsi" field to struct ics_irq_state. With that we can also do a better job of returning accurate values when reading the attribute group. Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-05-11KVM: PPC: Fix debug macrosAlexey Kardashevskiy
When XICS_DBG is enabled, gcc produces format errors. This fixes formats to match passed values types. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-03-01powerpc: Fix misspellings in comments.Adam Buchbinder
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-04KVM: PPC: Book3S: Fix typo in top comment about lockingGreg Kurz
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-04-29powerpc/kvm: Fix SMP=n build error in book3s_xics.cMichael Ellerman
Commit 34cb7954c0aa "Convert ICS mutex lock to spin lock" added an include of asm/spinlock.h, which does not work in the SMP=n case. It should instead include linux/spinlock.h Fixes: 34cb7954c0aa ("KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock") Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-21KVM: PPC: Book3S HV: Add ICP real mode countersSuresh Warrier
Add two counters to count how often we generate real-mode ICS resend and reject events. The counters provide some performance statistics that could be used in the future to consider if the real mode functions need further optimizing. The counters are displayed as part of IPC and ICP state provided by /sys/debug/kernel/powerpc/kvm* for each VM. Also added two counters that count (approximately) how many times we don't find an ICP or ICS we're looking for. These are not currently exposed through sysfs, but can be useful when debugging crashes. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lockSuresh Warrier
Replaces the ICS mutex lock with a spin lock since we will be porting these routines to real mode. Note that we need to disable interrupts before we take the lock in anticipation of the fact that on the guest side, we are running in the context of a hard irq and interrupts are disabled (EE bit off) when the lock is acquired. Again, because we will be acquiring the lock in hypervisor real mode, we need to use an arch_spinlock_t instead of a normal spinlock here as we want to avoid running any lockdep code (which may not be safe to execute in real mode). Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21KVM: PPC: Book3S HV: Add guest->host real mode completion countersSuresh E. Warrier
Add counters to track number of times we switch from guest real mode to host virtual mode during an interrupt-related hyper call because the hypercall requires actions that cannot be completed in real mode. This will help when making optimizations that reduce guest-host transitions. It is safe to use an ordinary increment rather than an atomic operation because there is one ICP per virtual CPU and kvmppc_xics_rm_complete() only works on the ICP for the current VCPU. The counters are displayed as part of IPC and ICP state provided by /sys/debug/kernel/powerpc/kvm* for each VM. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-01-19ppc/kvm: Replace ACCESS_ONCE with READ_ONCEChristian Borntraeger
ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Change the ppc/kvm code to replace ACCESS_ONCE with READ_ONCE. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPISuresh E. Warrier
This fixes some inaccuracies in the state machine for the virtualized ICP when implementing the H_IPI hcall (Set_MFFR and related states): 1. The old code wipes out any pending interrupts when the new MFRR is more favored than the CPPR but less favored than a pending interrupt (by always modifying xisr and the pending_pri). This can cause us to lose a pending external interrupt. The correct code here is to only modify the pending_pri and xisr in the ICP if the MFRR is equal to or more favored than the current pending pri (since in this case, it is guaranteed that that there cannot be a pending external interrupt). The code changes are required in both kvmppc_rm_h_ipi and kvmppc_h_ipi. 2. Again, in both kvmppc_rm_h_ipi and kvmppc_h_ipi, there is a check for whether MFRR is being made less favored AND further if new MFFR is also less favored than the current CPPR, we check for any resends pending in the ICP. These checks look like they are designed to cover the case where if the MFRR is being made less favored, we opportunistically trigger a resend of any interrupts that had been previously rejected. Although, this is not a state described by PAPR, this is an action we actually need to do especially if the CPPR is already at 0xFF. Because in this case, the resend bit will stay on until another ICP state change which may be a long time coming and the interrupt stays pending until then. The current code which checks for MFRR < CPPR is broken when CPPR is 0xFF since it will not get triggered in that case. Ideally, we would want to do a resend only if prio(pending_interrupt) < mfrr && prio(pending_interrupt) < cppr where pending interrupt is the one that was rejected. But we don't have the priority of the pending interrupt state saved, so we simply trigger a resend whenever the MFRR is made less favored. 3. In kvmppc_rm_h_ipi, where we save state to pass resends to the virtual mode, we also need to save the ICP whose need_resend we reset since this does not need to be my ICP (vcpu->arch.icp) as is incorrectly assumed by the current code. A new field rm_resend_icp is added to the kvmppc_icp structure for this purpose. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-08-05KVM: PPC: Enable IRQFD support for the XICS interrupt controllerPaul Mackerras
This makes it possible to use IRQFDs on platforms that use the XICS interrupt controller. To do this we implement kvm_irq_map_gsi() and kvm_irq_map_chip_pin() in book3s_xics.c, so as to provide a 1-1 mapping between global interrupt numbers and XICS interrupt source numbers. For now, all interrupts are mapped as "IRQCHIP" interrupts, and no MSI support is provided. This means that kvm_set_irq can now get called with level == 0 or 1 as well as the powerpc-specific values KVM_INTERRUPT_SET, KVM_INTERRUPT_UNSET and KVM_INTERRUPT_SET_LEVEL. We change ics_deliver_irq() to accept all those values, and remove its report_status argument, as it is always false, given that we don't support KVM_IRQ_LINE_STATUS. This also adds support for interrupt ack notifiers to the XICS code so that the IRQFD resampler functionality can be supported. Signed-off-by: Paul Mackerras <paulus@samba.org> Tested-by: Eric Auger <eric.auger@linaro.org> Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-09KVM: PPC: fix couple of memory leaks in MPIC/XICS devicesGleb Natapov
XICS failed to free xics structure on error path. MPIC destroy handler forgot to delete kvm_device structure. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17kvm: powerpc: book3s: drop is_hv_enabledAneesh Kumar K.V
drop is_hv_enabled, because that should not be a callback property Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17kvm: powerpc: book3s: Allow the HV and PR selection per virtual machineAneesh Kumar K.V
This moves the kvmppc_ops callbacks to be a per VM entity. This enables us to select HV and PR mode when creating a VM. We also allow both kvm-hv and kvm-pr kernel module to be loaded. To achieve this we move /dev/kvm ownership to kvm.ko module. Depending on which KVM mode we select during VM creation we take a reference count on respective module Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: fix coding style] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17kvm: powerpc: book3s: Support building HV and PR KVM as moduleAneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: squash in compile fix] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_opsAneesh Kumar K.V
This help us to identify whether we are running with hypervisor mode KVM enabled. The change is needed so that we can have both HV and PR kvm enabled in the same kernel. If both HV and PR KVM are included, interrupts come in to the HV version of the kvmppc_interrupt code, which then jumps to the PR handler, renamed to kvmppc_interrupt_pr, if the guest is a PR guest. Allowing both PR and HV in the same kernel required some changes to kvm_dev_ioctl_check_extension(), since the values returned now can't be selected with #ifdefs as much as previously. We look at is_hv_enabled to return the right value when checking for capabilities.For capabilities that are only provided by HV KVM, we return the HV value only if is_hv_enabled is true. For capabilities provided by PR KVM but not HV, we return the PR value only if is_hv_enabled is false. NOTE: in later patch we replace is_hv_enabled with a static inline function comparing kvm_ppc_ops Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17kvm: powerpc: Add kvmppc_ops callbackAneesh Kumar K.V
This patch add a new callback kvmppc_ops. This will help us in enabling both HV and PR KVM together in the same kernel. The actual change to enable them together is done in the later patch in the series. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: squash in booke changes] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28KVM: PPC: Book3S: Fix compile error in XICS emulationPaul Mackerras
Commit 8e44ddc3f3 ("powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulation") added a call to get_tb() but didn't include the header that defines it, and on some configs this means book3s_xics.c fails to compile: arch/powerpc/kvm/book3s_xics.c: In function ‘kvmppc_xics_hcall’: arch/powerpc/kvm/book3s_xics.c:812:3: error: implicit declaration of function ‘get_tb’ [-Werror=implicit-function-declaration] Cc: stable@vger.kernel.org [v3.10, v3.11] Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-01powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulationPaul Mackerras
This adds the remaining two hypercalls defined by PAPR for manipulating the XICS interrupt controller, H_IPOLL and H_XIRR_X. H_IPOLL returns information about the priority and pending interrupts for a virtual cpu, without changing any state. H_XIRR_X is like H_XIRR in that it reads and acknowledges the highest-priority pending interrupt, but it also returns the timestamp (timebase register value) from when the interrupt was first received by the hypervisor. Currently we just return the current time, since we don't do any software queueing of virtual interrupts inside the XICS emulation code. These hcalls are not currently used by Linux guests, but may be in future. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02KVM: PPC: Book3S: Add API for in-kernel XICS emulationPaul Mackerras
This adds the API for userspace to instantiate an XICS device in a VM and connect VCPUs to it. The API consists of a new device type for the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl, which is used to assert and deassert interrupt inputs of the XICS. The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES. Each attribute within this group corresponds to the state of one interrupt source. The attribute number is the same as the interrupt source number. This does not support irq routing or irqfd yet. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler statePaul Mackerras
This adds the ability for userspace to save and restore the state of the XICS interrupt presentation controllers (ICPs) via the KVM_GET/SET_ONE_REG interface. Since there is one ICP per vcpu, we simply define a new 64-bit register in the ONE_REG space for the ICP state. The state includes the CPU priority setting, the pending IPI priority, and the priority and source number of any pending external interrupt. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS callsPaul Mackerras
This adds support for the ibm,int-on and ibm,int-off RTAS calls to the in-kernel XICS emulation and corrects the handling of the saved priority by the ibm,set-xive RTAS call. With this, ibm,int-off sets the specified interrupt's priority in its saved_priority field and sets the priority to 0xff (the least favoured value). ibm,int-on restores the saved_priority to the priority field, and ibm,set-xive sets both the priority and the saved_priority to the specified priority value. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulationBenjamin Herrenschmidt
This adds an implementation of the XICS hypercalls in real mode for HV KVM, which allows us to avoid exiting the guest MMU context on all threads for a variety of operations such as fetching a pending interrupt, EOI of messages, IPIs, etc. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVMBenjamin Herrenschmidt
Currently, we wake up a CPU by sending a host IPI with smp_send_reschedule() to thread 0 of that core, which will take all threads out of the guest, and cause them to re-evaluate their interrupt status on the way back in. This adds a mechanism to differentiate real host IPIs from IPIs sent by KVM for guest threads to poke each other, in order to target the guest threads precisely when possible and avoid that global switch of the core to host state. We then use this new facility in the in-kernel XICS code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controllerBenjamin Herrenschmidt
This adds in-kernel emulation of the XICS (eXternal Interrupt Controller Specification) interrupt controller specified by PAPR, for both HV and PR KVM guests. The XICS emulation supports up to 1048560 interrupt sources. Interrupt source numbers below 16 are reserved; 0 is used to mean no interrupt and 2 is used for IPIs. Internally these are represented in blocks of 1024, called ICS (interrupt controller source) entities, but that is not visible to userspace. Each vcpu gets one ICP (interrupt controller presentation) entity, used to store the per-vcpu state such as vcpu priority, pending interrupt state, IPI request, etc. This does not include any API or any way to connect vcpus to their ICP state; that will be added in later patches. This is based on an initial implementation by Michael Ellerman <michael@ellerman.id.au> reworked by Benjamin Herrenschmidt and Paul Mackerras. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix typo, add dependency on !KVM_MPIC] Signed-off-by: Alexander Graf <agraf@suse.de>