From 233a7cb235318223df8133235383f4c595c654c1 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Wed, 26 Sep 2018 17:32:54 +0100 Subject: kvm: arm64: Allow tuning the physical address size for VM MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Allow specifying the physical address size limit for a new VM via the kvm_type argument for the KVM_CREATE_VM ioctl. This allows us to finalise the stage2 page table as early as possible and hence perform the right checks on the memory slots without complication. The size is encoded as Log2(PA_Size) in bits[7:0] of the type field. For backward compatibility the value 0 is reserved and implies 40bits. Also, lift the limit of the IPA to host limit and allow lower IPA sizes (e.g, 32). The userspace could check the extension KVM_CAP_ARM_VM_IPA_SIZE for the availability of this feature. The cap check returns the maximum limit for the physical address shift supported by the host. Cc: Marc Zyngier Cc: Christoffer Dall Cc: Peter Maydell Cc: Paolo Bonzini Cc: Radim Krčmář Reviewed-by: Eric Auger Signed-off-by: Suzuki K Poulose Signed-off-by: Marc Zyngier --- Documentation/virtual/kvm/api.txt | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 647f94128a85..f6b0af55d010 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -123,6 +123,37 @@ memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the flag KVM_VM_MIPS_VZ. +On arm64, the physical address size for a VM (IPA Size limit) is limited +to 40bits by default. The limit can be configured if the host supports the +extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use +KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type +identifier, where IPA_Bits is the maximum width of any physical +address used by the VM. The IPA_Bits is encoded in bits[7-0] of the +machine type identifier. + +e.g, to configure a guest to use 48bit physical address size : + + vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48)); + +The requested size (IPA_Bits) must be : + 0 - Implies default size, 40bits (for backward compatibility) + + or + + N - Implies N bits, where N is a positive integer such that, + 32 <= N <= Host_IPA_Limit + +Host_IPA_Limit is the maximum possible value for IPA_Bits on the host and +is dependent on the CPU capability and the kernel configuration. The limit can +be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the KVM_CHECK_EXTENSION +ioctl() at run-time. + +Please note that configuring the IPA size does not affect the capability +exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects +size of the address translated by the stage2 level (guest physical to +host physical address translations). + + 4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST -- cgit v1.2.3 From 3032341853daf07c723084a086ebe7a9bdf8f90b Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Mon, 8 Oct 2018 16:31:13 +1100 Subject: KVM: PPC: Book3S HV: Add one-reg interface to virtual PTCR register This adds a one-reg register identifier which can be used to read and set the virtual PTCR for the guest. This register identifies the address and size of the virtual partition table for the guest, which contains information about the nested guests under this guest. Migrating this value is the only extra requirement for migrating a guest which has nested guests (assuming of course that the destination host supports nested virtualization in the kvm-hv module). Reviewed-by: David Gibson Signed-off-by: Paul Mackerras Signed-off-by: Michael Ellerman --- Documentation/virtual/kvm/api.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index c664064f76fb..017d851b6c56 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1922,6 +1922,7 @@ registers, find a list below: PPC | KVM_REG_PPC_TIDR | 64 PPC | KVM_REG_PPC_PSSCR | 64 PPC | KVM_REG_PPC_DEC_EXPIRY | 64 + PPC | KVM_REG_PPC_PTCR | 64 PPC | KVM_REG_PPC_TM_GPR0 | 64 ... PPC | KVM_REG_PPC_TM_GPR31 | 64 -- cgit v1.2.3 From aa069a996951f3e2e38437ef0316685a5893fc7e Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Fri, 21 Sep 2018 20:02:01 +1000 Subject: KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualization With this, userspace can enable a KVM-HV guest to run nested guests under it. The administrator can control whether any nested guests can be run; setting the "nested" module parameter to false prevents any guests becoming nested hypervisors (that is, any attempt to enable the nested capability on a guest will fail). Guests which are already nested hypervisors will continue to be so. Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/api.txt | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 2f5f9b743bff..fde48b6708f1 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -4532,6 +4532,20 @@ With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise, a #GP would be raised when the guest tries to access. Currently, this capability does not enable write permissions of this MSR for the guest. +7.16 KVM_CAP_PPC_NESTED_HV + +Architectures: ppc +Parameters: none +Returns: 0 on success, -EINVAL when the implementation doesn't support + nested-HV virtualization. + +HV-KVM on POWER9 and later systems allows for "nested-HV" +virtualization, which provides a way for a guest VM to run guests that +can run using the CPU's supervisor mode (privileged non-hypervisor +state). Enabling this capability on a VM depends on the CPU having +the necessary functionality and on the facility being enabled with a +kvm-hv module parameter. + 8. Other capabilities. ---------------------- -- cgit v1.2.3 From 901f8c3f6feb0225c14b3bc6237850fb921d2f2d Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Mon, 8 Oct 2018 14:24:30 +1100 Subject: KVM: PPC: Book3S HV: Add NO_HASH flag to GET_SMMU_INFO ioctl result This adds a KVM_PPC_NO_HASH flag to the flags field of the kvm_ppc_smmu_info struct, and arranges for it to be set when running as a nested hypervisor, as an unambiguous indication to userspace that HPT guests are not supported. Reporting the KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as indicating only that the new HPT features in ISA V3.0 are not supported, leaving it ambiguous whether pre-V3.0 HPT features are supported. Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/api.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index fde48b6708f1..df98b6304769 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2270,6 +2270,10 @@ The supported flags are: The emulated MMU supports 1T segments in addition to the standard 256M ones. + - KVM_PPC_NO_HASH + This flag indicates that HPT guests are not supported by KVM, + thus all guests must use radix MMU mode. + The "slb_size" field indicates how many SLB entries are supported The "sps" array contains 8 entries indicating the supported base -- cgit v1.2.3 From 214ff83d4473a7757fa18a64dc7efe3b0e158486 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Wed, 26 Sep 2018 19:02:59 +0200 Subject: KVM: x86: hyperv: implement PV IPI send hypercalls Using hypercall for sending IPIs is faster because this allows to specify any number of vCPUs (even > 64 with sparse CPU set), the whole procedure will take only one VMEXIT. Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi hypercall can't be 'fast' (passing parameters through registers) but apparently this is not true, Windows always uses it as 'fast' so we need to support that. Signed-off-by: Vitaly Kuznetsov Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index df98b6304769..48e5d1197295 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -4791,3 +4791,10 @@ CPU when the exception is taken. If this virtual SError is taken to EL1 using AArch64, this value will be reported in the ISS field of ESR_ELx. See KVM_CAP_VCPU_EVENTS for more details. +8.20 KVM_CAP_HYPERV_SEND_IPI + +Architectures: x86 + +This capability indicates that KVM supports paravirtualized Hyper-V IPI send +hypercalls: +HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx. -- cgit v1.2.3 From 9943450b7b8831c5045362eed45f6fefd1986d72 Mon Sep 17 00:00:00 2001 From: Peng Hao Date: Sun, 14 Oct 2018 07:09:56 +0800 Subject: kvm/x86 : add document for coalesced mmio Signed-off-by: Peng Hao Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 48e5d1197295..10d48eb67da9 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3681,6 +3681,30 @@ Returns: 0 on success, -1 on error This copies the vcpu's kvm_nested_state struct from userspace to the kernel. For the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE. +4.116 KVM_(UN)REGISTER_COALESCED_MMIO + +Capability: KVM_CAP_COALESCED_MMIO +Architectures: all +Type: vm ioctl +Parameters: struct kvm_coalesced_mmio_zone +Returns: 0 on success, < 0 on error + +Coalesced mmio is a performance optimization that defers hardware +register write emulation so that userspace exits are avoided. It is +typically used to reduce the overhead of emulating frequently accessed +hardware registers. + +When a hardware register is configured for coalesced mmio, write accesses +do not exit to userspace and their value is recorded in a ring buffer +that is shared between kernel and userspace. + +Coalesced mmio is used if one or more write accesses to a hardware +register can be deferred until a read or a write to another hardware +register on the same device. This last access will cause a vmexit and +userspace will process accesses from the ring buffer before emulating +it. That will avoid exiting to userspace on repeated writes to the +first register. + 5. The kvm_run structure ------------------------ -- cgit v1.2.3 From 0804c849f1df0992d39a37c4fc259f7f8b16f385 Mon Sep 17 00:00:00 2001 From: Peng Hao Date: Sun, 14 Oct 2018 07:09:55 +0800 Subject: kvm/x86 : add coalesced pio support Coalesced pio is based on coalesced mmio and can be used for some port like rtc port, pci-host config port and so on. Specially in case of rtc as coalesced pio, some versions of windows guest access rtc frequently because of rtc as system tick. guest access rtc like this: write register index to 0x70, then write or read data from 0x71. writing 0x70 port is just as index and do nothing else. So we can use coalesced pio to handle this scene to reduce VM-EXIT time. When starting and closing a virtual machine, it will access pci-host config port frequently. So setting these port as coalesced pio can reduce startup and shutdown time. without my patch, get the vm-exit time of accessing rtc 0x70 and piix 0xcf8 using perf tools: (guest OS : windows 7 64bit) IO Port Access Samples Samples% Time% Min Time Max Time Avg time 0x70:POUT 86 30.99% 74.59% 9us 29us 10.75us (+- 3.41%) 0xcf8:POUT 1119 2.60% 2.12% 2.79us 56.83us 3.41us (+- 2.23%) with my patch IO Port Access Samples Samples% Time% Min Time Max Time Avg time 0x70:POUT 106 32.02% 29.47% 0us 10us 1.57us (+- 7.38%) 0xcf8:POUT 1065 1.67% 0.28% 0.41us 65.44us 0.66us (+- 10.55%) Signed-off-by: Peng Hao Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 10d48eb67da9..70f9c8bb1840 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3683,27 +3683,31 @@ the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE. 4.116 KVM_(UN)REGISTER_COALESCED_MMIO -Capability: KVM_CAP_COALESCED_MMIO +Capability: KVM_CAP_COALESCED_MMIO (for coalesced mmio) + KVM_CAP_COALESCED_PIO (for coalesced pio) Architectures: all Type: vm ioctl Parameters: struct kvm_coalesced_mmio_zone Returns: 0 on success, < 0 on error -Coalesced mmio is a performance optimization that defers hardware +Coalesced I/O is a performance optimization that defers hardware register write emulation so that userspace exits are avoided. It is typically used to reduce the overhead of emulating frequently accessed hardware registers. -When a hardware register is configured for coalesced mmio, write accesses +When a hardware register is configured for coalesced I/O, write accesses do not exit to userspace and their value is recorded in a ring buffer that is shared between kernel and userspace. -Coalesced mmio is used if one or more write accesses to a hardware +Coalesced I/O is used if one or more write accesses to a hardware register can be deferred until a read or a write to another hardware register on the same device. This last access will cause a vmexit and userspace will process accesses from the ring buffer before emulating -it. That will avoid exiting to userspace on repeated writes to the -first register. +it. That will avoid exiting to userspace on repeated writes. + +Coalesced pio is based on coalesced mmio. There is little difference +between coalesced mmio and pio except that coalesced pio records accesses +to I/O ports. 5. The kvm_run structure ------------------------ -- cgit v1.2.3 From bba9ce58d9cb7ba9e121627108eca986760ad0e8 Mon Sep 17 00:00:00 2001 From: Jim Mattson Date: Tue, 16 Oct 2018 14:29:18 -0700 Subject: KVM: Documentation: Fix omission in struct kvm_vcpu_events The header file indicates that there are 36 reserved bytes at the end of this structure. Adjust the documentation to agree with the header file. Signed-off-by: Jim Mattson Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 70f9c8bb1840..cf9422f4f101 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -873,6 +873,7 @@ struct kvm_vcpu_events { __u8 smm_inside_nmi; __u8 latched_init; } smi; + __u32 reserved[9]; }; Only two fields are defined in the flags field: -- cgit v1.2.3 From 59073aaf6de0d2dacc2603cee6d1d6cd5592ac08 Mon Sep 17 00:00:00 2001 From: Jim Mattson Date: Tue, 16 Oct 2018 14:29:20 -0700 Subject: kvm: x86: Add exception payload fields to kvm_vcpu_events The per-VM capability KVM_CAP_EXCEPTION_PAYLOAD (to be introduced in a later commit) adds the following fields to struct kvm_vcpu_events: exception_has_payload, exception_payload, and exception.pending. With this capability set, all of the details of vcpu->arch.exception, including the payload for a pending exception, are reported to userspace in response to KVM_GET_VCPU_EVENTS. With this capability clear, the original ABI is preserved, and the exception.injected field is set for either pending or injected exceptions. When userspace calls KVM_SET_VCPU_EVENTS with KVM_CAP_EXCEPTION_PAYLOAD clear, exception.injected is no longer translated to exception.pending. KVM_SET_VCPU_EVENTS can now only establish a pending exception when KVM_CAP_EXCEPTION_PAYLOAD is set. Reported-by: Jim Mattson Suggested-by: Paolo Bonzini Signed-off-by: Jim Mattson Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index cf9422f4f101..e900ac31501c 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -850,7 +850,7 @@ struct kvm_vcpu_events { __u8 injected; __u8 nr; __u8 has_error_code; - __u8 pad; + __u8 pending; __u32 error_code; } exception; struct { @@ -873,16 +873,23 @@ struct kvm_vcpu_events { __u8 smm_inside_nmi; __u8 latched_init; } smi; - __u32 reserved[9]; + __u8 reserved[27]; + __u8 exception_has_payload; + __u64 exception_payload; }; -Only two fields are defined in the flags field: +The following bits are defined in the flags field: -- KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that +- KVM_VCPUEVENT_VALID_SHADOW may be set to signal that interrupt.shadow contains a valid state. -- KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that - smi contains a valid state. +- KVM_VCPUEVENT_VALID_SMM may be set to signal that smi contains a + valid state. + +- KVM_VCPUEVENT_VALID_PAYLOAD may be set to signal that the + exception_has_payload, exception_payload, and exception.pending + fields contain a valid state. This bit will be set whenever + KVM_CAP_EXCEPTION_PAYLOAD is enabled. ARM/ARM64: @@ -962,6 +969,11 @@ shall be written into the VCPU. KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available. +If KVM_CAP_EXCEPTION_PAYLOAD is enabled, KVM_VCPUEVENT_VALID_PAYLOAD +can be set in the flags field to signal that the +exception_has_payload, exception_payload, and exception.pending fields +contain a valid state and shall be written into the VCPU. + ARM/ARM64: Set the pending SError exception state for this VCPU. It is not possible to -- cgit v1.2.3 From c4f55198c7c2b87909b166ffc2f6b68d9af6766c Mon Sep 17 00:00:00 2001 From: Jim Mattson Date: Tue, 16 Oct 2018 14:29:24 -0700 Subject: kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD This is a per-VM capability which can be enabled by userspace so that the faulting linear address will be included with the information about a pending #PF in L2, and the "new DR6 bits" will be included with the information about a pending #DB in L2. With this capability enabled, the L1 hypervisor can now intercept #PF before CR2 is modified. Under VMX, the L1 hypervisor can now intercept #DB before DR6 and DR7 are modified. When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should generally provide an appropriate payload when injecting a #PF or #DB exception via KVM_SET_VCPU_EVENTS. However, to support restoring old checkpoints, this payload is not required. Note that bit 16 of the "new DR6 bits" is set to indicate that a debug exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled. This is the reverse of DR6.RTM, which is cleared in this scenario. This capability also enables exception.pending in struct kvm_vcpu_events, which allows userspace to distinguish between pending and injected exceptions. Reported-by: Jim Mattson Suggested-by: Paolo Bonzini Signed-off-by: Jim Mattson Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index e900ac31501c..07e87a7c665d 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -4568,7 +4568,7 @@ hpage module parameter is not set to 1, -EINVAL is returned. While it is generally possible to create a huge page backed VM without this capability, the VM will not be able to run. -7.14 KVM_CAP_MSR_PLATFORM_INFO +7.15 KVM_CAP_MSR_PLATFORM_INFO Architectures: x86 Parameters: args[0] whether feature should be enabled or not @@ -4591,6 +4591,31 @@ state). Enabling this capability on a VM depends on the CPU having the necessary functionality and on the facility being enabled with a kvm-hv module parameter. +7.17 KVM_CAP_EXCEPTION_PAYLOAD + +Architectures: x86 +Parameters: args[0] whether feature should be enabled or not + +With this capability enabled, CR2 will not be modified prior to the +emulated VM-exit when L1 intercepts a #PF exception that occurs in +L2. Similarly, for kvm-intel only, DR6 will not be modified prior to +the emulated VM-exit when L1 intercepts a #DB exception that occurs in +L2. As a result, when KVM_GET_VCPU_EVENTS reports a pending #PF (or +#DB) exception for L2, exception.has_payload will be set and the +faulting address (or the new DR6 bits*) will be reported in the +exception_payload field. Similarly, when userspace injects a #PF (or +#DB) into L2 using KVM_SET_VCPU_EVENTS, it is expected to set +exception.has_payload and to put the faulting address (or the new DR6 +bits*) in the exception_payload field. + +This capability also enables exception.pending in struct +kvm_vcpu_events, which allows userspace to distinguish between pending +and injected exceptions. + + +* For the new DR6 bits, note that bit 16 is set iff the #DB exception + will clear DR6.RTM. + 8. Other capabilities. ---------------------- -- cgit v1.2.3