Age | Commit message (Collapse) | Author |
|
vgic.c is currently a mixture of generic vGIC emulation code and
functions specific to emulating a GICv2. To ease the addition of
GICv3, split off strictly v2 specific parts into a new file
vgic-v2-emul.c.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
-------
As the diff isn't always obvious here (and to aid eventual rebases),
here is a list of high-level changes done to the code:
* added new file to respective arm/arm64 Makefiles
* moved GICv2 specific functions to vgic-v2-emul.c:
- handle_mmio_misc()
- handle_mmio_set_enable_reg()
- handle_mmio_clear_enable_reg()
- handle_mmio_set_pending_reg()
- handle_mmio_clear_pending_reg()
- handle_mmio_priority_reg()
- vgic_get_target_reg()
- vgic_set_target_reg()
- handle_mmio_target_reg()
- handle_mmio_cfg_reg()
- handle_mmio_sgi_reg()
- vgic_v2_unqueue_sgi()
- read_set_clear_sgi_pend_reg()
- write_set_clear_sgi_pend_reg()
- handle_mmio_sgi_set()
- handle_mmio_sgi_clear()
- vgic_v2_handle_mmio()
- vgic_get_sgi_sources()
- vgic_dispatch_sgi()
- vgic_v2_queue_sgi()
- vgic_v2_map_resources()
- vgic_v2_init()
- vgic_v2_add_sgi_source()
- vgic_v2_init_model()
- vgic_v2_init_emulation()
- handle_cpu_mmio_misc()
- handle_mmio_abpr()
- handle_cpu_mmio_ident()
- vgic_attr_regs_access()
- vgic_create() (renamed to vgic_v2_create())
- vgic_destroy() (renamed to vgic_v2_destroy())
- vgic_has_attr() (renamed to vgic_v2_has_attr())
- vgic_set_attr() (renamed to vgic_v2_set_attr())
- vgic_get_attr() (renamed to vgic_v2_get_attr())
- struct kvm_mmio_range vgic_dist_ranges[]
- struct kvm_mmio_range vgic_cpu_ranges[]
- struct kvm_device_ops kvm_arm_vgic_v2_ops {}
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Currently the maximum number of vCPUs supported is a global value
limited by the used GIC model. GICv3 will lift this limit, but we
still need to observe it for guests using GICv2.
So the maximum number of vCPUs is per-VM value, depending on the
GIC model the guest uses.
Store and check the value in struct kvm_arch, but keep it down to
8 for now.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
With the introduction of a second emulated GIC model we need to let
userspace specify the GIC model to use for each VM. Pass the
userspace provided value down into the vGIC code and store it there
to differentiate later.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
The virtual MPIDR registers (containing topology information) for the
guest are currently mapped linearily to the vcpu_id. Improve this
mapping for arm64 by using three levels to not artificially limit the
number of vCPUs.
To help this, change and rename the kvm_vcpu_get_mpidr() function to
mask off the non-affinity bits in the MPIDR register.
Also add an accessor to later allow easier access to a vCPU with a
given MPIDR. Use this new accessor in the PSCI emulation.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
This patch enables ARMv8 ditry page logging support. Plugs ARMv8 into generic
layer through Kconfig symbol, and drops earlier ARM64 constraints to enable
logging at architecture layer.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
|
|
This patch adds support for 2nd stage page fault handling while dirty page
logging. On huge page faults, huge pages are dissolved to normal pages, and
rebuilding of 2nd stage huge pages is blocked. In case migration is
canceled this restriction is removed and huge pages may be rebuilt again.
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Add support to track dirty pages between user space KVM_GET_DIRTY_LOG ioctl
calls. We call kvm_get_dirty_log_protect() function to do most of the work.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
|
|
Add support for initial write protection of VM memslots. This patch
series assumes that huge PUDs will not be used in 2nd stage tables, which is
always valid on ARMv7
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
|
|
This patch adds ARMv7 architecture TLB Flush function.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
|
|
Currently the trace printk talks about "wfi" only, though the trace
point triggers both on wfi and wfe traps.
Add a parameter to differentiate between the two.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Wei Huang <wei@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
SRCU is not necessary to be compiled by default in all cases. For tinification
efforts not compiling SRCU unless necessary is desirable.
The current patch tries to make compiling SRCU optional by introducing a new
Kconfig option CONFIG_SRCU which is selected when any of the components making
use of SRCU are selected.
If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.
text data bss dec hex filename
2007 0 0 2007 7d7 kernel/rcu/srcu.o
Size of arch/powerpc/boot/zImage changes from
text data bss dec hex filename
831552 64180 23944 919676 e087c arch/powerpc/boot/zImage : before
829504 64180 23952 917636 e0084 arch/powerpc/boot/zImage : after
so the savings are about ~2000 bytes.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
|
|
Pull KVM update from Paolo Bonzini:
"3.19 changes for KVM:
- spring cleaning: removed support for IA64, and for hardware-
assisted virtualization on the PPC970
- ARM, PPC, s390 all had only small fixes
For x86:
- small performance improvements (though only on weird guests)
- usual round of hardware-compliancy fixes from Nadav
- APICv fixes
- XSAVES support for hosts and guests. XSAVES hosts were broken
because the (non-KVM) XSAVES patches inadvertently changed the KVM
userspace ABI whenever XSAVES was enabled; hence, this part is
going to stable. Guest support is just a matter of exposing the
feature and CPUID leaves support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
KVM: move APIC types to arch/x86/
KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
KVM: PPC: Book3S HV: Improve H_CONFER implementation
KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
KVM: PPC: Book3S HV: Remove code for PPC970 processors
KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
arch: powerpc: kvm: book3s_pr.c: Remove unused function
arch: powerpc: kvm: book3s.c: Remove some unused functions
arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
KVM: PPC: Book3S HV: ptes are big endian
KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
KVM: PPC: Book3S HV: Fix KSM memory corruption
KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
KVM: PPC: Book3S HV: Fix computation of tlbie operand
KVM: PPC: Book3S HV: Add missing HPTE unlock
KVM: PPC: BookE: Improve irq inject tracepoint
arm/arm64: KVM: Require in-kernel vgic for the arch timers
...
|
|
It is curently possible to run a VM with architected timers support
without creating an in-kernel VGIC, which will result in interrupts from
the virtual timer going nowhere.
To address this issue, move the architected timers initialization to the
time when we run a VCPU for the first time, and then only initialize
(and enable) the architected timers if we have a properly created and
initialized in-kernel VGIC.
When injecting interrupts from the virtual timer to the vgic, the
current setup should ensure that this never calls an on-demand init of
the VGIC, which is the only call path that could return an error from
kvm_vgic_inject_irq(), so capture the return value and raise a warning
if there's an error there.
We also change the kvm_timer_init() function from returning an int to be
a void function, since the function always succeeds.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
When the vgic initializes its internal state it does so based on the
number of VCPUs available at the time. If we allow KVM to create more
VCPUs after the VGIC has been initialized, we are likely to error out in
unfortunate ways later, perform buffer overflows etc.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
The vgic_initialized() macro currently returns the state of the
vgic->ready flag, which indicates if the vgic is ready to be used when
running a VM, not specifically if its internal state has been
initialized.
Rename the macro accordingly in preparation for a more nuanced
initialization flow.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
VGIC initialization currently happens in three phases:
(1) kvm_vgic_create() (triggered by userspace GIC creation)
(2) vgic_init_maps() (triggered by userspace GIC register read/write
requests, or from kvm_vgic_init() if not already run)
(3) kvm_vgic_init() (triggered by first VM run)
We were doing initialization of some state to correspond with the
state of a freshly-reset GIC in kvm_vgic_init(); this is too late,
since it will overwrite changes made by userspace using the
register access APIs before the VM is run. Move this initialization
earlier, into the vgic_init_maps() phase.
This fixes a bug where QEMU could successfully restore a saved
VM state snapshot into a VM that had already been run, but could
not restore it "from cold" using the -loadvm command line option
(the symptoms being that the restored VM would run but interrupts
were ignored).
Finally rename vgic_init_maps to vgic_init and renamed kvm_vgic_init to
kvm_vgic_map_resources.
[ This patch is originally written by Peter Maydell, but I have
modified it somewhat heavily, renaming various bits and moving code
around. If something is broken, I am to be blamed. - Christoffer ]
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Introduce a new function to unmap user RAM regions in the stage2 page
tables. This is needed on reboot (or when the guest turns off the MMU)
to ensure we fault in pages again and make the dcache, RAM, and icache
coherent.
Using unmap_stage2_range for the whole guest physical range does not
work, because that unmaps IO regions (such as the GIC) which will not be
recreated or in the best case faulted in on a page-by-page basis.
Call this function on secondary and subsequent calls to the
KVM_ARM_VCPU_INIT ioctl so that a reset VCPU will detect the guest
Stage-1 MMU is off when faulting in pages and make the caches coherent.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
should really be turned off for the VM adhering to the suggestions in
the PSCI spec, and it's the sane thing to do.
Also, clarify the behavior and expectations for exits to user space with
the KVM_EXIT_SYSTEM_EVENT case.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
It is not clear that this ioctl can be called multiple times for a given
vcpu. Userspace already does this, so clarify the ABI.
Also specify that userspace is expected to always make secondary and
subsequent calls to the ioctl with the same parameters for the VCPU as
the initial call (which userspace also already does).
Add code to check that userspace doesn't violate that ABI in the future,
and move the kvm_vcpu_set_target() function which is currently
duplicated between the 32-bit and 64-bit versions in guest.c to a common
static function in arm.c, shared between both architectures.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
When userspace resets the vcpu using KVM_ARM_VCPU_INIT, we should also
reset the HCR, because we now modify the HCR dynamically to
enable/disable trapping of guest accesses to the VM registers.
This is crucial for reboot of VMs working since otherwise we will not be
doing the necessary cache maintenance operations when faulting in pages
with the guest MMU off.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
The implementation of KVM_ARM_VCPU_INIT is currently not doing what
userspace expects, namely making sure that a vcpu which may have been
turned off using PSCI is returned to its initial state, which would be
powered on if userspace does not set the KVM_ARM_VCPU_POWER_OFF flag.
Implement the expected functionality and clarify the ABI.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
If a VCPU was originally started with power off (typically to be brought
up by PSCI in SMP configurations), there is no need to clear the
POWER_OFF flag in the kernel, as this flag is only tested during the
init ioctl itself.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Instead of using kvm_is_mmio_pfn() to decide whether a host region
should be stage 2 mapped with device attributes, add a new static
function kvm_is_device_pfn() that disregards RAM pages with the
reserved bit set, as those should usually not be mapped as device
memory.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently if using a 48-bit VA, tearing down the hyp page tables (which
can happen in the absence of a GICH or GICV resource) results in the
rather nasty splat below, evidently becasue we access a table that
doesn't actually exist.
Commit 38f791a4e499792e (arm64: KVM: Implement 48 VA support for KVM EL2
and Stage-2) added a pgd_none check to __create_hyp_mappings to account
for the additional level of tables, but didn't add a corresponding check
to unmap_range, and this seems to be the source of the problem.
This patch adds the missing pgd_none check, ensuring we don't try to
access tables that don't exist.
Original splat below:
kvm [1]: Using HYP init bounce page @83fe94a000
kvm [1]: Cannot obtain GICH resource
Unable to handle kernel paging request at virtual address ffff7f7fff000000
pgd = ffff800000770000
[ffff7f7fff000000] *pgd=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc2+ #89
task: ffff8003eb500000 ti: ffff8003eb45c000 task.ti: ffff8003eb45c000
PC is at unmap_range+0x120/0x580
LR is at free_hyp_pgds+0xac/0xe4
pc : [<ffff80000009b768>] lr : [<ffff80000009cad8>] pstate: 80000045
sp : ffff8003eb45fbf0
x29: ffff8003eb45fbf0 x28: ffff800000736000
x27: ffff800000735000 x26: ffff7f7fff000000
x25: 0000000040000000 x24: ffff8000006f5000
x23: 0000000000000000 x22: 0000007fffffffff
x21: 0000800000000000 x20: 0000008000000000
x19: 0000000000000000 x18: ffff800000648000
x17: ffff800000537228 x16: 0000000000000000
x15: 000000000000001f x14: 0000000000000000
x13: 0000000000000001 x12: 0000000000000020
x11: 0000000000000062 x10: 0000000000000006
x9 : 0000000000000000 x8 : 0000000000000063
x7 : 0000000000000018 x6 : 00000003ff000000
x5 : ffff800000744188 x4 : 0000000000000001
x3 : 0000000040000000 x2 : ffff800000000000
x1 : 0000007fffffffff x0 : 000000003fffffff
Process swapper/0 (pid: 1, stack limit = 0xffff8003eb45c058)
Stack: (0xffff8003eb45fbf0 to 0xffff8003eb460000)
fbe0: eb45fcb0 ffff8003 0009cad8 ffff8000
fc00: 00000000 00000080 00736140 ffff8000 00736000 ffff8000 00000000 00007c80
fc20: 00000000 00000080 006f5000 ffff8000 00000000 00000080 00743000 ffff8000
fc40: 00735000 ffff8000 006d3030 ffff8000 006fe7b8 ffff8000 00000000 00000080
fc60: ffffffff 0000007f fdac1000 ffff8003 fd94b000 ffff8003 fda47000 ffff8003
fc80: 00502b40 ffff8000 ff000000 ffff7f7f fdec6000 00008003 fdac1630 ffff8003
fca0: eb45fcb0 ffff8003 ffffffff 0000007f eb45fd00 ffff8003 0009b378 ffff8000
fcc0: ffffffea 00000000 006fe000 ffff8000 00736728 ffff8000 00736120 ffff8000
fce0: 00000040 00000000 00743000 ffff8000 006fe7b8 ffff8000 0050cd48 00000000
fd00: eb45fd60 ffff8003 00096070 ffff8000 006f06e0 ffff8000 006f06e0 ffff8000
fd20: fd948b40 ffff8003 0009a320 ffff8000 00000000 00000000 00000000 00000000
fd40: 00000ae0 00000000 006aa25c ffff8000 eb45fd60 ffff8003 0017ca44 00000002
fd60: eb45fdc0 ffff8003 0009a33c ffff8000 006f06e0 ffff8000 006f06e0 ffff8000
fd80: fd948b40 ffff8003 0009a320 ffff8000 00000000 00000000 00735000 ffff8000
fda0: 006d3090 ffff8000 006aa25c ffff8000 00735000 ffff8000 006d3030 ffff8000
fdc0: eb45fdd0 ffff8003 000814c0 ffff8000 eb45fe50 ffff8003 006aaac4 ffff8000
fde0: 006ddd90 ffff8000 00000006 00000000 006d3000 ffff8000 00000095 00000000
fe00: 006a1e90 ffff8000 00735000 ffff8000 006d3000 ffff8000 006aa25c ffff8000
fe20: 00735000 ffff8000 006d3030 ffff8000 eb45fe50 ffff8003 006fac68 ffff8000
fe40: 00000006 00000006 fe293ee6 ffff8003 eb45feb0 ffff8003 004f8ee8 ffff8000
fe60: 004f8ed4 ffff8000 00735000 ffff8000 00000000 00000000 00000000 00000000
fe80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
fea0: 00000000 00000000 00000000 00000000 00000000 00000000 000843d0 ffff8000
fec0: 004f8ed4 ffff8000 00000000 00000000 00000000 00000000 00000000 00000000
fee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ff00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ff20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ff40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ff60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ff80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ffa0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000005 00000000
ffe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call trace:
[<ffff80000009b768>] unmap_range+0x120/0x580
[<ffff80000009cad4>] free_hyp_pgds+0xa8/0xe4
[<ffff80000009b374>] kvm_arch_init+0x268/0x44c
[<ffff80000009606c>] kvm_init+0x24/0x260
[<ffff80000009a338>] arm_init+0x18/0x24
[<ffff8000000814bc>] do_one_initcall+0x88/0x1a0
[<ffff8000006aaac0>] kernel_init_freeable+0x148/0x1e8
[<ffff8000004f8ee4>] kernel_init+0x10/0xd4
Code: 8b000263 92628479 d1000720 eb01001f (f9400340)
---[ end trace 3bc230562e926fa4 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jungseok Lee <jungseoklee85@gmail.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently we mangle the endianness of the guest's register even on an
MMIO _read_, where it is completely useless, because we will not use
the value of that register.
Rework the io_mem_abort() function to clearly separate between reads
and writes and only do the endianness mangling on MMIO writes.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Readonly memslots are often used to implement emulation of ROMs and
NOR flashes, in which case the guest may legally map these regions as
uncached.
To deal with the incoherency associated with uncached guest mappings,
treat all readonly memslots as incoherent, and ensure that pages that
belong to regions tagged as such are flushed to DRAM before being passed
to the guest.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
To allow handling of incoherent memslots in a subsequent patch, this
patch adds a paramater 'ipa_uncached' to cache_coherent_guest_page()
so that we can instruct it to flush the page's contents to DRAM even
if the guest has caching globally enabled.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Instead of using kvm_is_mmio_pfn() to decide whether a host region
should be stage 2 mapped with device attributes, add a new static
function kvm_is_device_pfn() that disregards RAM pages with the
reserved bit set, as those should usually not be mapped as device
memory.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
Pull second batch of changes for KVM/{arm,arm64} from Marc Zyngier:
"The most obvious thing is the sizeable MMU changes to support 48bit
VAs on arm64.
Summary:
- support for 48bit IPA and VA (EL2)
- a number of fixes for devices mapped into guests
- yet another VGIC fix for BE
- a fix for CPU hotplug
- a few compile fixes (disabled VGIC, strict mm checks)"
[ I'm pulling directly from Marc at the request of Paolo Bonzini, whose
backpack was stolen at Düsseldorf airport and will do new keys and
rebuild his web of trust. - Linus ]
* tag 'kvm-arm-for-3.18-take-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm:
arm/arm64: KVM: Fix BE accesses to GICv2 EISR and ELRSR regs
arm: kvm: STRICT_MM_TYPECHECKS fix for user_mem_abort
arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE
arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2
arm/arm64: KVM: map MMIO regions at creation time
arm64: kvm: define PAGE_S2_DEVICE as read-only by default
ARM: kvm: define PAGE_S2_DEVICE as read-only by default
arm/arm64: KVM: add 'writable' parameter to kvm_phys_addr_ioremap
arm/arm64: KVM: fix potential NULL dereference in user_mem_abort()
arm/arm64: KVM: use __GFP_ZERO not memset() to get zeroed pages
ARM: KVM: fix vgic-disabled build
arm: kvm: fix CPU hotplug
|
|
The EIRSR and ELRSR registers are 32-bit registers on GICv2, and we
store these as an array of two such registers on the vgic vcpu struct.
However, we access them as a single 64-bit value or as a bitmap pointer
in the generic vgic code, which breaks BE support.
Instead, store them as u64 values on the vgic structure and do the
word-swapping in the assembly code, which already handles the byte order
for BE systems.
Tested-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Commit:
b886576 ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping
introduced some code in user_mem_abort that failed to compile if
STRICT_MM_TYPECHECKS was enabled.
This patch fixes up the failing comparison.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Reviewed-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
When creating or moving a memslot, make sure the IPA space is within the
addressable range of the guest. Otherwise, user space can create too
large a memslot and KVM would try to access potentially unallocated page
table entries when inserting entries in the Stage-2 page tables.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
This patch adds the necessary support for all host kernel PGSIZE and
VA_SPACE configuration options for both EL2 and the Stage-2 page tables.
However, for 40bit and 42bit PARange systems, the architecture mandates
that VTCR_EL2.SL0 is maximum 1, resulting in fewer levels of stage-2
pagge tables than levels of host kernel page tables. At the same time,
systems with a PARange > 42bit, we limit the IPA range by always setting
VTCR_EL2.T0SZ to 24.
To solve the situation with different levels of page tables for Stage-2
translation than the host kernel page tables, we allocate a dummy PGD
with pointers to our actual inital level Stage-2 page table, in order
for us to reuse the kernel pgtable manipulation primitives. Reproducing
all these in KVM does not look pretty and unnecessarily complicates the
32-bit side.
Systems with a PARange < 40bits are not yet supported.
[ I have reworked this patch from its original form submitted by
Jungseok to take the architecture constraints into consideration.
There were too many changes from the original patch for me to
preserve the authorship. Thanks to Catalin Marinas for his help in
figuring out a good solution to this challenge. I have also fixed
various bugs and missing error code handling from the original
patch. - Christoffer ]
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
There is really no point in faulting in memory regions page by page
if they are not backed by demand paged system RAM but by a linear
passthrough mapping of a host MMIO region. So instead, detect such
regions at setup time and install the mappings for the backing all
at once.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Add support for read-only MMIO passthrough mappings by adding a
'writable' parameter to kvm_phys_addr_ioremap. For the moment,
mappings will be read-write even if 'writable' is false, but once
the definition of PAGE_S2_DEVICE gets changed, those mappings will
be created read-only.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Handle the potential NULL return value of find_vma_intersection()
before dereferencing it.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Pass __GFP_ZERO to __get_free_pages() instead of calling memset()
explicitly.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Pull KVM updates from Paolo Bonzini:
"Fixes and features for 3.18.
Apart from the usual cleanups, here is the summary of new features:
- s390 moves closer towards host large page support
- PowerPC has improved support for debugging (both inside the guest
and via gdbstub) and support for e6500 processors
- ARM/ARM64 support read-only memory (which is necessary to put
firmware in emulated NOR flash)
- x86 has the usual emulator fixes and nested virtualization
improvements (including improved Windows support on Intel and
Jailhouse hypervisor support on AMD), adaptive PLE which helps
overcommitting of huge guests. Also included are some patches that
make KVM more friendly to memory hot-unplug, and fixes for rare
caching bugs.
Two patches have trivial mm/ parts that were acked by Rik and Andrew.
Note: I will soon switch to a subkey for signing purposes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits)
kvm: do not handle APIC access page if in-kernel irqchip is not in use
KVM: s390: count vcpu wakeups in stat.halt_wakeup
KVM: s390/facilities: allow TOD-CLOCK steering facility bit
KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
arm/arm64: KVM: Report correct FSC for unsupported fault types
arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
kvm: Fix kvm_get_page_retry_io __gup retval check
arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
kvm: x86: Unpin and remove kvm_arch->apic_access_page
kvm: vmx: Implement set_apic_access_page_addr
kvm: x86: Add request bit to reload APIC access page address
kvm: Add arch specific mmu notifier for page invalidation
kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static
kvm: Fix page ageing bugs
kvm/x86/mmu: Pass gfn and level to rmapp callback.
x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
kvm: x86: use macros to compute bank MSRs
KVM: x86: Remove debug assertion of non-PAE reserved bits
kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
kvm: Faults which trigger IO release the mmap_sem
...
|
|
On some platforms with no power management capabilities, the hotplug
implementation is allowed to return from a smp_ops.cpu_die() call as a
function return. Upon a CPU onlining event, the KVM CPU notifier tries
to reinstall the hyp stub, which fails on platform where no reset took
place following a hotplug event, with the message:
CPU1: smp_ops.cpu_die() returned, trying to resuscitate
CPU1: Booted secondary processor
Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x80409540
unexpected data abort in Hyp mode at: 0x80401fe8
unexpected HVC/SVC trap in Hyp mode at: 0x805c6170
since KVM code is trying to reinstall the stub on a system where it is
already configured.
To prevent this issue, this patch adds a check in the KVM hotplug
notifier that detects if the HYP stub really needs re-installing when a
CPU is onlined and skips the installation call if the stub is already in
place, which means that the CPU has not been reset.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
When we catch something that's not a permission fault or a translation
fault, we log the unsupported FSC in the kernel log, but we were masking
off the bottom bits of the FSC which was not very helpful.
Also correctly report the FSC for data and instruction faults rather
than telling people it was a DFCS, which doesn't exist in the ARM ARM.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
The current aarch64 calculation for VTTBR_BADDR_MASK masks only 39 bits
and not all the bits in the PA range. This is clearly a bug that
manifests itself on systems that allocate memory in the higher address
space range.
[ Modified from Joel's original patch to be based on PHYS_MASK_SHIFT
instead of a hard-coded value and to move the alignment check of the
allocation to mmu.c. Also added a comment explaining why we hardcode
the IPA range and changed the stage-2 pgd allocation to be based on
the 40 bit IPA range instead of the maximum possible 48 bit PA range.
- Christoffer ]
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Joel Schopp <joel.schopp@amd.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
It is now quite easy to delay the allocation of the vgic tables
until we actually require it to be up and running (when the first
vcpu is kicking around, or someones tries to access the GIC registers).
This allow us to allocate memory for the exact number of CPUs we
have. As nobody configures the number of interrupts just yet,
use a fallback to VGIC_NR_IRQS_LEGACY.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
So far, all the VGIC data structures are statically defined by the
*maximum* number of vcpus and interrupts it supports. It means that
we always have to oversize it to cater for the worse case.
Start by changing the data structures to be dynamically sizeable,
and allocate them at runtime.
The sizes are still very static though.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Conflicts:
arch/arm64/include/asm/kvm_host.h
virt/kvm/arm/vgic.c
|
|
The ISS encoding for an exception from a Data Abort has a WnR
bit[6] that indicates whether the Data Abort was caused by a
read or a write instruction. While there are several fields
in the encoding that are only valid if the ISV bit[24] is set,
WnR is not one of them, so we can read it unconditionally.
Instead of fixing both implementations of kvm_is_write_fault()
in place, reimplement it just once using kvm_vcpu_dabt_iswrite(),
which already does the right thing with respect to the WnR bit.
Also fix up the callers to pass 'vcpu'
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.
Remove unnecessary arguments that stem from this.
Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Using static inline is going to save few bytes and cycles.
For example on powerpc, the difference is 700 B after stripping.
(5 kB before)
This patch also deals with two overlooked empty functions:
kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
2df72e9bc KVM: split kvm_arch_flush_shadow
and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
e790d9ef6 KVM: add kvm_arch_sched_in
Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The architecture specifies that when the processor wakes up from a WFE
or WFI instruction, the instruction is considered complete, however we
currrently return to EL1 (or EL0) at the WFI/WFE instruction itself.
While most guests may not be affected by this because their local
exception handler performs an exception returning setting the event bit
or with an interrupt pending, some guests like UEFI will get wedged due
this little mishap.
Simply skip the instruction when we have completed the emulation.
Cc: <stable@vger.kernel.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
X-Gene u-boot runs in EL2 mode with MMU enabled hence we might
have stale EL2 tlb enteris when we enable EL2 MMU on each host CPU.
This can happen on any ARM/ARM64 board running bootloader in
Hyp-mode (or EL2-mode) with MMU enabled.
This patch ensures that we flush all Hyp-mode (or EL2-mode) TLBs
on each host CPU before enabling Hyp-mode (or EL2-mode) MMU.
Cc: <stable@vger.kernel.org>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
We currently return the number of bytes not copied if set_timer_reg
fails, which is almost certainly not what userspace would like.
This patch returns -EFAULT instead.
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|