Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull initial set_fs() removal from Al Viro:
"Christoph's set_fs base series + fixups"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Allow a NULL pos pointer to __kernel_read
fs: Allow a NULL pos pointer to __kernel_write
powerpc: remove address space overrides using set_fs()
powerpc: use non-set_fs based maccess routines
x86: remove address space overrides using set_fs()
x86: make TASK_SIZE_MAX usable from assembly code
x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
lkdtm: remove set_fs-based tests
test_bitmap: remove user bitmap tests
uaccess: add infrastructure for kernel builds with set_fs()
fs: don't allow splice read/write without explicit ops
fs: don't allow kernel reads and writes without iter ops
sysctl: Convert to iter interfaces
proc: add a read_iter method to proc proc_ops
proc: cleanup the compat vs no compat file ops
proc: remove a level of indentation in proc_get_inode
|
|
When running in lazy TLB mode the currently active page tables might
be the ones of a previous process, e.g. when running a kernel thread.
This can be problematic in case kernel code is being modified via
text_poke() in a kernel thread, and on another processor exit_mmap()
is active for the process which was running on the first cpu before
the kernel thread.
As text_poke() is using a temporary address space and the former
address space (obtained via cpu_tlbstate.loaded_mm) is restored
afterwards, there is a race possible in case the cpu on which
exit_mmap() is running wants to make sure there are no stale
references to that address space on any cpu active (this e.g. is
required when running as a Xen PV guest, where this problem has been
observed and analyzed).
In order to avoid that, drop off TLB lazy mode before switching to the
temporary address space.
Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201009144225.12019-1-jgross@suse.com
|
|
When an UE or memory error exception is encountered the MCE handler
tries to find the pfn using addr_to_pfn() which takes effective
address as an argument, later pfn is used to poison the page where
memory error occurred, recent rework in this area made addr_to_pfn
to run in real mode, which can be fatal as it may try to access
memory outside RMO region.
Have two helper functions to separate things to be done in real mode
and virtual mode without changing any functionality. This also fixes
the following error as the use of addr_to_pfn is now moved to virtual
mode.
Without this change following kernel crash is seen on hitting UE.
[ 485.128036] Oops: Kernel access of bad area, sig: 11 [#1]
[ 485.128040] LE SMP NR_CPUS=2048 NUMA pSeries
[ 485.128047] Modules linked in:
[ 485.128067] CPU: 15 PID: 6536 Comm: insmod Kdump: loaded Tainted: G OE 5.7.0 #22
[ 485.128074] NIP: c00000000009b24c LR: c0000000000398d8 CTR: c000000000cd57c0
[ 485.128078] REGS: c000000003f1f970 TRAP: 0300 Tainted: G OE (5.7.0)
[ 485.128082] MSR: 8000000000001003 <SF,ME,RI,LE> CR: 28008284 XER: 00000001
[ 485.128088] CFAR: c00000000009b190 DAR: c0000001fab00000 DSISR: 40000000 IRQMASK: 1
[ 485.128088] GPR00: 0000000000000001 c000000003f1fbf0 c000000001634300 0000b0fa01000000
[ 485.128088] GPR04: d000000002220000 0000000000000000 00000000fab00000 0000000000000022
[ 485.128088] GPR08: c0000001fab00000 0000000000000000 c0000001fab00000 c000000003f1fc14
[ 485.128088] GPR12: 0000000000000008 c000000003ff5880 d000000002100008 0000000000000000
[ 485.128088] GPR16: 000000000000ff20 000000000000fff1 000000000000fff2 d0000000021a1100
[ 485.128088] GPR20: d000000002200000 c00000015c893c50 c000000000d49b28 c00000015c893c50
[ 485.128088] GPR24: d0000000021a0d08 c0000000014e5da8 d0000000021a0818 000000000000000a
[ 485.128088] GPR28: 0000000000000008 000000000000000a c0000000017e2970 000000000000000a
[ 485.128125] NIP [c00000000009b24c] __find_linux_pte+0x11c/0x310
[ 485.128130] LR [c0000000000398d8] addr_to_pfn+0x138/0x170
[ 485.128133] Call Trace:
[ 485.128135] Instruction dump:
[ 485.128138] 3929ffff 7d4a3378 7c883c36 7d2907b4 794a1564 7d294038 794af082 3900ffff
[ 485.128144] 79291f24 790af00e 78e70020 7d095214 <7c69502a> 2fa30000 419e011c 70690040
[ 485.128152] ---[ end trace d34b27e29ae0e340 ]---
Fixes: 9ca766f9891d ("powerpc/64s/pseries: machine check convert to use common event code")
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724063946.21378-1-ganeshgr@linux.ibm.com
|
|
GCC 4.9 sometimes fails to build with "m<>" constraint in
inline assembly.
CC lib/iov_iter.o
In file included from ./arch/powerpc/include/asm/cmpxchg.h:6:0,
from ./arch/powerpc/include/asm/atomic.h:11,
from ./include/linux/atomic.h:7,
from ./include/linux/crypto.h:15,
from ./include/crypto/hash.h:11,
from lib/iov_iter.c:2:
lib/iov_iter.c: In function 'iovec_from_user.part.30':
./arch/powerpc/include/asm/uaccess.h:287:2: error: 'asm' operand has impossible constraints
__asm__ __volatile__( \
^
./include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
^
./arch/powerpc/include/asm/uaccess.h:583:34: note: in expansion of macro 'unsafe_op_wrap'
#define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
^
./arch/powerpc/include/asm/uaccess.h:329:10: note: in expansion of macro '__get_user_asm'
case 4: __get_user_asm(x, (u32 __user *)ptr, retval, "lwz"); break; \
^
./arch/powerpc/include/asm/uaccess.h:363:3: note: in expansion of macro '__get_user_size_allowed'
__get_user_size_allowed(__gu_val, __gu_addr, __gu_size, __gu_err); \
^
./arch/powerpc/include/asm/uaccess.h:100:2: note: in expansion of macro '__get_user_nocheck'
__get_user_nocheck((x), (ptr), sizeof(*(ptr)), false)
^
./arch/powerpc/include/asm/uaccess.h:583:49: note: in expansion of macro '__get_user_allowed'
#define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
^
lib/iov_iter.c:1663:3: note: in expansion of macro 'unsafe_get_user'
unsafe_get_user(len, &uiov[i].iov_len, uaccess_end);
^
make[1]: *** [scripts/Makefile.build:283: lib/iov_iter.o] Error 1
Define a UPD_CONSTR macro that is "<>" by default and
only "" with GCC prior to GCC 5.
Fixes: fcf1f26895a4 ("powerpc/uaccess: Add pre-update addressing to __put_user_asm_goto()")
Fixes: 2f279eeb68b8 ("powerpc/uaccess: Add pre-update addressing to __get_user_asm() and __put_user_asm()")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/212d3bc4a52ca71523759517bb9c61f7e477c46a.1603179582.git.christophe.leroy@csgroup.eu
|
|
In commit 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr") the
following simplification was made:
- if (!pe->addr && !pe->config_addr) {
+ if (!pe->addr) {
eeh_stats.no_cfg_addr++;
return 0;
}
This introduced a bug which causes EEH checking to be skipped for
devices in PE#0.
Before the change above the check would always pass since at least one
of the two PE addresses would be non-zero in all circumstances. On
PowerNV pe->config_addr would be the BDFN of the first device added to
the PE. The zero BDFN is reserved for the PHB's root port, but this is
fine since for obscure platform reasons the root port is never
assigned to PE#0.
Similarly, on pseries pe->addr has always been non-zero for the
reasons outlined in commit 42de19d5ef71 ("powerpc/pseries/eeh: Allow
zero to be a valid PE configuration address").
We can fix the problem by deleting the block entirely The original
purpose of this test was to avoid performing EEH checks on devices
that were not on an EEH capable bus. In modern Linux the edev->pe
pointer will be NULL for devices that are not on an EEH capable bus.
The code block immediately above this one already checks for the
edev->pe == NULL case so this test (new and old) is entirely
redundant.
Ideally we'd delete eeh_stats.no_cfg_addr too since nothing increments
it any more. Unfortunately, that information is exposed via
/proc/powerpc/eeh which means it's technically ABI. We could make it
hard-coded, but that's a change for another patch.
Fixes: 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021232554.1434687-1-oohall@gmail.com
|
|
In order to avoid creating executable hugepages in the TDP MMU PF
handler, remove the dependency between disallowed_hugepage_adjust and
the shadow_walk_iterator. This will open the function up to being used
by the TDP MMU PF handler in a future patch.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-10-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add functions to zap SPTEs to the TDP MMU. These are needed to tear down
TDP MMU roots properly and implement other MMU functions which require
tearing down mappings. Future patches will add functions to populate the
page tables, but as for this patch there will not be any work for these
functions to do.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-8-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The existing bookkeeping done by KVM when a PTE is changed is spread
around several functions. This makes it difficult to remember all the
stats, bitmaps, and other subsystems that need to be updated whenever a
PTE is modified. When a non-leaf PTE is marked non-present or becomes a
leaf PTE, page table memory must also be freed. To simplify the MMU and
facilitate the use of atomic operations on SPTEs in future patches, create
functions to handle some of the bookkeeping required as a result of
a change.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The TDP MMU must be able to allocate paging structure root pages and track
the usage of those pages. Implement a similar, but separate system for root
page allocation to that of the x86 shadow paging implementation. When
future patches add synchronization model changes to allow for parallel
page faults, these pages will need to be handled differently from the
x86 shadow paging based MMU's root pages.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The TDP MMU offers an alternative mode of operation to the x86 shadow
paging based MMU, optimized for running an L1 guest with TDP. The TDP MMU
will require new fields that need to be initialized and torn down. Add
hooks into the existing KVM MMU initialization process to do that
initialization / cleanup. Currently the initialization and cleanup
fucntions do not do very much, however more operations will be added in
future patches.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-4-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The TDP iterator implements a pre-order traversal of a TDP paging
structure. This iterator will be used in future patches to create
an efficient implementation of the KVM MMU for the TDP case.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The SPTE format will be common to both the shadow and the TDP MMU.
Extract code that implements the format to a separate module, as a
first step towards adding the TDP MMU and putting mmu.c on a diet.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The TDP MMU's own function for the changed-PTE notifier will need to be
update a PTE in the exact same way as the shadow MMU. Rather than
re-implementing this logic, factor the SPTE creation out of kvm_set_pte_rmapp.
Extracted out of a patch by Ben Gardon. <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Separate the functions for generating leaf page table entries from the
function that inserts them into the paging structure. This refactoring
will facilitate changes to the MMU sychronization model to use atomic
compare / exchanges (which are not guaranteed to succeed) instead of a
monolithic MMU lock.
No functional change expected.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This commit introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The TDP MMU page fault handler will need to be able to create non-leaf
SPTEs to build up the paging structures. Rather than re-implementing the
function, factor the SPTE creation out of link_shadow_page.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20200925212302.3979661-9-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Pick up bugfixes from 5.9, otherwise various tests fail.
|
|
This should be const, so make it so.
Signed-off-by: Joe Perches <joe@perches.com>
Message-Id: <d130e88dd4c82a12d979da747cc0365c72c3ba15.1601770305.git.joe@perches.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add FSGSBASE to the set of possible guest-owned CR4 bits, i.e. let the
guest own it on VMX. KVM never queries the guest's CR4.FSGSBASE value,
thus there is no reason to force VM-Exit on FSGSBASE being toggled.
Note, because FSGSBASE is conditionally available, this is dependent on
recent changes to intercept reserved CR4 bits and to update the CR4
guest/host mask in response to guest CPUID changes.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
[sean: added justification in changelog]
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Intercept CR4 bits that are guest reserved so that KVM correctly injects
a #GP fault if the guest attempts to set a reserved bit. If a feature
is supported by the CPU but is not exposed to the guest, and its
associated CR4 bit is not intercepted by KVM by default, then KVM will
fail to inject a #GP if the guest sets the CR4 bit without triggering
an exit, e.g. by toggling only the bit in question.
Note, KVM doesn't give the guest direct access to any CR4 bits that are
also dependent on guest CPUID. Yet.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that vcpu_after_set_cpuid() and update_exception_bitmap() are called
back-to-back, subsume the exception bitmap update into the common CPUID
update. Drop the SVM invocation entirely as SVM's exception bitmap
doesn't vary with respect to guest CPUID.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the call to kvm_x86_ops.vcpu_after_set_cpuid() to the very end of
kvm_vcpu_after_set_cpuid() to allow the vendor implementation to react
to changes made by the common code. In the near future, this will be
used by VMX to update its CR4 guest/host masks to account for reserved
bits. In the long term, SGX support will update the allowed XCR0 mask
for enclaves based on the vCPU's allowed XCR0.
vcpu_after_set_cpuid() (nee kvm_update_cpuid()) was originally added by
commit 2acf923e38fb ("KVM: VMX: Enable XSAVE/XRSTOR for guest"), and was
called separately after kvm_x86_ops.vcpu_after_set_cpuid() (nee
kvm_x86_ops->cpuid_update()). There is no indication that the placement
of the common code updates after the vendor updates was anything more
than a "new function at the end" decision.
Inspection of the current code reveals no dependency on kvm_x86_ops'
vcpu_after_set_cpuid() in kvm_vcpu_after_set_cpuid() or any of its
helpers. The bulk of the common code depends only on the guest's CPUID
configuration, kvm_mmu_reset_context() does not consume dynamic vendor
state, and there are no collisions between kvm_pmu_refresh() and VMX's
update of PT state.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Unconditionally intercept changes to CR4.LA57 so that KVM correctly
injects a #GP fault if the guest attempts to set CR4.LA57 when it's
supported in hardware but not exposed to the guest.
Long term, KVM needs to properly handle CR4 bits that can be under guest
control but also may be reserved from the guest's perspective. But, KVM
currently sets the CR4 guest/host mask only during vCPU creation, and
reworking flows to change that will take a bit of elbow grease.
Even if/when generic support for intercepting reserved bits exists, it's
probably not worth letting the guest set CR4.LA57 directly. LA57 can't
be toggled while long mode is enabled, thus it's all but guaranteed to
be set once (maybe twice, e.g. by BIOS and kernel) during boot and never
touched again. On the flip side, letting the guest own CR4.LA57 may
incur extra VMREADs. In other words, this temporary "hack" is probably
also the right long term fix.
Fixes: fd8cb433734e ("KVM: MMU: Expose the LA57 feature to VM.")
Cc: stable@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
[sean: rewrote changelog]
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The function amd_ir_set_vcpu_affinity makes use of the parameter struct
amd_iommu_pi_data.prev_ga_tag to determine if it should delete struct
amd_iommu_pi_data from a list when not running in AVIC mode.
However, prev_ga_tag is initialized only when AVIC is enabled. The non-zero
uninitialized value can cause unintended code path, which ends up making
use of the struct vcpu_svm.ir_list and ir_list_lock without being
initialized (since they are intended only for the AVIC case).
This triggers NULL pointer dereference bug in the function vm_ir_list_del
with the following call trace:
svm_update_pi_irte+0x3c2/0x550 [kvm_amd]
? proc_create_single_data+0x41/0x50
kvm_arch_irq_bypass_add_producer+0x40/0x60 [kvm]
__connect+0x5f/0xb0 [irqbypass]
irq_bypass_register_producer+0xf8/0x120 [irqbypass]
vfio_msi_set_vector_signal+0x1de/0x2d0 [vfio_pci]
vfio_msi_set_block+0x77/0xe0 [vfio_pci]
vfio_pci_set_msi_trigger+0x25c/0x2f0 [vfio_pci]
vfio_pci_set_irqs_ioctl+0x88/0xb0 [vfio_pci]
vfio_pci_ioctl+0x2ea/0xed0 [vfio_pci]
? alloc_file_pseudo+0xa5/0x100
vfio_device_fops_unl_ioctl+0x26/0x30 [vfio]
? vfio_device_fops_unl_ioctl+0x26/0x30 [vfio]
__x64_sys_ioctl+0x96/0xd0
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Therefore, initialize prev_ga_tag to zero before use. This should be safe
because ga_tag value 0 is invalid (see function avic_vm_init).
Fixes: dfa20099e26e ("KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20201003232707.4662-1-suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This way we don't waste memory on VMs which don't use nesting
virtualization even when the host enabled it for them.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This will be used to signal an error to the userspace, in case
the vendor code failed during handling of this msr. (e.g -ENOMEM)
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This will allow the KVM to report such errors (e.g -ENOMEM)
to the userspace.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Return 1 on errors that are caused by wrong guest behavior
(which will inject #GP to the guest)
And return a negative error value on issues that are
the kernel's fault (e.g -ENOMEM)
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
These should be const, so make it so.
Signed-off-by: Joe Perches <joe@perches.com>
Message-Id: <ed95eef4f10fc1317b66936c05bc7dd8f943a6d5.1601770305.git.joe@perches.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As vcpu->arch.cpuid_entries is now allocated dynamically, the only
remaining use for KVM_MAX_CPUID_ENTRIES is to check KVM_SET_CPUID/
KVM_SET_CPUID2 input for sanity. Since it was reported that the
current limit (80) is insufficient for some CPUs, bump
KVM_MAX_CPUID_ENTRIES and use an arbitrary value '256' as the new
limit.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201001130541.1398392-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80)
is reported to be insufficient but before we bump it let's switch to
allocating vcpu->arch.cpuid_entries[] array dynamically. Currently,
'struct kvm_cpuid_entry2' is 40 bytes so vcpu->arch.cpuid_entries is
3200 bytes which accounts for 1/4 of the whole 'struct kvm_vcpu_arch'
but having it pre-allocated (for all vCPUs which we also pre-allocate)
gives us no real benefits.
Another plus of the dynamic allocation is that we now do kvm_check_cpuid()
check before we assign anything to vcpu->arch.cpuid_nent/cpuid_entries so
no changes are made in case the check fails.
Opportunistically remove unneeded 'out' labels from
kvm_vcpu_ioctl_set_cpuid()/kvm_vcpu_ioctl_set_cpuid2() and return
directly whenever possible.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201001130541.1398392-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
|
|
As a preparatory step to allocating vcpu->arch.cpuid_entries dynamically
make kvm_check_cpuid() check work with an arbitrary 'struct kvm_cpuid_entry2'
array.
Currently, when kvm_check_cpuid() fails we reset vcpu->arch.cpuid_nent to
0 and this is kind of weird, i.e. one would expect CPUIDs to remain
unchanged when KVM_SET_CPUID[2] call fails.
No functional change intended. It would've been possible to move the updated
kvm_check_cpuid() in kvm_vcpu_ioctl_set_cpuid2() and check the supplied
input before we start updating vcpu->arch.cpuid_entries/nent but we
can't do the same in kvm_vcpu_ioctl_set_cpuid() as we'll have to copy
'struct kvm_cpuid_entry' entries first. The change will be made when
vcpu->arch.cpuid_entries[] array becomes allocated dynamically.
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201001130541.1398392-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM unconditionally provides PV features to the guest, regardless of the
configured CPUID. An unwitting guest that doesn't check
KVM_CPUID_FEATURES before use could access paravirt features that
userspace did not intend to provide. Fix this by checking the guest's
CPUID before performing any paravirtual operations.
Introduce a capability, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, to gate the
aforementioned enforcement. Migrating a VM from a host w/o this patch to
a host with this patch could silently change the ABI exposed to the
guest, warranting that we default to the old behavior and opt-in for
the new one.
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Change-Id: I202a0926f65035b872bfe8ad15307c026de59a98
Message-Id: <20200818152429.1923996-4-oupton@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Small change to avoid meaningless duplication in the subsequent patch.
No functional change intended.
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Change-Id: I77ab9cdad239790766b7a49d5cbae5e57a3005ea
Message-Id: <20200818152429.1923996-3-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
No functional change intended.
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Change-Id: I7cbe71069db98d1ded612fd2ef088b70e7618426
Message-Id: <20200818152429.1923996-2-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM was switched to interrupt-based mechanism for 'page ready' event
delivery in Linux-5.8 (see commit 2635b5c4a0e4 ("KVM: x86: interrupt based
APF 'page ready' event delivery")) and #PF (ab)use for 'page ready' event
delivery was removed. Linux guest switched to this new mechanism
exclusively in 5.9 (see commit b1d405751cd5 ("KVM: x86: Switch KVM guest to
using interrupts for page ready APF delivery")) so it is not possible to
get #PF for a 'page ready' event even when the guest is running on top
of an older KVM (APF mechanism won't be enabled). Update the comment in
exc_page_fault() to reflect the new reality.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201002154313.1505327-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Let KVM_WERROR depend on KVM, so it doesn't show in menuconfig alone.
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Message-Id: <20201001112014.9561-1-mcroce@linux.microsoft.com>
Fixes: 4f337faf1c55e ("KVM: allow disabling -Werror")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Allowing userspace to intercept reads to x2APIC MSRs when APICV is
fully enabled for the guest simply can't work. But more in general,
the LAPIC could be set to in-kernel after the MSR filter is setup
and allowing accesses by userspace would be very confusing.
We could in principle allow userspace to intercept reads and writes to TPR,
and writes to EOI and SELF_IPI, but while that could be made it work, it
would still be silly.
Cc: Alexander Graf <graf@amazon.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Rework the resetting of the MSR bitmap for x2APIC MSRs to ignore userspace
filtering. Allowing userspace to intercept reads to x2APIC MSRs when
APICV is fully enabled for the guest simply can't work; the LAPIC and thus
virtual APIC is in-kernel and cannot be directly accessed by userspace.
To keep things simple we will in fact forbid intercepting x2APIC MSRs
altogether, independent of the default_allow setting.
Cc: Alexander Graf <graf@amazon.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201005195532.8674-3-sean.j.christopherson@intel.com>
[Modified to operate even if APICv is disabled, adjust documentation. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
- Remove unused msi_ctrl, io_optional and align_resource fields from ARM
struct hw_pci (Lorenzo Pieralisi)
* remotes/lorenzo/pci/arm:
ARM/PCI: Remove unused fields from struct hw_pci
|
|
- Remove useless __KERNEL__ preprocessor guard in sparc io_32.h (Lorenzo
Pieralisi)
- Move ioremap/iounmap declaration so it's visible in asm-generic/io.h
(Lorenzo Pieralisi)
- Fix memory leak in generic !CONFIG_GENERIC_IOMAP pci_iounmap()
implementation (Lorenzo Pieralisi)
* remotes/lorenzo/pci/pci-iomap:
asm-generic/io.h: Fix !CONFIG_GENERIC_IOMAP pci_iounmap() implementation
sparc32: Move ioremap/iounmap declaration before asm-generic/io.h include
sparc32: Remove useless io_32.h __KERNEL__ preprocessor guard
|
|
- Remove unnecessary #includes (Gustavo Pimentel)
- Fix intel_mid_pci.c build error when !CONFIG_ACPI (Randy Dunlap)
- Use scnprintf(), not snprintf(), in sysfs "show" functions (Krzysztof
Wilczyński)
- Simplify pci-pf-stub by using module_pci_driver() (Liu Shixin)
- Print IRQ used by Link Bandwidth Notification (Dongdong Liu)
- Update sysfs mmap-related #ifdef comments (Clint Sbisa)
- Simplify pci_dev_reset_slot_function() (Lukas Wunner)
- Use "NULL" instead of "0" to fix sparse warnings (Gustavo Pimentel)
- Simplify bool comparisons (Krzysztof Wilczyński)
- Drop double zeroing for P2PDMA sg_init_table() (Julia Lawall)
* pci/misc:
PCI: v3-semi: Remove unneeded break
PCI/P2PDMA: Drop double zeroing for sg_init_table()
PCI: Simplify bool comparisons
PCI: endpoint: Use "NULL" instead of "0" as a NULL pointer
PCI: Simplify pci_dev_reset_slot_function()
PCI: Update mmap-related #ifdef comments
PCI/LINK: Print IRQ number used by port
PCI/IOV: Simplify pci-pf-stub with module_pci_driver()
PCI: Use scnprintf(), not snprintf(), in sysfs "show" functions
x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled
PCI: Remove unnecessary header includes
|
|
If protected virtualization is active on s390, VIRTIO has only retricted
access to the guest memory.
Define CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS and export
arch_has_restricted_virtio_memory_access to advertize VIRTIO if that's
the case.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Link: https://lore.kernel.org/r/1599728030-17085-3-git-send-email-pmorel@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
The function detect_harden_bp_fw() is gone after commit d4647f0a2ad7
("arm64: Rewrite Spectre-v2 mitigation code"). Update this comment to
reflect the new state of affairs.
Fixes: d4647f0a2ad7 ("arm64: Rewrite Spectre-v2 mitigation code")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201020214544.3206838-3-swboyd@chromium.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
"Power-supply core:
- add wireless type
- properly document current direction
Battery/charger driver changes:
- new fuel-gauge/charger driver for RN5T618/RN5T619
- new charger driver for BQ25980, BQ25975 and BQ25960
- bq27xxx-battery: add support for TI bq34z100
- gpio-charger: convert to GPIO descriptors
- gpio-charger: add optional support for charge current limiting
- max17040: add support for max17041, max17043, max17044
- max17040: add support for max17048, max17049, max17058, max17059
- smb347-charger: add DT support
- smb247-charger: add SMB345 and SMB358 support
- simple-battery: add temperature properties
- lots of minor fixes, cleanups and DT binding YAML conversions
Reset drivers:
- ocelot: Add support for Sparx5"
* tag 'for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (81 commits)
power: reset: POWER_RESET_OCELOT_RESET should depend on Ocelot or Sparx5
power: supply: bq25980: Fix uninitialized wd_reg_val and overrun
power: supply: ltc2941: Fix ptr to enum cast
power: supply: test-power: revise parameter printing to use sprintf
power: supply: charger-manager: fix incorrect check on charging_duration_ms
power: supply: max17040: Fix ptr to enum cast
power: supply: bq25980: Fix uninitialized wd_reg_val
power: supply: bq25980: remove redundant zero check on ret
power: reset: ocelot: Add support for Sparx5
dt-bindings: reset: ocelot: Add Sparx5 support
power: supply: sbs-battery: keep error code when get_property() fails
power: supply: bq25980: Add support for the BQ259xx family
dt-binding: bq25980: Add the bq25980 flash charger
power: supply: fix spelling mistake "unprecise" -> "imprecise"
power: supply: test_power: add missing newlines when printing parameters by sysfs
power: supply: pm2301: drop duplicated i2c_device_id
power: supply: charger-manager: drop unused charger assignment
power: supply: rt9455: skip 'struct acpi_device_id' when !CONFIG_ACPI
power: supply: goldfish: skip 'struct acpi_device_id' when !CONFIG_ACPI
power: supply: bq25890: skip 'struct acpi_device_id' when !CONFIG_ACPI
...
|
|
Pull ARM updates from Russell King:
- handle inexact watchpoint addresses (Douglas Anderson)
- decompressor serial debug cleanups (Linus Walleij)
- update L2 cache prefetch bits (Guillaume Tucker)
- add text offset and malloc size to the decompressor kexec data
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: add malloc size to decompressor kexec size structure
ARM: add TEXT_OFFSET to decompressor kexec image structure
ARM: 9007/1: l2c: fix prefetch bits init in L2X0_AUX_CTRL using DT values
ARM: 9010/1: uncompress: Print the location of appended DTB
ARM: 9009/1: uncompress: Enable debug in head.S
ARM: 9008/1: uncompress: Drop excess whitespace print
ARM: 9006/1: uncompress: Wait for ready and busy in debug prints
ARM: 9005/1: debug: Select flow control for all debug UARTs
ARM: 9004/1: debug: Split waituart to CTS and TXRDY
ARM: 9003/1: uncompress: Delete unused debug macros
ARM: 8997/2: hw_breakpoint: Handle inexact watchpoint addresses
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
"The bulk of ARC pull request is removal of EZChip NPS platform which
was suffering from constant bitrot. In recent years EZChip has gone
though multiple successive acquisitions and I guess things and people
move on. I would like to take this opportunity to recognize and thank
all those good folks (Gilad, Noam, Ofer...) for contributing major
bits to ARC port (SMP, Big Endian).
Summary:
- drop support for EZChip NPS platform
- misc other fixes"
* tag 'arc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
arc: include/asm: fix typos of "themselves"
ARC: SMP: fix typo and use "come up" instead of "comeup"
ARC: [dts] fix the errors detected by dtbs_check
arc: plat-hsdk: fix kconfig dependency warning when !RESET_CONTROLLER
ARC: [plat-eznps]: Drop support for EZChip NPS platform
|
|
This change removes all instances of DISABLE_LTO from
Makefiles, as they are currently unused, and the preferred
method of disabling LTO is to filter out the flags instead.
Note added by Masahiro Yamada:
DISABLE_LTO was added as preparation for GCC LTO, but GCC LTO was
not pulled into the mainline. (https://lkml.org/lkml/2014/4/8/272)
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Spectre-v2 can be mitigated on Falkor CPUs either by calling into
firmware or by issuing a magic, CPU-specific sequence of branches.
Although the latter is faster, the size of the code sequence means that
it cannot be used in the EL2 vectors, and so there is a need for both
mitigations to co-exist in order to achieve optimal performance.
Change the mitigation selection logic for Spectre-v2 so that the
CPU-specific mitigation is used only when the firmware mitigation is
also available, rather than when a firmware mitigation is unavailable.
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
ISA v3.1 removes transactional memory and hence it should not be present
in cpu_features or cpu_user_features2. Remove CPU_FTR_TM_COMP from
CPU_FTRS_POWER10. Remove PPC_FEATURE2_HTM_COMP and
PPC_FEATURE2_HTM_NOSC_COMP from COMMON_USER2_POWER10.
Fixes: a3ea40d5c736 ("powerpc: Add POWER10 architected mode")
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200827035529.900-1-jniethe5@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.10
- New page table code for both hypervisor and guest stage-2
- Introduction of a new EL2-private host context
- Allow EL2 to have its own private per-CPU variables
- Support of PMU event filtering
- Complete rework of the Spectre mitigation
|