Age | Commit message (Collapse) | Author |
|
Since LLVM commit 3787ee4, the '-stack-alignment' flag has been dropped
[1], leading to the following error message when building a LTO kernel
with Clang-13 and LLD-13:
ld.lld: error: -plugin-opt=-: ld.lld: Unknown command line argument
'-stack-alignment=8'. Try 'ld.lld --help'
ld.lld: Did you mean '--stackrealign=8'?
It also appears that the '-code-model' flag is not necessary anymore
starting with LLVM-9 [2].
Drop '-code-model' and make '-stack-alignment' conditional on LLD < 13.0.0.
These flags were necessary because these flags were not encoded in the
IR properly, so the link would restart optimizations without them. Now
there are properly encoded in the IR, and these flags exposing
implementation details are no longer necessary.
[1] https://reviews.llvm.org/D103048
[2] https://reviews.llvm.org/D52322
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1377
Signed-off-by: Tor Vic <torvic9@mailbox.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/f2c018ee-5999-741e-58d4-e482d5246067@mailbox.org
|
|
Pull kvm fixes from Paolo Bonzini:
"Bugfixes, including a TLB flush fix that affects processors without
nested page tables"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: fix previous commit for 32-bit builds
kvm: avoid speculation-based attacks from out-of-range memslot accesses
KVM: x86: Unload MMU on guest TLB flush if TDP disabled to force MMU sync
KVM: x86: Ensure liveliness of nested VM-Enter fail tracepoint message
selftests: kvm: Add support for customized slot0 memory size
KVM: selftests: introduce P47V64 for s390x
KVM: x86: Ensure PV TLB flush tracepoint reflects KVM behavior
KVM: X86: MMU: Use the correct inherited permissions to get shadow page
KVM: LAPIC: Write 0 to TMICT should also cancel vmx-preemption timer
KVM: SVM: Fix SEV SEND_START session length & SEND_UPDATE_DATA query length after commit 238eca821cee
|
|
When using shadow paging, unload the guest MMU when emulating a guest TLB
flush to ensure all roots are synchronized. From the guest's perspective,
flushing the TLB ensures any and all modifications to its PTEs will be
recognized by the CPU.
Note, unloading the MMU is overkill, but is done to mirror KVM's existing
handling of INVPCID(all) and ensure the bug is squashed. Future cleanup
can be done to more precisely synchronize roots when servicing a guest
TLB flush.
If TDP is enabled, synchronizing the MMU is unnecessary even if nested
TDP is in play, as a "legacy" TLB flush from L1 does not invalidate L1's
TDP mappings. For EPT, an explicit INVEPT is required to invalidate
guest-physical mappings; for NPT, guest mappings are always tagged with
an ASID and thus can only be invalidated via the VMCB's ASID control.
This bug has existed since the introduction of KVM_VCPU_FLUSH_TLB.
It was only recently exposed after Linux guests stopped flushing the
local CPU's TLB prior to flushing remote TLBs (see commit 4ce94eabac16,
"x86/mm/tlb: Flush remote and local TLBs concurrently"), but is also
visible in Windows 10 guests.
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Fixes: f38a7b75267f ("KVM: X86: support paravirtualized help for TLB shootdowns")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
[sean: massaged comment and changelog]
Message-Id: <20210531172256.2908-1-jiangshanlai@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use the __string() machinery provided by the tracing subystem to make a
copy of the string literals consumed by the "nested VM-Enter failed"
tracepoint. A complete copy is necessary to ensure that the tracepoint
can't outlive the data/memory it consumes and deference stale memory.
Because the tracepoint itself is defined by kvm, if kvm-intel and/or
kvm-amd are built as modules, the memory holding the string literals
defined by the vendor modules will be freed when the module is unloaded,
whereas the tracepoint and its data in the ring buffer will live until
kvm is unloaded (or "indefinitely" if kvm is built-in).
This bug has existed since the tracepoint was added, but was recently
exposed by a new check in tracing to detect exactly this type of bug.
fmt: '%s%s
' current_buffer: ' vmx_dirty_log_t-140127 [003] .... kvm_nested_vmenter_failed: '
WARNING: CPU: 3 PID: 140134 at kernel/trace/trace.c:3759 trace_check_vprintf+0x3be/0x3e0
CPU: 3 PID: 140134 Comm: less Not tainted 5.13.0-rc1-ce2e73ce600a-req #184
Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
RIP: 0010:trace_check_vprintf+0x3be/0x3e0
Code: <0f> 0b 44 8b 4c 24 1c e9 a9 fe ff ff c6 44 02 ff 00 49 8b 97 b0 20
RSP: 0018:ffffa895cc37bcb0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffa895cc37bd08 RCX: 0000000000000027
RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff9766cfad74f8
RBP: ffffffffc0a041d4 R08: ffff9766cfad74f0 R09: ffffa895cc37bad8
R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffc0a041d4
R13: ffffffffc0f4dba8 R14: 0000000000000000 R15: ffff976409f2c000
FS: 00007f92fa200740(0000) GS:ffff9766cfac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000559bd11b0000 CR3: 000000019fbaa002 CR4: 00000000001726e0
Call Trace:
trace_event_printf+0x5e/0x80
trace_raw_output_kvm_nested_vmenter_failed+0x3a/0x60 [kvm]
print_trace_line+0x1dd/0x4e0
s_show+0x45/0x150
seq_read_iter+0x2d5/0x4c0
seq_read+0x106/0x150
vfs_read+0x98/0x180
ksys_read+0x5f/0xe0
do_syscall_64+0x40/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Cc: Steven Rostedt <rostedt@goodmis.org>
Fixes: 380e0055bc7e ("KVM: nVMX: trace nested VM-Enter failures detected by H/W")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Message-Id: <20210607175748.674002-1-seanjc@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull orphan section fixes from Kees Cook:
"These two corner case fixes have been in -next for about a week:
- Avoid orphan section in ARM cpuidle (Arnd Bergmann)
- Avoid orphan section with !SMP (Nathan Chancellor)"
* tag 'orphans-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
vmlinux.lds.h: Avoid orphan section with !SMP
ARM: cpuidle: Avoid orphan section warning
|
|
In record_steal_time(), st->preempted is read twice, and
trace_kvm_pv_tlb_flush() might output result inconsistent if
kvm_vcpu_flush_tlb_guest() see a different st->preempted later.
It is a very trivial problem and hardly has actual harm and can be
avoided by reseting and reading st->preempted in atomic way via xchg().
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210531174628.10265-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logic AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
For example, here is a shared pagetable:
pgd[] pud[] pmd[] virtual address pointers
/->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
/->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
pgd-| (shared pmd[] as above)
\->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
\->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
pud1 and pud2 point to the same pmd table, so:
- ptr1 and ptr3 points to the same page.
- ptr2 and ptr4 points to the same page.
(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
- First, the guest reads from ptr1 first and KVM prepares a shadow
page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
"u--" comes from the effective permissions of pgd, pud1 and
pmd1, which are stored in pt->access. "u--" is used also to get
the pagetable for pud1, instead of "uw-".
- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".
- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".
Any kind of shared pagetable might have the similar problem when in
virtual machine without TDP enabled if the permissions are different
from different ancestors.
In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.
The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.
The problem had existed long before the commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit. So there is no fixes tag here.
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com>
Cc: stable@vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
According to the SDM 10.5.4.1:
A write of 0 to the initial-count register effectively stops the local
APIC timer, in both one-shot and periodic mode.
However, the lapic timer oneshot/periodic mode which is emulated by vmx-preemption
timer doesn't stop by writing 0 to TMICT since vmx->hv_deadline_tsc is still
programmed and the guest will receive the spurious timer interrupt later. This
patch fixes it by also cancelling the vmx-preemption timer when writing 0 to
the initial-count register.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1623050385-100988-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
after commit 238eca821cee
Commit 238eca821cee ("KVM: SVM: Allocate SEV command structures on local stack")
uses the local stack to allocate the structures used to communicate with the PSP,
which were earlier being kzalloced. This breaks SEV live migration for
computing the SEND_START session length and SEND_UPDATE_DATA query length as
session_len and trans_len and hdr_len fields are not zeroed respectively for
the above commands before issuing the SEV Firmware API call, hence the
firmware returns incorrect session length and update data header or trans length.
Also the SEV Firmware API returns SEV_RET_INVALID_LEN firmware error
for these length query API calls, and the return value and the
firmware error needs to be passed to the userspace as it is, so
need to remove the return check in the KVM code.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <20210607061532.27459-1-Ashish.Kalra@amd.com>
Fixes: 238eca821cee ("KVM: SVM: Allocate SEV command structures on local stack")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A set of fixes that have been coming in over the last few weeks, the
usual mix of fixes:
- DT fixups for TI K3
- SATA drive detection fix for TI DRA7
- Power management fixes and a few build warning removals for OMAP
- OP-TEE fix to use standard API for UUID exporting
- DT fixes for a handful of i.MX boards
And a few other smaller items"
* tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
arm64: meson: select COMMON_CLK
soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
ARM: dts: imx7d-pico: Fix the 'tuning-step' property
ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
arm64: dts: freescale: sl28: var1: fix RGMII clock and voltage
arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
ARM: imx: pm-imx27: Include "common.h"
arm64: dts: zii-ultra: fix 12V_MAIN voltage
arm64: dts: zii-ultra: remove second GEN_3V3 regulator instance
arm64: dts: ls1028a: fix memory node
bus: ti-sysc: Fix am335x resume hang for usb otg module
ARM: OMAP2+: Fix build warning when mmc_omap is not built
ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function
ARM: OMAP1: Fix use of possibly uninitialized irq variable
optee: use export_uuid() to copy client UUID
arm64: dts: ti: k3*: Introduce reg definition for interrupt routers
arm64: dts: ti: k3-am65|j721e|am64: Map the dma / navigator subsystem via explicit ranges
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix our KVM reverse map real-mode handling since we enabled huge
vmalloc (in some configurations).
Revert a recent change to our IOMMU code which broke some devices.
Fix KVM handling of FSCR on P7/P8, which could have possibly let a
guest crash it's Qemu.
Fix kprobes validation of prefixed instructions across page boundary.
Thanks to Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas,
Frederic Barrat, Naveen N. Rao, and Nicholas Piggin"
* tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"
KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
powerpc: Fix reverse map real-mode address lookup with huge vmalloc
powerpc/kprobes: Fix validation of prefixed instructions across page boundary
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"A bunch of x86/urgent stuff accumulated for the last two weeks so
lemme unload it to you.
It should be all totally risk-free, of course. :-)
- Fix out-of-spec hardware (1st gen Hygon) which does not implement
MSR_AMD64_SEV even though the spec clearly states so, and check
CPUID bits first.
- Send only one signal to a task when it is a SEGV_PKUERR si_code
type.
- Do away with all the wankery of reserving X amount of memory in the
first megabyte to prevent BIOS corrupting it and simply and
unconditionally reserve the whole first megabyte.
- Make alternatives NOP optimization work at an arbitrary position
within the patched sequence because the compiler can put
single-byte NOPs for alignment anywhere in the sequence (32-bit
retpoline), vs our previous assumption that the NOPs are only
appended.
- Force-disable ENQCMD[S] instructions support and remove
update_pasid() because of insufficient protection against FPU state
modification in an interrupt context, among other xstate horrors
which are being addressed at the moment. This one limits the
fallout until proper enablement.
- Use cpu_feature_enabled() in the idxd driver so that it can be
build-time disabled through the defines in disabled-features.h.
- Fix LVT thermal setup for SMI delivery mode by making sure the APIC
LVT value is read before APIC initialization so that softlockups
during boot do not happen at least on one machine.
- Mark all legacy interrupts as legacy vectors when the IO-APIC is
disabled and when all legacy interrupts are routed through the PIC"
* tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Check SME/SEV support in CPUID first
x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
x86/setup: Always reserve the first 1M of RAM
x86/alternative: Optimize single-byte NOPs at an arbitrary position
x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
dmaengine: idxd: Use cpu_feature_enabled()
x86/thermal: Fix LVT thermal setup for SMI delivery mode
x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nmenon/linux into arm/fixes
Devicetree fixes for TI K3 platforms for v5.13 merge window:
These minor fixes include:
* Fixups for device tree discovered during yaml conversion
* Fixups for missing dma-coherent property in j7200
* Removal of camera sensor node from am65 evm dts to overlay
as camera sensor boards are variable.
* tag 'ti-k3-dt-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nmenon/linux:
arm64: dts: ti: k3*: Introduce reg definition for interrupt routers
arm64: dts: ti: k3-am65|j721e|am64: Map the dma / navigator subsystem via explicit ranges
arm64: dts: ti: k3-*: Rename the TI-SCI node
arm64: dts: ti: k3-am65-wakeup: Drop un-necessary properties from dmsc node
arm64: dts: ti: k3-am65-wakeup: Add debug region to TI-SCI node
arm64: dts: ti: k3-*: Rename the TI-SCI clocks node name
arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent
arm64: dts: ti: k3-am654-base-board: remove ov5640
Link: https://lore.kernel.org/r/20210518115634.467vgpbzplal5kou@obituary
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
PM and build warning fixes for omaps
While chasing system suspend related regressions, I noticed few other
issues related to PM would be good to have fixed:
- UART idling does not always work for hardware autoidle features
- am335x resume works only the first time unless musb module is loaded
Then there are three patches for omap1 related warnings caused by the gpio
changes, and one build warning fix for legacy mmc platform code when mmc
is built as a loadable module.
These can all be merged whenever suitable naturally. I've sent the more
urgent SATA regression fix separately although it appears in this pull
request too because of the branches merged.
* tag 'omap-for-v5.13/fixes-pm' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
bus: ti-sysc: Fix am335x resume hang for usb otg module
ARM: OMAP2+: Fix build warning when mmc_omap is not built
ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function
ARM: OMAP1: Fix use of possibly uninitialized irq variable
Link: https://lore.kernel.org/r/pull-1622614772-543196@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux into arm/fixes
Amlogic fixes for v5.13-rc1
- arm64: meson: select COMMON_CLK to select a proper implementation of the clock API
- soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
* tag 'amlogic-fixes-v5.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux:
arm64: meson: select COMMON_CLK
soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
Link: https://lore.kernel.org/r/73e76706-f3f4-bebf-10dd-d2ec9106a234@baylibre.com
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 5.13:
- Fix missing-prototypes warning of 'imx27_pm_init' in i.MX27 platform
pm code.
- A couple of patches from Fabio Estevam to fix 'tuning-step' property
in imx7d-meerkat96 and imx7d-pico DT.
- Fix '#gpio-cells' of nxp,pca8574 device in imx6qdl-emcon-avari DT.
- A couple of patches from Lucas Stach to fix regulator and voltage for
imx8mq-zii-ultra board.
- Add missing regulators for imx6q-dhcom to avoid possible instability
issues.
- Fix memory-controller settings for fsl-ls1028a DT.
- Fix RGMII clock and voltage for a couple of fsl-ls1028a-kontron-sl28
boards.
- Fix RGMII connection to QCA8334 switch for imx6dl-yapp4 board.
* tag 'imx-fixes-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
ARM: dts: imx7d-pico: Fix the 'tuning-step' property
ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
arm64: dts: freescale: sl28: var1: fix RGMII clock and voltage
arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
ARM: imx: pm-imx27: Include "common.h"
arm64: dts: zii-ultra: fix 12V_MAIN voltage
arm64: dts: zii-ultra: remove second GEN_3V3 regulator instance
arm64: dts: ls1028a: fix memory node
ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
ARM: dts: imx6dl-yapp4: Fix RGMII connection to QCA8334 switch
Link: https://lore.kernel.org/r/20210527011758.GD8194@dragon
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
Merge misc fixes from Andrew Morton:
"13 patches.
Subsystems affected by this patch series: mips, mm (kfence, debug,
pagealloc, memory-hotplug, hugetlb, kasan, and hugetlb), init, proc,
lib, ocfs2, and mailmap"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mailmap: use private address for Michel Lespinasse
ocfs2: fix data corruption by fallocate
lib: crc64: fix kernel-doc warning
mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
mm/kasan/init.c: fix doc warning
proc: add .gitignore for proc-subset-pid selftest
hugetlb: pass head page to remove_hugetlb_page()
drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64
mm/page_alloc: fix counting of free pages after take off from buddy
mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
pid: take a reference when initializing `cad_pid`
kfence: use TASK_IDLE when awaiting allocation
Revert "MIPS: make userspace mapping young by default"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- Build with '-mno-relax' when using LLVM's linker, which doesn't
support linker relaxation.
- A fix to build without SiFive's errata.
- A fix to use PAs during init_resources()
- A fix to avoid W+X mappings during boot.
* tag 'riscv-for-linus-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix memblock_free() usages in init_resources()
riscv: skip errata_cip_453.o if CONFIG_ERRATA_SIFIVE_CIP_453 is disabled
riscv: mm: Fix W+X mappings at boot
riscv: Use -mno-relax when using lld linker
|
|
This reverts commit f685a533a7fab35c5d069dcd663f59c8e4171a75.
The MIPS cache flush logic needs to know whether the mapping was already
established to decide how to flush caches. This is done by checking the
valid bit in the PTE. The commit above breaks this logic by setting the
valid in the PTE in new mappings, which causes kernel crashes.
Link: https://lkml.kernel.org/r/20210526094335.92948-1-tsbogend@alpha.franken.de
Fixes: f685a533a7f ("MIPS: make userspace mapping young by default")
Reported-by: Zhou Yanjie <zhouyanjie@wanyeetech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Huang Pei <huangpei@loongson.cn>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The first two bits of the CPUID leaf 0x8000001F EAX indicate whether SEV
or SME is supported, respectively. It's better to check whether SEV or
SME is actually supported before accessing the MSR_AMD64_SEV to check
whether SEV or SME is enabled.
This is both a bare-metal issue and a guest/VM issue. Since the first
generation Hygon Dhyana CPU doesn't support the MSR_AMD64_SEV, reading that
MSR results in a #GP - either directly from hardware in the bare-metal
case or via the hypervisor (because the RDMSR is actually intercepted)
in the guest/VM case, resulting in a failed boot. And since this is very
early in the boot phase, rdmsrl_safe()/native_read_msr_safe() can't be
used.
So check the CPUID bits first, before accessing the MSR.
[ tlendacky: Expand and improve commit message. ]
[ bp: Massage commit message. ]
Fixes: eab696d8e8b9 ("x86/sev: Do not require Hypervisor CPUID bit for SEV guests")
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@vger.kernel.org> # v5.10+
Link: https://lkml.kernel.org/r/20210602070207.2480-1-puwen@hygon.cn
|
|
__bad_area_nosemaphore() calls both force_sig_pkuerr() and
force_sig_fault() when handling SEGV_PKUERR. This does not cause
problems because the second signal is filtered by the legacy_queue()
check in __send_signal() because in both cases, the signal is SIGSEGV,
the second one seeing that the first one is already pending.
This causes the kernel to do unnecessary work so send the signal only
once for SEGV_PKUERR.
[ bp: Massage commit message. ]
Fixes: 9db812dbb29d ("signal/x86: Call force_sig_pkuerr from __bad_area_nosemaphore")
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Jiashuo Liang <liangjs@pku.edu.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://lkml.kernel.org/r/20210601085203.40214-1-liangjs@pku.edu.cn
|
|
There are BIOSes that are known to corrupt the memory under 1M, or more
precisely under 640K because the memory above 640K is anyway reserved
for the EGA/VGA frame buffer and BIOS.
To prevent usage of the memory that will be potentially clobbered by the
kernel, the beginning of the memory is always reserved. The exact size
of the reserved area is determined by CONFIG_X86_RESERVE_LOW build time
and the "reservelow=" command line option. The reserved range may be
from 4K to 640K with the default of 64K. There are also configurations
that reserve the entire 1M range, like machines with SandyBridge graphic
devices or systems that enable crash kernel.
In addition to the potentially clobbered memory, EBDA of unknown size may
be as low as 128K and the memory above that EBDA start is also reserved
early.
It would have been possible to reserve the entire range under 1M unless for
the real mode trampoline that must reside in that area.
To accommodate placement of the real mode trampoline and keep the memory
safe from being clobbered by BIOS, reserve the first 64K of RAM before
memory allocations are possible and then, after the real mode trampoline
is allocated, reserve the entire range from 0 to 1M.
Update trim_snb_memory() and reserve_real_mode() to avoid redundant
reservations of the same memory range.
Also make sure the memory under 1M is not getting freed by
efi_free_boot_services().
[ bp: Massage commit message and comments. ]
Fixes: a799c2bd29d1 ("x86/setup: Consolidate early memory reservations")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Hugh Dickins <hughd@google.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213177
Link: https://lkml.kernel.org/r/20210601075354.5149-2-rppt@kernel.org
|
|
Up until now the assumption was that an alternative patching site would
have some instructions at the beginning and trailing single-byte NOPs
(0x90) padding. Therefore, the patching machinery would go and optimize
those single-byte NOPs into longer ones.
However, this assumption is broken on 32-bit when code like
hv_do_hypercall() in hyperv_init() would use the ratpoline speculation
killer CALL_NOSPEC. The 32-bit version of that macro would align certain
insns to 16 bytes, leading to the compiler issuing a one or more
single-byte NOPs, depending on the holes it needs to fill for alignment.
That would lead to the warning in optimize_nops() to fire:
------------[ cut here ]------------
Not a NOP at 0xc27fb598
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:211 optimize_nops.isra.13
due to that function verifying whether all of the following bytes really
are single-byte NOPs.
Therefore, carve out the NOP padding into a separate function and call
it for each NOP range beginning with a single-byte NOP.
Fixes: 23c1ad538f4f ("x86/alternatives: Optimize optimize_nops()")
Reported-by: Richard Narron <richard@aaazen.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213301
Link: https://lkml.kernel.org/r/20210601212125.17145-1-bp@alien8.de
|
|
While digesting the XSAVE-related horrors which got introduced with
the supervisor/user split, the recent addition of ENQCMD-related
functionality got on the radar and turned out to be similarly broken.
update_pasid(), which is only required when X86_FEATURE_ENQCMD is
available, is invoked from two places:
1) From switch_to() for the incoming task
2) Via a SMP function call from the IOMMU/SMV code
#1 is half-ways correct as it hacks around the brokenness of get_xsave_addr()
by enforcing the state to be 'present', but all the conditionals in that
code are completely pointless for that.
Also the invocation is just useless overhead because at that point
it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task
and all of this can be handled at return to user space.
#2 is broken beyond repair. The comment in the code claims that it is safe
to invoke this in an IPI, but that's just wishful thinking.
FPU state of a running task is protected by fregs_lock() which is
nothing else than a local_bh_disable(). As BH-disabled regions run
usually with interrupts enabled the IPI can hit a code section which
modifies FPU state and there is absolutely no guarantee that any of the
assumptions which are made for the IPI case is true.
Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is
invoked with a NULL pointer argument, so it can hit a completely
unrelated task and unconditionally force an update for nothing.
Worse, it can hit a kernel thread which operates on a user space
address space and set a random PASID for it.
The offending commit does not cleanly revert, but it's sufficient to
force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid()
code to make this dysfunctional all over the place. Anything more
complex would require more surgery and none of the related functions
outside of the x86 core code are blatantly wrong, so removing those
would be overkill.
As nothing enables the PASID bit in the IA32_XSS MSR yet, which is
required to make this actually work, this cannot result in a regression
except for related out of tree train-wrecks, but they are broken already
today.
Fixes: 20f0afd1fb3d ("x86/mmu: Allocate/free a PASID")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87mtsd6gr9.ffs@nanos.tec.linutronix.de
|
|
Since commit 83109d5d5fba ("x86/build: Warn on orphan section placement"),
we get a warning for objects in orphan sections. The cpuidle implementation
for OMAP causes this when CONFIG_CPU_IDLE is disabled:
arm-linux-gnueabi-ld: warning: orphan section `__cpuidle_method_of_table' from `arch/arm/mach-omap2/pm33xx-core.o' being placed in section `__cpuidle_method_of_table'
arm-linux-gnueabi-ld: warning: orphan section `__cpuidle_method_of_table' from `arch/arm/mach-omap2/pm33xx-core.o' being placed in section `__cpuidle_method_of_table'
arm-linux-gnueabi-ld: warning: orphan section `__cpuidle_method_of_table' from `arch/arm/mach-omap2/pm33xx-core.o' being placed in section `__cpuidle_method_of_table'
Change the definition of CPUIDLE_METHOD_OF_DECLARE() to silently
drop the table and all code referenced from it when CONFIG_CPU_IDLE
is disabled.
Fixes: 06ee7a950b6a ("ARM: OMAP2+: pm33xx-core: Add cpuidle_ops for am335x/am437x")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201230155506.1085689-1-arnd@kernel.org
|
|
This single commit is shared between fixes and for-next, as it fixes a
concrete bug while likely conflicting with a more invasive cleanup to
avoid these oddball mappings entirely.
* riscv/riscv-wx-mappings:
riscv: mm: Fix W+X mappings at boot
|
|
`memblock_free()` takes a physical address as its first argument.
Fix the wrong usages in `init_resources()`.
Fixes: ffe0e526126884cf036a6f724220f1f9b4094fd2 ("RISC-V: Improve init_resources()")
Fixes: 797f0375dd2ef5cdc68ac23450cbae9a5c67a74e ("RISC-V: Do not allocate memblock while iterating reserved memblocks")
Signed-off-by: Wende Tan <twd2.me@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The errata_cip_453.o should be built only when the Kconfig
CONFIG_ERRATA_SIFIVE_CIP_453 is enabled.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vincent <vincent.chen@sifive.com>
Fixes: 0e0d4992517f ("riscv: enable SiFive errata CIP-453 and CIP-1200 Kconfig only if CONFIG_64BIT=y")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
When the kernel mapping was moved the last 2GB of the address space,
(__va(PFN_PHYS(max_low_pfn))) is much smaller than the .data section
start address, the last set_memory_nx() in protect_kernel_text_data()
will fail, thus the .data section is still mapped as W+X. This results
in below W+X mapping waring at boot. Fix it by passing the correct
.data section page num to the set_memory_nx().
[ 0.396516] ------------[ cut here ]------------
[ 0.396889] riscv/mm: Found insecure W+X mapping at address (____ptrval____)/0xffffffff80c00000
[ 0.398347] WARNING: CPU: 0 PID: 1 at arch/riscv/mm/ptdump.c:258 note_page+0x244/0x24a
[ 0.398964] Modules linked in:
[ 0.399459] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1+ #14
[ 0.400003] Hardware name: riscv-virtio,qemu (DT)
[ 0.400591] epc : note_page+0x244/0x24a
[ 0.401368] ra : note_page+0x244/0x24a
[ 0.401772] epc : ffffffff80007c86 ra : ffffffff80007c86 sp : ffffffe000e7bc30
[ 0.402304] gp : ffffffff80caae88 tp : ffffffe000e70000 t0 : ffffffff80cb80cf
[ 0.402800] t1 : ffffffff80cb80c0 t2 : 0000000000000000 s0 : ffffffe000e7bc80
[ 0.403310] s1 : ffffffe000e7bde8 a0 : 0000000000000053 a1 : ffffffff80c83ff0
[ 0.403805] a2 : 0000000000000010 a3 : 0000000000000000 a4 : 6c7e7a5137233100
[ 0.404298] a5 : 6c7e7a5137233100 a6 : 0000000000000030 a7 : ffffffffffffffff
[ 0.404849] s2 : ffffffff80e00000 s3 : 0000000040000000 s4 : 0000000000000000
[ 0.405393] s5 : 0000000000000000 s6 : 0000000000000003 s7 : ffffffe000e7bd48
[ 0.405935] s8 : ffffffff81000000 s9 : ffffffffc0000000 s10: ffffffe000e7bd48
[ 0.406476] s11: 0000000000001000 t3 : 0000000000000072 t4 : ffffffffffffffff
[ 0.407016] t5 : 0000000000000002 t6 : ffffffe000e7b978
[ 0.407435] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003
[ 0.408052] Call Trace:
[ 0.408343] [<ffffffff80007c86>] note_page+0x244/0x24a
[ 0.408855] [<ffffffff8010c5a6>] ptdump_hole+0x14/0x1e
[ 0.409263] [<ffffffff800f65c6>] walk_pgd_range+0x2a0/0x376
[ 0.409690] [<ffffffff800f6828>] walk_page_range_novma+0x4e/0x6e
[ 0.410146] [<ffffffff8010c5f8>] ptdump_walk_pgd+0x48/0x78
[ 0.410570] [<ffffffff80007d66>] ptdump_check_wx+0xb4/0xf8
[ 0.410990] [<ffffffff80006738>] mark_rodata_ro+0x26/0x2e
[ 0.411407] [<ffffffff8031961e>] kernel_init+0x44/0x108
[ 0.411814] [<ffffffff80002312>] ret_from_exception+0x0/0xc
[ 0.412309] ---[ end trace 7ec3459f2547ea83 ]---
[ 0.413141] Checked W+X mappings: failed, 512 W+X pages found
Fixes: 2bfc6cd81bd17e43 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
This reverts commit 3c0468d4451eb6b4f6604370639f163f9637a479.
That commit was breaking alignment guarantees for the DMA address when
allocating coherent mappings, as described in
Documentation/core-api/dma-api-howto.rst
It was also noticed by Mellanox' driver:
[ 1515.763621] mlx5_core c002:01:00.0: mlx5_frag_buf_alloc_node:146:(pid 13402): unexpected map alignment: 0x0800000000c61000, page_shift=16
[ 1515.763635] mlx5_core c002:01:00.0: mlx5_cqwq_create:181:(pid
13402): mlx5_frag_buf_alloc_node() failed, -12
Fixes: 3c0468d4451e ("powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210526144540.117795-1-fbarrat@linux.ibm.com
|
|
There are machines out there with added value crap^WBIOS which provide an
SMI handler for the local APIC thermal sensor interrupt. Out of reset,
the BSP on those machines has something like 0x200 in that APIC register
(timestamps left in because this whole issue is timing sensitive):
[ 0.033858] read lvtthmr: 0x330, val: 0x200
which means:
- bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
- bits [10:8] have 010b which means SMI delivery mode.
Now, later during boot, when the kernel programs the local APIC, it
soft-disables it temporarily through the spurious vector register:
setup_local_APIC:
...
/*
* If this comes from kexec/kcrash the APIC might be enabled in
* SPIV. Soft disable it before doing further initialization.
*/
value = apic_read(APIC_SPIV);
value &= ~APIC_SPIV_APIC_ENABLED;
apic_write(APIC_SPIV, value);
which means (from the SDM):
"10.4.7.2 Local APIC State After It Has Been Software Disabled
...
* The mask bits for all the LVT entries are set. Attempts to reset these
bits will be ignored."
And this happens too:
[ 0.124111] APIC: Switch to symmetric I/O mode setup
[ 0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
[ 0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0
This results in CPU 0 soft lockups depending on the placement in time
when the APIC soft-disable happens. Those soft lockups are not 100%
reproducible and the reason for that can only be speculated as no one
tells you what SMM does. Likely, it confuses the SMM code that the APIC
is disabled and the thermal interrupt doesn't doesn't fire at all,
leading to CPU 0 stuck in SMM forever...
Now, before
4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()")
due to how the APIC_LVTTHMR was read before APIC initialization in
mcheck_intel_therm_init(), it would read the value with the mask bit 16
clear and then intel_init_thermal() would replicate it onto the APs and
all would be peachy - the thermal interrupt would remain enabled.
But that commit moved that reading to a later moment in
intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
late and with its interrupt mask bit set.
Thus, revert back to the old behavior of reading the thermal LVT
register before the APIC gets initialized.
Fixes: 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()")
Reported-by: James Feeney <james@nurealm.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
|
|
This fix the recent removal of clock drivers selection.
While it is not necessary to select the clock drivers themselves, we need
to select a proper implementation of the clock API, which for the meson, is
CCF
Fixes: ba66a25536dd ("arm64: meson: ship only the necessary clock controllers")
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://lore.kernel.org/r/20210429083823.59546-1-jbrunet@baylibre.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"This is a bit larger than usual at rc4 time. The reason is due to
Lee's work of fixing newly reported build warnings.
The rest is fixes as usual"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (22 commits)
MAINTAINERS: adjust to removing i2c designware platform data
i2c: s3c2410: fix possible NULL pointer deref on read message after write
i2c: mediatek: Disable i2c start_en and clear intr_stat brfore reset
i2c: i801: Don't generate an interrupt on bus reset
i2c: mpc: implement erratum A-004447 workaround
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
dt-bindings: i2c: mpc: Add fsl,i2c-erratum-a004447 flag
i2c: busses: i2c-stm32f4: Remove incorrectly placed ' ' from function name
i2c: busses: i2c-st: Fix copy/paste function misnaming issues
i2c: busses: i2c-pnx: Provide descriptions for 'alg_data' data structure
i2c: busses: i2c-ocores: Place the expected function names into the documentation headers
i2c: busses: i2c-eg20t: Fix 'bad line' issue and provide description for 'msgs' param
i2c: busses: i2c-designware-master: Fix misnaming of 'i2c_dw_init_master()'
i2c: busses: i2c-cadence: Fix incorrectly documented 'enum cdns_i2c_slave_mode'
i2c: busses: i2c-ali1563: File headers are not good candidates for kernel-doc
i2c: muxes: i2c-arb-gpio-challenge: Demote non-conformant kernel-doc headers
i2c: busses: i2c-nomadik: Fix formatting issue pertaining to 'timeout'
i2c: sh_mobile: Use new clock calculation formulas for RZ/G2E
i2c: I2C_HISI should depend on ACPI
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"A handful of RISC-V related fixes:
- avoid errors when the stack tracing code is tracing itself.
- resurrect the memtest= kernel command line argument on RISC-V,
which was briefly enabled during the merge window before a
refactoring disabled it.
- build fix and some warning cleanups"
* tag 'riscv-for-linus-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: kexec: Fix W=1 build warnings
riscv: kprobes: Fix build error when MMU=n
riscv: Select ARCH_USE_MEMTEST
riscv: stacktrace: fix the riscv stacktrace when CONFIG_FRAME_POINTER enabled
|
|
lld does not implement the RISCV relaxation optimizations like GNU ld
therefore disable it when building with lld, Also pass it to
assembler when using external GNU assembler ( LLVM_IAS != 1 ), this
ensures that relevant assembler option is also enabled along. if these
options are not used then we see following relocations in objects
0000000000000000 R_RISCV_ALIGN *ABS*+0x0000000000000002
These are then rejected by lld
ld.lld: error: capability.c:(.fixup+0x0): relocation R_RISCV_ALIGN requires unimplemented linker relaxation; recompile with -mno-relax but the .o is already compiled with -mno-relax
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Pull KVM fixes from Paolo Bonzini:
"ARM fixes:
- Another state update on exit to userspace fix
- Prevent the creation of mixed 32/64 VMs
- Fix regression with irqbypass not restarting the guest on failed
connect
- Fix regression with debug register decoding resulting in
overlapping access
- Commit exception state on exit to usrspace
- Fix the MMU notifier return values
- Add missing 'static' qualifiers in the new host stage-2 code
x86 fixes:
- fix guest missed wakeup with assigned devices
- fix WARN reported by syzkaller
- do not use BIT() in UAPI headers
- make the kvm_amd.avic parameter bool
PPC fixes:
- make halt polling heuristics consistent with other architectures
selftests:
- various fixes
- new performance selftest memslot_perf_test
- test UFFD minor faults in demand_paging_test"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
selftests: kvm: fix overlapping addresses in memslot_perf_test
KVM: X86: Kill off ctxt->ud
KVM: X86: Fix warning caused by stale emulation context
KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
KVM: x86/mmu: Fix comment mentioning skip_4k
KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK
KVM: x86: add start_assignment hook to kvm_x86_ops
KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
selftests: kvm: do only 1 memslot_perf_test run by default
KVM: X86: Use _BITUL() macro in UAPI headers
KVM: selftests: add shared hugetlbfs backing source type
KVM: selftests: allow using UFFD minor faults for demand paging
KVM: selftests: create alias mappings when using shared memory
KVM: selftests: add shmem backing source type
KVM: selftests: refactor vm_mem_backing_src_type flags
KVM: selftests: allow different backing source types
KVM: selftests: compute correct demand paging size
KVM: selftests: simplify setup_demand_paging error handling
KVM: selftests: Print a message if /dev/kvm is missing
...
|
|
PIC interrupts do not support affinity setting and they can end up on
any online CPU. Therefore, it's required to mark the associated vectors
as system-wide reserved. Otherwise, the corresponding irq descriptors
are copied to the secondary CPUs but the vectors are not marked as
assigned or reserved. This works correctly for the IO/APIC case.
When the IO/APIC is disabled via config, kernel command line or lack of
enumeration then all legacy interrupts are routed through the PIC, but
nothing marks them as system-wide reserved vectors.
As a consequence, a subsequent allocation on a secondary CPU can result in
allocating one of these vectors, which triggers the BUG() in
apic_update_vector() because the interrupt descriptor slot is not empty.
Imran tried to work around that by marking those interrupts as allocated
when a CPU comes online. But that's wrong in case that the IO/APIC is
available and one of the legacy interrupts, e.g. IRQ0, has been switched to
PIC mode because then marking them as allocated will fail as they are
already marked as system vectors.
Stay consistent and update the legacy vectors after attempting IO/APIC
initialization and mark them as system vectors in case that no IO/APIC is
available.
Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: Imran Khan <imran.f.khan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fix function/preempt trace hangs
- a few build fixes
* tag 'mips-fixes_5.13_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER
MIPS: ralink: export rt_sysc_membase for rt2880_wdt.c
MIPS: launch.h: add include guard to prevent build errors
MIPS: alchemy: xxs1500: add gpio-au1000.h header file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.13, take #2
- Another state update on exit to userspace fix
- Prevent the creation of mixed 32/64 VMs
|
|
ctxt->ud is consumed only by x86_decode_insn(), we can kill it off by
passing emulation_type to x86_decode_insn() and dropping ctxt->ud
altogether. Tracking that info in ctxt for literally one call is silly.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <1622160097-37633-2-git-send-email-wanpengli@tencent.com>
|
|
Reported by syzkaller:
WARNING: CPU: 7 PID: 10526 at linux/arch/x86/kvm//x86.c:7621 x86_emulate_instruction+0x41b/0x510 [kvm]
RIP: 0010:x86_emulate_instruction+0x41b/0x510 [kvm]
Call Trace:
kvm_mmu_page_fault+0x126/0x8f0 [kvm]
vmx_handle_exit+0x11e/0x680 [kvm_intel]
vcpu_enter_guest+0xd95/0x1b40 [kvm]
kvm_arch_vcpu_ioctl_run+0x377/0x6a0 [kvm]
kvm_vcpu_ioctl+0x389/0x630 [kvm]
__x64_sys_ioctl+0x8e/0xd0
do_syscall_64+0x3c/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Commit 4a1e10d5b5d8 ("KVM: x86: handle hardware breakpoints during emulation())
adds hardware breakpoints check before emulation the instruction and parts of
emulation context initialization, actually we don't have the EMULTYPE_NO_DECODE flag
here and the emulation context will not be reused. Commit c8848cee74ff ("KVM: x86:
set ctxt->have_exception in x86_decode_insn()) triggers the warning because it
catches the stale emulation context has #UD, however, it is not during instruction
decoding which should result in EMULATION_FAILED. This patch fixes it by moving
the second part emulation context initialization into init_emulate_ctxt() and
before hardware breakpoints check. The ctxt->ud will be dropped by a follow-up
patch.
syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=134683fdd00000
Reported-by: syzbot+71271244f206d17f6441@syzkaller.appspotmail.com
Fixes: 4a1e10d5b5d8 (KVM: x86: handle hardware breakpoints during emulation)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <1622160097-37633-1-git-send-email-wanpengli@tencent.com>
|
|
The kvm_get_linear_rip() handles x86/long mode cases well and has
better readability, __kvm_set_rflags() also use the paired
function kvm_is_linear_rip() to check the vcpu->arch.singlestep_rip
set in kvm_arch_vcpu_ioctl_set_guest_debug(), so change the
"CS.BASE + RIP" code in kvm_arch_vcpu_ioctl_set_guest_debug() and
handle_exception_nmi() to this one.
Signed-off-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <20210526063828.1173-1-yuan.yao@linux.intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch actually applies there to the
old path well: a context switch can be made before kvmppc_vcpu_run_hv
restores the host FSCR and returns.
Now both the p9 and the p7/8 paths now save and restore their FSCR, it
no longer needs to be restored at the end of kvmppc_vcpu_run_hv
Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
|
|
real_vmalloc_addr() does not currently work for huge vmalloc, which is
what the reverse map can be allocated with for radix host, hash guest.
Extract the hugepage aware equivalent from eeh code into a helper, and
convert existing sites including this one to use it.
Fixes: 8abddd968a30 ("powerpc/64s/radix: Enable huge vmalloc mappings")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210526120005.3432222-1-npiggin@gmail.com
|
|
When checking if the probed instruction is the suffix of a prefixed
instruction, we access the instruction at the previous word. If the
probed instruction is the very first word of a module, we can end up
trying to access an invalid page.
Fix this by skipping the check for all instructions at the beginning of
a page. Prefixed instructions cannot cross a 64-byte boundary and as
such, we don't expect to encounter a suffix as the very first word in a
page for kernel text. Even if there are prefixed instructions crossing
a page boundary (from a module, for instance), the instruction will be
illegal, so preventing probing on the suffix of such prefix instructions
isn't worthwhile.
Fixes: b4657f7650ba ("powerpc/kprobes: Don't allow breakpoints on suffixes")
Cc: stable@vger.kernel.org # v5.8+
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0df9a032a05576a2fa8e97d1b769af2ff0eafbd6.1621416666.git.naveen.n.rao@linux.vnet.ibm.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Don't use contiguous or block mappings for the linear map when KFENCE
is enabled.
- Fix link in the arch_counter_enforce_ordering() comment.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: don't use CON and BLK mapping if KFENCE is enabled
arm64: Fix stale link in the arch_counter_enforce_ordering() comment
|
|
The i2c controllers on the P1010 have an erratum where the documented
scheme for i2c bus recovery will not work (A-004447). A different
mechanism is needed which is documented in the P1010 Chip Errata Rev L.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
The i2c controllers on the P2040/P2041 have an erratum where the
documented scheme for i2c bus recovery will not work (A-004447). A
different mechanism is needed which is documented in the P2040 Chip
Errata Rev Q (latest available at the time of writing).
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
This comment was left over from a previous version of the patch that
introduced wrprot_gfn_range, when skip_4k was passed in instead of
min_level.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210526163227.3113557-1-dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
For VMX, when a vcpu enters HLT emulation, pi_post_block will:
1) Add vcpu to per-cpu list of blocked vcpus.
2) Program the posted-interrupt descriptor "notification vector"
to POSTED_INTR_WAKEUP_VECTOR
With interrupt remapping, an interrupt will set the PIR bit for the
vector programmed for the device on the CPU, test-and-set the
ON bit on the posted interrupt descriptor, and if the ON bit is clear
generate an interrupt for the notification vector.
This way, the target CPU wakes upon a device interrupt and wakes up
the target vcpu.
Problem is that pi_post_block only programs the notification vector
if kvm_arch_has_assigned_device() is true. Its possible for the
following to happen:
1) vcpu V HLTs on pcpu P, kvm_arch_has_assigned_device is false,
notification vector is not programmed
2) device is assigned to VM
3) device interrupts vcpu V, sets ON bit
(notification vector not programmed, so pcpu P remains in idle)
4) vcpu 0 IPIs vcpu V (in guest), but since pi descriptor ON bit is set,
kvm_vcpu_kick is skipped
5) vcpu 0 busy spins on vcpu V's response for several seconds, until
RCU watchdog NMIs all vCPUs.
To fix this, use the start_assignment kvm_x86_ops callback to kick
vcpus out of the halt loop, so the notification vector is
properly reprogrammed to the wakeup vector.
Reported-by: Pei Zhang <pezhang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Message-Id: <20210526172014.GA29007@fuller.cnet>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|