summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)Author
2021-08-20powerpc/64s: Fix scv implicit soft-mask table for relocated kernelsNicholas Piggin
The implict soft-mask table addresses get relocated if they use a relative symbol like a label. This is right for code that runs relocated but not for unrelocated. The scv interrupt vectors run unrelocated, so absolute addresses are required for their soft-mask table entry. This fixes crashing with relocated kernels, usually an asynchronous interrupt hitting in the scv handler, then hitting the trap that checks whether r1 is in userspace. Fixes: 325678fd0522 ("powerpc/64s: add a table of implicit soft-masked addresses") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210820103431.1701240-1-npiggin@gmail.com
2021-08-12powerpc/interrupt: Do not call single_step_exception() from other exceptionsChristophe Leroy
single_step_exception() is called by emulate_single_step() which is called from (at least) alignment exception() handler and program_check_exception() handler. Redefine it as a regular __single_step_exception() which is called by both single_step_exception() handler and emulate_single_step() function. Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers") Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/aed174f5cbc06f2cf95233c071d8aac948e46043.1628611921.git.christophe.leroy@csgroup.eu
2021-08-12powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt()Christophe Leroy
An interrupt handler shall not be called from another interrupt handler otherwise this leads to problems like the following: Kernel attempted to write user page (afd4fa84) - exploit attempt? (uid: 1000) ------------[ cut here ]------------ Bug: Write fault blocked by KUAP! WARNING: CPU: 0 PID: 1617 at arch/powerpc/mm/fault.c:230 do_page_fault+0x484/0x720 Modules linked in: CPU: 0 PID: 1617 Comm: sshd Tainted: G W 5.13.0-pmac-00010-g8393422eb77 #7 NIP: c001b77c LR: c001b77c CTR: 00000000 REGS: cb9e5bc0 TRAP: 0700 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 00021032 <ME,IR,DR,RI> CR: 24942424 XER: 00000000 GPR00: c001b77c cb9e5c80 c1582c00 00000021 3ffffbff 085b0000 00000027 c8eb644c GPR08: 00000023 00000000 00000000 00000000 24942424 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90 GPR24: 00000040 afd4fa96 00000040 02000000 c1fda6c0 afd4fa84 00000300 cb9e5cc0 NIP [c001b77c] do_page_fault+0x484/0x720 LR [c001b77c] do_page_fault+0x484/0x720 Call Trace: [cb9e5c80] [c001b77c] do_page_fault+0x484/0x720 (unreliable) [cb9e5cb0] [c000424c] DataAccess_virt+0xd4/0xe4 --- interrupt: 300 at __copy_tofrom_user+0x110/0x20c NIP: c001f9b4 LR: c03250a0 CTR: 00000004 REGS: cb9e5cc0 TRAP: 0300 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 48028468 XER: 20000000 DAR: afd4fa84 DSISR: 0a000000 GPR00: 20726f6f cb9e5d80 c1582c00 00000004 cb9e5e3a 00000016 afd4fa80 00000000 GPR08: 3835202d 72777872 2d78722d 00000004 28028464 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90 GPR24: 00000040 afd4fa96 00000040 cb9e5e0c 00000daa a0000000 cb9e5e98 afd4fa56 NIP [c001f9b4] __copy_tofrom_user+0x110/0x20c LR [c03250a0] _copy_to_iter+0x144/0x990 --- interrupt: 300 [cb9e5d80] [c03e89c0] n_tty_read+0xa4/0x598 (unreliable) [cb9e5df0] [c03e2a0c] tty_read+0xdc/0x2b4 [cb9e5e80] [c0156bf8] vfs_read+0x274/0x340 [cb9e5f00] [c01571ac] ksys_read+0x70/0x118 [cb9e5f30] [c0016048] ret_from_syscall+0x0/0x28 --- interrupt: c00 at 0xa7855c88 NIP: a7855c88 LR: a7855c5c CTR: 00000000 REGS: cb9e5f40 TRAP: 0c00 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 2402446c XER: 00000000 GPR00: 00000003 afd4ec70 a72137d0 0000000b afd4ecac 00004000 0065a990 00000800 GPR08: 00000000 a7947930 00000000 00000004 c15831b0 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 0065a9e0 00000001 0065fac0 GPR24: 00000000 00000089 00664050 00000000 00668e30 a720c8dc a7943ff4 0065f9b0 NIP [a7855c88] 0xa7855c88 LR [a7855c5c] 0xa7855c5c --- interrupt: c00 Instruction dump: 3884aa88 38630178 48076861 807f0080 48042e45 2f830000 419e0148 3c80c079 3c60c076 38841be4 386301c0 4801f705 <0fe00000> 3860000b 4bfffe30 3c80c06b ---[ end trace fd69b91a8046c2e5 ]--- Here the problem is that by re-enterring an exception handler, kuap_save_and_lock() is called a second time with this time KUAP access locked, leading to regs->kuap being overwritten hence KUAP not being unlocked at exception exit as expected. Do not call do_IRQ() from timer_interrupt() directly. Instead, redefine do_IRQ() as a standard function named __do_IRQ(), and call it from both do_IRQ() and time_interrupt() handlers. Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers") Cc: stable@vger.kernel.org # v5.12+ Reported-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c17d234f4927d39a1d7100864a8e1145323d33a0.1628611927.git.christophe.leroy@csgroup.eu
2021-08-09powerpc/kprobes: Fix kprobe Oops happens in bookePu Lehui
When using kprobe on powerpc booke series processor, Oops happens as show bellow: / # echo "p:myprobe do_nanosleep" > /sys/kernel/debug/tracing/kprobe_events / # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable / # sleep 1 [ 50.076730] Oops: Exception in kernel mode, sig: 5 [#1] [ 50.077017] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500 [ 50.077221] Modules linked in: [ 50.077462] CPU: 0 PID: 77 Comm: sleep Not tainted 5.14.0-rc4-00022-g251a1524293d #21 [ 50.077887] NIP: c0b9c4e0 LR: c00ebecc CTR: 00000000 [ 50.078067] REGS: c3883de0 TRAP: 0700 Not tainted (5.14.0-rc4-00022-g251a1524293d) [ 50.078349] MSR: 00029000 <CE,EE,ME> CR: 24000228 XER: 20000000 [ 50.078675] [ 50.078675] GPR00: c00ebdf0 c3883e90 c313e300 c3883ea0 00000001 00000000 c3883ecc 00000001 [ 50.078675] GPR08: c100598c c00ea250 00000004 00000000 24000222 102490c2 bff4180c 101e60d4 [ 50.078675] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000 [ 50.078675] GPR24: 00000002 00000000 c3883ea0 00000001 00000000 0000c350 3b9b8d50 00000000 [ 50.080151] NIP [c0b9c4e0] do_nanosleep+0x0/0x190 [ 50.080352] LR [c00ebecc] hrtimer_nanosleep+0x14c/0x1e0 [ 50.080638] Call Trace: [ 50.080801] [c3883e90] [c00ebdf0] hrtimer_nanosleep+0x70/0x1e0 (unreliable) [ 50.081110] [c3883f00] [c00ec004] sys_nanosleep_time32+0xa4/0x110 [ 50.081336] [c3883f40] [c001509c] ret_from_syscall+0x0/0x28 [ 50.081541] --- interrupt: c00 at 0x100a4d08 [ 50.081749] NIP: 100a4d08 LR: 101b5234 CTR: 00000003 [ 50.081931] REGS: c3883f50 TRAP: 0c00 Not tainted (5.14.0-rc4-00022-g251a1524293d) [ 50.082183] MSR: 0002f902 <CE,EE,PR,FP,ME> CR: 24000222 XER: 00000000 [ 50.082457] [ 50.082457] GPR00: 000000a2 bf980040 1024b4d0 bf980084 bf980084 64000000 00555345 fefefeff [ 50.082457] GPR08: 7f7f7f7f 101e0000 00000069 00000003 28000422 102490c2 bff4180c 101e60d4 [ 50.082457] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000 [ 50.082457] GPR24: 00000002 bf9803f4 10240000 00000000 00000000 100039e0 00000000 102444e8 [ 50.083789] NIP [100a4d08] 0x100a4d08 [ 50.083917] LR [101b5234] 0x101b5234 [ 50.084042] --- interrupt: c00 [ 50.084238] Instruction dump: [ 50.084483] 4bfffc40 60000000 60000000 60000000 9421fff0 39400402 914200c0 38210010 [ 50.084841] 4bfffc20 00000000 00000000 00000000 <7fe00008> 7c0802a6 7c892378 93c10048 [ 50.085487] ---[ end trace f6fffe98e2fa8f3e ]--- [ 50.085678] Trace/breakpoint trap There is no real mode for booke arch and the MMU translation is always on. The corresponding MSR_IS/MSR_DS bit in booke is used to switch the address space, but not for real mode judgment. Fixes: 21f8b2fa3ca5 ("powerpc/kprobes: Ignore traps that happened in real mode") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210809023658.218915-1-pulehui@huawei.com
2021-08-07powerpc/smp: Fix OOPS in topology_init()Christophe Leroy
Running an SMP kernel on an UP platform not prepared for it, I encountered the following OOPS: BUG: Kernel NULL pointer dereference on read at 0x00000034 Faulting instruction address: 0xc0a04110 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K SMP NR_CPUS=2 CMPCPRO Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-pmac-00001-g230fedfaad21 #5234 NIP: c0a04110 LR: c0a040d8 CTR: c0a04084 REGS: e100dda0 TRAP: 0300 Not tainted (5.13.0-pmac-00001-g230fedfaad21) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 84000284 XER: 00000000 DAR: 00000034 DSISR: 20000000 GPR00: c0006bd4 e100de60 c1033320 00000000 00000000 c0942274 00000000 00000000 GPR08: 00000000 00000000 00000001 00000063 00000007 00000000 c0006f30 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000005 GPR24: c0c67d74 c0c67f1c c0c60000 c0c67d70 c0c0c558 1efdf000 c0c00020 00000000 NIP [c0a04110] topology_init+0x8c/0x138 LR [c0a040d8] topology_init+0x54/0x138 Call Trace: [e100de60] [80808080] 0x80808080 (unreliable) [e100de90] [c0006bd4] do_one_initcall+0x48/0x1bc [e100def0] [c0a0150c] kernel_init_freeable+0x1c8/0x278 [e100df20] [c0006f44] kernel_init+0x14/0x10c [e100df30] [c00190fc] ret_from_kernel_thread+0x14/0x1c Instruction dump: 7c692e70 7d290194 7c035040 7c7f1b78 5529103a 546706fe 5468103a 39400001 7c641b78 40800054 80c690b4 7fb9402e <81060034> 7fbeea14 2c080000 7fa3eb78 ---[ end trace b246ffbc6bbbb6fb ]--- Fix it by checking smp_ops before using it, as already done in several other places in the arch/powerpc/kernel/smp.c Fixes: 39f87561454d ("powerpc/smp: Move ppc_md.cpu_die() to smp_ops.cpu_offline_self()") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/75287841cbb8740edd44880fe60be66d489160d9.1628097995.git.christophe.leroy@csgroup.eu
2021-08-07powerpc/32: Fix critical and debug interrupts on BOOKEChristophe Leroy
32 bits BOOKE have special interrupts for debug and other critical events. When handling those interrupts, dedicated registers are saved in the stack frame in addition to the standard registers, leading to a shift of the pt_regs struct. Since commit db297c3b07af ("powerpc/32: Don't save thread.regs on interrupt entry"), the pt_regs struct is expected to be at the same place all the time. Instead of handling a special struct in addition to pt_regs, just add those special registers to struct pt_regs. Fixes: db297c3b07af ("powerpc/32: Don't save thread.regs on interrupt entry") Cc: stable@vger.kernel.org Reported-by: Radu Rendec <radu.rendec@gmail.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/028d5483b4851b01ea4334d0751e7f260419092b.1625637264.git.christophe.leroy@csgroup.eu
2021-08-07powerpc/32s: Fix napping restore in data storage interrupt (DSI)Christophe Leroy
When a DSI (Data Storage Interrupt) is taken while in NAP mode, r11 doesn't survive the call to power_save_ppc32_restore(). So use r1 instead of r11 as they both contain the virtual stack pointer at that point. Fixes: 4c0104a83fc3 ("powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE") Cc: stable@vger.kernel.org # v5.13+ Reported-by: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/731694e0885271f6ee9ffc179eb4bcee78313682.1628003562.git.christophe.leroy@csgroup.eu
2021-07-29powerpc/vdso: Don't use r30 to avoid breaking Go langMichael Ellerman
The Go runtime uses r30 for some special value called 'g'. It assumes that value will remain unchanged even when calling VDSO functions. Although r30 is non-volatile across function calls, the callee is free to use it, as long as the callee saves the value and restores it before returning. It used to be true by accident that the VDSO didn't use r30, because the VDSO was hand-written asm. When we switched to building the VDSO from C the compiler started using r30, at least in some builds, leading to crashes in Go. eg: ~/go/src$ ./all.bash Building Go cmd/dist using /usr/lib/go-1.16. (go1.16.2 linux/ppc64le) Building Go toolchain1 using /usr/lib/go-1.16. go build os/exec: /usr/lib/go-1.16/pkg/tool/linux_ppc64le/compile: signal: segmentation fault go build reflect: /usr/lib/go-1.16/pkg/tool/linux_ppc64le/compile: signal: segmentation fault go tool dist: FAILED: /usr/lib/go-1.16/bin/go install -gcflags=-l -tags=math_big_pure_go compiler_bootstrap bootstrap/cmd/...: exit status 1 There are patches in flight to fix Go[1], but until they are released and widely deployed we can workaround it in the VDSO by avoiding use of r30. Note this only works with GCC, clang does not support -ffixed-rN. 1: https://go-review.googlesource.com/c/go/+/328110 Fixes: ab037dd87a2f ("powerpc/vdso: Switch VDSO to generic C implementation.") Cc: stable@vger.kernel.org # v5.11+ Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210729131244.2595519-1-mpe@ellerman.id.au
2021-07-09Merge tag 'powerpc-5.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fix crashes on 64-bit Book3E due to use of Book3S only mtmsrd instruction. Fix "scheduling while atomic" warnings at boot due to preempt count underflow. Two commits fixing our handling of BPF atomic instructions. Fix error handling in xive when allocating an IPI. Fix lockup on kernel exec fault on 603. Thanks to Bharata B Rao, Cédric Le Goater, Christian Zigotzky, Christophe Leroy, Guenter Roeck, Jiri Olsa, Naveen N. Rao, Nicholas Piggin, and Valentin Schneider" * tag 'powerpc-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/preempt: Don't touch the idle task's preempt_count during hotplug powerpc/64e: Fix system call illegal mtmsrd instruction powerpc/xive: Fix error handling when allocating an IPI powerpc/bpf: Reject atomic ops in ppc32 JIT powerpc/bpf: Fix detecting BPF atomic instructions powerpc/mm: Fix lockup on kernel exec fault
2021-07-08powerpc: convert to setup_initial_init_mm()Kefeng Wang
Use setup_initial_init_mm() helper to simplify code. Link: https://lkml.kernel.org/r/20210608083418.137226-12-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-06powerpc/64e: Fix system call illegal mtmsrd instructionNicholas Piggin
BookE does not have mtmsrd, switch to use wrteei to enable MSR[EE]. Fixes: dd152f70bdc1 ("powerpc/64s: system call avoid setting MSR[RI] until we set MSR[EE]") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210706051310.608992-1-npiggin@gmail.com
2021-07-02Merge tag 'powerpc-5.14-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A big series refactoring parts of our KVM code, and converting some to C. - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs. - Support for the Microwatt soft-core. - Optimisations to our interrupt return path on 64-bit. - Support for userspace access to the NX GZIP accelerator on PowerVM on Power10. - Enable KUAP and KUEP by default on 32-bit Book3S CPUs. - Other smaller features, fixes & cleanups. Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, and Zhen Lei. * tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (218 commits) powerpc: Only build restart_table.c for 64s powerpc/64s: move ret_from_fork etc above __end_soft_masked powerpc/64s/interrupt: clean up interrupt return labels powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols powerpc/64: enable MSR[EE] in irq replay pt_regs powerpc/64s/interrupt: preserve regs->softe for NMI interrupts powerpc/64s: add a table of implicit soft-masked addresses powerpc/64e: remove implicit soft-masking and interrupt exit restart logic powerpc/64e: fix CONFIG_RELOCATABLE build warnings powerpc/64s: fix hash page fault interrupt handler powerpc/4xx: Fix setup_kuep() on SMP powerpc/32s: Fix setup_{kuap/kuep}() on SMP powerpc/interrupt: Use names in check_return_regs_valid() powerpc/interrupt: Also use exit_must_hard_disable() on PPC32 powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE powerpc/ptrace: Refactor regs_set_return_{msr/ip} powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip} powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi() powerpc/pseries/vas: Include irqdomain.h powerpc: mark local variables around longjmp as volatile ...
2021-07-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ...
2021-07-01Merge tag 'fs_for_v5.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc fs updates from Jan Kara: "The new quotactl_fd() syscall (remake of quotactl_path() syscall that got introduced & disabled in 5.13 cycle), and couple of udf, reiserfs, isofs, and writeback fixes and cleanups" * tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: writeback: fix obtain a reference to a freeing memcg css quota: remove unnecessary oom message isofs: remove redundant continue statement quota: Wire up quotactl_fd syscall quota: Change quotactl_path() systcall to an fd-based one reiserfs: Remove unneed check in reiserfs_write_full_page() udf: Fix NULL pointer dereference in udf_symlink function reiserfs: add check for invalid 1st journal block
2021-07-01kernel.h: split out panic and oops helpersAndy Shevchenko
kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out panic and oops helpers. There are several purposes of doing this: - dropping dependency in bug.h - dropping a loop by moving out panic_notifier.h - unload kernel.h from something which has its own domain At the same time convert users tree-wide to use new headers, although for the time being include new header back to kernel.h to avoid twisted indirected includes for existing users. [akpm@linux-foundation.org: thread_info.h needs limits.h] [andriy.shevchenko@linux.intel.com: ia64 fix] Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Co-developed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Sebastian Reichel <sre@kernel.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30powerpc/64s: move ret_from_fork etc above __end_soft_maskedNicholas Piggin
Code which runs with interrupts enabled should be moved above __end_soft_masked where possible, because maskable interrupts that hit below that symbol will need to consult the soft mask table, which is an extra cost. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-10-npiggin@gmail.com
2021-06-30powerpc/64s/interrupt: clean up interrupt return labelsNicholas Piggin
Normal kernel-interrupt exits can get interrupt_return_srr_user_restart in their backtrace, which is an unusual and notable function, and it is part of the user-interrupt exit path, which is doubly confusing. Add non-local labels for both user and kernel interrupt exit cases to address this and make the user and kernel cases more symmetric. Also get rid of an unused label. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-9-npiggin@gmail.com
2021-06-30powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbolsNicholas Piggin
If one interrupt exit symbol must not be kprobed, none of them can be, without more justification for why it's safe. Disallow kprobing on any of the (non-local) labels in the exit paths. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-8-npiggin@gmail.com
2021-06-30powerpc/64: enable MSR[EE] in irq replay pt_regsNicholas Piggin
Similar to commit 2b48e96be2f9f ("powerpc/64: fix irq replay pt_regs->softe value"), enable MSR_EE in pt_regs->msr. This makes the regs look more normal. It also allows some extra debug checks to be added to interrupt handler entry. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-7-npiggin@gmail.com
2021-06-30powerpc/64s: add a table of implicit soft-masked addressesNicholas Piggin
Commit 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") ends up catching too much code, including ret_from_fork, and parts of interrupt and syscall return that do not expect to be interrupts to be soft-masked. If an interrupt gets marked pending, and then the code proceeds out of the implicit soft-masked region it will fail to deal with the pending interrupt. Fix this by adding a new table of addresses which explicitly marks the regions of code that are soft masked. This table is only checked for interrupts that below __end_soft_masked, so most kernel interrupts will not have the overhead of the table search. Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-5-npiggin@gmail.com
2021-06-30powerpc/64e: remove implicit soft-masking and interrupt exit restart logicNicholas Piggin
The implicit soft-masking to speed up interrupt return was going to be used by 64e as well, but it has not been extensively tested on that platform and is not considered ready. It was intended to be disabled before merge. Disable it for now. Most of the restart code is common with 64s, so with more correctness and performance testing this could be re-enabled again by adding the extra soft-mask checks to interrupt handlers and flipping exit_must_hard_disable(). Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-4-npiggin@gmail.com
2021-06-30powerpc/64e: fix CONFIG_RELOCATABLE build warningsNicholas Piggin
CONFIG_RELOCATABLE=y causes build warnings from unresolved relocations. Fix these by using TOC addressing for these cases. Commit 24d33ac5b8ff ("powerpc/64s: Make prom_init require RELOCATABLE") caused some 64e configs to select RELOCATABLE resulting in these warnings, but the underlying issue was already there. This passes basic qemu testing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-3-npiggin@gmail.com
2021-06-29Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "191 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab, slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap, mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits) mm,hwpoison: make get_hwpoison_page() call get_any_page() mm,hwpoison: send SIGBUS with error virutal address mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes mm/page_alloc: allow high-order pages to be stored on the per-cpu lists mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA docs: remove description of DISCONTIGMEM arch, mm: remove stale mentions of DISCONIGMEM mm: remove CONFIG_DISCONTIGMEM m68k: remove support for DISCONTIGMEM arc: remove support for DISCONTIGMEM arc: update comment about HIGHMEM implementation alpha: remove DISCONTIGMEM and NUMA mm/page_alloc: move free_the_page mm/page_alloc: fix counting of managed_pages mm/page_alloc: improve memmap_pages dbg msg mm: drop SECTION_SHIFT in code comments mm/page_alloc: introduce vm.percpu_pagelist_high_fraction mm/page_alloc: limit the number of pages on PCP lists when reclaim is active mm/page_alloc: scale the number of pages that are batch freed ...
2021-06-29Merge tag 'irq-core-2021-06-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core changes: - Cleanup and simplification of common code to invoke the low level interrupt flow handlers when this invocation requires irqdomain resolution. Add the necessary core infrastructure. - Provide a proper interface for modular PMU drivers to set the interrupt affinity. - Add a request flag which allows to exclude interrupts from spurious interrupt detection. Useful especially for IPI handlers which always return IRQ_HANDLED which turns the spurious interrupt detection into a pointless waste of CPU cycles. Driver changes: - Bulk convert interrupt chip drivers to the new irqdomain low level flow handler invocation mechanism. - Add device tree bindings for the Renesas R-Car M3-W+ SoC - Enable modular build of the Qualcomm PDC driver - The usual small fixes and improvements" * tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) dt-bindings: interrupt-controller: arm,gic-v3: Describe GICv3 optional properties irqchip: gic-pm: Remove redundant error log of clock bulk irqchip/sun4i: Remove unnecessary oom message irqchip/irq-imx-gpcv2: Remove unnecessary oom message irqchip/imgpdc: Remove unnecessary oom message irqchip/gic-v3-its: Remove unnecessary oom message irqchip/gic-v2m: Remove unnecessary oom message irqchip/exynos-combiner: Remove unnecessary oom message irqchip: Bulk conversion to generic_handle_domain_irq() genirq: Move non-irqdomain handle_domain_irq() handling into ARM's handle_IRQ() genirq: Add generic_handle_domain_irq() helper irqchip/nvic: Convert from handle_IRQ() to handle_domain_irq() irqdesc: Fix __handle_domain_irq() comment genirq: Use irq_resolve_mapping() to implement __handle_domain_irq() and co irqdomain: Introduce irq_resolve_mapping() irqdomain: Protect the linear revmap with RCU irqdomain: Cache irq_data instead of a virq number in the revmap irqdomain: Use struct_size() helper when allocating irqdomain irqdomain: Make normal and nomap irqdomains exclusive powerpc: Move the use of irq_domain_add_nomap() behind a config option ...
2021-06-29mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMAMike Rapoport
After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA configuration options are equivalent. Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead. Done with $ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \ $(git grep -wl CONFIG_NEED_MULTIPLE_NODES) $ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \ $(git grep -wl NEED_MULTIPLE_NODES) with manual tweaks afterwards. [rppt@linux.ibm.com: fix arm boot crash] Link: https://lkml.kernel.org/r/YMj9vHhHOiCVN4BF@linux.ibm.com Link: https://lkml.kernel.org/r/20210608091316.3622-9-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "This covers all architectures (except MIPS) so I don't expect any other feature pull requests this merge window. ARM: - Add MTE support in guests, complete with tag save/restore interface - Reduce the impact of CMOs by moving them in the page-table code - Allow device block mappings at stage-2 - Reduce the footprint of the vmemmap in protected mode - Support the vGIC on dumb systems such as the Apple M1 - Add selftest infrastructure to support multiple configuration and apply that to PMU/non-PMU setups - Add selftests for the debug architecture - The usual crop of PMU fixes PPC: - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes S390: - new HW facilities for guests - make inline assembly more robust with KASAN and co x86: - Allow userspace to handle emulation errors (unknown instructions) - Lazy allocation of the rmap (host physical -> guest physical address) - Support for virtualizing TSC scaling on VMX machines - Optimizations to avoid shattering huge pages at the beginning of live migration - Support for initializing the PDPTRs without loading them from memory - Many TLB flushing cleanups - Refuse to load if two-stage paging is available but NX is not (this has been a requirement in practice for over a year) - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from CR0/CR4/EFER, using the MMU mode everywhere once it is computed from the CPU registers - Use PM notifier to notify the guest about host suspend or hibernate - Support for passing arguments to Hyper-V hypercalls using XMM registers - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on AMD processors - Hide Hyper-V hypercalls that are not included in the guest CPUID - Fixes for live migration of virtual machines that use the Hyper-V "enlightened VMCS" optimization of nested virtualization - Bugfixes (not many) Generic: - Support for retrieving statistics without debugfs - Cleanups for the KVM selftests API" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits) KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled kvm: x86: disable the narrow guest module parameter on unload selftests: kvm: Allows userspace to handle emulation errors. kvm: x86: Allow userspace to handle emulation errors KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic KVM: x86: Enhance comments for MMU roles and nested transition trickiness KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU KVM: x86/mmu: Use MMU's role to determine PTTYPE KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers KVM: x86/mmu: Add a helper to calculate root from role_regs KVM: x86/mmu: Add helper to update paging metadata KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0 KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper KVM: x86/mmu: Get nested MMU's root level from the MMU's role ...
2021-06-28Merge tag 'sched-core-2021-06-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler udpates from Ingo Molnar: - Changes to core scheduling facilities: - Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables coordinated scheduling across SMT siblings. This is a much requested feature for cloud computing platforms, to allow the flexible utilization of SMT siblings, without exposing untrusted domains to information leaks & side channels, plus to ensure more deterministic computing performance on SMT systems used by heterogenous workloads. There are new prctls to set core scheduling groups, which allows more flexible management of workloads that can share siblings. - Fix task->state access anti-patterns that may result in missed wakeups and rename it to ->__state in the process to catch new abuses. - Load-balancing changes: - Tweak newidle_balance for fair-sched, to improve 'memcache'-like workloads. - "Age" (decay) average idle time, to better track & improve workloads such as 'tbench'. - Fix & improve energy-aware (EAS) balancing logic & metrics. - Fix & improve the uclamp metrics. - Fix task migration (taskset) corner case on !CONFIG_CPUSET. - Fix RT and deadline utilization tracking across policy changes - Introduce a "burstable" CFS controller via cgroups, which allows bursty CPU-bound workloads to borrow a bit against their future quota to improve overall latencies & batching. Can be tweaked via /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us. - Rework assymetric topology/capacity detection & handling. - Scheduler statistics & tooling: - Disable delayacct by default, but add a sysctl to enable it at runtime if tooling needs it. Use static keys and other optimizations to make it more palatable. - Use sched_clock() in delayacct, instead of ktime_get_ns(). - Misc cleanups and fixes. * tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits) sched/doc: Update the CPU capacity asymmetry bits sched/topology: Rework CPU capacity asymmetry detection sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag psi: Fix race between psi_trigger_create/destroy sched/fair: Introduce the burstable CFS controller sched/uclamp: Fix uclamp_tg_restrict() sched/rt: Fix Deadline utilization tracking during policy change sched/rt: Fix RT utilization tracking during policy change sched: Change task_struct::state sched,arch: Remove unused TASK_STATE offsets sched,timer: Use __set_current_state() sched: Add get_current_state() sched,perf,kvm: Fix preemption condition sched: Introduce task_is_running() sched: Unbreak wakeups sched/fair: Age the average idle time sched/cpufreq: Consider reduced CPU capacity in energy calculation sched/fair: Take thermal pressure into account while estimating energy thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure sched/fair: Return early from update_tg_cfs_load() if delta == 0 ...
2021-06-28Merge tag 'perf-core-2021-06-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: - Platform PMU driver updates: - x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers - Fix RDPMC support - Fix [extended-]PEBS-via-PT support - Fix Sapphire Rapids event constraints - Fix :ppp support on Sapphire Rapids - Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU - Other heterogenous-PMU fixes - Kprobes: - Remove the unused and misguided kprobe::fault_handler callbacks. - Warn about kprobes taking a page fault. - Fix the 'nmissed' stat counter. - Misc cleanups and fixes. * tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix task context PMU for Hetero perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids perf/x86/intel: Fix fixed counter check warning for some Alder Lake perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task kprobes: Do not increment probe miss count in the fault handler x86,kprobes: WARN if kprobes tries to handle a fault kprobes: Remove kprobe::fault_handler uprobes: Update uprobe_write_opcode() kernel-doc comment perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint perf/core: Fix DocBook warnings perf/core: Make local function perf_pmu_snapshot_aux() static perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of()
2021-06-26powerpc/interrupt: Use names in check_return_regs_valid()Christophe Leroy
trap->regs == 0x3000 is trap_is_scv() trap 0x500 is INTERRUPT_EXTERNAL Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d48bf0184a1de185eb0ed3282247f8a294710674.1624632537.git.christophe.leroy@csgroup.eu
2021-06-26powerpc/interrupt: Also use exit_must_hard_disable() on PPC32Christophe Leroy
Reduce #ifdefs a bit by making exit_must_hard_disable() return true on PPC32. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/52531029563c1fc823b790058e799d0ca71b028c.1624631463.git.christophe.leroy@csgroup.eu
2021-06-25Merge tag 'kvmarm-5.14' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for v5.14. - Add MTE support in guests, complete with tag save/restore interface - Reduce the impact of CMOs by moving them in the page-table code - Allow device block mappings at stage-2 - Reduce the footprint of the vmemmap in protected mode - Support the vGIC on dumb systems such as the Apple M1 - Add selftest infrastructure to support multiple configuration and apply that to PMU/non-PMU setups - Add selftests for the debug architecture - The usual crop of PMU fixes
2021-06-26powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZEJason Wang
The ARRAY_SIZE macro is more compact and more formal in linux source. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210624063632.25632-1-wangborong@cdjrlc.com
2021-06-26powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()Michael Ellerman
In raise_backtrace_ipi() we iterate through the cpumask of CPUs, sending each an IPI asking them to do a backtrace, but we don't wait for the backtrace to happen. We then iterate through the CPU mask again, and if any CPU hasn't done the backtrace and cleared itself from the mask, we print a trace on its behalf, noting that the trace may be "stale". This works well enough when a CPU is not responding, because in that case it doesn't receive the IPI and the sending CPU is left to print the trace. But when all CPUs are responding we are left with a race between the sending and receiving CPUs, if the sending CPU wins the race then it will erroneously print a trace. This leads to spurious "stale" traces from the sending CPU, which can then be interleaved messily with the receiving CPU, note the CPU numbers, eg: [ 1658.929157][ C7] rcu: Stack dump where RCU GP kthread last ran: [ 1658.929223][ C7] Sending NMI from CPU 7 to CPUs 1: [ 1658.929303][ C1] NMI backtrace for cpu 1 [ 1658.929303][ C7] CPU 1 didn't respond to backtrace IPI, inspecting paca. [ 1658.929362][ C1] CPU: 1 PID: 325 Comm: kworker/1:1H Tainted: G W E 5.13.0-rc2+ #46 [ 1658.929405][ C7] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 325 (kworker/1:1H) [ 1658.929465][ C1] Workqueue: events_highpri test_work_fn [test_lockup] [ 1658.929549][ C7] Back trace of paca->saved_r1 (0xc0000000057fb400) (possibly stale): [ 1658.929592][ C1] NIP: c00000000002cf50 LR: c008000000820178 CTR: c00000000002cfa0 To fix it, change the logic so that the sending CPU waits 5s for the receiving CPU to print its trace. If the receiving CPU prints its trace successfully then the sending CPU just continues, avoiding any spurious "stale" trace. This has the added benefit of allowing all CPUs to print their traces in order and avoids any interleaving of their output. Fixes: 5cc05910f26e ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()") Cc: stable@vger.kernel.org # v4.18+ Reported-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210625140408.3351173-1-mpe@ellerman.id.au
2021-06-25powerpc/rtas-rtc: remove unused constantNathan Lynch
RTAS_CLOCK_BUSY is unused, remove it. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210503175811.1528208-1-nathanl@linux.ibm.com
2021-06-25powerpc: Fix is_kvm_guest() / kvm_para_available()Michael Ellerman
Commit a21d1becaa3f ("powerpc: Reintroduce is_kvm_guest() as a fast-path check") added is_kvm_guest() and changed kvm_para_available() to use it. is_kvm_guest() checks a static key, kvm_guest, and that static key is set in check_kvm_guest(). The problem is check_kvm_guest() is only called on pseries, and even then only in some configurations. That means is_kvm_guest() always returns false on all non-pseries and some pseries depending on configuration. That's a bug. For PR KVM guests this is noticable because they no longer do live patching of themselves, which can be detected by the omission of a message in dmesg such as: KVM: Live patching for a fast VM worked To fix it make check_kvm_guest() an initcall, to ensure it's always called at boot. It needs to be core so that it runs before kvm_guest_init() which is postcore. To be an initcall it needs to return int, where 0 means success, so update that. We still call it manually in pSeries_smp_probe(), because that runs before init calls are run. Fixes: a21d1becaa3f ("powerpc: Reintroduce is_kvm_guest() as a fast-path check") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210623130514.2543232-1-mpe@ellerman.id.au
2021-06-25powerpc/64s: Make prom_init require RELOCATABLEMichael Ellerman
When we boot from open firmware (OF) using PPC_OF_BOOT_TRAMPOLINE, aka. prom_init, we run parts of the kernel at an address other than the link address. That happens because OF loads the kernel above zero (OF is at zero) and we run prom_init before copying the kernel down to zero. Currently that works even for non-relocatable kernels, because we do various fixups to the prom_init code to make it run where it's loaded. However those fixups are not sufficient if the kernel becomes large enough. In that case prom_init()'s final call to __start() can end up generating a plt branch: bl c000000002000018 <00000078.plt_branch.__start> That results in the kernel jumping to the linked address of __start, 0xc000000000000000, when really it needs to jump to the 0xc000000000000000 + the runtime address because the kernel is still running at the load address. We could do further shenanigans to handle that, see Jordan's patch for example: https://lore.kernel.org/linuxppc-dev/20210421021721.1539289-1-jniethe5@gmail.com However it is much simpler to just require a kernel with prom_init() to be built relocatable. The result works in all configurations without further work, and requires less code. This should have no effect on most people, as our defconfigs and essentially all distro configs already have RELOCATABLE enabled. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210623130454.2542945-1-mpe@ellerman.id.au
2021-06-25powerpc/kprobes: Warn if instruction patching failedNaveen N. Rao
When arming and disarming probes, we currently assume that instruction patching can never fail, and don't have a mechanism to surface errors. Add a warning in case instruction patching ever fails. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/18d7b1309f938c08ce07738100932b551bdd3a52.1621416666.git.naveen.n.rao@linux.vnet.ibm.com
2021-06-25powerpc/kprobes: Roll IS_RFI() macro into IS_RFID()Naveen N. Rao
In kprobes and xmon, we should exclude both 32-bit and 64-bit variants of mtmsr and rfi instructions from being stepped. Have IS_RFID() also detect a rfi instruction similar to IS_MTMSRD(). Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/eee32e1b75dae85d471c89b4c0a123ad4b0aabf8.1621416666.git.naveen.n.rao@linux.vnet.ibm.com
2021-06-25powerpc/kprobes: Fix Oops by passing ppc_inst as a pointer to emulate_step() ↵Naveen N. Rao
on ppc32 Trying to use a kprobe on ppc32 results in the below splat: BUG: Unable to handle kernel data access on read at 0x7c0802a6 Faulting instruction address: 0xc002e9f0 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K PowerPC 44x Platform Modules linked in: CPU: 0 PID: 89 Comm: sh Not tainted 5.13.0-rc1-01824-g3a81c0495fdb #7 NIP: c002e9f0 LR: c0011858 CTR: 00008a47 REGS: c292fd50 TRAP: 0300 Not tainted (5.13.0-rc1-01824-g3a81c0495fdb) MSR: 00009000 <EE,ME> CR: 24002002 XER: 20000000 DEAR: 7c0802a6 ESR: 00000000 <snip> NIP [c002e9f0] emulate_step+0x28/0x324 LR [c0011858] optinsn_slot+0x128/0x10000 Call Trace: opt_pre_handler+0x7c/0xb4 (unreliable) optinsn_slot+0x128/0x10000 ret_from_syscall+0x0/0x28 The offending instruction is: 81 24 00 00 lwz r9,0(r4) Here, we are trying to load the second argument to emulate_step(): struct ppc_inst, which is the instruction to be emulated. On ppc64, structures are passed in registers when passed by value. However, per the ppc32 ABI, structures are always passed to functions as pointers. This isn't being adhered to when setting up the call to emulate_step() in the optprobe trampoline. Fix the same. Fixes: eacf4c0202654a ("powerpc: Enable OPTPROBES on PPC32") Cc: stable@vger.kernel.org Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5bdc8cbc9a95d0779e27c9ddbf42b40f51f883c0.1624425798.git.christophe.leroy@csgroup.eu
2021-06-25powerpc/64s: Fix copy-paste data exposure into newly created tasksNicholas Piggin
copy-paste contains implicit "copy buffer" state that can contain arbitrary user data (if the user process executes a copy instruction). This could be snooped by another process if a context switch hits while the state is live. So cp_abort is executed on context switch to clear out possible sensitive data and prevent the leak. cp_abort is done after the low level _switch(), which means it is never reached by newly created tasks, so they could snoop on this buffer between their first and second context switch. Fix this by doing the cp_abort before calling _switch. Add some comments which should make the issue harder to miss. Fixes: 07d2a628bc000 ("powerpc/64s: Avoid cpabort in context switch when possible") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210622053036.474678-1-npiggin@gmail.com
2021-06-25powerpc/32: Avoid #ifdef nested with FTR_SECTION on booke syscall entryChristophe Leroy
On booke, SYSCALL_ENTRY macro nests an FTR_SECTION with a #ifdef CONFIG_KVM_BOOKE_HV. Duplicate the single instruction alternative to avoid nesting. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/33db61d5f85146262dbe26648f8f87eca3cae393.1622818435.git.christophe.leroy@csgroup.eu
2021-06-25powerpc/32: Reduce code duplication of system call entryChristophe Leroy
booke and non booke do pretty similar things in SYSCALL_ENTRY macro just before calling jumping to transfer_to_syscall(). Do them in transfer_to_syscall() instead. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/552e27fa09394a6bc70585fcdfa237f99a5d1267.1622818435.git.christophe.leroy@csgroup.eu
2021-06-25powerpc/32: Interchange r1 and r11 in SYSCALL_ENTRY on bookeChristophe Leroy
To better match non booke version of SYSCALL_ENTRY macro, interchange r1 and r11 in the booke version. While at it, in both versions use r1 instead of r11 to save _NIP and _CCR. All other uses of r11 will go away in next patch, so don't bother changing them for now. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1684c39724a069b0ce1aa82eaee6ec194e354e4e.1622818435.git.christophe.leroy@csgroup.eu
2021-06-25powerpc/32: Interchange r10 and r12 in SYSCALL_ENTRY on non bookeChristophe Leroy
To better match booke version of SYSCALL_ENTRY macro, interchange r10 and r12 in the non booke version. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5ab3a517bc883a2fc905fb2cb5ee9344f37b2cfa.1622818435.git.christophe.leroy@csgroup.eu
2021-06-25powerpc: Remove klimitChristophe Leroy
klimit is a global variable initialised at build time with the value of _end. This variable is never modified, so _end symbol can be used directly. Remove klimit. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9fa9ba6807c17f93f35a582c199c646c4a8bfd9c.1622800638.git.christophe.leroy@csgroup.eu
2021-06-25powerpc: Offline CPU in stop_this_cpu()Nicholas Piggin
printk_safe_flush_on_panic() has special lock breaking code for the case where we panic()ed with the console lock held. It relies on panic IPI causing other CPUs to mark themselves offline. Do as most other architectures do. This effectively reverts commit de6e5d38417e ("powerpc: smp_send_stop do not offline stopped CPUs"), unfortunately it may result in some false positive warnings, but the alternative is more situations where we can crash without getting messages out. Fixes: de6e5d38417e ("powerpc: smp_send_stop do not offline stopped CPUs") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210623041245.865134-1-npiggin@gmail.com
2021-06-25powerpc/64s: Remove irq mask workaround in accumulate_stolen_time()Nicholas Piggin
The caller has been moved to C after irq soft-mask state has been reconciled, and Linux IRQs have been marked as disabled, so this no longer needs to play games with IRQ internals. Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210623022924.704645-1-npiggin@gmail.com
2021-06-25powerpc/pseries: Enable hardlockup watchdog for PowerVM partitionsNicholas Piggin
PowerVM will not arbitrarily oversubscribe or stop guests, page out the guest kernel text to a NFS volume connected by carrier pigeon to abacus based storage, etc., as a KVM host might. So PowerVM guests are not likely to be killed by the hard lockup watchdog in normal operation, even with shared processor LPARs which still get a minimum allotment of CPU time. Enable the hard lockup detector by default on !KVM guests, which we will assume is PowerVM. It has been useful in finding problems on bare metal kernels. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210623021528.702241-1-npiggin@gmail.com
2021-06-25powerpc/64s/interrupt: Check and fix srr_valid without crashingNicholas Piggin
The PPC_RFI_SRR_DEBUG check added by patch "powerpc/64s: avoid reloading (H)SRR registers if they are still valid" has a few deficiencies. It does not fix the actual problem, it's not enabled by default, and it causes a program check interrupt which can cause more difficulties. However there are a lot of paths which may clobber SRRs or change return regs, and difficult to have a high confidence that all paths are covered without wider testing. Add a relatively low overhead always-enabled check that catches most such cases, reports once, and fixes it so the kernel can continue. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Rebase, use switch & INT names, squash in race fix from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-06-25powerpc/interrupt: Remove prep_irq_for_user_exit()Christophe Leroy
prep_irq_for_user_exit() has only one caller, squash it inside that caller. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-18-npiggin@gmail.com