From 4d2e26a38fbcde2ba14882cbdb845caa1c17e19b Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 10 Apr 2019 08:32:42 -0300 Subject: docs: powerpc: convert docs to ReST and rename to *.rst Convert docs to ReST and add them to the arch-specific book. The conversion here was trivial, as almost every file there was already using an elegant format close to ReST standard. The changes were mostly to mark literal blocks and add a few missing section title identifiers. One note with regards to "--": on Sphinx, this can't be used to identify a list, as it will format it badly. This can be used, however, to identify a long hyphen - and "---" is an even longer one. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab Acked-by: Andrew Donnellan # cxl --- arch/powerpc/kernel/exceptions-64s.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index eee5bef736c8..6ba3cc2ef8ab 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1531,7 +1531,7 @@ EXC_COMMON(trap_0b_common, 0xb00, unknown_exception) * * Call convention: * - * syscall register convention is in Documentation/powerpc/syscall64-abi.txt + * syscall register convention is in Documentation/powerpc/syscall64-abi.rst * * For hypercalls, the register convention is as follows: * r0 volatile -- cgit v1.2.3 From b4fc36e60f25cf22bf8b7b015a701015740c3743 Mon Sep 17 00:00:00 2001 From: Shawn Anastasio Date: Wed, 17 Jul 2019 18:54:37 -0500 Subject: powerpc/dma: Fix invalid DMA mmap behavior The refactor of powerpc DMA functions in commit 6666cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly changes the way DMA mappings are handled on powerpc. Since this change, all mapped pages are marked as cache-inhibited through the default implementation of arch_dma_mmap_pgprot. This differs from the previous behavior of only marking pages in noncoherent mappings as cache-inhibited and has resulted in sporadic system crashes in certain hardware configurations and workloads (see Bugzilla). This commit restores the previous correct behavior by providing an implementation of arch_dma_mmap_pgprot that only marks pages in noncoherent mappings as cache-inhibited. As this behavior should be universal for all powerpc platforms a new file, dma-generic.c, was created to store it. Fixes: 6666cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent") # NOTE: fixes commit 6666cc17d780 released in v5.1. # Consider a stable tag: # Cc: stable@vger.kernel.org # v5.1+ # NOTE: fixes commit 6666cc17d780 released in v5.1. 
# Consider a stable tag: # Cc: stable@vger.kernel.org # v5.1+ Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Shawn Anastasio Reviewed-by: Alexey Kardashevskiy Reviewed-by: Christoph Hellwig Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190717235437.12908-1-shawn@anastas.io --- arch/powerpc/kernel/Makefile | 3 ++- arch/powerpc/kernel/dma-common.c | 17 +++++++++++++++++ 2 files changed, 19 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/kernel/dma-common.c (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 56dfa7a2a6f2..ea0c69236789 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -49,7 +49,8 @@ obj-y := cputable.o ptrace.o syscalls.o \ signal.o sysfs.o cacheinfo.o time.o \ prom.o traps.o setup-common.o \ udbg.o misc.o io.o misc_$(BITS).o \ - of_platform.o prom_parse.o + of_platform.o prom_parse.o \ + dma-common.o obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ signal_64.o ptrace32.o \ paca.o nvram_64.o firmware.o diff --git a/arch/powerpc/kernel/dma-common.c b/arch/powerpc/kernel/dma-common.c new file mode 100644 index 000000000000..dc7ef6b17b69 --- /dev/null +++ b/arch/powerpc/kernel/dma-common.c @@ -0,0 +1,17 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Contains common dma routines for all powerpc platforms. + * + * Copyright (C) 2019 Shawn Anastasio. + */ + +#include +#include + +pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot, + unsigned long attrs) +{ + if (!dev_is_dma_coherent(dev)) + return pgprot_noncached(prot); + return prot; +} -- cgit v1.2.3 From f16d80b75a096c52354c6e0a574993f3b0dfbdfe Mon Sep 17 00:00:00 2001 From: Michael Neuling Date: Fri, 19 Jul 2019 15:05:02 +1000 Subject: powerpc/tm: Fix oops on sigreturn on systems without TM On systems like P9 powernv where we have no TM (or P8 booted with ppc_tm=off), userspace can construct a signal context which still has the MSR TS bits set. The kernel tries to restore this context which results in the following crash: Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033 Oops: Unrecoverable exception, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69 NIP: c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000 REGS: c00000003fffbd70 TRAP: 0700 Not tainted (5.2.0-11045-g7142b497d8) MSR: 8000000102a03031 CR: 42004242 XER: 00000000 CFAR: c0000000000022e0 IRQMASK: 0 GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669 GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000 GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420 GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000 GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000 GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728 NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80 LR [00007fffb2d67e48] 0x7fffb2d67e48 Call Trace: Instruction dump: e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00 e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18 The problem is the signal code assumes TM is enabled when CONFIG_PPC_TRANSACTIONAL_MEM is enabled. 
This may not be the case as with P9 powernv or if `ppc_tm=off` is used on P8. This means any local user can crash the system. Fix the problem by returning a bad stack frame to the user if they try to set the MSR TS bits with sigreturn() on systems where TM is not supported. Found with sigfuz kernel selftest on P9. This fixes CVE-2019-13648. Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Cc: stable@vger.kernel.org # v3.9 Reported-by: Praveen Pandey Signed-off-by: Michael Neuling Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org --- arch/powerpc/kernel/signal_32.c | 3 +++ arch/powerpc/kernel/signal_64.c | 5 +++++ 2 files changed, 8 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index f50b708d6d77..98600b276f76 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -1198,6 +1198,9 @@ SYSCALL_DEFINE0(rt_sigreturn) goto bad; if (MSR_TM_ACTIVE(msr_hi<<32)) { + /* Trying to start TM on non TM system */ + if (!cpu_has_feature(CPU_FTR_TM)) + goto bad; /* We only recheckpoint on return if we're * transaction. */ diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 2f80e270c7b0..117515564ec7 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -771,6 +771,11 @@ SYSCALL_DEFINE0(rt_sigreturn) if (MSR_TM_ACTIVE(msr)) { /* We recheckpoint on return. */ struct ucontext __user *uc_transact; + + /* Trying to start TM on non TM system */ + if (!cpu_has_feature(CPU_FTR_TM)) + goto badframe; + if (__get_user(uc_transact, &uc->uc_link)) goto badframe; if (restore_tm_sigcontexts(current, &uc->uc_mcontext, -- cgit v1.2.3 From cee3536d24a1d5db66b9f68c3ece0af128187ab4 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Mon, 22 Jul 2019 22:26:56 +1000 Subject: powerpc: Wire up clone3 syscall Wire up the new clone3 syscall added in commit 7f192e3cd316 ("fork: add clone3"). This requires a ppc_clone3 wrapper, in order to save the non-volatile GPRs before calling into the generic syscall code. Otherwise we hit the BUG_ON in CHECK_FULL_REGS in copy_thread(). Lightly tested using Christian's test code on a Power8 LE VM. 
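Glibc has no clone3() wrapper at this point, so a user-space test has to invoke the syscall directly. Below is a minimal smoke test in that spirit (a hypothetical sketch, not Christian's actual test code): the struct layout mirrors the v5.3 uapi struct clone_args, and the syscall number is the one wired up in the syscall.tbl hunk below.

  #define _GNU_SOURCE
  #include <linux/types.h>
  #include <signal.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <sys/wait.h>
  #include <unistd.h>

  #ifndef __NR_clone3
  #define __NR_clone3 435              /* from syscall.tbl below */
  #endif

  struct clone_args_v0 {               /* CLONE_ARGS_SIZE_VER0 layout */
      __aligned_u64 flags;
      __aligned_u64 pidfd;
      __aligned_u64 child_tid;
      __aligned_u64 parent_tid;
      __aligned_u64 exit_signal;
      __aligned_u64 stack;
      __aligned_u64 stack_size;
      __aligned_u64 tls;
  };

  int main(void)
  {
      struct clone_args_v0 args;
      pid_t pid;

      /* All-zero args plus SIGCHLD behaves like fork(): no CLONE_VM,
       * so the child runs on its own copy of the parent's stack. */
      memset(&args, 0, sizeof(args));
      args.exit_signal = SIGCHLD;

      pid = syscall(__NR_clone3, &args, sizeof(args));
      if (pid < 0) {
          perror("clone3");
          return 1;
      }
      if (pid == 0)
          _exit(0);                    /* child */

      waitpid(pid, NULL, 0);
      printf("clone3() works: reaped child %d\n", (int)pid);
      return 0;
  }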
Signed-off-by: Michael Ellerman Acked-by: Christian Brauner Link: https://lore.kernel.org/r/20190724140259.23554-1-mpe@ellerman.id.au --- arch/powerpc/kernel/entry_32.S | 8 ++++++++ arch/powerpc/kernel/entry_64.S | 5 +++++ arch/powerpc/kernel/syscalls/syscall.tbl | 2 +- 3 files changed, 14 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 85fdb6d879f1..54fab22c9a43 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -597,6 +597,14 @@ ppc_clone: stw r0,_TRAP(r1) /* register set saved */ b sys_clone + .globl ppc_clone3 +ppc_clone3: + SAVE_NVGPRS(r1) + lwz r0,_TRAP(r1) + rlwinm r0,r0,0,0,30 /* clear LSB to indicate full */ + stw r0,_TRAP(r1) /* register set saved */ + b sys_clone3 + .globl ppc_swapcontext ppc_swapcontext: SAVE_NVGPRS(r1) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index d9105fcf4021..0a0b5310f54a 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -487,6 +487,11 @@ _GLOBAL(ppc_clone) bl sys_clone b .Lsyscall_exit +_GLOBAL(ppc_clone3) + bl save_nvgprs + bl sys_clone3 + b .Lsyscall_exit + _GLOBAL(ppc32_swapcontext) bl save_nvgprs bl compat_sys_swapcontext diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 3331749aab20..43f736ed47f2 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -516,4 +516,4 @@ 432 common fsmount sys_fsmount 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open -# 435 reserved for clone3 +435 nospu clone3 ppc_clone3 -- cgit v1.2.3 From 7db57e77586744af46c8bbf8f831bb2b941b7afc Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Wed, 31 Jul 2019 00:00:15 +1000 Subject: powerpc/spe: Mark expected switch fall-throughs Mark switch cases where we are expecting to fall through. 
Fixes errors such as below, seen with mpc85xx_defconfig: arch/powerpc/kernel/align.c: In function 'emulate_spe': arch/powerpc/kernel/align.c:178:8: error: this statement may fall through ret |= __get_user_inatomic(temp.v[3], p++); ^~ Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190730141917.21817-1-mpe@ellerman.id.au --- arch/powerpc/kernel/align.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c index 7107ad86de65..92045ed64976 100644 --- a/arch/powerpc/kernel/align.c +++ b/arch/powerpc/kernel/align.c @@ -176,9 +176,11 @@ static int emulate_spe(struct pt_regs *regs, unsigned int reg, ret |= __get_user_inatomic(temp.v[1], p++); ret |= __get_user_inatomic(temp.v[2], p++); ret |= __get_user_inatomic(temp.v[3], p++); + /* fall through */ case 4: ret |= __get_user_inatomic(temp.v[4], p++); ret |= __get_user_inatomic(temp.v[5], p++); + /* fall through */ case 2: ret |= __get_user_inatomic(temp.v[6], p++); ret |= __get_user_inatomic(temp.v[7], p++); @@ -259,9 +261,11 @@ static int emulate_spe(struct pt_regs *regs, unsigned int reg, ret |= __put_user_inatomic(data.v[1], p++); ret |= __put_user_inatomic(data.v[2], p++); ret |= __put_user_inatomic(data.v[3], p++); + /* fall through */ case 4: ret |= __put_user_inatomic(data.v[4], p++); ret |= __put_user_inatomic(data.v[5], p++); + /* fall through */ case 2: ret |= __put_user_inatomic(data.v[6], p++); ret |= __put_user_inatomic(data.v[7], p++); -- cgit v1.2.3 From ae2e953fdca791270e80c08d6a830d9aa472a111 Mon Sep 17 00:00:00 2001 From: Nathan Lynch Date: Thu, 18 Jul 2019 11:22:14 -0500 Subject: powerpc/rtas: Unexport rtas_online_cpus_mask, rtas_offline_cpus_mask These aren't used by modular code, nor should they be. Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation") Signed-off-by: Nathan Lynch Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190718162214.5694-1-nathanl@linux.ibm.com --- arch/powerpc/kernel/rtas.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 5faf0a64c92b..49159bb38949 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -922,13 +922,11 @@ int rtas_online_cpus_mask(cpumask_var_t cpus) return ret; } -EXPORT_SYMBOL(rtas_online_cpus_mask); int rtas_offline_cpus_mask(cpumask_var_t cpus) { return rtas_cpu_state_change_mask(DOWN, cpus); } -EXPORT_SYMBOL(rtas_offline_cpus_mask); int rtas_ibm_suspend_me(u64 handle) { -- cgit v1.2.3 From 33dcb37cef741294b481f4d889a465b8091f11bf Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 26 Jul 2019 09:26:40 +0200 Subject: dma-mapping: fix page attributes for dma_mmap_* All the way back to introducing dma_common_mmap we've defaulted to mark the pages as uncached. But this is wrong for DMA coherent devices. Later on DMA_ATTR_WRITE_COMBINE also got incorrect treatment as that flag is only treated special on the alloc side for non-coherent devices. Introduce a new dma_pgprot helper that deals with the check for coherent devices so that only the remapping cases ever reach arch_dma_mmap_pgprot and we thus ensure no aliasing of page attributes happens, which makes the powerpc version of arch_dma_mmap_pgprot obsolete and simplifies the remaining ones. Note that this means arch_dma_mmap_pgprot is a bit misnamed now, but we'll phase it out soon. 
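For reference, the new helper has roughly this shape (a simplified sketch, not the verbatim patch; the real dma_pgprot() in the common DMA code also has to consider attributes such as DMA_ATTR_NON_CONSISTENT):

  #include <linux/dma-noncoherent.h>

  pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs)
  {
      /* Coherent devices keep normal cacheable attributes, so no
       * aliasing page attributes can be created for them. */
      if (dev_is_dma_coherent(dev))
          return prot;
  #ifdef CONFIG_ARCH_HAS_DMA_MMAP_PGPROT
      return arch_dma_mmap_pgprot(dev, prot, attrs); /* remapping cases only */
  #else
      return pgprot_noncached(prot);
  #endif
  }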
Fixes: 64ccc9c033c6 ("common: dma-mapping: add support for generic dma_mmap_* calls") Reported-by: Shawn Anastasio Reported-by: Gavin Li Signed-off-by: Christoph Hellwig Acked-by: Catalin Marinas # arm64 --- arch/powerpc/kernel/Makefile | 3 +-- arch/powerpc/kernel/dma-common.c | 17 ----------------- 2 files changed, 1 insertion(+), 19 deletions(-) delete mode 100644 arch/powerpc/kernel/dma-common.c (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index ea0c69236789..56dfa7a2a6f2 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -49,8 +49,7 @@ obj-y := cputable.o ptrace.o syscalls.o \ signal.o sysfs.o cacheinfo.o time.o \ prom.o traps.o setup-common.o \ udbg.o misc.o io.o misc_$(BITS).o \ - of_platform.o prom_parse.o \ - dma-common.o + of_platform.o prom_parse.o obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ signal_64.o ptrace32.o \ paca.o nvram_64.o firmware.o diff --git a/arch/powerpc/kernel/dma-common.c b/arch/powerpc/kernel/dma-common.c deleted file mode 100644 index dc7ef6b17b69..000000000000 --- a/arch/powerpc/kernel/dma-common.c +++ /dev/null @@ -1,17 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * Contains common dma routines for all powerpc platforms. - * - * Copyright (C) 2019 Shawn Anastasio. - */ - -#include -#include - -pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot, - unsigned long attrs) -{ - if (!dev_is_dma_coherent(dev)) - return pgprot_noncached(prot); - return prot; -} -- cgit v1.2.3 From 2b87a2553aa04105724d6c844e223415985246ed Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Thu, 16 May 2019 12:04:37 +1000 Subject: powerpc/64s: Make boot look nice(r) Radix boot looks like this: ----------------------------------------------------- phys_mem_size = 0x200000000 dcache_bsize = 0x80 icache_bsize = 0x80 cpu_features = 0x0000c06f8f5fb1a7 possible = 0x0000fbffcf5fb1a7 always = 0x00000003800081a1 cpu_user_features = 0xdc0065c2 0xaee00000 mmu_features = 0xbc006041 firmware_features = 0x0000000010000000 hash-mmu: ppc64_pft_size = 0x0 hash-mmu: kernel vmalloc start = 0xc008000000000000 hash-mmu: kernel IO start = 0xc00a000000000000 hash-mmu: kernel vmemmap start = 0xc00c000000000000 ----------------------------------------------------- Fix: ----------------------------------------------------- phys_mem_size = 0x200000000 dcache_bsize = 0x80 icache_bsize = 0x80 cpu_features = 0x0000c06f8f5fb1a7 possible = 0x0000fbffcf5fb1a7 always = 0x00000003800081a1 cpu_user_features = 0xdc0065c2 0xaee00000 mmu_features = 0xbc006041 firmware_features = 0x0000000010000000 vmalloc start = 0xc008000000000000 IO start = 0xc00a000000000000 vmemmap start = 0xc00c000000000000 ----------------------------------------------------- Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190516020437.11783-1-npiggin@gmail.com --- arch/powerpc/kernel/setup-common.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 1f8db666468d..7b4c921ec73f 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -806,9 +806,15 @@ static __init void print_system_info(void) pr_info("mmu_features = 0x%08x\n", cur_cpu_spec->mmu_features); #ifdef CONFIG_PPC64 pr_info("firmware_features = 0x%016lx\n", powerpc_firmware_features); +#ifdef CONFIG_PPC_BOOK3S + pr_info("vmalloc start = 0x%lx\n", 
KERN_VIRT_START);
+	pr_info("IO start = 0x%lx\n", KERN_IO_START);
+	pr_info("vmemmap start = 0x%lx\n", (unsigned long)vmemmap);
+#endif
 #endif

-	print_system_hash_info();
+	if (!early_radix_enabled())
+		print_system_hash_info();

 	if (PHYSICAL_START > 0)
 		pr_info("physical_start = 0x%llx\n",
-- cgit v1.2.3

From 4f7e0babbc7c46fc0db4f5c14fe96bf6f4d69502 Mon Sep 17 00:00:00 2001
From: Alexey Kardashevskiy
Date: Thu, 18 Jul 2019 15:11:37 +1000
Subject: powerpc/iommu: Allow bypass-only for DMA

POWER8 and newer support a bypass mode which maps all host memory to PCI buses, so an IOMMU table is not always required. However, if we fail to create such a table, the DMA setup fails and the kernel does not boot.

This skips the 32bit DMA setup check if the bypass is selected.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190718051139.74787-3-aik@ozlabs.ru
---
 arch/powerpc/kernel/dma-iommu.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index a0879674a9c8..c963d704fa31 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -122,18 +122,17 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
 {
 	struct iommu_table *tbl = get_iommu_table_base(dev);

-	if (!tbl) {
-		dev_info(dev, "Warning: IOMMU dma not supported: mask 0x%08llx"
-			", table unavailable\n", mask);
-		return 0;
-	}
-
 	if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
 		dev->archdata.iommu_bypass = true;
 		dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
 		return 1;
 	}

+	if (!tbl) {
+		dev_err(dev, "Warning: IOMMU dma not supported: mask 0x%08llx, table unavailable\n", mask);
+		return 0;
+	}
+
 	if (tbl->it_offset > (mask >> tbl->it_page_shift)) {
 		dev_info(dev, "Warning: IOMMU offset too big for device mask\n");
 		dev_info(dev, "mask: 0x%08llx, table offset: 0x%08lx\n",
-- cgit v1.2.3

From 201ed7f327a17577debec52c33786d4b3259d0dc Mon Sep 17 00:00:00 2001
From: Alexey Kardashevskiy
Date: Thu, 18 Jul 2019 15:11:39 +1000
Subject: powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU pages

At the moment we create a small window only for 32bit devices; the window maps only 0..2GB of the PCI space. For other devices we use either a sketchy bypass or the hardware bypass, but the former can only work if the amount of RAM is no bigger than the device's DMA mask, and the latter requires devices to support at least 59bit DMA.

This extends the default DMA window to the maximum size possible, to allow a wider DMA mask than just 32bit. The default window size is now limited by the iommu_table::it_map allocation bitmap, which is a contiguous array with 1 bit per IOMMU page.

This increases the default IOMMU page size from the hard-coded 4K to the system page size, to allow wider DMA masks. It also increases the number of levels so that no TCE level exceeds the max order allocation limit, while keeping the minimum number of levels at 2 in order to save memory.

As the extended window now overlaps the 32bit MMIO region, this adds an area reservation to iommu_init_table().

After this change the default window size is 0x80000000000 == 1<<43, so devices limited to a DMA mask smaller than the amount of system RAM can still use more than just 2GB of memory for DMA.

This is an optimization and not a bug fix for DMA API usage.
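To make the bitmap limit concrete, here is the arithmetic behind the 1<<43 figure (a sketch assuming a 64K-page kernel, where the maximum page allocation order on powerpc is 8):

  window size = 1 << 43 bytes, IOMMU page = 64K (the system page size)
  TCE entries = 2^43 / 2^16 = 2^27
  it_map      = 2^27 bits = 2^24 bytes = 16MB
              = 256 contiguous 64K pages = an order-8 allocation,
                the largest the page allocator will hand out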
With the on-demand allocation of indirect TCE table levels enabled and 2 levels, the first TCE level size is just 1< Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190718051139.74787-5-aik@ozlabs.ru --- arch/powerpc/kernel/iommu.c | 74 +++++++++++++++++++++++++++++++-------------- 1 file changed, 52 insertions(+), 22 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index 0a67ce9f827e..e7a2b160d4c6 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -633,11 +633,54 @@ static void iommu_table_clear(struct iommu_table *tbl) #endif } +static void iommu_table_reserve_pages(struct iommu_table *tbl, + unsigned long res_start, unsigned long res_end) +{ + int i; + + WARN_ON_ONCE(res_end < res_start); + /* + * Reserve page 0 so it will not be used for any mappings. + * This avoids buggy drivers that consider page 0 to be invalid + * to crash the machine or even lose data. + */ + if (tbl->it_offset == 0) + set_bit(0, tbl->it_map); + + tbl->it_reserved_start = res_start; + tbl->it_reserved_end = res_end; + + /* Check if res_start..res_end isn't empty and overlaps the table */ + if (res_start && res_end && + (tbl->it_offset + tbl->it_size < res_start || + res_end < tbl->it_offset)) + return; + + for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i) + set_bit(i - tbl->it_offset, tbl->it_map); +} + +static void iommu_table_release_pages(struct iommu_table *tbl) +{ + int i; + + /* + * In case we have reserved the first bit, we should not emit + * the warning below. + */ + if (tbl->it_offset == 0) + clear_bit(0, tbl->it_map); + + for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i) + clear_bit(i - tbl->it_offset, tbl->it_map); +} + /* * Build a iommu_table structure. This contains a bit map which * is used to manage allocation of the tce space. */ -struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid) +struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid, + unsigned long res_start, unsigned long res_end) { unsigned long sz; static int welcomed = 0; @@ -656,13 +699,7 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid) tbl->it_map = page_address(page); memset(tbl->it_map, 0, sz); - /* - * Reserve page 0 so it will not be used for any mappings. - * This avoids buggy drivers that consider page 0 to be invalid - * to crash the machine or even lose data. - */ - if (tbl->it_offset == 0) - set_bit(0, tbl->it_map); + iommu_table_reserve_pages(tbl, res_start, res_end); /* We only split the IOMMU table if we have 1GB or more of space */ if ((tbl->it_size << tbl->it_page_shift) >= (1UL * 1024 * 1024 * 1024)) @@ -714,12 +751,7 @@ static void iommu_table_free(struct kref *kref) return; } - /* - * In case we have reserved the first bit, we should not emit - * the warning below. - */ - if (tbl->it_offset == 0) - clear_bit(0, tbl->it_map); + iommu_table_release_pages(tbl); /* verify that table contains no entries */ if (!bitmap_empty(tbl->it_map, tbl->it_size)) @@ -1024,15 +1056,14 @@ int iommu_take_ownership(struct iommu_table *tbl) for (i = 0; i < tbl->nr_pools; i++) spin_lock(&tbl->pools[i].lock); - if (tbl->it_offset == 0) - clear_bit(0, tbl->it_map); + iommu_table_release_pages(tbl); if (!bitmap_empty(tbl->it_map, tbl->it_size)) { pr_err("iommu_tce: it_map is not empty"); ret = -EBUSY; - /* Restore bit#0 set by iommu_init_table() */ - if (tbl->it_offset == 0) - set_bit(0, tbl->it_map); + /* Undo iommu_table_release_pages, i.e. 
restore bit#0, etc */ + iommu_table_reserve_pages(tbl, tbl->it_reserved_start, + tbl->it_reserved_end); } else { memset(tbl->it_map, 0xff, sz); } @@ -1055,9 +1086,8 @@ void iommu_release_ownership(struct iommu_table *tbl) memset(tbl->it_map, 0, sz); - /* Restore bit#0 set by iommu_init_table() */ - if (tbl->it_offset == 0) - set_bit(0, tbl->it_map); + iommu_table_reserve_pages(tbl, tbl->it_reserved_start, + tbl->it_reserved_end); for (i = 0; i < tbl->nr_pools; i++) spin_unlock(&tbl->pools[i].lock); -- cgit v1.2.3 From 658d029df0bce6472c94b724ca54d74bc6659c2e Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 28 Jun 2019 15:55:52 +0000 Subject: powerpc/hw_breakpoint: move instruction stepping out of hw_breakpoint_handler() On 8xx, breakpoints stop after executing the instruction, so stepping/emulation is not needed. Move it into a sub-function and remove the #ifdefs. Signed-off-by: Christophe Leroy Reviewed-by: Ravi Bangoria Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/f8cdc3f1c66ad3c43ebc568abcc6c39ed4676284.1561737231.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/hw_breakpoint.c | 60 ++++++++++++++++++++----------------- 1 file changed, 33 insertions(+), 27 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index c8d1fa2e9d53..28ad3171bb82 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -198,15 +198,43 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) /* * Handle debug exception notifications. */ +static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp, + unsigned long addr) +{ + int stepped; + unsigned int instr; + + /* Do not emulate user-space instructions, instead single-step them */ + if (user_mode(regs)) { + current->thread.last_hit_ubp = bp; + regs->msr |= MSR_SE; + return false; + } + + stepped = 0; + instr = 0; + if (!__get_user_inatomic(instr, (unsigned int *)regs->nip)) + stepped = emulate_step(regs, instr); + + /* + * emulate_step() could not execute it. We've failed in reliably + * handling the hw-breakpoint. Unregister it and throw a warning + * message to let the user know about it. + */ + if (!stepped) { + WARN(1, "Unable to handle hardware breakpoint. Breakpoint at " + "0x%lx will be disabled.", addr); + perf_event_disable_inatomic(bp); + return false; + } + return true; +} + int hw_breakpoint_handler(struct die_args *args) { int rc = NOTIFY_STOP; struct perf_event *bp; struct pt_regs *regs = args->regs; -#ifndef CONFIG_PPC_8xx - int stepped = 1; - unsigned int instr; -#endif struct arch_hw_breakpoint *info; unsigned long dar = regs->dar; @@ -251,31 +279,9 @@ int hw_breakpoint_handler(struct die_args *args) (dar - bp->attr.bp_addr < bp->attr.bp_len))) info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ; -#ifndef CONFIG_PPC_8xx - /* Do not emulate user-space instructions, instead single-step them */ - if (user_mode(regs)) { - current->thread.last_hit_ubp = bp; - regs->msr |= MSR_SE; + if (!IS_ENABLED(CONFIG_PPC_8xx) && !stepping_handler(regs, bp, info->address)) goto out; - } - - stepped = 0; - instr = 0; - if (!__get_user_inatomic(instr, (unsigned int *) regs->nip)) - stepped = emulate_step(regs, instr); - /* - * emulate_step() could not execute it. We've failed in reliably - * handling the hw-breakpoint. Unregister it and throw a warning - * message to let the user know about it. - */ - if (!stepped) { - WARN(1, "Unable to handle hardware breakpoint. 
Breakpoint at " - "0x%lx will be disabled.", info->address); - perf_event_disable_inatomic(bp); - goto out; - } -#endif /* * As a policy, the callback is invoked in a 'trigger-after-execute' * fashion -- cgit v1.2.3 From 9d6d712fbf7766f21c838940eebcd7b4d476c5e6 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Wed, 14 Aug 2019 10:02:20 +0000 Subject: powerpc/32s: Fix boot failure with DEBUG_PAGEALLOC without KASAN. When KASAN is selected, the definitive hash table has to be set up later, but there is already an early temporary one. When KASAN is not selected, there is no early hash table, so the setup of the definitive hash table cannot be delayed. Fixes: 72f208c6a8f7 ("powerpc/32s: move hash code patching out of MMU_init_hw()") Cc: stable@vger.kernel.org # v5.2+ Reported-by: Jonathan Neuschafer Tested-by: Jonathan Neuschafer Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/b7860c5e1e784d6b96ba67edf47dd6cbc2e78ab6.1565776892.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/head_32.S | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index f255e22184b4..c8b4f7ed318c 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -897,9 +897,11 @@ start_here: bl machine_init bl __save_cpu_setup bl MMU_init +#ifdef CONFIG_KASAN BEGIN_MMU_FTR_SECTION bl MMU_init_hw_patch END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE) +#endif /* * Go back to running unmapped so we can load up new values -- cgit v1.2.3 From 7ab0b7cb8951d4095d73e203759b74d41916e455 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 16 Aug 2019 07:52:20 +0000 Subject: powerpc/32: Add warning on misaligned copy_page() or clear_page() copy_page() and clear_page() expect page aligned destination, and use dcbz instruction to clear entire cache lines based on the assumption that the destination is cache aligned. As shown during analysis of a bug in BTRFS filesystem, a misaligned copy_page() can create bugs that are difficult to locate (see Link). Add an explicit WARNING when copy_page() or clear_page() are called with misaligned destination. Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://bugzilla.kernel.org/show_bug.cgi?id=204371 Link: https://lore.kernel.org/r/c6cea38f90480268d439ca44a645647e260fff09.1565941808.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/misc_32.S | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index fe4bd321730e..02d90e1ebf65 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -452,7 +452,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE) stwu r9,16(r3) _GLOBAL(copy_page) + rlwinm r5, r3, 0, L1_CACHE_BYTES - 1 addi r3,r3,-4 + +0: twnei r5, 0 /* WARN if r3 is not cache aligned */ + EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING + addi r4,r4,-4 li r5,4 -- cgit v1.2.3 From 415480dce2ef03bb8335deebd2f402f475443ce0 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 19 Aug 2019 06:40:25 +0000 Subject: powerpc/603: Fix handling of the DIRTY flag If a page is already mapped RW without the DIRTY flag, the DIRTY flag is never set and a TLB store miss exception is taken forever. 
This is easily reproduced with the following app: void main(void) { volatile char *ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); *ptr = *ptr; } When DIRTY flag is not set, bail out of TLB miss handler and take a minor page fault which will set the DIRTY flag. Fixes: f8b58c64eaef ("powerpc/603: let's handle PAGE_DIRTY directly") Cc: stable@vger.kernel.org # v5.1+ Reported-by: Doug Crawford Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/80432f71194d7ee75b2f5043ecf1501cf1cca1f3.1566196646.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/head_32.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index c8b4f7ed318c..9e6f01abb31e 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -557,9 +557,9 @@ DataStoreTLBMiss: cmplw 0,r1,r3 mfspr r2, SPRN_SPRG_PGDIR #ifdef CONFIG_SWAP - li r1, _PAGE_RW | _PAGE_PRESENT | _PAGE_ACCESSED + li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED #else - li r1, _PAGE_RW | _PAGE_PRESENT + li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT #endif bge- 112f lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */ -- cgit v1.2.3 From a6717c01ddc259f6f73364779df058e2c67309f8 Mon Sep 17 00:00:00 2001 From: Nathan Lynch Date: Fri, 2 Aug 2019 14:29:24 -0500 Subject: powerpc/rtas: use device model APIs and serialization during LPM The LPAR migration implementation and userspace-initiated cpu hotplug can interleave their executions like so: 1. Set cpu 7 offline via sysfs. 2. Begin a partition migration, whose implementation requires the OS to ensure all present cpus are online; cpu 7 is onlined: rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up This sets cpu 7 online in all respects except for the cpu's corresponding struct device; dev->offline remains true. 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is already online and returns success. The driver core (device_online) sets dev->offline = false. 4. The migration completes and restores cpu 7 to offline state: rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down This leaves cpu7 in a state where the driver core considers the cpu device online, but in all other respects it is offline and unused. Attempts to online the cpu via sysfs appear to succeed but the driver core actually does not pass the request to the lower-level cpuhp support code. This makes the cpu unusable until the cpu device is manually set offline and then online again via sysfs. Instead of directly calling cpu_up/cpu_down, the migration code should use the higher-level device core APIs to maintain consistent state and serialize operations. Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation") Signed-off-by: Nathan Lynch Reviewed-by: Gautham R. 
Shenoy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802192926.19277-2-nathanl@linux.ibm.com --- arch/powerpc/kernel/rtas.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 49159bb38949..ef290d4036ba 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -871,15 +871,17 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state, return 0; for_each_cpu(cpu, cpus) { + struct device *dev = get_cpu_device(cpu); + switch (state) { case DOWN: - cpuret = cpu_down(cpu); + cpuret = device_offline(dev); break; case UP: - cpuret = cpu_up(cpu); + cpuret = device_online(dev); break; } - if (cpuret) { + if (cpuret < 0) { pr_debug("%s: cpu_%s for cpu#%d returned %d.\n", __func__, ((state == UP) ? "up" : "down"), @@ -966,6 +968,8 @@ int rtas_ibm_suspend_me(u64 handle) data.token = rtas_token("ibm,suspend-me"); data.complete = &done; + lock_device_hotplug(); + /* All present CPUs must be online */ cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask); cpuret = rtas_online_cpus_mask(offline_mask); @@ -1004,6 +1008,7 @@ out_hotplug_enable: __func__); out: + unlock_device_hotplug(); free_cpumask_var(offline_mask); return atomic_read(&data.error); } -- cgit v1.2.3 From 10e4850d7c7f2af2e5c40520b8caf73bf9d7e2d1 Mon Sep 17 00:00:00 2001 From: Nathan Lynch Date: Fri, 2 Aug 2019 14:29:25 -0500 Subject: powerpc/rtas: allow rescheduling while changing cpu states rtas_cpu_state_change_mask() potentially operates on scores of cpus, so explicitly allow rescheduling in the loop body. Signed-off-by: Nathan Lynch Reviewed-by: Gautham R. Shenoy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802192926.19277-3-nathanl@linux.ibm.com --- arch/powerpc/kernel/rtas.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index ef290d4036ba..c5fa251b8950 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -898,6 +899,7 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state, cpumask_clear_cpu(cpu, cpus); } } + cond_resched(); } return ret; -- cgit v1.2.3 From b5bda6263cad9a927e1a4edb7493d542da0c1410 Mon Sep 17 00:00:00 2001 From: Santosh Sivaraj Date: Tue, 20 Aug 2019 13:43:46 +0530 Subject: powerpc/mce: Schedule work from irq_work schedule_work() cannot be called from MCE exception context as MCE can interrupt even in interrupt disabled context. 
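The fix therefore defers the work in two stages, which is the usual pattern for NMI-like contexts. Schematically (hypothetical my_* names, mirroring the structure of the diff below):

  #include <linux/irq_work.h>
  #include <linux/workqueue.h>

  static void my_work_fn(struct work_struct *work)
  {
      /* Full process context: may sleep, take mutexes, etc. */
  }
  static DECLARE_WORK(my_work, my_work_fn);

  static void my_irq_work_fn(struct irq_work *work)
  {
      schedule_work(&my_work);    /* safe: irq_work runs with IRQs enabled */
  }
  static struct irq_work my_irq_work = { .func = my_irq_work_fn };

  void my_mce_handler(void)
  {
      /* The only deferral primitive safe in this context. */
      irq_work_queue(&my_irq_work);
  }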
Fixes: 733e4a4c4467 ("powerpc/mce: hookup memory_failure for UE errors") Cc: stable@vger.kernel.org # v4.15+ Reviewed-by: Mahesh Salgaonkar Reviewed-by: Nicholas Piggin Acked-by: Balbir Singh Signed-off-by: Santosh Sivaraj Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190820081352.8641-2-santosh@fossix.org --- arch/powerpc/kernel/mce.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index b18df633eae9..cff31d4a501f 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -33,6 +33,7 @@ static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_ue_event_queue); static void machine_check_process_queued_event(struct irq_work *work); +static void machine_check_ue_irq_work(struct irq_work *work); void machine_check_ue_event(struct machine_check_event *evt); static void machine_process_ue_event(struct work_struct *work); @@ -40,6 +41,10 @@ static struct irq_work mce_event_process_work = { .func = machine_check_process_queued_event, }; +static struct irq_work mce_ue_event_irq_work = { + .func = machine_check_ue_irq_work, +}; + DECLARE_WORK(mce_ue_event_work, machine_process_ue_event); static void mce_set_error_info(struct machine_check_event *mce, @@ -199,6 +204,10 @@ void release_mce_event(void) get_mce_event(NULL, true); } +static void machine_check_ue_irq_work(struct irq_work *work) +{ + schedule_work(&mce_ue_event_work); +} /* * Queue up the MCE event which then can be handled later. @@ -216,7 +225,7 @@ void machine_check_ue_event(struct machine_check_event *evt) memcpy(this_cpu_ptr(&mce_ue_event_queue[index]), evt, sizeof(*evt)); /* Queue work to process this event later. */ - schedule_work(&mce_ue_event_work); + irq_work_queue(&mce_ue_event_irq_work); } /* -- cgit v1.2.3 From 99ead78afd1128bfcebe7f88f3b102fb2da09aee Mon Sep 17 00:00:00 2001 From: Balbir Singh Date: Tue, 20 Aug 2019 13:43:47 +0530 Subject: powerpc/mce: Fix MCE handling for huge pages The current code would fail on huge pages addresses, since the shift would be incorrect. Use the correct page shift value returned by __find_linux_pte() to get the correct physical address. The code is more generic and can handle both regular and compound pages. 
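A worked example of the mask arithmetic introduced below, assuming a 16MB huge page (shift = 24) on a 64K-page kernel (PAGE_SHIFT = 16):

  rpnmask = (1UL << 24) - PAGE_SIZE = 0x0000000000ff0000

  The huge-page PTE only carries the real page number of the whole 16MB
  region, so the address bits that select the 64K page within that region
  (addr & rpnmask) are OR-ed into the PTE value before pte_pfn() extracts
  the PFN of the page that actually took the UE.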
Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors") Signed-off-by: Balbir Singh [arbab@linux.ibm.com: Fixup pseries_do_memory_failure()] Signed-off-by: Reza Arbab Tested-by: Mahesh Salgaonkar Signed-off-by: Santosh Sivaraj Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190820081352.8641-3-santosh@fossix.org --- arch/powerpc/kernel/mce_power.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index a814d2dfb5b0..714a98e0927f 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -26,6 +26,7 @@ unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) { pte_t *ptep; + unsigned int shift; unsigned long flags; struct mm_struct *mm; @@ -35,13 +36,18 @@ unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) mm = &init_mm; local_irq_save(flags); - if (mm == current->mm) - ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL); - else - ptep = find_init_mm_pte(addr, NULL); + ptep = __find_linux_pte(mm->pgd, addr, NULL, &shift); local_irq_restore(flags); + if (!ptep || pte_special(*ptep)) return ULONG_MAX; + + if (shift > PAGE_SHIFT) { + unsigned long rpnmask = (1ul << shift) - PAGE_SIZE; + + return pte_pfn(__pte(pte_val(*ptep) | (addr & rpnmask))); + } + return pte_pfn(*ptep); } @@ -344,7 +350,7 @@ static const struct mce_derror_table mce_p9_derror_table[] = { MCE_INITIATOR_CPU, MCE_SEV_SEVERE, true }, { 0, false, 0, 0, 0, 0, 0 } }; -static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr, +static int mce_find_instr_ea_and_phys(struct pt_regs *regs, uint64_t *addr, uint64_t *phys_addr) { /* @@ -541,7 +547,8 @@ static int mce_handle_derror(struct pt_regs *regs, * kernel/exception-64s.h */ if (get_paca()->in_mce < MAX_MCE_DEPTH) - mce_find_instr_ea_and_pfn(regs, addr, phys_addr); + mce_find_instr_ea_and_phys(regs, addr, + phys_addr); } found = 1; } -- cgit v1.2.3 From 1a1715f516fd7fcfedffee5978ec4c07c5164470 Mon Sep 17 00:00:00 2001 From: Reza Arbab Date: Tue, 20 Aug 2019 13:43:48 +0530 Subject: powerpc/mce: Make machine_check_ue_event() static The function doesn't get used outside this file, so make it static. Signed-off-by: Reza Arbab Signed-off-by: Santosh Sivaraj Reviewed-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190820081352.8641-4-santosh@fossix.org --- arch/powerpc/kernel/mce.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index cff31d4a501f..a3b122a685a5 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -34,7 +34,7 @@ static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], static void machine_check_process_queued_event(struct irq_work *work); static void machine_check_ue_irq_work(struct irq_work *work); -void machine_check_ue_event(struct machine_check_event *evt); +static void machine_check_ue_event(struct machine_check_event *evt); static void machine_process_ue_event(struct work_struct *work); static struct irq_work mce_event_process_work = { @@ -212,7 +212,7 @@ static void machine_check_ue_irq_work(struct irq_work *work) /* * Queue up the MCE event which then can be handled later. 
*/ -void machine_check_ue_event(struct machine_check_event *evt) +static void machine_check_ue_event(struct machine_check_event *evt) { int index; -- cgit v1.2.3 From 895e3dceeb97855dc9990136cbb80a842fe581aa Mon Sep 17 00:00:00 2001 From: Balbir Singh Date: Tue, 20 Aug 2019 13:43:50 +0530 Subject: powerpc/mce: Handle UE event for memcpy_mcsafe If we take a UE on one of the instructions with a fixup entry, set nip to continue execution at the fixup entry. Stop processing the event further or print it. Co-developed-by: Reza Arbab Signed-off-by: Reza Arbab Signed-off-by: Balbir Singh Reviewed-by: Mahesh Salgaonkar Reviewed-by: Nicholas Piggin Signed-off-by: Santosh Sivaraj Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190820081352.8641-6-santosh@fossix.org --- arch/powerpc/kernel/mce.c | 16 ++++++++++++++++ arch/powerpc/kernel/mce_power.c | 15 +++++++++++++-- 2 files changed, 29 insertions(+), 2 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index a3b122a685a5..ec4b3e1087be 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -149,6 +149,7 @@ void save_mce_event(struct pt_regs *regs, long handled, if (phys_addr != ULONG_MAX) { mce->u.ue_error.physical_address_provided = true; mce->u.ue_error.physical_address = phys_addr; + mce->u.ue_error.ignore_event = mce_err->ignore_event; machine_check_ue_event(mce); } } @@ -266,8 +267,17 @@ static void machine_process_ue_event(struct work_struct *work) /* * This should probably queued elsewhere, but * oh! well + * + * Don't report this machine check because the caller has a + * asked us to ignore the event, it has a fixup handler which + * will do the appropriate error handling and reporting. */ if (evt->error_type == MCE_ERROR_TYPE_UE) { + if (evt->u.ue_error.ignore_event) { + __this_cpu_dec(mce_ue_count); + continue; + } + if (evt->u.ue_error.physical_address_provided) { unsigned long pfn; @@ -301,6 +311,12 @@ static void machine_check_process_queued_event(struct irq_work *work) while (__this_cpu_read(mce_queue_count) > 0) { index = __this_cpu_read(mce_queue_count) - 1; evt = this_cpu_ptr(&mce_event_queue[index]); + + if (evt->error_type == MCE_ERROR_TYPE_UE && + evt->u.ue_error.ignore_event) { + __this_cpu_dec(mce_queue_count); + continue; + } machine_check_print_event_info(evt, false, false); __this_cpu_dec(mce_queue_count); } diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index 714a98e0927f..b6cbe3449358 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -11,6 +11,7 @@ #include #include +#include #include #include #include @@ -18,6 +19,7 @@ #include #include #include +#include /* * Convert an address related to an mm to a PFN. 
NOTE: we are in real @@ -565,9 +567,18 @@ static int mce_handle_derror(struct pt_regs *regs, return 0; } -static long mce_handle_ue_error(struct pt_regs *regs) +static long mce_handle_ue_error(struct pt_regs *regs, + struct mce_error_info *mce_err) { long handled = 0; + const struct exception_table_entry *entry; + + entry = search_kernel_exception_table(regs->nip); + if (entry) { + mce_err->ignore_event = true; + regs->nip = extable_fixup(entry); + return 1; + } /* * On specific SCOM read via MMIO we may get a machine check @@ -600,7 +611,7 @@ static long mce_handle_error(struct pt_regs *regs, &phys_addr); if (!handled && mce_err.error_type == MCE_ERROR_TYPE_UE) - handled = mce_handle_ue_error(regs); + handled = mce_handle_ue_error(regs, &mce_err); save_mce_event(regs, handled, &mce_err, regs->nip, addr, phys_addr); -- cgit v1.2.3 From 3f068aae7a958555533847af88705b5629f31600 Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 16 Aug 2019 14:48:05 +1000 Subject: powerpc/64: Adjust order in pcibios_init() The pcibios_init() function for PowerPC 64 currently calls pci_bus_add_devices() before pcibios_resource_survey(). This means that at boot time, when the pcibios_bus_add_device() hooks are called by pci_bus_add_devices(), device resources have not been allocated and they are unable to perform EEH setup, so a separate pass is needed. This patch adjusts that order so that it will become possible to consolidate the EEH setup work into a single location. The only functional change is to execute pcibios_resource_survey() (excepting ppc_md.pcibios_fixup(), see below) before pci_bus_add_devices() instead of after it. Because pcibios_scan_phb() and pci_bus_add_devices() are called together in a loop, this must be broken into one loop for each call. Then the call to pcibios_resource_survey() is moved up in between them. This changes the ordering but because pcibios_resource_survey() also calls ppc_md.pcibios_fixup(), that call is extracted out into pcibios_init() to where pcibios_resource_survey() was, so that it is not moved. The only other caller of pcibios_resource_survey() is the PowerPC 32 version of pcibios_init(), and therefore, that is modified to call ppc_md.pcibios_fixup() right after pcibios_resource_survey() so that there is no functional change there at all. The re-arrangement will cause very few side-effects because at this stage in the boot, pci_bus_add_devices() does very little: - pci_create_sysfs_dev_files() does nothing (no sysfs yet) - pci_proc_attach_device() does nothing (no proc yet) - device_attach() does nothing (no drivers yet) This leaves only the pci_final_fixup calls, D3 support, and marking the device as added. Of those, only the pci_final_fixup calls have the potential to be affected by resource allocation. The only pci_final_fixup handlers that touch resources seem to be one for x86 (pci_amd_enable_64bit_bar()), and a PowerPC 32 platform driver (quirk_final_uli1575()), neither of which use this pcibios_init() function. Even if they did, it would almost certainly be a bug, under the current ordering, to rely on or make changes to resources before they were allocated. 
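Summarising the change, the 64-bit pcibios_init() ordering becomes (a schematic of the diff below):

  old: for each PHB: pcibios_scan_phb(); pci_bus_add_devices();
       pcibios_resource_survey();            /* included ppc_md.pcibios_fixup() */

  new: for each PHB: pcibios_scan_phb();
       pcibios_resource_survey();            /* resources allocated first... */
       for each PHB: pci_bus_add_devices();  /* ...then add-device hooks run */
       ppc_md.pcibios_fixup();               /* hoisted out, position unchanged */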
Signed-off-by: Sam Bobroff Reviewed-by: Alexey Kardashevskiy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/4506b0489eabd0921a3587d90bd44c7683f3472d.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/pci-common.c | 4 ---- arch/powerpc/kernel/pci_32.c | 4 ++++ arch/powerpc/kernel/pci_64.c | 12 +++++++++--- 3 files changed, 13 insertions(+), 7 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index f627e15bb43c..1c448cf25506 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -1379,10 +1379,6 @@ void __init pcibios_resource_survey(void) pr_debug("PCI: Assigning unassigned resources...\n"); pci_assign_unassigned_resources(); } - - /* Call machine dependent fixup */ - if (ppc_md.pcibios_fixup) - ppc_md.pcibios_fixup(); } /* This is used by the PCI hotplug driver to allocate resource diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c index 50942a1d1a5f..b49e1060a3bf 100644 --- a/arch/powerpc/kernel/pci_32.c +++ b/arch/powerpc/kernel/pci_32.c @@ -263,6 +263,10 @@ static int __init pcibios_init(void) /* Call common code to handle resource allocation */ pcibios_resource_survey(); + /* Call machine dependent fixup */ + if (ppc_md.pcibios_fixup) + ppc_md.pcibios_fixup(); + /* Call machine dependent post-init code */ if (ppc_md.pcibios_after_init) ppc_md.pcibios_after_init(); diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c index b7030b1189d0..f83d1f69b1dd 100644 --- a/arch/powerpc/kernel/pci_64.c +++ b/arch/powerpc/kernel/pci_64.c @@ -54,14 +54,20 @@ static int __init pcibios_init(void) pci_add_flags(PCI_ENABLE_PROC_DOMAINS | PCI_COMPAT_DOMAIN_0); /* Scan all of the recorded PCI controllers. */ - list_for_each_entry_safe(hose, tmp, &hose_list, list_node) { + list_for_each_entry_safe(hose, tmp, &hose_list, list_node) pcibios_scan_phb(hose); - pci_bus_add_devices(hose->bus); - } /* Call common code to handle resource allocation */ pcibios_resource_survey(); + /* Add devices. */ + list_for_each_entry_safe(hose, tmp, &hose_list, list_node) + pci_bus_add_devices(hose->bus); + + /* Call machine dependent fixup */ + if (ppc_md.pcibios_fixup) + ppc_md.pcibios_fixup(); + printk(KERN_DEBUG "PCI: Probing PCI hardware done\n"); return 0; -- cgit v1.2.3 From aa06e3d60e245284d1e55497eb3108828092818d Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 16 Aug 2019 14:48:06 +1000 Subject: powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the use of driver callbacks in drivers that have been bound part way through the recovery process. This is necessary to prevent later stage handlers from being called when the earlier stage handlers haven't, which can be confusing for drivers. However, the flag is set for all devices that are added after boot time and only cleared at the end of the EEH recovery process. This results in hot plugged devices erroneously having the flag set during the first recovery after they are added (causing their driver's handlers to be incorrectly ignored). To remedy this, clear the flag at the beginning of recovery processing. The flag is still cleared at the end of recovery processing, although it is no longer really necessary. Also clear the flag during eeh_handle_special_event(), for the same reasons. 
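The resulting flag lifecycle for a hot-plugged device, as described above (an illustrative timeline, not code from the patch):

  hot plug             -> EEH_DEV_NO_HANDLER set on the new device
  recovery begins      -> flag cleared up front (this patch, in both the
                          normal and special event paths), so the first
                          recovery after a hot plug uses the driver's
                          handlers normally
  device added during  -> flag set, so later-stage handlers are skipped
  an ongoing recovery     for it, as intended
  recovery completes   -> flag cleared, as before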
Signed-off-by: Sam Bobroff Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/b8ca5629d27de74c957d4f4b250177d1b6fc4bbd.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/eeh_driver.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 89623962c727..1fbe541856f5 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -793,6 +793,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe) result = PCI_ERS_RESULT_DISCONNECT; } + eeh_for_each_pe(pe, tmp_pe) + eeh_pe_for_each_dev(tmp_pe, edev, tmp) + edev->mode &= ~EEH_DEV_NO_HANDLER; + /* Walk the various device drivers attached to this slot through * a reset sequence, giving each an opportunity to do what it needs * to accomplish the reset. Each child gets a report of the @@ -981,7 +985,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe) */ void eeh_handle_special_event(void) { - struct eeh_pe *pe, *phb_pe; + struct eeh_pe *pe, *phb_pe, *tmp_pe; + struct eeh_dev *edev, *tmp_edev; struct pci_bus *bus; struct pci_controller *hose; unsigned long flags; @@ -1050,6 +1055,10 @@ void eeh_handle_special_event(void) (phb_pe->state & EEH_PE_RECOVERING)) continue; + eeh_for_each_pe(pe, tmp_pe) + eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev) + edev->mode &= ~EEH_DEV_NO_HANDLER; + /* Notify all devices to be down */ eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true); eeh_set_channel_state(pe, pci_channel_io_perm_failure); -- cgit v1.2.3 From 617082a4817a4354fa3de05c80b5f6088e2083b7 Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 16 Aug 2019 14:48:07 +1000 Subject: powerpc/eeh: Improve debug messages around device addition Also remove useless comment. Signed-off-by: Sam Bobroff Reviewed-by: Alexey Kardashevskiy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/59db84f4bf94718a12f206bc923ac797d47e4cc1.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/eeh.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index c0e4b73191f3..d187d2b290a8 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -1198,7 +1198,7 @@ void eeh_add_device_late(struct pci_dev *dev) pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn); edev = pdn_to_eeh_dev(pdn); if (edev->pdev == dev) { - pr_debug("EEH: Already referenced !\n"); + pr_debug("EEH: Device %s already referenced!\n", pci_name(dev)); return; } -- cgit v1.2.3 From 685a0bc00abcf1d40d160eaafab9989f565ab2b5 Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 16 Aug 2019 14:48:08 +1000 Subject: powerpc/eeh: Initialize EEH address cache earlier The EEH address cache is currently initialized and populated by a single function: eeh_addr_cache_build(). While the initial population of the cache can only be done once resources are allocated, initialization (just setting up a spinlock) could be done much earlier. So move the initialization step into a separate function and call it from a core_initcall (rather than a subsys initcall). This will allow future work to make use of the cache during boot time PCI scanning. 
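For context, initcall levels run in a fixed order (see include/linux/init.h), so moving the lock setup up to core level guarantees it runs before the PCI scanning code, which on powerpc is itself a subsys initcall:

  core_initcall -> postcore_initcall -> arch_initcall -> subsys_initcall
                -> fs_initcall -> device_initcall -> late_initcall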
Signed-off-by: Sam Bobroff Reviewed-by: Alexey Kardashevskiy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/0557206741bffee76cdfff042f65321f6f7a5b41.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/eeh.c | 2 ++ arch/powerpc/kernel/eeh_cache.c | 13 +++++++++++-- 2 files changed, 13 insertions(+), 2 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index d187d2b290a8..22f646176abb 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -1120,6 +1120,8 @@ static int eeh_init(void) list_for_each_entry_safe(hose, tmp, &hose_list, list_node) eeh_dev_phb_init_dynamic(hose); + eeh_addr_cache_init(); + /* Initialize EEH event */ return eeh_event_init(); } diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c index 05ffd32b3416..75f0d66dee4b 100644 --- a/arch/powerpc/kernel/eeh_cache.c +++ b/arch/powerpc/kernel/eeh_cache.c @@ -257,6 +257,17 @@ void eeh_addr_cache_rmv_dev(struct pci_dev *dev) spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); } +/** + * eeh_addr_cache_init - Initialize a cache of I/O addresses + * + * Initialize a cache of pci i/o addresses. This cache will be used to + * find the pci device that corresponds to a given address. + */ +void eeh_addr_cache_init(void) +{ + spin_lock_init(&pci_io_addr_cache_root.piar_lock); +} + /** * eeh_addr_cache_build - Build a cache of I/O addresses * @@ -272,8 +283,6 @@ void eeh_addr_cache_build(void) struct eeh_dev *edev; struct pci_dev *dev = NULL; - spin_lock_init(&pci_io_addr_cache_root.piar_lock); - for_each_pci_dev(dev) { pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn); if (!pdn) -- cgit v1.2.3 From b905f8cdca7725e750a84f7188ea6821750124c3 Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 16 Aug 2019 14:48:09 +1000 Subject: powerpc/eeh: EEH for pSeries hot plug On PowerNV and pSeries, devices currently acquire EEH support from several different places: Boot-time devices from eeh_probe_devices() and eeh_addr_cache_build(), Virtual Function devices from the pcibios bus add device hooks and hot plugged devices from pci_hp_add_devices() (with other platforms using other methods as well). Unfortunately, pSeries machines currently discover hot plugged devices using pci_rescan_bus(), not pci_hp_add_devices(), and so those devices do not receive EEH support. Rather than adding another case for pci_rescan_bus(), this change widens the scope of the pcibios bus add device hooks so that they can handle all devices. As a side effect this also supports devices discovered after manually rescanning via /sys/bus/pci/rescan. Note that on PowerNV, this change allows the EEH subsystem to become enabled after boot as long as it has not been forced off, which was not previously possible (it was already possible on pSeries). 
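The resulting call chain for any newly added device, whether found at boot, via slot hot plug, or via /sys/bus/pci/rescan (the platform hook names are from the pseries/powernv platform code, outside this diff):

  pci_bus_add_device()                     generic PCI core
    -> pcibios_bus_add_device(pdev)        powerpc hook
       -> pseries/powernv platform hook    now runs for all devices,
          -> EEH device setup              not just virtual functions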
Signed-off-by: Sam Bobroff Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/72ae8ae9c54097158894a52de23690448de38ea9.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/eeh.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 22f646176abb..abf4c1bb1fab 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -1192,7 +1192,7 @@ void eeh_add_device_late(struct pci_dev *dev) struct pci_dn *pdn; struct eeh_dev *edev; - if (!dev || !eeh_enabled()) + if (!dev) return; pr_debug("EEH: Adding device %s\n", pci_name(dev)); @@ -1248,6 +1248,8 @@ void eeh_add_device_tree_late(struct pci_bus *bus) { struct pci_dev *dev; + if (eeh_has_flag(EEH_FORCE_DISABLED)) + return; list_for_each_entry(dev, &bus->devices, bus_list) { eeh_add_device_late(dev); if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { -- cgit v1.2.3 From c44e4ccadaca5884ac82b6dfffbd693bec3b583e Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 16 Aug 2019 14:48:10 +1000 Subject: powerpc/eeh: Refactor around eeh_probe_devices() Now that EEH support for all devices (on PowerNV and pSeries) is provided by the pcibios bus add device hooks, eeh_probe_devices() and eeh_addr_cache_build() are redundant and can be removed. Move the EEH enabled message into its own function so that it can be called from multiple places. Note that previously on pSeries, useless EEH sysfs files were created for some devices that did not have EEH support and this change prevents them from being created. Signed-off-by: Sam Bobroff Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/33b0a6339d5ac88693de092d6fba984f2a5add66.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/eeh.c | 27 ++++++++++----------------- arch/powerpc/kernel/eeh_cache.c | 32 -------------------------------- 2 files changed, 10 insertions(+), 49 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index abf4c1bb1fab..fc975342e242 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -150,6 +150,16 @@ static int __init eeh_setup(char *str) } __setup("eeh=", eeh_setup); +void eeh_show_enabled(void) +{ + if (eeh_has_flag(EEH_FORCE_DISABLED)) + pr_info("EEH: Recovery disabled by kernel parameter.\n"); + else if (eeh_has_flag(EEH_ENABLED)) + pr_info("EEH: Capable adapter found: recovery enabled.\n"); + else + pr_info("EEH: No capable adapters found: recovery disabled.\n"); +} + /* * This routine captures assorted PCI configuration space data * for the indicated PCI device, and puts them into a buffer @@ -1063,23 +1073,6 @@ static struct notifier_block eeh_reboot_nb = { .notifier_call = eeh_reboot_notifier, }; -void eeh_probe_devices(void) -{ - struct pci_controller *hose, *tmp; - struct pci_dn *pdn; - - /* Enable EEH for all adapters */ - list_for_each_entry_safe(hose, tmp, &hose_list, list_node) { - pdn = hose->pci_data; - traverse_pci_dn(pdn, eeh_ops->probe, NULL); - } - if (eeh_enabled()) - pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n"); - else - pr_info("EEH: No capable adapters found\n"); - -} - /** * eeh_init - EEH initialization * diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c index 75f0d66dee4b..1e47cd04a1bd 100644 --- a/arch/powerpc/kernel/eeh_cache.c +++ b/arch/powerpc/kernel/eeh_cache.c @@ -268,38 +268,6 @@ void eeh_addr_cache_init(void) spin_lock_init(&pci_io_addr_cache_root.piar_lock); } -/** -
* eeh_addr_cache_build - Build a cache of I/O addresses - * - * Build a cache of pci i/o addresses. This cache will be used to - * find the pci device that corresponds to a given address. - * This routine scans all pci busses to build the cache. - * Must be run late in boot process, after the pci controllers - * have been scanned for devices (after all device resources are known). - */ -void eeh_addr_cache_build(void) -{ - struct pci_dn *pdn; - struct eeh_dev *edev; - struct pci_dev *dev = NULL; - - for_each_pci_dev(dev) { - pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn); - if (!pdn) - continue; - - edev = pdn_to_eeh_dev(pdn); - if (!edev) - continue; - - dev->dev.archdata.edev = edev; - edev->pdev = dev; - - eeh_addr_cache_insert_dev(dev); - eeh_sysfs_add_device(dev); - } -} - static int eeh_addr_cache_show(struct seq_file *s, void *v) { struct pci_io_addr_range *piar; -- cgit v1.2.3 From 7c33a994d32d89937b23673e7507e8ec1daad893 Mon Sep 17 00:00:00 2001 From: Oliver O'Halloran Date: Fri, 16 Aug 2019 14:48:11 +1000 Subject: powerpc/eeh: Add bdfn field to eeh_dev Preparation for removing pci_dn from the powernv EEH code. The only thing we really use pci_dn for is to get the bdfn of the device for config space accesses, so adding that information to eeh_dev reduces the need to carry around the pci_dn. Signed-off-by: Oliver O'Halloran [SB: Re-wrapped commit message, fixed whitespace damage.] Signed-off-by: Sam Bobroff Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/e458eb69a1f591d8a120782f23a8506b15d3c654.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/eeh_dev.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c index c4317c452d98..7370185c7a05 100644 --- a/arch/powerpc/kernel/eeh_dev.c +++ b/arch/powerpc/kernel/eeh_dev.c @@ -47,6 +47,8 @@ struct eeh_dev *eeh_dev_init(struct pci_dn *pdn) /* Associate EEH device with OF node */ pdn->edev = edev; edev->pdn = pdn; + edev->bdfn = (pdn->busno << 8) | pdn->devfn; + edev->controller = pdn->phb; return edev; } -- cgit v1.2.3 From b093f2cbedfbaba6bf1c520fbfcb46403f3c7802 Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 16 Aug 2019 14:48:12 +1000 Subject: powerpc/eeh: Introduce EEH edev logging macros Now that struct eeh_dev includes the BDFN of its PCI device, make use of it to replace eeh_edev_info() with a set of dev_dbg()-style macros that only need a struct edev. With the BDFN available without the struct pci_dev, eeh_pci_name() is now unnecessary, so remove it. While only the "info" level function is used here, the others will be used in follow-up work. Signed-off-by: Sam Bobroff Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/f90ae9a53d762be7b0ccbad79e62b5a1b4f4996e.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/eeh_driver.c | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 1fbe541856f5..ab576bcbe4dd 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -81,23 +81,6 @@ static const char *pci_ers_result_name(enum pci_ers_result result) } }; -static __printf(2, 3) void eeh_edev_info(const struct eeh_dev *edev, - const char *fmt, ...) -{ - struct va_format vaf; - va_list args; - - va_start(args, fmt); - - vaf.fmt = fmt; - vaf.va = &args; - - printk(KERN_INFO "EEH: PE#%x (PCI %s): %pV\n", edev->pe_config_addr, - edev->pdev ?
dev_name(&edev->pdev->dev) : "none", &vaf); - - va_end(args); -} - static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old, enum pci_ers_result new) { -- cgit v1.2.3 From 1ff8f36fc770dd2b3eb294312f270db8cf94cc13 Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 16 Aug 2019 14:48:13 +1000 Subject: powerpc/eeh: Convert log messages to eeh_edev_* macros Convert existing messages, where appropriate, to use the eeh_edev_* logging macros. The only effect should be minor adjustments to the log messages, apart from: - A new message in pseries_eeh_probe() "Probing device" to match the powernv case. - The "Probing device" message in pnv_eeh_probe() is now generated slightly later, which will mean that it is no longer emitted for devices that aren't probed due to the initial checks. Signed-off-by: Sam Bobroff Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/ce505a0a7a4a5b0367f0f40f8b26e7c0a9cf4cb7.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/eeh.c | 19 +++++++-------- arch/powerpc/kernel/eeh_cache.c | 8 +++---- arch/powerpc/kernel/eeh_driver.c | 7 ++---- arch/powerpc/kernel/eeh_pe.c | 51 ++++++++++------------------------------ 4 files changed, 27 insertions(+), 58 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index fc975342e242..958e03ca1db6 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -470,8 +470,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev) /* Access to IO BARs might get this far and still not want checking. */ if (!pe) { eeh_stats.ignored_check++; - pr_debug("EEH: Ignored check for %s\n", - eeh_pci_name(dev)); + eeh_edev_dbg(edev, "Ignored check\n"); return 0; } @@ -511,12 +510,11 @@ int eeh_dev_check_failure(struct eeh_dev *edev) if (dn) location = of_get_property(dn, "ibm,loc-code", NULL); - printk(KERN_ERR "EEH: %d reads ignored for recovering device at " - "location=%s driver=%s pci addr=%s\n", + eeh_edev_err(edev, "%d reads ignored for recovering device at location=%s driver=%s\n", pe->check_count, location ? 
location : "unknown", - eeh_driver_name(dev), eeh_pci_name(dev)); - printk(KERN_ERR "EEH: Might be infinite loop in %s driver\n", + eeh_driver_name(dev)); + eeh_edev_err(edev, "Might be infinite loop in %s driver\n", eeh_driver_name(dev)); dump_stack(); } @@ -1188,12 +1186,11 @@ void eeh_add_device_late(struct pci_dev *dev) if (!dev) return; - pr_debug("EEH: Adding device %s\n", pci_name(dev)); - pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn); edev = pdn_to_eeh_dev(pdn); + eeh_edev_dbg(edev, "Adding device\n"); if (edev->pdev == dev) { - pr_debug("EEH: Device %s already referenced!\n", pci_name(dev)); + eeh_edev_dbg(edev, "Device already referenced!\n"); return; } @@ -1296,10 +1293,10 @@ void eeh_remove_device(struct pci_dev *dev) edev = pci_dev_to_eeh_dev(dev); /* Unregister the device with the EEH/PCI address search system */ - pr_debug("EEH: Removing device %s\n", pci_name(dev)); + dev_dbg(&dev->dev, "EEH: Removing device\n"); if (!edev || !edev->pdev || !edev->pe) { - pr_debug("EEH: Not referenced !\n"); + dev_dbg(&dev->dev, "EEH: Device not referenced!\n"); return; } diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c index 1e47cd04a1bd..cf11277ebd02 100644 --- a/arch/powerpc/kernel/eeh_cache.c +++ b/arch/powerpc/kernel/eeh_cache.c @@ -148,8 +148,8 @@ eeh_addr_cache_insert(struct pci_dev *dev, resource_size_t alo, piar->pcidev = dev; piar->flags = flags; - pr_debug("PIAR: insert range=[%pap:%pap] dev=%s\n", - &alo, &ahi, pci_name(dev)); + eeh_edev_dbg(piar->edev, "PIAR: insert range=[%pap:%pap]\n", + &alo, &ahi); rb_link_node(&piar->rb_node, parent, p); rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); @@ -229,8 +229,8 @@ restart: piar = rb_entry(n, struct pci_io_addr_range, rb_node); if (piar->pcidev == dev) { - pr_debug("PIAR: remove range=[%pap:%pap] dev=%s\n", - &piar->addr_lo, &piar->addr_hi, pci_name(dev)); + eeh_edev_dbg(piar->edev, "PIAR: remove range=[%pap:%pap]\n", + &piar->addr_lo, &piar->addr_hi); rb_erase(n, &pci_io_addr_cache_root.rb_root); kfree(piar); goto restart; diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index ab576bcbe4dd..274075a814b6 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -456,12 +456,9 @@ static void *eeh_add_virt_device(struct eeh_dev *edev) { struct pci_driver *driver; struct pci_dev *dev = eeh_dev_to_pci_dev(edev); - struct pci_dn *pdn = eeh_dev_to_pdn(edev); if (!(edev->physfn)) { - pr_warn("%s: EEH dev %04x:%02x:%02x.%01x not for VF\n", - __func__, pdn->phb->global_number, pdn->busno, - PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn)); + eeh_edev_warn(edev, "Not for VF\n"); return NULL; } @@ -475,7 +472,7 @@ static void *eeh_add_virt_device(struct eeh_dev *edev) } #ifdef CONFIG_PCI_IOV - pci_iov_add_virtfn(edev->physfn, pdn->vf_index); + pci_iov_add_virtfn(edev->physfn, eeh_dev_to_pdn(edev)->vf_index); #endif return NULL; } diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c index 854cef7b18f4..317a31624526 100644 --- a/arch/powerpc/kernel/eeh_pe.c +++ b/arch/powerpc/kernel/eeh_pe.c @@ -379,8 +379,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev) /* Check if the PE number is valid */ if (!eeh_has_flag(EEH_VALID_PE_ZERO) && !edev->pe_config_addr) { - pr_err("%s: Invalid PE#0 for edev 0x%x on PHB#%x\n", - __func__, config_addr, pdn->phb->global_number); + eeh_edev_err(edev, "PE#0 is invalid for this PHB!\n"); return -EINVAL; } @@ -398,12 +397,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev) /* Put the edev to 
PE */ list_add_tail(&edev->entry, &pe->edevs); - pr_debug("EEH: Add %04x:%02x:%02x.%01x to Bus PE#%x\n", - pdn->phb->global_number, - pdn->busno, - PCI_SLOT(pdn->devfn), - PCI_FUNC(pdn->devfn), - pe->addr); + eeh_edev_dbg(edev, "Added to bus PE\n"); return 0; } else if (pe && (pe->type & EEH_PE_INVALID)) { list_add_tail(&edev->entry, &pe->edevs); @@ -420,13 +414,8 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev) parent = parent->parent; } - pr_debug("EEH: Add %04x:%02x:%02x.%01x to Device " - "PE#%x, Parent PE#%x\n", - pdn->phb->global_number, - pdn->busno, - PCI_SLOT(pdn->devfn), - PCI_FUNC(pdn->devfn), - pe->addr, pe->parent->addr); + eeh_edev_dbg(edev, "Added to device PE (parent: PE#%x)\n", + pe->parent->addr); return 0; } @@ -468,13 +457,8 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev) list_add_tail(&pe->child, &parent->child_list); list_add_tail(&edev->entry, &pe->edevs); edev->pe = pe; - pr_debug("EEH: Add %04x:%02x:%02x.%01x to " - "Device PE#%x, Parent PE#%x\n", - pdn->phb->global_number, - pdn->busno, - PCI_SLOT(pdn->devfn), - PCI_FUNC(pdn->devfn), - pe->addr, pe->parent->addr); + eeh_edev_dbg(edev, "Added to device PE (parent: PE#%x)\n", + pe->parent->addr); return 0; } @@ -492,15 +476,10 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev) { struct eeh_pe *pe, *parent, *child; int cnt; - struct pci_dn *pdn = eeh_dev_to_pdn(edev); pe = eeh_dev_to_pe(edev); if (!pe) { - pr_debug("%s: No PE found for device %04x:%02x:%02x.%01x\n", - __func__, pdn->phb->global_number, - pdn->busno, - PCI_SLOT(pdn->devfn), - PCI_FUNC(pdn->devfn)); + eeh_edev_dbg(edev, "No PE found for device.\n"); return -EEXIST; } @@ -717,17 +696,13 @@ static void eeh_bridge_check_link(struct eeh_dev *edev) if (!(edev->mode & (EEH_DEV_ROOT_PORT | EEH_DEV_DS_PORT))) return; - pr_debug("%s: Check PCIe link for %04x:%02x:%02x.%01x ...\n", - __func__, pdn->phb->global_number, - pdn->busno, - PCI_SLOT(pdn->devfn), - PCI_FUNC(pdn->devfn)); + eeh_edev_dbg(edev, "Checking PCIe link...\n"); /* Check slot status */ cap = edev->pcie_cap; eeh_ops->read_config(pdn, cap + PCI_EXP_SLTSTA, 2, &val); if (!(val & PCI_EXP_SLTSTA_PDS)) { - pr_debug(" No card in the slot (0x%04x) !\n", val); + eeh_edev_dbg(edev, "No card in the slot (0x%04x) !\n", val); return; } @@ -736,7 +711,7 @@ static void eeh_bridge_check_link(struct eeh_dev *edev) if (val & PCI_EXP_SLTCAP_PCP) { eeh_ops->read_config(pdn, cap + PCI_EXP_SLTCTL, 2, &val); if (val & PCI_EXP_SLTCTL_PCC) { - pr_debug(" In power-off state, power it on ...\n"); + eeh_edev_dbg(edev, "In power-off state, power it on ...\n"); val &= ~(PCI_EXP_SLTCTL_PCC | PCI_EXP_SLTCTL_PIC); val |= (0x0100 & PCI_EXP_SLTCTL_PIC); eeh_ops->write_config(pdn, cap + PCI_EXP_SLTCTL, 2, val); @@ -752,7 +727,7 @@ static void eeh_bridge_check_link(struct eeh_dev *edev) /* Check link */ eeh_ops->read_config(pdn, cap + PCI_EXP_LNKCAP, 4, &val); if (!(val & PCI_EXP_LNKCAP_DLLLARC)) { - pr_debug(" No link reporting capability (0x%08x) \n", val); + eeh_edev_dbg(edev, "No link reporting capability (0x%08x) \n", val); msleep(1000); return; } @@ -769,10 +744,10 @@ static void eeh_bridge_check_link(struct eeh_dev *edev) } if (val & PCI_EXP_LNKSTA_DLLLA) - pr_debug(" Link up (%s)\n", + eeh_edev_dbg(edev, "Link up (%s)\n", (val & PCI_EXP_LNKSTA_CLS_2_5GB) ? 
"2.5GB" : "5GB"); else - pr_debug(" Link not ready (0x%04x)\n", val); + eeh_edev_dbg(edev, "Link not ready (0x%04x)\n", val); } #define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF)) -- cgit v1.2.3 From 2e25505147b8acf6510b9d5d951fd4c75f2e9bf2 Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 16 Aug 2019 14:48:14 +1000 Subject: powerpc/eeh: Fix crash when edev->pdev changes If a PCI device is removed during eeh_pe_report_edev(), between the calls to device_lock() and device_unlock(), edev->pdev will change and cause a crash as the wrong mutex is released. To correct this, hold the PCI rescan/remove lock while taking a copy of edev->pdev and performing a get_device() on it. Use this value to release the mutex, but also pass it through to the device driver's EEH handlers so that they always see the same device. Signed-off-by: Sam Bobroff Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/3c590579a0faa24d20c826dcd26c739eb4d454e6.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/eeh_driver.c | 44 ++++++++++++++++++++++++++-------------- 1 file changed, 29 insertions(+), 15 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 274075a814b6..e817d78fe52d 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -257,20 +257,27 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable) } typedef enum pci_ers_result (*eeh_report_fn)(struct eeh_dev *, + struct pci_dev *, struct pci_driver *); static void eeh_pe_report_edev(struct eeh_dev *edev, eeh_report_fn fn, enum pci_ers_result *result) { + struct pci_dev *pdev; struct pci_driver *driver; enum pci_ers_result new_result; - if (!edev->pdev) { + pci_lock_rescan_remove(); + pdev = edev->pdev; + if (pdev) + get_device(&pdev->dev); + pci_unlock_rescan_remove(); + if (!pdev) { eeh_edev_info(edev, "no device"); return; } - device_lock(&edev->pdev->dev); + device_lock(&pdev->dev); if (eeh_edev_actionable(edev)) { - driver = eeh_pcid_get(edev->pdev); + driver = eeh_pcid_get(pdev); if (!driver) eeh_edev_info(edev, "no driver"); @@ -279,7 +286,7 @@ static void eeh_pe_report_edev(struct eeh_dev *edev, eeh_report_fn fn, else if (edev->mode & EEH_DEV_NO_HANDLER) eeh_edev_info(edev, "driver bound too late"); else { - new_result = fn(edev, driver); + new_result = fn(edev, pdev, driver); eeh_edev_info(edev, "%s driver reports: '%s'", driver->name, pci_ers_result_name(new_result)); @@ -288,12 +295,15 @@ static void eeh_pe_report_edev(struct eeh_dev *edev, eeh_report_fn fn, new_result); } if (driver) - eeh_pcid_put(edev->pdev); + eeh_pcid_put(pdev); } else { - eeh_edev_info(edev, "not actionable (%d,%d,%d)", !!edev->pdev, + eeh_edev_info(edev, "not actionable (%d,%d,%d)", !!pdev, !eeh_dev_removed(edev), !eeh_pe_passed(edev->pe)); } - device_unlock(&edev->pdev->dev); + device_unlock(&pdev->dev); + if (edev->pdev != pdev) + eeh_edev_warn(edev, "Device changed during processing!\n"); + put_device(&pdev->dev); } static void eeh_pe_report(const char *name, struct eeh_pe *root, @@ -320,20 +330,20 @@ static void eeh_pe_report(const char *name, struct eeh_pe *root, * Report an EEH error to each device driver. 
*/ static enum pci_ers_result eeh_report_error(struct eeh_dev *edev, + struct pci_dev *pdev, struct pci_driver *driver) { enum pci_ers_result rc; - struct pci_dev *dev = edev->pdev; if (!driver->err_handler->error_detected) return PCI_ERS_RESULT_NONE; eeh_edev_info(edev, "Invoking %s->error_detected(IO frozen)", driver->name); - rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen); + rc = driver->err_handler->error_detected(pdev, pci_channel_io_frozen); edev->in_error = true; - pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); + pci_uevent_ers(pdev, PCI_ERS_RESULT_NONE); return rc; } @@ -346,12 +356,13 @@ static enum pci_ers_result eeh_report_error(struct eeh_dev *edev, * are now enabled. */ static enum pci_ers_result eeh_report_mmio_enabled(struct eeh_dev *edev, + struct pci_dev *pdev, struct pci_driver *driver) { if (!driver->err_handler->mmio_enabled) return PCI_ERS_RESULT_NONE; eeh_edev_info(edev, "Invoking %s->mmio_enabled()", driver->name); - return driver->err_handler->mmio_enabled(edev->pdev); + return driver->err_handler->mmio_enabled(pdev); } /** @@ -365,12 +376,13 @@ static enum pci_ers_result eeh_report_mmio_enabled(struct eeh_dev *edev, * driver can work again while the device is recovered. */ static enum pci_ers_result eeh_report_reset(struct eeh_dev *edev, + struct pci_dev *pdev, struct pci_driver *driver) { if (!driver->err_handler->slot_reset || !edev->in_error) return PCI_ERS_RESULT_NONE; eeh_edev_info(edev, "Invoking %s->slot_reset()", driver->name); - return driver->err_handler->slot_reset(edev->pdev); + return driver->err_handler->slot_reset(pdev); } static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata) @@ -411,13 +423,14 @@ static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata) * to make the recovered device work again. */ static enum pci_ers_result eeh_report_resume(struct eeh_dev *edev, + struct pci_dev *pdev, struct pci_driver *driver) { if (!driver->err_handler->resume || !edev->in_error) return PCI_ERS_RESULT_NONE; eeh_edev_info(edev, "Invoking %s->resume()", driver->name); - driver->err_handler->resume(edev->pdev); + driver->err_handler->resume(pdev); pci_uevent_ers(edev->pdev, PCI_ERS_RESULT_RECOVERED); #ifdef CONFIG_PCI_IOV @@ -436,6 +449,7 @@ static enum pci_ers_result eeh_report_resume(struct eeh_dev *edev, * dead, and that no further recovery attempts will be made on it. */ static enum pci_ers_result eeh_report_failure(struct eeh_dev *edev, + struct pci_dev *pdev, struct pci_driver *driver) { enum pci_ers_result rc; @@ -445,10 +459,10 @@ static enum pci_ers_result eeh_report_failure(struct eeh_dev *edev, eeh_edev_info(edev, "Invoking %s->error_detected(permanent failure)", driver->name); - rc = driver->err_handler->error_detected(edev->pdev, + rc = driver->err_handler->error_detected(pdev, pci_channel_io_perm_failure); - pci_uevent_ers(edev->pdev, PCI_ERS_RESULT_DISCONNECT); + pci_uevent_ers(pdev, PCI_ERS_RESULT_DISCONNECT); return rc; } -- cgit v1.2.3 From cef50c67c1d511bbbc974cead2bebeb6f83730ce Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 16 Aug 2019 14:48:15 +1000 Subject: powerpc/eeh: Remove unused return path from eeh_pe_dev_traverse() There are no users of the early-out return value from eeh_pe_dev_traverse(), so remove it. 
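Consolidated from the hunks below, the conversion leaves the walk looking like this (a sketch; the typedef lives in a header not shown here):

	/* Callbacks drop their void * return: no caller ever used the
	 * early-out value, so the walk simply visits every edev. */
	typedef void (*eeh_edev_traverse_func)(struct eeh_dev *edev, void *flag);

	void eeh_pe_dev_traverse(struct eeh_pe *root,
				 eeh_edev_traverse_func fn, void *flag)
	{
		struct eeh_pe *pe;
		struct eeh_dev *edev, *tmp;

		eeh_for_each_pe(root, pe)
			eeh_pe_for_each_dev(pe, edev, tmp)
				fn(edev, flag);	/* result no longer checked */
	}
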
Signed-off-by: Sam Bobroff Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/c648070f5b28fe8ca1880b48e64b267959ffd369.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/eeh.c | 16 +++++----------- arch/powerpc/kernel/eeh_driver.c | 26 +++++++++++--------------- arch/powerpc/kernel/eeh_pe.c | 25 +++++++------------------ 3 files changed, 23 insertions(+), 44 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 958e03ca1db6..7b2755f5c6fd 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -705,7 +705,7 @@ int eeh_pci_enable(struct eeh_pe *pe, int function) return rc; } -static void *eeh_disable_and_save_dev_state(struct eeh_dev *edev, +static void eeh_disable_and_save_dev_state(struct eeh_dev *edev, void *userdata) { struct pci_dev *pdev = eeh_dev_to_pci_dev(edev); @@ -716,7 +716,7 @@ static void *eeh_disable_and_save_dev_state(struct eeh_dev *edev, * state for the specified device */ if (!pdev || pdev == dev) - return NULL; + return; /* Ensure we have D0 power state */ pci_set_power_state(pdev, PCI_D0); @@ -729,18 +729,16 @@ static void *eeh_disable_and_save_dev_state(struct eeh_dev *edev, * interrupt from the device */ pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE); - - return NULL; } -static void *eeh_restore_dev_state(struct eeh_dev *edev, void *userdata) +static void eeh_restore_dev_state(struct eeh_dev *edev, void *userdata) { struct pci_dn *pdn = eeh_dev_to_pdn(edev); struct pci_dev *pdev = eeh_dev_to_pci_dev(edev); struct pci_dev *dev = userdata; if (!pdev) - return NULL; + return; /* Apply customization from firmware */ if (pdn && eeh_ops->restore_config) @@ -749,8 +747,6 @@ static void *eeh_restore_dev_state(struct eeh_dev *edev, void *userdata) /* The caller should restore state for the specified device */ if (pdev != dev) pci_restore_state(pdev); - - return NULL; } int eeh_restore_vf_config(struct pci_dn *pdn) @@ -876,7 +872,7 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat * the indicated device and its children so that the bunch of the * devices could be reset properly. */ -static void *eeh_set_dev_freset(struct eeh_dev *edev, void *flag) +static void eeh_set_dev_freset(struct eeh_dev *edev, void *flag) { struct pci_dev *dev; unsigned int *freset = (unsigned int *)flag; @@ -884,8 +880,6 @@ static void *eeh_set_dev_freset(struct eeh_dev *edev, void *flag) dev = eeh_dev_to_pci_dev(edev); if (dev) *freset |= dev->needs_freset; - - return NULL; } static void eeh_pe_refreeze_passed(struct eeh_pe *root) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index e817d78fe52d..a31cd32c4ce9 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -197,12 +197,12 @@ static void eeh_enable_irq(struct eeh_dev *edev) } } -static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata) +static void eeh_dev_save_state(struct eeh_dev *edev, void *userdata) { struct pci_dev *pdev; if (!edev) - return NULL; + return; /* * We cannot access the config space on some adapters. @@ -212,14 +212,13 @@ static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata) * device is created. 
*/ if (edev->pe && (edev->pe->state & EEH_PE_CFG_RESTRICTED)) - return NULL; + return; pdev = eeh_dev_to_pci_dev(edev); if (!pdev) - return NULL; + return; pci_save_state(pdev); - return NULL; } static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state s) @@ -385,12 +384,12 @@ static enum pci_ers_result eeh_report_reset(struct eeh_dev *edev, return driver->err_handler->slot_reset(pdev); } -static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata) +static void eeh_dev_restore_state(struct eeh_dev *edev, void *userdata) { struct pci_dev *pdev; if (!edev) - return NULL; + return; /* * The content in the config space isn't saved because @@ -402,15 +401,14 @@ static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata) if (list_is_last(&edev->entry, &edev->pe->edevs)) eeh_pe_restore_bars(edev->pe); - return NULL; + return; } pdev = eeh_dev_to_pci_dev(edev); if (!pdev) - return NULL; + return; pci_restore_state(pdev); - return NULL; } /** @@ -491,7 +489,7 @@ static void *eeh_add_virt_device(struct eeh_dev *edev) return NULL; } -static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata) +static void eeh_rmv_device(struct eeh_dev *edev, void *userdata) { struct pci_driver *driver; struct pci_dev *dev = eeh_dev_to_pci_dev(edev); @@ -506,7 +504,7 @@ static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata) */ if (!eeh_edev_actionable(edev) || (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)) - return NULL; + return; if (rmv_data) { driver = eeh_pcid_get(dev); @@ -515,7 +513,7 @@ static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata) driver->err_handler->error_detected && driver->err_handler->slot_reset) { eeh_pcid_put(dev); - return NULL; + return; } eeh_pcid_put(dev); } @@ -548,8 +546,6 @@ static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata) pci_stop_and_remove_bus_device(dev); pci_unlock_rescan_remove(); } - - return NULL; } static void *eeh_pe_detach_dev(struct eeh_pe *pe, void *userdata) diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c index 317a31624526..9c38fa7c33aa 100644 --- a/arch/powerpc/kernel/eeh_pe.c +++ b/arch/powerpc/kernel/eeh_pe.c @@ -231,29 +231,22 @@ void *eeh_pe_traverse(struct eeh_pe *root, * The function is used to traverse the devices of the specified * PE and its child PEs. */ -void *eeh_pe_dev_traverse(struct eeh_pe *root, +void eeh_pe_dev_traverse(struct eeh_pe *root, eeh_edev_traverse_func fn, void *flag) { struct eeh_pe *pe; struct eeh_dev *edev, *tmp; - void *ret; if (!root) { pr_warn("%s: Invalid PE %p\n", __func__, root); - return NULL; + return; } /* Traverse root PE */ - eeh_for_each_pe(root, pe) { - eeh_pe_for_each_dev(pe, edev, tmp) { - ret = fn(edev, flag); - if (ret) - return ret; - } - } - - return NULL; + eeh_for_each_pe(root, pe) + eeh_pe_for_each_dev(pe, edev, tmp) + fn(edev, flag); } /** @@ -602,13 +595,11 @@ void eeh_pe_mark_isolated(struct eeh_pe *root) } EXPORT_SYMBOL_GPL(eeh_pe_mark_isolated); -static void *__eeh_pe_dev_mode_mark(struct eeh_dev *edev, void *flag) +static void __eeh_pe_dev_mode_mark(struct eeh_dev *edev, void *flag) { int mode = *((int *)flag); edev->mode |= mode; - - return NULL; } /** @@ -827,7 +818,7 @@ static void eeh_restore_device_bars(struct eeh_dev *edev) * the expansion ROM base address, the latency timer, and etc. * from the saved values in the device node. 
*/ -static void *eeh_restore_one_device_bars(struct eeh_dev *edev, void *flag) +static void eeh_restore_one_device_bars(struct eeh_dev *edev, void *flag) { struct pci_dn *pdn = eeh_dev_to_pdn(edev); @@ -839,8 +830,6 @@ static void *eeh_restore_one_device_bars(struct eeh_dev *edev, void *flag) if (eeh_ops->restore_config && pdn) eeh_ops->restore_config(pdn); - - return NULL; } /** -- cgit v1.2.3 From 27d4396ed5a1c70f894557fd5732725eb1d94542 Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 16 Aug 2019 14:48:16 +1000 Subject: powerpc/eeh: Slightly simplify eeh_add_to_parent_pe() Simplify some needlessly complicated boolean logic in eeh_add_to_parent_pe(). Signed-off-by: Sam Bobroff Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/09259a50308f10aa764695912bc87dc1d1cf654c.1565930772.git.sbobroff@linux.ibm.com --- arch/powerpc/kernel/eeh_pe.c | 52 +++++++++++++++++++++++--------------------- 1 file changed, 27 insertions(+), 25 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c index 9c38fa7c33aa..1a6254bcf056 100644 --- a/arch/powerpc/kernel/eeh_pe.c +++ b/arch/powerpc/kernel/eeh_pe.c @@ -383,32 +383,34 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev) * components. */ pe = eeh_pe_get(pdn->phb, edev->pe_config_addr, config_addr); - if (pe && !(pe->type & EEH_PE_INVALID)) { - /* Mark the PE as type of PCI bus */ - pe->type = EEH_PE_BUS; - edev->pe = pe; - - /* Put the edev to PE */ - list_add_tail(&edev->entry, &pe->edevs); - eeh_edev_dbg(edev, "Added to bus PE\n"); - return 0; - } else if (pe && (pe->type & EEH_PE_INVALID)) { - list_add_tail(&edev->entry, &pe->edevs); - edev->pe = pe; - /* - * We're running to here because of PCI hotplug caused by - * EEH recovery. We need clear EEH_PE_INVALID until the top. - */ - parent = pe; - while (parent) { - if (!(parent->type & EEH_PE_INVALID)) - break; - parent->type &= ~EEH_PE_INVALID; - parent = parent->parent; - } + if (pe) { + if (pe->type & EEH_PE_INVALID) { + list_add_tail(&edev->entry, &pe->edevs); + edev->pe = pe; + /* + * We're running to here because of PCI hotplug caused by + * EEH recovery. We need clear EEH_PE_INVALID until the top. + */ + parent = pe; + while (parent) { + if (!(parent->type & EEH_PE_INVALID)) + break; + parent->type &= ~EEH_PE_INVALID; + parent = parent->parent; + } - eeh_edev_dbg(edev, "Added to device PE (parent: PE#%x)\n", - pe->parent->addr); + eeh_edev_dbg(edev, + "Added to device PE (parent: PE#%x)\n", + pe->parent->addr); + } else { + /* Mark the PE as type of PCI bus */ + pe->type = EEH_PE_BUS; + edev->pe = pe; + + /* Put the edev to PE */ + list_add_tail(&edev->entry, &pe->edevs); + eeh_edev_dbg(edev, "Added to bus PE\n"); + } return 0; } -- cgit v1.2.3 From cdfee5623290bc893f595636b44fa28e8207c5b3 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 16 Aug 2019 08:24:35 +0200 Subject: driver core: initialize a default DMA mask for platform device We still treat devices without a DMA mask as defaulting to 32-bits for both masks, but a few releases ago we started warning about such cases, as they require special cases to work around this sloppiness. Add a dma_mask field to struct platform_device so that we can initialize the dma_mask pointer in struct device and initialize both masks to 32-bits by default, replacing similar functionality in m68k and powerpc. The arch_setup_pdev_archdata hook is now unused and removed.
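A sketch of the new default (the dma_mask field is the one described above; the helper name and its call sites are assumptions for illustration):

	struct platform_device {
		/* ...existing fields... */
		u64 dma_mask;		/* new: backing store for dev.dma_mask */
		struct device dev;
	};

	/* Assumed helper, run for dynamically and statically allocated
	 * platform_devices alike -- hence the guarding conditionals. */
	static void setup_pdev_dma_masks(struct platform_device *pdev)
	{
		if (!pdev->dev.coherent_dma_mask)
			pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
		if (!pdev->dev.dma_mask) {
			pdev->dma_mask = DMA_BIT_MASK(32);
			pdev->dev.dma_mask = &pdev->dma_mask;
		}
	}
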
Note that the code looks a little odd with the various conditionals because we have to support platform_device structures that are statically allocated. Signed-off-by: Christoph Hellwig Acked-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20190816062435.881-7-hch@lst.de Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/setup-common.c | 6 ------ 1 file changed, 6 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 1f8db666468d..5e6543aba1b3 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -778,12 +778,6 @@ void ppc_printk_progress(char *s, unsigned short hex) pr_info("%s\n", s); } -void arch_setup_pdev_archdata(struct platform_device *pdev) -{ - pdev->archdata.dma_mask = DMA_BIT_MASK(32); - pdev->dev.dma_mask = &pdev->archdata.dma_mask; -} - static __init void print_system_info(void) { pr_info("-----------------------------------------------------\n"); -- cgit v1.2.3 From d8f0e0b073e1ec52a05f0c2a56318b47387d2f10 Mon Sep 17 00:00:00 2001 From: "Christopher M. Riedl" Date: Thu, 23 May 2019 21:46:48 -0500 Subject: powerpc/64s: support nospectre_v2 cmdline option Add support for disabling the kernel implemented spectre v2 mitigation (count cache flush on context switch) via the nospectre_v2 and mitigations=off cmdline options. Suggested-by: Michael Ellerman Signed-off-by: Christopher M. Riedl Reviewed-by: Andrew Donnellan Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190524024647.381-1-cmr@informatik.wtf --- arch/powerpc/kernel/security.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index e1c9cf079503..7cfcb294b11c 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -28,7 +28,7 @@ static enum count_cache_flush_type count_cache_flush_type = COUNT_CACHE_FLUSH_NO bool barrier_nospec_enabled; static bool no_nospec; static bool btb_flush_enabled; -#ifdef CONFIG_PPC_FSL_BOOK3E +#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3S_64) static bool no_spectrev2; #endif @@ -114,7 +114,7 @@ static __init int security_feature_debugfs_init(void) device_initcall(security_feature_debugfs_init); #endif /* CONFIG_DEBUG_FS */ -#ifdef CONFIG_PPC_FSL_BOOK3E +#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3S_64) static int __init handle_nospectre_v2(char *p) { no_spectrev2 = true; @@ -122,6 +122,9 @@ static int __init handle_nospectre_v2(char *p) return 0; } early_param("nospectre_v2", handle_nospectre_v2); +#endif /* CONFIG_PPC_FSL_BOOK3E || CONFIG_PPC_BOOK3S_64 */ + +#ifdef CONFIG_PPC_FSL_BOOK3E void setup_spectre_v2(void) { if (no_spectrev2 || cpu_mitigations_off()) @@ -399,7 +402,17 @@ static void toggle_count_cache_flush(bool enable) void setup_count_cache_flush(void) { - toggle_count_cache_flush(true); + bool enable = true; + + if (no_spectrev2 || cpu_mitigations_off()) { + if (security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED) || + security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED)) + pr_warn("Spectre v2 mitigations not under software control, can't disable\n"); + + enable = false; + } + + toggle_count_cache_flush(enable); } #ifdef CONFIG_DEBUG_FS -- cgit v1.2.3 From 14b4d97669b79d1ac83e64d6795098394e15ab1b Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Tue, 20 Aug 2019 14:07:13 +0000 Subject: powerpc/mm: rework io-workaround invocation. 
ppc_md.ioremap() is only used for the I/O workaround on the CELL platform, so the indirect function call can be avoided. This patch reworks the io-workaround and ioremap() functions to use the global 'io_workaround_inited' flag for the activation of the io-workaround. When CONFIG_PPC_IO_WORKAROUNDS or CONFIG_PPC_INDIRECT_MMIO are not selected, the I/O workaround ioremap() becomes a no-op and the global flag is not used. Signed-off-by: Christophe Leroy Reviewed-by: Christoph Hellwig Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/5fa3ef069fbd0f152512afaae19e7a60161454cf.1566309262.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/io-workarounds.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/io-workarounds.c b/arch/powerpc/kernel/io-workarounds.c index fbd2d0007c52..0276bc8c8969 100644 --- a/arch/powerpc/kernel/io-workarounds.c +++ b/arch/powerpc/kernel/io-workarounds.c @@ -149,8 +149,8 @@ static const struct ppc_pci_io iowa_pci_io = { }; #ifdef CONFIG_PPC_INDIRECT_MMIO -static void __iomem *iowa_ioremap(phys_addr_t addr, unsigned long size, - pgprot_t prot, void *caller) +void __iomem *iowa_ioremap(phys_addr_t addr, unsigned long size, + pgprot_t prot, void *caller) { struct iowa_bus *bus; void __iomem *res = __ioremap_caller(addr, size, prot, caller); @@ -163,20 +163,17 @@ static void __iomem *iowa_ioremap(phys_addr_t addr, unsigned long size, } return res; } -#else /* CONFIG_PPC_INDIRECT_MMIO */ -#define iowa_ioremap NULL #endif /* !CONFIG_PPC_INDIRECT_MMIO */ +bool io_workaround_inited; + /* Enable IO workaround */ static void io_workaround_init(void) { - static int io_workaround_inited; - if (io_workaround_inited) return; ppc_pci_io = iowa_pci_io; - ppc_md.ioremap = iowa_ioremap; - io_workaround_inited = 1; + io_workaround_inited = true; } /* Register new bus to support workaround */ -- cgit v1.2.3 From c691b4b83b6a348f7b9d13c36916e73c2e1d85e4 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Tue, 20 Aug 2019 14:34:12 +0000 Subject: powerpc: rewrite LOAD_REG_IMMEDIATE() as an intelligent macro Today LOAD_REG_IMMEDIATE() is a basic #define which loads all parts of a value into a register, including the parts that are zero. This means always 2 instructions on PPC32 and always 5 instructions on PPC64. Those instructions cannot run in parallel as they all update the same register. Ex: LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) in head_64.S results in: 3c 20 00 00 lis r1,0 60 21 00 00 ori r1,r1,0 78 21 07 c6 rldicr r1,r1,32,31 64 21 00 00 oris r1,r1,0 60 21 40 00 ori r1,r1,16384 Rewrite LOAD_REG_IMMEDIATE() with a GAS macro in order to skip the parts that are zero. Rename the existing LOAD_REG_IMMEDIATE() as LOAD_REG_IMMEDIATE_SYM() and use that one for loading the value of symbols which are not known at compile time.
Now LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) in head_64.S results in: 38 20 40 00 li r1,16384 Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/d60ce8dd3a383c7adbfc322bf1d53d81724a6000.1566311636.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/exceptions-64e.S | 10 +++++----- arch/powerpc/kernel/head_64.S | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 1cfb3da4a84a..898aae6da167 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -751,8 +751,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) ld r14,interrupt_base_book3e@got(r15) ld r15,__end_interrupts@got(r15) #else - LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e) - LOAD_REG_IMMEDIATE(r15,__end_interrupts) + LOAD_REG_IMMEDIATE_SYM(r14,interrupt_base_book3e) + LOAD_REG_IMMEDIATE_SYM(r15,__end_interrupts) #endif cmpld cr0,r10,r14 cmpld cr1,r10,r15 @@ -821,8 +821,8 @@ kernel_dbg_exc: ld r14,interrupt_base_book3e@got(r15) ld r15,__end_interrupts@got(r15) #else - LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e) - LOAD_REG_IMMEDIATE(r15,__end_interrupts) + LOAD_REG_IMMEDIATE_SYM(r14,interrupt_base_book3e) + LOAD_REG_IMMEDIATE_SYM(r15,__end_interrupts) #endif cmpld cr0,r10,r14 cmpld cr1,r10,r15 @@ -1449,7 +1449,7 @@ a2_tlbinit_code_start: a2_tlbinit_after_linear_map: /* Now we branch the new virtual address mapped by this entry */ - LOAD_REG_IMMEDIATE(r3,1f) + LOAD_REG_IMMEDIATE_SYM(r3,1f) mtctr r3 bctr diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 91d297e696dd..1fd44761e997 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -635,7 +635,7 @@ __after_prom_start: sub r5,r5,r11 #else /* just copy interrupts */ - LOAD_REG_IMMEDIATE(r5, FIXED_SYMBOL_ABS_ADDR(__end_interrupts)) + LOAD_REG_IMMEDIATE_SYM(r5, FIXED_SYMBOL_ABS_ADDR(__end_interrupts)) #endif b 5f 3: -- cgit v1.2.3 From ba18025fb03306ccdf3557a1e7b8a5b39b474872 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Tue, 20 Aug 2019 14:34:13 +0000 Subject: powerpc/32: replace LOAD_MSR_KERNEL() by LOAD_REG_IMMEDIATE() LOAD_MSR_KERNEL() and LOAD_REG_IMMEDIATE() are doing the same thing in the same way. 
Drop LOAD_MSR_KERNEL() Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/8f04a6df0bc8949517fd8236d50c15008ccf9231.1566311636.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/entry_32.S | 18 +++++++++--------- arch/powerpc/kernel/head_32.h | 21 ++++----------------- 2 files changed, 13 insertions(+), 26 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 54fab22c9a43..972b05504a0a 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -230,7 +230,7 @@ transfer_to_handler_cont: */ lis r12,reenable_mmu@h ori r12,r12,reenable_mmu@l - LOAD_MSR_KERNEL(r0, MSR_KERNEL) + LOAD_REG_IMMEDIATE(r0, MSR_KERNEL) mtspr SPRN_SRR0,r12 mtspr SPRN_SRR1,r0 SYNC @@ -304,7 +304,7 @@ stack_ovf: addi r1,r1,THREAD_SIZE-STACK_FRAME_OVERHEAD lis r9,StackOverflow@ha addi r9,r9,StackOverflow@l - LOAD_MSR_KERNEL(r10,MSR_KERNEL) + LOAD_REG_IMMEDIATE(r10,MSR_KERNEL) #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS) mtspr SPRN_NRI, r0 #endif @@ -324,7 +324,7 @@ trace_syscall_entry_irq_off: bl trace_hardirqs_on /* Now enable for real */ - LOAD_MSR_KERNEL(r10, MSR_KERNEL | MSR_EE) + LOAD_REG_IMMEDIATE(r10, MSR_KERNEL | MSR_EE) mtmsr r10 REST_GPR(0, r1) @@ -394,7 +394,7 @@ ret_from_syscall: #endif mr r6,r3 /* disable interrupts so current_thread_info()->flags can't change */ - LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */ + LOAD_REG_IMMEDIATE(r10,MSR_KERNEL) /* doesn't include MSR_EE */ /* Note: We don't bother telling lockdep about it */ SYNC MTMSRD(r10) @@ -824,7 +824,7 @@ ret_from_except: * can't change between when we test it and when we return * from the interrupt. */ /* Note: We don't bother telling lockdep about it */ - LOAD_MSR_KERNEL(r10,MSR_KERNEL) + LOAD_REG_IMMEDIATE(r10,MSR_KERNEL) SYNC /* Some chip revs have problems here... */ MTMSRD(r10) /* disable interrupts */ @@ -991,7 +991,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX) * can restart the exception exit path at the label * exc_exit_restart below. -- paulus */ - LOAD_MSR_KERNEL(r10,MSR_KERNEL & ~MSR_RI) + LOAD_REG_IMMEDIATE(r10,MSR_KERNEL & ~MSR_RI) SYNC MTMSRD(r10) /* clear the RI bit */ .globl exc_exit_restart @@ -1066,7 +1066,7 @@ exc_exit_restart_end: REST_NVGPRS(r1); \ lwz r3,_MSR(r1); \ andi. r3,r3,MSR_PR; \ - LOAD_MSR_KERNEL(r10,MSR_KERNEL); \ + LOAD_REG_IMMEDIATE(r10,MSR_KERNEL); \ bne user_exc_return; \ lwz r0,GPR0(r1); \ lwz r2,GPR2(r1); \ @@ -1236,7 +1236,7 @@ recheck: * neither. Those disable/enable cycles used to peek at * TI_FLAGS aren't advertised. */ - LOAD_MSR_KERNEL(r10,MSR_KERNEL) + LOAD_REG_IMMEDIATE(r10,MSR_KERNEL) SYNC MTMSRD(r10) /* disable interrupts */ lwz r9,TI_FLAGS(r2) @@ -1329,7 +1329,7 @@ _GLOBAL(enter_rtas) lwz r4,RTASBASE(r4) mfmsr r9 stw r9,8(r1) - LOAD_MSR_KERNEL(r0,MSR_KERNEL) + LOAD_REG_IMMEDIATE(r0,MSR_KERNEL) SYNC /* disable interrupts so SRR0/1 */ MTMSRD(r0) /* don't get trashed */ li r9,MSR_KERNEL & ~(MSR_IR|MSR_DR) diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index 4a692553651f..8abc7783dbe5 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -4,19 +4,6 @@ #include /* for STACK_FRAME_REGS_MARKER */ -/* - * MSR_KERNEL is > 0x8000 on 4xx/Book-E since it include MSR_CE. - */ -.macro __LOAD_MSR_KERNEL r, x -.if \x >= 0x8000 - lis \r, (\x)@h - ori \r, \r, (\x)@l -.else - li \r, (\x) -.endif -.endm -#define LOAD_MSR_KERNEL(r, x) __LOAD_MSR_KERNEL r, x - /* * Exception entry code. 
This code runs with address translation * turned off, i.e. using physical addresses. @@ -92,7 +79,7 @@ #ifdef CONFIG_40x rlwinm r9,r9,0,14,12 /* clear MSR_WE (necessary?) */ #else - LOAD_MSR_KERNEL(r10, MSR_KERNEL & ~(MSR_IR|MSR_DR)) /* can take exceptions */ + LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~(MSR_IR|MSR_DR)) /* can take exceptions */ MTMSRD(r10) /* (except for mach check in rtas) */ #endif lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */ @@ -140,10 +127,10 @@ * otherwise we might risk taking an interrupt before we tell lockdep * they are enabled. */ - LOAD_MSR_KERNEL(r10, MSR_KERNEL) + LOAD_REG_IMMEDIATE(r10, MSR_KERNEL) rlwimi r10, r9, 0, MSR_EE #else - LOAD_MSR_KERNEL(r10, MSR_KERNEL | MSR_EE) + LOAD_REG_IMMEDIATE(r10, MSR_KERNEL | MSR_EE) #endif #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS) mtspr SPRN_NRI, r0 @@ -187,7 +174,7 @@ label: #define EXC_XFER_TEMPLATE(hdlr, trap, msr, tfer, ret) \ li r10,trap; \ stw r10,_TRAP(r11); \ - LOAD_MSR_KERNEL(r10, msr); \ + LOAD_REG_IMMEDIATE(r10, msr); \ bl tfer; \ .long hdlr; \ .long ret -- cgit v1.2.3 From d7fb5b18a540efaf05da2b980fc11d50ba775677 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Tue, 20 Aug 2019 14:34:14 +0000 Subject: powerpc/64: optimise LOAD_REG_IMMEDIATE_SYM() Optimise LOAD_REG_IMMEDIATE_SYM() using a temporary register to parallelise operations. It reduces the path from 5 to 3 instructions. Suggested-by: Segher Boessenkool Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/bad41ed02531bb0382420cbab50a0d7153b71767.1566311636.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/exceptions-64e.S | 22 +++++++++++++--------- arch/powerpc/kernel/head_64.S | 2 +- 2 files changed, 14 insertions(+), 10 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 898aae6da167..829950b96d29 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -750,12 +750,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) ld r15,PACATOC(r13) ld r14,interrupt_base_book3e@got(r15) ld r15,__end_interrupts@got(r15) -#else - LOAD_REG_IMMEDIATE_SYM(r14,interrupt_base_book3e) - LOAD_REG_IMMEDIATE_SYM(r15,__end_interrupts) -#endif cmpld cr0,r10,r14 cmpld cr1,r10,r15 +#else + LOAD_REG_IMMEDIATE_SYM(r14, r15, interrupt_base_book3e) + cmpld cr0, r10, r14 + LOAD_REG_IMMEDIATE_SYM(r14, r15, __end_interrupts) + cmpld cr1, r10, r14 +#endif blt+ cr0,1f bge+ cr1,1f @@ -820,12 +822,14 @@ kernel_dbg_exc: ld r15,PACATOC(r13) ld r14,interrupt_base_book3e@got(r15) ld r15,__end_interrupts@got(r15) -#else - LOAD_REG_IMMEDIATE_SYM(r14,interrupt_base_book3e) - LOAD_REG_IMMEDIATE_SYM(r15,__end_interrupts) -#endif cmpld cr0,r10,r14 cmpld cr1,r10,r15 +#else + LOAD_REG_IMMEDIATE_SYM(r14, r15, interrupt_base_book3e) + cmpld cr0, r10, r14 + LOAD_REG_IMMEDIATE_SYM(r14, r15,__end_interrupts) + cmpld cr1, r10, r14 +#endif blt+ cr0,1f bge+ cr1,1f @@ -1449,7 +1453,7 @@ a2_tlbinit_code_start: a2_tlbinit_after_linear_map: /* Now we branch the new virtual address mapped by this entry */ - LOAD_REG_IMMEDIATE_SYM(r3,1f) + LOAD_REG_IMMEDIATE_SYM(r3, r5, 1f) mtctr r3 bctr diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 1fd44761e997..0f2d61af47cc 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -635,7 +635,7 @@ __after_prom_start: sub r5,r5,r11 #else /* just copy interrupts */ - LOAD_REG_IMMEDIATE_SYM(r5, 
FIXED_SYMBOL_ABS_ADDR(__end_interrupts)) + LOAD_REG_IMMEDIATE_SYM(r5, r11, FIXED_SYMBOL_ABS_ADDR(__end_interrupts)) #endif b 5f 3: -- cgit v1.2.3 From 63ce271b5e377deaddace4bac6dafb6e79d2bee4 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 26 Aug 2019 11:10:23 +0000 Subject: powerpc/prom: convert PROM_BUG() to standard trap Prior to commit 1bd98d7fbaf5 ("ppc64: Update BUG handling based on ppc32"), the BUG() family used BUG_ILLEGAL_INSTRUCTION, an invalid instruction opcode, to trap into the program check exception. That commit converted them to use standard trap instructions, but prom/prom_init and their PROM_BUG() macro were left over. head_64.S and exceptions-64s.S were left aside as well. Convert them to use the standard BUG infrastructure. Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/cdaf4bbbb64c288a077845846f04b12683f8875a.1566817807.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/exceptions-64s.S | 3 ++- arch/powerpc/kernel/head_64.S | 6 ++++-- arch/powerpc/kernel/prom_init.c | 2 +- 3 files changed, 7 insertions(+), 4 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 6ba3cc2ef8ab..dded4672579d 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1467,7 +1467,8 @@ EXC_COMMON_BEGIN(fp_unavailable_common) RECONCILE_IRQ_STATE(r10, r11) addi r3,r1,STACK_FRAME_OVERHEAD bl kernel_fp_unavailable_exception - BUG_OPCODE +0: trap + EMIT_BUG_ENTRY 0b, __FILE__, __LINE__, 0 1: #ifdef CONFIG_PPC_TRANSACTIONAL_MEM BEGIN_FTR_SECTION diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 0f2d61af47cc..ad79fddb974d 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -182,7 +182,8 @@ __secondary_hold: isync bctr #else -0: trap... - BUG_OPCODE +0: trap + EMIT_BUG_ENTRY 0b, __FILE__, __LINE__, 0 #endif CLOSE_FIXED_SECTION(first_256B) @@ -998,7 +999,8 @@ start_here_common: bl start_kernel /* Not reached */ - BUG_OPCODE + trap + EMIT_BUG_ENTRY 0b, __FILE__, __LINE__, 0 /* * We put a few things here that have to be page-aligned. diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 514707ef6779..f2b63b4e1943 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -94,7 +94,7 @@ static int of_workarounds __prombss; #define PROM_BUG() do { \ prom_printf("kernel BUG at %s line 0x%x!\n", \ __FILE__, __LINE__); \ - __asm__ __volatile__(".long " BUG_ILLEGAL_INSTR); \ + __builtin_trap(); \ } while (0) #ifdef DEBUG_PROM -- cgit v1.2.3 From a04565741284f695db4cfe5a3e61940d2259cb8f Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Wed, 21 Aug 2019 20:00:34 +0000 Subject: powerpc/8xx: drop unused self-modifying code alternative to FixupDAR. The code which fixes up the DAR on TLB errors for dcbX instructions has a self-modifying code alternative that has never been used. Drop it.
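Restated in C for clarity (a sketch assuming the standard X-form encoding of dcbX instructions; the retained code below computes the same thing in assembly via a jump table):

	/* The EA of a dcbX instruction is (RA|0) + (RB): register RA --
	 * or a literal zero when the RA field names r0 -- plus RB. */
	static unsigned long dcbx_effective_address(struct pt_regs *regs,
						    unsigned int insn)
	{
		unsigned int ra = (insn >> 16) & 0x1f;	/* RA field */
		unsigned int rb = (insn >> 11) & 0x1f;	/* RB field */
		unsigned long ea = regs->gpr[rb];

		if (ra)			/* RA == r0 contributes 0, not GPR0 */
			ea += regs->gpr[ra];
		return ea;
	}
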
Signed-off-by: Christophe Leroy Reviewed-by: Joakim Tjernlund Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/b095e12c82fcba1ac4c09fc3b85d969f36614746.1566417610.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/head_8xx.S | 24 ------------------------ 1 file changed, 24 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 5ab9178c2347..5db461db63cc 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -574,8 +574,6 @@ InstructionBreakpoint: * by decoding the registers used by the dcbx instruction and adding them. * DAR is set to the calculated address. */ - /* define if you don't want to use self modifying code */ -#define NO_SELF_MODIFYING_CODE FixupDAR:/* Entry point for dcbx workaround. */ mtspr SPRN_M_TW, r10 /* fetch instruction from memory. */ @@ -639,27 +637,6 @@ FixupDAR:/* Entry point for dcbx workaround. */ rlwinm r10, r10,0,7,5 /* Clear store bit for buggy dcbst insn */ mtspr SPRN_DSISR, r10 142: /* continue, it was a dcbx, dcbi instruction. */ -#ifndef NO_SELF_MODIFYING_CODE - andis. r10,r11,0x1f /* test if reg RA is r0 */ - li r10,modified_instr@l - dcbtst r0,r10 /* touch for store */ - rlwinm r11,r11,0,0,20 /* Zero lower 10 bits */ - oris r11,r11,640 /* Transform instr. to a "add r10,RA,RB" */ - ori r11,r11,532 - stw r11,0(r10) /* store add/and instruction */ - dcbf 0,r10 /* flush new instr. to memory. */ - icbi 0,r10 /* invalidate instr. cache line */ - mfspr r11, SPRN_SPRG_SCRATCH1 /* restore r11 */ - mfspr r10, SPRN_SPRG_SCRATCH0 /* restore r10 */ - isync /* Wait until new instr is loaded from memory */ -modified_instr: - .space 4 /* this is where the add instr. is stored */ - bne+ 143f - subf r10,r0,r10 /* r10=r10-r0, only if reg RA is r0 */ -143: mtdar r10 /* store faulting EA in DAR */ - mfspr r10,SPRN_M_TW - b DARFixed /* Go back to normal TLB handling */ -#else mfctr r10 mtdar r10 /* save ctr reg in DAR */ rlwinm r10, r11, 24, 24, 28 /* offset into jump table for reg RB */ @@ -723,7 +700,6 @@ modified_instr: add r10, r10, r11 /* add it */ mfctr r11 /* restore r11 */ b 151b -#endif /* * This is where the main kernel code starts. -- cgit v1.2.3 From 3bbd2343734e44de92238eea1a5cd3ad32a6baf0 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Wed, 21 Aug 2019 10:20:51 +0000 Subject: powerpc/8xx: set STACK_END_MAGIC earlier on the init_stack Today, the STACK_END_MAGIC is set on init_stack in start_kernel(). To avoid a false 'Thread overran stack, or stack corrupted' message on early Oopses, set up STACK_END_MAGIC as soon as possible.
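For context, a sketch of the check this store satisfies (end_of_stack() and the magic value are the generic kernel ones; the predicate is a simplification of the oops-path test):

	#define STACK_END_MAGIC	0x57AC6E9D	/* generic kernel value */

	static bool stack_end_corrupted(struct task_struct *tsk)
	{
		unsigned long *stackend = end_of_stack(tsk);

		/* Until the magic word is stored, this reads zero and an
		 * early oops reports a bogus "Thread overran stack". */
		return *stackend != STACK_END_MAGIC;
	}
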
Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/54f67bb7ac486c1350f2fa8905cd279f94b9dfb1.1566382841.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/head_8xx.S | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 5db461db63cc..19f583e18402 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -15,6 +15,7 @@ */ #include +#include #include #include #include @@ -717,6 +718,9 @@ start_here: /* stack */ lis r1,init_thread_union@ha addi r1,r1,init_thread_union@l + lis r0, STACK_END_MAGIC@h + ori r0, r0, STACK_END_MAGIC@l + stw r0, 0(r1) li r0,0 stwu r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1) -- cgit v1.2.3 From 12c3f1fd87bf4e55f06d079a45d6f15e2f6f9750 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 26 Aug 2019 15:52:14 +0000 Subject: powerpc/32s: get rid of CPU_FTR_601 feature Now that 601 is exclusive from other 6xx, CPU_FTR_601 and associated fixups are useless. Drop this feature and use #ifdefs instead. Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/ecdb7194a17dbfa01865df6a82979533adc2c70b.1566834712.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/cputable.c | 6 ++++-- arch/powerpc/kernel/entry_32.S | 22 ++++++++++++++++------ 2 files changed, 20 insertions(+), 8 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index bfe5f4a2886b..e745abc5457a 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -569,7 +569,7 @@ static struct cpu_spec __initdata cpu_specs[] = { #endif /* CONFIG_PPC_BOOK3S_64 */ #ifdef CONFIG_PPC32 -#ifdef CONFIG_PPC_BOOK3S_32 +#ifdef CONFIG_PPC_BOOK3S_601 { /* 601 */ .pvr_mask = 0xffff0000, .pvr_value = 0x00010000, @@ -583,6 +583,8 @@ static struct cpu_spec __initdata cpu_specs[] = { .machine_check = machine_check_generic, .platform = "ppc601", }, +#endif /* CONFIG_PPC_BOOK3S_601 */ +#ifdef CONFIG_PPC_BOOK3S_6xx { /* 603 */ .pvr_mask = 0xffff0000, .pvr_value = 0x00030000, @@ -1212,7 +1214,7 @@ static struct cpu_spec __initdata cpu_specs[] = { .machine_check = machine_check_generic, .platform = "ppc603", }, -#endif /* CONFIG_PPC_BOOK3S_32 */ +#endif /* CONFIG_PPC_BOOK3S_6xx */ #ifdef CONFIG_PPC_8xx { /* 8xx */ .pvr_mask = 0xffff0000, diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 972b05504a0a..d60908ea37fb 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -777,11 +777,19 @@ fast_exception_return: 1: lis r3,exc_exit_restart_end@ha addi r3,r3,exc_exit_restart_end@l cmplw r12,r3 +#if CONFIG_PPC_BOOK3S_601 + bge 2b +#else bge 3f +#endif lis r4,exc_exit_restart@ha addi r4,r4,exc_exit_restart@l cmplw r12,r4 +#if CONFIG_PPC_BOOK3S_601 + blt 2b +#else blt 3f +#endif lis r3,fee_restarts@ha tophys(r3,r3) lwz r5,fee_restarts@l(r3) @@ -800,9 +808,6 @@ fee_restarts: /* aargh, we don't know which trap this is */ /* but the 601 doesn't implement the RI bit, so assume it's OK */ 3: -BEGIN_FTR_SECTION - b 2b -END_FTR_SECTION_IFSET(CPU_FTR_601) li r10,-1 stw r10,_TRAP(r11) addi r3,r1,STACK_FRAME_OVERHEAD @@ -1270,11 +1275,19 @@ nonrecoverable: lis r10,exc_exit_restart_end@ha addi r10,r10,exc_exit_restart_end@l cmplw r12,r10 +#ifdef CONFIG_PPC_BOOK3S_601 + bgelr +#else bge 3f +#endif lis r11,exc_exit_restart@ha addi r11,r11,exc_exit_restart@l cmplw r12,r11 +#ifdef 
CONFIG_PPC_BOOK3S_601 + bltlr +#else blt 3f +#endif lis r10,ee_restarts@ha lwz r12,ee_restarts@l(r10) addi r12,r12,1 @@ -1283,9 +1296,6 @@ nonrecoverable: blr 3: /* OK, we can't recover, kill this process */ /* but the 601 doesn't implement the RI bit, so assume it's OK */ -BEGIN_FTR_SECTION - blr -END_FTR_SECTION_IFSET(CPU_FTR_601) lwz r3,_TRAP(r1) andi. r0,r3,1 beq 5f -- cgit v1.2.3 From 88fb309409ab454b497a6abb0f931ce3b6d9969c Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 26 Aug 2019 15:52:16 +0000 Subject: powerpc/32s: drop CPU_FTR_USE_RTC feature CPU_FTR_USE_RTC feature only applies to powerpc601. Drop this feature and replace it with tests on CONFIG_PPC_BOOK3S_601. Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/170411e2360861f4a95c21faad43519a08bc4040.1566834712.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/vdso.c | 22 ---------------------- arch/powerpc/kernel/vdso32/datapage.S | 2 ++ arch/powerpc/kernel/vdso32/vdso32.lds.S | 4 +++- 3 files changed, 5 insertions(+), 23 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index d60598113a9f..eae9ddaecbcf 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -94,28 +94,6 @@ static struct vdso_patch_def vdso_patches[] = { CPU_FTR_COHERENT_ICACHE, CPU_FTR_COHERENT_ICACHE, "__kernel_sync_dicache", "__kernel_sync_dicache_p5" }, -#ifdef CONFIG_PPC32 - { - CPU_FTR_USE_RTC, CPU_FTR_USE_RTC, - "__kernel_gettimeofday", NULL - }, - { - CPU_FTR_USE_RTC, CPU_FTR_USE_RTC, - "__kernel_clock_gettime", NULL - }, - { - CPU_FTR_USE_RTC, CPU_FTR_USE_RTC, - "__kernel_clock_getres", NULL - }, - { - CPU_FTR_USE_RTC, CPU_FTR_USE_RTC, - "__kernel_get_tbfreq", NULL - }, - { - CPU_FTR_USE_RTC, CPU_FTR_USE_RTC, - "__kernel_time", NULL - }, -#endif }; /* diff --git a/arch/powerpc/kernel/vdso32/datapage.S b/arch/powerpc/kernel/vdso32/datapage.S index 6984125b9fc0..6c7401bd284e 100644 --- a/arch/powerpc/kernel/vdso32/datapage.S +++ b/arch/powerpc/kernel/vdso32/datapage.S @@ -70,6 +70,7 @@ V_FUNCTION_END(__kernel_get_syscall_map) * * returns the timebase frequency in HZ */ +#ifndef CONFIG_PPC_BOOK3S_601 V_FUNCTION_BEGIN(__kernel_get_tbfreq) .cfi_startproc mflr r12 @@ -82,3 +83,4 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq) blr .cfi_endproc V_FUNCTION_END(__kernel_get_tbfreq) +#endif diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S index 099a6db14e67..00c025ba4a92 100644 --- a/arch/powerpc/kernel/vdso32/vdso32.lds.S +++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S @@ -144,10 +144,13 @@ VERSION __kernel_datapage_offset; __kernel_get_syscall_map; +#ifndef CONFIG_PPC_BOOK3S_601 __kernel_gettimeofday; __kernel_clock_gettime; __kernel_clock_getres; + __kernel_time; __kernel_get_tbfreq; +#endif __kernel_sync_dicache; __kernel_sync_dicache_p5; __kernel_sigtramp32; @@ -155,7 +158,6 @@ VERSION #ifdef CONFIG_PPC64 __kernel_getcpu; #endif - __kernel_time; local: *; }; -- cgit v1.2.3 From 39097b9c6d762d3fcd6f753e05ee3e34ec250ff3 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 26 Aug 2019 15:52:17 +0000 Subject: powerpc/32s: use CONFIG_PPC_BOOK3S_601 instead of reading PVR Use CONFIG_PPC_BOOK3S_601 instead of reading PVR to know if it is a 601 or not. 
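To make the removed runtime test concrete: the mfspr/rlwinm sequence keeps the top half of the PVR, which holds the processor version (1 on a 601, 4 on a 604). A user-space sketch of the decode; the PVR argument is a stand-in, since mfspr SPRN_PVR is privileged:

  #include <stdio.h>
  #include <stdint.h>

  static unsigned int pvr_version(uint32_t pvr)
  {
          return pvr >> 16;     /* what rlwinm r9,r9,16,16,31 extracts */
  }

  int main(void)
  {
          printf("601? %d\n", pvr_version(0x00010001) == 1);
          printf("604? %d\n", pvr_version(0x00040103) == 4);
          return 0;
  }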
Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/909c26db9facd7fe454695b303f952e019dd9eda.1566834712.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/head_32.S | 49 +++++++++++++++++++------------------------ arch/powerpc/kernel/misc_32.S | 6 ++---- 2 files changed, 24 insertions(+), 31 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index 9e6f01abb31e..4a24f8f026c7 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -34,7 +34,16 @@ #include "head_32.h" -/* 601 only have IBAT; cr0.eq is set on 601 when using this macro */ +/* 601 only have IBAT */ +#ifdef CONFIG_PPC_BOOK3S_601 +#define LOAD_BAT(n, reg, RA, RB) \ + li RA,0; \ + mtspr SPRN_IBAT##n##U,RA; \ + lwz RA,(n*16)+0(reg); \ + lwz RB,(n*16)+4(reg); \ + mtspr SPRN_IBAT##n##U,RA; \ + mtspr SPRN_IBAT##n##L,RB +#else #define LOAD_BAT(n, reg, RA, RB) \ /* see the comment for clear_bats() -- Cort */ \ li RA,0; \ @@ -44,12 +53,11 @@ lwz RB,(n*16)+4(reg); \ mtspr SPRN_IBAT##n##U,RA; \ mtspr SPRN_IBAT##n##L,RB; \ - beq 1f; \ lwz RA,(n*16)+8(reg); \ lwz RB,(n*16)+12(reg); \ mtspr SPRN_DBAT##n##U,RA; \ - mtspr SPRN_DBAT##n##L,RB; \ -1: + mtspr SPRN_DBAT##n##L,RB +#endif __HEAD .stabs "arch/powerpc/kernel/",N_SO,0,0,0f @@ -820,9 +828,6 @@ load_up_mmu: /* Load the BAT registers with the values set up by MMU_init. MMU_init takes care of whether we're on a 601 or not. */ - mfpvr r3 - srwi r3,r3,16 - cmpwi r3,1 lis r3,BATS@ha addi r3,r3,BATS@l tophys(r3,r3) @@ -998,11 +1003,8 @@ EXPORT_SYMBOL(switch_mmu_context) */ clear_bats: li r10,0 - mfspr r9,SPRN_PVR - rlwinm r9,r9,16,16,31 /* r9 = 1 for 601, 4 for 604 */ - cmpwi r9, 1 - beq 1f +#ifndef CONFIG_PPC_BOOK3S_601 mtspr SPRN_DBAT0U,r10 mtspr SPRN_DBAT0L,r10 mtspr SPRN_DBAT1U,r10 @@ -1011,7 +1013,7 @@ clear_bats: mtspr SPRN_DBAT2L,r10 mtspr SPRN_DBAT3U,r10 mtspr SPRN_DBAT3L,r10 -1: +#endif mtspr SPRN_IBAT0U,r10 mtspr SPRN_IBAT0L,r10 mtspr SPRN_IBAT1U,r10 @@ -1106,10 +1108,7 @@ mmu_off: */ initial_bats: lis r11,PAGE_OFFSET@h - mfspr r9,SPRN_PVR - rlwinm r9,r9,16,16,31 /* r9 = 1 for 601, 4 for 604 */ - cmpwi 0,r9,1 - bne 4f +#ifdef CONFIG_PPC_BOOK3S_601 ori r11,r11,4 /* set up BAT registers for 601 */ li r8,0x7f /* valid, block length = 8MB */ mtspr SPRN_IBAT0U,r11 /* N.B. 
601 has valid bit in */ @@ -1122,10 +1121,8 @@ initial_bats: addis r8,r8,0x800000@h mtspr SPRN_IBAT2U,r11 mtspr SPRN_IBAT2L,r8 - isync - blr - -4: tophys(r8,r11) +#else + tophys(r8,r11) #ifdef CONFIG_SMP ori r8,r8,0x12 /* R/W access, M=1 */ #else @@ -1137,10 +1134,10 @@ initial_bats: mtspr SPRN_DBAT0U,r11 /* bit in upper BAT register */ mtspr SPRN_IBAT0L,r8 mtspr SPRN_IBAT0U,r11 +#endif isync blr - #ifdef CONFIG_BOOTX_TEXT setup_disp_bat: /* @@ -1155,15 +1152,13 @@ setup_disp_bat: beqlr lwz r11,0(r8) lwz r8,4(r8) - mfspr r9,SPRN_PVR - rlwinm r9,r9,16,16,31 /* r9 = 1 for 601, 4 for 604 */ - cmpwi 0,r9,1 - beq 1f +#ifndef CONFIG_PPC_BOOK3S_601 mtspr SPRN_DBAT3L,r8 mtspr SPRN_DBAT3U,r11 - blr -1: mtspr SPRN_IBAT3L,r8 +#else + mtspr SPRN_IBAT3L,r8 mtspr SPRN_IBAT3U,r11 +#endif blr #endif /* CONFIG_BOOTX_TEXT */ diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index 02d90e1ebf65..b917641cdaa6 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -303,11 +303,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_UNIFIED_ID_CACHE) mfspr r3,SPRN_L1CSR1 ori r3,r3,L1CSR1_ICFI|L1CSR1_ICLFR mtspr SPRN_L1CSR1,r3 +#elif defined(CONFIG_PPC_BOOK3S_601) + blr /* for 601, do nothing */ #else - mfspr r3,SPRN_PVR - rlwinm r3,r3,16,16,31 - cmpwi 0,r3,1 - beqlr /* for 601, do nothing */ /* 603/604 processor - use invalidate-all bit in HID0 */ mfspr r3,SPRN_HID0 ori r3,r3,HID0_ICFI -- cgit v1.2.3 From e0291f1decd6e8d447067f7d2cf01b1091b7cb3f Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 26 Aug 2019 15:52:18 +0000 Subject: powerpc/32: drop CPU_FTR_UNIFIED_ID_CACHE Only 601 and e200 have unified I/D cache. Drop the feature and use CONFIG_PPC_BOOK3S_601 and CONFIG_E200. Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/b5902144266d2f4eed1ffea53915bd0245841e02.1566834712.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/misc_32.S | 4 ++-- arch/powerpc/kernel/setup_32.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index b917641cdaa6..3d21fb110797 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -292,14 +292,14 @@ _GLOBAL(flush_instruction_cache) iccci 0,r3 #endif #elif defined(CONFIG_FSL_BOOKE) -BEGIN_FTR_SECTION +#ifdef CONFIG_E200 mfspr r3,SPRN_L1CSR0 ori r3,r3,L1CSR0_CFI|L1CSR0_CLFC /* msync; isync recommended here */ mtspr SPRN_L1CSR0,r3 isync blr -END_FTR_SECTION_IFSET(CPU_FTR_UNIFIED_ID_CACHE) +#endif mfspr r3,SPRN_L1CSR1 ori r3,r3,L1CSR1_ICFI|L1CSR1_ICLFR mtspr SPRN_L1CSR1,r3 diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c index 94517e4a2723..a7541edf0cdb 100644 --- a/arch/powerpc/kernel/setup_32.c +++ b/arch/powerpc/kernel/setup_32.c @@ -206,6 +206,6 @@ __init void initialize_cache_info(void) dcache_bsize = cur_cpu_spec->dcache_bsize; icache_bsize = cur_cpu_spec->icache_bsize; ucache_bsize = 0; - if (cpu_has_feature(CPU_FTR_UNIFIED_ID_CACHE)) + if (IS_ENABLED(CONFIG_PPC_BOOK3S_601) || IS_ENABLED(CONFIG_E200)) ucache_bsize = icache_bsize = dcache_bsize; } -- cgit v1.2.3 From c7bf1252d5b3891b4ab7072b240a8422fb9da793 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 26 Aug 2019 15:52:19 +0000 Subject: powerpc/32: don't use CPU_FTR_COHERENT_ICACHE Only 601 and E200 have CPU_FTR_COHERENT_ICACHE. Just use #ifdefs instead of feature fixup. 
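The shape of this change, modeled in C: the runtime feature fixup becomes a plain preprocessor decision, so on a unified-cache part the flush body compiles away entirely. TEST_UNIFIED_CACHE below is an illustrative stand-in for CONFIG_PPC_BOOK3S_601/CONFIG_E200:

  #include <stdio.h>

  #ifdef TEST_UNIFIED_CACHE
  #define COHERENT_ICACHE 1
  #else
  #define COHERENT_ICACHE 0
  #endif

  static void flush_icache_range_sketch(unsigned long start, unsigned long stop)
  {
  #if COHERENT_ICACHE
          (void)start; (void)stop;      /* 601/e200: nothing to do */
  #else
          printf("flush %#lx..%#lx\n", start, stop);
  #endif
  }

  int main(void)
  {
          flush_icache_range_sketch(0x1000, 0x2000);
          return 0;
  }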
Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/5f3e92ccd64d06477b27626f6007a9da3b8da157.1566834712.git.christophe.leroy@c-s.fr --- arch/powerpc/kernel/misc_32.S | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index 3d21fb110797..82df4b09e79f 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -324,10 +324,10 @@ EXPORT_SYMBOL(flush_instruction_cache) * flush_icache_range(unsigned long start, unsigned long stop) */ _GLOBAL(flush_icache_range) -BEGIN_FTR_SECTION +#if defined(CONFIG_PPC_BOOK3S_601) || defined(CONFIG_E200) PURGE_PREFETCHED_INS - blr /* for 601, do nothing */ -END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE) + blr /* for 601 and e200, do nothing */ +#else rlwinm r3,r3,0,0,31 - L1_CACHE_SHIFT subf r4,r3,r4 addi r4,r4,L1_CACHE_BYTES - 1 @@ -353,6 +353,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE) sync /* additional sync needed on g4 */ isync blr +#endif _ASM_NOKPROBE_SYMBOL(flush_icache_range) EXPORT_SYMBOL(flush_icache_range) @@ -360,15 +361,15 @@ EXPORT_SYMBOL(flush_icache_range) * Flush a particular page from the data cache to RAM. * Note: this is necessary because the instruction cache does *not* * snoop from the data cache. - * This is a no-op on the 601 which has a unified cache. + * This is a no-op on the 601 and e200 which have a unified cache. * * void __flush_dcache_icache(void *page) */ _GLOBAL(__flush_dcache_icache) -BEGIN_FTR_SECTION +#if defined(CONFIG_PPC_BOOK3S_601) || defined(CONFIG_E200) PURGE_PREFETCHED_INS blr -END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE) +#else rlwinm r3,r3,0,0,31-PAGE_SHIFT /* Get page base address */ li r4,PAGE_SIZE/L1_CACHE_BYTES /* Number of lines in a page */ mtctr r4 @@ -396,6 +397,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_44x) sync isync blr +#endif #ifndef CONFIG_BOOKE /* @@ -407,10 +409,10 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_44x) * void __flush_dcache_icache_phys(unsigned long physaddr) */ _GLOBAL(__flush_dcache_icache_phys) -BEGIN_FTR_SECTION +#if defined(CONFIG_PPC_BOOK3S_601) || defined(CONFIG_E200) PURGE_PREFETCHED_INS - blr /* for 601, do nothing */ -END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE) + blr /* for 601 and e200, do nothing */ +#else mfmsr r10 rlwinm r0,r10,0,28,26 /* clear DR */ mtmsr r0 @@ -431,6 +433,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE) mtmsr r10 /* restore DR */ isync blr +#endif #endif /* CONFIG_BOOKE */ /* -- cgit v1.2.3 From facd04a904ff6cdc6ee85d6e85d500f478a1bec4 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Tue, 27 Aug 2019 13:30:06 +1000 Subject: powerpc: convert to copy_thread_tls Commit 3033f14ab78c3 ("clone: support passing tls argument via C rather than pt_regs magic") introduced the HAVE_COPY_THREAD_TLS option. Use it to avoid a subtle assumption about the argument ordering of clone type syscalls. 
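The subtle assumption, spelled out: the old copy_thread() recovered the TLS pointer from the r6 slot of the child's saved register image, the position powerpc's clone() convention happens to give that argument, while copy_thread_tls() receives it explicitly. A stripped-down contrast with a simplified pt_regs:

  #include <stdio.h>

  struct regs { unsigned long gpr[32]; };       /* simplified pt_regs */

  /* old: tls fished implicitly out of the register image */
  static unsigned long tls_old(const struct regs *childregs)
  {
          return childregs->gpr[6];
  }

  /* new: tls arrives as an explicit argument, no layout assumption */
  static unsigned long tls_new(unsigned long tls)
  {
          return tls;
  }

  int main(void)
  {
          struct regs r = { .gpr = { [6] = 0x1000 } };
          printf("%lx %lx\n", tls_old(&r), tls_new(0x1000));
          return 0;
  }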
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190827033010.28090-2-npiggin@gmail.com --- arch/powerpc/kernel/process.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 8fc4de0d22b4..24621e7e5033 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1600,8 +1600,9 @@ static void setup_ksp_vsid(struct task_struct *p, unsigned long sp) /* * Copy architecture-specific thread state */ -int copy_thread(unsigned long clone_flags, unsigned long usp, - unsigned long kthread_arg, struct task_struct *p) +int copy_thread_tls(unsigned long clone_flags, unsigned long usp, + unsigned long kthread_arg, struct task_struct *p, + unsigned long tls) { struct pt_regs *childregs, *kregs; extern void ret_from_fork(void); @@ -1642,10 +1643,10 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, if (clone_flags & CLONE_SETTLS) { #ifdef CONFIG_PPC64 if (!is_32bit_task()) - childregs->gpr[13] = childregs->gpr[6]; + childregs->gpr[13] = tls; else #endif - childregs->gpr[2] = childregs->gpr[6]; + childregs->gpr[2] = tls; } f = ret_from_fork; -- cgit v1.2.3 From 555e28179d37e431e97d78d106fc917bec2c6f93 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Tue, 27 Aug 2019 13:30:07 +1000 Subject: powerpc/64: remove support for kernel-mode syscalls There is support for the kernel to execute the 'sc 0' instruction and make a system call to itself. This is a relic that is unused in the tree, therefore untested. It's also highly questionable for modules to be doing this. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190827033010.28090-3-npiggin@gmail.com --- arch/powerpc/kernel/entry_64.S | 21 ++++++--------------- arch/powerpc/kernel/exceptions-64s.S | 2 -- 2 files changed, 6 insertions(+), 17 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 0a0b5310f54a..6467bdab8d40 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -69,24 +69,20 @@ BEGIN_FTR_SECTION bne .Ltabort_syscall END_FTR_SECTION_IFSET(CPU_FTR_TM) #endif - andi. 
r10,r12,MSR_PR mr r10,r1 - addi r1,r1,-INT_FRAME_SIZE - beq- 1f ld r1,PACAKSAVE(r13) -1: std r10,0(r1) + std r10,0(r1) std r11,_NIP(r1) std r12,_MSR(r1) std r0,GPR0(r1) std r10,GPR1(r1) - beq 2f /* if from kernel mode */ #ifdef CONFIG_PPC_FSL_BOOK3E START_BTB_FLUSH_SECTION BTB_FLUSH(r10) END_BTB_FLUSH_SECTION #endif ACCOUNT_CPU_USER_ENTRY(r13, r10, r11) -2: std r2,GPR2(r1) + std r2,GPR2(r1) std r3,GPR3(r1) mfcr r2 std r4,GPR4(r1) @@ -122,14 +118,13 @@ END_BTB_FLUSH_SECTION #if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR) BEGIN_FW_FTR_SECTION - beq 33f - /* if from user, see if there are any DTL entries to process */ + /* see if there are any DTL entries to process */ ld r10,PACALPPACAPTR(r13) /* get ptr to VPA */ ld r11,PACA_DTL_RIDX(r13) /* get log read index */ addi r10,r10,LPPACA_DTLIDX LDX_BE r10,0,r10 /* get log write index */ - cmpd cr1,r11,r10 - beq+ cr1,33f + cmpd r11,r10 + beq+ 33f bl accumulate_stolen_time REST_GPR(0,r1) REST_4GPRS(3,r1) @@ -203,6 +198,7 @@ system_call: /* label this so stack traces look sane */ mtctr r12 bctrl /* Call handler */ + /* syscall_exit can exit to kernel mode, via ret_from_kernel_thread */ .Lsyscall_exit: std r3,RESULT(r1) @@ -216,11 +212,6 @@ system_call: /* label this so stack traces look sane */ ld r12, PACA_THREAD_INFO(r13) ld r8,_MSR(r1) -#ifdef CONFIG_PPC_BOOK3S - /* No MSR:RI on BookE */ - andi. r10,r8,MSR_RI - beq- .Lunrecov_restore -#endif /* * This is a few instructions into the actual syscall exit path (which actually diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index dded4672579d..520804351601 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1522,8 +1522,6 @@ EXC_COMMON(trap_0b_common, 0xb00, unknown_exception) * system call / hypercall (0xc00, 0x4c00) * * The system call exception is invoked with "sc 0" and does not alter HV bit. - * There is support for kernel code to invoke system calls but there are no - * in-tree users. * * The hypercall is invoked with "sc 1" and sets HV=1. * -- cgit v1.2.3 From bc605cd79edb68131d3be5b00b949aa312277d39 Mon Sep 17 00:00:00 2001 From: Alexey Kardashevskiy Date: Thu, 29 Aug 2019 18:44:17 +1000 Subject: powerpc/of/pci: Rewrite pci_parse_of_flags The existing code uses bunch of hardcoded values from the PCI Bus Binding to IEEE Std 1275 spec; and it does so in quite non-obvious way. This defines fields from the cell#0 of the "reg" property of a PCI device and uses them for parsing. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy [mpe: Unsplit some 80/81 char lines, space the code with some newlines] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190829084417.71873-1-aik@ozlabs.ru --- arch/powerpc/kernel/pci_of_scan.c | 66 ++++++++++++++++++++++++++++++++------- 1 file changed, 55 insertions(+), 11 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c index 409c6c1beabf..f91d7e94872e 100644 --- a/arch/powerpc/kernel/pci_of_scan.c +++ b/arch/powerpc/kernel/pci_of_scan.c @@ -34,31 +34,75 @@ static u32 get_int_prop(struct device_node *np, const char *name, u32 def) * pci_parse_of_flags - Parse the flags cell of a device tree PCI address * @addr0: value of 1st cell of a device tree PCI address. 
* @bridge: Set this flag if the address is from a bridge 'ranges' property + * + * PCI Bus Binding to IEEE Std 1275-1994 + * + * Bit# 33222222 22221111 11111100 00000000 + * 10987654 32109876 54321098 76543210 + * phys.hi cell: npt000ss bbbbbbbb dddddfff rrrrrrrr + * phys.mid cell: hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh + * phys.lo cell: llllllll llllllll llllllll llllllll + * + * where: + * n is 0 if the address is relocatable, 1 otherwise + * p is 1 if the addressable region is "prefetchable", 0 otherwise + * t is 1 if the address is aliased (for non-relocatable I/O), + * below 1 MB (for Memory),or below 64 KB (for relocatable I/O). + * ss is the space code, denoting the address space: + * 00 denotes Configuration Space + * 01 denotes I/O Space + * 10 denotes 32-bit-address Memory Space + * 11 denotes 64-bit-address Memory Space + * bbbbbbbb is the 8-bit Bus Number + * ddddd is the 5-bit Device Number + * fff is the 3-bit Function Number + * rrrrrrrr is the 8-bit Register Number */ +#define OF_PCI_ADDR0_SPACE(ss) (((ss)&3)<<24) +#define OF_PCI_ADDR0_SPACE_CFG OF_PCI_ADDR0_SPACE(0) +#define OF_PCI_ADDR0_SPACE_IO OF_PCI_ADDR0_SPACE(1) +#define OF_PCI_ADDR0_SPACE_MMIO32 OF_PCI_ADDR0_SPACE(2) +#define OF_PCI_ADDR0_SPACE_MMIO64 OF_PCI_ADDR0_SPACE(3) +#define OF_PCI_ADDR0_SPACE_MASK OF_PCI_ADDR0_SPACE(3) +#define OF_PCI_ADDR0_RELOC (1UL<<31) +#define OF_PCI_ADDR0_PREFETCH (1UL<<30) +#define OF_PCI_ADDR0_ALIAS (1UL<<29) +#define OF_PCI_ADDR0_BUS 0x00FF0000UL +#define OF_PCI_ADDR0_DEV 0x0000F800UL +#define OF_PCI_ADDR0_FN 0x00000700UL +#define OF_PCI_ADDR0_BARREG 0x000000FFUL + unsigned int pci_parse_of_flags(u32 addr0, int bridge) { - unsigned int flags = 0; + unsigned int flags = 0, as = addr0 & OF_PCI_ADDR0_SPACE_MASK; - if (addr0 & 0x02000000) { + if (as == OF_PCI_ADDR0_SPACE_MMIO32 || as == OF_PCI_ADDR0_SPACE_MMIO64) { flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY; - flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64; - if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64) - flags |= IORESOURCE_MEM_64; - flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M; - if (addr0 & 0x40000000) - flags |= IORESOURCE_PREFETCH - | PCI_BASE_ADDRESS_MEM_PREFETCH; + + if (as == OF_PCI_ADDR0_SPACE_MMIO64) + flags |= PCI_BASE_ADDRESS_MEM_TYPE_64 | IORESOURCE_MEM_64; + + if (addr0 & OF_PCI_ADDR0_ALIAS) + flags |= PCI_BASE_ADDRESS_MEM_TYPE_1M; + + if (addr0 & OF_PCI_ADDR0_PREFETCH) + flags |= IORESOURCE_PREFETCH | + PCI_BASE_ADDRESS_MEM_PREFETCH; + /* Note: We don't know whether the ROM has been left enabled * by the firmware or not. We mark it as disabled (ie, we do * not set the IORESOURCE_ROM_ENABLE flag) for now rather than * do a config space read, it will be force-enabled if needed */ - if (!bridge && (addr0 & 0xff) == 0x30) + if (!bridge && (addr0 & OF_PCI_ADDR0_BARREG) == PCI_ROM_ADDRESS) flags |= IORESOURCE_READONLY; - } else if (addr0 & 0x01000000) + + } else if (as == OF_PCI_ADDR0_SPACE_IO) flags = IORESOURCE_IO | PCI_BASE_ADDRESS_SPACE_IO; + if (flags) flags |= IORESOURCE_SIZEALIGN; + return flags; } -- cgit v1.2.3 From 35872480da47ec714fd9c4f2f3d2d83daf304851 Mon Sep 17 00:00:00 2001 From: Alexey Kardashevskiy Date: Thu, 29 Aug 2019 18:52:48 +1000 Subject: powerpc/powernv/ioda: Split out TCE invalidation from TCE updates At the moment updates in a TCE table are made by iommu_table_ops::exchange which update one TCE and invalidates an entry in the PHB/NPU TCE cache via set of registers called "TCE Kill" (hence the naming). 
Writing a TCE is a simple xchg() but invalidating the TCE cache is a relatively expensive OPAL call. Mapping a 100GB guest with PCI+NPU passed through devices takes about 20s. Thankfully we can do better. Since such big mappings happen at the boot time and when memory is plugged/onlined (i.e. not often), these requests come in 512 pages so we call call OPAL 512 times less which brings 20s from the above to less than 10s. Also, since TCE caches can be flushed entirely, calling OPAL for 512 TCEs helps skiboot [1] to decide whether to flush the entire cache or not. This implements 2 new iommu_table_ops callbacks: - xchg_no_kill() to update a single TCE with no TCE invalidation; - tce_kill() to invalidate multiple TCEs. This uses the same xchg_no_kill() callback for IODA1/2. This implements 2 new wrappers on top of the new callbacks similar to the existing iommu_tce_xchg(). This does not use the new callbacks yet, the next patches will; so this should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190829085252.72370-2-aik@ozlabs.ru --- arch/powerpc/kernel/iommu.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index 0a67ce9f827e..145f29cf7e4c 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -1005,6 +1005,33 @@ long iommu_tce_xchg(struct mm_struct *mm, struct iommu_table *tbl, } EXPORT_SYMBOL_GPL(iommu_tce_xchg); +extern long iommu_tce_xchg_no_kill(struct mm_struct *mm, + struct iommu_table *tbl, + unsigned long entry, unsigned long *hpa, + enum dma_data_direction *direction) +{ + long ret; + unsigned long size = 0; + + ret = tbl->it_ops->xchg_no_kill(tbl, entry, hpa, direction, false); + if (!ret && ((*direction == DMA_FROM_DEVICE) || + (*direction == DMA_BIDIRECTIONAL)) && + !mm_iommu_is_devmem(mm, *hpa, tbl->it_page_shift, + &size)) + SetPageDirty(pfn_to_page(*hpa >> PAGE_SHIFT)); + + return ret; +} +EXPORT_SYMBOL_GPL(iommu_tce_xchg_no_kill); + +void iommu_tce_kill(struct iommu_table *tbl, + unsigned long entry, unsigned long pages) +{ + if (tbl->it_ops->tce_kill) + tbl->it_ops->tce_kill(tbl, entry, pages, false); +} +EXPORT_SYMBOL_GPL(iommu_tce_kill); + int iommu_take_ownership(struct iommu_table *tbl) { unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; -- cgit v1.2.3 From a102f139aac54689eeb05883952742ae780159f3 Mon Sep 17 00:00:00 2001 From: Alexey Kardashevskiy Date: Thu, 29 Aug 2019 18:52:52 +1000 Subject: powerpc/powernv/ioda: Remove obsolete iommu_table_ops::exchange callbacks As now we have xchg_no_kill/tce_kill, these are not used anymore so remove them. 
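The ownership rule that falls out of this (see the iommu_take_ownership() hunk below) reads naturally in miniature: a table is usable by an external owner only if its ops provide the new callback. A self-contained sketch of that NULL-callback test, names trimmed and -EINVAL written out:

  #include <stdio.h>

  struct ops   { long (*xchg_no_kill)(void); };
  struct table { const struct ops *it_ops; };

  static long dummy_xchg(void) { return 0; }

  static int take_ownership(const struct table *tbl)
  {
          if (!tbl->it_ops->xchg_no_kill)
                  return -22;   /* -EINVAL: callback not implemented */
          return 0;
  }

  int main(void)
  {
          const struct ops none = { 0 }, good = { dummy_xchg };
          const struct table a = { &none }, b = { &good };
          printf("%d %d\n", take_ownership(&a), take_ownership(&b));
          return 0;
  }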
Signed-off-by: Alexey Kardashevskiy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190829085252.72370-6-aik@ozlabs.ru --- arch/powerpc/kernel/iommu.c | 26 +------------------------- 1 file changed, 1 insertion(+), 25 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index 145f29cf7e4c..bf803000e4b3 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -981,30 +981,6 @@ int iommu_tce_check_gpa(unsigned long page_shift, unsigned long gpa) } EXPORT_SYMBOL_GPL(iommu_tce_check_gpa); -long iommu_tce_xchg(struct mm_struct *mm, struct iommu_table *tbl, - unsigned long entry, unsigned long *hpa, - enum dma_data_direction *direction) -{ - long ret; - unsigned long size = 0; - - ret = tbl->it_ops->exchange(tbl, entry, hpa, direction); - - if (!ret && ((*direction == DMA_FROM_DEVICE) || - (*direction == DMA_BIDIRECTIONAL)) && - !mm_iommu_is_devmem(mm, *hpa, tbl->it_page_shift, - &size)) - SetPageDirty(pfn_to_page(*hpa >> PAGE_SHIFT)); - - /* if (unlikely(ret)) - pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n", - __func__, hwaddr, entry << tbl->it_page_shift, - hwaddr, ret); */ - - return ret; -} -EXPORT_SYMBOL_GPL(iommu_tce_xchg); - extern long iommu_tce_xchg_no_kill(struct mm_struct *mm, struct iommu_table *tbl, unsigned long entry, unsigned long *hpa, @@ -1044,7 +1020,7 @@ int iommu_take_ownership(struct iommu_table *tbl) * requires exchange() callback defined so if it is not * implemented, we disallow taking ownership over the table. */ - if (!tbl->it_ops->exchange) + if (!tbl->it_ops->xchg_no_kill) return -EINVAL; spin_lock_irqsave(&tbl->large_pool.lock, flags); -- cgit v1.2.3 From 70ed86f4de5bd74dd2d884dcd2f3275c4cfe665f Mon Sep 17 00:00:00 2001 From: Claudio Carvalho Date: Thu, 29 Aug 2019 12:50:20 -0300 Subject: powerpc: Add PowerPC Capabilities ELF note Add the PowerPC name and the PPC_ELFNOTE_CAPABILITIES type in the kernel binary ELF note. This type is a bitmap that can be used to advertise kernel capabilities to userland. This patch also defines PPCCAP_ULTRAVISOR_BIT as being the bit zero. Suggested-by: Paul Mackerras Signed-off-by: Claudio Carvalho [ maxiwell: Define the 'PowerPC' type in the elfnote.h ] Signed-off-by: Maxiwell S. Garcia Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190829155021.2915-2-maxiwell@linux.ibm.com --- arch/powerpc/kernel/Makefile | 2 +- arch/powerpc/kernel/note.S | 40 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 41 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/kernel/note.S (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index ea0c69236789..d4eb50de13b1 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -53,7 +53,7 @@ obj-y := cputable.o ptrace.o syscalls.o \ dma-common.o obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ signal_64.o ptrace32.o \ - paca.o nvram_64.o firmware.o + paca.o nvram_64.o firmware.o note.o obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o diff --git a/arch/powerpc/kernel/note.S b/arch/powerpc/kernel/note.S new file mode 100644 index 000000000000..bcdad15395dd --- /dev/null +++ b/arch/powerpc/kernel/note.S @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * PowerPC ELF notes. + * + * Copyright 2019, IBM Corporation + */ + +#include +#include + +/* + * Ultravisor-capable bit (PowerNV only). 
+ * + * Bit 0 indicates that the powerpc kernel binary knows how to run in an + * ultravisor-enabled system. + * + * In an ultravisor-enabled system, some machine resources are now controlled + * by the ultravisor. If the kernel is not ultravisor-capable, but it ends up + * being run on a machine with ultravisor, the kernel will probably crash + * trying to access ultravisor resources. For instance, it may crash in early + * boot trying to set the partition table entry 0. + * + * In an ultravisor-enabled system, a bootloader could warn the user or prevent + * the kernel from being run if the PowerPC ultravisor capability doesn't exist + * or the Ultravisor-capable bit is not set. + */ +#ifdef CONFIG_PPC_POWERNV +#define PPCCAP_ULTRAVISOR_BIT (1 << 0) +#else +#define PPCCAP_ULTRAVISOR_BIT 0 +#endif + +/* + * Add the PowerPC Capabilities in the binary ELF note. It is a bitmap that + * can be used to advertise kernel capabilities to userland. + */ +#define PPC_CAPABILITIES_BITMAP (PPCCAP_ULTRAVISOR_BIT) + +ELFNOTE(PowerPC, PPC_ELFNOTE_CAPABILITIES, + .long PPC_CAPABILITIES_BITMAP) -- cgit v1.2.3 From a49dddbdb0cca1d00fc9251e543a0aac09a6a65b Mon Sep 17 00:00:00 2001 From: Claudio Carvalho Date: Thu, 22 Aug 2019 00:48:33 -0300 Subject: powerpc/kernel: Add ucall_norets() ultravisor call handler The ultracalls (ucalls for short) allow the Secure Virtual Machines (SVM)s and hypervisor to request services from the ultravisor such as accessing a register or memory region that can only be accessed when running in ultravisor-privileged mode. This patch adds the ucall_norets() ultravisor call handler. The specific service needed from an ucall is specified in register R3 (the first parameter to the ucall). Other parameters to the ucall, if any, are specified in registers R4 through R12. Return value of all ucalls is in register R3. Other output values from the ucall, if any, are returned in registers R4 through R12. Each ucall returns specific error codes, applicable in the context of the ucall. However, like with the PowerPC Architecture Platform Reference (PAPR), if no specific error code is defined for a particular situation, then the ucall will fallback to an erroneous parameter-position based code. i.e U_PARAMETER, U_P2, U_P3 etc depending on the ucall parameter that may have caused the error. Every host kernel (powernv) needs to be able to do ucalls in case it ends up being run in a machine with ultravisor enabled. Otherwise, the kernel may crash early in boot trying to access ultravisor resources, for instance, trying to set the partition table entry 0. Secure guests also need to be able to do ucalls and its kernel may not have CONFIG_PPC_POWERNV=y. For that reason, the ucall.S file is placed under arch/powerpc/kernel. If ultravisor is not enabled, the ucalls will be redirected to the hypervisor which must handle/fail the call. Thanks to inputs from Ram Pai and Michael Anderson. 
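From C, the convention just described can be written with the register-constraint idiom this series also uses for the ESM call. The sketch below is a powerpc64-only illustration, not the real ucall_norets() (which lives in ucall.S and appears in the diff), and the names are placeholders:

  /* Sketch, assuming a powerpc64 build running under an ultravisor. */
  static inline unsigned long ucall1(unsigned long opcode, unsigned long arg)
  {
          register unsigned long r3 asm("r3") = opcode; /* ucall number */
          register unsigned long r4 asm("r4") = arg;    /* first parameter */

          asm volatile("sc 2" : "+r"(r3) : "r"(r4) : "memory");
          return r3;    /* status / first return value */
  }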
Signed-off-by: Claudio Carvalho Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190822034838.27876-3-cclaudio@linux.ibm.com --- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/ucall.S | 14 ++++++++++++++ 2 files changed, 15 insertions(+) create mode 100644 arch/powerpc/kernel/ucall.S (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index d4eb50de13b1..934e64b28894 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -156,6 +156,7 @@ endif obj-$(CONFIG_EPAPR_PARAVIRT) += epapr_paravirt.o epapr_hcalls.o obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o +obj-$(CONFIG_PPC_POWERNV) += ucall.o # Disable GCOV, KCOV & sanitizers in odd or sensitive code GCOV_PROFILE_prom_init.o := n diff --git a/arch/powerpc/kernel/ucall.S b/arch/powerpc/kernel/ucall.S new file mode 100644 index 000000000000..07296bc39166 --- /dev/null +++ b/arch/powerpc/kernel/ucall.S @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Generic code to perform an ultravisor call. + * + * Copyright 2019, IBM Corporation. + * + */ +#include +#include + +_GLOBAL(ucall_norets) +EXPORT_SYMBOL_GPL(ucall_norets) + sc 2 /* Invoke the ultravisor */ + blr /* Return r3 = status */ -- cgit v1.2.3 From bb04ffe85eebebd64d5e673a9434d968e80f3aa1 Mon Sep 17 00:00:00 2001 From: Claudio Carvalho Date: Thu, 22 Aug 2019 00:48:34 -0300 Subject: powerpc/powernv: Introduce FW_FEATURE_ULTRAVISOR In PEF enabled systems, some of the resources which were previously hypervisor privileged are now ultravisor privileged and controlled by the ultravisor firmware. This adds FW_FEATURE_ULTRAVISOR to indicate if PEF is enabled. The host kernel can use FW_FEATURE_ULTRAVISOR, for instance, to skip accessing resources (e.g. PTCR and LDBAR) in case PEF is enabled. Signed-off-by: Claudio Carvalho [ andmike: Device node name to "ibm,ultravisor" ] Signed-off-by: Michael Anderson Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190822034838.27876-4-cclaudio@linux.ibm.com --- arch/powerpc/kernel/prom.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 7159e791a70d..5828f1c81dc9 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -55,6 +55,7 @@ #include #include #include +#include #include @@ -702,6 +703,9 @@ void __init early_init_devtree(void *params) #ifdef CONFIG_PPC_POWERNV /* Some machines might need OPAL info for debugging, grab it now. */ of_scan_flat_dt(early_init_dt_scan_opal, NULL); + + /* Scan tree for ultravisor feature */ + of_scan_flat_dt(early_init_dt_scan_ultravisor, NULL); #endif #ifdef CONFIG_FA_DUMP -- cgit v1.2.3 From 6c85b7bc637b64e681760f62c0eafba2f56745c6 Mon Sep 17 00:00:00 2001 From: Sukadev Bhattiprolu Date: Thu, 22 Aug 2019 00:48:38 -0300 Subject: powerpc/kvm: Use UV_RETURN ucall to return to ultravisor When an SVM makes an hypercall or incurs some other exception, the Ultravisor usually forwards (a.k.a. reflects) the exceptions to the Hypervisor. After processing the exception, Hypervisor uses the UV_RETURN ultracall to return control back to the SVM. The expected register state on entry to this ultracall is: * Non-volatile registers are restored to their original values. * If returning from an hypercall, register R0 contains the return value (unlike other ultracalls) and, registers R4 through R12 contain any output values of the hypercall. * R3 contains the ultracall number, i.e UV_RETURN. 
* If returning with a synthesized interrupt, R2 contains the synthesized interrupt number. Thanks to input from Paul Mackerras, Ram Pai and Mike Anderson. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Claudio Carvalho Acked-by: Paul Mackerras Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190822034838.27876-8-cclaudio@linux.ibm.com --- arch/powerpc/kernel/asm-offsets.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 4ccb6b3a7fbd..484f54dab247 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -506,6 +506,7 @@ int main(void) OFFSET(KVM_VRMA_SLB_V, kvm, arch.vrma_slb_v); OFFSET(KVM_RADIX, kvm, arch.radix); OFFSET(KVM_FWNMI, kvm, arch.fwnmi_enabled); + OFFSET(KVM_SECURE_GUEST, kvm, arch.secure_guest); OFFSET(VCPU_DSISR, kvm_vcpu, arch.shregs.dsisr); OFFSET(VCPU_DAR, kvm_vcpu, arch.shregs.dar); OFFSET(VCPU_VPA, kvm_vcpu, arch.vpa.pinned_addr); -- cgit v1.2.3 From 136bc0397ae21dbf63ca02e5775ad353a479cd2f Mon Sep 17 00:00:00 2001 From: Thiago Jung Bauermann Date: Mon, 19 Aug 2019 23:13:12 -0300 Subject: powerpc/pseries: Introduce option to build secure virtual machines Introduce CONFIG_PPC_SVM to control support for secure guests and include Ultravisor-related helpers when it is selected Signed-off-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190820021326.6884-3-bauerman@linux.ibm.com --- arch/powerpc/kernel/Makefile | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 934e64b28894..c6ae0e7914bc 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -156,7 +156,9 @@ endif obj-$(CONFIG_EPAPR_PARAVIRT) += epapr_paravirt.o epapr_hcalls.o obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o -obj-$(CONFIG_PPC_POWERNV) += ucall.o +ifneq ($(CONFIG_PPC_POWERNV)$(CONFIG_PPC_SVM),) +obj-y += ucall.o +endif # Disable GCOV, KCOV & sanitizers in odd or sensitive code GCOV_PROFILE_prom_init.o := n -- cgit v1.2.3 From 6a9c930bd7751bf0630d8b9b73b07af5c6842da6 Mon Sep 17 00:00:00 2001 From: Ram Pai Date: Mon, 19 Aug 2019 23:13:14 -0300 Subject: powerpc/prom_init: Add the ESM call to prom_init Make the Enter-Secure-Mode (ESM) ultravisor call to switch the VM to secure mode. Pass kernel base address and FDT address so that the Ultravisor is able to verify the integrity of the VM using information from the ESM blob. Add "svm=" command line option to turn on switching to secure mode. Signed-off-by: Ram Pai [ andmike: Generate an RTAS os-term hcall when the ESM ucall fails. ] Signed-off-by: Michael Anderson [ bauerman: Cleaned up the code a bit. 
] Signed-off-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190820021326.6884-5-bauerman@linux.ibm.com --- arch/powerpc/kernel/prom_init.c | 96 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 96 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index f2b63b4e1943..a4e7762dd286 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -40,6 +40,7 @@ #include #include #include +#include #include @@ -171,6 +172,10 @@ static bool __prombss prom_radix_disable; static bool __prombss prom_xive_disable; #endif +#ifdef CONFIG_PPC_SVM +static bool __prombss prom_svm_enable; +#endif + struct platform_support { bool hash_mmu; bool radix_mmu; @@ -812,6 +817,17 @@ static void __init early_cmdline_parse(void) prom_debug("XIVE disabled from cmdline\n"); } #endif /* CONFIG_PPC_PSERIES */ + +#ifdef CONFIG_PPC_SVM + opt = prom_strstr(prom_cmd_line, "svm="); + if (opt) { + bool val; + + opt += sizeof("svm=") - 1; + if (!prom_strtobool(opt, &val)) + prom_svm_enable = val; + } +#endif /* CONFIG_PPC_SVM */ } #ifdef CONFIG_PPC_PSERIES @@ -1712,6 +1728,43 @@ static void __init prom_close_stdin(void) } } +#ifdef CONFIG_PPC_SVM +static int prom_rtas_hcall(uint64_t args) +{ + register uint64_t arg1 asm("r3") = H_RTAS; + register uint64_t arg2 asm("r4") = args; + + asm volatile("sc 1\n" : "=r" (arg1) : + "r" (arg1), + "r" (arg2) :); + return arg1; +} + +static struct rtas_args __prombss os_term_args; + +static void __init prom_rtas_os_term(char *str) +{ + phandle rtas_node; + __be32 val; + u32 token; + + prom_debug("%s: start...\n", __func__); + rtas_node = call_prom("finddevice", 1, 1, ADDR("/rtas")); + prom_debug("rtas_node: %x\n", rtas_node); + if (!PHANDLE_VALID(rtas_node)) + return; + + val = 0; + prom_getprop(rtas_node, "ibm,os-term", &val, sizeof(val)); + token = be32_to_cpu(val); + prom_debug("ibm,os-term: %x\n", token); + if (token == 0) + prom_panic("Could not get token for ibm,os-term\n"); + os_term_args.token = cpu_to_be32(token); + prom_rtas_hcall((uint64_t)&os_term_args); +} +#endif /* CONFIG_PPC_SVM */ + /* * Allocate room for and instantiate RTAS */ @@ -3168,6 +3221,46 @@ static void unreloc_toc(void) #endif #endif +#ifdef CONFIG_PPC_SVM +/* + * Perform the Enter Secure Mode ultracall. + */ +static int enter_secure_mode(unsigned long kbase, unsigned long fdt) +{ + register unsigned long r3 asm("r3") = UV_ESM; + register unsigned long r4 asm("r4") = kbase; + register unsigned long r5 asm("r5") = fdt; + + asm volatile("sc 2" : "+r"(r3) : "r"(r4), "r"(r5)); + + return r3; +} + +/* + * Call the Ultravisor to transfer us to secure memory if we have an ESM blob. + */ +static void setup_secure_guest(unsigned long kbase, unsigned long fdt) +{ + int ret; + + if (!prom_svm_enable) + return; + + /* Switch to secure mode. */ + prom_printf("Switching to secure mode.\n"); + + ret = enter_secure_mode(kbase, fdt); + if (ret != U_SUCCESS) { + prom_printf("Returned %d from switching to secure mode.\n", ret); + prom_rtas_os_term("Switch to secure mode failed.\n"); + } +} +#else +static void setup_secure_guest(unsigned long kbase, unsigned long fdt) +{ +} +#endif /* CONFIG_PPC_SVM */ + /* * We enter here early on, when the Open Firmware prom is still * handling exceptions and the MMU hash table for us. 
@@ -3366,6 +3459,9 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, unreloc_toc(); #endif + /* Move to secure memory if we're supposed to be secure guests. */ + setup_secure_guest(kbase, hdr); + __start(hdr, kbase, 0, 0, 0, 0, 0); return 0; -- cgit v1.2.3 From e311a92da18cbdd4972dab0cda88b1b8484b8fef Mon Sep 17 00:00:00 2001 From: Thiago Jung Bauermann Date: Mon, 19 Aug 2019 23:13:17 -0300 Subject: powerpc/pseries: Add and use LPPACA_SIZE constant Helps document what the hard-coded number means. Also take the opportunity to fix an #endif comment. Suggested-by: Alexey Kardashevskiy Signed-off-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190820021326.6884-8-bauerman@linux.ibm.com --- arch/powerpc/kernel/paca.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c index e3ad8aa4730d..612fc87ef785 100644 --- a/arch/powerpc/kernel/paca.c +++ b/arch/powerpc/kernel/paca.c @@ -52,6 +52,8 @@ static void *__init alloc_paca_data(unsigned long size, unsigned long align, #ifdef CONFIG_PPC_PSERIES +#define LPPACA_SIZE 0x400 + /* * See asm/lppaca.h for more detail. * @@ -65,7 +67,7 @@ static inline void init_lppaca(struct lppaca *lppaca) *lppaca = (struct lppaca) { .desc = cpu_to_be32(0xd397d781), /* "LpPa" */ - .size = cpu_to_be16(0x400), + .size = cpu_to_be16(LPPACA_SIZE), .fpregs_in_use = 1, .slb_count = cpu_to_be16(64), .vmxregs_in_use = 0, @@ -75,19 +77,18 @@ static inline void init_lppaca(struct lppaca *lppaca) static struct lppaca * __init new_lppaca(int cpu, unsigned long limit) { struct lppaca *lp; - size_t size = 0x400; - BUILD_BUG_ON(size < sizeof(struct lppaca)); + BUILD_BUG_ON(sizeof(struct lppaca) > LPPACA_SIZE); if (early_cpu_has_feature(CPU_FTR_HVMODE)) return NULL; - lp = alloc_paca_data(size, 0x400, limit, cpu); + lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu); init_lppaca(lp); return lp; } -#endif /* CONFIG_PPC_BOOK3S */ +#endif /* CONFIG_PPC_PSERIES */ #ifdef CONFIG_PPC_BOOK3S_64 -- cgit v1.2.3 From bd104e6db6f0ad124e507a9ecf1a468efe5697db Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 19 Aug 2019 23:13:18 -0300 Subject: powerpc/pseries/svm: Use shared memory for LPPACA structures LPPACA structures need to be shared with the host. Hence they need to be in shared memory. Instead of allocating individual chunks of memory for a given structure from memblock, a contiguous chunk of memory is allocated and then converted into shared memory. Subsequent allocation requests will come from the contiguous chunk which will be always shared memory for all structures. While we are able to use a kmem_cache constructor for the Debug Trace Log, LPPACAs are allocated very early in the boot process (before SLUB is available) so we need to use a simpler scheme here. Introduce helper is_svm_platform() which uses the S bit of the MSR to tell whether we're running as a secure guest. 
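The helper's test is a single bit: the MSR S bit set means the CPU is executing in secure mode. Modeled in user space below; mfmsr is privileged, so the MSR value is a stand-in, and the bit position is an assumption that should be checked against MSR_S in asm/reg.h:

  #include <stdio.h>

  #define MSR_S (1UL << 22)     /* assumed bit position of the S bit */

  static int is_secure_guest(unsigned long msr)  /* msr: stand-in value */
  {
          return (msr & MSR_S) != 0;
  }

  int main(void)
  {
          printf("%d %d\n", is_secure_guest(0), is_secure_guest(MSR_S));
          return 0;
  }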
Signed-off-by: Anshuman Khandual Signed-off-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190820021326.6884-9-bauerman@linux.ibm.com --- arch/powerpc/kernel/paca.c | 43 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 42 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c index 612fc87ef785..949eceb254d8 100644 --- a/arch/powerpc/kernel/paca.c +++ b/arch/powerpc/kernel/paca.c @@ -14,6 +14,8 @@ #include #include #include +#include +#include #include "setup.h" @@ -54,6 +56,41 @@ static void *__init alloc_paca_data(unsigned long size, unsigned long align, #define LPPACA_SIZE 0x400 +static void *__init alloc_shared_lppaca(unsigned long size, unsigned long align, + unsigned long limit, int cpu) +{ + size_t shared_lppaca_total_size = PAGE_ALIGN(nr_cpu_ids * LPPACA_SIZE); + static unsigned long shared_lppaca_size; + static void *shared_lppaca; + void *ptr; + + if (!shared_lppaca) { + memblock_set_bottom_up(true); + + shared_lppaca = + memblock_alloc_try_nid(shared_lppaca_total_size, + PAGE_SIZE, MEMBLOCK_LOW_LIMIT, + limit, NUMA_NO_NODE); + if (!shared_lppaca) + panic("cannot allocate shared data"); + + memblock_set_bottom_up(false); + uv_share_page(PHYS_PFN(__pa(shared_lppaca)), + shared_lppaca_total_size >> PAGE_SHIFT); + } + + ptr = shared_lppaca + shared_lppaca_size; + shared_lppaca_size += size; + + /* + * This is very early in boot, so no harm done if the kernel crashes at + * this point. + */ + BUG_ON(shared_lppaca_size >= shared_lppaca_total_size); + + return ptr; +} + /* * See asm/lppaca.h for more detail. * @@ -83,7 +120,11 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned long limit) if (early_cpu_has_feature(CPU_FTR_HVMODE)) return NULL; - lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu); + if (is_secure_guest()) + lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu); + else + lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu); + init_lppaca(lp); return lp; -- cgit v1.2.3 From 256ba2c1689efd4f5383cf7ebe2f9970c198b79d Mon Sep 17 00:00:00 2001 From: Ram Pai Date: Mon, 19 Aug 2019 23:13:20 -0300 Subject: powerpc/pseries/svm: Unshare all pages before kexecing a new kernel A new kernel deserves a clean slate. Any pages shared with the hypervisor is unshared before invoking the new kernel. However there are exceptions. If the new kernel is invoked to dump the current kernel, or if there is a explicit request to preserve the state of the current kernel, unsharing of pages is skipped. NOTE: While testing crashkernel, make sure at least 256M is reserved for crashkernel. Otherwise SWIOTLB allocation will fail and crash kernel will fail to boot. 
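The exceptions collapse into one predicate; preserve_context and the crash (kdump) image type map to the two cases above. A boolean sketch of the test added to default_machine_kexec(), with shortened names:

  #include <stdio.h>

  /* Returns 1 when all shared pages must be unshared before kexec. */
  static int should_unshare(int secure_guest, int preserve_context,
                            int crash_kernel)
  {
          return secure_guest && !(preserve_context || crash_kernel);
  }

  int main(void)
  {
          printf("%d\n", should_unshare(1, 0, 0));  /* normal kexec: 1 */
          printf("%d\n", should_unshare(1, 0, 1));  /* kdump: 0 */
          printf("%d\n", should_unshare(0, 0, 0));  /* not secure: 0 */
          return 0;
  }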
Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190820021326.6884-11-bauerman@linux.ibm.com --- arch/powerpc/kernel/machine_kexec_64.c | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 18481b0e2788..04a7cba58eff 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -29,6 +29,8 @@ #include #include #include +#include +#include int default_machine_kexec_prepare(struct kimage *image) { @@ -327,6 +329,13 @@ void default_machine_kexec(struct kimage *image) #ifdef CONFIG_PPC_PSERIES kexec_paca.lppaca_ptr = NULL; #endif + + if (is_secure_guest() && !(image->preserve_context || + image->type == KEXEC_TYPE_CRASH)) { + uv_unshare_all_pages(); + printk("kexec: Unshared all shared pages.\n"); + } + paca_ptrs[kexec_paca.paca_index] = &kexec_paca; setup_paca(&kexec_paca); -- cgit v1.2.3 From 734560ac39aeb2516419c5878856df011f794a74 Mon Sep 17 00:00:00 2001 From: Ryan Grimm Date: Mon, 19 Aug 2019 23:13:21 -0300 Subject: powerpc/pseries/svm: Export guest SVM status to user space via sysfs User space might want to know it's running in a secure VM. It can't do a mfmsr because mfmsr is a privileged instruction. The solution here is to create a cpu attribute: /sys/devices/system/cpu/svm which will read 0 or 1 based on the S bit of the current CPU. Signed-off-by: Ryan Grimm Signed-off-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190820021326.6884-12-bauerman@linux.ibm.com --- arch/powerpc/kernel/sysfs.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index e2147d7c9e72..80a676da11cb 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -19,6 +19,7 @@ #include #include #include +#include #include "cacheinfo.h" #include "setup.h" @@ -715,6 +716,23 @@ static struct device_attribute pa6t_attrs[] = { #endif /* HAS_PPC_PMC_PA6T */ #endif /* HAS_PPC_PMC_CLASSIC */ +#ifdef CONFIG_PPC_SVM +static ssize_t show_svm(struct device *dev, struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%u\n", is_secure_guest()); +} +static DEVICE_ATTR(svm, 0444, show_svm, NULL); + +static void create_svm_file(void) +{ + device_create_file(cpu_subsys.dev_root, &dev_attr_svm); +} +#else +static void create_svm_file(void) +{ +} +#endif /* CONFIG_PPC_SVM */ + static int register_cpu_online(unsigned int cpu) { struct cpu *c = &per_cpu(cpu_devices, cpu); @@ -1058,6 +1076,8 @@ static int __init topology_init(void) sysfs_create_dscr_default(); #endif /* CONFIG_PPC64 */ + create_svm_file(); + return 0; } subsys_initcall(topology_init); -- cgit v1.2.3 From 0be9f7fd5d8fd984b34ad98838ef7cfd0079ddae Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:26 +1000 Subject: powerpc/64s/exception: machine check fwnmi remove HV case fwnmi does not trigger in HV mode, so remove always-true feature test. 
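For readers unfamiliar with feature sections: BEGIN/END_FTR_SECTION code is kept or nopped out once at boot according to a CPU feature mask, so a guard whose condition is always true in practice is pure noise. A rough C model of the decision the removed section encoded, with an illustrative bit value:

  #include <stdio.h>

  #define CPU_FTR_HVMODE (1UL << 0)     /* illustrative, not the real mask bit */

  static unsigned long cpu_features;    /* fixed up once during early boot */

  static void machine_check_fwnmi_sketch(void)
  {
          /* fwnmi never fires in HV mode, so this test always
           * succeeded; the patch makes the branch unconditional. */
          if (!(cpu_features & CPU_FTR_HVMODE))
                  puts("b machine_check_common_early");
  }

  int main(void)
  {
          machine_check_fwnmi_sketch();
          return 0;
  }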
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-2-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 520804351601..515ea0243ff2 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1026,9 +1026,8 @@ TRAMP_REAL_BEGIN(machine_check_pSeries) .globl machine_check_fwnmi machine_check_fwnmi: EXCEPTION_PROLOG_0 PACA_EXMC -BEGIN_FTR_SECTION b machine_check_common_early -END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE) + machine_check_pSeries_0: EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 1, 0x200, 1, 1, 0 /* -- cgit v1.2.3 From 1039f62431e2aa16487cd6d64bc841d71f6465b8 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:27 +1000 Subject: powerpc/64s/exception: machine check remove bitrotted comment Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-3-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 4 ---- 1 file changed, 4 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 515ea0243ff2..8cf4e44d2d76 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -933,10 +933,6 @@ EXC_COMMON_BEGIN(system_reset_common) EXC_REAL_BEGIN(machine_check, 0x200, 0x100) - /* This is moved out of line as it can be patched by FW, but - * some code path might still want to branch into the original - * vector - */ EXCEPTION_PROLOG_0 PACA_EXMC BEGIN_FTR_SECTION b machine_check_common_early -- cgit v1.2.3 From 19dbe673e62b076f00ae2841fcf5898b4728d5ab Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:28 +1000 Subject: powerpc/64s/exception: machine check fix KVM guest test The machine_check_handle_early hypervisor guest test is skipped if !HVMODE or MSR[HV]=0, which is wrong for PR or nested hypervisors that could be running a guest in this state. Test HSTATE_IN_GUEST up front and use that to branch out to the KVM handler, then MSR[PR] alone can test for this kernel's userspace. This matches all other interrupt handling. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-4-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 33 +++++++++++++-------------------- 1 file changed, 13 insertions(+), 20 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 8cf4e44d2d76..7c28c22fc6a6 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1108,11 +1108,8 @@ EXC_COMMON_BEGIN(machine_check_handle_early) bl machine_check_early std r3,RESULT(r1) /* Save result */ ld r12,_MSR(r1) -BEGIN_FTR_SECTION - b 4f -END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE) -#ifdef CONFIG_PPC_P7_NAP +#ifdef CONFIG_PPC_P7_NAP /* * Check if thread was in power saving mode. We come here when any * of the following is true: @@ -1128,30 +1125,26 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) #endif - /* - * Check if we are coming from hypervisor userspace. If yes then we - * continue in host kernel in V mode to deliver the MC event. - */ - rldicl. r11,r12,4,63 /* See if MC hit while in HV mode. 
*/ - beq 5f -4: andi. r11,r12,MSR_PR /* See if coming from user. */ - bne 9f /* continue in V mode if we are. */ - -5: #ifdef CONFIG_KVM_BOOK3S_64_HANDLER -BEGIN_FTR_SECTION /* - * We are coming from kernel context. Check if we are coming from - * guest. if yes, then we can continue. We will fall through - * do_kvm_200->kvmppc_interrupt to deliver the MC event to guest. + * Check if we are coming from guest. If yes, then run the normal + * exception handler which will take the do_kvm_200->kvmppc_interrupt + * branch to deliver the MC event to guest. */ lbz r11,HSTATE_IN_GUEST(r13) cmpwi r11,0 /* Check if coming from guest */ bne 9f /* continue if we are. */ -END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) #endif + + /* + * Check if we are coming from userspace. If yes, then run the normal + * exception handler which will deliver the MC event to this kernel. + */ + andi. r11,r12,MSR_PR /* See if coming from user. */ + bne 9f /* continue in V mode if we are. */ + /* - * At this point we are not sure about what context we come from. + * At this point we are coming from kernel context. * Queue up the MCE event and return from the interrupt. * But before that, check if this is an un-recoverable exception. * If yes, then stay on emergency stack and panic. -- cgit v1.2.3 From fe9d482b1d87c76441492e51d866cee652eee4d5 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:29 +1000 Subject: powerpc/64s/exception: machine check adjust RFI target The host kernel delivery case for powernv does RFI_TO_USER_OR_KERNEL, but should just use RFI_TO_KERNEL which makes it clear this is not a user case. This is not a bug because RFI_TO_USER_OR_KERNEL deals with kernel returns just fine. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-5-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 7c28c22fc6a6..9a7cc3edc721 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1184,7 +1184,7 @@ BEGIN_FTR_SECTION */ bl machine_check_queue_event MACHINE_CHECK_HANDLER_WINDUP - RFI_TO_USER_OR_KERNEL + RFI_TO_KERNEL FTR_SECTION_ELSE /* * pSeries: Return from MC interrupt. Before that stay on emergency -- cgit v1.2.3 From b5c27f7c5679c3726148fd25ad220b4560d210cf Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:30 +1000 Subject: powerpc/64s/exception: machine check pseries should always run the early handler Now that pseries with fwnmi registered runs the early machine check handler, there is no good reason to special case the non-fwnmi case and skip the early handler. Reducing the code and number of paths is a top priority for asm code, it's better to handle this in C where possible (and the pseries early handler is a no-op if fwnmi is not registered). 
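The parenthetical is the load-bearing fact: with no fwnmi registration the early handler has nothing to decode, so running it unconditionally costs nothing. In outline, with fwnmi_active named after the pseries flag and the decoding elided:

  #include <stdio.h>

  static int fwnmi_active;      /* set once the firmware NMI interface is registered */

  static long mce_handle_early_sketch(void)
  {
          if (!fwnmi_active)
                  return 0;     /* nothing registered: a genuine no-op */
          /* ... otherwise decode the firmware error log ... */
          return 1;
  }

  int main(void)
  {
          printf("%ld\n", mce_handle_early_sketch());
          return 0;
  }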
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-6-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 4 ---- 1 file changed, 4 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 9a7cc3edc721..73be4f9027de 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -934,11 +934,7 @@ EXC_COMMON_BEGIN(system_reset_common) EXC_REAL_BEGIN(machine_check, 0x200, 0x100) EXCEPTION_PROLOG_0 PACA_EXMC -BEGIN_FTR_SECTION b machine_check_common_early -FTR_SECTION_ELSE - b machine_check_pSeries_0 -ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE) EXC_REAL_END(machine_check, 0x200, 0x100) EXC_VIRT_NONE(0x4200, 0x100) TRAMP_REAL_BEGIN(machine_check_common_early) -- cgit v1.2.3 From fa2760eca504f554a5adb6cd2f576828933c4c7b Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:31 +1000 Subject: powerpc/64s/exception: machine check remove machine_check_pSeries_0 branch This label has only one caller, so unwind the branch and move it inline. The location of the comment is adjusted to match similar one in system reset. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-7-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 73be4f9027de..577d93a83d05 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1014,20 +1014,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) b 1b b . /* prevent speculative execution */ -TRAMP_REAL_BEGIN(machine_check_pSeries) - .globl machine_check_fwnmi -machine_check_fwnmi: +#ifdef CONFIG_PPC_PSERIES +TRAMP_REAL_BEGIN(machine_check_fwnmi) EXCEPTION_PROLOG_0 PACA_EXMC b machine_check_common_early - -machine_check_pSeries_0: - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 1, 0x200, 1, 1, 0 - /* - * MSR_RI is not enabled, because PACA_EXMC is being used, so a - * nested machine check corrupts it. machine_check_common enables - * MSR_RI. - */ - EXCEPTION_PROLOG_2_REAL machine_check_common, EXC_STD, 0 +#endif TRAMP_KVM_SKIP(PACA_EXMC, 0x200) @@ -1197,7 +1188,13 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE) /* Deliver the machine check to host kernel in V mode. */ MACHINE_CHECK_HANDLER_WINDUP EXCEPTION_PROLOG_0 PACA_EXMC - b machine_check_pSeries_0 + EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 1, 0x200, 1, 1, 0 + EXCEPTION_PROLOG_2_REAL machine_check_common, EXC_STD, 0 + /* + * MSR_RI is not enabled, because PACA_EXMC is being used, so a + * nested machine check corrupts it. machine_check_common enables + * MSR_RI. + */ EXC_COMMON_BEGIN(unrecover_mce) /* Invoke machine_check_exception to print MCE event and panic. */ -- cgit v1.2.3 From 0b66370c61fcf5fcc1d6901013e110284da6e2bb Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:32 +1000 Subject: powerpc/64s/exception: machine check use correct cfar for late handler Bare metal machine checks run an "early" handler in real mode before running the main handler which reports the event. The main handler runs exactly as a normal interrupt handler, after the "windup" which sets registers back as they were at interrupt entry. 
CFAR does not get restored by the windup code, so that will be wrong when the handler is run. Restore the CFAR to the saved value before running the late handler. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-8-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 577d93a83d05..b7c4149cb91c 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1186,6 +1186,10 @@ FTR_SECTION_ELSE ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE) 9: /* Deliver the machine check to host kernel in V mode. */ +BEGIN_FTR_SECTION + ld r10,ORIG_GPR3(r1) + mtspr SPRN_CFAR,r10 +END_FTR_SECTION_IFSET(CPU_FTR_CFAR) MACHINE_CHECK_HANDLER_WINDUP EXCEPTION_PROLOG_0 PACA_EXMC EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 1, 0x200, 1, 1, 0 -- cgit v1.2.3 From 7290f3b3d3e66b54720f23079ffc60e0b7bbb0cc Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:33 +1000 Subject: powerpc/64s/powernv: machine check dump SLB contents Re-use the code introduced in pseries to save and dump the contents of the SLB in the case of an SLB involved machine check exception. This patch also avoids allocating the SLB save array on pseries radix. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-9-npiggin@gmail.com --- arch/powerpc/kernel/mce.c | 6 ++++++ arch/powerpc/kernel/mce_power.c | 4 ++++ 2 files changed, 10 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index ec4b3e1087be..04280a5871fc 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -511,6 +511,12 @@ void machine_check_print_event_info(struct machine_check_event *evt, subtype = evt->error_class < ARRAY_SIZE(mc_error_class) ? mc_error_class[evt->error_class] : "Unknown"; printk("%sMCE: CPU%d: %s\n", level, evt->cpu, subtype); + +#ifdef CONFIG_PPC_BOOK3S_64 + /* Display faulty slb contents for SLB errors. */ + if (evt->error_type == MCE_ERROR_TYPE_SLB) + slb_dump_contents(local_paca->mce_faulty_slbs); +#endif } EXPORT_SYMBOL_GPL(machine_check_print_event_info); diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index b6cbe3449358..356e7b99f661 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -405,6 +405,8 @@ static int mce_handle_ierror(struct pt_regs *regs, /* attempt to correct the error */ switch (table[i].error_type) { case MCE_ERROR_TYPE_SLB: + if (local_paca->in_mce == 1) + slb_save_contents(local_paca->mce_faulty_slbs); handled = mce_flush(MCE_FLUSH_SLB); break; case MCE_ERROR_TYPE_ERAT: @@ -490,6 +492,8 @@ static int mce_handle_derror(struct pt_regs *regs, /* attempt to correct the error */ switch (table[i].error_type) { case MCE_ERROR_TYPE_SLB: + if (local_paca->in_mce == 1) + slb_save_contents(local_paca->mce_faulty_slbs); if (mce_flush(MCE_FLUSH_SLB)) handled = 1; break; -- cgit v1.2.3 From 9ca766f9891d23743b4e1a7b1cafdc63723cd6a7 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:34 +1000 Subject: powerpc/64s/pseries: machine check convert to use common event code The common machine_check_event data structures and queues are mostly platform independent, with powernv decoding SRR1/DSISR/etc., into machine_check_event objects. 
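To illustrate the shape of this infrastructure (a sketch under invented names; the kernel's struct machine_check_event and its queueing code differ in detail), platform code decodes raw fault state into a common object and generic code only ever sees that object:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative event shape; the real machine_check_event differs. */
    struct mc_event {
        uint8_t  error_type;
        uint64_t srr0;              /* interrupted address */
    };

    #define MC_QUEUE_DEPTH 4
    static struct mc_event mc_queue[MC_QUEUE_DEPTH];
    static int mc_queue_count;

    /* Platform decoding fills the common object; generic code queues it. */
    static int mc_queue_event(uint8_t type, uint64_t srr0)
    {
        if (mc_queue_count == MC_QUEUE_DEPTH)
            return -1;              /* queue full, event dropped */
        mc_queue[mc_queue_count].error_type = type;
        mc_queue[mc_queue_count].srr0 = srr0;
        mc_queue_count++;
        return 0;
    }

    int main(void)
    {
        mc_queue_event(1, 0xc000000000002000ULL);
        printf("queued %d event(s)\n", mc_queue_count);
        return 0;
    }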
This patch converts pseries to use this infrastructure by decoding fwnmi/rtas data into machine_check_event objects. This allows queueing to be used by a subsequent change to delay the virtual mode handling of machine checks that occur in kernel space where it is unsafe to switch immediately to virtual mode, similarly to powernv. Signed-off-by: Nicholas Piggin [mpe: Fix implicit fallthrough warnings in mce_handle_error()] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-10-npiggin@gmail.com --- arch/powerpc/kernel/mce.c | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 04280a5871fc..34c1001e9e8b 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -325,7 +325,7 @@ static void machine_check_process_queued_event(struct irq_work *work) void machine_check_print_event_info(struct machine_check_event *evt, bool user_mode, bool in_guest) { - const char *level, *sevstr, *subtype, *err_type; + const char *level, *sevstr, *subtype, *err_type, *initiator; uint64_t ea = 0, pa = 0; int n = 0; char dar_str[50]; @@ -410,6 +410,28 @@ void machine_check_print_event_info(struct machine_check_event *evt, break; } + switch(evt->initiator) { + case MCE_INITIATOR_CPU: + initiator = "CPU"; + break; + case MCE_INITIATOR_PCI: + initiator = "PCI"; + break; + case MCE_INITIATOR_ISA: + initiator = "ISA"; + break; + case MCE_INITIATOR_MEMORY: + initiator = "Memory"; + break; + case MCE_INITIATOR_POWERMGM: + initiator = "Power Management"; + break; + case MCE_INITIATOR_UNKNOWN: + default: + initiator = "Unknown"; + break; + } + switch (evt->error_type) { case MCE_ERROR_TYPE_UE: err_type = "UE"; break; @@ -476,6 +498,14 @@ void machine_check_print_event_info(struct machine_check_event *evt, if (evt->u.link_error.effective_address_provided) ea = evt->u.link_error.effective_address; break; + case MCE_ERROR_TYPE_DCACHE: + err_type = "D-Cache"; + subtype = "Unknown"; + break; + case MCE_ERROR_TYPE_ICACHE: + err_type = "I-Cache"; + subtype = "Unknown"; + break; default: case MCE_ERROR_TYPE_UNKNOWN: err_type = "Unknown"; @@ -508,6 +538,8 @@ void machine_check_print_event_info(struct machine_check_event *evt, level, evt->cpu, evt->srr0, (void *)evt->srr0, pa_str); } + printk("%sMCE: CPU%d: Initiator %s\n", level, evt->cpu, initiator); + subtype = evt->error_class < ARRAY_SIZE(mc_error_class) ? mc_error_class[evt->error_class] : "Unknown"; printk("%sMCE: CPU%d: %s\n", level, evt->cpu, subtype); -- cgit v1.2.3 From 272f636445cf556498c8840dc63ad1218e94391b Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:35 +1000 Subject: powerpc/64s/exception: machine check pseries should skip the late handler for kernel MCEs The powernv machine check handler copes with taking an MCE from one of three contexts: guest, kernel, and user. In each case the early handler runs first on a special stack, then: - The guest case branches to the KVM interrupt handler (via standard interrupt macros). - The user case will run the "late" handler which is like a normal interrupt that runs in virtual mode and uses the regular kernel stack. - The kernel case queues the event and schedules it for processing with irq work.
The last case is important: it must not enable virtual memory because the MMU state may not be set up to deal with that (e.g., SLB might be clear), it must not use the regular kernel stack for similar reasons (e.g., might be in OPAL with OPAL stack in r1), and the kernel does not expect anything to touch its stack if interrupts are disabled. The pseries handler does not do this queueing, but instead it always runs the late handler for host MCEs, which has some of the same problems. Now that pseries is using machine_check_events, change it to do the same as powernv and queue events for kernel MCEs. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-11-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 22 ++-------------------- 1 file changed, 2 insertions(+), 20 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index b7c4149cb91c..aa3720c0b8fe 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1163,7 +1163,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) cmpdi r3,0 /* see if we handled MCE successfully */ beq 1b /* if !handled then panic */ -BEGIN_FTR_SECTION + /* * Return from MC interrupt. * Queue up the MCE event so that we can log it later, while * @@ -1172,18 +1172,7 @@ BEGIN_FTR_SECTION bl machine_check_queue_event MACHINE_CHECK_HANDLER_WINDUP RFI_TO_KERNEL -FTR_SECTION_ELSE - /* - * pSeries: Return from MC interrupt. Before that stay on emergency - * stack and call machine_check_exception to log the MCE event. - */ - LOAD_HANDLER(r10,mce_return) - mtspr SPRN_SRR0,r10 - ld r10,PACAKMSR(r13) - mtspr SPRN_SRR1,r10 - RFI_TO_KERNEL - b . -ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE) + 9: /* Deliver the machine check to host kernel in V mode. */ BEGIN_FTR_SECTION @@ -1212,13 +1201,6 @@ EXC_COMMON_BEGIN(unrecover_mce) bl unrecoverable_exception b 1b -EXC_COMMON_BEGIN(mce_return) - /* Invoke machine_check_exception to print MCE event and return. */ - addi r3,r1,STACK_FRAME_OVERHEAD - bl machine_check_exception - MACHINE_CHECK_HANDLER_WINDUP - RFI_TO_KERNEL - b . EXC_REAL_BEGIN(data_access, 0x300, 0x80) EXCEPTION_PROLOG_0 PACA_EXGEN -- cgit v1.2.3 From c8eb54dbc8087c3d114cb583925395e211bfffa4 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:36 +1000 Subject: powerpc/64s/exception: machine check restructure to reuse common macros Follow the pattern of sreset and HMI handlers more closely: use EXCEPTION_PROLOG_COMMON_1 rather than open-coding it, and run the handler at the relocated location. This helps later simplification and code sharing.
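The gain from sharing the prolog can be illustrated in C (an invented sketch, not the kernel macros): once every handler builds its frame through one helper, a later change to the frame layout is made in exactly one place:

    #include <stdint.h>
    #include <stdio.h>

    struct frame { uint64_t nip, msr, dar; };

    /* The shared "prolog": one definition used by every handler. */
    static void prolog_common(struct frame *f, uint64_t nip, uint64_t msr)
    {
        f->nip = nip;
        f->msr = msr;
        f->dar = 0;
    }

    static void sreset_handler(struct frame *f) { prolog_common(f, 0x100, 0x9032); }
    static void mce_handler(struct frame *f)    { prolog_common(f, 0x200, 0x9032); }

    int main(void)
    {
        struct frame f;
        mce_handler(&f);
        printf("nip=%#llx\n", (unsigned long long)f.nip);
        sreset_handler(&f);
        printf("nip=%#llx\n", (unsigned long long)f.nip);
        return 0;
    }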
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-12-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 71 ++++++++++++++++++------------------ 1 file changed, 36 insertions(+), 35 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index aa3720c0b8fe..a1f0a88d39a5 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -934,17 +934,23 @@ EXC_COMMON_BEGIN(system_reset_common) EXC_REAL_BEGIN(machine_check, 0x200, 0x100) EXCEPTION_PROLOG_0 PACA_EXMC - b machine_check_common_early + EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 0, 0x200, 1, 1, 0 + mfctr r10 /* save ctr, even for !RELOCATABLE */ + BRANCH_TO_C000(r11, machine_check_early_common) + /* + * MSR_RI is not enabled, because PACA_EXMC is being used, so a + * nested machine check corrupts it. machine_check_common enables + * MSR_RI. + */ EXC_REAL_END(machine_check, 0x200, 0x100) EXC_VIRT_NONE(0x4200, 0x100) -TRAMP_REAL_BEGIN(machine_check_common_early) - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 0, 0x200, 0, 0, 0 + +EXC_COMMON_BEGIN(machine_check_early_common) + mtctr r10 /* Restore ctr */ + mfspr r11,SPRN_SRR0 + mfspr r12,SPRN_SRR1 + /* - * Register contents: - * R13 = PACA - * R9 = CR - * Original R9 to R13 is saved on PACA_EXMC - * * Switch to mc_emergency stack and handle re-entrancy (we limit * the nested MCE upto level 4 to avoid stack overflow). * Save MCE registers srr1, srr0, dar and dsisr and then set ME=1 @@ -965,32 +971,30 @@ TRAMP_REAL_BEGIN(machine_check_common_early) * the machine check is handled then the idle wakeup code is called * to restore state. */ - mr r11,r1 /* Save r1 */ lhz r10,PACA_IN_MCE(r13) cmpwi r10,0 /* Are we in nested machine check */ - bne 0f /* Yes, we are. */ - /* First machine check entry */ - ld r1,PACAMCEMERGSP(r13) /* Use MC emergency stack */ -0: subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ + cmpwi cr1,r10,MAX_MCE_DEPTH /* Are we at maximum nesting */ addi r10,r10,1 /* increment paca->in_mce */ sth r10,PACA_IN_MCE(r13) + + mr r10,r1 /* Save r1 */ + bne 1f + /* First machine check entry */ + ld r1,PACAMCEMERGSP(r13) /* Use MC emergency stack */ +1: subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ /* Limit nested MCE to level 4 to avoid stack overflow */ - cmpwi r10,MAX_MCE_DEPTH - bgt 2f /* Check if we hit limit of 4 */ - std r11,GPR1(r1) /* Save r1 on the stack. */ - std r11,0(r1) /* make stack chain pointer */ - mfspr r11,SPRN_SRR0 /* Save SRR0 */ - std r11,_NIP(r1) - mfspr r11,SPRN_SRR1 /* Save SRR1 */ - std r11,_MSR(r1) - mfspr r11,SPRN_DAR /* Save DAR */ - std r11,_DAR(r1) - mfspr r11,SPRN_DSISR /* Save DSISR */ - std r11,_DSISR(r1) - std r9,_CCR(r1) /* Save CR in stackframe */ + bge cr1,2f /* Check if we hit limit of 4 */ + + EXCEPTION_PROLOG_COMMON_1() /* We don't touch AMR here, we never go to virtual mode */ - /* Save r9 through r13 from EXMC save area to stack frame. 
*/ EXCEPTION_PROLOG_COMMON_2(PACA_EXMC) + EXCEPTION_PROLOG_COMMON_3(0x200) + + ld r3,PACA_EXMC+EX_DAR(r13) + lwz r4,PACA_EXMC+EX_DSISR(r13) + std r3,_DAR(r1) + std r4,_DSISR(r1) + mfmsr r11 /* get MSR value */ BEGIN_FTR_SECTION ori r11,r11,MSR_ME /* turn on ME bit */ @@ -1016,8 +1020,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) #ifdef CONFIG_PPC_PSERIES TRAMP_REAL_BEGIN(machine_check_fwnmi) + /* See comment at machine_check exception, don't turn on RI */ EXCEPTION_PROLOG_0 PACA_EXMC - b machine_check_common_early + EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 0, 0x200, 1, 1, 0 + mfctr r10 /* save ctr */ + BRANCH_TO_C000(r11, machine_check_early_common) #endif TRAMP_KVM_SKIP(PACA_EXMC, 0x200) @@ -1088,8 +1095,6 @@ EXC_COMMON_BEGIN(machine_check_idle_common) * ME=1, MMU (IR=0 and DR=0) off and using MC emergency stack. */ EXC_COMMON_BEGIN(machine_check_handle_early) - std r0,GPR0(r1) /* Save r0 */ - EXCEPTION_PROLOG_COMMON_3(0x200) bl save_nvgprs addi r3,r1,STACK_FRAME_OVERHEAD bl machine_check_early @@ -1180,14 +1185,10 @@ BEGIN_FTR_SECTION mtspr SPRN_CFAR,r10 END_FTR_SECTION_IFSET(CPU_FTR_CFAR) MACHINE_CHECK_HANDLER_WINDUP + /* See comment at machine_check exception, don't turn on RI */ EXCEPTION_PROLOG_0 PACA_EXMC EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 1, 0x200, 1, 1, 0 EXCEPTION_PROLOG_2_REAL machine_check_common, EXC_STD, 0 - /* - * MSR_RI is not enabled, because PACA_EXMC is being used, so a - * nested machine check corrupts it. machine_check_common enables - * MSR_RI. - */ EXC_COMMON_BEGIN(unrecover_mce) /* Invoke machine_check_exception to print MCE event and panic. */ -- cgit v1.2.3 From abd1f4ca2b41ffba768c3baadc006a95d178fbf1 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:37 +1000 Subject: powerpc/64s/exception: machine check move tramp code Following convention, move the tramp code (unrelocated) above the common handlers (relocated). Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-13-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index a1f0a88d39a5..06821b199511 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -945,6 +945,17 @@ EXC_REAL_BEGIN(machine_check, 0x200, 0x100) EXC_REAL_END(machine_check, 0x200, 0x100) EXC_VIRT_NONE(0x4200, 0x100) +#ifdef CONFIG_PPC_PSERIES +TRAMP_REAL_BEGIN(machine_check_fwnmi) + /* See comment at machine_check exception, don't turn on RI */ + EXCEPTION_PROLOG_0 PACA_EXMC + EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 0, 0x200, 1, 1, 0 + mfctr r10 /* save ctr */ + BRANCH_TO_C000(r11, machine_check_early_common) +#endif + +TRAMP_KVM_SKIP(PACA_EXMC, 0x200) + EXC_COMMON_BEGIN(machine_check_early_common) mtctr r10 /* Restore ctr */ mfspr r11,SPRN_SRR0 @@ -1018,17 +1029,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) b 1b b . 
/* prevent speculative execution */ -#ifdef CONFIG_PPC_PSERIES -TRAMP_REAL_BEGIN(machine_check_fwnmi) - /* See comment at machine_check exception, don't turn on RI */ - EXCEPTION_PROLOG_0 PACA_EXMC - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 0, 0x200, 1, 1, 0 - mfctr r10 /* save ctr */ - BRANCH_TO_C000(r11, machine_check_early_common) -#endif - -TRAMP_KVM_SKIP(PACA_EXMC, 0x200) - EXC_COMMON_BEGIN(machine_check_common) /* * Machine check is different because we use a different -- cgit v1.2.3 From 296e753fb447eb14c8c9fd6a7c48e7ffab269343 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:38 +1000 Subject: powerpc/64s/exception: simplify machine check early path machine_check_early_common can reach machine_check_handle_early directly now that it runs at the relocated address, so just branch directly. The rfi sequence is required to enable MSR[ME] but that step is moved into a helper function, making the code easier to follow. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-14-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 06821b199511..bbbcab88cf78 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1006,16 +1006,13 @@ EXC_COMMON_BEGIN(machine_check_early_common) std r3,_DAR(r1) std r4,_DSISR(r1) - mfmsr r11 /* get MSR value */ BEGIN_FTR_SECTION - ori r11,r11,MSR_ME /* turn on ME bit */ + bl enable_machine_check END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) - ori r11,r11,MSR_RI /* turn on RI bit */ - LOAD_HANDLER(r12, machine_check_handle_early) -1: mtspr SPRN_SRR0,r12 - mtspr SPRN_SRR1,r11 - RFI_TO_KERNEL - b . /* prevent speculative execution */ + li r10,MSR_RI + mtmsrd r10,1 + b machine_check_handle_early + 2: /* Stack overflow. Stay on emergency stack and panic. * Keep the ME bit off while panic-ing, so that if we hit @@ -1026,7 +1023,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) LOAD_HANDLER(r12, unrecover_mce) li r10,MSR_ME andc r11,r11,r10 /* Turn off MSR_ME */ - b 1b + mtspr SPRN_SRR0,r12 + mtspr SPRN_SRR1,r11 + RFI_TO_KERNEL b . /* prevent speculative execution */ EXC_COMMON_BEGIN(machine_check_common) @@ -2269,6 +2268,20 @@ CLOSE_FIXED_SECTION(virt_trampolines); USE_TEXT_SECTION() +/* MSR[RI] should be clear because this uses SRR[01] */ +enable_machine_check: + mflr r0 + bcl 20,31,$+4 +0: mflr r3 + addi r3,r3,(1f - 0b) + mtspr SPRN_SRR0,r3 + mfmsr r3 + ori r3,r3,MSR_ME + mtspr SPRN_SRR1,r3 + RFI_TO_KERNEL +1: mtlr r0 + blr + /* * Hash table stuff */ -- cgit v1.2.3 From b7d9ccec3056913528690c5fae7cc86a5ea3dffc Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:39 +1000 Subject: powerpc/64s/exception: machine check move unrecoverable handling out of line Similarly to the previous change, all callers of the unrecoverable handler run relocated so they can reach it with a direct branch. This makes the handler easy to move out of line, which makes the "normal" path less cluttered and easier to follow. MSR[ME] manipulation still requires the rfi, so that is moved out of line to its own function.
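The out-of-line pattern can be sketched in C (illustrative names, not the kernel's code): the unrecoverable path is a separate function that never returns, so the normal path reads straight through:

    #include <stdio.h>
    #include <stdlib.h>

    /* Out of line and never returns: keeps the hot path uncluttered. */
    static _Noreturn void unrecoverable_mce(void)
    {
        fprintf(stderr, "unrecoverable machine check\n");
        exit(1);
    }

    static void mce_path(int depth_ok, int recovered)
    {
        if (!depth_ok)
            unrecoverable_mce();    /* direct call, no fallthrough */
        if (!recovered)
            unrecoverable_mce();
        printf("recovered, returning from interrupt\n");
    }

    int main(void)
    {
        mce_path(1, 1);
        return 0;
    }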
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-15-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 87 ++++++++++++++++++------------------ 1 file changed, 44 insertions(+), 43 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index bbbcab88cf78..af18d0f1d4ab 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -992,9 +992,9 @@ EXC_COMMON_BEGIN(machine_check_early_common) bne 1f /* First machine check entry */ ld r1,PACAMCEMERGSP(r13) /* Use MC emergency stack */ -1: subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ - /* Limit nested MCE to level 4 to avoid stack overflow */ - bge cr1,2f /* Check if we hit limit of 4 */ +1: /* Limit nested MCE to level 4 to avoid stack overflow */ + bgt cr1,unrecoverable_mce /* Check if we hit limit of 4 */ + subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ EXCEPTION_PROLOG_COMMON_1() /* We don't touch AMR here, we never go to virtual mode */ @@ -1013,21 +1013,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) mtmsrd r10,1 b machine_check_handle_early -2: - /* Stack overflow. Stay on emergency stack and panic. - * Keep the ME bit off while panic-ing, so that if we hit - * another machine check we checkstop. - */ - addi r1,r1,INT_FRAME_SIZE /* go back to previous stack frame */ - ld r11,PACAKMSR(r13) - LOAD_HANDLER(r12, unrecover_mce) - li r10,MSR_ME - andc r11,r11,r10 /* Turn off MSR_ME */ - mtspr SPRN_SRR0,r12 - mtspr SPRN_SRR1,r11 - RFI_TO_KERNEL - b . /* prevent speculative execution */ - EXC_COMMON_BEGIN(machine_check_common) /* * Machine check is different because we use a different @@ -1141,32 +1126,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) * If yes, then stay on emergency stack and panic. */ andi. r11,r12,MSR_RI - bne 2f -1: mfspr r11,SPRN_SRR0 - LOAD_HANDLER(r10,unrecover_mce) - mtspr SPRN_SRR0,r10 - ld r10,PACAKMSR(r13) - /* - * We are going down. But there are chances that we might get hit by - * another MCE during panic path and we may run into unstable state - * with no way out. Hence, turn ME bit off while going down, so that - * when another MCE is hit during panic path, system will checkstop - * and hypervisor will get restarted cleanly by SP. - */ - li r3,MSR_ME - andc r10,r10,r3 /* Turn off MSR_ME */ - mtspr SPRN_SRR1,r10 - RFI_TO_KERNEL - b . -2: + beq unrecoverable_mce + /* * Check if we have successfully handled/recovered from error, if not * then stay on emergency stack and panic. */ ld r3,RESULT(r1) /* Load result */ cmpdi r3,0 /* see if we handled MCE successfully */ - - beq 1b /* if !handled then panic */ + beq unrecoverable_mce /* if !handled then panic */ /* * Return from MC interrupt. @@ -1189,17 +1157,35 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 1, 0x200, 1, 1, 0 EXCEPTION_PROLOG_2_REAL machine_check_common, EXC_STD, 0 -EXC_COMMON_BEGIN(unrecover_mce) +EXC_COMMON_BEGIN(unrecoverable_mce) + /* + * We are going down. But there are chances that we might get hit by + * another MCE during panic path and we may run into unstable state + * with no way out. Hence, turn ME bit off while going down, so that + * when another MCE is hit during panic path, system will checkstop + * and hypervisor will get restarted cleanly by SP. 
+ */ +BEGIN_FTR_SECTION + li r10,0 /* clear MSR_RI */ + mtmsrd r10,1 + bl disable_machine_check +END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) + ld r10,PACAKMSR(r13) + li r3,MSR_ME + andc r10,r10,r3 + mtmsrd r10 + /* Invoke machine_check_exception to print MCE event and panic. */ addi r3,r1,STACK_FRAME_OVERHEAD bl machine_check_exception + /* - * We will not reach here. Even if we did, there is no way out. Call - * unrecoverable_exception and die. + * We will not reach here. Even if we did, there is no way out. + * Call unrecoverable_exception and die. */ -1: addi r3,r1,STACK_FRAME_OVERHEAD + addi r3,r1,STACK_FRAME_OVERHEAD bl unrecoverable_exception - b 1b + b . EXC_REAL_BEGIN(data_access, 0x300, 0x80) @@ -2282,6 +2268,21 @@ enable_machine_check: 1: mtlr r0 blr +/* MSR[RI] should be clear because this uses SRR[01] */ +disable_machine_check: + mflr r0 + bcl 20,31,$+4 +0: mflr r3 + addi r3,r3,(1f - 0b) + mtspr SPRN_SRR0,r3 + mfmsr r3 + li r4,MSR_ME + andc r3,r3,r4 + mtspr SPRN_SRR1,r3 + RFI_TO_KERNEL +1: mtlr r0 + blr + /* * Hash table stuff */ -- cgit v1.2.3 From fce16d482276f059b08368c833a1188ac1f25e86 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:40 +1000 Subject: powerpc/64s/exception: untangle early machine check handler branch machine_check_early_common now branches to machine_check_handle_early which is its only caller. Move interleaving code out of the way, and remove the branch. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-16-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 129 +++++++++++++++++------------------ 1 file changed, 62 insertions(+), 67 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index af18d0f1d4ab..3a7f18021365 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -956,6 +956,16 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi) TRAMP_KVM_SKIP(PACA_EXMC, 0x200) +#define MACHINE_CHECK_HANDLER_WINDUP \ + /* Clear MSR_RI before setting SRR0 and SRR1. */\ + li r9,0; \ + mtmsrd r9,1; /* Clear MSR_RI */ \ + /* Decrement paca->in_mce now RI is clear. */ \ + lhz r12,PACA_IN_MCE(r13); \ + subi r12,r12,1; \ + sth r12,PACA_IN_MCE(r13); \ + EXCEPTION_RESTORE_REGS EXC_STD + EXC_COMMON_BEGIN(machine_check_early_common) mtctr r10 /* Restore ctr */ mfspr r11,SPRN_SRR0 @@ -1011,74 +1021,7 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) li r10,MSR_RI mtmsrd r10,1 - b machine_check_handle_early -EXC_COMMON_BEGIN(machine_check_common) - /* - * Machine check is different because we use a different - * save area: PACA_EXMC instead of PACA_EXGEN. - */ - EXCEPTION_COMMON(PACA_EXMC, 0x200) - FINISH_NAP - RECONCILE_IRQ_STATE(r10, r11) - ld r3,PACA_EXMC+EX_DAR(r13) - lwz r4,PACA_EXMC+EX_DSISR(r13) - /* Enable MSR_RI when finished with PACA_EXMC */ - li r10,MSR_RI - mtmsrd r10,1 - std r3,_DAR(r1) - std r4,_DSISR(r1) - bl save_nvgprs - addi r3,r1,STACK_FRAME_OVERHEAD - bl machine_check_exception - b ret_from_except - -#define MACHINE_CHECK_HANDLER_WINDUP \ - /* Clear MSR_RI before setting SRR0 and SRR1. */\ - li r9,0; \ - mtmsrd r9,1; /* Clear MSR_RI */ \ - /* Decrement paca->in_mce now RI is clear. */ \ - lhz r12,PACA_IN_MCE(r13); \ - subi r12,r12,1; \ - sth r12,PACA_IN_MCE(r13); \ - EXCEPTION_RESTORE_REGS EXC_STD - -#ifdef CONFIG_PPC_P7_NAP -/* - * This is an idle wakeup. Low level machine check has already been - * done. 
Queue the event then call the idle code to do the wake up. - */ -EXC_COMMON_BEGIN(machine_check_idle_common) - bl machine_check_queue_event - - /* - * We have not used any non-volatile GPRs here, and as a rule - * most exception code including machine check does not. - * Therefore PACA_NAPSTATELOST does not need to be set. Idle - * wakeup will restore volatile registers. - * - * Load the original SRR1 into r3 for pnv_powersave_wakeup_mce. - * - * Then decrement MCE nesting after finishing with the stack. - */ - ld r3,_MSR(r1) - ld r4,_LINK(r1) - - lhz r11,PACA_IN_MCE(r13) - subi r11,r11,1 - sth r11,PACA_IN_MCE(r13) - - mtlr r4 - rlwinm r10,r3,47-31,30,31 - cmpwi cr1,r10,2 - bltlr cr1 /* no state loss, return to idle caller */ - b idle_return_gpr_loss -#endif - /* - * Handle machine check early in real mode. We come here with - * ME=1, MMU (IR=0 and DR=0) off and using MC emergency stack. - */ -EXC_COMMON_BEGIN(machine_check_handle_early) bl save_nvgprs addi r3,r1,STACK_FRAME_OVERHEAD bl machine_check_early @@ -1157,6 +1100,58 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 1, 0x200, 1, 1, 0 EXCEPTION_PROLOG_2_REAL machine_check_common, EXC_STD, 0 +EXC_COMMON_BEGIN(machine_check_common) + /* + * Machine check is different because we use a different + * save area: PACA_EXMC instead of PACA_EXGEN. + */ + EXCEPTION_COMMON(PACA_EXMC, 0x200) + FINISH_NAP + RECONCILE_IRQ_STATE(r10, r11) + ld r3,PACA_EXMC+EX_DAR(r13) + lwz r4,PACA_EXMC+EX_DSISR(r13) + /* Enable MSR_RI when finished with PACA_EXMC */ + li r10,MSR_RI + mtmsrd r10,1 + std r3,_DAR(r1) + std r4,_DSISR(r1) + bl save_nvgprs + addi r3,r1,STACK_FRAME_OVERHEAD + bl machine_check_exception + b ret_from_except + +#ifdef CONFIG_PPC_P7_NAP +/* + * This is an idle wakeup. Low level machine check has already been + * done. Queue the event then call the idle code to do the wake up. + */ +EXC_COMMON_BEGIN(machine_check_idle_common) + bl machine_check_queue_event + + /* + * We have not used any non-volatile GPRs here, and as a rule + * most exception code including machine check does not. + * Therefore PACA_NAPSTATELOST does not need to be set. Idle + * wakeup will restore volatile registers. + * + * Load the original SRR1 into r3 for pnv_powersave_wakeup_mce. + * + * Then decrement MCE nesting after finishing with the stack. + */ + ld r3,_MSR(r1) + ld r4,_LINK(r1) + + lhz r11,PACA_IN_MCE(r13) + subi r11,r11,1 + sth r11,PACA_IN_MCE(r13) + + mtlr r4 + rlwinm r10,r3,47-31,30,31 + cmpwi cr1,r10,2 + bltlr cr1 /* no state loss, return to idle caller */ + b idle_return_gpr_loss +#endif + EXC_COMMON_BEGIN(unrecoverable_mce) /* * We are going down. But there are chances that we might get hit by -- cgit v1.2.3 From b3fe35261e329e15736bc95630fd865df9896c66 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:41 +1000 Subject: powerpc/64s/exception: machine check improve labels and comments Short forward and backward branches can be given number labels, but larger significant divergences in a code path are more readable if they're given descriptive names. Also adjusts a comment to account for guest delivery.
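The naming rule reads the same in C, using goto labels as a stand-in for the asm labels (an illustration only): short local skips keep terse labels, while a large divergence such as guest delivery earns a descriptive one:

    #include <stdio.h>

    static int handle(int from_guest, int handled)
    {
        if (from_guest)
            goto deliver;       /* big divergence: descriptive name */
        if (!handled)
            goto out;           /* short local skip: terse label */
        printf("kernel path\n");
    out:
        return 0;
    deliver:
        printf("deliver to guest\n");
        return 1;
    }

    int main(void)
    {
        handle(0, 1);
        handle(1, 1);
        return 0;
    }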
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-17-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 3a7f18021365..c2474c9c8d41 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1052,7 +1052,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) */ lbz r11,HSTATE_IN_GUEST(r13) cmpwi r11,0 /* Check if coming from guest */ - bne 9f /* continue if we are. */ + bne mce_deliver /* continue if we are. */ #endif /* @@ -1060,7 +1060,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) * exception handler which will deliver the MC event to this kernel. */ andi. r11,r12,MSR_PR /* See if coming from user. */ - bne 9f /* continue in V mode if we are. */ + bne mce_deliver /* continue in V mode if we are. */ /* * At this point we are coming from kernel context. @@ -1088,8 +1088,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) MACHINE_CHECK_HANDLER_WINDUP RFI_TO_KERNEL -9: - /* Deliver the machine check to host kernel in V mode. */ +mce_deliver: + /* + * This is a host user or guest MCE. Restore all registers, then + * run the "late" handler. For host user, this will run the + * machine_check_exception handler in virtual mode like a normal + * interrupt handler. For guest, this will trigger the KVM test + * and branch to the KVM interrupt similarly to other interrupts. + */ BEGIN_FTR_SECTION ld r10,ORIG_GPR3(r1) mtspr SPRN_CFAR,r10 -- cgit v1.2.3 From c31f7134dc53f7020b3d49e846d1b950a761e324 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:42 +1000 Subject: powerpc/64s/exception: Fix DAR load for handle_page_fault error case This buglet goes back to before the 64/32 arch merge, but it does not seem to have had practical consequences because bad_page_fault does not use the 2nd argument, but rather regs->dar/nip. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-18-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index c2474c9c8d41..d44f6d103014 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -2335,7 +2335,7 @@ handle_page_fault: bl save_nvgprs mr r5,r3 addi r3,r1,STACK_FRAME_OVERHEAD - lwz r4,_DAR(r1) + ld r4,_DAR(r1) bl bad_page_fault b ret_from_except -- cgit v1.2.3 From a243281195c338489ec5088380821f64340cb82f Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:43 +1000 Subject: powerpc/64s/exception: move head-64.h exception code to exception-64s.S The head-64.h code should deal only with the head code sections and offset calculations. No generated code change except BUG line number constants. 
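Why only BUG line constants change can be seen from a C sketch (report() and REPORT() are invented): BUG()-style macros embed __LINE__, so relocating source between files shifts the embedded constant without changing behavior:

    #include <stdio.h>

    static void report(const char *file, int line)
    {
        printf("check at %s:%d\n", file, line);
    }

    /* BUG()-style: the source line is baked in at compile time. */
    #define REPORT() report(__FILE__, __LINE__)

    int main(void)
    {
        REPORT();   /* the emitted constant depends on code placement */
        return 0;
    }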
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-19-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 41 ++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index d44f6d103014..febeea021939 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -43,6 +43,47 @@ .endif #endif +#define EXC_REAL_BEGIN(name, start, size) \ + FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##name, start, size) + +#define EXC_REAL_END(name, start, size) \ + FIXED_SECTION_ENTRY_END_LOCATION(real_vectors, exc_real_##start##_##name, start, size) + +#define EXC_VIRT_BEGIN(name, start, size) \ + FIXED_SECTION_ENTRY_BEGIN_LOCATION(virt_vectors, exc_virt_##start##_##name, start, size) + +#define EXC_VIRT_END(name, start, size) \ + FIXED_SECTION_ENTRY_END_LOCATION(virt_vectors, exc_virt_##start##_##name, start, size) + +#define EXC_COMMON_BEGIN(name) \ + USE_TEXT_SECTION(); \ + .balign IFETCH_ALIGN_BYTES; \ + .global name; \ + _ASM_NOKPROBE_SYMBOL(name); \ + DEFINE_FIXED_SYMBOL(name); \ +name: + +#define TRAMP_REAL_BEGIN(name) \ + FIXED_SECTION_ENTRY_BEGIN(real_trampolines, name) + +#define TRAMP_VIRT_BEGIN(name) \ + FIXED_SECTION_ENTRY_BEGIN(virt_trampolines, name) + +#ifdef CONFIG_KVM_BOOK3S_64_HANDLER +#define TRAMP_KVM_BEGIN(name) \ + TRAMP_VIRT_BEGIN(name) +#else +#define TRAMP_KVM_BEGIN(name) +#endif + +#define EXC_REAL_NONE(start, size) \ + FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##unused, start, size); \ + FIXED_SECTION_ENTRY_END_LOCATION(real_vectors, exc_real_##start##_##unused, start, size) + +#define EXC_VIRT_NONE(start, size) \ + FIXED_SECTION_ENTRY_BEGIN_LOCATION(virt_vectors, exc_virt_##start##_##unused, start, size); \ + FIXED_SECTION_ENTRY_END_LOCATION(virt_vectors, exc_virt_##start##_##unused, start, size) + /* * We're short on space and time in the exception prolog, so we can't * use the normal LOAD_REG_IMMEDIATE macro to load the address of label. -- cgit v1.2.3 From def0db4f9ddc24ad7b37735c34a78e9c3c7978ef Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:44 +1000 Subject: powerpc/64s/exception: Add EXC_HV_OR_STD, which selects HSRR if HVMODE Add EXC_HV_OR_STD and use it to consolidate the 0x500 external interrupt. Executed code is unchanged. 
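The new variant can be modelled in C (a sketch only; the real selection is done by feature-section patching at boot, not a runtime switch): EXC_HV_OR_STD defers the HSRR-versus-SRR choice to the HV capability, so one macro body serves both cases:

    #include <stdio.h>

    enum exc_reg { EXC_STD, EXC_HV, EXC_HV_OR_STD };

    static const char *srr_pair(enum exc_reg kind, int hvmode)
    {
        switch (kind) {
        case EXC_HV:
            return "HSRR0/HSRR1";
        case EXC_STD:
            return "SRR0/SRR1";
        case EXC_HV_OR_STD:
        default:
            return hvmode ? "HSRR0/HSRR1" : "SRR0/SRR1";
        }
    }

    int main(void)
    {
        printf("%s\n", srr_pair(EXC_HV_OR_STD, 0));  /* SRR0/SRR1 */
        printf("%s\n", srr_pair(EXC_HV_OR_STD, 1));  /* HSRR0/HSRR1 */
        return 0;
    }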
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-20-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 102 +++++++++++++++++++++++++++-------- 1 file changed, 79 insertions(+), 23 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index febeea021939..33bd99116ae4 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -109,6 +109,7 @@ name: addis reg,reg,(ABS_ADDR(label))@h /* Exception register prefixes */ +#define EXC_HV_OR_STD 2 /* depends on HVMODE */ #define EXC_HV 1 #define EXC_STD 0 @@ -205,7 +206,13 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) .abort "Bad maskable vector" .endif - .if \hsrr + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + bne masked_Hinterrupt + FTR_SECTION_ELSE + bne masked_interrupt + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr bne masked_Hinterrupt .else bne masked_interrupt @@ -237,7 +244,17 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) .if ! \set_ri xori r10,r10,MSR_RI /* Clear MSR_RI */ .endif - .if \hsrr + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + mtspr SPRN_HSRR1,r10 + FTR_SECTION_ELSE + mfspr r11,SPRN_SRR0 /* save SRR0 */ + mfspr r12,SPRN_SRR1 /* and SRR1 */ + mtspr SPRN_SRR1,r10 + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr mfspr r11,SPRN_HSRR0 /* save HSRR0 */ mfspr r12,SPRN_HSRR1 /* and HSRR1 */ mtspr SPRN_HSRR1,r10 @@ -247,7 +264,15 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) mtspr SPRN_SRR1,r10 .endif LOAD_HANDLER(r10, \label\()) - .if \hsrr + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + mtspr SPRN_HSRR0,r10 + HRFI_TO_KERNEL + FTR_SECTION_ELSE + mtspr SPRN_SRR0,r10 + RFI_TO_KERNEL + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr mtspr SPRN_HSRR0,r10 HRFI_TO_KERNEL .else @@ -259,14 +284,26 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) .macro EXCEPTION_PROLOG_2_VIRT label, hsrr #ifdef CONFIG_RELOCATABLE - .if \hsrr + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + FTR_SECTION_ELSE + mfspr r11,SPRN_SRR0 /* save SRR0 */ + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr mfspr r11,SPRN_HSRR0 /* save HSRR0 */ .else mfspr r11,SPRN_SRR0 /* save SRR0 */ .endif LOAD_HANDLER(r12, \label\()) mtctr r12 - .if \hsrr + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + FTR_SECTION_ELSE + mfspr r12,SPRN_SRR1 /* and HSRR1 */ + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr mfspr r12,SPRN_HSRR1 /* and HSRR1 */ .else mfspr r12,SPRN_SRR1 /* and HSRR1 */ @@ -275,7 +312,15 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) mtmsrd r10,1 /* Set RI (EE=0) */ bctr #else - .if \hsrr + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + FTR_SECTION_ELSE + mfspr r11,SPRN_SRR0 /* save SRR0 */ + mfspr r12,SPRN_SRR1 /* and SRR1 */ + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr mfspr r11,SPRN_HSRR0 /* save HSRR0 */ mfspr r12,SPRN_HSRR1 /* and HSRR1 */ .else @@ -316,7 +361,13 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) .macro KVMTEST hsrr, n lbz r10,HSTATE_IN_GUEST(r13) cmpwi r10,0 - .if \hsrr + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + bne do_kvm_H\n + FTR_SECTION_ELSE + bne do_kvm_\n + 
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr bne do_kvm_H\n .else bne do_kvm_\n @@ -342,7 +393,13 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) std r12,HSTATE_SCRATCH0(r13) sldi r12,r9,32 /* HSRR variants have the 0x2 bit added to their trap number */ - .if \hsrr + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + ori r12,r12,(\n + 0x2) + FTR_SECTION_ELSE + ori r12,r12,(\n) + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr ori r12,r12,(\n + 0x2) .else ori r12,r12,(\n) @@ -370,7 +427,13 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) 89: mtocrf 0x80,r9 ld r9,\area+EX_R9(r13) ld r10,\area+EX_R10(r13) - .if \hsrr + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + b kvmppc_skip_Hinterrupt + FTR_SECTION_ELSE + b kvmppc_skip_interrupt + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr b kvmppc_skip_Hinterrupt .else b kvmppc_skip_interrupt @@ -469,6 +532,9 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ .macro EXCEPTION_RESTORE_REGS hsrr /* Move original SRR0 and SRR1 into the respective regs */ ld r9,_MSR(r1) + .if \hsrr == EXC_HV_OR_STD + .error "EXC_HV_OR_STD Not implemented for EXCEPTION_RESTORE_REGS" + .endif .if \hsrr mtspr SPRN_HSRR1,r9 .else @@ -1363,24 +1429,14 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(hardware_interrupt, 0x500, 0x100) EXCEPTION_PROLOG_0 PACA_EXGEN -BEGIN_FTR_SECTION - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, 0x500, 0, 0, IRQS_DISABLED - EXCEPTION_PROLOG_2_REAL hardware_interrupt_common, EXC_HV, 1 -FTR_SECTION_ELSE - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, 0x500, 0, 0, IRQS_DISABLED - EXCEPTION_PROLOG_2_REAL hardware_interrupt_common, EXC_STD, 1 -ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + EXCEPTION_PROLOG_1 EXC_HV_OR_STD, PACA_EXGEN, 1, 0x500, 0, 0, IRQS_DISABLED + EXCEPTION_PROLOG_2_REAL hardware_interrupt_common, EXC_HV_OR_STD, 1 EXC_REAL_END(hardware_interrupt, 0x500, 0x100) EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100) EXCEPTION_PROLOG_0 PACA_EXGEN -BEGIN_FTR_SECTION - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, 0x500, 0, 0, IRQS_DISABLED - EXCEPTION_PROLOG_2_VIRT hardware_interrupt_common, EXC_HV -FTR_SECTION_ELSE - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, 0x500, 0, 0, IRQS_DISABLED - EXCEPTION_PROLOG_2_VIRT hardware_interrupt_common, EXC_STD -ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE) + EXCEPTION_PROLOG_1 EXC_HV_OR_STD, PACA_EXGEN, 1, 0x500, 0, 0, IRQS_DISABLED + EXCEPTION_PROLOG_2_VIRT hardware_interrupt_common, EXC_HV_OR_STD EXC_VIRT_END(hardware_interrupt, 0x4500, 0x100) TRAMP_KVM(PACA_EXGEN, 0x500) -- cgit v1.2.3 From 9a7a0773d7d2fd1e4c581f42ad75de7872a174dc Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:45 +1000 Subject: powerpc/64s/exception: Fix performance monitor virt handler The perf virt handler uses EXCEPTION_PROLOG_2_REAL rather than _VIRT. In practice this is okay because the _REAL variant is usable by virt mode interrupts, but should be fixed (and is a performance win). 
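The cost difference can be caricatured in C (invented names; the real _REAL-style return path ends in a context-synchronizing rfid, while the _VIRT path is a plain branch):

    #include <stdio.h>

    static int sync_ops;    /* counts simulated context-synchronizing ops */

    static void handler(void) { }

    static void prolog_real(void)
    {
        sync_ops++;     /* rfid-like step: needed only to switch modes */
        handler();
    }

    static void prolog_virt(void)
    {
        handler();      /* already in virtual mode: plain branch */
    }

    int main(void)
    {
        prolog_real();
        prolog_virt();
        printf("synchronizing ops: %d\n", sync_ops);
        return 0;
    }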
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-21-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 33bd99116ae4..4b685d894ba1 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -750,7 +750,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) #define __TRAMP_VIRT_OOL_MASKABLE(name, realvec, bitmask) \ TRAMP_VIRT_BEGIN(tramp_virt_##name); \ EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 0, realvec, 0, 0, bitmask ; \ - EXCEPTION_PROLOG_2_REAL name##_common, EXC_STD, 1 + EXCEPTION_PROLOG_2_VIRT name##_common, EXC_STD #define EXC_VIRT_OOL_MASKABLE(name, start, size, realvec, bitmask) \ __EXC_VIRT_OOL_MASKABLE(name, start, size); \ -- cgit v1.2.3 From 5ff79a5ea69f8f5d1131064af9f8c9a8b1bc8266 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:46 +1000 Subject: powerpc/64s/exception: remove 0xb00 handler This vector is not used by any supported processor, and has been implemented as an unknown exception going back to 2.6. There is nothing special about 0xb00, so remove it like other unused vectors. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-22-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 4b685d894ba1..d9c953e419bf 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1564,10 +1564,8 @@ EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, unknown_exception) #endif -EXC_REAL(trap_0b, 0xb00, 0x100) -EXC_VIRT(trap_0b, 0x4b00, 0x100, 0xb00) -TRAMP_KVM(PACA_EXGEN, 0xb00) -EXC_COMMON(trap_0b_common, 0xb00, unknown_exception) +EXC_REAL_NONE(0xb00, 0x100) +EXC_VIRT_NONE(0x4b00, 0x100) /* * system call / hypercall (0xc00, 0x4c00) -- cgit v1.2.3 From 7299417c8214ce09c0da01d7719c7b1c11503578 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:47 +1000 Subject: powerpc/64s/exception: Replace PROLOG macros and EXC helpers with a gas macro This creates a single macro that generates the exception prolog code, with variants specified by arguments, rather than assorted nested macros for different variants. The increasing length of macro argument list is not nice to read or modify, but this is a temporary condition that will be improved in later changes. No generated code change except BUG line number constants and label names. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-23-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 502 ++++++++++++++--------------------- 1 file changed, 206 insertions(+), 296 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index d9c953e419bf..a476bb2f80f3 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -43,6 +43,17 @@ .endif #endif +/* + * Following are fixed section helper macros. 
+ * + * EXC_REAL_BEGIN/END - real, unrelocated exception vectors + * EXC_VIRT_BEGIN/END - virt (AIL), unrelocated exception vectors + * TRAMP_REAL_BEGIN - real, unrelocated helpers (virt may call these) + * TRAMP_VIRT_BEGIN - virt, unreloc helpers (in practice, real can use) + * TRAMP_KVM_BEGIN - KVM handlers, these are put into real, unrelocated + * EXC_COMMON - After switching to virtual, relocated mode. + */ + #define EXC_REAL_BEGIN(name, start, size) \ FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##name, start, size) @@ -589,196 +600,54 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) #endif /* - * Following are the BOOK3S exception handler helper macros. - * Handlers come in a number of types, and each type has a number of varieties. - * - * EXC_REAL_* - real, unrelocated exception vectors - * EXC_VIRT_* - virt (AIL), unrelocated exception vectors - * TRAMP_REAL_* - real, unrelocated helpers (virt can call these) - * TRAMP_VIRT_* - virt, unreloc helpers (in practice, real can use) - * TRAMP_KVM - KVM handlers that get put into real, unrelocated - * EXC_COMMON - virt, relocated common handlers - * - * The EXC handlers are given a name, and branch to name_common, or the - * appropriate KVM or masking function. Vector handler verieties are as - * follows: - * - * EXC_{REAL|VIRT}_BEGIN/END - used to open-code the exception - * - * EXC_{REAL|VIRT} - standard exception - * - * EXC_{REAL|VIRT}_suffix - * where _suffix is: - * - _MASKABLE - maskable exception - * - _OOL - out of line with trampoline to common handler - * - _HV - HV exception - * - * There can be combinations, e.g., EXC_VIRT_OOL_MASKABLE_HV + * This is the BOOK3S interrupt entry code macro. * - * KVM handlers come in the following verieties: - * TRAMP_KVM - * TRAMP_KVM_SKIP - * TRAMP_KVM_HV - * TRAMP_KVM_HV_SKIP - * - * COMMON handlers come in the following verieties: - * EXC_COMMON_BEGIN/END - used to open-code the handler - * EXC_COMMON - * EXC_COMMON_ASYNC - * - * TRAMP_REAL and TRAMP_VIRT can be used with BEGIN/END. KVM - * and OOL handlers are implemented as types of TRAMP and TRAMP_VIRT handlers. + * This can result in one of several things happening: + * - Branch to the _common handler, relocated, in virtual mode. + * These are normal interrupts (synchronous and asynchronous) handled by + * the kernel. + * - Branch to KVM, relocated but real mode interrupts remain in real mode. + * These occur when HSTATE_IN_GUEST is set. The interrupt may be caused by + * / intended for host or guest kernel, but KVM must always be involved + * because the machine state is set for guest execution. + * - Branch to the masked handler, unrelocated. + * These occur when maskable asynchronous interrupts are taken with the + * irq_soft_mask set. + * - Branch to an "early" handler in real mode but relocated. + * This is done if early=1. MCE and HMI use these to handle errors in real + * mode. + * - Fall through and continue executing in real, unrelocated mode. + * This is done if early=2. 
*/ +.macro INT_HANDLER name, vec, ool, early, virt, hsrr, area, ri, dar, dsisr, bitmask, kvm + EXCEPTION_PROLOG_0 \area + .if \ool + .if !\virt + b tramp_real_\name + .pushsection .text + TRAMP_REAL_BEGIN(tramp_real_\name) + .else + b tramp_virt_\name + .pushsection .text + TRAMP_VIRT_BEGIN(tramp_virt_\name) + .endif + .endif + EXCEPTION_PROLOG_1 \hsrr, \area, \kvm, \vec, \dar, \dsisr, \bitmask + .if \early == 2 + /* nothing more */ + .elseif \early + mfctr r10 /* save ctr, even for !RELOCATABLE */ + BRANCH_TO_C000(r11, \name\()_early_common) + .elseif !\virt + EXCEPTION_PROLOG_2_REAL \name\()_common, \hsrr, \ri + .else + EXCEPTION_PROLOG_2_VIRT \name\()_common, \hsrr + .endif + .if \ool + .popsection + .endif +.endm -#define __EXC_REAL(name, start, size, area) \ - EXC_REAL_BEGIN(name, start, size); \ - EXCEPTION_PROLOG_0 area ; \ - EXCEPTION_PROLOG_1 EXC_STD, area, 1, start, 0, 0, 0 ; \ - EXCEPTION_PROLOG_2_REAL name##_common, EXC_STD, 1 ; \ - EXC_REAL_END(name, start, size) - -#define EXC_REAL(name, start, size) \ - __EXC_REAL(name, start, size, PACA_EXGEN) - -#define __EXC_VIRT(name, start, size, realvec, area) \ - EXC_VIRT_BEGIN(name, start, size); \ - EXCEPTION_PROLOG_0 area ; \ - EXCEPTION_PROLOG_1 EXC_STD, area, 0, realvec, 0, 0, 0; \ - EXCEPTION_PROLOG_2_VIRT name##_common, EXC_STD ; \ - EXC_VIRT_END(name, start, size) - -#define EXC_VIRT(name, start, size, realvec) \ - __EXC_VIRT(name, start, size, realvec, PACA_EXGEN) - -#define EXC_REAL_MASKABLE(name, start, size, bitmask) \ - EXC_REAL_BEGIN(name, start, size); \ - EXCEPTION_PROLOG_0 PACA_EXGEN ; \ - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, start, 0, 0, bitmask ; \ - EXCEPTION_PROLOG_2_REAL name##_common, EXC_STD, 1 ; \ - EXC_REAL_END(name, start, size) - -#define EXC_VIRT_MASKABLE(name, start, size, realvec, bitmask) \ - EXC_VIRT_BEGIN(name, start, size); \ - EXCEPTION_PROLOG_0 PACA_EXGEN ; \ - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 0, realvec, 0, 0, bitmask ; \ - EXCEPTION_PROLOG_2_VIRT name##_common, EXC_STD ; \ - EXC_VIRT_END(name, start, size) - -#define EXC_REAL_HV(name, start, size) \ - EXC_REAL_BEGIN(name, start, size); \ - EXCEPTION_PROLOG_0 PACA_EXGEN; \ - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, start, 0, 0, 0 ; \ - EXCEPTION_PROLOG_2_REAL name##_common, EXC_HV, 1 ; \ - EXC_REAL_END(name, start, size) - -#define EXC_VIRT_HV(name, start, size, realvec) \ - EXC_VIRT_BEGIN(name, start, size); \ - EXCEPTION_PROLOG_0 PACA_EXGEN; \ - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, realvec, 0, 0, 0 ; \ - EXCEPTION_PROLOG_2_VIRT name##_common, EXC_HV ; \ - EXC_VIRT_END(name, start, size) - -#define __EXC_REAL_OOL(name, start, size) \ - EXC_REAL_BEGIN(name, start, size); \ - EXCEPTION_PROLOG_0 PACA_EXGEN ; \ - b tramp_real_##name ; \ - EXC_REAL_END(name, start, size) - -#define __TRAMP_REAL_OOL(name, vec) \ - TRAMP_REAL_BEGIN(tramp_real_##name); \ - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, vec, 0, 0, 0 ; \ - EXCEPTION_PROLOG_2_REAL name##_common, EXC_STD, 1 - -#define EXC_REAL_OOL(name, start, size) \ - __EXC_REAL_OOL(name, start, size); \ - __TRAMP_REAL_OOL(name, start) - -#define __EXC_REAL_OOL_MASKABLE(name, start, size) \ - __EXC_REAL_OOL(name, start, size) - -#define __TRAMP_REAL_OOL_MASKABLE(name, vec, bitmask) \ - TRAMP_REAL_BEGIN(tramp_real_##name); \ - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, vec, 0, 0, bitmask ; \ - EXCEPTION_PROLOG_2_REAL name##_common, EXC_STD, 1 - -#define EXC_REAL_OOL_MASKABLE(name, start, size, bitmask) \ - __EXC_REAL_OOL_MASKABLE(name, start, size); \ - __TRAMP_REAL_OOL_MASKABLE(name, start, 
bitmask) - -#define __EXC_REAL_OOL_HV(name, start, size) \ - __EXC_REAL_OOL(name, start, size) - -#define __TRAMP_REAL_OOL_HV(name, vec) \ - TRAMP_REAL_BEGIN(tramp_real_##name); \ - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, vec, 0, 0, 0 ; \ - EXCEPTION_PROLOG_2_REAL name##_common, EXC_HV, 1 - -#define EXC_REAL_OOL_HV(name, start, size) \ - __EXC_REAL_OOL_HV(name, start, size); \ - __TRAMP_REAL_OOL_HV(name, start) - -#define __EXC_REAL_OOL_MASKABLE_HV(name, start, size) \ - __EXC_REAL_OOL(name, start, size) - -#define __TRAMP_REAL_OOL_MASKABLE_HV(name, vec, bitmask) \ - TRAMP_REAL_BEGIN(tramp_real_##name); \ - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, vec, 0, 0, bitmask ; \ - EXCEPTION_PROLOG_2_REAL name##_common, EXC_HV, 1 - -#define EXC_REAL_OOL_MASKABLE_HV(name, start, size, bitmask) \ - __EXC_REAL_OOL_MASKABLE_HV(name, start, size); \ - __TRAMP_REAL_OOL_MASKABLE_HV(name, start, bitmask) - -#define __EXC_VIRT_OOL(name, start, size) \ - EXC_VIRT_BEGIN(name, start, size); \ - EXCEPTION_PROLOG_0 PACA_EXGEN ; \ - b tramp_virt_##name; \ - EXC_VIRT_END(name, start, size) - -#define __TRAMP_VIRT_OOL(name, realvec) \ - TRAMP_VIRT_BEGIN(tramp_virt_##name); \ - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 0, vec, 0, 0, 0 ; \ - EXCEPTION_PROLOG_2_VIRT name##_common, EXC_STD - -#define EXC_VIRT_OOL(name, start, size, realvec) \ - __EXC_VIRT_OOL(name, start, size); \ - __TRAMP_VIRT_OOL(name, realvec) - -#define __EXC_VIRT_OOL_MASKABLE(name, start, size) \ - __EXC_VIRT_OOL(name, start, size) - -#define __TRAMP_VIRT_OOL_MASKABLE(name, realvec, bitmask) \ - TRAMP_VIRT_BEGIN(tramp_virt_##name); \ - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 0, realvec, 0, 0, bitmask ; \ - EXCEPTION_PROLOG_2_VIRT name##_common, EXC_STD - -#define EXC_VIRT_OOL_MASKABLE(name, start, size, realvec, bitmask) \ - __EXC_VIRT_OOL_MASKABLE(name, start, size); \ - __TRAMP_VIRT_OOL_MASKABLE(name, realvec, bitmask) - -#define __EXC_VIRT_OOL_HV(name, start, size) \ - __EXC_VIRT_OOL(name, start, size) - -#define __TRAMP_VIRT_OOL_HV(name, realvec) \ - TRAMP_VIRT_BEGIN(tramp_virt_##name); \ - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, realvec, 0, 0, 0 ; \ - EXCEPTION_PROLOG_2_VIRT name##_common, EXC_HV - -#define EXC_VIRT_OOL_HV(name, start, size, realvec) \ - __EXC_VIRT_OOL_HV(name, start, size); \ - __TRAMP_VIRT_OOL_HV(name, realvec) - -#define __EXC_VIRT_OOL_MASKABLE_HV(name, start, size) \ - __EXC_VIRT_OOL(name, start, size) - -#define __TRAMP_VIRT_OOL_MASKABLE_HV(name, realvec, bitmask) \ - TRAMP_VIRT_BEGIN(tramp_virt_##name); \ - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, realvec, 0, 0, bitmask ; \ - EXCEPTION_PROLOG_2_VIRT name##_common, EXC_HV - -#define EXC_VIRT_OOL_MASKABLE_HV(name, start, size, realvec, bitmask) \ - __EXC_VIRT_OOL_MASKABLE_HV(name, start, size); \ - __TRAMP_VIRT_OOL_MASKABLE_HV(name, realvec, bitmask) #define TRAMP_KVM(area, n) \ TRAMP_KVM_BEGIN(do_kvm_##n); \ @@ -943,9 +812,7 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) #endif - EXCEPTION_PROLOG_0 PACA_EXNMI - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXNMI, 1, 0x100, 0, 0, 0 - EXCEPTION_PROLOG_2_REAL system_reset_common, EXC_STD, 0 + INT_HANDLER system_reset, 0x100, 0, 0, 0, EXC_STD, PACA_EXNMI, 0, 0, 0, 0, 1 /* * MSR_RI is not enabled, because PACA_EXNMI and nmi stack is * being used, so a nested NMI exception would corrupt it. 
@@ -975,9 +842,7 @@ TRAMP_REAL_BEGIN(system_reset_idle_wake) */ TRAMP_REAL_BEGIN(system_reset_fwnmi) /* See comment at system_reset exception, don't turn on RI */ - EXCEPTION_PROLOG_0 PACA_EXNMI - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXNMI, 0, 0x100, 0, 0, 0 - EXCEPTION_PROLOG_2_REAL system_reset_common, EXC_STD, 0 + INT_HANDLER system_reset, 0x100, 0, 0, 0, EXC_STD, PACA_EXNMI, 0, 0, 0, 0, 0 #endif /* CONFIG_PPC_PSERIES */ @@ -1040,10 +905,7 @@ EXC_COMMON_BEGIN(system_reset_common) EXC_REAL_BEGIN(machine_check, 0x200, 0x100) - EXCEPTION_PROLOG_0 PACA_EXMC - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 0, 0x200, 1, 1, 0 - mfctr r10 /* save ctr, even for !RELOCATABLE */ - BRANCH_TO_C000(r11, machine_check_early_common) + INT_HANDLER machine_check, 0x200, 0, 1, 0, EXC_STD, PACA_EXMC, 0, 1, 1, 0, 0 /* * MSR_RI is not enabled, because PACA_EXMC is being used, so a * nested machine check corrupts it. machine_check_common enables @@ -1055,10 +917,7 @@ EXC_VIRT_NONE(0x4200, 0x100) #ifdef CONFIG_PPC_PSERIES TRAMP_REAL_BEGIN(machine_check_fwnmi) /* See comment at machine_check exception, don't turn on RI */ - EXCEPTION_PROLOG_0 PACA_EXMC - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 0, 0x200, 1, 1, 0 - mfctr r10 /* save ctr */ - BRANCH_TO_C000(r11, machine_check_early_common) + INT_HANDLER machine_check, 0x200, 0, 1, 0, EXC_STD, PACA_EXMC, 0, 1, 1, 0, 0 #endif TRAMP_KVM_SKIP(PACA_EXMC, 0x200) @@ -1209,9 +1068,7 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_CFAR) MACHINE_CHECK_HANDLER_WINDUP /* See comment at machine_check exception, don't turn on RI */ - EXCEPTION_PROLOG_0 PACA_EXMC - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXMC, 1, 0x200, 1, 1, 0 - EXCEPTION_PROLOG_2_REAL machine_check_common, EXC_STD, 0 + INT_HANDLER machine_check, 0x200, 0, 0, 0, EXC_STD, PACA_EXMC, 0, 1, 1, 0, 1 EXC_COMMON_BEGIN(machine_check_common) /* @@ -1297,18 +1154,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) EXC_REAL_BEGIN(data_access, 0x300, 0x80) - EXCEPTION_PROLOG_0 PACA_EXGEN - b tramp_real_data_access + INT_HANDLER data_access, 0x300, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 1, 1, 0, 1 EXC_REAL_END(data_access, 0x300, 0x80) - -TRAMP_REAL_BEGIN(tramp_real_data_access) - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, 0x300, 1, 1, 0 - EXCEPTION_PROLOG_2_REAL data_access_common, EXC_STD, 1 - EXC_VIRT_BEGIN(data_access, 0x4300, 0x80) - EXCEPTION_PROLOG_0 PACA_EXGEN - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 0, 0x300, 1, 1, 0 -EXCEPTION_PROLOG_2_VIRT data_access_common, EXC_STD + INT_HANDLER data_access, 0x300, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 1, 1, 0, 0 EXC_VIRT_END(data_access, 0x4300, 0x80) TRAMP_KVM_SKIP(PACA_EXGEN, 0x300) @@ -1336,18 +1185,10 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80) - EXCEPTION_PROLOG_0 PACA_EXSLB - b tramp_real_data_access_slb + INT_HANDLER data_access_slb, 0x380, 1, 0, 0, 0, PACA_EXSLB, 1, 1, 0, 0, 1 EXC_REAL_END(data_access_slb, 0x380, 0x80) - -TRAMP_REAL_BEGIN(tramp_real_data_access_slb) - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXSLB, 1, 0x380, 1, 0, 0 - EXCEPTION_PROLOG_2_REAL data_access_slb_common, EXC_STD, 1 - EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) - EXCEPTION_PROLOG_0 PACA_EXSLB - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXSLB, 0, 0x380, 1, 0, 0 - EXCEPTION_PROLOG_2_VIRT data_access_slb_common, EXC_STD + INT_HANDLER data_access_slb, 0x380, 0, 0, 1, 0, PACA_EXSLB, 1, 1, 0, 0, 0 EXC_VIRT_END(data_access_slb, 0x4380, 0x80) TRAMP_KVM_SKIP(PACA_EXSLB, 0x380) @@ -1378,8 +1219,13 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) b ret_from_except -EXC_REAL(instruction_access, 
0x400, 0x80) -EXC_VIRT(instruction_access, 0x4400, 0x80, 0x400) +EXC_REAL_BEGIN(instruction_access, 0x400, 0x80) + INT_HANDLER instruction_access, 0x400, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(instruction_access, 0x400, 0x80) +EXC_VIRT_BEGIN(instruction_access, 0x4400, 0x80) + INT_HANDLER instruction_access, 0x400, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 +EXC_VIRT_END(instruction_access, 0x4400, 0x80) + TRAMP_KVM(PACA_EXGEN, 0x400) EXC_COMMON_BEGIN(instruction_access_common) @@ -1398,8 +1244,12 @@ MMU_FTR_SECTION_ELSE ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) -__EXC_REAL(instruction_access_slb, 0x480, 0x80, PACA_EXSLB) -__EXC_VIRT(instruction_access_slb, 0x4480, 0x80, 0x480, PACA_EXSLB) +EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80) + INT_HANDLER instruction_access_slb, 0x480, 0, 0, 0, EXC_STD, PACA_EXSLB, 1, 0, 0, 0, 1 +EXC_REAL_END(instruction_access_slb, 0x480, 0x80) +EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80) + INT_HANDLER instruction_access_slb, 0x480, 0, 0, 1, EXC_STD, PACA_EXSLB, 1, 0, 0, 0, 0 +EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80) TRAMP_KVM(PACA_EXSLB, 0x480) EXC_COMMON_BEGIN(instruction_access_slb_common) @@ -1426,17 +1276,11 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) bl do_bad_slb_fault b ret_from_except - EXC_REAL_BEGIN(hardware_interrupt, 0x500, 0x100) - EXCEPTION_PROLOG_0 PACA_EXGEN - EXCEPTION_PROLOG_1 EXC_HV_OR_STD, PACA_EXGEN, 1, 0x500, 0, 0, IRQS_DISABLED - EXCEPTION_PROLOG_2_REAL hardware_interrupt_common, EXC_HV_OR_STD, 1 + INT_HANDLER hardware_interrupt, 0x500, 0, 0, 0, EXC_HV_OR_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 EXC_REAL_END(hardware_interrupt, 0x500, 0x100) - EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100) - EXCEPTION_PROLOG_0 PACA_EXGEN - EXCEPTION_PROLOG_1 EXC_HV_OR_STD, PACA_EXGEN, 1, 0x500, 0, 0, IRQS_DISABLED - EXCEPTION_PROLOG_2_VIRT hardware_interrupt_common, EXC_HV_OR_STD + INT_HANDLER hardware_interrupt, 0x500, 0, 0, 1, EXC_HV_OR_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 EXC_VIRT_END(hardware_interrupt, 0x4500, 0x100) TRAMP_KVM(PACA_EXGEN, 0x500) @@ -1445,15 +1289,10 @@ EXC_COMMON_ASYNC(hardware_interrupt_common, 0x500, do_IRQ) EXC_REAL_BEGIN(alignment, 0x600, 0x100) - EXCEPTION_PROLOG_0 PACA_EXGEN - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, 0x600, 1, 1, 0 - EXCEPTION_PROLOG_2_REAL alignment_common, EXC_STD, 1 + INT_HANDLER alignment, 0x600, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 1, 1, 0, 1 EXC_REAL_END(alignment, 0x600, 0x100) - EXC_VIRT_BEGIN(alignment, 0x4600, 0x100) - EXCEPTION_PROLOG_0 PACA_EXGEN - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 0, 0x600, 1, 1, 0 - EXCEPTION_PROLOG_2_VIRT alignment_common, EXC_STD + INT_HANDLER alignment, 0x600, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 1, 1, 0, 0 EXC_VIRT_END(alignment, 0x4600, 0x100) TRAMP_KVM(PACA_EXGEN, 0x600) @@ -1470,8 +1309,12 @@ EXC_COMMON_BEGIN(alignment_common) b ret_from_except -EXC_REAL(program_check, 0x700, 0x100) -EXC_VIRT(program_check, 0x4700, 0x100, 0x700) +EXC_REAL_BEGIN(program_check, 0x700, 0x100) + INT_HANDLER program_check, 0x700, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(program_check, 0x700, 0x100) +EXC_VIRT_BEGIN(program_check, 0x4700, 0x100) + INT_HANDLER program_check, 0x700, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 +EXC_VIRT_END(program_check, 0x4700, 0x100) TRAMP_KVM(PACA_EXGEN, 0x700) EXC_COMMON_BEGIN(program_check_common) /* @@ -1508,8 +1351,12 @@ EXC_COMMON_BEGIN(program_check_common) b ret_from_except -EXC_REAL(fp_unavailable, 0x800, 0x100) -EXC_VIRT(fp_unavailable, 0x4800, 0x100, 0x800) 
+EXC_REAL_BEGIN(fp_unavailable, 0x800, 0x100) + INT_HANDLER fp_unavailable, 0x800, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(fp_unavailable, 0x800, 0x100) +EXC_VIRT_BEGIN(fp_unavailable, 0x4800, 0x100) + INT_HANDLER fp_unavailable, 0x800, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 +EXC_VIRT_END(fp_unavailable, 0x4800, 0x100) TRAMP_KVM(PACA_EXGEN, 0x800) EXC_COMMON_BEGIN(fp_unavailable_common) EXCEPTION_COMMON(PACA_EXGEN, 0x800) @@ -1542,20 +1389,32 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM) #endif -EXC_REAL_OOL_MASKABLE(decrementer, 0x900, 0x80, IRQS_DISABLED) -EXC_VIRT_MASKABLE(decrementer, 0x4900, 0x80, 0x900, IRQS_DISABLED) +EXC_REAL_BEGIN(decrementer, 0x900, 0x80) + INT_HANDLER decrementer, 0x900, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 +EXC_REAL_END(decrementer, 0x900, 0x80) +EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80) + INT_HANDLER decrementer, 0x900, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 0 +EXC_VIRT_END(decrementer, 0x4900, 0x80) TRAMP_KVM(PACA_EXGEN, 0x900) EXC_COMMON_ASYNC(decrementer_common, 0x900, timer_interrupt) -EXC_REAL_HV(hdecrementer, 0x980, 0x80) -EXC_VIRT_HV(hdecrementer, 0x4980, 0x80, 0x980) +EXC_REAL_BEGIN(hdecrementer, 0x980, 0x80) + INT_HANDLER hdecrementer, 0x980, 0, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(hdecrementer, 0x980, 0x80) +EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80) + INT_HANDLER hdecrementer, 0x980, 0, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_VIRT_END(hdecrementer, 0x4980, 0x80) TRAMP_KVM_HV(PACA_EXGEN, 0x980) EXC_COMMON(hdecrementer_common, 0x980, hdec_interrupt) -EXC_REAL_MASKABLE(doorbell_super, 0xa00, 0x100, IRQS_DISABLED) -EXC_VIRT_MASKABLE(doorbell_super, 0x4a00, 0x100, 0xa00, IRQS_DISABLED) +EXC_REAL_BEGIN(doorbell_super, 0xa00, 0x100) + INT_HANDLER doorbell_super, 0xa00, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 +EXC_REAL_END(doorbell_super, 0xa00, 0x100) +EXC_VIRT_BEGIN(doorbell_super, 0x4a00, 0x100) + INT_HANDLER doorbell_super, 0xa00, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 0 +EXC_VIRT_END(doorbell_super, 0x4a00, 0x100) TRAMP_KVM(PACA_EXGEN, 0xa00) #ifdef CONFIG_PPC_DOORBELL EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, doorbell_exception) @@ -1669,7 +1528,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) EXC_REAL_BEGIN(system_call, 0xc00, 0x100) SYSTEM_CALL 0 EXC_REAL_END(system_call, 0xc00, 0x100) - EXC_VIRT_BEGIN(system_call, 0x4c00, 0x100) SYSTEM_CALL 1 EXC_VIRT_END(system_call, 0x4c00, 0x100) @@ -1699,13 +1557,22 @@ TRAMP_KVM_BEGIN(do_kvm_0xc00) #endif -EXC_REAL(single_step, 0xd00, 0x100) -EXC_VIRT(single_step, 0x4d00, 0x100, 0xd00) +EXC_REAL_BEGIN(single_step, 0xd00, 0x100) + INT_HANDLER single_step, 0xd00, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(single_step, 0xd00, 0x100) +EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100) + INT_HANDLER single_step, 0xd00, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 +EXC_VIRT_END(single_step, 0x4d00, 0x100) TRAMP_KVM(PACA_EXGEN, 0xd00) EXC_COMMON(single_step_common, 0xd00, single_step_exception) -EXC_REAL_OOL_HV(h_data_storage, 0xe00, 0x20) -EXC_VIRT_OOL_HV(h_data_storage, 0x4e00, 0x20, 0xe00) + +EXC_REAL_BEGIN(h_data_storage, 0xe00, 0x20) + INT_HANDLER h_data_storage, 0xe00, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(h_data_storage, 0xe00, 0x20) +EXC_VIRT_BEGIN(h_data_storage, 0x4e00, 0x20) + INT_HANDLER h_data_storage, 0xe00, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_VIRT_END(h_data_storage, 0x4e00, 0x20) TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0xe00) EXC_COMMON_BEGIN(h_data_storage_common) mfspr 
r10,SPRN_HDAR @@ -1729,14 +1596,22 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX) b ret_from_except -EXC_REAL_OOL_HV(h_instr_storage, 0xe20, 0x20) -EXC_VIRT_OOL_HV(h_instr_storage, 0x4e20, 0x20, 0xe20) +EXC_REAL_BEGIN(h_instr_storage, 0xe20, 0x20) + INT_HANDLER h_instr_storage, 0xe20, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(h_instr_storage, 0xe20, 0x20) +EXC_VIRT_BEGIN(h_instr_storage, 0x4e20, 0x20) + INT_HANDLER h_instr_storage, 0xe20, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_VIRT_END(h_instr_storage, 0x4e20, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xe20) EXC_COMMON(h_instr_storage_common, 0xe20, unknown_exception) -EXC_REAL_OOL_HV(emulation_assist, 0xe40, 0x20) -EXC_VIRT_OOL_HV(emulation_assist, 0x4e40, 0x20, 0xe40) +EXC_REAL_BEGIN(emulation_assist, 0xe40, 0x20) + INT_HANDLER emulation_assist, 0xe40, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(emulation_assist, 0xe40, 0x20) +EXC_VIRT_BEGIN(emulation_assist, 0x4e40, 0x20) + INT_HANDLER emulation_assist, 0xe40, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_VIRT_END(emulation_assist, 0x4e40, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xe40) EXC_COMMON(emulation_assist_common, 0xe40, emulation_assist_interrupt) @@ -1747,15 +1622,10 @@ EXC_COMMON(emulation_assist_common, 0xe40, emulation_assist_interrupt) * mode. */ EXC_REAL_BEGIN(hmi_exception, 0xe60, 0x20) - EXCEPTION_PROLOG_0 PACA_EXGEN - b hmi_exception_early + INT_HANDLER hmi_exception, 0xe60, 1, 1, 0, EXC_HV, PACA_EXGEN, 0, 0, 0, 0, 1 EXC_REAL_END(hmi_exception, 0xe60, 0x20) EXC_VIRT_NONE(0x4e60, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xe60) -TRAMP_REAL_BEGIN(hmi_exception_early) - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, 0xe60, 0, 0, 0 - mfctr r10 /* save ctr, even for !RELOCATABLE */ - BRANCH_TO_C000(r11, hmi_exception_early_common) EXC_COMMON_BEGIN(hmi_exception_early_common) mtctr r10 /* Restore ctr */ @@ -1782,9 +1652,7 @@ EXC_COMMON_BEGIN(hmi_exception_early_common) * firmware. 
*/ EXCEPTION_RESTORE_REGS EXC_HV - EXCEPTION_PROLOG_0 PACA_EXGEN - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, 0xe60, 0, 0, IRQS_DISABLED - EXCEPTION_PROLOG_2_REAL hmi_exception_common, EXC_HV, 1 + INT_HANDLER hmi_exception, 0xe60, 0, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 EXC_COMMON_BEGIN(hmi_exception_common) EXCEPTION_COMMON(PACA_EXGEN, 0xe60) @@ -1796,8 +1664,13 @@ EXC_COMMON_BEGIN(hmi_exception_common) bl handle_hmi_exception b ret_from_except -EXC_REAL_OOL_MASKABLE_HV(h_doorbell, 0xe80, 0x20, IRQS_DISABLED) -EXC_VIRT_OOL_MASKABLE_HV(h_doorbell, 0x4e80, 0x20, 0xe80, IRQS_DISABLED) + +EXC_REAL_BEGIN(h_doorbell, 0xe80, 0x20) + INT_HANDLER h_doorbell, 0xe80, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 +EXC_REAL_END(h_doorbell, 0xe80, 0x20) +EXC_VIRT_BEGIN(h_doorbell, 0x4e80, 0x20) + INT_HANDLER h_doorbell, 0xe80, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 +EXC_VIRT_END(h_doorbell, 0x4e80, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xe80) #ifdef CONFIG_PPC_DOORBELL EXC_COMMON_ASYNC(h_doorbell_common, 0xe80, doorbell_exception) @@ -1806,8 +1679,12 @@ EXC_COMMON_ASYNC(h_doorbell_common, 0xe80, unknown_exception) #endif -EXC_REAL_OOL_MASKABLE_HV(h_virt_irq, 0xea0, 0x20, IRQS_DISABLED) -EXC_VIRT_OOL_MASKABLE_HV(h_virt_irq, 0x4ea0, 0x20, 0xea0, IRQS_DISABLED) +EXC_REAL_BEGIN(h_virt_irq, 0xea0, 0x20) + INT_HANDLER h_virt_irq, 0xea0, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 +EXC_REAL_END(h_virt_irq, 0xea0, 0x20) +EXC_VIRT_BEGIN(h_virt_irq, 0x4ea0, 0x20) + INT_HANDLER h_virt_irq, 0xea0, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 +EXC_VIRT_END(h_virt_irq, 0x4ea0, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xea0) EXC_COMMON_ASYNC(h_virt_irq_common, 0xea0, do_IRQ) @@ -1818,14 +1695,22 @@ EXC_REAL_NONE(0xee0, 0x20) EXC_VIRT_NONE(0x4ee0, 0x20) -EXC_REAL_OOL_MASKABLE(performance_monitor, 0xf00, 0x20, IRQS_PMI_DISABLED) -EXC_VIRT_OOL_MASKABLE(performance_monitor, 0x4f00, 0x20, 0xf00, IRQS_PMI_DISABLED) +EXC_REAL_BEGIN(performance_monitor, 0xf00, 0x20) + INT_HANDLER performance_monitor, 0xf00, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_PMI_DISABLED, 1 +EXC_REAL_END(performance_monitor, 0xf00, 0x20) +EXC_VIRT_BEGIN(performance_monitor, 0x4f00, 0x20) + INT_HANDLER performance_monitor, 0xf00, 1, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_PMI_DISABLED, 0 +EXC_VIRT_END(performance_monitor, 0x4f00, 0x20) TRAMP_KVM(PACA_EXGEN, 0xf00) EXC_COMMON_ASYNC(performance_monitor_common, 0xf00, performance_monitor_exception) -EXC_REAL_OOL(altivec_unavailable, 0xf20, 0x20) -EXC_VIRT_OOL(altivec_unavailable, 0x4f20, 0x20, 0xf20) +EXC_REAL_BEGIN(altivec_unavailable, 0xf20, 0x20) + INT_HANDLER altivec_unavailable, 0xf20, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(altivec_unavailable, 0xf20, 0x20) +EXC_VIRT_BEGIN(altivec_unavailable, 0x4f20, 0x20) + INT_HANDLER altivec_unavailable, 0xf20, 1, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 +EXC_VIRT_END(altivec_unavailable, 0x4f20, 0x20) TRAMP_KVM(PACA_EXGEN, 0xf20) EXC_COMMON_BEGIN(altivec_unavailable_common) EXCEPTION_COMMON(PACA_EXGEN, 0xf20) @@ -1861,8 +1746,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) b ret_from_except -EXC_REAL_OOL(vsx_unavailable, 0xf40, 0x20) -EXC_VIRT_OOL(vsx_unavailable, 0x4f40, 0x20, 0xf40) +EXC_REAL_BEGIN(vsx_unavailable, 0xf40, 0x20) + INT_HANDLER vsx_unavailable, 0xf40, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(vsx_unavailable, 0xf40, 0x20) +EXC_VIRT_BEGIN(vsx_unavailable, 0x4f40, 0x20) + INT_HANDLER vsx_unavailable, 0xf40, 1, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 
+EXC_VIRT_END(vsx_unavailable, 0x4f40, 0x20) TRAMP_KVM(PACA_EXGEN, 0xf40) EXC_COMMON_BEGIN(vsx_unavailable_common) EXCEPTION_COMMON(PACA_EXGEN, 0xf40) @@ -1897,14 +1786,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) b ret_from_except -EXC_REAL_OOL(facility_unavailable, 0xf60, 0x20) -EXC_VIRT_OOL(facility_unavailable, 0x4f60, 0x20, 0xf60) +EXC_REAL_BEGIN(facility_unavailable, 0xf60, 0x20) + INT_HANDLER facility_unavailable, 0xf60, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(facility_unavailable, 0xf60, 0x20) +EXC_VIRT_BEGIN(facility_unavailable, 0x4f60, 0x20) + INT_HANDLER facility_unavailable, 0xf60, 1, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 +EXC_VIRT_END(facility_unavailable, 0x4f60, 0x20) TRAMP_KVM(PACA_EXGEN, 0xf60) EXC_COMMON(facility_unavailable_common, 0xf60, facility_unavailable_exception) -EXC_REAL_OOL_HV(h_facility_unavailable, 0xf80, 0x20) -EXC_VIRT_OOL_HV(h_facility_unavailable, 0x4f80, 0x20, 0xf80) +EXC_REAL_BEGIN(h_facility_unavailable, 0xf80, 0x20) + INT_HANDLER h_facility_unavailable, 0xf80, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(h_facility_unavailable, 0xf80, 0x20) +EXC_VIRT_BEGIN(h_facility_unavailable, 0x4f80, 0x20) + INT_HANDLER h_facility_unavailable, 0xf80, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_VIRT_END(h_facility_unavailable, 0x4f80, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xf80) EXC_COMMON(h_facility_unavailable_common, 0xf80, facility_unavailable_exception) @@ -1922,7 +1819,9 @@ EXC_REAL_NONE(0x1100, 0x100) EXC_VIRT_NONE(0x5100, 0x100) #ifdef CONFIG_CBE_RAS -EXC_REAL_HV(cbe_system_error, 0x1200, 0x100) +EXC_REAL_BEGIN(cbe_system_error, 0x1200, 0x100) + INT_HANDLER cbe_system_error, 0x1200, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(cbe_system_error, 0x1200, 0x100) EXC_VIRT_NONE(0x5200, 0x100) TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0x1200) EXC_COMMON(cbe_system_error_common, 0x1200, cbe_system_error_exception) @@ -1932,24 +1831,26 @@ EXC_VIRT_NONE(0x5200, 0x100) #endif -EXC_REAL(instruction_breakpoint, 0x1300, 0x100) -EXC_VIRT(instruction_breakpoint, 0x5300, 0x100, 0x1300) +EXC_REAL_BEGIN(instruction_breakpoint, 0x1300, 0x100) + INT_HANDLER instruction_breakpoint, 0x1300, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(instruction_breakpoint, 0x1300, 0x100) +EXC_VIRT_BEGIN(instruction_breakpoint, 0x5300, 0x100) + INT_HANDLER instruction_breakpoint, 0x1300, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 +EXC_VIRT_END(instruction_breakpoint, 0x5300, 0x100) TRAMP_KVM_SKIP(PACA_EXGEN, 0x1300) EXC_COMMON(instruction_breakpoint_common, 0x1300, instruction_breakpoint_exception) + EXC_REAL_NONE(0x1400, 0x100) EXC_VIRT_NONE(0x5400, 0x100) EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100) - EXCEPTION_PROLOG_0 PACA_EXGEN - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 0, 0x1500, 0, 0, 0 - + INT_HANDLER denorm_exception_hv, 0x1500, 0, 2, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 0 #ifdef CONFIG_PPC_DENORMALISATION mfspr r10,SPRN_HSRR1 andis. r10,r10,(HSRR1_DENORM)@h /* denorm? 
*/ bne+ denorm_assist #endif - KVMTEST EXC_HV 0x1500 EXCEPTION_PROLOG_2_REAL denorm_common, EXC_HV, 1 EXC_REAL_END(denorm_exception_hv, 0x1500, 0x100) @@ -2037,7 +1938,9 @@ EXC_COMMON(denorm_common, 0x1500, unknown_exception) #ifdef CONFIG_CBE_RAS -EXC_REAL_HV(cbe_maintenance, 0x1600, 0x100) +EXC_REAL_BEGIN(cbe_maintenance, 0x1600, 0x100) + INT_HANDLER cbe_maintenance, 0x1600, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(cbe_maintenance, 0x1600, 0x100) EXC_VIRT_NONE(0x5600, 0x100) TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0x1600) EXC_COMMON(cbe_maintenance_common, 0x1600, cbe_maintenance_exception) @@ -2047,8 +1950,12 @@ EXC_VIRT_NONE(0x5600, 0x100) #endif -EXC_REAL(altivec_assist, 0x1700, 0x100) -EXC_VIRT(altivec_assist, 0x5700, 0x100, 0x1700) +EXC_REAL_BEGIN(altivec_assist, 0x1700, 0x100) + INT_HANDLER altivec_assist, 0x1700, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(altivec_assist, 0x1700, 0x100) +EXC_VIRT_BEGIN(altivec_assist, 0x5700, 0x100) + INT_HANDLER altivec_assist, 0x1700, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 +EXC_VIRT_END(altivec_assist, 0x5700, 0x100) TRAMP_KVM(PACA_EXGEN, 0x1700) #ifdef CONFIG_ALTIVEC EXC_COMMON(altivec_assist_common, 0x1700, altivec_assist_exception) @@ -2058,7 +1965,9 @@ EXC_COMMON(altivec_assist_common, 0x1700, unknown_exception) #ifdef CONFIG_CBE_RAS -EXC_REAL_HV(cbe_thermal, 0x1800, 0x100) +EXC_REAL_BEGIN(cbe_thermal, 0x1800, 0x100) + INT_HANDLER cbe_thermal, 0x1800, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 +EXC_REAL_END(cbe_thermal, 0x1800, 0x100) EXC_VIRT_NONE(0x5800, 0x100) TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0x1800) EXC_COMMON(cbe_thermal_common, 0x1800, cbe_thermal_exception) @@ -2067,6 +1976,7 @@ EXC_REAL_NONE(0x1800, 0x100) EXC_VIRT_NONE(0x5800, 0x100) #endif + #ifdef CONFIG_PPC_WATCHDOG #define MASKED_DEC_HANDLER_LABEL 3f -- cgit v1.2.3 From 9b40f62b8a49f0c922eb4b6cf6502f1307745c98 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Thu, 29 Aug 2019 23:36:08 +1000 Subject: powerpc/64s/exceptions: Use keyword params to shorten arg lists The argument lists for the INT_HANDLER macro are getting a bit unwieldy. Use keyword parameters with default values to shorten them. Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190830011426.16810-1-mpe@ellerman.id.au --- arch/powerpc/kernel/exceptions-64s.S | 120 +++++++++++++++++------------------ 1 file changed, 60 insertions(+), 60 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index a476bb2f80f3..10037981ff2a 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -619,7 +619,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) * - Fall through and continue executing in real, unrelocated mode. * This is done if early=2. */ -.macro INT_HANDLER name, vec, ool, early, virt, hsrr, area, ri, dar, dsisr, bitmask, kvm +.macro INT_HANDLER name, vec, ool=0, early=0, virt=0, hsrr=0, area=PACA_EXGEN, ri=1, dar=0, dsisr=0, bitmask=0, kvm=0 EXCEPTION_PROLOG_0 \area .if \ool .if !\virt @@ -812,7 +812,7 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) #endif - INT_HANDLER system_reset, 0x100, 0, 0, 0, EXC_STD, PACA_EXNMI, 0, 0, 0, 0, 1 + INT_HANDLER system_reset, 0x100, area=PACA_EXNMI, ri=0, kvm=1 /* * MSR_RI is not enabled, because PACA_EXNMI and nmi stack is * being used, so a nested NMI exception would corrupt it. 
@@ -842,7 +842,7 @@ TRAMP_REAL_BEGIN(system_reset_idle_wake) */ TRAMP_REAL_BEGIN(system_reset_fwnmi) /* See comment at system_reset exception, don't turn on RI */ - INT_HANDLER system_reset, 0x100, 0, 0, 0, EXC_STD, PACA_EXNMI, 0, 0, 0, 0, 0 + INT_HANDLER system_reset, 0x100, area=PACA_EXNMI, ri=0 #endif /* CONFIG_PPC_PSERIES */ @@ -905,7 +905,7 @@ EXC_COMMON_BEGIN(system_reset_common) EXC_REAL_BEGIN(machine_check, 0x200, 0x100) - INT_HANDLER machine_check, 0x200, 0, 1, 0, EXC_STD, PACA_EXMC, 0, 1, 1, 0, 0 + INT_HANDLER machine_check, 0x200, early=1, area=PACA_EXMC, dar=1, dsisr=1 /* * MSR_RI is not enabled, because PACA_EXMC is being used, so a * nested machine check corrupts it. machine_check_common enables @@ -917,7 +917,7 @@ EXC_VIRT_NONE(0x4200, 0x100) #ifdef CONFIG_PPC_PSERIES TRAMP_REAL_BEGIN(machine_check_fwnmi) /* See comment at machine_check exception, don't turn on RI */ - INT_HANDLER machine_check, 0x200, 0, 1, 0, EXC_STD, PACA_EXMC, 0, 1, 1, 0, 0 + INT_HANDLER machine_check, 0x200, early=1, area=PACA_EXMC, dar=1, dsisr=1 #endif TRAMP_KVM_SKIP(PACA_EXMC, 0x200) @@ -1068,7 +1068,7 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_CFAR) MACHINE_CHECK_HANDLER_WINDUP /* See comment at machine_check exception, don't turn on RI */ - INT_HANDLER machine_check, 0x200, 0, 0, 0, EXC_STD, PACA_EXMC, 0, 1, 1, 0, 1 + INT_HANDLER machine_check, 0x200, area=PACA_EXMC, ri=0, dar=1, dsisr=1, kvm=1 EXC_COMMON_BEGIN(machine_check_common) /* @@ -1154,10 +1154,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) EXC_REAL_BEGIN(data_access, 0x300, 0x80) - INT_HANDLER data_access, 0x300, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 1, 1, 0, 1 + INT_HANDLER data_access, 0x300, ool=1, dar=1, dsisr=1, kvm=1 EXC_REAL_END(data_access, 0x300, 0x80) EXC_VIRT_BEGIN(data_access, 0x4300, 0x80) - INT_HANDLER data_access, 0x300, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 1, 1, 0, 0 + INT_HANDLER data_access, 0x300, virt=1, dar=1, dsisr=1 EXC_VIRT_END(data_access, 0x4300, 0x80) TRAMP_KVM_SKIP(PACA_EXGEN, 0x300) @@ -1185,10 +1185,10 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80) - INT_HANDLER data_access_slb, 0x380, 1, 0, 0, 0, PACA_EXSLB, 1, 1, 0, 0, 1 + INT_HANDLER data_access_slb, 0x380, ool=1, area=PACA_EXSLB, dar=1, kvm=1 EXC_REAL_END(data_access_slb, 0x380, 0x80) EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) - INT_HANDLER data_access_slb, 0x380, 0, 0, 1, 0, PACA_EXSLB, 1, 1, 0, 0, 0 + INT_HANDLER data_access_slb, 0x380, virt=1, area=PACA_EXSLB, dar=1 EXC_VIRT_END(data_access_slb, 0x4380, 0x80) TRAMP_KVM_SKIP(PACA_EXSLB, 0x380) @@ -1220,10 +1220,10 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(instruction_access, 0x400, 0x80) - INT_HANDLER instruction_access, 0x400, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER instruction_access, 0x400, kvm=1 EXC_REAL_END(instruction_access, 0x400, 0x80) EXC_VIRT_BEGIN(instruction_access, 0x4400, 0x80) - INT_HANDLER instruction_access, 0x400, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 + INT_HANDLER instruction_access, 0x400, virt=1 EXC_VIRT_END(instruction_access, 0x4400, 0x80) TRAMP_KVM(PACA_EXGEN, 0x400) @@ -1245,10 +1245,10 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80) - INT_HANDLER instruction_access_slb, 0x480, 0, 0, 0, EXC_STD, PACA_EXSLB, 1, 0, 0, 0, 1 + INT_HANDLER instruction_access_slb, 0x480, area=PACA_EXSLB, kvm=1 EXC_REAL_END(instruction_access_slb, 0x480, 0x80) EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80) - INT_HANDLER 
instruction_access_slb, 0x480, 0, 0, 1, EXC_STD, PACA_EXSLB, 1, 0, 0, 0, 0 + INT_HANDLER instruction_access_slb, 0x480, virt=1, area=PACA_EXSLB EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80) TRAMP_KVM(PACA_EXSLB, 0x480) @@ -1277,10 +1277,10 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) b ret_from_except EXC_REAL_BEGIN(hardware_interrupt, 0x500, 0x100) - INT_HANDLER hardware_interrupt, 0x500, 0, 0, 0, EXC_HV_OR_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 + INT_HANDLER hardware_interrupt, 0x500, hsrr=EXC_HV_OR_STD, bitmask=IRQS_DISABLED, kvm=1 EXC_REAL_END(hardware_interrupt, 0x500, 0x100) EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100) - INT_HANDLER hardware_interrupt, 0x500, 0, 0, 1, EXC_HV_OR_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 + INT_HANDLER hardware_interrupt, 0x500, virt=1, hsrr=EXC_HV_OR_STD, bitmask=IRQS_DISABLED, kvm=1 EXC_VIRT_END(hardware_interrupt, 0x4500, 0x100) TRAMP_KVM(PACA_EXGEN, 0x500) @@ -1289,10 +1289,10 @@ EXC_COMMON_ASYNC(hardware_interrupt_common, 0x500, do_IRQ) EXC_REAL_BEGIN(alignment, 0x600, 0x100) - INT_HANDLER alignment, 0x600, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 1, 1, 0, 1 + INT_HANDLER alignment, 0x600, dar=1, dsisr=1, kvm=1 EXC_REAL_END(alignment, 0x600, 0x100) EXC_VIRT_BEGIN(alignment, 0x4600, 0x100) - INT_HANDLER alignment, 0x600, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 1, 1, 0, 0 + INT_HANDLER alignment, 0x600, virt=1, dar=1, dsisr=1 EXC_VIRT_END(alignment, 0x4600, 0x100) TRAMP_KVM(PACA_EXGEN, 0x600) @@ -1310,10 +1310,10 @@ EXC_COMMON_BEGIN(alignment_common) EXC_REAL_BEGIN(program_check, 0x700, 0x100) - INT_HANDLER program_check, 0x700, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER program_check, 0x700, kvm=1 EXC_REAL_END(program_check, 0x700, 0x100) EXC_VIRT_BEGIN(program_check, 0x4700, 0x100) - INT_HANDLER program_check, 0x700, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 + INT_HANDLER program_check, 0x700, virt=1 EXC_VIRT_END(program_check, 0x4700, 0x100) TRAMP_KVM(PACA_EXGEN, 0x700) EXC_COMMON_BEGIN(program_check_common) @@ -1352,10 +1352,10 @@ EXC_COMMON_BEGIN(program_check_common) EXC_REAL_BEGIN(fp_unavailable, 0x800, 0x100) - INT_HANDLER fp_unavailable, 0x800, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER fp_unavailable, 0x800, kvm=1 EXC_REAL_END(fp_unavailable, 0x800, 0x100) EXC_VIRT_BEGIN(fp_unavailable, 0x4800, 0x100) - INT_HANDLER fp_unavailable, 0x800, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 + INT_HANDLER fp_unavailable, 0x800, virt=1 EXC_VIRT_END(fp_unavailable, 0x4800, 0x100) TRAMP_KVM(PACA_EXGEN, 0x800) EXC_COMMON_BEGIN(fp_unavailable_common) @@ -1390,30 +1390,30 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM) EXC_REAL_BEGIN(decrementer, 0x900, 0x80) - INT_HANDLER decrementer, 0x900, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 + INT_HANDLER decrementer, 0x900, ool=1, bitmask=IRQS_DISABLED, kvm=1 EXC_REAL_END(decrementer, 0x900, 0x80) EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80) - INT_HANDLER decrementer, 0x900, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 0 + INT_HANDLER decrementer, 0x900, virt=1, bitmask=IRQS_DISABLED EXC_VIRT_END(decrementer, 0x4900, 0x80) TRAMP_KVM(PACA_EXGEN, 0x900) EXC_COMMON_ASYNC(decrementer_common, 0x900, timer_interrupt) EXC_REAL_BEGIN(hdecrementer, 0x980, 0x80) - INT_HANDLER hdecrementer, 0x980, 0, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER hdecrementer, 0x980, hsrr=EXC_HV, kvm=1 EXC_REAL_END(hdecrementer, 0x980, 0x80) EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80) - INT_HANDLER hdecrementer, 0x980, 0, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER 
hdecrementer, 0x980, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(hdecrementer, 0x4980, 0x80) TRAMP_KVM_HV(PACA_EXGEN, 0x980) EXC_COMMON(hdecrementer_common, 0x980, hdec_interrupt) EXC_REAL_BEGIN(doorbell_super, 0xa00, 0x100) - INT_HANDLER doorbell_super, 0xa00, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 + INT_HANDLER doorbell_super, 0xa00, bitmask=IRQS_DISABLED, kvm=1 EXC_REAL_END(doorbell_super, 0xa00, 0x100) EXC_VIRT_BEGIN(doorbell_super, 0x4a00, 0x100) - INT_HANDLER doorbell_super, 0xa00, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 0 + INT_HANDLER doorbell_super, 0xa00, virt=1, bitmask=IRQS_DISABLED EXC_VIRT_END(doorbell_super, 0x4a00, 0x100) TRAMP_KVM(PACA_EXGEN, 0xa00) #ifdef CONFIG_PPC_DOORBELL @@ -1558,20 +1558,20 @@ TRAMP_KVM_BEGIN(do_kvm_0xc00) EXC_REAL_BEGIN(single_step, 0xd00, 0x100) - INT_HANDLER single_step, 0xd00, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER single_step, 0xd00, kvm=1 EXC_REAL_END(single_step, 0xd00, 0x100) EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100) - INT_HANDLER single_step, 0xd00, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 + INT_HANDLER single_step, 0xd00, virt=1 EXC_VIRT_END(single_step, 0x4d00, 0x100) TRAMP_KVM(PACA_EXGEN, 0xd00) EXC_COMMON(single_step_common, 0xd00, single_step_exception) EXC_REAL_BEGIN(h_data_storage, 0xe00, 0x20) - INT_HANDLER h_data_storage, 0xe00, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER h_data_storage, 0xe00, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(h_data_storage, 0xe00, 0x20) EXC_VIRT_BEGIN(h_data_storage, 0x4e00, 0x20) - INT_HANDLER h_data_storage, 0xe00, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER h_data_storage, 0xe00, ool=1, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(h_data_storage, 0x4e00, 0x20) TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0xe00) EXC_COMMON_BEGIN(h_data_storage_common) @@ -1597,20 +1597,20 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(h_instr_storage, 0xe20, 0x20) - INT_HANDLER h_instr_storage, 0xe20, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER h_instr_storage, 0xe20, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(h_instr_storage, 0xe20, 0x20) EXC_VIRT_BEGIN(h_instr_storage, 0x4e20, 0x20) - INT_HANDLER h_instr_storage, 0xe20, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER h_instr_storage, 0xe20, ool=1, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(h_instr_storage, 0x4e20, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xe20) EXC_COMMON(h_instr_storage_common, 0xe20, unknown_exception) EXC_REAL_BEGIN(emulation_assist, 0xe40, 0x20) - INT_HANDLER emulation_assist, 0xe40, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER emulation_assist, 0xe40, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(emulation_assist, 0xe40, 0x20) EXC_VIRT_BEGIN(emulation_assist, 0x4e40, 0x20) - INT_HANDLER emulation_assist, 0xe40, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER emulation_assist, 0xe40, ool=1, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(emulation_assist, 0x4e40, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xe40) EXC_COMMON(emulation_assist_common, 0xe40, emulation_assist_interrupt) @@ -1622,7 +1622,7 @@ EXC_COMMON(emulation_assist_common, 0xe40, emulation_assist_interrupt) * mode. */ EXC_REAL_BEGIN(hmi_exception, 0xe60, 0x20) - INT_HANDLER hmi_exception, 0xe60, 1, 1, 0, EXC_HV, PACA_EXGEN, 0, 0, 0, 0, 1 + INT_HANDLER hmi_exception, 0xe60, ool=1, early=1, hsrr=EXC_HV, ri=0, kvm=1 EXC_REAL_END(hmi_exception, 0xe60, 0x20) EXC_VIRT_NONE(0x4e60, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xe60) @@ -1652,7 +1652,7 @@ EXC_COMMON_BEGIN(hmi_exception_early_common) * firmware. 
*/ EXCEPTION_RESTORE_REGS EXC_HV - INT_HANDLER hmi_exception, 0xe60, 0, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 + INT_HANDLER hmi_exception, 0xe60, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1 EXC_COMMON_BEGIN(hmi_exception_common) EXCEPTION_COMMON(PACA_EXGEN, 0xe60) @@ -1666,10 +1666,10 @@ EXC_COMMON_BEGIN(hmi_exception_common) EXC_REAL_BEGIN(h_doorbell, 0xe80, 0x20) - INT_HANDLER h_doorbell, 0xe80, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 + INT_HANDLER h_doorbell, 0xe80, ool=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1 EXC_REAL_END(h_doorbell, 0xe80, 0x20) EXC_VIRT_BEGIN(h_doorbell, 0x4e80, 0x20) - INT_HANDLER h_doorbell, 0xe80, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 + INT_HANDLER h_doorbell, 0xe80, ool=1, virt=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1 EXC_VIRT_END(h_doorbell, 0x4e80, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xe80) #ifdef CONFIG_PPC_DOORBELL @@ -1680,10 +1680,10 @@ EXC_COMMON_ASYNC(h_doorbell_common, 0xe80, unknown_exception) EXC_REAL_BEGIN(h_virt_irq, 0xea0, 0x20) - INT_HANDLER h_virt_irq, 0xea0, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 + INT_HANDLER h_virt_irq, 0xea0, ool=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1 EXC_REAL_END(h_virt_irq, 0xea0, 0x20) EXC_VIRT_BEGIN(h_virt_irq, 0x4ea0, 0x20) - INT_HANDLER h_virt_irq, 0xea0, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, IRQS_DISABLED, 1 + INT_HANDLER h_virt_irq, 0xea0, ool=1, virt=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1 EXC_VIRT_END(h_virt_irq, 0x4ea0, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xea0) EXC_COMMON_ASYNC(h_virt_irq_common, 0xea0, do_IRQ) @@ -1696,20 +1696,20 @@ EXC_VIRT_NONE(0x4ee0, 0x20) EXC_REAL_BEGIN(performance_monitor, 0xf00, 0x20) - INT_HANDLER performance_monitor, 0xf00, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_PMI_DISABLED, 1 + INT_HANDLER performance_monitor, 0xf00, ool=1, bitmask=IRQS_PMI_DISABLED, kvm=1 EXC_REAL_END(performance_monitor, 0xf00, 0x20) EXC_VIRT_BEGIN(performance_monitor, 0x4f00, 0x20) - INT_HANDLER performance_monitor, 0xf00, 1, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, IRQS_PMI_DISABLED, 0 + INT_HANDLER performance_monitor, 0xf00, ool=1, virt=1, bitmask=IRQS_PMI_DISABLED EXC_VIRT_END(performance_monitor, 0x4f00, 0x20) TRAMP_KVM(PACA_EXGEN, 0xf00) EXC_COMMON_ASYNC(performance_monitor_common, 0xf00, performance_monitor_exception) EXC_REAL_BEGIN(altivec_unavailable, 0xf20, 0x20) - INT_HANDLER altivec_unavailable, 0xf20, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER altivec_unavailable, 0xf20, ool=1, kvm=1 EXC_REAL_END(altivec_unavailable, 0xf20, 0x20) EXC_VIRT_BEGIN(altivec_unavailable, 0x4f20, 0x20) - INT_HANDLER altivec_unavailable, 0xf20, 1, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 + INT_HANDLER altivec_unavailable, 0xf20, ool=1, virt=1 EXC_VIRT_END(altivec_unavailable, 0x4f20, 0x20) TRAMP_KVM(PACA_EXGEN, 0xf20) EXC_COMMON_BEGIN(altivec_unavailable_common) @@ -1747,10 +1747,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) EXC_REAL_BEGIN(vsx_unavailable, 0xf40, 0x20) - INT_HANDLER vsx_unavailable, 0xf40, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER vsx_unavailable, 0xf40, ool=1, kvm=1 EXC_REAL_END(vsx_unavailable, 0xf40, 0x20) EXC_VIRT_BEGIN(vsx_unavailable, 0x4f40, 0x20) - INT_HANDLER vsx_unavailable, 0xf40, 1, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 + INT_HANDLER vsx_unavailable, 0xf40, ool=1, virt=1 EXC_VIRT_END(vsx_unavailable, 0x4f40, 0x20) TRAMP_KVM(PACA_EXGEN, 0xf40) EXC_COMMON_BEGIN(vsx_unavailable_common) @@ -1787,20 +1787,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) EXC_REAL_BEGIN(facility_unavailable, 0xf60, 0x20) - 
INT_HANDLER facility_unavailable, 0xf60, 1, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER facility_unavailable, 0xf60, ool=1, kvm=1 EXC_REAL_END(facility_unavailable, 0xf60, 0x20) EXC_VIRT_BEGIN(facility_unavailable, 0x4f60, 0x20) - INT_HANDLER facility_unavailable, 0xf60, 1, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 + INT_HANDLER facility_unavailable, 0xf60, ool=1, virt=1 EXC_VIRT_END(facility_unavailable, 0x4f60, 0x20) TRAMP_KVM(PACA_EXGEN, 0xf60) EXC_COMMON(facility_unavailable_common, 0xf60, facility_unavailable_exception) EXC_REAL_BEGIN(h_facility_unavailable, 0xf80, 0x20) - INT_HANDLER h_facility_unavailable, 0xf80, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER h_facility_unavailable, 0xf80, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(h_facility_unavailable, 0xf80, 0x20) EXC_VIRT_BEGIN(h_facility_unavailable, 0x4f80, 0x20) - INT_HANDLER h_facility_unavailable, 0xf80, 1, 0, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER h_facility_unavailable, 0xf80, ool=1, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(h_facility_unavailable, 0x4f80, 0x20) TRAMP_KVM_HV(PACA_EXGEN, 0xf80) EXC_COMMON(h_facility_unavailable_common, 0xf80, facility_unavailable_exception) @@ -1820,7 +1820,7 @@ EXC_VIRT_NONE(0x5100, 0x100) #ifdef CONFIG_CBE_RAS EXC_REAL_BEGIN(cbe_system_error, 0x1200, 0x100) - INT_HANDLER cbe_system_error, 0x1200, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER cbe_system_error, 0x1200, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(cbe_system_error, 0x1200, 0x100) EXC_VIRT_NONE(0x5200, 0x100) TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0x1200) @@ -1832,10 +1832,10 @@ EXC_VIRT_NONE(0x5200, 0x100) EXC_REAL_BEGIN(instruction_breakpoint, 0x1300, 0x100) - INT_HANDLER instruction_breakpoint, 0x1300, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER instruction_breakpoint, 0x1300, kvm=1 EXC_REAL_END(instruction_breakpoint, 0x1300, 0x100) EXC_VIRT_BEGIN(instruction_breakpoint, 0x5300, 0x100) - INT_HANDLER instruction_breakpoint, 0x1300, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 + INT_HANDLER instruction_breakpoint, 0x1300, virt=1 EXC_VIRT_END(instruction_breakpoint, 0x5300, 0x100) TRAMP_KVM_SKIP(PACA_EXGEN, 0x1300) EXC_COMMON(instruction_breakpoint_common, 0x1300, instruction_breakpoint_exception) @@ -1845,7 +1845,7 @@ EXC_REAL_NONE(0x1400, 0x100) EXC_VIRT_NONE(0x5400, 0x100) EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100) - INT_HANDLER denorm_exception_hv, 0x1500, 0, 2, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 0 + INT_HANDLER denorm_exception_hv, 0x1500, early=2, hsrr=EXC_HV #ifdef CONFIG_PPC_DENORMALISATION mfspr r10,SPRN_HSRR1 andis. r10,r10,(HSRR1_DENORM)@h /* denorm? 
*/ @@ -1939,7 +1939,7 @@ EXC_COMMON(denorm_common, 0x1500, unknown_exception) #ifdef CONFIG_CBE_RAS EXC_REAL_BEGIN(cbe_maintenance, 0x1600, 0x100) - INT_HANDLER cbe_maintenance, 0x1600, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER cbe_maintenance, 0x1600, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(cbe_maintenance, 0x1600, 0x100) EXC_VIRT_NONE(0x5600, 0x100) TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0x1600) @@ -1951,10 +1951,10 @@ EXC_VIRT_NONE(0x5600, 0x100) EXC_REAL_BEGIN(altivec_assist, 0x1700, 0x100) - INT_HANDLER altivec_assist, 0x1700, 0, 0, 0, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER altivec_assist, 0x1700, kvm=1 EXC_REAL_END(altivec_assist, 0x1700, 0x100) EXC_VIRT_BEGIN(altivec_assist, 0x5700, 0x100) - INT_HANDLER altivec_assist, 0x1700, 0, 0, 1, EXC_STD, PACA_EXGEN, 1, 0, 0, 0, 0 + INT_HANDLER altivec_assist, 0x1700, virt=1 EXC_VIRT_END(altivec_assist, 0x5700, 0x100) TRAMP_KVM(PACA_EXGEN, 0x1700) #ifdef CONFIG_ALTIVEC @@ -1966,7 +1966,7 @@ EXC_COMMON(altivec_assist_common, 0x1700, unknown_exception) #ifdef CONFIG_CBE_RAS EXC_REAL_BEGIN(cbe_thermal, 0x1800, 0x100) - INT_HANDLER cbe_thermal, 0x1800, 1, 0, 0, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 1 + INT_HANDLER cbe_thermal, 0x1800, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(cbe_thermal, 0x1800, 0x100) EXC_VIRT_NONE(0x5800, 0x100) TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0x1800) -- cgit v1.2.3 From d29768e13cf6b50bd54a690e2ac52ab71465e2eb Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:48 +1000 Subject: powerpc/64s/exception: remove EXCEPTION_PROLOG_0/1, rename _2 EXCEPTION_PROLOG_0 and _1 have only a single caller, so expand them into it. Rename EXCEPTION_PROLOG_2_REAL to INT_SAVE_SRR_AND_JUMP and EXCEPTION_PROLOG_2_VIRT to INT_VIRT_SAVE_SRR_AND_JUMP, which are more descriptive. No generated code change except BUG line number constants. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-24-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 151 +++++++++++++++++------------------ 1 file changed, 73 insertions(+), 78 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 10037981ff2a..1ae2a8d59aa0 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -180,77 +180,7 @@ BEGIN_FTR_SECTION_NESTED(943) \ std ra,offset(r13); \ END_FTR_SECTION_NESTED(ftr,ftr,943) -.macro EXCEPTION_PROLOG_0 area - SET_SCRATCH0(r13) /* save r13 */ - GET_PACA(r13) - std r9,\area\()+EX_R9(r13) /* save r9 */ - OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR) - HMT_MEDIUM - std r10,\area\()+EX_R10(r13) /* save r10 - r12 */ - OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR) -.endm - -.macro EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, dar, dsisr, bitmask - OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR) - OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR) - INTERRUPT_TO_KERNEL - SAVE_CTR(r10, \area\()) - mfcr r9 - .if \kvm - KVMTEST \hsrr \vec - .endif - .if \bitmask - lbz r10,PACAIRQSOFTMASK(r13) - andi. 
r10,r10,\bitmask - /* Associate vector numbers with bits in paca->irq_happened */ - .if \vec == 0x500 || \vec == 0xea0 - li r10,PACA_IRQ_EE - .elseif \vec == 0x900 - li r10,PACA_IRQ_DEC - .elseif \vec == 0xa00 || \vec == 0xe80 - li r10,PACA_IRQ_DBELL - .elseif \vec == 0xe60 - li r10,PACA_IRQ_HMI - .elseif \vec == 0xf00 - li r10,PACA_IRQ_PMI - .else - .abort "Bad maskable vector" - .endif - - .if \hsrr == EXC_HV_OR_STD - BEGIN_FTR_SECTION - bne masked_Hinterrupt - FTR_SECTION_ELSE - bne masked_interrupt - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr - bne masked_Hinterrupt - .else - bne masked_interrupt - .endif - .endif - - std r11,\area\()+EX_R11(r13) - std r12,\area\()+EX_R12(r13) - - /* - * DAR/DSISR, SCRATCH0 must be read before setting MSR[RI], - * because a d-side MCE will clobber those registers so is - * not recoverable if they are live. - */ - GET_SCRATCH0(r10) - std r10,\area\()+EX_R13(r13) - .if \dar - mfspr r10,SPRN_DAR - std r10,\area\()+EX_DAR(r13) - .endif - .if \dsisr - mfspr r10,SPRN_DSISR - stw r10,\area\()+EX_DSISR(r13) - .endif -.endm - -.macro EXCEPTION_PROLOG_2_REAL label, hsrr, set_ri +.macro INT_SAVE_SRR_AND_JUMP label, hsrr, set_ri ld r10,PACAKMSR(r13) /* get MSR value for kernel */ .if ! \set_ri xori r10,r10,MSR_RI /* Clear MSR_RI */ @@ -293,7 +223,8 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) b . /* prevent speculative execution */ .endm -.macro EXCEPTION_PROLOG_2_VIRT label, hsrr +/* INT_SAVE_SRR_AND_JUMP works for real or virt, this is faster but virt only */ +.macro INT_VIRT_SAVE_SRR_AND_JUMP label, hsrr #ifdef CONFIG_RELOCATABLE .if \hsrr == EXC_HV_OR_STD BEGIN_FTR_SECTION @@ -620,7 +551,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) * This is done if early=2. */ .macro INT_HANDLER name, vec, ool=0, early=0, virt=0, hsrr=0, area=PACA_EXGEN, ri=1, dar=0, dsisr=0, bitmask=0, kvm=0 - EXCEPTION_PROLOG_0 \area + SET_SCRATCH0(r13) /* save r13 */ + GET_PACA(r13) + std r9,\area\()+EX_R9(r13) /* save r9 */ + OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR) + HMT_MEDIUM + std r10,\area\()+EX_R10(r13) /* save r10 - r12 */ + OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR) .if \ool .if !\virt b tramp_real_\name @@ -632,16 +569,74 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) TRAMP_VIRT_BEGIN(tramp_virt_\name) .endif .endif - EXCEPTION_PROLOG_1 \hsrr, \area, \kvm, \vec, \dar, \dsisr, \bitmask + + OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR) + OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR) + INTERRUPT_TO_KERNEL + SAVE_CTR(r10, \area\()) + mfcr r9 + .if \kvm + KVMTEST \hsrr \vec + .endif + .if \bitmask + lbz r10,PACAIRQSOFTMASK(r13) + andi. r10,r10,\bitmask + /* Associate vector numbers with bits in paca->irq_happened */ + .if \vec == 0x500 || \vec == 0xea0 + li r10,PACA_IRQ_EE + .elseif \vec == 0x900 + li r10,PACA_IRQ_DEC + .elseif \vec == 0xa00 || \vec == 0xe80 + li r10,PACA_IRQ_DBELL + .elseif \vec == 0xe60 + li r10,PACA_IRQ_HMI + .elseif \vec == 0xf00 + li r10,PACA_IRQ_PMI + .else + .abort "Bad maskable vector" + .endif + + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + bne masked_Hinterrupt + FTR_SECTION_ELSE + bne masked_interrupt + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr + bne masked_Hinterrupt + .else + bne masked_interrupt + .endif + .endif + + std r11,\area\()+EX_R11(r13) + std r12,\area\()+EX_R12(r13) + + /* + * DAR/DSISR, SCRATCH0 must be read before setting MSR[RI], + * because a d-side MCE will clobber those registers so is + * not recoverable if they are live. 
+ */ + GET_SCRATCH0(r10) + std r10,\area\()+EX_R13(r13) + .if \dar + mfspr r10,SPRN_DAR + std r10,\area\()+EX_DAR(r13) + .endif + .if \dsisr + mfspr r10,SPRN_DSISR + stw r10,\area\()+EX_DSISR(r13) + .endif + .if \early == 2 /* nothing more */ .elseif \early mfctr r10 /* save ctr, even for !RELOCATABLE */ BRANCH_TO_C000(r11, \name\()_early_common) .elseif !\virt - EXCEPTION_PROLOG_2_REAL \name\()_common, \hsrr, \ri + INT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr, \ri .else - EXCEPTION_PROLOG_2_VIRT \name\()_common, \hsrr + INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr .endif .if \ool .popsection @@ -1852,7 +1847,7 @@ EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100) bne+ denorm_assist #endif KVMTEST EXC_HV 0x1500 - EXCEPTION_PROLOG_2_REAL denorm_common, EXC_HV, 1 + INT_SAVE_SRR_AND_JUMP denorm_common, EXC_HV, 1 EXC_REAL_END(denorm_exception_hv, 0x1500, 0x100) #ifdef CONFIG_PPC_DENORMALISATION @@ -1986,7 +1981,7 @@ EXC_VIRT_NONE(0x5800, 0x100) std r12,PACA_EXGEN+EX_R12(r13); \ GET_SCRATCH0(r10); \ std r10,PACA_EXGEN+EX_R13(r13); \ - EXCEPTION_PROLOG_2_REAL soft_nmi_common, _H, 1 + INT_SAVE_SRR_AND_JUMP soft_nmi_common, _H, 1 /* * Branch to soft_nmi_interrupt using the emergency stack. The emergency -- cgit v1.2.3 From 52b989231c6fad42dff57b69a53f38756db48e06 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:49 +1000 Subject: powerpc/64s/exception: Add the virt variant of the denorm interrupt handler All other virt handlers have the prolog code in the virt vector rather than branch to the real vector. Follow this pattern in the denorm virt handler. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-25-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 1ae2a8d59aa0..d2aa63b6a8a8 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1852,7 +1852,11 @@ EXC_REAL_END(denorm_exception_hv, 0x1500, 0x100) #ifdef CONFIG_PPC_DENORMALISATION EXC_VIRT_BEGIN(denorm_exception, 0x5500, 0x100) - b exc_real_0x1500_denorm_exception_hv + INT_HANDLER denorm_exception, 0x1500, 0, 2, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 0 + mfspr r10,SPRN_HSRR1 + andis. r10,r10,(HSRR1_DENORM)@h /* denorm? 
*/ + bne+ denorm_assist + INT_VIRT_SAVE_SRR_AND_JUMP denorm_common, EXC_HV EXC_VIRT_END(denorm_exception, 0x5500, 0x100) #else EXC_VIRT_NONE(0x5500, 0x100) -- cgit v1.2.3 From 4515c5fa41936088a57efe0b64d1bb46a4943582 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:50 +1000 Subject: powerpc/64s/exception: INT_HANDLER support HDAR/HDSISR and use it in HDSI Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-26-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index d2aa63b6a8a8..476e4bbf1bf5 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -620,11 +620,19 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) GET_SCRATCH0(r10) std r10,\area\()+EX_R13(r13) .if \dar + .if \hsrr + mfspr r10,SPRN_HDAR + .else mfspr r10,SPRN_DAR + .endif std r10,\area\()+EX_DAR(r13) .endif .if \dsisr + .if \hsrr + mfspr r10,SPRN_HDSISR + .else mfspr r10,SPRN_DSISR + .endif stw r10,\area\()+EX_DSISR(r13) .endif @@ -1563,17 +1571,13 @@ EXC_COMMON(single_step_common, 0xd00, single_step_exception) EXC_REAL_BEGIN(h_data_storage, 0xe00, 0x20) - INT_HANDLER h_data_storage, 0xe00, ool=1, hsrr=EXC_HV, kvm=1 + INT_HANDLER h_data_storage, 0xe00, ool=1, hsrr=EXC_HV, dar=1, dsisr=1, kvm=1 EXC_REAL_END(h_data_storage, 0xe00, 0x20) EXC_VIRT_BEGIN(h_data_storage, 0x4e00, 0x20) - INT_HANDLER h_data_storage, 0xe00, ool=1, virt=1, hsrr=EXC_HV, kvm=1 + INT_HANDLER h_data_storage, 0xe00, ool=1, virt=1, hsrr=EXC_HV, dar=1, dsisr=1, kvm=1 EXC_VIRT_END(h_data_storage, 0x4e00, 0x20) TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0xe00) EXC_COMMON_BEGIN(h_data_storage_common) - mfspr r10,SPRN_HDAR - std r10,PACA_EXGEN+EX_DAR(r13) - mfspr r10,SPRN_HDSISR - stw r10,PACA_EXGEN+EX_DSISR(r13) EXCEPTION_COMMON(PACA_EXGEN, 0xe00) bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) -- cgit v1.2.3 From 141fed2669a93604bd5ce8b793d85f4798626ef5 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:51 +1000 Subject: powerpc/64s/exception: Add INT_KVM_HANDLER gas macro Replace the 4 variants of cpp macros with one gas macro. No generated code change except BUG line number constants. 
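For reference, the four cpp forms map onto the single gas macro as follows
(pairings taken directly from the conversions in the diff below):

	TRAMP_KVM(PACA_EXGEN, 0x400)          ->  INT_KVM_HANDLER 0x400, EXC_STD, PACA_EXGEN, 0
	TRAMP_KVM_SKIP(PACA_EXGEN, 0x300)     ->  INT_KVM_HANDLER 0x300, EXC_STD, PACA_EXGEN, 1
	TRAMP_KVM_HV(PACA_EXGEN, 0x980)       ->  INT_KVM_HANDLER 0x980, EXC_HV, PACA_EXGEN, 0
	TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0xe00)  ->  INT_KVM_HANDLER 0xe00, EXC_HV, PACA_EXGEN, 1

The skip argument (0 or 1) carries what was previously encoded in the _SKIP
name, and hsrr (EXC_STD or EXC_HV) what was encoded in the _HV name.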
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-27-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 99 +++++++++++++++--------------------- 1 file changed, 40 insertions(+), 59 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 476e4bbf1bf5..cff48d212011 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -651,22 +651,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) .endif .endm - -#define TRAMP_KVM(area, n) \ - TRAMP_KVM_BEGIN(do_kvm_##n); \ - KVM_HANDLER area, EXC_STD, n, 0 - -#define TRAMP_KVM_SKIP(area, n) \ - TRAMP_KVM_BEGIN(do_kvm_##n); \ - KVM_HANDLER area, EXC_STD, n, 1 - -#define TRAMP_KVM_HV(area, n) \ - TRAMP_KVM_BEGIN(do_kvm_H##n); \ - KVM_HANDLER area, EXC_HV, n, 0 - -#define TRAMP_KVM_HV_SKIP(area, n) \ - TRAMP_KVM_BEGIN(do_kvm_H##n); \ - KVM_HANDLER area, EXC_HV, n, 1 +.macro INT_KVM_HANDLER vec, hsrr, area, skip + .if \hsrr + TRAMP_KVM_BEGIN(do_kvm_H\vec\()) + .else + TRAMP_KVM_BEGIN(do_kvm_\vec\()) + .endif + KVM_HANDLER \area, \hsrr, \vec, \skip +.endm #define EXC_COMMON(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ @@ -827,9 +819,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) * be dangerous anyway. */ EXC_REAL_END(system_reset, 0x100, 0x100) - EXC_VIRT_NONE(0x4100, 0x100) -TRAMP_KVM(PACA_EXNMI, 0x100) +INT_KVM_HANDLER 0x100, EXC_STD, PACA_EXNMI, 0 #ifdef CONFIG_PPC_P7_NAP TRAMP_REAL_BEGIN(system_reset_idle_wake) @@ -923,7 +914,7 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi) INT_HANDLER machine_check, 0x200, early=1, area=PACA_EXMC, dar=1, dsisr=1 #endif -TRAMP_KVM_SKIP(PACA_EXMC, 0x200) +INT_KVM_HANDLER 0x200, EXC_STD, PACA_EXMC, 1 #define MACHINE_CHECK_HANDLER_WINDUP \ /* Clear MSR_RI before setting SRR0 and SRR1. 
*/\ @@ -1162,9 +1153,7 @@ EXC_REAL_END(data_access, 0x300, 0x80) EXC_VIRT_BEGIN(data_access, 0x4300, 0x80) INT_HANDLER data_access, 0x300, virt=1, dar=1, dsisr=1 EXC_VIRT_END(data_access, 0x4300, 0x80) - -TRAMP_KVM_SKIP(PACA_EXGEN, 0x300) - +INT_KVM_HANDLER 0x300, EXC_STD, PACA_EXGEN, 1 EXC_COMMON_BEGIN(data_access_common) /* * Here r13 points to the paca, r9 contains the saved CR, @@ -1193,9 +1182,7 @@ EXC_REAL_END(data_access_slb, 0x380, 0x80) EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) INT_HANDLER data_access_slb, 0x380, virt=1, area=PACA_EXSLB, dar=1 EXC_VIRT_END(data_access_slb, 0x4380, 0x80) - -TRAMP_KVM_SKIP(PACA_EXSLB, 0x380) - +INT_KVM_HANDLER 0x380, EXC_STD, PACA_EXSLB, 1 EXC_COMMON_BEGIN(data_access_slb_common) EXCEPTION_COMMON(PACA_EXSLB, 0x380) ld r4,PACA_EXSLB+EX_DAR(r13) @@ -1228,9 +1215,7 @@ EXC_REAL_END(instruction_access, 0x400, 0x80) EXC_VIRT_BEGIN(instruction_access, 0x4400, 0x80) INT_HANDLER instruction_access, 0x400, virt=1 EXC_VIRT_END(instruction_access, 0x4400, 0x80) - -TRAMP_KVM(PACA_EXGEN, 0x400) - +INT_KVM_HANDLER 0x400, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(instruction_access_common) EXCEPTION_COMMON(PACA_EXGEN, 0x400) RECONCILE_IRQ_STATE(r10, r11) @@ -1253,8 +1238,7 @@ EXC_REAL_END(instruction_access_slb, 0x480, 0x80) EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80) INT_HANDLER instruction_access_slb, 0x480, virt=1, area=PACA_EXSLB EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80) -TRAMP_KVM(PACA_EXSLB, 0x480) - +INT_KVM_HANDLER 0x480, EXC_STD, PACA_EXSLB, 0 EXC_COMMON_BEGIN(instruction_access_slb_common) EXCEPTION_COMMON(PACA_EXSLB, 0x480) ld r4,_NIP(r1) @@ -1285,9 +1269,8 @@ EXC_REAL_END(hardware_interrupt, 0x500, 0x100) EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100) INT_HANDLER hardware_interrupt, 0x500, virt=1, hsrr=EXC_HV_OR_STD, bitmask=IRQS_DISABLED, kvm=1 EXC_VIRT_END(hardware_interrupt, 0x4500, 0x100) - -TRAMP_KVM(PACA_EXGEN, 0x500) -TRAMP_KVM_HV(PACA_EXGEN, 0x500) +INT_KVM_HANDLER 0x500, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER 0x500, EXC_HV, PACA_EXGEN, 0 EXC_COMMON_ASYNC(hardware_interrupt_common, 0x500, do_IRQ) @@ -1297,8 +1280,7 @@ EXC_REAL_END(alignment, 0x600, 0x100) EXC_VIRT_BEGIN(alignment, 0x4600, 0x100) INT_HANDLER alignment, 0x600, virt=1, dar=1, dsisr=1 EXC_VIRT_END(alignment, 0x4600, 0x100) - -TRAMP_KVM(PACA_EXGEN, 0x600) +INT_KVM_HANDLER 0x600, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(alignment_common) EXCEPTION_COMMON(PACA_EXGEN, 0x600) ld r3,PACA_EXGEN+EX_DAR(r13) @@ -1318,7 +1300,7 @@ EXC_REAL_END(program_check, 0x700, 0x100) EXC_VIRT_BEGIN(program_check, 0x4700, 0x100) INT_HANDLER program_check, 0x700, virt=1 EXC_VIRT_END(program_check, 0x4700, 0x100) -TRAMP_KVM(PACA_EXGEN, 0x700) +INT_KVM_HANDLER 0x700, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(program_check_common) /* * It's possible to receive a TM Bad Thing type program check with @@ -1360,7 +1342,7 @@ EXC_REAL_END(fp_unavailable, 0x800, 0x100) EXC_VIRT_BEGIN(fp_unavailable, 0x4800, 0x100) INT_HANDLER fp_unavailable, 0x800, virt=1 EXC_VIRT_END(fp_unavailable, 0x4800, 0x100) -TRAMP_KVM(PACA_EXGEN, 0x800) +INT_KVM_HANDLER 0x800, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(fp_unavailable_common) EXCEPTION_COMMON(PACA_EXGEN, 0x800) bne 1f /* if from user, just load it up */ @@ -1398,7 +1380,7 @@ EXC_REAL_END(decrementer, 0x900, 0x80) EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80) INT_HANDLER decrementer, 0x900, virt=1, bitmask=IRQS_DISABLED EXC_VIRT_END(decrementer, 0x4900, 0x80) -TRAMP_KVM(PACA_EXGEN, 0x900) +INT_KVM_HANDLER 0x900, EXC_STD, PACA_EXGEN, 0 
EXC_COMMON_ASYNC(decrementer_common, 0x900, timer_interrupt) @@ -1408,7 +1390,7 @@ EXC_REAL_END(hdecrementer, 0x980, 0x80) EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80) INT_HANDLER hdecrementer, 0x980, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(hdecrementer, 0x4980, 0x80) -TRAMP_KVM_HV(PACA_EXGEN, 0x980) +INT_KVM_HANDLER 0x980, EXC_HV, PACA_EXGEN, 0 EXC_COMMON(hdecrementer_common, 0x980, hdec_interrupt) @@ -1418,7 +1400,7 @@ EXC_REAL_END(doorbell_super, 0xa00, 0x100) EXC_VIRT_BEGIN(doorbell_super, 0x4a00, 0x100) INT_HANDLER doorbell_super, 0xa00, virt=1, bitmask=IRQS_DISABLED EXC_VIRT_END(doorbell_super, 0x4a00, 0x100) -TRAMP_KVM(PACA_EXGEN, 0xa00) +INT_KVM_HANDLER 0xa00, EXC_STD, PACA_EXGEN, 0 #ifdef CONFIG_PPC_DOORBELL EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, doorbell_exception) #else @@ -1566,7 +1548,7 @@ EXC_REAL_END(single_step, 0xd00, 0x100) EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100) INT_HANDLER single_step, 0xd00, virt=1 EXC_VIRT_END(single_step, 0x4d00, 0x100) -TRAMP_KVM(PACA_EXGEN, 0xd00) +INT_KVM_HANDLER 0xd00, EXC_STD, PACA_EXGEN, 0 EXC_COMMON(single_step_common, 0xd00, single_step_exception) @@ -1576,7 +1558,7 @@ EXC_REAL_END(h_data_storage, 0xe00, 0x20) EXC_VIRT_BEGIN(h_data_storage, 0x4e00, 0x20) INT_HANDLER h_data_storage, 0xe00, ool=1, virt=1, hsrr=EXC_HV, dar=1, dsisr=1, kvm=1 EXC_VIRT_END(h_data_storage, 0x4e00, 0x20) -TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0xe00) +INT_KVM_HANDLER 0xe00, EXC_HV, PACA_EXGEN, 1 EXC_COMMON_BEGIN(h_data_storage_common) EXCEPTION_COMMON(PACA_EXGEN, 0xe00) bl save_nvgprs @@ -1601,7 +1583,7 @@ EXC_REAL_END(h_instr_storage, 0xe20, 0x20) EXC_VIRT_BEGIN(h_instr_storage, 0x4e20, 0x20) INT_HANDLER h_instr_storage, 0xe20, ool=1, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(h_instr_storage, 0x4e20, 0x20) -TRAMP_KVM_HV(PACA_EXGEN, 0xe20) +INT_KVM_HANDLER 0xe20, EXC_HV, PACA_EXGEN, 0 EXC_COMMON(h_instr_storage_common, 0xe20, unknown_exception) @@ -1611,7 +1593,7 @@ EXC_REAL_END(emulation_assist, 0xe40, 0x20) EXC_VIRT_BEGIN(emulation_assist, 0x4e40, 0x20) INT_HANDLER emulation_assist, 0xe40, ool=1, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(emulation_assist, 0x4e40, 0x20) -TRAMP_KVM_HV(PACA_EXGEN, 0xe40) +INT_KVM_HANDLER 0xe40, EXC_HV, PACA_EXGEN, 0 EXC_COMMON(emulation_assist_common, 0xe40, emulation_assist_interrupt) @@ -1624,8 +1606,7 @@ EXC_REAL_BEGIN(hmi_exception, 0xe60, 0x20) INT_HANDLER hmi_exception, 0xe60, ool=1, early=1, hsrr=EXC_HV, ri=0, kvm=1 EXC_REAL_END(hmi_exception, 0xe60, 0x20) EXC_VIRT_NONE(0x4e60, 0x20) -TRAMP_KVM_HV(PACA_EXGEN, 0xe60) - +INT_KVM_HANDLER 0xe60, EXC_HV, PACA_EXGEN, 0 EXC_COMMON_BEGIN(hmi_exception_early_common) mtctr r10 /* Restore ctr */ mfspr r11,SPRN_HSRR0 /* Save HSRR0 */ @@ -1670,7 +1651,7 @@ EXC_REAL_END(h_doorbell, 0xe80, 0x20) EXC_VIRT_BEGIN(h_doorbell, 0x4e80, 0x20) INT_HANDLER h_doorbell, 0xe80, ool=1, virt=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1 EXC_VIRT_END(h_doorbell, 0x4e80, 0x20) -TRAMP_KVM_HV(PACA_EXGEN, 0xe80) +INT_KVM_HANDLER 0xe80, EXC_HV, PACA_EXGEN, 0 #ifdef CONFIG_PPC_DOORBELL EXC_COMMON_ASYNC(h_doorbell_common, 0xe80, doorbell_exception) #else @@ -1684,7 +1665,7 @@ EXC_REAL_END(h_virt_irq, 0xea0, 0x20) EXC_VIRT_BEGIN(h_virt_irq, 0x4ea0, 0x20) INT_HANDLER h_virt_irq, 0xea0, ool=1, virt=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1 EXC_VIRT_END(h_virt_irq, 0x4ea0, 0x20) -TRAMP_KVM_HV(PACA_EXGEN, 0xea0) +INT_KVM_HANDLER 0xea0, EXC_HV, PACA_EXGEN, 0 EXC_COMMON_ASYNC(h_virt_irq_common, 0xea0, do_IRQ) @@ -1700,7 +1681,7 @@ EXC_REAL_END(performance_monitor, 0xf00, 0x20) EXC_VIRT_BEGIN(performance_monitor, 0x4f00, 
0x20) INT_HANDLER performance_monitor, 0xf00, ool=1, virt=1, bitmask=IRQS_PMI_DISABLED EXC_VIRT_END(performance_monitor, 0x4f00, 0x20) -TRAMP_KVM(PACA_EXGEN, 0xf00) +INT_KVM_HANDLER 0xf00, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_ASYNC(performance_monitor_common, 0xf00, performance_monitor_exception) @@ -1710,7 +1691,7 @@ EXC_REAL_END(altivec_unavailable, 0xf20, 0x20) EXC_VIRT_BEGIN(altivec_unavailable, 0x4f20, 0x20) INT_HANDLER altivec_unavailable, 0xf20, ool=1, virt=1 EXC_VIRT_END(altivec_unavailable, 0x4f20, 0x20) -TRAMP_KVM(PACA_EXGEN, 0xf20) +INT_KVM_HANDLER 0xf20, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(altivec_unavailable_common) EXCEPTION_COMMON(PACA_EXGEN, 0xf20) #ifdef CONFIG_ALTIVEC @@ -1751,7 +1732,7 @@ EXC_REAL_END(vsx_unavailable, 0xf40, 0x20) EXC_VIRT_BEGIN(vsx_unavailable, 0x4f40, 0x20) INT_HANDLER vsx_unavailable, 0xf40, ool=1, virt=1 EXC_VIRT_END(vsx_unavailable, 0x4f40, 0x20) -TRAMP_KVM(PACA_EXGEN, 0xf40) +INT_KVM_HANDLER 0xf40, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(vsx_unavailable_common) EXCEPTION_COMMON(PACA_EXGEN, 0xf40) #ifdef CONFIG_VSX @@ -1791,7 +1772,7 @@ EXC_REAL_END(facility_unavailable, 0xf60, 0x20) EXC_VIRT_BEGIN(facility_unavailable, 0x4f60, 0x20) INT_HANDLER facility_unavailable, 0xf60, ool=1, virt=1 EXC_VIRT_END(facility_unavailable, 0x4f60, 0x20) -TRAMP_KVM(PACA_EXGEN, 0xf60) +INT_KVM_HANDLER 0xf60, EXC_STD, PACA_EXGEN, 0 EXC_COMMON(facility_unavailable_common, 0xf60, facility_unavailable_exception) @@ -1801,7 +1782,7 @@ EXC_REAL_END(h_facility_unavailable, 0xf80, 0x20) EXC_VIRT_BEGIN(h_facility_unavailable, 0x4f80, 0x20) INT_HANDLER h_facility_unavailable, 0xf80, ool=1, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(h_facility_unavailable, 0x4f80, 0x20) -TRAMP_KVM_HV(PACA_EXGEN, 0xf80) +INT_KVM_HANDLER 0xf80, EXC_HV, PACA_EXGEN, 0 EXC_COMMON(h_facility_unavailable_common, 0xf80, facility_unavailable_exception) @@ -1822,7 +1803,7 @@ EXC_REAL_BEGIN(cbe_system_error, 0x1200, 0x100) INT_HANDLER cbe_system_error, 0x1200, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(cbe_system_error, 0x1200, 0x100) EXC_VIRT_NONE(0x5200, 0x100) -TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0x1200) +INT_KVM_HANDLER 0x1200, EXC_HV, PACA_EXGEN, 1 EXC_COMMON(cbe_system_error_common, 0x1200, cbe_system_error_exception) #else /* CONFIG_CBE_RAS */ EXC_REAL_NONE(0x1200, 0x100) @@ -1836,7 +1817,7 @@ EXC_REAL_END(instruction_breakpoint, 0x1300, 0x100) EXC_VIRT_BEGIN(instruction_breakpoint, 0x5300, 0x100) INT_HANDLER instruction_breakpoint, 0x1300, virt=1 EXC_VIRT_END(instruction_breakpoint, 0x5300, 0x100) -TRAMP_KVM_SKIP(PACA_EXGEN, 0x1300) +INT_KVM_HANDLER 0x1300, EXC_STD, PACA_EXGEN, 1 EXC_COMMON(instruction_breakpoint_common, 0x1300, instruction_breakpoint_exception) @@ -1866,7 +1847,7 @@ EXC_VIRT_END(denorm_exception, 0x5500, 0x100) EXC_VIRT_NONE(0x5500, 0x100) #endif -TRAMP_KVM_HV(PACA_EXGEN, 0x1500) +INT_KVM_HANDLER 0x1500, EXC_HV, PACA_EXGEN, 0 #ifdef CONFIG_PPC_DENORMALISATION TRAMP_REAL_BEGIN(denorm_assist) @@ -1945,7 +1926,7 @@ EXC_REAL_BEGIN(cbe_maintenance, 0x1600, 0x100) INT_HANDLER cbe_maintenance, 0x1600, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(cbe_maintenance, 0x1600, 0x100) EXC_VIRT_NONE(0x5600, 0x100) -TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0x1600) +INT_KVM_HANDLER 0x1600, EXC_HV, PACA_EXGEN, 1 EXC_COMMON(cbe_maintenance_common, 0x1600, cbe_maintenance_exception) #else /* CONFIG_CBE_RAS */ EXC_REAL_NONE(0x1600, 0x100) @@ -1959,7 +1940,7 @@ EXC_REAL_END(altivec_assist, 0x1700, 0x100) EXC_VIRT_BEGIN(altivec_assist, 0x5700, 0x100) INT_HANDLER altivec_assist, 0x1700, virt=1 EXC_VIRT_END(altivec_assist, 0x5700, 
0x100) -TRAMP_KVM(PACA_EXGEN, 0x1700) +INT_KVM_HANDLER 0x1700, EXC_STD, PACA_EXGEN, 0 #ifdef CONFIG_ALTIVEC EXC_COMMON(altivec_assist_common, 0x1700, altivec_assist_exception) #else @@ -1972,7 +1953,7 @@ EXC_REAL_BEGIN(cbe_thermal, 0x1800, 0x100) INT_HANDLER cbe_thermal, 0x1800, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(cbe_thermal, 0x1800, 0x100) EXC_VIRT_NONE(0x5800, 0x100) -TRAMP_KVM_HV_SKIP(PACA_EXGEN, 0x1800) +INT_KVM_HANDLER 0x1800, EXC_HV, PACA_EXGEN, 1 EXC_COMMON(cbe_thermal_common, 0x1800, cbe_thermal_exception) #else /* CONFIG_CBE_RAS */ EXC_REAL_NONE(0x1800, 0x100) -- cgit v1.2.3 From 7027d53d1ab17d28b65913148585d5a331446b8b Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:52 +1000 Subject: powerpc/64s/exception: KVM_HANDLER reorder arguments to match other macros Also change argument name (n -> vec) to match others. No generated code change. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-28-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index cff48d212011..d37420dee447 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -316,7 +316,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) .endif .endm -.macro KVM_HANDLER area, hsrr, n, skip +.macro KVM_HANDLER vec, hsrr, area, skip .if \skip cmpwi r10,KVM_GUEST_MODE_SKIP beq 89f @@ -337,14 +337,14 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) /* HSRR variants have the 0x2 bit added to their trap number */ .if \hsrr == EXC_HV_OR_STD BEGIN_FTR_SECTION - ori r12,r12,(\n + 0x2) + ori r12,r12,(\vec + 0x2) FTR_SECTION_ELSE - ori r12,r12,(\n) + ori r12,r12,(\vec) ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) .elseif \hsrr - ori r12,r12,(\n + 0x2) + ori r12,r12,(\vec + 0x2) .else - ori r12,r12,(\n) + ori r12,r12,(\vec) .endif #ifdef CONFIG_RELOCATABLE @@ -386,7 +386,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) #else .macro KVMTEST hsrr, n .endm -.macro KVM_HANDLER area, hsrr, n, skip +.macro KVM_HANDLER vec, hsrr, area, skip .endm #endif @@ -657,7 +657,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) .else TRAMP_KVM_BEGIN(do_kvm_\vec\()) .endif - KVM_HANDLER \area, \hsrr, \vec, \skip + KVM_HANDLER \vec, \hsrr, \area, \skip .endm #define EXC_COMMON(name, realvec, hdlr) \ @@ -1538,7 +1538,7 @@ TRAMP_KVM_BEGIN(do_kvm_0xc00) SET_SCRATCH0(r10) std r9,PACA_EXGEN+EX_R9(r13) mfcr r9 - KVM_HANDLER PACA_EXGEN, EXC_STD, 0xc00, 0 + KVM_HANDLER 0xc00, EXC_STD, PACA_EXGEN, 0 #endif -- cgit v1.2.3 From 9a9c739aa83d031da7468028de8a65608146eccc Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:53 +1000 Subject: powerpc/64s/exception: Merge EXCEPTION_PROLOG_COMMON_2/3 Merge EXCEPTION_PROLOG_COMMON_3 into EXCEPTION_PROLOG_COMMON_2. No generated code change except BUG line number constants. 
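As a rough illustration of the shape of this refactor (in C rather than the CPP-generated assembly, with invented names): two macros that were only ever expanded back to back are folded into one, threading the second macro's argument through the first's parameter list.

#include <stdio.h>

/* Before: two macros, each caller expanded both, in order. */
#define PROLOG_2_OLD(area)	printf("save regs from %s\n", (area))
#define PROLOG_3_OLD(trap)	printf("set trap %#x\n", (trap))

/* After: one macro, one expansion point per caller. */
#define PROLOG_2(area, trap)				\
do {							\
	printf("save regs from %s\n", (area));		\
	printf("set trap %#x\n", (trap));		\
} while (0)

int main(void)
{
	PROLOG_2("PACA_EXGEN", 0xe60);
	return 0;
}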
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-29-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index d37420dee447..0643ae57badc 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -399,7 +399,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) std r10,GPR1(r1); /* save r1 in stackframe */ \ /* Save original regs values from save area to stack frame. */ -#define EXCEPTION_PROLOG_COMMON_2(area) \ +#define EXCEPTION_PROLOG_COMMON_2(area, trap) \ ld r9,area+EX_R9(r13); /* move r9, r10 to stackframe */ \ ld r10,area+EX_R10(r13); \ std r9,GPR9(r1); \ @@ -415,9 +415,7 @@ BEGIN_FTR_SECTION_NESTED(66); \ std r10,ORIG_GPR3(r1); \ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ GET_CTR(r10, area); \ - std r10,_CTR(r1); - -#define EXCEPTION_PROLOG_COMMON_3(trap) \ + std r10,_CTR(r1); \ std r2,GPR2(r1); /* save r2 in stackframe */ \ SAVE_4GPRS(3, r1); /* save r3 - r6 in stackframe */ \ SAVE_2GPRS(7, r1); /* save r7, r8 in stackframe */ \ @@ -453,8 +451,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ beq 4f; /* if from kernel mode */ \ ACCOUNT_CPU_USER_ENTRY(r13, r9, r10); \ SAVE_PPR(area, r9); \ -4: EXCEPTION_PROLOG_COMMON_2(area); \ - EXCEPTION_PROLOG_COMMON_3(trap); \ +4: EXCEPTION_PROLOG_COMMON_2(area, trap); \ ACCOUNT_STOLEN_TIME /* @@ -464,8 +461,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ #define EXCEPTION_COMMON_STACK(area, trap) \ EXCEPTION_PROLOG_COMMON_1(); \ kuap_save_amr_and_lock r9, r10, cr1; \ - EXCEPTION_PROLOG_COMMON_2(area); \ - EXCEPTION_PROLOG_COMMON_3(trap) + EXCEPTION_PROLOG_COMMON_2(area, trap) /* * Restore all registers including H/SRR0/1 saved in a stack frame of a @@ -968,8 +964,7 @@ EXC_COMMON_BEGIN(machine_check_early_common) EXCEPTION_PROLOG_COMMON_1() /* We don't touch AMR here, we never go to virtual mode */ - EXCEPTION_PROLOG_COMMON_2(PACA_EXMC) - EXCEPTION_PROLOG_COMMON_3(0x200) + EXCEPTION_PROLOG_COMMON_2(PACA_EXMC, 0x200) ld r3,PACA_EXMC+EX_DAR(r13) lwz r4,PACA_EXMC+EX_DSISR(r13) @@ -1616,8 +1611,7 @@ EXC_COMMON_BEGIN(hmi_exception_early_common) subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ EXCEPTION_PROLOG_COMMON_1() /* We don't touch AMR here, we never go to virtual mode */ - EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN) - EXCEPTION_PROLOG_COMMON_3(0xe60) + EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN, 0xe60) addi r3,r1,STACK_FRAME_OVERHEAD bl hmi_exception_realmode cmpdi cr0,r3,0 -- cgit v1.2.3 From bcbceed40a8c355b48678d90c5c407dfca811f0e Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:54 +1000 Subject: powerpc/64s/exception: Add INT_COMMON gas macro to generate common exception code No generated code change except BUG line number constants. 
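The key difference from the old CPP macros is that a gas macro can take flag parameters and test them with .if at assembly time, so one macro covers both the "switch to kernel stack" path and the "stack already set" path. A hypothetical C analogue of the flag dispatch (names invented; the real flags are stack and kaup, the latter evidently a transposition of "kuap"):

#include <stdio.h>

/* Compile-time-constant flags select which steps are emitted,
 * much like .if \stack / .if \kaup in the gas macro. */
#define INT_COMMON(vec, area, stack, kuap)			\
do {								\
	if (stack)						\
		printf("switch to kernel stack\n");		\
	if (kuap)						\
		printf("save AMR and lock KUAP\n");		\
	printf("save %s state, trap %#x\n", (area), (vec));	\
} while (0)

int main(void)
{
	INT_COMMON(0x300, "PACA_EXGEN", 1, 1);	/* normal interrupt */
	INT_COMMON(0x200, "PACA_EXMC", 0, 0);	/* real-mode MCE path */
	return 0;
}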
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-30-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 52 ++++++++++++++++++++++-------------- 1 file changed, 32 insertions(+), 20 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 0643ae57badc..7829a6ad99aa 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -463,6 +463,18 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ kuap_save_amr_and_lock r9, r10, cr1; \ EXCEPTION_PROLOG_COMMON_2(area, trap) +.macro INT_COMMON vec, area, stack, kaup + .if \stack + EXCEPTION_COMMON(\area, \vec) + .else + EXCEPTION_PROLOG_COMMON_1() + .if \kaup + kuap_save_amr_and_lock r9, r10, cr1 + .endif + EXCEPTION_PROLOG_COMMON_2(\area, \vec) + .endif +.endm + /* * Restore all registers including H/SRR0/1 saved in a stack frame of a * standard exception. @@ -658,7 +670,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) #define EXC_COMMON(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ - EXCEPTION_COMMON(PACA_EXGEN, realvec); \ + INT_COMMON realvec, PACA_EXGEN, 1, 1 ; \ bl save_nvgprs; \ RECONCILE_IRQ_STATE(r10, r11); \ addi r3,r1,STACK_FRAME_OVERHEAD; \ @@ -671,7 +683,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) */ #define EXC_COMMON_ASYNC(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ - EXCEPTION_COMMON(PACA_EXGEN, realvec); \ + INT_COMMON realvec, PACA_EXGEN, 1, 1 ; \ FINISH_NAP; \ RECONCILE_IRQ_STATE(r10, r11); \ RUNLATCH_ON; \ @@ -852,7 +864,7 @@ EXC_COMMON_BEGIN(system_reset_common) mr r10,r1 ld r1,PACA_NMI_EMERG_SP(r13) subi r1,r1,INT_FRAME_SIZE - EXCEPTION_COMMON_STACK(PACA_EXNMI, 0x100) + INT_COMMON 0x100, PACA_EXNMI, 0, 1 bl save_nvgprs /* * Set IRQS_ALL_DISABLED unconditionally so arch_irqs_disabled does @@ -962,9 +974,8 @@ EXC_COMMON_BEGIN(machine_check_early_common) bgt cr1,unrecoverable_mce /* Check if we hit limit of 4 */ subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ - EXCEPTION_PROLOG_COMMON_1() /* We don't touch AMR here, we never go to virtual mode */ - EXCEPTION_PROLOG_COMMON_2(PACA_EXMC, 0x200) + INT_COMMON 0x200, PACA_EXMC, 0, 0 ld r3,PACA_EXMC+EX_DAR(r13) lwz r4,PACA_EXMC+EX_DSISR(r13) @@ -1064,7 +1075,7 @@ EXC_COMMON_BEGIN(machine_check_common) * Machine check is different because we use a different * save area: PACA_EXMC instead of PACA_EXGEN. */ - EXCEPTION_COMMON(PACA_EXMC, 0x200) + INT_COMMON 0x200, PACA_EXMC, 1, 1 FINISH_NAP RECONCILE_IRQ_STATE(r10, r11) ld r3,PACA_EXMC+EX_DAR(r13) @@ -1156,7 +1167,7 @@ EXC_COMMON_BEGIN(data_access_common) * r9 - r13 are saved in paca->exgen. 
* EX_DAR and EX_DSISR have saved DAR/DSISR */ - EXCEPTION_COMMON(PACA_EXGEN, 0x300) + INT_COMMON 0x300, PACA_EXGEN, 1, 1 RECONCILE_IRQ_STATE(r10, r11) ld r12,_MSR(r1) ld r3,PACA_EXGEN+EX_DAR(r13) @@ -1179,7 +1190,7 @@ EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) EXC_VIRT_END(data_access_slb, 0x4380, 0x80) INT_KVM_HANDLER 0x380, EXC_STD, PACA_EXSLB, 1 EXC_COMMON_BEGIN(data_access_slb_common) - EXCEPTION_COMMON(PACA_EXSLB, 0x380) + INT_COMMON 0x380, PACA_EXSLB, 1, 1 ld r4,PACA_EXSLB+EX_DAR(r13) std r4,_DAR(r1) addi r3,r1,STACK_FRAME_OVERHEAD @@ -1212,7 +1223,7 @@ EXC_VIRT_BEGIN(instruction_access, 0x4400, 0x80) EXC_VIRT_END(instruction_access, 0x4400, 0x80) INT_KVM_HANDLER 0x400, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(instruction_access_common) - EXCEPTION_COMMON(PACA_EXGEN, 0x400) + INT_COMMON 0x400, PACA_EXGEN, 1, 1 RECONCILE_IRQ_STATE(r10, r11) ld r12,_MSR(r1) ld r3,_NIP(r1) @@ -1235,7 +1246,7 @@ EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80) EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80) INT_KVM_HANDLER 0x480, EXC_STD, PACA_EXSLB, 0 EXC_COMMON_BEGIN(instruction_access_slb_common) - EXCEPTION_COMMON(PACA_EXSLB, 0x480) + INT_COMMON 0x480, PACA_EXSLB, 1, 1 ld r4,_NIP(r1) addi r3,r1,STACK_FRAME_OVERHEAD BEGIN_MMU_FTR_SECTION @@ -1277,7 +1288,7 @@ EXC_VIRT_BEGIN(alignment, 0x4600, 0x100) EXC_VIRT_END(alignment, 0x4600, 0x100) INT_KVM_HANDLER 0x600, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(alignment_common) - EXCEPTION_COMMON(PACA_EXGEN, 0x600) + INT_COMMON 0x600, PACA_EXGEN, 1, 1 ld r3,PACA_EXGEN+EX_DAR(r13) lwz r4,PACA_EXGEN+EX_DSISR(r13) std r3,_DAR(r1) @@ -1323,7 +1334,7 @@ EXC_COMMON_BEGIN(program_check_common) subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ b 3f /* Jump into the macro !! */ 2: - EXCEPTION_COMMON(PACA_EXGEN, 0x700) + INT_COMMON 0x700, PACA_EXGEN, 1, 1 bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) addi r3,r1,STACK_FRAME_OVERHEAD @@ -1339,7 +1350,7 @@ EXC_VIRT_BEGIN(fp_unavailable, 0x4800, 0x100) EXC_VIRT_END(fp_unavailable, 0x4800, 0x100) INT_KVM_HANDLER 0x800, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(fp_unavailable_common) - EXCEPTION_COMMON(PACA_EXGEN, 0x800) + INT_COMMON 0x800, PACA_EXGEN, 1, 1 bne 1f /* if from user, just load it up */ bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) @@ -1555,7 +1566,7 @@ EXC_VIRT_BEGIN(h_data_storage, 0x4e00, 0x20) EXC_VIRT_END(h_data_storage, 0x4e00, 0x20) INT_KVM_HANDLER 0xe00, EXC_HV, PACA_EXGEN, 1 EXC_COMMON_BEGIN(h_data_storage_common) - EXCEPTION_COMMON(PACA_EXGEN, 0xe00) + INT_COMMON 0xe00, PACA_EXGEN, 1, 1 bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) addi r3,r1,STACK_FRAME_OVERHEAD @@ -1609,9 +1620,10 @@ EXC_COMMON_BEGIN(hmi_exception_early_common) mr r10,r1 /* Save r1 */ ld r1,PACAEMERGSP(r13) /* Use emergency stack for realmode */ subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ - EXCEPTION_PROLOG_COMMON_1() + /* We don't touch AMR here, we never go to virtual mode */ - EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN, 0xe60) + INT_COMMON 0xe60, PACA_EXGEN, 0, 0 + addi r3,r1,STACK_FRAME_OVERHEAD bl hmi_exception_realmode cmpdi cr0,r3,0 @@ -1629,7 +1641,7 @@ EXC_COMMON_BEGIN(hmi_exception_early_common) INT_HANDLER hmi_exception, 0xe60, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1 EXC_COMMON_BEGIN(hmi_exception_common) - EXCEPTION_COMMON(PACA_EXGEN, 0xe60) + INT_COMMON 0xe60, PACA_EXGEN, 1, 1 FINISH_NAP bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) @@ -1687,7 +1699,7 @@ EXC_VIRT_BEGIN(altivec_unavailable, 0x4f20, 0x20) EXC_VIRT_END(altivec_unavailable, 0x4f20, 0x20) INT_KVM_HANDLER 0xf20, EXC_STD, PACA_EXGEN, 0 
EXC_COMMON_BEGIN(altivec_unavailable_common) - EXCEPTION_COMMON(PACA_EXGEN, 0xf20) + INT_COMMON 0xf20, PACA_EXGEN, 1, 1 #ifdef CONFIG_ALTIVEC BEGIN_FTR_SECTION beq 1f @@ -1728,7 +1740,7 @@ EXC_VIRT_BEGIN(vsx_unavailable, 0x4f40, 0x20) EXC_VIRT_END(vsx_unavailable, 0x4f40, 0x20) INT_KVM_HANDLER 0xf40, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(vsx_unavailable_common) - EXCEPTION_COMMON(PACA_EXGEN, 0xf40) + INT_COMMON 0xf40, PACA_EXGEN, 1, 1 #ifdef CONFIG_VSX BEGIN_FTR_SECTION beq 1f @@ -1979,7 +1991,7 @@ EXC_COMMON_BEGIN(soft_nmi_common) mr r10,r1 ld r1,PACAEMERGSP(r13) subi r1,r1,INT_FRAME_SIZE - EXCEPTION_COMMON_STACK(PACA_EXGEN, 0x900) + INT_COMMON 0x900, PACA_EXGEN, 0, 1 bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) addi r3,r1,STACK_FRAME_OVERHEAD -- cgit v1.2.3 From 5d5e0edfd5fa2a60d0f9b8d6ece1a5ce51aae3b5 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:55 +1000 Subject: powerpc/64s/exception: Expand EXCEPTION_COMMON macro into caller No generated code change except BUG line number constants. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-31-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 54 ++++++++++++++++++------------------ 1 file changed, 27 insertions(+), 27 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 7829a6ad99aa..492786604b10 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -437,41 +437,41 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ * On entry r13 points to the paca, r9-r13 are saved in the paca, * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and * SRR1, and relocation is on. + * + * If stack=0, then the stack is already set in r1, and r1 is saved in r10. + * PPR save and CPU accounting is not done for the !stack case (XXX why not?) */ -#define EXCEPTION_COMMON(area, trap) \ - andi. r10,r12,MSR_PR; /* See if coming from user */ \ - mr r10,r1; /* Save r1 */ \ - subi r1,r1,INT_FRAME_SIZE; /* alloc frame on kernel stack */ \ - beq- 1f; \ - ld r1,PACAKSAVE(r13); /* kernel stack to use */ \ -1: tdgei r1,-INT_FRAME_SIZE; /* trap if r1 is in userspace */ \ - EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0; \ -3: EXCEPTION_PROLOG_COMMON_1(); \ - kuap_save_amr_and_lock r9, r10, cr1, cr0; \ - beq 4f; /* if from kernel mode */ \ - ACCOUNT_CPU_USER_ENTRY(r13, r9, r10); \ - SAVE_PPR(area, r9); \ -4: EXCEPTION_PROLOG_COMMON_2(area, trap); \ - ACCOUNT_STOLEN_TIME - -/* - * Exception where stack is already set in r1, r1 is saved in r10. - * PPR save and CPU accounting is not done (for some reason). - */ -#define EXCEPTION_COMMON_STACK(area, trap) \ - EXCEPTION_PROLOG_COMMON_1(); \ - kuap_save_amr_and_lock r9, r10, cr1; \ - EXCEPTION_PROLOG_COMMON_2(area, trap) - .macro INT_COMMON vec, area, stack, kaup .if \stack - EXCEPTION_COMMON(\area, \vec) - .else + andi. 
r10,r12,MSR_PR /* See if coming from user */ + mr r10,r1 /* Save r1 */ + subi r1,r1,INT_FRAME_SIZE /* alloc frame on kernel stack */ + beq- 1f + ld r1,PACAKSAVE(r13) /* kernel stack to use */ +1: tdgei r1,-INT_FRAME_SIZE /* trap if r1 is in userspace */ + EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0 +3: + .endif EXCEPTION_PROLOG_COMMON_1() + + .if \stack + .if \kaup + kuap_save_amr_and_lock r9, r10, cr1, cr0 + .endif + beq 4f /* if from kernel mode */ + ACCOUNT_CPU_USER_ENTRY(r13, r9, r10) + SAVE_PPR(\area, r9) +4: + .else .if \kaup kuap_save_amr_and_lock r9, r10, cr1 .endif + .endif + EXCEPTION_PROLOG_COMMON_2(\area, \vec) + + .if \stack + ACCOUNT_STOLEN_TIME .endif .endm -- cgit v1.2.3 From 8c9fb5d4f3ddf02fb0fa3dec2dffd6b007ac894c Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:56 +1000 Subject: powerpc/64s/exception: Expand EXCEPTION_PROLOG_COMMON_1 and 2 into caller No generated code change except BUG line number constants. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-32-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 85 +++++++++++++++++------------------- 1 file changed, 40 insertions(+), 45 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 492786604b10..7a4f215c4c5b 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -390,49 +390,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .endm #endif -#define EXCEPTION_PROLOG_COMMON_1() \ - std r9,_CCR(r1); /* save CR in stackframe */ \ - std r11,_NIP(r1); /* save SRR0 in stackframe */ \ - std r12,_MSR(r1); /* save SRR1 in stackframe */ \ - std r10,0(r1); /* make stack chain pointer */ \ - std r0,GPR0(r1); /* save r0 in stackframe */ \ - std r10,GPR1(r1); /* save r1 in stackframe */ \ - -/* Save original regs values from save area to stack frame. 
*/ -#define EXCEPTION_PROLOG_COMMON_2(area, trap) \ - ld r9,area+EX_R9(r13); /* move r9, r10 to stackframe */ \ - ld r10,area+EX_R10(r13); \ - std r9,GPR9(r1); \ - std r10,GPR10(r1); \ - ld r9,area+EX_R11(r13); /* move r11 - r13 to stackframe */ \ - ld r10,area+EX_R12(r13); \ - ld r11,area+EX_R13(r13); \ - std r9,GPR11(r1); \ - std r10,GPR12(r1); \ - std r11,GPR13(r1); \ -BEGIN_FTR_SECTION_NESTED(66); \ - ld r10,area+EX_CFAR(r13); \ - std r10,ORIG_GPR3(r1); \ -END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ - GET_CTR(r10, area); \ - std r10,_CTR(r1); \ - std r2,GPR2(r1); /* save r2 in stackframe */ \ - SAVE_4GPRS(3, r1); /* save r3 - r6 in stackframe */ \ - SAVE_2GPRS(7, r1); /* save r7, r8 in stackframe */ \ - mflr r9; /* Get LR, later save to stack */ \ - ld r2,PACATOC(r13); /* get kernel TOC into r2 */ \ - std r9,_LINK(r1); \ - lbz r10,PACAIRQSOFTMASK(r13); \ - mfspr r11,SPRN_XER; /* save XER in stackframe */ \ - std r10,SOFTE(r1); \ - std r11,_XER(r1); \ - li r9,(trap)+1; \ - std r9,_TRAP(r1); /* set trap number */ \ - li r10,0; \ - ld r11,exception_marker@toc(r2); \ - std r10,RESULT(r1); /* clear regs->result */ \ - std r11,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */ - /* * On entry r13 points to the paca, r9-r13 are saved in the paca, * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and @@ -452,7 +409,13 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0 3: .endif - EXCEPTION_PROLOG_COMMON_1() + + std r9,_CCR(r1) /* save CR in stackframe */ + std r11,_NIP(r1) /* save SRR0 in stackframe */ + std r12,_MSR(r1) /* save SRR1 in stackframe */ + std r10,0(r1) /* make stack chain pointer */ + std r0,GPR0(r1) /* save r0 in stackframe */ + std r10,GPR1(r1) /* save r1 in stackframe */ .if \stack .if \kaup @@ -468,7 +431,39 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ .endif .endif - EXCEPTION_PROLOG_COMMON_2(\area, \vec) + /* Save original regs values from save area to stack frame. */ + ld r9,\area+EX_R9(r13) /* move r9, r10 to stackframe */ + ld r10,\area+EX_R10(r13) + std r9,GPR9(r1) + std r10,GPR10(r1) + ld r9,\area+EX_R11(r13) /* move r11 - r13 to stackframe */ + ld r10,\area+EX_R12(r13) + ld r11,\area+EX_R13(r13) + std r9,GPR11(r1) + std r10,GPR12(r1) + std r11,GPR13(r1) +BEGIN_FTR_SECTION_NESTED(66) + ld r10,\area+EX_CFAR(r13) + std r10,ORIG_GPR3(r1) +END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66) + GET_CTR(r10, \area) + std r10,_CTR(r1) + std r2,GPR2(r1) /* save r2 in stackframe */ + SAVE_4GPRS(3, r1) /* save r3 - r6 in stackframe */ + SAVE_2GPRS(7, r1) /* save r7, r8 in stackframe */ + mflr r9 /* Get LR, later save to stack */ + ld r2,PACATOC(r13) /* get kernel TOC into r2 */ + std r9,_LINK(r1) + lbz r10,PACAIRQSOFTMASK(r13) + mfspr r11,SPRN_XER /* save XER in stackframe */ + std r10,SOFTE(r1) + std r11,_XER(r1) + li r9,(\vec)+1 + std r9,_TRAP(r1) /* set trap number */ + li r10,0 + ld r11,exception_marker@toc(r2) + std r10,RESULT(r1) /* clear regs->result */ + std r11,STACK_FRAME_OVERHEAD-16(r1) /* mark the frame */ .if \stack ACCOUNT_STOLEN_TIME -- cgit v1.2.3 From d1a84718888e768296557b83f4c56cb1caef8fdd Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:57 +1000 Subject: powerpc/64s/exception: INT_COMMON add DAR, DSISR, reconcile options Move DAR and DSISR saving to pt_regs into INT_COMMON. Also add an option to expand RECONCILE_IRQ_STATE. 
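The dar and dsisr arguments are tri-state, which the hypothetical C sketch below (invented names) spells out: 0 skips the save, 1 copies the value the entry code stashed in the paca save area, and 2 synthesizes it, since for instruction-side faults the "DAR" is really the faulting NIP and the "DSISR" is masked out of SRR1.

#include <stdio.h>

enum dar_mode { DAR_NONE = 0, DAR_FROM_AREA = 1, DAR_FROM_NIP = 2 };

static unsigned long pick_dar(enum dar_mode mode,
			      unsigned long area_dar, unsigned long nip)
{
	switch (mode) {
	case DAR_FROM_AREA:
		return area_dar;	/* d-side fault: hardware DAR */
	case DAR_FROM_NIP:
		return nip;		/* i-side fault: faulting NIP */
	default:
		return 0;		/* handler does not need _DAR */
	}
}

int main(void)
{
	printf("0x300: _DAR = %#lx\n", pick_dar(DAR_FROM_AREA, 0x7fffdeadbeefUL, 0));
	printf("0x400: _DAR = %#lx\n", pick_dar(DAR_FROM_NIP, 0, 0x10002000UL));
	return 0;
}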
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-33-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 111 ++++++++++++++++------------------- 1 file changed, 51 insertions(+), 60 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 7a4f215c4c5b..c2235876e397 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -398,7 +398,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) * If stack=0, then the stack is already set in r1, and r1 is saved in r10. * PPR save and CPU accounting is not done for the !stack case (XXX why not?) */ -.macro INT_COMMON vec, area, stack, kaup +.macro INT_COMMON vec, area, stack, kaup, reconcile, dar, dsisr .if \stack andi. r10,r12,MSR_PR /* See if coming from user */ mr r10,r1 /* Save r1 */ @@ -442,6 +442,24 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) std r9,GPR11(r1) std r10,GPR12(r1) std r11,GPR13(r1) + .if \dar + .if \dar == 2 + ld r10,_NIP(r1) + .else + ld r10,\area+EX_DAR(r13) + .endif + std r10,_DAR(r1) + .endif + .if \dsisr + .if \dsisr == 2 + ld r10,_MSR(r1) + lis r11,DSISR_SRR1_MATCH_64S@h + and r10,r10,r11 + .else + lwz r10,\area+EX_DSISR(r13) + .endif + std r10,_DSISR(r1) + .endif BEGIN_FTR_SECTION_NESTED(66) ld r10,\area+EX_CFAR(r13) std r10,ORIG_GPR3(r1) @@ -468,6 +486,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66) .if \stack ACCOUNT_STOLEN_TIME .endif + + .if \reconcile + RECONCILE_IRQ_STATE(r10, r11) + .endif .endm /* @@ -665,9 +687,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) #define EXC_COMMON(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ - INT_COMMON realvec, PACA_EXGEN, 1, 1 ; \ + INT_COMMON realvec, PACA_EXGEN, 1, 1, 1, 0, 0 ; \ bl save_nvgprs; \ - RECONCILE_IRQ_STATE(r10, r11); \ addi r3,r1,STACK_FRAME_OVERHEAD; \ bl hdlr; \ b ret_from_except @@ -678,9 +699,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) */ #define EXC_COMMON_ASYNC(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ - INT_COMMON realvec, PACA_EXGEN, 1, 1 ; \ + INT_COMMON realvec, PACA_EXGEN, 1, 1, 1, 0, 0 ; \ FINISH_NAP; \ - RECONCILE_IRQ_STATE(r10, r11); \ RUNLATCH_ON; \ addi r3,r1,STACK_FRAME_OVERHEAD; \ bl hdlr; \ @@ -859,7 +879,7 @@ EXC_COMMON_BEGIN(system_reset_common) mr r10,r1 ld r1,PACA_NMI_EMERG_SP(r13) subi r1,r1,INT_FRAME_SIZE - INT_COMMON 0x100, PACA_EXNMI, 0, 1 + INT_COMMON 0x100, PACA_EXNMI, 0, 1, 0, 0, 0 bl save_nvgprs /* * Set IRQS_ALL_DISABLED unconditionally so arch_irqs_disabled does @@ -970,12 +990,7 @@ EXC_COMMON_BEGIN(machine_check_early_common) subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ /* We don't touch AMR here, we never go to virtual mode */ - INT_COMMON 0x200, PACA_EXMC, 0, 0 - - ld r3,PACA_EXMC+EX_DAR(r13) - lwz r4,PACA_EXMC+EX_DSISR(r13) - std r3,_DAR(r1) - std r4,_DSISR(r1) + INT_COMMON 0x200, PACA_EXMC, 0, 0, 0, 1, 1 BEGIN_FTR_SECTION bl enable_machine_check @@ -1070,16 +1085,11 @@ EXC_COMMON_BEGIN(machine_check_common) * Machine check is different because we use a different * save area: PACA_EXMC instead of PACA_EXGEN. 
*/ - INT_COMMON 0x200, PACA_EXMC, 1, 1 + INT_COMMON 0x200, PACA_EXMC, 1, 1, 1, 1, 1 FINISH_NAP - RECONCILE_IRQ_STATE(r10, r11) - ld r3,PACA_EXMC+EX_DAR(r13) - lwz r4,PACA_EXMC+EX_DSISR(r13) /* Enable MSR_RI when finished with PACA_EXMC */ li r10,MSR_RI mtmsrd r10,1 - std r3,_DAR(r1) - std r4,_DSISR(r1) bl save_nvgprs addi r3,r1,STACK_FRAME_OVERHEAD bl machine_check_exception @@ -1162,14 +1172,11 @@ EXC_COMMON_BEGIN(data_access_common) * r9 - r13 are saved in paca->exgen. * EX_DAR and EX_DSISR have saved DAR/DSISR */ - INT_COMMON 0x300, PACA_EXGEN, 1, 1 - RECONCILE_IRQ_STATE(r10, r11) + INT_COMMON 0x300, PACA_EXGEN, 1, 1, 1, 1, 1 ld r12,_MSR(r1) - ld r3,PACA_EXGEN+EX_DAR(r13) - lwz r4,PACA_EXGEN+EX_DSISR(r13) + ld r3,_DAR(r1) + ld r4,_DSISR(r1) li r5,0x300 - std r3,_DAR(r1) - std r4,_DSISR(r1) BEGIN_MMU_FTR_SECTION b do_hash_page /* Try to handle as hpte fault */ MMU_FTR_SECTION_ELSE @@ -1185,9 +1192,8 @@ EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) EXC_VIRT_END(data_access_slb, 0x4380, 0x80) INT_KVM_HANDLER 0x380, EXC_STD, PACA_EXSLB, 1 EXC_COMMON_BEGIN(data_access_slb_common) - INT_COMMON 0x380, PACA_EXSLB, 1, 1 - ld r4,PACA_EXSLB+EX_DAR(r13) - std r4,_DAR(r1) + INT_COMMON 0x380, PACA_EXSLB, 1, 1, 0, 1, 0 + ld r4,_DAR(r1) addi r3,r1,STACK_FRAME_OVERHEAD BEGIN_MMU_FTR_SECTION /* HPT case, do SLB fault */ @@ -1218,14 +1224,11 @@ EXC_VIRT_BEGIN(instruction_access, 0x4400, 0x80) EXC_VIRT_END(instruction_access, 0x4400, 0x80) INT_KVM_HANDLER 0x400, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(instruction_access_common) - INT_COMMON 0x400, PACA_EXGEN, 1, 1 - RECONCILE_IRQ_STATE(r10, r11) - ld r12,_MSR(r1) - ld r3,_NIP(r1) - andis. r4,r12,DSISR_SRR1_MATCH_64S@h + INT_COMMON 0x400, PACA_EXGEN, 1, 1, 1, 2, 2 + ld r12,_MSR(r1) + ld r3,_DAR(r1) + ld r4,_DSISR(r1) li r5,0x400 - std r3,_DAR(r1) - std r4,_DSISR(r1) BEGIN_MMU_FTR_SECTION b do_hash_page /* Try to handle as hpte fault */ MMU_FTR_SECTION_ELSE @@ -1241,8 +1244,8 @@ EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80) EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80) INT_KVM_HANDLER 0x480, EXC_STD, PACA_EXSLB, 0 EXC_COMMON_BEGIN(instruction_access_slb_common) - INT_COMMON 0x480, PACA_EXSLB, 1, 1 - ld r4,_NIP(r1) + INT_COMMON 0x480, PACA_EXSLB, 1, 1, 0, 2, 0 + ld r4,_DAR(r1) addi r3,r1,STACK_FRAME_OVERHEAD BEGIN_MMU_FTR_SECTION /* HPT case, do SLB fault */ @@ -1258,7 +1261,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) std r3,RESULT(r1) bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) - ld r4,_NIP(r1) + ld r4,_DAR(r1) ld r5,RESULT(r1) addi r3,r1,STACK_FRAME_OVERHEAD bl do_bad_slb_fault @@ -1283,13 +1286,8 @@ EXC_VIRT_BEGIN(alignment, 0x4600, 0x100) EXC_VIRT_END(alignment, 0x4600, 0x100) INT_KVM_HANDLER 0x600, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(alignment_common) - INT_COMMON 0x600, PACA_EXGEN, 1, 1 - ld r3,PACA_EXGEN+EX_DAR(r13) - lwz r4,PACA_EXGEN+EX_DSISR(r13) - std r3,_DAR(r1) - std r4,_DSISR(r1) + INT_COMMON 0x600, PACA_EXGEN, 1, 1, 1, 1, 1 bl save_nvgprs - RECONCILE_IRQ_STATE(r10, r11) addi r3,r1,STACK_FRAME_OVERHEAD bl alignment_exception b ret_from_except @@ -1329,9 +1327,8 @@ EXC_COMMON_BEGIN(program_check_common) subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ b 3f /* Jump into the macro !! 
*/ 2: - INT_COMMON 0x700, PACA_EXGEN, 1, 1 + INT_COMMON 0x700, PACA_EXGEN, 1, 1, 1, 0, 0 bl save_nvgprs - RECONCILE_IRQ_STATE(r10, r11) addi r3,r1,STACK_FRAME_OVERHEAD bl program_check_exception b ret_from_except @@ -1345,7 +1342,7 @@ EXC_VIRT_BEGIN(fp_unavailable, 0x4800, 0x100) EXC_VIRT_END(fp_unavailable, 0x4800, 0x100) INT_KVM_HANDLER 0x800, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(fp_unavailable_common) - INT_COMMON 0x800, PACA_EXGEN, 1, 1 + INT_COMMON 0x800, PACA_EXGEN, 1, 1, 0, 0, 0 bne 1f /* if from user, just load it up */ bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) @@ -1561,15 +1558,11 @@ EXC_VIRT_BEGIN(h_data_storage, 0x4e00, 0x20) EXC_VIRT_END(h_data_storage, 0x4e00, 0x20) INT_KVM_HANDLER 0xe00, EXC_HV, PACA_EXGEN, 1 EXC_COMMON_BEGIN(h_data_storage_common) - INT_COMMON 0xe00, PACA_EXGEN, 1, 1 + INT_COMMON 0xe00, PACA_EXGEN, 1, 1, 1, 1, 1 bl save_nvgprs - RECONCILE_IRQ_STATE(r10, r11) addi r3,r1,STACK_FRAME_OVERHEAD BEGIN_MMU_FTR_SECTION - ld r4,PACA_EXGEN+EX_DAR(r13) - lwz r5,PACA_EXGEN+EX_DSISR(r13) - std r4,_DAR(r1) - std r5,_DSISR(r1) + ld r4,_DAR(r1) li r5,SIGSEGV bl bad_page_fault MMU_FTR_SECTION_ELSE @@ -1617,7 +1610,7 @@ EXC_COMMON_BEGIN(hmi_exception_early_common) subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ /* We don't touch AMR here, we never go to virtual mode */ - INT_COMMON 0xe60, PACA_EXGEN, 0, 0 + INT_COMMON 0xe60, PACA_EXGEN, 0, 0, 0, 0, 0 addi r3,r1,STACK_FRAME_OVERHEAD bl hmi_exception_realmode @@ -1636,11 +1629,10 @@ EXC_COMMON_BEGIN(hmi_exception_early_common) INT_HANDLER hmi_exception, 0xe60, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1 EXC_COMMON_BEGIN(hmi_exception_common) - INT_COMMON 0xe60, PACA_EXGEN, 1, 1 + INT_COMMON 0xe60, PACA_EXGEN, 1, 1, 1, 0, 0 FINISH_NAP - bl save_nvgprs - RECONCILE_IRQ_STATE(r10, r11) RUNLATCH_ON + bl save_nvgprs addi r3,r1,STACK_FRAME_OVERHEAD bl handle_hmi_exception b ret_from_except @@ -1694,7 +1686,7 @@ EXC_VIRT_BEGIN(altivec_unavailable, 0x4f20, 0x20) EXC_VIRT_END(altivec_unavailable, 0x4f20, 0x20) INT_KVM_HANDLER 0xf20, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(altivec_unavailable_common) - INT_COMMON 0xf20, PACA_EXGEN, 1, 1 + INT_COMMON 0xf20, PACA_EXGEN, 1, 1, 0, 0, 0 #ifdef CONFIG_ALTIVEC BEGIN_FTR_SECTION beq 1f @@ -1735,7 +1727,7 @@ EXC_VIRT_BEGIN(vsx_unavailable, 0x4f40, 0x20) EXC_VIRT_END(vsx_unavailable, 0x4f40, 0x20) INT_KVM_HANDLER 0xf40, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(vsx_unavailable_common) - INT_COMMON 0xf40, PACA_EXGEN, 1, 1 + INT_COMMON 0xf40, PACA_EXGEN, 1, 1, 0, 0, 0 #ifdef CONFIG_VSX BEGIN_FTR_SECTION beq 1f @@ -1986,9 +1978,8 @@ EXC_COMMON_BEGIN(soft_nmi_common) mr r10,r1 ld r1,PACAEMERGSP(r13) subi r1,r1,INT_FRAME_SIZE - INT_COMMON 0x900, PACA_EXGEN, 0, 1 + INT_COMMON 0x900, PACA_EXGEN, 0, 1, 1, 0, 0 bl save_nvgprs - RECONCILE_IRQ_STATE(r10, r11) addi r3,r1,STACK_FRAME_OVERHEAD bl soft_nmi_interrupt b ret_from_except -- cgit v1.2.3 From c7c5cbb42d6e207a059c64740b1654376619345e Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:58 +1000 Subject: powerpc/64s/exception: move interrupt entry code above the common handler This better reflects the order in which the code is executed. No generated code change except BUG line number constants. 
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-34-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 450 +++++++++++++++++------------------ 1 file changed, 225 insertions(+), 225 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index c2235876e397..aabd84e83615 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -180,101 +180,6 @@ BEGIN_FTR_SECTION_NESTED(943) \ std ra,offset(r13); \ END_FTR_SECTION_NESTED(ftr,ftr,943) -.macro INT_SAVE_SRR_AND_JUMP label, hsrr, set_ri - ld r10,PACAKMSR(r13) /* get MSR value for kernel */ - .if ! \set_ri - xori r10,r10,MSR_RI /* Clear MSR_RI */ - .endif - .if \hsrr == EXC_HV_OR_STD - BEGIN_FTR_SECTION - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - mtspr SPRN_HSRR1,r10 - FTR_SECTION_ELSE - mfspr r11,SPRN_SRR0 /* save SRR0 */ - mfspr r12,SPRN_SRR1 /* and SRR1 */ - mtspr SPRN_SRR1,r10 - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - mtspr SPRN_HSRR1,r10 - .else - mfspr r11,SPRN_SRR0 /* save SRR0 */ - mfspr r12,SPRN_SRR1 /* and SRR1 */ - mtspr SPRN_SRR1,r10 - .endif - LOAD_HANDLER(r10, \label\()) - .if \hsrr == EXC_HV_OR_STD - BEGIN_FTR_SECTION - mtspr SPRN_HSRR0,r10 - HRFI_TO_KERNEL - FTR_SECTION_ELSE - mtspr SPRN_SRR0,r10 - RFI_TO_KERNEL - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr - mtspr SPRN_HSRR0,r10 - HRFI_TO_KERNEL - .else - mtspr SPRN_SRR0,r10 - RFI_TO_KERNEL - .endif - b . /* prevent speculative execution */ -.endm - -/* INT_SAVE_SRR_AND_JUMP works for real or virt, this is faster but virt only */ -.macro INT_VIRT_SAVE_SRR_AND_JUMP label, hsrr -#ifdef CONFIG_RELOCATABLE - .if \hsrr == EXC_HV_OR_STD - BEGIN_FTR_SECTION - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ - FTR_SECTION_ELSE - mfspr r11,SPRN_SRR0 /* save SRR0 */ - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ - .else - mfspr r11,SPRN_SRR0 /* save SRR0 */ - .endif - LOAD_HANDLER(r12, \label\()) - mtctr r12 - .if \hsrr == EXC_HV_OR_STD - BEGIN_FTR_SECTION - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - FTR_SECTION_ELSE - mfspr r12,SPRN_SRR1 /* and HSRR1 */ - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - .else - mfspr r12,SPRN_SRR1 /* and HSRR1 */ - .endif - li r10,MSR_RI - mtmsrd r10,1 /* Set RI (EE=0) */ - bctr -#else - .if \hsrr == EXC_HV_OR_STD - BEGIN_FTR_SECTION - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - FTR_SECTION_ELSE - mfspr r11,SPRN_SRR0 /* save SRR0 */ - mfspr r12,SPRN_SRR1 /* and SRR1 */ - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - .else - mfspr r11,SPRN_SRR0 /* save SRR0 */ - mfspr r12,SPRN_SRR1 /* and SRR1 */ - .endif - li r10,MSR_RI - mtmsrd r10,1 /* Set RI (EE=0) */ - b \label -#endif -.endm - /* * Branch to label using its 0xC000 address. 
This results in instruction * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned @@ -288,6 +193,15 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) mtctr reg; \ bctr +.macro INT_KVM_HANDLER vec, hsrr, area, skip + .if \hsrr + TRAMP_KVM_BEGIN(do_kvm_H\vec\()) + .else + TRAMP_KVM_BEGIN(do_kvm_\vec\()) + .endif + KVM_HANDLER \vec, \hsrr, \area, \skip +.endm + #ifdef CONFIG_KVM_BOOK3S_64_HANDLER #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE /* @@ -390,6 +304,222 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .endm #endif +.macro INT_SAVE_SRR_AND_JUMP label, hsrr, set_ri + ld r10,PACAKMSR(r13) /* get MSR value for kernel */ + .if ! \set_ri + xori r10,r10,MSR_RI /* Clear MSR_RI */ + .endif + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + mtspr SPRN_HSRR1,r10 + FTR_SECTION_ELSE + mfspr r11,SPRN_SRR0 /* save SRR0 */ + mfspr r12,SPRN_SRR1 /* and SRR1 */ + mtspr SPRN_SRR1,r10 + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + mtspr SPRN_HSRR1,r10 + .else + mfspr r11,SPRN_SRR0 /* save SRR0 */ + mfspr r12,SPRN_SRR1 /* and SRR1 */ + mtspr SPRN_SRR1,r10 + .endif + LOAD_HANDLER(r10, \label\()) + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + mtspr SPRN_HSRR0,r10 + HRFI_TO_KERNEL + FTR_SECTION_ELSE + mtspr SPRN_SRR0,r10 + RFI_TO_KERNEL + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr + mtspr SPRN_HSRR0,r10 + HRFI_TO_KERNEL + .else + mtspr SPRN_SRR0,r10 + RFI_TO_KERNEL + .endif + b . /* prevent speculative execution */ +.endm + +/* INT_SAVE_SRR_AND_JUMP works for real or virt, this is faster but virt only */ +.macro INT_VIRT_SAVE_SRR_AND_JUMP label, hsrr +#ifdef CONFIG_RELOCATABLE + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + FTR_SECTION_ELSE + mfspr r11,SPRN_SRR0 /* save SRR0 */ + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + .else + mfspr r11,SPRN_SRR0 /* save SRR0 */ + .endif + LOAD_HANDLER(r12, \label\()) + mtctr r12 + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + FTR_SECTION_ELSE + mfspr r12,SPRN_SRR1 /* and HSRR1 */ + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + .else + mfspr r12,SPRN_SRR1 /* and HSRR1 */ + .endif + li r10,MSR_RI + mtmsrd r10,1 /* Set RI (EE=0) */ + bctr +#else + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + FTR_SECTION_ELSE + mfspr r11,SPRN_SRR0 /* save SRR0 */ + mfspr r12,SPRN_SRR1 /* and SRR1 */ + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + .else + mfspr r11,SPRN_SRR0 /* save SRR0 */ + mfspr r12,SPRN_SRR1 /* and SRR1 */ + .endif + li r10,MSR_RI + mtmsrd r10,1 /* Set RI (EE=0) */ + b \label +#endif +.endm + +/* + * This is the BOOK3S interrupt entry code macro. + * + * This can result in one of several things happening: + * - Branch to the _common handler, relocated, in virtual mode. + * These are normal interrupts (synchronous and asynchronous) handled by + * the kernel. + * - Branch to KVM, relocated but real mode interrupts remain in real mode. + * These occur when HSTATE_IN_GUEST is set. 
The interrupt may be caused by + * / intended for host or guest kernel, but KVM must always be involved + * because the machine state is set for guest execution. + * - Branch to the masked handler, unrelocated. + * These occur when maskable asynchronous interrupts are taken with the + * irq_soft_mask set. + * - Branch to an "early" handler in real mode but relocated. + * This is done if early=1. MCE and HMI use these to handle errors in real + * mode. + * - Fall through and continue executing in real, unrelocated mode. + * This is done if early=2. + */ +.macro INT_HANDLER name, vec, ool=0, early=0, virt=0, hsrr=0, area=PACA_EXGEN, ri=1, dar=0, dsisr=0, bitmask=0, kvm=0 + SET_SCRATCH0(r13) /* save r13 */ + GET_PACA(r13) + std r9,\area\()+EX_R9(r13) /* save r9 */ + OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR) + HMT_MEDIUM + std r10,\area\()+EX_R10(r13) /* save r10 - r12 */ + OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR) + .if \ool + .if !\virt + b tramp_real_\name + .pushsection .text + TRAMP_REAL_BEGIN(tramp_real_\name) + .else + b tramp_virt_\name + .pushsection .text + TRAMP_VIRT_BEGIN(tramp_virt_\name) + .endif + .endif + + OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR) + OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR) + INTERRUPT_TO_KERNEL + SAVE_CTR(r10, \area\()) + mfcr r9 + .if \kvm + KVMTEST \hsrr \vec + .endif + .if \bitmask + lbz r10,PACAIRQSOFTMASK(r13) + andi. r10,r10,\bitmask + /* Associate vector numbers with bits in paca->irq_happened */ + .if \vec == 0x500 || \vec == 0xea0 + li r10,PACA_IRQ_EE + .elseif \vec == 0x900 + li r10,PACA_IRQ_DEC + .elseif \vec == 0xa00 || \vec == 0xe80 + li r10,PACA_IRQ_DBELL + .elseif \vec == 0xe60 + li r10,PACA_IRQ_HMI + .elseif \vec == 0xf00 + li r10,PACA_IRQ_PMI + .else + .abort "Bad maskable vector" + .endif + + .if \hsrr == EXC_HV_OR_STD + BEGIN_FTR_SECTION + bne masked_Hinterrupt + FTR_SECTION_ELSE + bne masked_interrupt + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif \hsrr + bne masked_Hinterrupt + .else + bne masked_interrupt + .endif + .endif + + std r11,\area\()+EX_R11(r13) + std r12,\area\()+EX_R12(r13) + + /* + * DAR/DSISR, SCRATCH0 must be read before setting MSR[RI], + * because a d-side MCE will clobber those registers so is + * not recoverable if they are live. + */ + GET_SCRATCH0(r10) + std r10,\area\()+EX_R13(r13) + .if \dar + .if \hsrr + mfspr r10,SPRN_HDAR + .else + mfspr r10,SPRN_DAR + .endif + std r10,\area\()+EX_DAR(r13) + .endif + .if \dsisr + .if \hsrr + mfspr r10,SPRN_HDSISR + .else + mfspr r10,SPRN_DSISR + .endif + stw r10,\area\()+EX_DSISR(r13) + .endif + + .if \early == 2 + /* nothing more */ + .elseif \early + mfctr r10 /* save ctr, even for !RELOCATABLE */ + BRANCH_TO_C000(r11, \name\()_early_common) + .elseif !\virt + INT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr, \ri + .else + INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr + .endif + .if \ool + .popsection + .endif +.endm + /* * On entry r13 points to the paca, r9-r13 are saved in the paca, * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and @@ -555,136 +685,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) #define FINISH_NAP #endif -/* - * This is the BOOK3S interrupt entry code macro. - * - * This can result in one of several things happening: - * - Branch to the _common handler, relocated, in virtual mode. - * These are normal interrupts (synchronous and asynchronous) handled by - * the kernel. - * - Branch to KVM, relocated but real mode interrupts remain in real mode. 
- * These occur when HSTATE_IN_GUEST is set. The interrupt may be caused by - * / intended for host or guest kernel, but KVM must always be involved - * because the machine state is set for guest execution. - * - Branch to the masked handler, unrelocated. - * These occur when maskable asynchronous interrupts are taken with the - * irq_soft_mask set. - * - Branch to an "early" handler in real mode but relocated. - * This is done if early=1. MCE and HMI use these to handle errors in real - * mode. - * - Fall through and continue executing in real, unrelocated mode. - * This is done if early=2. - */ -.macro INT_HANDLER name, vec, ool=0, early=0, virt=0, hsrr=0, area=PACA_EXGEN, ri=1, dar=0, dsisr=0, bitmask=0, kvm=0 - SET_SCRATCH0(r13) /* save r13 */ - GET_PACA(r13) - std r9,\area\()+EX_R9(r13) /* save r9 */ - OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR) - HMT_MEDIUM - std r10,\area\()+EX_R10(r13) /* save r10 - r12 */ - OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR) - .if \ool - .if !\virt - b tramp_real_\name - .pushsection .text - TRAMP_REAL_BEGIN(tramp_real_\name) - .else - b tramp_virt_\name - .pushsection .text - TRAMP_VIRT_BEGIN(tramp_virt_\name) - .endif - .endif - - OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR) - OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR) - INTERRUPT_TO_KERNEL - SAVE_CTR(r10, \area\()) - mfcr r9 - .if \kvm - KVMTEST \hsrr \vec - .endif - .if \bitmask - lbz r10,PACAIRQSOFTMASK(r13) - andi. r10,r10,\bitmask - /* Associate vector numbers with bits in paca->irq_happened */ - .if \vec == 0x500 || \vec == 0xea0 - li r10,PACA_IRQ_EE - .elseif \vec == 0x900 - li r10,PACA_IRQ_DEC - .elseif \vec == 0xa00 || \vec == 0xe80 - li r10,PACA_IRQ_DBELL - .elseif \vec == 0xe60 - li r10,PACA_IRQ_HMI - .elseif \vec == 0xf00 - li r10,PACA_IRQ_PMI - .else - .abort "Bad maskable vector" - .endif - - .if \hsrr == EXC_HV_OR_STD - BEGIN_FTR_SECTION - bne masked_Hinterrupt - FTR_SECTION_ELSE - bne masked_interrupt - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr - bne masked_Hinterrupt - .else - bne masked_interrupt - .endif - .endif - - std r11,\area\()+EX_R11(r13) - std r12,\area\()+EX_R12(r13) - - /* - * DAR/DSISR, SCRATCH0 must be read before setting MSR[RI], - * because a d-side MCE will clobber those registers so is - * not recoverable if they are live. 
- */ - GET_SCRATCH0(r10) - std r10,\area\()+EX_R13(r13) - .if \dar - .if \hsrr - mfspr r10,SPRN_HDAR - .else - mfspr r10,SPRN_DAR - .endif - std r10,\area\()+EX_DAR(r13) - .endif - .if \dsisr - .if \hsrr - mfspr r10,SPRN_HDSISR - .else - mfspr r10,SPRN_DSISR - .endif - stw r10,\area\()+EX_DSISR(r13) - .endif - - .if \early == 2 - /* nothing more */ - .elseif \early - mfctr r10 /* save ctr, even for !RELOCATABLE */ - BRANCH_TO_C000(r11, \name\()_early_common) - .elseif !\virt - INT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr, \ri - .else - INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr - .endif - .if \ool - .popsection - .endif -.endm - -.macro INT_KVM_HANDLER vec, hsrr, area, skip - .if \hsrr - TRAMP_KVM_BEGIN(do_kvm_H\vec\()) - .else - TRAMP_KVM_BEGIN(do_kvm_\vec\()) - .endif - KVM_HANDLER \vec, \hsrr, \area, \skip -.endm - #define EXC_COMMON(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ INT_COMMON realvec, PACA_EXGEN, 1, 1, 1, 0, 0 ; \ -- cgit v1.2.3 From 1b3599829a2560fe6c9a5a4a6bb6b2bc4e5bdaee Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:56:59 +1000 Subject: powerpc/64s/exception: program check handler do not branch into a macro It is clever, but the small code saving is not worth the spaghetti of jumping to a label in an expanded macro, particularly when the label is just a number rather than a descriptive name. So expand the INT_COMMON macro twice, once for the stack and no stack cases, and branch to those. The slight code size increase is worth the improved clarity of branches for this non-performance critical code. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-35-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index aabd84e83615..b0649d56bb15 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -533,11 +533,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) andi. r10,r12,MSR_PR /* See if coming from user */ mr r10,r1 /* Save r1 */ subi r1,r1,INT_FRAME_SIZE /* alloc frame on kernel stack */ - beq- 1f + beq- 100f ld r1,PACAKSAVE(r13) /* kernel stack to use */ -1: tdgei r1,-INT_FRAME_SIZE /* trap if r1 is in userspace */ - EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0 -3: +100: tdgei r1,-INT_FRAME_SIZE /* trap if r1 is in userspace */ + EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,0 .endif std r9,_CCR(r1) /* save CR in stackframe */ @@ -551,10 +550,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .if \kaup kuap_save_amr_and_lock r9, r10, cr1, cr0 .endif - beq 4f /* if from kernel mode */ + beq 101f /* if from kernel mode */ ACCOUNT_CPU_USER_ENTRY(r13, r9, r10) SAVE_PPR(\area, r9) -4: +101: .else .if \kaup kuap_save_amr_and_lock r9, r10, cr1 @@ -1325,9 +1324,11 @@ EXC_COMMON_BEGIN(program_check_common) mr r10,r1 /* Save r1 */ ld r1,PACAEMERGSP(r13) /* Use emergency stack */ subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */ - b 3f /* Jump into the macro !! 
*/ + INT_COMMON 0x700, PACA_EXGEN, 0, 1, 1, 0, 0 + b 3f 2: INT_COMMON 0x700, PACA_EXGEN, 1, 1, 1, 0, 0 +3: bl save_nvgprs addi r3,r1,STACK_FRAME_OVERHEAD bl program_check_exception -- cgit v1.2.3 From 05f97d94dd0e2883bc7b2e6b7b5e4c088e0d1437 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:57:00 +1000 Subject: powerpc/64s/exception: Remove pointless KVM handler name bifurcation Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190802105709.27696-36-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 102 +++++++++++++++-------------------- 1 file changed, 44 insertions(+), 58 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index b0649d56bb15..f54a38417904 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -193,12 +193,8 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) mtctr reg; \ bctr -.macro INT_KVM_HANDLER vec, hsrr, area, skip - .if \hsrr - TRAMP_KVM_BEGIN(do_kvm_H\vec\()) - .else - TRAMP_KVM_BEGIN(do_kvm_\vec\()) - .endif +.macro INT_KVM_HANDLER name, vec, hsrr, area, skip + TRAMP_KVM_BEGIN(\name\()_kvm) KVM_HANDLER \vec, \hsrr, \area, \skip .endm @@ -214,20 +210,10 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define kvmppc_interrupt kvmppc_interrupt_pr #endif -.macro KVMTEST hsrr, n +.macro KVMTEST name, hsrr, n lbz r10,HSTATE_IN_GUEST(r13) cmpwi r10,0 - .if \hsrr == EXC_HV_OR_STD - BEGIN_FTR_SECTION - bne do_kvm_H\n - FTR_SECTION_ELSE - bne do_kvm_\n - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr - bne do_kvm_H\n - .else - bne do_kvm_\n - .endif + bne \name\()_kvm .endm .macro KVM_HANDLER vec, hsrr, area, skip @@ -298,9 +284,9 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .endm #else -.macro KVMTEST hsrr, n +.macro KVMTEST name, hsrr, n .endm -.macro KVM_HANDLER vec, hsrr, area, skip +.macro KVM_HANDLER name, vec, hsrr, area, skip .endm #endif @@ -445,7 +431,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) SAVE_CTR(r10, \area\()) mfcr r9 .if \kvm - KVMTEST \hsrr \vec + KVMTEST \name \hsrr \vec .endif .if \bitmask lbz r10,PACAIRQSOFTMASK(r13) @@ -842,7 +828,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) */ EXC_REAL_END(system_reset, 0x100, 0x100) EXC_VIRT_NONE(0x4100, 0x100) -INT_KVM_HANDLER 0x100, EXC_STD, PACA_EXNMI, 0 +INT_KVM_HANDLER system_reset 0x100, EXC_STD, PACA_EXNMI, 0 #ifdef CONFIG_PPC_P7_NAP TRAMP_REAL_BEGIN(system_reset_idle_wake) @@ -936,7 +922,7 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi) INT_HANDLER machine_check, 0x200, early=1, area=PACA_EXMC, dar=1, dsisr=1 #endif -INT_KVM_HANDLER 0x200, EXC_STD, PACA_EXMC, 1 +INT_KVM_HANDLER machine_check 0x200, EXC_STD, PACA_EXMC, 1 #define MACHINE_CHECK_HANDLER_WINDUP \ /* Clear MSR_RI before setting SRR0 and SRR1. */\ @@ -1022,8 +1008,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) #ifdef CONFIG_KVM_BOOK3S_64_HANDLER /* * Check if we are coming from guest. If yes, then run the normal - * exception handler which will take the do_kvm_200->kvmppc_interrupt - * branch to deliver the MC event to guest. + * exception handler which will take the + * machine_check_kvm->kvmppc_interrupt branch to deliver the MC event + * to guest. 
*/ lbz r11,HSTATE_IN_GUEST(r13) cmpwi r11,0 /* Check if coming from guest */ @@ -1163,7 +1150,7 @@ EXC_REAL_END(data_access, 0x300, 0x80) EXC_VIRT_BEGIN(data_access, 0x4300, 0x80) INT_HANDLER data_access, 0x300, virt=1, dar=1, dsisr=1 EXC_VIRT_END(data_access, 0x4300, 0x80) -INT_KVM_HANDLER 0x300, EXC_STD, PACA_EXGEN, 1 +INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1 EXC_COMMON_BEGIN(data_access_common) /* * Here r13 points to the paca, r9 contains the saved CR, @@ -1189,7 +1176,7 @@ EXC_REAL_END(data_access_slb, 0x380, 0x80) EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) INT_HANDLER data_access_slb, 0x380, virt=1, area=PACA_EXSLB, dar=1 EXC_VIRT_END(data_access_slb, 0x4380, 0x80) -INT_KVM_HANDLER 0x380, EXC_STD, PACA_EXSLB, 1 +INT_KVM_HANDLER data_access_slb, 0x380, EXC_STD, PACA_EXSLB, 1 EXC_COMMON_BEGIN(data_access_slb_common) INT_COMMON 0x380, PACA_EXSLB, 1, 1, 0, 1, 0 ld r4,_DAR(r1) @@ -1221,7 +1208,7 @@ EXC_REAL_END(instruction_access, 0x400, 0x80) EXC_VIRT_BEGIN(instruction_access, 0x4400, 0x80) INT_HANDLER instruction_access, 0x400, virt=1 EXC_VIRT_END(instruction_access, 0x4400, 0x80) -INT_KVM_HANDLER 0x400, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER instruction_access, 0x400, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(instruction_access_common) INT_COMMON 0x400, PACA_EXGEN, 1, 1, 1, 2, 2 ld r12,_MSR(r1) @@ -1241,7 +1228,7 @@ EXC_REAL_END(instruction_access_slb, 0x480, 0x80) EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80) INT_HANDLER instruction_access_slb, 0x480, virt=1, area=PACA_EXSLB EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80) -INT_KVM_HANDLER 0x480, EXC_STD, PACA_EXSLB, 0 +INT_KVM_HANDLER instruction_access_slb, 0x480, EXC_STD, PACA_EXSLB, 0 EXC_COMMON_BEGIN(instruction_access_slb_common) INT_COMMON 0x480, PACA_EXSLB, 1, 1, 0, 2, 0 ld r4,_DAR(r1) @@ -1272,8 +1259,7 @@ EXC_REAL_END(hardware_interrupt, 0x500, 0x100) EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100) INT_HANDLER hardware_interrupt, 0x500, virt=1, hsrr=EXC_HV_OR_STD, bitmask=IRQS_DISABLED, kvm=1 EXC_VIRT_END(hardware_interrupt, 0x4500, 0x100) -INT_KVM_HANDLER 0x500, EXC_STD, PACA_EXGEN, 0 -INT_KVM_HANDLER 0x500, EXC_HV, PACA_EXGEN, 0 +INT_KVM_HANDLER hardware_interrupt, 0x500, EXC_HV_OR_STD, PACA_EXGEN, 0 EXC_COMMON_ASYNC(hardware_interrupt_common, 0x500, do_IRQ) @@ -1283,7 +1269,7 @@ EXC_REAL_END(alignment, 0x600, 0x100) EXC_VIRT_BEGIN(alignment, 0x4600, 0x100) INT_HANDLER alignment, 0x600, virt=1, dar=1, dsisr=1 EXC_VIRT_END(alignment, 0x4600, 0x100) -INT_KVM_HANDLER 0x600, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER alignment, 0x600, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(alignment_common) INT_COMMON 0x600, PACA_EXGEN, 1, 1, 1, 1, 1 bl save_nvgprs @@ -1298,7 +1284,7 @@ EXC_REAL_END(program_check, 0x700, 0x100) EXC_VIRT_BEGIN(program_check, 0x4700, 0x100) INT_HANDLER program_check, 0x700, virt=1 EXC_VIRT_END(program_check, 0x4700, 0x100) -INT_KVM_HANDLER 0x700, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER program_check, 0x700, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(program_check_common) /* * It's possible to receive a TM Bad Thing type program check with @@ -1341,7 +1327,7 @@ EXC_REAL_END(fp_unavailable, 0x800, 0x100) EXC_VIRT_BEGIN(fp_unavailable, 0x4800, 0x100) INT_HANDLER fp_unavailable, 0x800, virt=1 EXC_VIRT_END(fp_unavailable, 0x4800, 0x100) -INT_KVM_HANDLER 0x800, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER fp_unavailable, 0x800, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(fp_unavailable_common) INT_COMMON 0x800, PACA_EXGEN, 1, 1, 0, 0, 0 bne 1f /* if from user, just load it up */ @@ -1379,7 +1365,7 @@ 
EXC_REAL_END(decrementer, 0x900, 0x80) EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80) INT_HANDLER decrementer, 0x900, virt=1, bitmask=IRQS_DISABLED EXC_VIRT_END(decrementer, 0x4900, 0x80) -INT_KVM_HANDLER 0x900, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER decrementer, 0x900, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_ASYNC(decrementer_common, 0x900, timer_interrupt) @@ -1389,7 +1375,7 @@ EXC_REAL_END(hdecrementer, 0x980, 0x80) EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80) INT_HANDLER hdecrementer, 0x980, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(hdecrementer, 0x4980, 0x80) -INT_KVM_HANDLER 0x980, EXC_HV, PACA_EXGEN, 0 +INT_KVM_HANDLER hdecrementer, 0x980, EXC_HV, PACA_EXGEN, 0 EXC_COMMON(hdecrementer_common, 0x980, hdec_interrupt) @@ -1399,7 +1385,7 @@ EXC_REAL_END(doorbell_super, 0xa00, 0x100) EXC_VIRT_BEGIN(doorbell_super, 0x4a00, 0x100) INT_HANDLER doorbell_super, 0xa00, virt=1, bitmask=IRQS_DISABLED EXC_VIRT_END(doorbell_super, 0x4a00, 0x100) -INT_KVM_HANDLER 0xa00, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER doorbell_super, 0xa00, EXC_STD, PACA_EXGEN, 0 #ifdef CONFIG_PPC_DOORBELL EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, doorbell_exception) #else @@ -1458,7 +1444,7 @@ EXC_VIRT_NONE(0x4b00, 0x100) GET_PACA(r13) std r10,PACA_EXGEN+EX_R10(r13) INTERRUPT_TO_KERNEL - KVMTEST EXC_STD 0xc00 /* uses r10, branch to do_kvm_0xc00_system_call */ + KVMTEST system_call EXC_STD 0xc00 /* uses r10, branch to system_call_kvm */ mfctr r9 #else mr r9,r13 @@ -1524,7 +1510,7 @@ EXC_VIRT_END(system_call, 0x4c00, 0x100) * ctr = orig r13 * orig r10 saved in PACA */ -TRAMP_KVM_BEGIN(do_kvm_0xc00) +TRAMP_KVM_BEGIN(system_call_kvm) /* * Save the PPR (on systems that support it) before changing to * HMT_MEDIUM. That allows the KVM code to save that value into the @@ -1547,7 +1533,7 @@ EXC_REAL_END(single_step, 0xd00, 0x100) EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100) INT_HANDLER single_step, 0xd00, virt=1 EXC_VIRT_END(single_step, 0x4d00, 0x100) -INT_KVM_HANDLER 0xd00, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER single_step, 0xd00, EXC_STD, PACA_EXGEN, 0 EXC_COMMON(single_step_common, 0xd00, single_step_exception) @@ -1557,7 +1543,7 @@ EXC_REAL_END(h_data_storage, 0xe00, 0x20) EXC_VIRT_BEGIN(h_data_storage, 0x4e00, 0x20) INT_HANDLER h_data_storage, 0xe00, ool=1, virt=1, hsrr=EXC_HV, dar=1, dsisr=1, kvm=1 EXC_VIRT_END(h_data_storage, 0x4e00, 0x20) -INT_KVM_HANDLER 0xe00, EXC_HV, PACA_EXGEN, 1 +INT_KVM_HANDLER h_data_storage, 0xe00, EXC_HV, PACA_EXGEN, 1 EXC_COMMON_BEGIN(h_data_storage_common) INT_COMMON 0xe00, PACA_EXGEN, 1, 1, 1, 1, 1 bl save_nvgprs @@ -1578,7 +1564,7 @@ EXC_REAL_END(h_instr_storage, 0xe20, 0x20) EXC_VIRT_BEGIN(h_instr_storage, 0x4e20, 0x20) INT_HANDLER h_instr_storage, 0xe20, ool=1, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(h_instr_storage, 0x4e20, 0x20) -INT_KVM_HANDLER 0xe20, EXC_HV, PACA_EXGEN, 0 +INT_KVM_HANDLER h_instr_storage, 0xe20, EXC_HV, PACA_EXGEN, 0 EXC_COMMON(h_instr_storage_common, 0xe20, unknown_exception) @@ -1588,7 +1574,7 @@ EXC_REAL_END(emulation_assist, 0xe40, 0x20) EXC_VIRT_BEGIN(emulation_assist, 0x4e40, 0x20) INT_HANDLER emulation_assist, 0xe40, ool=1, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(emulation_assist, 0x4e40, 0x20) -INT_KVM_HANDLER 0xe40, EXC_HV, PACA_EXGEN, 0 +INT_KVM_HANDLER emulation_assist, 0xe40, EXC_HV, PACA_EXGEN, 0 EXC_COMMON(emulation_assist_common, 0xe40, emulation_assist_interrupt) @@ -1601,7 +1587,7 @@ EXC_REAL_BEGIN(hmi_exception, 0xe60, 0x20) INT_HANDLER hmi_exception, 0xe60, ool=1, early=1, hsrr=EXC_HV, ri=0, kvm=1 EXC_REAL_END(hmi_exception, 0xe60, 0x20) EXC_VIRT_NONE(0x4e60, 
0x20) -INT_KVM_HANDLER 0xe60, EXC_HV, PACA_EXGEN, 0 +INT_KVM_HANDLER hmi_exception, 0xe60, EXC_HV, PACA_EXGEN, 0 EXC_COMMON_BEGIN(hmi_exception_early_common) mtctr r10 /* Restore ctr */ mfspr r11,SPRN_HSRR0 /* Save HSRR0 */ @@ -1645,7 +1631,7 @@ EXC_REAL_END(h_doorbell, 0xe80, 0x20) EXC_VIRT_BEGIN(h_doorbell, 0x4e80, 0x20) INT_HANDLER h_doorbell, 0xe80, ool=1, virt=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1 EXC_VIRT_END(h_doorbell, 0x4e80, 0x20) -INT_KVM_HANDLER 0xe80, EXC_HV, PACA_EXGEN, 0 +INT_KVM_HANDLER h_doorbell, 0xe80, EXC_HV, PACA_EXGEN, 0 #ifdef CONFIG_PPC_DOORBELL EXC_COMMON_ASYNC(h_doorbell_common, 0xe80, doorbell_exception) #else @@ -1659,7 +1645,7 @@ EXC_REAL_END(h_virt_irq, 0xea0, 0x20) EXC_VIRT_BEGIN(h_virt_irq, 0x4ea0, 0x20) INT_HANDLER h_virt_irq, 0xea0, ool=1, virt=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1 EXC_VIRT_END(h_virt_irq, 0x4ea0, 0x20) -INT_KVM_HANDLER 0xea0, EXC_HV, PACA_EXGEN, 0 +INT_KVM_HANDLER h_virt_irq, 0xea0, EXC_HV, PACA_EXGEN, 0 EXC_COMMON_ASYNC(h_virt_irq_common, 0xea0, do_IRQ) @@ -1675,7 +1661,7 @@ EXC_REAL_END(performance_monitor, 0xf00, 0x20) EXC_VIRT_BEGIN(performance_monitor, 0x4f00, 0x20) INT_HANDLER performance_monitor, 0xf00, ool=1, virt=1, bitmask=IRQS_PMI_DISABLED EXC_VIRT_END(performance_monitor, 0x4f00, 0x20) -INT_KVM_HANDLER 0xf00, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER performance_monitor, 0xf00, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_ASYNC(performance_monitor_common, 0xf00, performance_monitor_exception) @@ -1685,7 +1671,7 @@ EXC_REAL_END(altivec_unavailable, 0xf20, 0x20) EXC_VIRT_BEGIN(altivec_unavailable, 0x4f20, 0x20) INT_HANDLER altivec_unavailable, 0xf20, ool=1, virt=1 EXC_VIRT_END(altivec_unavailable, 0x4f20, 0x20) -INT_KVM_HANDLER 0xf20, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER altivec_unavailable, 0xf20, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(altivec_unavailable_common) INT_COMMON 0xf20, PACA_EXGEN, 1, 1, 0, 0, 0 #ifdef CONFIG_ALTIVEC @@ -1726,7 +1712,7 @@ EXC_REAL_END(vsx_unavailable, 0xf40, 0x20) EXC_VIRT_BEGIN(vsx_unavailable, 0x4f40, 0x20) INT_HANDLER vsx_unavailable, 0xf40, ool=1, virt=1 EXC_VIRT_END(vsx_unavailable, 0x4f40, 0x20) -INT_KVM_HANDLER 0xf40, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER vsx_unavailable, 0xf40, EXC_STD, PACA_EXGEN, 0 EXC_COMMON_BEGIN(vsx_unavailable_common) INT_COMMON 0xf40, PACA_EXGEN, 1, 1, 0, 0, 0 #ifdef CONFIG_VSX @@ -1766,7 +1752,7 @@ EXC_REAL_END(facility_unavailable, 0xf60, 0x20) EXC_VIRT_BEGIN(facility_unavailable, 0x4f60, 0x20) INT_HANDLER facility_unavailable, 0xf60, ool=1, virt=1 EXC_VIRT_END(facility_unavailable, 0x4f60, 0x20) -INT_KVM_HANDLER 0xf60, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER facility_unavailable, 0xf60, EXC_STD, PACA_EXGEN, 0 EXC_COMMON(facility_unavailable_common, 0xf60, facility_unavailable_exception) @@ -1776,7 +1762,7 @@ EXC_REAL_END(h_facility_unavailable, 0xf80, 0x20) EXC_VIRT_BEGIN(h_facility_unavailable, 0x4f80, 0x20) INT_HANDLER h_facility_unavailable, 0xf80, ool=1, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(h_facility_unavailable, 0x4f80, 0x20) -INT_KVM_HANDLER 0xf80, EXC_HV, PACA_EXGEN, 0 +INT_KVM_HANDLER h_facility_unavailable, 0xf80, EXC_HV, PACA_EXGEN, 0 EXC_COMMON(h_facility_unavailable_common, 0xf80, facility_unavailable_exception) @@ -1797,7 +1783,7 @@ EXC_REAL_BEGIN(cbe_system_error, 0x1200, 0x100) INT_HANDLER cbe_system_error, 0x1200, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(cbe_system_error, 0x1200, 0x100) EXC_VIRT_NONE(0x5200, 0x100) -INT_KVM_HANDLER 0x1200, EXC_HV, PACA_EXGEN, 1 +INT_KVM_HANDLER cbe_system_error, 0x1200, EXC_HV, PACA_EXGEN, 1 
EXC_COMMON(cbe_system_error_common, 0x1200, cbe_system_error_exception) #else /* CONFIG_CBE_RAS */ EXC_REAL_NONE(0x1200, 0x100) @@ -1811,7 +1797,7 @@ EXC_REAL_END(instruction_breakpoint, 0x1300, 0x100) EXC_VIRT_BEGIN(instruction_breakpoint, 0x5300, 0x100) INT_HANDLER instruction_breakpoint, 0x1300, virt=1 EXC_VIRT_END(instruction_breakpoint, 0x5300, 0x100) -INT_KVM_HANDLER 0x1300, EXC_STD, PACA_EXGEN, 1 +INT_KVM_HANDLER instruction_breakpoint, 0x1300, EXC_STD, PACA_EXGEN, 1 EXC_COMMON(instruction_breakpoint_common, 0x1300, instruction_breakpoint_exception) @@ -1825,7 +1811,7 @@ EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100) andis. r10,r10,(HSRR1_DENORM)@h /* denorm? */ bne+ denorm_assist #endif - KVMTEST EXC_HV 0x1500 + KVMTEST denorm_exception_hv, EXC_HV 0x1500 INT_SAVE_SRR_AND_JUMP denorm_common, EXC_HV, 1 EXC_REAL_END(denorm_exception_hv, 0x1500, 0x100) @@ -1841,7 +1827,7 @@ EXC_VIRT_END(denorm_exception, 0x5500, 0x100) EXC_VIRT_NONE(0x5500, 0x100) #endif -INT_KVM_HANDLER 0x1500, EXC_HV, PACA_EXGEN, 0 +INT_KVM_HANDLER denorm_exception_hv, 0x1500, EXC_HV, PACA_EXGEN, 0 #ifdef CONFIG_PPC_DENORMALISATION TRAMP_REAL_BEGIN(denorm_assist) @@ -1920,7 +1906,7 @@ EXC_REAL_BEGIN(cbe_maintenance, 0x1600, 0x100) INT_HANDLER cbe_maintenance, 0x1600, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(cbe_maintenance, 0x1600, 0x100) EXC_VIRT_NONE(0x5600, 0x100) -INT_KVM_HANDLER 0x1600, EXC_HV, PACA_EXGEN, 1 +INT_KVM_HANDLER cbe_maintenance, 0x1600, EXC_HV, PACA_EXGEN, 1 EXC_COMMON(cbe_maintenance_common, 0x1600, cbe_maintenance_exception) #else /* CONFIG_CBE_RAS */ EXC_REAL_NONE(0x1600, 0x100) @@ -1934,7 +1920,7 @@ EXC_REAL_END(altivec_assist, 0x1700, 0x100) EXC_VIRT_BEGIN(altivec_assist, 0x5700, 0x100) INT_HANDLER altivec_assist, 0x1700, virt=1 EXC_VIRT_END(altivec_assist, 0x5700, 0x100) -INT_KVM_HANDLER 0x1700, EXC_STD, PACA_EXGEN, 0 +INT_KVM_HANDLER altivec_assist, 0x1700, EXC_STD, PACA_EXGEN, 0 #ifdef CONFIG_ALTIVEC EXC_COMMON(altivec_assist_common, 0x1700, altivec_assist_exception) #else @@ -1947,7 +1933,7 @@ EXC_REAL_BEGIN(cbe_thermal, 0x1800, 0x100) INT_HANDLER cbe_thermal, 0x1800, ool=1, hsrr=EXC_HV, kvm=1 EXC_REAL_END(cbe_thermal, 0x1800, 0x100) EXC_VIRT_NONE(0x5800, 0x100) -INT_KVM_HANDLER 0x1800, EXC_HV, PACA_EXGEN, 1 +INT_KVM_HANDLER cbe_thermal, 0x1800, EXC_HV, PACA_EXGEN, 1 EXC_COMMON(cbe_thermal_common, 0x1800, cbe_thermal_exception) #else /* CONFIG_CBE_RAS */ EXC_REAL_NONE(0x1800, 0x100) -- cgit v1.2.3 From 9b123d1ea23701bc00ebf712f1e03b25b8195eeb Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 2 Aug 2019 20:57:01 +1000 Subject: powerpc/64s/exception: reduce page fault unnecessary loads This avoids 3 loads in the radix page fault case, 1 load in the hash fault case, and 2 loads in the hash miss page fault case. 
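For reference, the register convention the rewritten do_hash_page hands to __hash_page corresponds to a C prototype roughly like the sketch below. This is inferred from the register comment in the diff that follows, not taken from a kernel header, so treat the parameter names as illustrative:

  /*
   * Sketch of the calling convention after this patch, per the
   * do_hash_page comment: r3 = trap number, r4 = faulting address,
   * r5 = DSISR, r6 = MSR. Returns 0 on success, 1 for a page fault,
   * negative on error.
   */
  long __hash_page(unsigned long trap, unsigned long ea,
                   unsigned long dsisr, unsigned long msr);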
Signed-off-by: Nicholas Piggin
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190802105709.27696-37-npiggin@gmail.com
---
 arch/powerpc/kernel/exceptions-64s.S | 38 ++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f54a38417904..d0018dd17e0a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1159,11 +1159,11 @@ EXC_COMMON_BEGIN(data_access_common)
         * EX_DAR and EX_DSISR have saved DAR/DSISR
         */
        INT_COMMON 0x300, PACA_EXGEN, 1, 1, 1, 1, 1
-       ld      r12,_MSR(r1)
-       ld      r3,_DAR(r1)
-       ld      r4,_DSISR(r1)
-       li      r5,0x300
+       ld      r4,_DAR(r1)
+       ld      r5,_DSISR(r1)
 BEGIN_MMU_FTR_SECTION
+       ld      r6,_MSR(r1)
+       li      r3,0x300
        b       do_hash_page            /* Try to handle as hpte fault */
 MMU_FTR_SECTION_ELSE
        b       handle_page_fault
@@ -1211,11 +1211,11 @@ EXC_VIRT_END(instruction_access, 0x4400, 0x80)
 INT_KVM_HANDLER instruction_access, 0x400, EXC_STD, PACA_EXGEN, 0
 EXC_COMMON_BEGIN(instruction_access_common)
        INT_COMMON 0x400, PACA_EXGEN, 1, 1, 1, 2, 2
-       ld      r12,_MSR(r1)
-       ld      r3,_DAR(r1)
-       ld      r4,_DSISR(r1)
-       li      r5,0x400
+       ld      r4,_DAR(r1)
+       ld      r5,_DSISR(r1)
 BEGIN_MMU_FTR_SECTION
+       ld      r6,_MSR(r1)
+       li      r3,0x400
        b       do_hash_page            /* Try to handle as hpte fault */
 MMU_FTR_SECTION_ELSE
        b       handle_page_fault
@@ -2260,7 +2260,7 @@ do_hash_page:
 #ifdef CONFIG_PPC_BOOK3S_64
        lis     r0,(DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)@h
        ori     r0,r0,DSISR_BAD_FAULT_64S@l
-       and.    r0,r4,r0                /* weird error? */
+       and.    r0,r5,r0                /* weird error? */
        bne-    handle_page_fault       /* if not, try to insert a HPTE */
        ld      r11, PACA_THREAD_INFO(r13)
        lwz     r0,TI_PREEMPT(r11)      /* If we're in an "NMI" */
@@ -2268,15 +2268,13 @@ do_hash_page:
        bne     77f                     /* then don't call hash_page now */

        /*
-        * r3 contains the faulting address
-        * r4 msr
-        * r5 contains the trap number
-        * r6 contains dsisr
+        * r3 contains the trap number
+        * r4 contains the faulting address
+        * r5 contains dsisr
+        * r6 msr
         *
         * at return r3 = 0 for success, 1 for page fault, negative for error
         */
-       mr      r4,r12
-       ld      r6,_DSISR(r1)
        bl      __hash_page             /* build HPTE if possible */
        cmpdi   r3,0                    /* see if __hash_page succeeded */

@@ -2286,16 +2284,15 @@ do_hash_page:
        /* Error */
        blt-    13f

-       /* Reload DSISR into r4 for the DABR check below */
-       ld      r4,_DSISR(r1)
+       /* Reload DAR/DSISR into r4/r5 for the DABR check below */
+       ld      r4,_DAR(r1)
+       ld      r5,_DSISR(r1)
 #endif /* CONFIG_PPC_BOOK3S_64 */

 /* Here we have a page fault that hash_page can't handle. */
 handle_page_fault:
-11:    andis.  r0,r4,DSISR_DABRMATCH@h
+11:    andis.  r0,r5,DSISR_DABRMATCH@h
        bne-    handle_dabr_fault
-       ld      r4,_DAR(r1)
-       ld      r5,_DSISR(r1)
        addi    r3,r1,STACK_FRAME_OVERHEAD
        bl      do_page_fault
        cmpdi   r3,0
@@ -2342,7 +2339,6 @@ handle_dabr_fault:
  * the access, or panic if there isn't a handler.
  */
 77:    bl      save_nvgprs
-       mr      r4,r3
        addi    r3,r1,STACK_FRAME_OVERHEAD
        li      r5,SIGSEGV
        bl      bad_page_fault
-- cgit v1.2.3


From f9f3232a7d0ab73a33d11f4056c5823010f03d55 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig
Date: Tue, 6 Aug 2019 15:01:50 +0300
Subject: dma-mapping: explicitly wire up ->mmap and ->get_sgtable

While the default ->mmap and ->get_sgtable implementations work for the
majority of our dma_map_ops implementations they are inherently unsafe
for others that don't use the page allocator or CMA and/or use their
own way of remapping not covered by the common code.
So remove the defaults if these methods are not wired up, but instead
wire up the default implementations for all safe instances.

Fixes: e1c7e324539a ("dma-mapping: always provide the dma_map_ops based implementation")
Signed-off-by: Christoph Hellwig
---
 arch/powerpc/kernel/dma-iommu.c | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index a0879674a9c8..2f5a53874f6d 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -208,4 +208,6 @@ const struct dma_map_ops dma_iommu_ops = {
        .sync_single_for_device = dma_iommu_sync_for_device,
        .sync_sg_for_cpu        = dma_iommu_sync_sg_for_cpu,
        .sync_sg_for_device     = dma_iommu_sync_sg_for_device,
+       .mmap                   = dma_common_mmap,
+       .get_sgtable            = dma_common_get_sgtable,
 };
-- cgit v1.2.3


From 8205d5d98ef7f155de211f5e2eb6ca03d95a5a60 Mon Sep 17 00:00:00 2001
From: Gustavo Romero
Date: Wed, 4 Sep 2019 00:55:27 -0400
Subject: powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction

When we take an FP unavailable exception in a transaction we have to
account for the hardware FP TM checkpointed registers being incorrect.
In this case for this process we know the current and checkpointed FP
registers must be the same (since FP wasn't used inside the transaction)
hence in the thread_struct we copy the current FP registers to the
checkpointed ones.

This copy is done in tm_reclaim_thread(). We use thread->ckpt_regs.msr
to determine if FP was on when in userspace. thread->ckpt_regs.msr
represents the state of the MSR when exiting userspace. This is setup
by check_if_tm_restore_required().

Unfortunately there is an optimisation in giveup_all() which returns
early if tsk->thread.regs->msr (via local variable `usermsr`) has
FP=VEC=VSX=SPE=0. This optimisation means that
check_if_tm_restore_required() is not called and hence
thread->ckpt_regs.msr is not updated and will contain an old value.

This can happen if due to load_fp=255 we start a userspace process with
MSR FP=1 and then we are context switched out. In this case
thread->ckpt_regs.msr will contain FP=1. If that same process is then
context switched in and load_fp overflows, MSR will have FP=0. If that
process now enters a transaction and does an FP instruction, the FP
unavailable will not update thread->ckpt_regs.msr (the bug) and MSR FP=1
will be retained in thread->ckpt_regs.msr. tm_reclaim_thread() will then
not perform the required memcpy and the checkpointed FP regs in the
thread struct will contain the wrong values.

The code path for this happening is:

  Userspace:                    Kernel
                                Start userspace
                                 with MSR FP/VEC/VSX/SPE=0 TM=1
                                   < -----
  ...
  tbegin
  bne
  fp instruction
                FP unavailable
                   ---- >
                                fp_unavailable_tm()
                                  tm_reclaim_current()
                                    tm_reclaim_thread()
                                      giveup_all()
                                        return early since FP/VMX/VSX=0
                                        /* ckpt MSR not updated (Incorrect) */
                                      tm_reclaim()
                                        /* thread_struct ckpt FP regs contain junk (OK) */
                                        /* Sees ckpt MSR FP=1 (Incorrect) */
                                        no memcpy() performed
                                        /* thread_struct ckpt FP regs not fixed (Incorrect) */
                                    tm_recheckpoint()
                                      /* Put junk in hardware checkpoint FP regs */
  ....
                                   < -----
  Return to userspace
   with MSR TM=1 FP=1
   with junk in the FP TM checkpoint

  TM rollback
  reads FP junk

This is a data integrity problem for the current process as the FP
registers are corrupted. It's also a security problem as the FP
registers from one process may be leaked to another.

This patch moves up check_if_tm_restore_required() in giveup_all() to
ensure thread->ckpt_regs.msr is updated correctly.
A simple testcase to replicate this will be posted to
tools/testing/selftests/powerpc/tm/tm-poison.c

Similarly for VMX.

This fixes CVE-2019-15030.

Fixes: f48e91e87e67 ("powerpc/tm: Fix FP and VMX register corruption")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Gustavo Romero
Signed-off-by: Michael Neuling
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190904045529.23002-1-gromero@linux.vnet.ibm.com
---
 arch/powerpc/kernel/process.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 8fc4de0d22b4..437b57068cf8 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -497,13 +497,14 @@ void giveup_all(struct task_struct *tsk)
        if (!tsk->thread.regs)
                return;

+       check_if_tm_restore_required(tsk);
+
        usermsr = tsk->thread.regs->msr;

        if ((usermsr & msr_all_available) == 0)
                return;

        msr_check_and_set(msr_all_available);
-       check_if_tm_restore_required(tsk);

        WARN_ON((usermsr & MSR_VSX) &&
                !((usermsr & MSR_FP) && (usermsr & MSR_VEC)));
-- cgit v1.2.3


From a8318c13e79badb92bc6640704a64cc022a6eb97 Mon Sep 17 00:00:00 2001
From: Gustavo Romero
Date: Wed, 4 Sep 2019 00:55:28 -0400
Subject: powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts

When in userspace and MSR FP=0 the hardware FP state is unrelated to
the current process. This is extended for transactions where if tbegin
is run with FP=0, the hardware checkpoint FP state will also be
unrelated to the current process. Due to this, we need to ensure this
hardware checkpoint is updated with the correct state before we enable
FP for this process.

Unfortunately we get this wrong when returning to a process from a
hardware interrupt. A process that starts a transaction with FP=0 can
take an interrupt. When the kernel returns back to that process, we
change to FP=1 but with hardware checkpoint FP state not updated. If
this transaction is then rolled back, the FP registers now contain the
wrong state.

The process looks like this:

  Userspace:                    Kernel
                                Start userspace
                                 with MSR FP=0 TM=1
                                   < -----
  ...
  tbegin
  bne
  Hardware interrupt
                   ---- >
                                ....
                                ret_from_except
                                  restore_math()
                                    /* sees FP=0 */
                                    restore_fp()
                                      tm_active_with_fp()
                                        /* sees FP=1 (Incorrect) */
                                      load_fp_state()
                                    FP = 0 -> 1
                                   < -----
  Return to userspace
   with MSR TM=1 FP=1
   with junk in the FP TM checkpoint

  TM rollback
  reads FP junk

When returning from the hardware exception, tm_active_with_fp() is
incorrectly making restore_fp() call load_fp_state() which is setting
FP=1.

The fix is to remove tm_active_with_fp().

tm_active_with_fp() is attempting to handle the case where FP state has
been changed inside a transaction. In this case the checkpointed and
transactional FP state is different and hence we must restore the FP
state (ie. we can't do lazy FP restore inside a transaction that's used
FP). It's safe to remove tm_active_with_fp() as this case is handled by
restore_tm_state(). restore_tm_state() detects if FP has been used
inside a transaction and will set load_fp and call restore_math() to
ensure the FP state (checkpoint and transaction) is restored.

This is a data integrity problem for the current process as the FP
registers are corrupted. It's also a security problem as the FP
registers from one process may be leaked to another.

Similarly for VMX.

A simple testcase to replicate this will be posted to
tools/testing/selftests/powerpc/tm/tm-poison.c

This fixes CVE-2019-15031.
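With tm_active_with_fp() gone, the lazy-restore decision reduces to the load_fp counter alone. A minimal sketch of the resulting restore_fp(), mirroring the diff below:

  /*
   * Sketch of restore_fp() after this patch: the reload decision is
   * based purely on the lazy-restore counter, since the transactional
   * case is now handled by restore_tm_state().
   */
  static int restore_fp(struct task_struct *tsk)
  {
          if (tsk->thread.load_fp) {
                  load_fp_state(&current->thread.fp_state);
                  current->thread.load_fp++;
                  return 1;       /* caller should set MSR_FP */
          }
          return 0;
  }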
Fixes: a7771176b439 ("powerpc: Don't enable FP/Altivec if not checkpointed") Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Gustavo Romero Signed-off-by: Michael Neuling Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190904045529.23002-2-gromero@linux.vnet.ibm.com --- arch/powerpc/kernel/process.c | 18 ++---------------- 1 file changed, 2 insertions(+), 16 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 437b57068cf8..7a84c9f1778e 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -101,21 +101,8 @@ static void check_if_tm_restore_required(struct task_struct *tsk) } } -static bool tm_active_with_fp(struct task_struct *tsk) -{ - return MSR_TM_ACTIVE(tsk->thread.regs->msr) && - (tsk->thread.ckpt_regs.msr & MSR_FP); -} - -static bool tm_active_with_altivec(struct task_struct *tsk) -{ - return MSR_TM_ACTIVE(tsk->thread.regs->msr) && - (tsk->thread.ckpt_regs.msr & MSR_VEC); -} #else static inline void check_if_tm_restore_required(struct task_struct *tsk) { } -static inline bool tm_active_with_fp(struct task_struct *tsk) { return false; } -static inline bool tm_active_with_altivec(struct task_struct *tsk) { return false; } #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ bool strict_msr_control; @@ -252,7 +239,7 @@ EXPORT_SYMBOL(enable_kernel_fp); static int restore_fp(struct task_struct *tsk) { - if (tsk->thread.load_fp || tm_active_with_fp(tsk)) { + if (tsk->thread.load_fp) { load_fp_state(¤t->thread.fp_state); current->thread.load_fp++; return 1; @@ -334,8 +321,7 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread); static int restore_altivec(struct task_struct *tsk) { - if (cpu_has_feature(CPU_FTR_ALTIVEC) && - (tsk->thread.load_vec || tm_active_with_altivec(tsk))) { + if (cpu_has_feature(CPU_FTR_ALTIVEC) && (tsk->thread.load_vec)) { load_vr_state(&tsk->thread.vr_state); tsk->thread.used_vr = 1; tsk->thread.load_vec++; -- cgit v1.2.3 From 858805b336be1cabb3d9033adaa3676574d12e37 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Sun, 25 Aug 2019 22:28:37 +0900 Subject: kbuild: add $(BASH) to run scripts with bash-extension CONFIG_SHELL falls back to sh when bash is not installed on the system, but nobody is testing such a case since bash is usually installed. So, shell scripts invoked by CONFIG_SHELL are only tested with bash. It makes it difficult to test whether the hashbang #!/bin/sh is real. For example, #!/bin/sh in arch/powerpc/kernel/prom_init_check.sh is false. (I fixed it up) Besides, some shell scripts invoked by CONFIG_SHELL use bash-extension and #!/bin/bash is specified as the hashbang, while CONFIG_SHELL may not always be set to bash. Probably, the right thing to do is to introduce BASH, which is bash by default, and always set CONFIG_SHELL to sh. Replace $(CONFIG_SHELL) with $(BASH) for bash scripts. If somebody tries to add bash-extension to a #!/bin/sh script, it will be caught in testing because /bin/sh is a symlink to dash on some major distributions. 
Signed-off-by: Masahiro Yamada
---
 arch/powerpc/kernel/prom_init_check.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh
index 160bef0d553d..78bab17b1396 100644
--- a/arch/powerpc/kernel/prom_init_check.sh
+++ b/arch/powerpc/kernel/prom_init_check.sh
@@ -33,7 +33,7 @@ OBJ="$2"

 ERROR=0

-function check_section()
+check_section()
 {
        file=$1
        section=$2
-- cgit v1.2.3


From 799abe283e5103d48e079149579b4f167c95ea0e Mon Sep 17 00:00:00 2001
From: Oliver O'Halloran
Date: Tue, 3 Sep 2019 20:15:52 +1000
Subject: powerpc/eeh: Clean up EEH PEs after recovery finishes

When the last device in an eeh_pe is removed the eeh_pe structure itself
(and any empty parents) are freed since they are no longer needed. This
results in a crash when a hotplug driver is involved since the following
may occur:

1. Device is surprise removed.
2. Driver performs an MMIO, which fails and queues an eeh_event.
3. Hotplug driver receives a hotplug interrupt and removes any
   pci_devs that were under the slot.
4. pci_dev is torn down and the eeh_pe is freed.
5. The EEH event handler thread processes the eeh_event and crashes
   since the eeh_pe pointer in the eeh_event structure is no longer
   valid.

Crashing is generally considered poor form. Instead of doing that use
the fact PEs are marked as EEH_PE_INVALID to keep them around until the
end of the recovery cycle, at which point we can safely prune any empty
PEs.

Signed-off-by: Oliver O'Halloran
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190903101605.2890-2-oohall@gmail.com
---
 arch/powerpc/kernel/eeh_driver.c | 36 ++++++++++++++++++++++++++++++++++--
 arch/powerpc/kernel/eeh_event.c  |  8 ++++++++
 arch/powerpc/kernel/eeh_pe.c     | 23 ++++++++++++++++++++++-
 3 files changed, 64 insertions(+), 3 deletions(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index a31cd32c4ce9..75266156943f 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -734,6 +734,33 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
  */
 #define MAX_WAIT_FOR_RECOVERY 300

+
+/* Walks the PE tree after processing an event to remove any stale PEs.
+ *
+ * NB: This needs to be recursive to ensure the leaf PEs get removed
+ * before their parents do. Although it would be possible to do this
+ * without recursion, we don't, since this is easier to read and we
+ * need to guarantee the leaf nodes will be handled first.
+ */
+static void eeh_pe_cleanup(struct eeh_pe *pe)
+{
+       struct eeh_pe *child_pe, *tmp;
+
+       list_for_each_entry_safe(child_pe, tmp, &pe->child_list, child)
+               eeh_pe_cleanup(child_pe);
+
+       if (pe->state & EEH_PE_KEEP)
+               return;
+
+       if (!(pe->state & EEH_PE_INVALID))
+               return;
+
+       if (list_empty(&pe->edevs) && list_empty(&pe->child_list)) {
+               list_del(&pe->child);
+               kfree(pe);
+       }
+}
+
 /**
  * eeh_handle_normal_event - Handle EEH events on a specific PE
  * @pe: EEH PE - which should not be used after we return, as it may
@@ -772,8 +799,6 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
                return;
        }

-       eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
-
        eeh_pe_update_time_stamp(pe);
        pe->freeze_count++;
        if (pe->freeze_count > eeh_max_freezes) {
@@ -963,6 +988,12 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
                        return;
                }
        }
+
+       /*
+        * Clean up any PEs without devices. While marked as EEH_PE_RECOVERING
+        * we don't want to modify the PE tree structure so we do it here.
+        */
+       eeh_pe_cleanup(pe);

        eeh_pe_state_clear(pe, EEH_PE_RECOVERING, true);
 }

@@ -1035,6 +1066,7 @@ void eeh_handle_special_event(void)
                 */
                if (rc == EEH_NEXT_ERR_FROZEN_PE ||
                    rc == EEH_NEXT_ERR_FENCED_PHB) {
+                       eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
                        eeh_handle_normal_event(pe);
                } else {
                        pci_lock_rescan_remove();
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index 64cfbe41174b..e36653e5f76b 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -121,6 +121,14 @@ int __eeh_send_failure_event(struct eeh_pe *pe)
        }
        event->pe = pe;

+       /*
+        * Mark the PE as recovering before inserting it in the queue.
+        * This prevents the PE from being free()ed by a hotplug driver
+        * while the PE is sitting in the event queue.
+        */
+       if (pe)
+               eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
+
        /* We may or may not be called in an interrupt context */
        spin_lock_irqsave(&eeh_eventlist_lock, flags);
        list_add(&event->list, &eeh_eventlist);
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 1a6254bcf056..177852e39a25 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -470,6 +470,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev)
 {
        struct eeh_pe *pe, *parent, *child;
+       bool keep, recover;
        int cnt;

        pe = eeh_dev_to_pe(edev);
@@ -490,10 +491,21 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev)
         */
        while (1) {
                parent = pe->parent;
+
+               /* PHB PEs should never be removed */
                if (pe->type & EEH_PE_PHB)
                        break;

-               if (!(pe->state & EEH_PE_KEEP)) {
+               /*
+                * XXX: KEEP is set while resetting a PE. I don't think it's
+                * ever set without RECOVERING also being set. I could
+                * be wrong though so catch that with a WARN.
+                */
+               keep = !!(pe->state & EEH_PE_KEEP);
+               recover = !!(pe->state & EEH_PE_RECOVERING);
+               WARN_ON(keep && !recover);
+
+               if (!keep && !recover) {
                        if (list_empty(&pe->edevs) &&
                            list_empty(&pe->child_list)) {
                                list_del(&pe->child);
@@ -502,6 +514,15 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev)
                                break;
                        }
                } else {
+                       /*
+                        * Mark the PE as invalid. At the end of the recovery
+                        * process any invalid PEs will be garbage collected.
+                        *
+                        * We need to delay the free()ing of them since we can
+                        * remove edev's while traversing the PE tree which
+                        * might trigger the removal of a PE and we can't
+                        * deal with that (yet).
+                        */
                        if (list_empty(&pe->edevs)) {
                                cnt = 0;
                                list_for_each_entry(child, &pe->child_list, child) {
-- cgit v1.2.3


From 5ef753ae435a5cea8af5c84a65fc5dd30b773040 Mon Sep 17 00:00:00 2001
From: Oliver O'Halloran
Date: Tue, 3 Sep 2019 20:15:53 +1000
Subject: powerpc/eeh: Fix race when freeing PDNs

When hot-adding devices we rely on the hotplug driver to create pci_dn's
for the devices under the hotplug slot. Conversely, when hot-removing the
driver will remove the pci_dn's that it created.

This is a problem because the pci_dev is still live until its refcount
drops to zero. This can happen if the driver is slow to tear down its
internal state. Ideally, the driver would not attempt to perform any
config accesses to the device once it's been marked as removed, but
sometimes it happens. As a result, we might attempt to access the pci_dn
for a device that has been torn down and the kernel may crash as a
result.

To fix this, don't free the pci_dn unless the corresponding pci_dev has
been released. If the pci_dev is still live, then we mark the pci_dn with
a flag that indicates the pci_dev's release function should free it.
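As a pattern, the two halves of that handshake look roughly like the sketch below; the real code is in the diff that follows, and the example_* helpers here are hypothetical stand-ins:

  /*
   * Condensed sketch of the deferred-free handshake. Teardown flags
   * the pci_dn instead of freeing it while a pci_dev is still live;
   * the pci_dev's release hook then performs the deferred kfree().
   */
  static void example_remove_pdn(struct pci_dn *pdn, struct pci_dev *pdev)
  {
          if (pdev)
                  pdn->flags |= PCI_DN_FLAG_DEAD; /* defer to release path */
          else
                  kfree(pdn);                     /* no user left, free now */
  }

  static void example_release(struct pci_dev *dev, struct pci_dn *pdn)
  {
          /* free()ing the pci_dn has been deferred to us, do it now */
          if (pdn && (pdn->flags & PCI_DN_FLAG_DEAD))
                  kfree(pdn);
  }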
Signed-off-by: Oliver O'Halloran
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190903101605.2890-3-oohall@gmail.com
---
 arch/powerpc/kernel/pci-hotplug.c |  7 +++++++
 arch/powerpc/kernel/pci_dn.c      | 21 +++++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 0b0cf8168b47..fc62c4bc47b1 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -55,11 +55,18 @@ EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
 void pcibios_release_device(struct pci_dev *dev)
 {
        struct pci_controller *phb = pci_bus_to_host(dev->bus);
+       struct pci_dn *pdn = pci_get_pdn(dev);

        eeh_remove_device(dev);

        if (phb->controller_ops.release_device)
                phb->controller_ops.release_device(dev);
+
+       /* free()ing the pci_dn has been deferred to us, do it now */
+       if (pdn && (pdn->flags & PCI_DN_FLAG_DEAD)) {
+               pci_dbg(dev, "freeing dead pdn\n");
+               kfree(pdn);
+       }
 }

 /**
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index c4c8c237a106..9524009ca1ae 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -323,6 +323,7 @@ void pci_remove_device_node_info(struct device_node *dn)
 {
        struct pci_dn *pdn = dn ? PCI_DN(dn) : NULL;
        struct device_node *parent;
+       struct pci_dev *pdev;
 #ifdef CONFIG_EEH
        struct eeh_dev *edev = pdn_to_eeh_dev(pdn);

@@ -336,12 +337,28 @@ void pci_remove_device_node_info(struct device_node *dn)
        WARN_ON(!list_empty(&pdn->child_list));
        list_del(&pdn->list);

+       /* Drop the parent pci_dn's ref to our backing dt node */
        parent = of_get_parent(dn);
        if (parent)
                of_node_put(parent);

-       dn->data = NULL;
-       kfree(pdn);
+       /*
+        * At this point we *might* still have a pci_dev that was
+        * instantiated from this pci_dn. So defer free()ing it until
+        * the pci_dev's release function is called.
+        */
+       pdev = pci_get_domain_bus_and_slot(pdn->phb->global_number,
+                       pdn->busno, pdn->devfn);
+       if (pdev) {
+               /* NB: pdev has a ref to dn */
+               pci_dbg(pdev, "marked pdn (from %pOF) as dead\n", dn);
+               pdn->flags |= PCI_DN_FLAG_DEAD;
+       } else {
+               dn->data = NULL;
+               kfree(pdn);
+       }
+
+       pci_dev_put(pdev);
 }
 EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
-- cgit v1.2.3


From 38ddc011478e573c9ab4e3e9bc54cc5bfc542351 Mon Sep 17 00:00:00 2001
From: Oliver O'Halloran
Date: Tue, 3 Sep 2019 20:15:54 +1000
Subject: powerpc/eeh: Make permanently failed devices non-actionable

If a device is torn down by a hotplug slot driver it's marked as removed
and marked as permanently failed. There's no point in trying to recover
a permanently failed device so it should be considered un-actionable.
Signed-off-by: Oliver O'Halloran
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190903101605.2890-4-oohall@gmail.com
---
 arch/powerpc/kernel/eeh_driver.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 75266156943f..18a69fac4d80 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -96,8 +96,16 @@ static bool eeh_dev_removed(struct eeh_dev *edev)

 static bool eeh_edev_actionable(struct eeh_dev *edev)
 {
-       return (edev->pdev && !eeh_dev_removed(edev) &&
-               !eeh_pe_passed(edev->pe));
+       if (!edev->pdev)
+               return false;
+       if (edev->pdev->error_state == pci_channel_io_perm_failure)
+               return false;
+       if (eeh_dev_removed(edev))
+               return false;
+       if (eeh_pe_passed(edev->pe))
+               return false;
+
+       return true;
 }

 /**
-- cgit v1.2.3


From b104af5a7687060792ca398bb86b033057afce75 Mon Sep 17 00:00:00 2001
From: Oliver O'Halloran
Date: Tue, 3 Sep 2019 20:15:55 +1000
Subject: powerpc/eeh: Check slot presence state in eeh_handle_normal_event()

When a device is surprise removed while undergoing IO we will probably
get an EEH PE freeze due to MMIO timeouts and other errors. When a freeze
is detected we send a recovery event to the EEH worker thread which will
notify drivers, and perform recovery as needed.

In the event of a hot-remove we don't want recovery to occur since there
isn't a device to recover. The recovery process is fairly long due to
the number of wait states (required by PCIe) which causes problems when
devices are removed and replaced (e.g. hot swapping of U.2 NVMe drives).

To determine if we need to skip the recovery process we can use the
get_adapter_status() operation of the hotplug_slot to determine if the
slot contains a device or not, and if the slot is empty we can skip
recovery entirely.

One thing to note is that the slot being EEH frozen does not prevent the
hotplug driver from working. We don't have the EEH recovery thread
remove any of the devices since it's assumed that the hotplug driver
will handle tearing down the slot state.

Signed-off-by: Oliver O'Halloran
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190903101605.2890-5-oohall@gmail.com
---
 arch/powerpc/kernel/eeh_driver.c | 60 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 18a69fac4d80..52ce7584af43 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -769,6 +770,46 @@ static void eeh_pe_cleanup(struct eeh_pe *pe)
        }
 }

+/**
+ * eeh_slot_presence_check - Check if a device is still present in a slot
+ * @pdev: pci_dev to check
+ *
+ * This function may return a false positive if we can't determine the slot's
+ * presence state. This might happen for PCIe slots if the PE containing
+ * the upstream bridge is also frozen, or the bridge is part of the same PE
+ * as the device.
+ *
+ * This shouldn't happen often, but you might see it if you hotplug a PCIe
+ * switch.
+ */
+static bool eeh_slot_presence_check(struct pci_dev *pdev)
+{
+       const struct hotplug_slot_ops *ops;
+       struct pci_slot *slot;
+       u8 state;
+       int rc;
+
+       if (!pdev)
+               return false;
+
+       if (pdev->error_state == pci_channel_io_perm_failure)
+               return false;
+
+       slot = pdev->slot;
+       if (!slot || !slot->hotplug)
+               return true;
+
+       ops = slot->hotplug->ops;
+       if (!ops || !ops->get_adapter_status)
+               return true;
+
+       rc = ops->get_adapter_status(slot->hotplug, &state);
+       if (rc)
+               return true;
+
+       return !!state;
+}
+
 /**
  * eeh_handle_normal_event - Handle EEH events on a specific PE
  * @pe: EEH PE - which should not be used after we return, as it may
@@ -799,6 +840,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
        enum pci_ers_result result = PCI_ERS_RESULT_NONE;
        struct eeh_rmv_data rmv_data =
                {LIST_HEAD_INIT(rmv_data.removed_vf_list), 0};
+       int devices = 0;

        bus = eeh_pe_bus_get(pe);
        if (!bus) {
@@ -807,6 +849,23 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
                return;
        }

+       /*
+        * When devices are hot-removed we might get an EEH due to
+        * a driver attempting to touch the MMIO space of a removed
+        * device. In this case we don't have a device to recover
+        * so suppress the event if we can't find any present devices.
+        *
+        * The hotplug driver should take care of tearing down the
+        * device itself.
+        */
+       eeh_for_each_pe(pe, tmp_pe)
+               eeh_pe_for_each_dev(tmp_pe, edev, tmp)
+                       if (eeh_slot_presence_check(edev->pdev))
+                               devices++;
+
+       if (!devices)
+               goto out; /* nothing to recover */
+
        eeh_pe_update_time_stamp(pe);
        pe->freeze_count++;
        if (pe->freeze_count > eeh_max_freezes) {
@@ -997,6 +1056,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
                }
        }

+out:
        /*
         * Clean up any PEs without devices. While marked as EEH_PE_RECOVERING
         * we don't want to modify the PE tree structure so we do it here.
-- cgit v1.2.3


From 25baf3d81614b0b8ca8958f4d6f111ccaaaad578 Mon Sep 17 00:00:00 2001
From: Oliver O'Halloran
Date: Tue, 3 Sep 2019 20:15:56 +1000
Subject: powerpc/eeh: Defer printing stack trace

Currently we print a stack trace in the event handler to help with
debugging EEH issues. In the case of surprise hot-unplug this is
unneeded, so we want to prevent printing the stack trace unless we know
it's due to an actual device error. To accomplish this, we can save a
stack trace at the point of detection and only print it once the EEH
recovery handler has determined the freeze was due to an actual error.

Since the whole point of this is to prevent spurious EEH output we also
move a few prints out of the detection thread, or mark them as pr_debug
so anyone interested can get output from eeh_dev_check_failure() if
they want.
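The save/print split uses the generic stack_trace_save() helper; a condensed sketch of the two halves is below, assuming a PE with a stack_trace[] buffer and a trace_entries count as in the diffs that follow (the example_* names are hypothetical):

  #include <linux/stacktrace.h>

  /* Detection side: record the trace, defer the printing. */
  static void example_save_trace(struct eeh_pe *pe)
  {
          pe->trace_entries = stack_trace_save(pe->stack_trace,
                                               ARRAY_SIZE(pe->stack_trace), 0);
  }

  /* Recovery side: only dump once the freeze is known to be real. */
  static void example_print_trace(struct eeh_pe *pe)
  {
          int i;

          pr_err("EEH: Call Trace:\n");
          for (i = 0; i < pe->trace_entries; i++)
                  pr_err("EEH: [%pK] %pS\n", (void *)pe->stack_trace[i],
                         (void *)pe->stack_trace[i]);
          pe->trace_entries = 0;
  }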
Signed-off-by: Oliver O'Halloran Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190903101605.2890-6-oohall@gmail.com --- arch/powerpc/kernel/eeh.c | 15 ++++----------- arch/powerpc/kernel/eeh_driver.c | 38 +++++++++++++++++++++++++++++++++++++- arch/powerpc/kernel/eeh_event.c | 26 ++++++++++++-------------- 3 files changed, 53 insertions(+), 26 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 7b2755f5c6fd..398def61f8a6 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -420,11 +420,9 @@ static int eeh_phb_check_failure(struct eeh_pe *pe) eeh_pe_mark_isolated(phb_pe); eeh_serialize_unlock(flags); - pr_err("EEH: PHB#%x failure detected, location: %s\n", + pr_debug("EEH: PHB#%x failure detected, location: %s\n", phb_pe->phb->global_number, eeh_pe_loc_get(phb_pe)); - dump_stack(); eeh_send_failure_event(phb_pe); - return 1; out: eeh_serialize_unlock(flags); @@ -451,7 +449,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev) unsigned long flags; struct device_node *dn; struct pci_dev *dev; - struct eeh_pe *pe, *parent_pe, *phb_pe; + struct eeh_pe *pe, *parent_pe; int rc = 0; const char *location = NULL; @@ -581,13 +579,8 @@ int eeh_dev_check_failure(struct eeh_dev *edev) * a stack trace will help the device-driver authors figure * out what happened. So print that out. */ - phb_pe = eeh_phb_pe_get(pe->phb); - pr_err("EEH: Frozen PHB#%x-PE#%x detected\n", - pe->phb->global_number, pe->addr); - pr_err("EEH: PE location: %s, PHB location: %s\n", - eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe)); - dump_stack(); - + pr_debug("EEH: %s: Frozen PHB#%x-PE#%x detected\n", + __func__, pe->phb->global_number, pe->addr); eeh_send_failure_event(pe); return 1; diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 52ce7584af43..0d34cc12c529 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -863,8 +863,44 @@ void eeh_handle_normal_event(struct eeh_pe *pe) if (eeh_slot_presence_check(edev->pdev)) devices++; - if (!devices) + if (!devices) { + pr_debug("EEH: Frozen PHB#%x-PE#%x is empty!\n", + pe->phb->global_number, pe->addr); goto out; /* nothing to recover */ + } + + /* Log the event */ + if (pe->type & EEH_PE_PHB) { + pr_err("EEH: PHB#%x failure detected, location: %s\n", + pe->phb->global_number, eeh_pe_loc_get(pe)); + } else { + struct eeh_pe *phb_pe = eeh_phb_pe_get(pe->phb); + + pr_err("EEH: Frozen PHB#%x-PE#%x detected\n", + pe->phb->global_number, pe->addr); + pr_err("EEH: PE location: %s, PHB location: %s\n", + eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe)); + } + + /* + * Print the saved stack trace now that we've verified there's + * something to recover. 
+ */ + if (pe->trace_entries) { + void **ptrs = (void **) pe->stack_trace; + int i; + + pr_err("EEH: Frozen PHB#%x-PE#%x detected\n", + pe->phb->global_number, pe->addr); + + /* FIXME: Use the same format as dump_stack() */ + pr_err("EEH: Call Trace:\n"); + for (i = 0; i < pe->trace_entries; i++) + pr_err("EEH: [%pK] %pS\n", ptrs[i], ptrs[i]); + + pe->trace_entries = 0; + } + eeh_pe_update_time_stamp(pe); pe->freeze_count++; diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c index e36653e5f76b..1d55486adb0f 100644 --- a/arch/powerpc/kernel/eeh_event.c +++ b/arch/powerpc/kernel/eeh_event.c @@ -40,7 +40,6 @@ static int eeh_event_handler(void * dummy) { unsigned long flags; struct eeh_event *event; - struct eeh_pe *pe; while (!kthread_should_stop()) { if (wait_for_completion_interruptible(&eeh_eventlist_event)) @@ -59,19 +58,10 @@ static int eeh_event_handler(void * dummy) continue; /* We might have event without binding PE */ - pe = event->pe; - if (pe) { - if (pe->type & EEH_PE_PHB) - pr_info("EEH: Detected error on PHB#%x\n", - pe->phb->global_number); - else - pr_info("EEH: Detected PCI bus error on " - "PHB#%x-PE#%x\n", - pe->phb->global_number, pe->addr); - eeh_handle_normal_event(pe); - } else { + if (event->pe) + eeh_handle_normal_event(event->pe); + else eeh_handle_special_event(); - } kfree(event); } @@ -126,8 +116,16 @@ int __eeh_send_failure_event(struct eeh_pe *pe) * This prevents the PE from being free()ed by a hotplug driver * while the PE is sitting in the event queue. */ - if (pe) + if (pe) { + /* + * Save the current stack trace so we can dump it from the + * event handler thread. + */ + pe->trace_entries = stack_trace_save(pe->stack_trace, + ARRAY_SIZE(pe->stack_trace), 0); + eeh_pe_state_mark(pe, EEH_PE_RECOVERING); + } /* We may or may not be called in an interrupt context */ spin_lock_irqsave(&eeh_eventlist_lock, flags); -- cgit v1.2.3 From aeff27c121ba7397c21a47c749e2b5be07f48c17 Mon Sep 17 00:00:00 2001 From: Oliver O'Halloran Date: Tue, 3 Sep 2019 20:16:02 +1000 Subject: powerpc/eeh: Set attention indicator while recovering I am the RAS team. Hear me roar. Roar. On a more serious note, being able to locate failed devices can be helpful. Set the attention indicator if the slot supports it once we've determined the device is present and only clear it if the device is fully recovered. 
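Driving a slot's attention LED goes through the hotplug core's set_attention_status() callback; a minimal sketch of that call pattern is below (the actual set/clear helpers are in the diff that follows; example_set_attention is a hypothetical name):

  /*
   * Sketch: turn a slot's attention indicator on (1) or off (0),
   * assuming the slot's hotplug driver implements the callback.
   */
  static void example_set_attention(struct pci_dev *pdev, u8 on)
  {
          struct pci_slot *slot = pdev->slot;
          const struct hotplug_slot_ops *ops;

          if (!slot || !slot->hotplug)
                  return;

          ops = slot->hotplug->ops;
          if (ops && ops->set_attention_status)
                  ops->set_attention_status(slot->hotplug, on);
  }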
Signed-off-by: Oliver O'Halloran
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190903101605.2890-12-oohall@gmail.com
---
 arch/powerpc/kernel/eeh_driver.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 0d34cc12c529..80bd157fcb45 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -803,6 +803,10 @@ static bool eeh_slot_presence_check(struct pci_dev *pdev)
        if (!ops || !ops->get_adapter_status)
                return true;

+       /* set the attention indicator while we've got the slot ops */
+       if (ops->set_attention_status)
+               ops->set_attention_status(slot->hotplug, 1);
+
        rc = ops->get_adapter_status(slot->hotplug, &state);
        if (rc)
                return true;
@@ -810,6 +814,28 @@ static bool eeh_slot_presence_check(struct pci_dev *pdev)
        return !!state;
 }

+static void eeh_clear_slot_attention(struct pci_dev *pdev)
+{
+       const struct hotplug_slot_ops *ops;
+       struct pci_slot *slot;
+
+       if (!pdev)
+               return;
+
+       if (pdev->error_state == pci_channel_io_perm_failure)
+               return;
+
+       slot = pdev->slot;
+       if (!slot || !slot->hotplug)
+               return;
+
+       ops = slot->hotplug->ops;
+       if (!ops || !ops->set_attention_status)
+               return;
+
+       ops->set_attention_status(slot->hotplug, 0);
+}
+
 /**
  * eeh_handle_normal_event - Handle EEH events on a specific PE
  * @pe: EEH PE - which should not be used after we return, as it may
@@ -1098,6 +1124,12 @@ out:
         * we don't want to modify the PE tree structure so we do it here.
         */
        eeh_pe_cleanup(pe);
+
+       /* clear the slot attention LED for all recovered devices */
+       eeh_for_each_pe(pe, tmp_pe)
+               eeh_pe_for_each_dev(tmp_pe, edev, tmp)
+                       eeh_clear_slot_attention(edev->pdev);
+
        eeh_pe_state_clear(pe, EEH_PE_RECOVERING, true);
 }

-- cgit v1.2.3


From 22cda7c1680c1ddfe941adae45e7e7ef52d0e411 Mon Sep 17 00:00:00 2001
From: Oliver O'Halloran
Date: Tue, 3 Sep 2019 20:16:03 +1000
Subject: powerpc/eeh: Add debugfs interface to run an EEH check

Detecting a frozen EEH PE usually occurs when an MMIO load returns a
0xFFs response. When performing EEH testing using the EEH error
injection feature available on some platforms there is no simple way to
kick-off the kernel's recovery process since any accesses from userspace
(usually /dev/mem) will bypass the MMIO helpers in the kernel which
check if a 0xFF response is due to an EEH freeze or not.

If a device contains a 0xFF byte in its config space it's possible to
trigger the recovery process via a config space read from userspace, but
this is not a reliable method. If a driver is bound to the device and in
use it will frequently trigger the MMIO check, but this is also
inconsistent.

To solve these problems this patch adds a debugfs file called
"eeh_dev_check" which accepts a <domain>:<bus>:<dev>.<fn> string and
runs eeh_dev_check_failure() on it. This is the same check that's done
when the kernel gets a 0xFF result from a config or MMIO read with the
added benefit that it can be reliably triggered from userspace.
Signed-off-by: Oliver O'Halloran
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190903101605.2890-13-oohall@gmail.com
---
 arch/powerpc/kernel/eeh.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 398def61f8a6..2b3c03215a95 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1871,6 +1871,64 @@ static const struct file_operations eeh_force_recover_fops = {
        .llseek = no_llseek,
        .write  = eeh_force_recover_write,
 };
+
+static ssize_t eeh_debugfs_dev_usage(struct file *filp,
+                               char __user *user_buf,
+                               size_t count, loff_t *ppos)
+{
+       static const char usage[] = "input format: <domain>:<bus>:<dev>.<fn>\n";
+
+       return simple_read_from_buffer(user_buf, count, ppos,
+                                      usage, sizeof(usage) - 1);
+}
+
+static ssize_t eeh_dev_check_write(struct file *filp,
+                               const char __user *user_buf,
+                               size_t count, loff_t *ppos)
+{
+       uint32_t domain, bus, dev, fn;
+       struct pci_dev *pdev;
+       struct eeh_dev *edev;
+       char buf[20];
+       int ret;
+
+       ret = simple_write_to_buffer(buf, sizeof(buf), ppos, user_buf, count);
+       if (!ret)
+               return -EFAULT;
+
+       ret = sscanf(buf, "%x:%x:%x.%x", &domain, &bus, &dev, &fn);
+       if (ret != 4) {
+               pr_err("%s: expected 4 args, got %d\n", __func__, ret);
+               return -EINVAL;
+       }
+
+       pdev = pci_get_domain_bus_and_slot(domain, bus, (dev << 3) | fn);
+       if (!pdev)
+               return -ENODEV;
+
+       edev = pci_dev_to_eeh_dev(pdev);
+       if (!edev) {
+               pci_err(pdev, "No eeh_dev for this device!\n");
+               pci_dev_put(pdev);
+               return -ENODEV;
+       }
+
+       ret = eeh_dev_check_failure(edev);
+       pci_info(pdev, "eeh_dev_check_failure(%04x:%02x:%02x.%01x) = %d\n",
+                       domain, bus, dev, fn, ret);
+
+       pci_dev_put(pdev);
+
+       return count;
+}
+
+static const struct file_operations eeh_dev_check_fops = {
+       .open = simple_open,
+       .llseek = no_llseek,
+       .write = eeh_dev_check_write,
+       .read = eeh_debugfs_dev_usage,
+};
+
 #endif

 static int __init eeh_init_proc(void)
@@ -1886,6 +1944,9 @@ static int __init eeh_init_proc(void)
                debugfs_create_bool("eeh_disable_recovery", 0600,
                                powerpc_debugfs_root,
                                &eeh_debugfs_no_recover);
+               debugfs_create_file_unsafe("eeh_dev_check", 0600,
+                               powerpc_debugfs_root, NULL,
+                               &eeh_dev_check_fops);
                debugfs_create_file_unsafe("eeh_force_recover", 0600,
                                powerpc_debugfs_root, NULL,
                                &eeh_force_recover_fops);
-- cgit v1.2.3


From bd6461cc7b3c4fd12dcba4b0e95dfc612df872fd Mon Sep 17 00:00:00 2001
From: Oliver O'Halloran
Date: Tue, 3 Sep 2019 20:16:04 +1000
Subject: powerpc/eeh: Add a eeh_dev_break debugfs interface

Add an interface to debugfs for generating an EEH event on a given
device. This works by disabling memory accesses to and from the device
by setting the PCI_COMMAND register (or the VF Memory Space Enable on
the parent PF).

This is a somewhat portable alternative to using the platform specific
error injection mechanisms since those tend to be either hard to use,
or straight up broken. For pseries the interface also requires the use
of /dev/mem which is probably going to go away in a post-LOCKDOWN world
(and it's a horrific hack to begin with) so moving to a kernel-provided
interface makes sense and provides a sane, cross-platform interface for
userspace so we can write more generic testing scripts.
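Both eeh_dev_check and eeh_dev_break take the same address string, so a test script can drive them the same way. A hypothetical userspace trigger is sketched below; the /sys/kernel/debug/powerpc path is an assumption based on the files being created under powerpc_debugfs_root, and the device address is a placeholder:

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
          /* Assumed debugfs mount point; placeholder device address. */
          const char *path = "/sys/kernel/debug/powerpc/eeh_dev_check";
          const char *bdf = "0000:01:00.0";   /* <domain>:<bus>:<dev>.<fn> */
          int fd = open(path, O_WRONLY);

          if (fd < 0) {
                  perror("open");
                  return 1;
          }
          if (write(fd, bdf, strlen(bdf)) != (ssize_t)strlen(bdf))
                  perror("write");
          close(fd);
          return 0;
  }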
Signed-off-by: Oliver O'Halloran
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190903101605.2890-14-oohall@gmail.com
---
 arch/powerpc/kernel/eeh.c | 139 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 138 insertions(+), 1 deletion(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 2b3c03215a95..0a91dee51245 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1892,7 +1892,8 @@ static ssize_t eeh_dev_check_write(struct file *filp,
        char buf[20];
        int ret;

-       ret = simple_write_to_buffer(buf, sizeof(buf), ppos, user_buf, count);
+       memset(buf, 0, sizeof(buf));
+       ret = simple_write_to_buffer(buf, sizeof(buf)-1, ppos, user_buf, count);
        if (!ret)
                return -EFAULT;

@@ -1929,6 +1930,139 @@ static const struct file_operations eeh_dev_check_fops = {
        .read = eeh_debugfs_dev_usage,
 };

+static int eeh_debugfs_break_device(struct pci_dev *pdev)
+{
+       struct resource *bar = NULL;
+       void __iomem *mapped;
+       u16 old, bit;
+       int i, pos;
+
+       /* Do we have an MMIO BAR to disable? */
+       for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
+               struct resource *r = &pdev->resource[i];
+
+               if (!r->flags || !r->start)
+                       continue;
+               if (r->flags & IORESOURCE_IO)
+                       continue;
+               if (r->flags & IORESOURCE_UNSET)
+                       continue;
+
+               bar = r;
+               break;
+       }
+
+       if (!bar) {
+               pci_err(pdev, "Unable to find Memory BAR to cause EEH with\n");
+               return -ENXIO;
+       }
+
+       pci_err(pdev, "Going to break: %pR\n", bar);
+
+       if (pdev->is_virtfn) {
+#ifndef CONFIG_PCI_IOV
+               return -ENXIO;
+#else
+               /*
+                * VFs don't have a per-function COMMAND register, so the best
+                * we can do is clear the Memory Space Enable bit in the PF's
+                * SRIOV control reg.
+                *
+                * Unfortunately, this requires that we have a PF (i.e doesn't
+                * work for a passed-through VF) and it has the potential side
+                * effect of also causing an EEH on every other VF under the
+                * PF. Oh well.
+                */
+               pdev = pdev->physfn;
+               if (!pdev)
+                       return -ENXIO; /* passed through VFs have no PF */
+
+               pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
+               pos += PCI_SRIOV_CTRL;
+               bit = PCI_SRIOV_CTRL_MSE;
+#endif /* !CONFIG_PCI_IOV */
+       } else {
+               bit = PCI_COMMAND_MEMORY;
+               pos = PCI_COMMAND;
+       }
+
+       /*
+        * Process here is:
+        *
+        * 1. Disable Memory space.
+        *
+        * 2. Perform an MMIO to the device. This should result in an error
+        *    (CA / UR) being raised by the device which results in an EEH
+        *    PE freeze. Using the in_8() accessor skips the EEH detection
+        *    hook, so the EEH detection machinery won't be triggered here.
+        *    This is to match the usual behaviour of EEH where the HW will
+        *    asynchronously freeze a PE and it's up to the kernel to notice
+        *    and deal with it.
+        *
+        * 3. Turn Memory space back on. This is more important for VFs
+        *    since recovery will probably fail if we don't. For normal
+        *    devices the COMMAND register is reset as a part of
+        *    re-initialising the device.
+ * + * Breaking stuff is the point so who cares if it's racy ;) + */ + pci_read_config_word(pdev, pos, &old); + + mapped = ioremap(bar->start, PAGE_SIZE); + if (!mapped) { + pci_err(pdev, "Unable to map MMIO BAR %pR\n", bar); + return -ENXIO; + } + + pci_write_config_word(pdev, pos, old & ~bit); + in_8(mapped); + pci_write_config_word(pdev, pos, old); + + iounmap(mapped); + + return 0; +} + +static ssize_t eeh_dev_break_write(struct file *filp, + const char __user *user_buf, + size_t count, loff_t *ppos) +{ + uint32_t domain, bus, dev, fn; + struct pci_dev *pdev; + char buf[20]; + int ret; + + memset(buf, 0, sizeof(buf)); + ret = simple_write_to_buffer(buf, sizeof(buf)-1, ppos, user_buf, count); + if (!ret) + return -EFAULT; + + ret = sscanf(buf, "%x:%x:%x.%x", &domain, &bus, &dev, &fn); + if (ret != 4) { + pr_err("%s: expected 4 args, got %d\n", __func__, ret); + return -EINVAL; + } + + pdev = pci_get_domain_bus_and_slot(domain, bus, (dev << 3) | fn); + if (!pdev) + return -ENODEV; + + ret = eeh_debugfs_break_device(pdev); + pci_dev_put(pdev); + + if (ret < 0) + return ret; + + return count; +} + +static const struct file_operations eeh_dev_break_fops = { + .open = simple_open, + .llseek = no_llseek, + .write = eeh_dev_break_write, + .read = eeh_debugfs_dev_usage, +}; + #endif static int __init eeh_init_proc(void) @@ -1947,6 +2081,9 @@ static int __init eeh_init_proc(void) debugfs_create_file_unsafe("eeh_dev_check", 0600, powerpc_debugfs_root, NULL, &eeh_dev_check_fops); + debugfs_create_file_unsafe("eeh_dev_break", 0600, + powerpc_debugfs_root, NULL, + &eeh_dev_break_fops); debugfs_create_file_unsafe("eeh_force_recover", 0600, powerpc_debugfs_root, NULL, &eeh_force_recover_fops); -- cgit v1.2.3 From 175fca3bf91a1111b7e46f6655666640556b9059 Mon Sep 17 00:00:00 2001 From: Sven Schnelle Date: Fri, 23 Aug 2019 21:49:13 +0200 Subject: kexec: add KEXEC_ELF Right now powerpc provides an implementation to read elf files with the kexec_file_load() syscall. Make that available as a public kexec interface so it can be re-used on other architectures. Signed-off-by: Sven Schnelle Reviewed-by: Thiago Jung Bauermann Signed-off-by: Helge Deller --- arch/powerpc/kernel/kexec_elf_64.c | 545 +------------------------------------ 1 file changed, 5 insertions(+), 540 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec_elf_64.c index 83cf7b852876..3072fd6dbe94 100644 --- a/arch/powerpc/kernel/kexec_elf_64.c +++ b/arch/powerpc/kernel/kexec_elf_64.c @@ -23,541 +23,6 @@ #include #include -#define PURGATORY_STACK_SIZE (16 * 1024) - -#define elf_addr_to_cpu elf64_to_cpu - -#ifndef Elf_Rel -#define Elf_Rel Elf64_Rel -#endif /* Elf_Rel */ - -struct elf_info { - /* - * Where the ELF binary contents are kept. - * Memory managed by the user of the struct. 
- */ - const char *buffer; - - const struct elfhdr *ehdr; - const struct elf_phdr *proghdrs; - struct elf_shdr *sechdrs; -}; - -static inline bool elf_is_elf_file(const struct elfhdr *ehdr) -{ - return memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0; -} - -static uint64_t elf64_to_cpu(const struct elfhdr *ehdr, uint64_t value) -{ - if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB) - value = le64_to_cpu(value); - else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB) - value = be64_to_cpu(value); - - return value; -} - -static uint16_t elf16_to_cpu(const struct elfhdr *ehdr, uint16_t value) -{ - if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB) - value = le16_to_cpu(value); - else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB) - value = be16_to_cpu(value); - - return value; -} - -static uint32_t elf32_to_cpu(const struct elfhdr *ehdr, uint32_t value) -{ - if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB) - value = le32_to_cpu(value); - else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB) - value = be32_to_cpu(value); - - return value; -} - -/** - * elf_is_ehdr_sane - check that it is safe to use the ELF header - * @buf_len: size of the buffer in which the ELF file is loaded. - */ -static bool elf_is_ehdr_sane(const struct elfhdr *ehdr, size_t buf_len) -{ - if (ehdr->e_phnum > 0 && ehdr->e_phentsize != sizeof(struct elf_phdr)) { - pr_debug("Bad program header size.\n"); - return false; - } else if (ehdr->e_shnum > 0 && - ehdr->e_shentsize != sizeof(struct elf_shdr)) { - pr_debug("Bad section header size.\n"); - return false; - } else if (ehdr->e_ident[EI_VERSION] != EV_CURRENT || - ehdr->e_version != EV_CURRENT) { - pr_debug("Unknown ELF version.\n"); - return false; - } - - if (ehdr->e_phoff > 0 && ehdr->e_phnum > 0) { - size_t phdr_size; - - /* - * e_phnum is at most 65535 so calculating the size of the - * program header cannot overflow. - */ - phdr_size = sizeof(struct elf_phdr) * ehdr->e_phnum; - - /* Sanity check the program header table location. */ - if (ehdr->e_phoff + phdr_size < ehdr->e_phoff) { - pr_debug("Program headers at invalid location.\n"); - return false; - } else if (ehdr->e_phoff + phdr_size > buf_len) { - pr_debug("Program headers truncated.\n"); - return false; - } - } - - if (ehdr->e_shoff > 0 && ehdr->e_shnum > 0) { - size_t shdr_size; - - /* - * e_shnum is at most 65536 so calculating - * the size of the section header cannot overflow. - */ - shdr_size = sizeof(struct elf_shdr) * ehdr->e_shnum; - - /* Sanity check the section header table location. 
*/ - if (ehdr->e_shoff + shdr_size < ehdr->e_shoff) { - pr_debug("Section headers at invalid location.\n"); - return false; - } else if (ehdr->e_shoff + shdr_size > buf_len) { - pr_debug("Section headers truncated.\n"); - return false; - } - } - - return true; -} - -static int elf_read_ehdr(const char *buf, size_t len, struct elfhdr *ehdr) -{ - struct elfhdr *buf_ehdr; - - if (len < sizeof(*buf_ehdr)) { - pr_debug("Buffer is too small to hold ELF header.\n"); - return -ENOEXEC; - } - - memset(ehdr, 0, sizeof(*ehdr)); - memcpy(ehdr->e_ident, buf, sizeof(ehdr->e_ident)); - if (!elf_is_elf_file(ehdr)) { - pr_debug("No ELF header magic.\n"); - return -ENOEXEC; - } - - if (ehdr->e_ident[EI_CLASS] != ELF_CLASS) { - pr_debug("Not a supported ELF class.\n"); - return -ENOEXEC; - } else if (ehdr->e_ident[EI_DATA] != ELFDATA2LSB && - ehdr->e_ident[EI_DATA] != ELFDATA2MSB) { - pr_debug("Not a supported ELF data format.\n"); - return -ENOEXEC; - } - - buf_ehdr = (struct elfhdr *) buf; - if (elf16_to_cpu(ehdr, buf_ehdr->e_ehsize) != sizeof(*buf_ehdr)) { - pr_debug("Bad ELF header size.\n"); - return -ENOEXEC; - } - - ehdr->e_type = elf16_to_cpu(ehdr, buf_ehdr->e_type); - ehdr->e_machine = elf16_to_cpu(ehdr, buf_ehdr->e_machine); - ehdr->e_version = elf32_to_cpu(ehdr, buf_ehdr->e_version); - ehdr->e_entry = elf_addr_to_cpu(ehdr, buf_ehdr->e_entry); - ehdr->e_phoff = elf_addr_to_cpu(ehdr, buf_ehdr->e_phoff); - ehdr->e_shoff = elf_addr_to_cpu(ehdr, buf_ehdr->e_shoff); - ehdr->e_flags = elf32_to_cpu(ehdr, buf_ehdr->e_flags); - ehdr->e_phentsize = elf16_to_cpu(ehdr, buf_ehdr->e_phentsize); - ehdr->e_phnum = elf16_to_cpu(ehdr, buf_ehdr->e_phnum); - ehdr->e_shentsize = elf16_to_cpu(ehdr, buf_ehdr->e_shentsize); - ehdr->e_shnum = elf16_to_cpu(ehdr, buf_ehdr->e_shnum); - ehdr->e_shstrndx = elf16_to_cpu(ehdr, buf_ehdr->e_shstrndx); - - return elf_is_ehdr_sane(ehdr, len) ? 0 : -ENOEXEC; -} - -/** - * elf_is_phdr_sane - check that it is safe to use the program header - * @buf_len: size of the buffer in which the ELF file is loaded. - */ -static bool elf_is_phdr_sane(const struct elf_phdr *phdr, size_t buf_len) -{ - - if (phdr->p_offset + phdr->p_filesz < phdr->p_offset) { - pr_debug("ELF segment location wraps around.\n"); - return false; - } else if (phdr->p_offset + phdr->p_filesz > buf_len) { - pr_debug("ELF segment not in file.\n"); - return false; - } else if (phdr->p_paddr + phdr->p_memsz < phdr->p_paddr) { - pr_debug("ELF segment address wraps around.\n"); - return false; - } - - return true; -} - -static int elf_read_phdr(const char *buf, size_t len, struct elf_info *elf_info, - int idx) -{ - /* Override the const in proghdrs, we are the ones doing the loading. */ - struct elf_phdr *phdr = (struct elf_phdr *) &elf_info->proghdrs[idx]; - const char *pbuf; - struct elf_phdr *buf_phdr; - - pbuf = buf + elf_info->ehdr->e_phoff + (idx * sizeof(*buf_phdr)); - buf_phdr = (struct elf_phdr *) pbuf; - - phdr->p_type = elf32_to_cpu(elf_info->ehdr, buf_phdr->p_type); - phdr->p_offset = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_offset); - phdr->p_paddr = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_paddr); - phdr->p_vaddr = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_vaddr); - phdr->p_flags = elf32_to_cpu(elf_info->ehdr, buf_phdr->p_flags); - - /* - * The following fields have a type equivalent to Elf_Addr - * both in 32 bit and 64 bit ELF. 
- */ - phdr->p_filesz = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_filesz); - phdr->p_memsz = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_memsz); - phdr->p_align = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_align); - - return elf_is_phdr_sane(phdr, len) ? 0 : -ENOEXEC; -} - -/** - * elf_read_phdrs - read the program headers from the buffer - * - * This function assumes that the program header table was checked for sanity. - * Use elf_is_ehdr_sane() if it wasn't. - */ -static int elf_read_phdrs(const char *buf, size_t len, - struct elf_info *elf_info) -{ - size_t phdr_size, i; - const struct elfhdr *ehdr = elf_info->ehdr; - - /* - * e_phnum is at most 65535 so calculating the size of the - * program header cannot overflow. - */ - phdr_size = sizeof(struct elf_phdr) * ehdr->e_phnum; - - elf_info->proghdrs = kzalloc(phdr_size, GFP_KERNEL); - if (!elf_info->proghdrs) - return -ENOMEM; - - for (i = 0; i < ehdr->e_phnum; i++) { - int ret; - - ret = elf_read_phdr(buf, len, elf_info, i); - if (ret) { - kfree(elf_info->proghdrs); - elf_info->proghdrs = NULL; - return ret; - } - } - - return 0; -} - -/** - * elf_is_shdr_sane - check that it is safe to use the section header - * @buf_len: size of the buffer in which the ELF file is loaded. - */ -static bool elf_is_shdr_sane(const struct elf_shdr *shdr, size_t buf_len) -{ - bool size_ok; - - /* SHT_NULL headers have undefined values, so we can't check them. */ - if (shdr->sh_type == SHT_NULL) - return true; - - /* Now verify sh_entsize */ - switch (shdr->sh_type) { - case SHT_SYMTAB: - size_ok = shdr->sh_entsize == sizeof(Elf_Sym); - break; - case SHT_RELA: - size_ok = shdr->sh_entsize == sizeof(Elf_Rela); - break; - case SHT_DYNAMIC: - size_ok = shdr->sh_entsize == sizeof(Elf_Dyn); - break; - case SHT_REL: - size_ok = shdr->sh_entsize == sizeof(Elf_Rel); - break; - case SHT_NOTE: - case SHT_PROGBITS: - case SHT_HASH: - case SHT_NOBITS: - default: - /* - * This is a section whose entsize requirements - * I don't care about. If I don't know about - * the section I can't care about it's entsize - * requirements. - */ - size_ok = true; - break; - } - - if (!size_ok) { - pr_debug("ELF section with wrong entry size.\n"); - return false; - } else if (shdr->sh_addr + shdr->sh_size < shdr->sh_addr) { - pr_debug("ELF section address wraps around.\n"); - return false; - } - - if (shdr->sh_type != SHT_NOBITS) { - if (shdr->sh_offset + shdr->sh_size < shdr->sh_offset) { - pr_debug("ELF section location wraps around.\n"); - return false; - } else if (shdr->sh_offset + shdr->sh_size > buf_len) { - pr_debug("ELF section not in file.\n"); - return false; - } - } - - return true; -} - -static int elf_read_shdr(const char *buf, size_t len, struct elf_info *elf_info, - int idx) -{ - struct elf_shdr *shdr = &elf_info->sechdrs[idx]; - const struct elfhdr *ehdr = elf_info->ehdr; - const char *sbuf; - struct elf_shdr *buf_shdr; - - sbuf = buf + ehdr->e_shoff + idx * sizeof(*buf_shdr); - buf_shdr = (struct elf_shdr *) sbuf; - - shdr->sh_name = elf32_to_cpu(ehdr, buf_shdr->sh_name); - shdr->sh_type = elf32_to_cpu(ehdr, buf_shdr->sh_type); - shdr->sh_addr = elf_addr_to_cpu(ehdr, buf_shdr->sh_addr); - shdr->sh_offset = elf_addr_to_cpu(ehdr, buf_shdr->sh_offset); - shdr->sh_link = elf32_to_cpu(ehdr, buf_shdr->sh_link); - shdr->sh_info = elf32_to_cpu(ehdr, buf_shdr->sh_info); - - /* - * The following fields have a type equivalent to Elf_Addr - * both in 32 bit and 64 bit ELF. 
- */ - shdr->sh_flags = elf_addr_to_cpu(ehdr, buf_shdr->sh_flags); - shdr->sh_size = elf_addr_to_cpu(ehdr, buf_shdr->sh_size); - shdr->sh_addralign = elf_addr_to_cpu(ehdr, buf_shdr->sh_addralign); - shdr->sh_entsize = elf_addr_to_cpu(ehdr, buf_shdr->sh_entsize); - - return elf_is_shdr_sane(shdr, len) ? 0 : -ENOEXEC; -} - -/** - * elf_read_shdrs - read the section headers from the buffer - * - * This function assumes that the section header table was checked for sanity. - * Use elf_is_ehdr_sane() if it wasn't. - */ -static int elf_read_shdrs(const char *buf, size_t len, - struct elf_info *elf_info) -{ - size_t shdr_size, i; - - /* - * e_shnum is at most 65536 so calculating - * the size of the section header cannot overflow. - */ - shdr_size = sizeof(struct elf_shdr) * elf_info->ehdr->e_shnum; - - elf_info->sechdrs = kzalloc(shdr_size, GFP_KERNEL); - if (!elf_info->sechdrs) - return -ENOMEM; - - for (i = 0; i < elf_info->ehdr->e_shnum; i++) { - int ret; - - ret = elf_read_shdr(buf, len, elf_info, i); - if (ret) { - kfree(elf_info->sechdrs); - elf_info->sechdrs = NULL; - return ret; - } - } - - return 0; -} - -/** - * elf_read_from_buffer - read ELF file and sets up ELF header and ELF info - * @buf: Buffer to read ELF file from. - * @len: Size of @buf. - * @ehdr: Pointer to existing struct which will be populated. - * @elf_info: Pointer to existing struct which will be populated. - * - * This function allows reading ELF files with different byte order than - * the kernel, byte-swapping the fields as needed. - * - * Return: - * On success returns 0, and the caller should call elf_free_info(elf_info) to - * free the memory allocated for the section and program headers. - */ -int elf_read_from_buffer(const char *buf, size_t len, struct elfhdr *ehdr, - struct elf_info *elf_info) -{ - int ret; - - ret = elf_read_ehdr(buf, len, ehdr); - if (ret) - return ret; - - elf_info->buffer = buf; - elf_info->ehdr = ehdr; - if (ehdr->e_phoff > 0 && ehdr->e_phnum > 0) { - ret = elf_read_phdrs(buf, len, elf_info); - if (ret) - return ret; - } - if (ehdr->e_shoff > 0 && ehdr->e_shnum > 0) { - ret = elf_read_shdrs(buf, len, elf_info); - if (ret) { - kfree(elf_info->proghdrs); - return ret; - } - } - - return 0; -} - -/** - * elf_free_info - free memory allocated by elf_read_from_buffer - */ -void elf_free_info(struct elf_info *elf_info) -{ - kfree(elf_info->proghdrs); - kfree(elf_info->sechdrs); - memset(elf_info, 0, sizeof(*elf_info)); -} -/** - * build_elf_exec_info - read ELF executable and check that we can use it - */ -static int build_elf_exec_info(const char *buf, size_t len, struct elfhdr *ehdr, - struct elf_info *elf_info) -{ - int i; - int ret; - - ret = elf_read_from_buffer(buf, len, ehdr, elf_info); - if (ret) - return ret; - - /* Big endian vmlinux has type ET_DYN. */ - if (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN) { - pr_err("Not an ELF executable.\n"); - goto error; - } else if (!elf_info->proghdrs) { - pr_err("No ELF program header.\n"); - goto error; - } - - for (i = 0; i < ehdr->e_phnum; i++) { - /* - * Kexec does not support loading interpreters. - * In addition this check keeps us from attempting - * to kexec ordinay executables. 
- */ - if (elf_info->proghdrs[i].p_type == PT_INTERP) { - pr_err("Requires an ELF interpreter.\n"); - goto error; - } - } - - return 0; -error: - elf_free_info(elf_info); - return -ENOEXEC; -} - -static int elf64_probe(const char *buf, unsigned long len) -{ - struct elfhdr ehdr; - struct elf_info elf_info; - int ret; - - ret = build_elf_exec_info(buf, len, &ehdr, &elf_info); - if (ret) - return ret; - - elf_free_info(&elf_info); - - return elf_check_arch(&ehdr) ? 0 : -ENOEXEC; -} - -/** - * elf_exec_load - load ELF executable image - * @lowest_load_addr: On return, will be the address where the first PT_LOAD - * section will be loaded in memory. - * - * Return: - * 0 on success, negative value on failure. - */ -static int elf_exec_load(struct kimage *image, struct elfhdr *ehdr, - struct elf_info *elf_info, - unsigned long *lowest_load_addr) -{ - unsigned long base = 0, lowest_addr = UINT_MAX; - int ret; - size_t i; - struct kexec_buf kbuf = { .image = image, .buf_max = ppc64_rma_size, - .top_down = false }; - - /* Read in the PT_LOAD segments. */ - for (i = 0; i < ehdr->e_phnum; i++) { - unsigned long load_addr; - size_t size; - const struct elf_phdr *phdr; - - phdr = &elf_info->proghdrs[i]; - if (phdr->p_type != PT_LOAD) - continue; - - size = phdr->p_filesz; - if (size > phdr->p_memsz) - size = phdr->p_memsz; - - kbuf.buffer = (void *) elf_info->buffer + phdr->p_offset; - kbuf.bufsz = size; - kbuf.memsz = phdr->p_memsz; - kbuf.buf_align = phdr->p_align; - kbuf.buf_min = phdr->p_paddr + base; - kbuf.mem = KEXEC_BUF_MEM_UNKNOWN; - ret = kexec_add_buffer(&kbuf); - if (ret) - goto out; - load_addr = kbuf.mem; - - if (load_addr < lowest_addr) - lowest_addr = load_addr; - } - - /* Update entry point to reflect new load address. */ - ehdr->e_entry += base; - - *lowest_load_addr = lowest_addr; - ret = 0; - out: - return ret; -} - static void *elf64_load(struct kimage *image, char *kernel_buf, unsigned long kernel_len, char *initrd, unsigned long initrd_len, char *cmdline, @@ -570,18 +35,18 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, void *fdt; const void *slave_code; struct elfhdr ehdr; - struct elf_info elf_info; + struct kexec_elf_info elf_info; struct kexec_buf kbuf = { .image = image, .buf_min = 0, .buf_max = ppc64_rma_size }; struct kexec_buf pbuf = { .image = image, .buf_min = 0, .buf_max = ppc64_rma_size, .top_down = true, .mem = KEXEC_BUF_MEM_UNKNOWN }; - ret = build_elf_exec_info(kernel_buf, kernel_len, &ehdr, &elf_info); + ret = kexec_build_elf_info(kernel_buf, kernel_len, &ehdr, &elf_info); if (ret) goto out; - ret = elf_exec_load(image, &ehdr, &elf_info, &kernel_load_addr); + ret = kexec_elf_load(image, &ehdr, &elf_info, &kbuf, &kernel_load_addr); if (ret) goto out; @@ -648,13 +113,13 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, pr_err("Error setting up the purgatory.\n"); out: - elf_free_info(&elf_info); + kexec_free_elf_info(&elf_info); /* Make kimage_file_post_load_cleanup free the fdt buffer for us. */ return ret ? 
ERR_PTR(ret) : fdt; } const struct kexec_file_ops kexec_elf64_ops = { - .probe = elf64_probe, + .probe = kexec_elf_probe, .load = elf64_load, }; -- cgit v1.2.3 From bc01bdf6c5df5023272a7399962cf64f8fedc93e Mon Sep 17 00:00:00 2001 From: Ravi Bangoria Date: Tue, 10 Sep 2019 18:45:13 +0530 Subject: powerpc/watchpoint: Disable watchpoint hit by larx/stcx instructions If watchpoint exception is generated by larx/stcx instructions, the reservation created by larx gets lost while handling exception, and thus stcx instruction always fails. Generally these instructions are used in a while(1) loop, for example spinlocks. And because stcx never succeeds, it loops forever and ultimately hangs the system. Note that ptrace anyway works in one-shot mode and thus for ptrace we don't change the behaviour. It's up to ptrace user to take care of this. Signed-off-by: Ravi Bangoria Acked-by: Naveen N. Rao Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190910131513.30499-1-ravi.bangoria@linux.ibm.com --- arch/powerpc/kernel/hw_breakpoint.c | 49 +++++++++++++++++++++++++------------ 1 file changed, 33 insertions(+), 16 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 28ad3171bb82..1007ec36b4cb 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -195,14 +195,32 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) tsk->thread.last_hit_ubp = NULL; } +static bool is_larx_stcx_instr(struct pt_regs *regs, unsigned int instr) +{ + int ret, type; + struct instruction_op op; + + ret = analyse_instr(&op, regs, instr); + type = GETTYPE(op.type); + return (!ret && (type == LARX || type == STCX)); +} + /* * Handle debug exception notifications. */ static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp, unsigned long addr) { - int stepped; - unsigned int instr; + unsigned int instr = 0; + + if (__get_user_inatomic(instr, (unsigned int *)regs->nip)) + goto fail; + + if (is_larx_stcx_instr(regs, instr)) { + printk_ratelimited("Breakpoint hit on instruction that can't be emulated." + " Breakpoint at 0x%lx will be disabled.\n", addr); + goto disable; + } /* Do not emulate user-space instructions, instead single-step them */ if (user_mode(regs)) { @@ -211,23 +229,22 @@ static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp, return false; } - stepped = 0; - instr = 0; - if (!__get_user_inatomic(instr, (unsigned int *)regs->nip)) - stepped = emulate_step(regs, instr); + if (!emulate_step(regs, instr)) + goto fail; + return true; + +fail: /* - * emulate_step() could not execute it. We've failed in reliably - * handling the hw-breakpoint. Unregister it and throw a warning - * message to let the user know about it. + * We've failed in reliably handling the hw-breakpoint. Unregister + * it and throw a warning message to let the user know about it. */ - if (!stepped) { - WARN(1, "Unable to handle hardware breakpoint. Breakpoint at " - "0x%lx will be disabled.", addr); - perf_event_disable_inatomic(bp); - return false; - } - return true; + WARN(1, "Unable to handle hardware breakpoint. 
Breakpoint at " + "0x%lx will be disabled.", addr); + +disable: + perf_event_disable_inatomic(bp); + return false; } int hw_breakpoint_handler(struct die_args *args) -- cgit v1.2.3 From 1b7f3b6c43675ef2cfb9d8f48bde057794820f7c Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Fri, 13 Sep 2019 23:32:13 +1000 Subject: powerpc/eeh: Fix build with STACKTRACE=n The build breaks when STACKTRACE=n, eg. skiroot_defconfig: arch/powerpc/kernel/eeh_event.c:124:23: error: implicit declaration of function 'stack_trace_save' Fix it with some ifdefs for now. Fixes: 25baf3d81614 ("powerpc/eeh: Defer printing stack trace") Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/eeh_driver.c | 3 ++- arch/powerpc/kernel/eeh_event.c | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 80bd157fcb45..d9279d0ee9f5 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -908,6 +908,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe)); } +#ifdef CONFIG_STACKTRACE /* * Print the saved stack trace now that we've verified there's * something to recover. @@ -926,7 +927,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) pe->trace_entries = 0; } - +#endif /* CONFIG_STACKTRACE */ eeh_pe_update_time_stamp(pe); pe->freeze_count++; diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c index 1d55486adb0f..a7a8dc182efb 100644 --- a/arch/powerpc/kernel/eeh_event.c +++ b/arch/powerpc/kernel/eeh_event.c @@ -117,12 +117,14 @@ int __eeh_send_failure_event(struct eeh_pe *pe) * while the PE is sitting in the event queue. */ if (pe) { +#ifdef CONFIG_STACKTRACE /* * Save the current stack trace so we can dump it from the * event handler thread. */ pe->trace_entries = stack_trace_save(pe->stack_trace, ARRAY_SIZE(pe->stack_trace), 0); +#endif /* CONFIG_STACKTRACE */ eeh_pe_state_mark(pe, EEH_PE_RECOVERING); } -- cgit v1.2.3 From 0cb0837f9db1a6ed5b764ef61dd5f1a314b8231a Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Wed, 11 Sep 2019 21:57:43 +1000 Subject: powerpc/kvm: Move kvm_tmp into .text, shrink to 64K In some configurations of KVM, guests binary patch themselves to avoid/reduce trapping into the hypervisor. For some instructions this requires replacing one instruction with a sequence of instructions. For those cases we need to write the sequence of instructions somewhere and then patch the location of the original instruction to branch to the sequence. That requires that the location of the sequence be within 32MB of the original instruction. The current solution for this is that we create a 1MB array in BSS, write sequences into there, and then free the remainder of the array. This has a few problems: - it confuses kmemleak. - it confuses lockdep. - it requires mapping kvm_tmp executable, which can cause adjacent areas to also be mapped executable if we're using 16M pages for the linear mapping. - the 32MB limit can be exceeded if the kernel is big enough, especially with STRICT_KERNEL_RWX enabled, which then prevents the patching from working at all. We can fix all those problems by making kvm_tmp just a region of regular .text. However currently it's 1MB in size, and we don't want to waste 1MB of text. In practice however I only see ~30KB of kvm_tmp being used even for an allyes_config. So shrink kvm_tmp to 64K, which ought to be enough for everyone, and move it into .text. 
Signed-off-by: Michael Ellerman Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190911115746.12433-1-mpe@ellerman.id.au --- arch/powerpc/kernel/kvm.c | 24 +++++------------------- arch/powerpc/kernel/kvm_emul.S | 8 ++++++++ 2 files changed, 13 insertions(+), 19 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index b7b3a5e4e224..e3b5aa583319 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -64,7 +64,8 @@ #define KVM_INST_MTSRIN 0x7c0001e4 static bool kvm_patching_worked = true; -char kvm_tmp[1024 * 1024]; +extern char kvm_tmp[]; +extern char kvm_tmp_end[]; static int kvm_tmp_index; static inline void kvm_patch_ins(u32 *inst, u32 new_inst) @@ -132,7 +133,7 @@ static u32 *kvm_alloc(int len) { u32 *p; - if ((kvm_tmp_index + len) > ARRAY_SIZE(kvm_tmp)) { + if ((kvm_tmp_index + len) > (kvm_tmp_end - kvm_tmp)) { printk(KERN_ERR "KVM: No more space (%d + %d)\n", kvm_tmp_index, len); kvm_patching_worked = false; @@ -699,25 +700,13 @@ static void kvm_use_magic_page(void) kvm_patching_worked ? "worked" : "failed"); } -static __init void kvm_free_tmp(void) -{ - /* - * Inform kmemleak about the hole in the .bss section since the - * corresponding pages will be unmapped with DEBUG_PAGEALLOC=y. - */ - kmemleak_free_part(&kvm_tmp[kvm_tmp_index], - ARRAY_SIZE(kvm_tmp) - kvm_tmp_index); - free_reserved_area(&kvm_tmp[kvm_tmp_index], - &kvm_tmp[ARRAY_SIZE(kvm_tmp)], -1, NULL); -} - static int __init kvm_guest_init(void) { if (!kvm_para_available()) - goto free_tmp; + return 0; if (!epapr_paravirt_enabled) - goto free_tmp; + return 0; if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE)) kvm_use_magic_page(); @@ -727,9 +716,6 @@ static int __init kvm_guest_init(void) powersave_nap = 1; #endif -free_tmp: - kvm_free_tmp(); - return 0; } diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index eb2568f583ae..9dd17dce10a1 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -334,5 +334,13 @@ kvm_emulate_mtsrin_orig_ins_offs: kvm_emulate_mtsrin_len: .long (kvm_emulate_mtsrin_end - kvm_emulate_mtsrin) / 4 + .balign 4 + .global kvm_tmp +kvm_tmp: + .space (64 * 1024) + +.global kvm_tmp_end +kvm_tmp_end: + .global kvm_template_end kvm_template_end: -- cgit v1.2.3 From 731dade128ebc35044e7f9b9d396e4c1bed6ecbc Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Wed, 11 Sep 2019 21:57:45 +1000 Subject: powerpc/kvm: Explicitly mark kvm guest code as __init All the code in kvm.c can be marked __init. Most of it is already inlined into the initcall, but not all. So instead of relying on the inlining, mark it all as __init. This saves ~280 bytes of text for my configuration. 
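The rule being applied: a function reachable only from an initcall can itself be
__init, which places it in .init.text so its memory is reclaimed after boot.
Calling it later would be a use-after-free, and modpost flags such references as
section mismatches. A minimal illustration (hypothetical names, not from this
patch):

  static void __init helper(void)
  {
          /* lives in .init.text, discarded after boot */
  }

  static int __init my_guest_init(void)
  {
          helper();       /* safe: caller and callee are freed together */
          return 0;
  }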
Signed-off-by: Michael Ellerman Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190911115746.12433-3-mpe@ellerman.id.au --- arch/powerpc/kernel/kvm.c | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index e3b5aa583319..617eba82531c 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -68,13 +68,13 @@ extern char kvm_tmp[]; extern char kvm_tmp_end[]; static int kvm_tmp_index; -static inline void kvm_patch_ins(u32 *inst, u32 new_inst) +static void __init kvm_patch_ins(u32 *inst, u32 new_inst) { *inst = new_inst; flush_icache_range((ulong)inst, (ulong)inst + 4); } -static void kvm_patch_ins_ll(u32 *inst, long addr, u32 rt) +static void __init kvm_patch_ins_ll(u32 *inst, long addr, u32 rt) { #ifdef CONFIG_64BIT kvm_patch_ins(inst, KVM_INST_LD | rt | (addr & 0x0000fffc)); @@ -83,7 +83,7 @@ static void kvm_patch_ins_ll(u32 *inst, long addr, u32 rt) #endif } -static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt) +static void __init kvm_patch_ins_ld(u32 *inst, long addr, u32 rt) { #ifdef CONFIG_64BIT kvm_patch_ins(inst, KVM_INST_LD | rt | (addr & 0x0000fffc)); @@ -92,12 +92,12 @@ static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt) #endif } -static void kvm_patch_ins_lwz(u32 *inst, long addr, u32 rt) +static void __init kvm_patch_ins_lwz(u32 *inst, long addr, u32 rt) { kvm_patch_ins(inst, KVM_INST_LWZ | rt | (addr & 0x0000ffff)); } -static void kvm_patch_ins_std(u32 *inst, long addr, u32 rt) +static void __init kvm_patch_ins_std(u32 *inst, long addr, u32 rt) { #ifdef CONFIG_64BIT kvm_patch_ins(inst, KVM_INST_STD | rt | (addr & 0x0000fffc)); @@ -106,17 +106,17 @@ static void kvm_patch_ins_std(u32 *inst, long addr, u32 rt) #endif } -static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt) +static void __init kvm_patch_ins_stw(u32 *inst, long addr, u32 rt) { kvm_patch_ins(inst, KVM_INST_STW | rt | (addr & 0x0000fffc)); } -static void kvm_patch_ins_nop(u32 *inst) +static void __init kvm_patch_ins_nop(u32 *inst) { kvm_patch_ins(inst, KVM_INST_NOP); } -static void kvm_patch_ins_b(u32 *inst, int addr) +static void __init kvm_patch_ins_b(u32 *inst, int addr) { #if defined(CONFIG_RELOCATABLE) && defined(CONFIG_PPC_BOOK3S) /* On relocatable kernels interrupts handlers and our code @@ -129,7 +129,7 @@ static void kvm_patch_ins_b(u32 *inst, int addr) kvm_patch_ins(inst, KVM_INST_B | (addr & KVM_INST_B_MASK)); } -static u32 *kvm_alloc(int len) +static u32 * __init kvm_alloc(int len) { u32 *p; @@ -152,7 +152,7 @@ extern u32 kvm_emulate_mtmsrd_orig_ins_offs; extern u32 kvm_emulate_mtmsrd_len; extern u32 kvm_emulate_mtmsrd[]; -static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) +static void __init kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) { u32 *p; int distance_start; @@ -205,7 +205,7 @@ extern u32 kvm_emulate_mtmsr_orig_ins_offs; extern u32 kvm_emulate_mtmsr_len; extern u32 kvm_emulate_mtmsr[]; -static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt) +static void __init kvm_patch_ins_mtmsr(u32 *inst, u32 rt) { u32 *p; int distance_start; @@ -266,7 +266,7 @@ extern u32 kvm_emulate_wrtee_orig_ins_offs; extern u32 kvm_emulate_wrtee_len; extern u32 kvm_emulate_wrtee[]; -static void kvm_patch_ins_wrtee(u32 *inst, u32 rt, int imm_one) +static void __init kvm_patch_ins_wrtee(u32 *inst, u32 rt, int imm_one) { u32 *p; int distance_start; @@ -323,7 +323,7 @@ extern u32 kvm_emulate_wrteei_0_branch_offs; extern u32 
kvm_emulate_wrteei_0_len; extern u32 kvm_emulate_wrteei_0[]; -static void kvm_patch_ins_wrteei_0(u32 *inst) +static void __init kvm_patch_ins_wrteei_0(u32 *inst) { u32 *p; int distance_start; @@ -364,7 +364,7 @@ extern u32 kvm_emulate_mtsrin_orig_ins_offs; extern u32 kvm_emulate_mtsrin_len; extern u32 kvm_emulate_mtsrin[]; -static void kvm_patch_ins_mtsrin(u32 *inst, u32 rt, u32 rb) +static void __init kvm_patch_ins_mtsrin(u32 *inst, u32 rt, u32 rb) { u32 *p; int distance_start; @@ -400,7 +400,7 @@ static void kvm_patch_ins_mtsrin(u32 *inst, u32 rt, u32 rb) #endif -static void kvm_map_magic_page(void *data) +static void __init kvm_map_magic_page(void *data) { u32 *features = data; @@ -415,7 +415,7 @@ static void kvm_map_magic_page(void *data) *features = out[0]; } -static void kvm_check_ins(u32 *inst, u32 features) +static void __init kvm_check_ins(u32 *inst, u32 features) { u32 _inst = *inst; u32 inst_no_rt = _inst & ~KVM_MASK_RT; @@ -659,7 +659,7 @@ static void kvm_check_ins(u32 *inst, u32 features) extern u32 kvm_template_start[]; extern u32 kvm_template_end[]; -static void kvm_use_magic_page(void) +static void __init kvm_use_magic_page(void) { u32 *p; u32 *start, *end; -- cgit v1.2.3 From caff52dc0b71538cff43c06ede3621fbbf359978 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Wed, 11 Sep 2019 21:57:46 +1000 Subject: powerpc/kvm: Add ifdefs around template code Some of the templates used for KVM patching are only used on certain platforms, but currently they are always built-in, fix that. Signed-off-by: Michael Ellerman Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190911115746.12433-4-mpe@ellerman.id.au --- arch/powerpc/kernel/kvm_emul.S | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 9dd17dce10a1..7af6f8b50c5d 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -192,6 +192,8 @@ kvm_emulate_mtmsr_orig_ins_offs: kvm_emulate_mtmsr_len: .long (kvm_emulate_mtmsr_end - kvm_emulate_mtmsr) / 4 +#ifdef CONFIG_BOOKE + /* also used for wrteei 1 */ .global kvm_emulate_wrtee kvm_emulate_wrtee: @@ -285,6 +287,10 @@ kvm_emulate_wrteei_0_branch_offs: kvm_emulate_wrteei_0_len: .long (kvm_emulate_wrteei_0_end - kvm_emulate_wrteei_0) / 4 +#endif /* CONFIG_BOOKE */ + +#ifdef CONFIG_PPC_BOOK3S_32 + .global kvm_emulate_mtsrin kvm_emulate_mtsrin: @@ -334,6 +340,8 @@ kvm_emulate_mtsrin_orig_ins_offs: kvm_emulate_mtsrin_len: .long (kvm_emulate_mtsrin_end - kvm_emulate_mtsrin) / 4 +#endif /* CONFIG_PPC_BOOK3S_32 */ + .balign 4 .global kvm_tmp kvm_tmp: -- cgit v1.2.3 From 1fdfa4c6af0cc1854b017f308af6bece94568bb6 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Thu, 12 Sep 2019 16:40:37 +0900 Subject: powerpc: improve prom_init_check rule This slightly improves the prom_init_check rule. [1] Avoid needless check Currently, prom_init_check.sh is invoked every time you run 'make' even if you have changed nothing in prom_init.c. With this commit, the script is re-run only when prom_init.o is recompiled. 
[2] Beautify the build log Currently, the O= build shows the absolute path to the script: CALL /abs/path/to/source/of/linux/arch/powerpc/kernel/prom_init_check.sh With this commit, it is always a relative path to the timestamp file: PROMCHK arch/powerpc/kernel/prom_init_check Signed-off-by: Masahiro Yamada Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190912074037.13813-1-yamada.masahiro@socionext.com --- arch/powerpc/kernel/.gitignore | 1 + arch/powerpc/kernel/Makefile | 14 ++++++-------- 2 files changed, 7 insertions(+), 8 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/.gitignore b/arch/powerpc/kernel/.gitignore index c5f676c3c224..67ebd3003c05 100644 --- a/arch/powerpc/kernel/.gitignore +++ b/arch/powerpc/kernel/.gitignore @@ -1 +1,2 @@ +prom_init_check vmlinux.lds diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index c6ae0e7914bc..66c54443187d 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -188,15 +188,13 @@ extra-$(CONFIG_ALTIVEC) += vector.o extra-$(CONFIG_PPC64) += entry_64.o extra-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE) += prom_init.o -ifdef CONFIG_PPC_OF_BOOT_TRAMPOLINE -$(obj)/built-in.a: prom_init_check +extra-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE) += prom_init_check -quiet_cmd_prom_init_check = CALL $< - cmd_prom_init_check = $(CONFIG_SHELL) $< "$(NM)" "$(obj)/prom_init.o" +quiet_cmd_prom_init_check = PROMCHK $@ + cmd_prom_init_check = $(CONFIG_SHELL) $< "$(NM)" $(obj)/prom_init.o; touch $@ -PHONY += prom_init_check -prom_init_check: $(src)/prom_init_check.sh $(obj)/prom_init.o - $(call cmd,prom_init_check) -endif +$(obj)/prom_init_check: $(src)/prom_init_check.sh $(obj)/prom_init.o FORCE + $(call if_changed,prom_init_check) +targets += prom_init_check clean-files := vmlinux.lds -- cgit v1.2.3 From ca986d7fa7e7f7b3f018f227b999f35e654fbb79 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:16:21 +0530 Subject: powerpc/fadump: move internal macros/definitions to a new header Though asm/fadump.h is meant to be used by other components dealing with FADump, it also has macros/definitions internal to FADump code. Move them to a new header file used within FADump code. This also makes way for refactoring platform specific FADump code. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821313134.5656.6597770626574392140.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 4eab97292cc2..7c55044cf9d4 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -30,6 +30,7 @@ #include #include #include +#include #include static struct fw_dump fw_dump; -- cgit v1.2.3 From 961cf26a98648a294de45ea6f806dc84dfc91197 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:16:36 +0530 Subject: powerpc/fadump: add helper functions Add helper functions to setup & free CPU notes buffer and to find if a given memory area is contiguous. Also, use boolean as return type for the function that finds if boot memory area is contiguous. While at it, save the virtual address of CPU notes buffer instead of physical address as virtual address is used often. 
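As the diff below shows, both existing checks collapse into one range-walk
helper; the resulting call pattern, sketched:

  /* boot memory area: */
  is_fadump_mem_area_contiguous(0, fw_dump.boot_memory_size);

  /* reserved dump area: */
  is_fadump_mem_area_contiguous(fw_dump.reserve_dump_area_start,
                                fw_dump.reserve_dump_area_start +
                                fw_dump.reserve_dump_area_size);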
Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821318971.5656.9281936950510635858.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 127 ++++++++++++++++++++++--------------------- 1 file changed, 66 insertions(+), 61 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 7c55044cf9d4..e2b83a991303 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -201,64 +201,55 @@ int is_fadump_active(void) } /* - * Returns 1, if there are no holes in boot memory area, - * 0 otherwise. + * Returns true, if there are no holes in memory area between d_start to d_end, + * false otherwise. */ -static int is_boot_memory_area_contiguous(void) +static bool is_fadump_mem_area_contiguous(u64 d_start, u64 d_end) { struct memblock_region *reg; - unsigned long tstart, tend; - unsigned long start_pfn = PHYS_PFN(RMA_START); - unsigned long end_pfn = PHYS_PFN(RMA_START + fw_dump.boot_memory_size); - unsigned int ret = 0; + bool ret = false; + u64 start, end; for_each_memblock(memory, reg) { - tstart = max(start_pfn, memblock_region_memory_base_pfn(reg)); - tend = min(end_pfn, memblock_region_memory_end_pfn(reg)); - if (tstart < tend) { - /* Memory hole from start_pfn to tstart */ - if (tstart > start_pfn) + start = max_t(u64, d_start, reg->base); + end = min_t(u64, d_end, (reg->base + reg->size)); + if (d_start < end) { + /* Memory hole from d_start to start */ + if (start > d_start) break; - if (tend == end_pfn) { - ret = 1; + if (end == d_end) { + ret = true; break; } - start_pfn = tend + 1; + d_start = end + 1; } } return ret; } +/* + * Returns true, if there are no holes in boot memory area, + * false otherwise. + */ +static bool is_boot_memory_area_contiguous(void) +{ + return is_fadump_mem_area_contiguous(0, fw_dump.boot_memory_size); +} + /* * Returns true, if there are no holes in reserved memory area, * false otherwise. */ static bool is_reserved_memory_area_contiguous(void) { - struct memblock_region *reg; - unsigned long start, end; - unsigned long d_start = fw_dump.reserve_dump_area_start; - unsigned long d_end = d_start + fw_dump.reserve_dump_area_size; + u64 d_start, d_end; - for_each_memblock(memory, reg) { - start = max(d_start, (unsigned long)reg->base); - end = min(d_end, (unsigned long)(reg->base + reg->size)); - if (d_start < end) { - /* Memory hole from d_start to start */ - if (start > d_start) - break; - - if (end == d_end) - return true; - - d_start = end + 1; - } - } - - return false; + d_start = fw_dump.reserve_dump_area_start; + d_end = d_start + fw_dump.reserve_dump_area_size; + return is_fadump_mem_area_contiguous(d_start, d_end); } /* Print firmware assisted dump configurations for debugging purpose. 
*/ @@ -785,7 +776,7 @@ static void fadump_update_elfcore_header(char *bufp) phdr = (struct elf_phdr *)bufp; if (phdr->p_type == PT_NOTE) { - phdr->p_paddr = fw_dump.cpu_notes_buf; + phdr->p_paddr = __pa(fw_dump.cpu_notes_buf_vaddr); phdr->p_offset = phdr->p_paddr; phdr->p_filesz = fw_dump.cpu_notes_buf_size; phdr->p_memsz = fw_dump.cpu_notes_buf_size; @@ -793,7 +784,7 @@ static void fadump_update_elfcore_header(char *bufp) return; } -static void *fadump_cpu_notes_buf_alloc(unsigned long size) +static void *fadump_alloc_buffer(unsigned long size) { void *vaddr; struct page *page; @@ -811,7 +802,7 @@ static void *fadump_cpu_notes_buf_alloc(unsigned long size) return vaddr; } -static void fadump_cpu_notes_buf_free(unsigned long vaddr, unsigned long size) +static void fadump_free_buffer(unsigned long vaddr, unsigned long size) { struct page *page; unsigned long order, count, i; @@ -824,6 +815,36 @@ static void fadump_cpu_notes_buf_free(unsigned long vaddr, unsigned long size) __free_pages(page, order); } +static s32 fadump_setup_cpu_notes_buf(u32 num_cpus) +{ + /* Allocate buffer to hold cpu crash notes. */ + fw_dump.cpu_notes_buf_size = num_cpus * sizeof(note_buf_t); + fw_dump.cpu_notes_buf_size = PAGE_ALIGN(fw_dump.cpu_notes_buf_size); + fw_dump.cpu_notes_buf_vaddr = + (unsigned long)fadump_alloc_buffer(fw_dump.cpu_notes_buf_size); + if (!fw_dump.cpu_notes_buf_vaddr) { + pr_err("Failed to allocate %ld bytes for CPU notes buffer\n", + fw_dump.cpu_notes_buf_size); + return -ENOMEM; + } + + pr_debug("Allocated buffer for cpu notes of size %ld at 0x%lx\n", + fw_dump.cpu_notes_buf_size, + fw_dump.cpu_notes_buf_vaddr); + return 0; +} + +static void fadump_free_cpu_notes_buf(void) +{ + if (!fw_dump.cpu_notes_buf_vaddr) + return; + + fadump_free_buffer(fw_dump.cpu_notes_buf_vaddr, + fw_dump.cpu_notes_buf_size); + fw_dump.cpu_notes_buf_vaddr = 0; + fw_dump.cpu_notes_buf_size = 0; +} + /* * Read CPU state dump data and convert it into ELF notes. * The CPU dump starts with magic number "REGSAVE". NumCpusOffset should be @@ -870,19 +891,11 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm) vaddr += sizeof(u32); reg_entry = (struct fadump_reg_entry *)vaddr; - /* Allocate buffer to hold cpu crash notes. 
*/ - fw_dump.cpu_notes_buf_size = num_cpus * sizeof(note_buf_t); - fw_dump.cpu_notes_buf_size = PAGE_ALIGN(fw_dump.cpu_notes_buf_size); - note_buf = fadump_cpu_notes_buf_alloc(fw_dump.cpu_notes_buf_size); - if (!note_buf) { - printk(KERN_ERR "Failed to allocate 0x%lx bytes for " - "cpu notes buffer\n", fw_dump.cpu_notes_buf_size); - return -ENOMEM; - } - fw_dump.cpu_notes_buf = __pa(note_buf); + rc = fadump_setup_cpu_notes_buf(num_cpus); + if (rc != 0) + return rc; - pr_debug("Allocated buffer for cpu notes of size %ld at %p\n", - (num_cpus * sizeof(note_buf_t)), note_buf); + note_buf = (u32 *)fw_dump.cpu_notes_buf_vaddr; if (fw_dump.fadumphdr_addr) fdh = __va(fw_dump.fadumphdr_addr); @@ -920,10 +933,7 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm) return 0; error_out: - fadump_cpu_notes_buf_free((unsigned long)__va(fw_dump.cpu_notes_buf), - fw_dump.cpu_notes_buf_size); - fw_dump.cpu_notes_buf = 0; - fw_dump.cpu_notes_buf_size = 0; + fadump_free_cpu_notes_buf(); return rc; } @@ -1470,13 +1480,8 @@ static void fadump_invalidate_release_mem(void) fw_dump.reserve_dump_area_size = get_fadump_area_size(); fadump_release_memory(reserved_area_start, reserved_area_end); - if (fw_dump.cpu_notes_buf) { - fadump_cpu_notes_buf_free( - (unsigned long)__va(fw_dump.cpu_notes_buf), - fw_dump.cpu_notes_buf_size); - fw_dump.cpu_notes_buf = 0; - fw_dump.cpu_notes_buf_size = 0; - } + fadump_free_cpu_notes_buf(); + /* Initialize the kernel dump memory structure for FAD registration. */ init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start); } -- cgit v1.2.3 From 7f0ad11d3fb948a0d7770bd38ae17a51413c3dac Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:16:52 +0530 Subject: powerpc/fadump: declare helper functions in internal header file Declare helper functions, that can be reused by multiple platforms, in the internal header file. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821320487.5656.2660730464212209984.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index e2b83a991303..eb0745e418db 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -234,7 +234,7 @@ static bool is_fadump_mem_area_contiguous(u64 d_start, u64 d_end) * Returns true, if there are no holes in boot memory area, * false otherwise. */ -static bool is_boot_memory_area_contiguous(void) +bool is_fadump_boot_mem_contiguous(void) { return is_fadump_mem_area_contiguous(0, fw_dump.boot_memory_size); } @@ -243,7 +243,7 @@ static bool is_boot_memory_area_contiguous(void) * Returns true, if there are no holes in reserved memory area, * false otherwise. */ -static bool is_reserved_memory_area_contiguous(void) +bool is_fadump_reserved_mem_contiguous(void) { u64 d_start, d_end; @@ -617,9 +617,9 @@ static int register_fw_dump(struct fadump_mem_struct *fdm) " dump. 
Hardware Error(%d).\n", rc); break; case -3: - if (!is_boot_memory_area_contiguous()) + if (!is_fadump_boot_mem_contiguous()) pr_err("Can't have holes in boot memory area while registering fadump\n"); - else if (!is_reserved_memory_area_contiguous()) + else if (!is_fadump_reserved_mem_contiguous()) pr_err("Can't have holes in reserved memory area while" " registering fadump\n"); @@ -749,7 +749,7 @@ fadump_read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs) return reg_entry; } -static u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs) +u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs) { struct elf_prstatus prstatus; @@ -764,7 +764,7 @@ static u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs) return buf; } -static void fadump_update_elfcore_header(char *bufp) +void fadump_update_elfcore_header(char *bufp) { struct elfhdr *elf; struct elf_phdr *phdr; @@ -815,7 +815,7 @@ static void fadump_free_buffer(unsigned long vaddr, unsigned long size) __free_pages(page, order); } -static s32 fadump_setup_cpu_notes_buf(u32 num_cpus) +s32 fadump_setup_cpu_notes_buf(u32 num_cpus) { /* Allocate buffer to hold cpu crash notes. */ fw_dump.cpu_notes_buf_size = num_cpus * sizeof(note_buf_t); @@ -834,7 +834,7 @@ static s32 fadump_setup_cpu_notes_buf(u32 num_cpus) return 0; } -static void fadump_free_cpu_notes_buf(void) +void fadump_free_cpu_notes_buf(void) { if (!fw_dump.cpu_notes_buf_vaddr) return; -- cgit v1.2.3 From 72aa651795f0e9f48bfdb2b2dd0b3e6900351d2a Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:17:56 +0530 Subject: powerpc/fadump: use helper functions to reserve/release cpu notes buffer Use helper functions to simplify memory allocation, pinning down and freeing the memory used for CPU notes buffer. 
Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821323555.5656.2486038022572739622.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 21 ++++++--------------- 1 file changed, 6 insertions(+), 15 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index eb0745e418db..994fc09e9cbf 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -786,33 +786,24 @@ void fadump_update_elfcore_header(char *bufp) static void *fadump_alloc_buffer(unsigned long size) { - void *vaddr; + unsigned long count, i; struct page *page; - unsigned long order, count, i; + void *vaddr; - order = get_order(size); - vaddr = (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, order); + vaddr = alloc_pages_exact(size, GFP_KERNEL | __GFP_ZERO); if (!vaddr) return NULL; - count = 1 << order; + count = PAGE_ALIGN(size) / PAGE_SIZE; page = virt_to_page(vaddr); for (i = 0; i < count; i++) - SetPageReserved(page + i); + mark_page_reserved(page + i); return vaddr; } static void fadump_free_buffer(unsigned long vaddr, unsigned long size) { - struct page *page; - unsigned long order, count, i; - - order = get_order(size); - count = 1 << order; - page = virt_to_page(vaddr); - for (i = 0; i < count; i++) - ClearPageReserved(page + i); - __free_pages(page, order); + free_reserved_area((void *)vaddr, (void *)(vaddr + size), -1, NULL); } s32 fadump_setup_cpu_notes_buf(u32 num_cpus) -- cgit v1.2.3 From 0226e55275e569126882a7befe0b1a1c9bd270aa Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:18:14 +0530 Subject: powerpc/fadump: move rtas specific definitions to platform code Currently, FADump is only supported on pSeries but that is going to change soon with FADump support being added on PowerNV platform. So, move rtas specific definitions to platform code to allow FADump to have multiple platforms support. 
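One visible effect in the diff below is that the REG_ID() macro gives way to
fadump_str_to_u64(), which packs a short register tag into the high bytes of a
u64 so it can be compared against the IDs firmware stores in the register save
area. A sketch matching the semantics the call sites imply ("GPR" becomes
0x4750520000000000; the actual definition lives in the fadump internal header
introduced earlier in this series):

  static inline u64 fadump_str_to_u64(const char *str)
  {
          u64 val = 0;
          int i;

          for (i = 0; i < sizeof(val); i++)
                  val = (*str) ? (val << 8) | *str++ : val << 8;
          return val;
  }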
Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821328494.5656.16219929140866195511.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 96 ++++++++++++++++++++++---------------------- 1 file changed, 49 insertions(+), 47 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 994fc09e9cbf..03f2708cd954 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -33,12 +33,11 @@ #include #include +#include "../platforms/pseries/rtas-fadump.h" + static struct fw_dump fw_dump; -static struct fadump_mem_struct fdm; -static const struct fadump_mem_struct *fdm_active; -#ifdef CONFIG_CMA -static struct cma *fadump_cma; -#endif +static struct rtas_fadump_mem_struct fdm; +static const struct rtas_fadump_mem_struct *fdm_active; static DEFINE_MUTEX(fadump_mutex); struct fad_crash_memory_ranges *crash_memory_ranges; @@ -47,6 +46,8 @@ int crash_mem_ranges; int max_crash_mem_ranges; #ifdef CONFIG_CMA +static struct cma *fadump_cma; + /* * fadump_cma_init() - Initialize CMA area from a fadump reserved memory * @@ -156,11 +157,11 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, u32 type = (u32)of_read_number(sections, 1); switch (type) { - case FADUMP_CPU_STATE_DATA: + case RTAS_FADUMP_CPU_STATE_DATA: fw_dump.cpu_state_data_size = of_read_ulong(§ions[1], 2); break; - case FADUMP_HPTE_REGION: + case RTAS_FADUMP_HPTE_REGION: fw_dump.hpte_region_size = of_read_ulong(§ions[1], 2); break; @@ -271,20 +272,20 @@ static void fadump_show_config(void) pr_debug("Boot memory size : %lx\n", fw_dump.boot_memory_size); } -static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm, +static unsigned long init_fadump_mem_struct(struct rtas_fadump_mem_struct *fdm, unsigned long addr) { if (!fdm) return 0; - memset(fdm, 0, sizeof(struct fadump_mem_struct)); + memset(fdm, 0, sizeof(struct rtas_fadump_mem_struct)); addr = addr & PAGE_MASK; fdm->header.dump_format_version = cpu_to_be32(0x00000001); fdm->header.dump_num_sections = cpu_to_be16(3); fdm->header.dump_status_flag = 0; fdm->header.offset_first_dump_section = - cpu_to_be32((u32)offsetof(struct fadump_mem_struct, cpu_state_data)); + cpu_to_be32((u32)offsetof(struct rtas_fadump_mem_struct, cpu_state_data)); /* * Fields for disk dump option. @@ -300,24 +301,24 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm, /* Kernel dump sections */ /* cpu state data section. 
*/ - fdm->cpu_state_data.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG); - fdm->cpu_state_data.source_data_type = cpu_to_be16(FADUMP_CPU_STATE_DATA); + fdm->cpu_state_data.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG); + fdm->cpu_state_data.source_data_type = cpu_to_be16(RTAS_FADUMP_CPU_STATE_DATA); fdm->cpu_state_data.source_address = 0; fdm->cpu_state_data.source_len = cpu_to_be64(fw_dump.cpu_state_data_size); fdm->cpu_state_data.destination_address = cpu_to_be64(addr); addr += fw_dump.cpu_state_data_size; /* hpte region section */ - fdm->hpte_region.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG); - fdm->hpte_region.source_data_type = cpu_to_be16(FADUMP_HPTE_REGION); + fdm->hpte_region.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG); + fdm->hpte_region.source_data_type = cpu_to_be16(RTAS_FADUMP_HPTE_REGION); fdm->hpte_region.source_address = 0; fdm->hpte_region.source_len = cpu_to_be64(fw_dump.hpte_region_size); fdm->hpte_region.destination_address = cpu_to_be64(addr); addr += fw_dump.hpte_region_size; /* RMA region section */ - fdm->rmr_region.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG); - fdm->rmr_region.source_data_type = cpu_to_be16(FADUMP_REAL_MODE_REGION); + fdm->rmr_region.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG); + fdm->rmr_region.source_data_type = cpu_to_be16(RTAS_FADUMP_REAL_MODE_REGION); fdm->rmr_region.source_address = cpu_to_be64(RMA_START); fdm->rmr_region.source_len = cpu_to_be64(fw_dump.boot_memory_size); fdm->rmr_region.destination_address = cpu_to_be64(addr); @@ -588,7 +589,7 @@ static int __init early_fadump_reserve_mem(char *p) } early_param("fadump_reserve_mem", early_fadump_reserve_mem); -static int register_fw_dump(struct fadump_mem_struct *fdm) +static int register_fw_dump(struct rtas_fadump_mem_struct *fdm) { int rc, err; unsigned int wait_time; @@ -599,7 +600,7 @@ static int register_fw_dump(struct fadump_mem_struct *fdm) do { rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL, FADUMP_REGISTER, fdm, - sizeof(struct fadump_mem_struct)); + sizeof(struct rtas_fadump_mem_struct)); wait_time = rtas_busy_delay_time(rc); if (wait_time) @@ -695,7 +696,7 @@ static inline int fadump_gpr_index(u64 id) int i = -1; char str[3]; - if ((id & GPR_MASK) == REG_ID("GPR")) { + if ((id & GPR_MASK) == fadump_str_to_u64("GPR")) { /* get the digits at the end */ id &= ~GPR_MASK; id >>= 24; @@ -717,30 +718,30 @@ static inline void fadump_set_regval(struct pt_regs *regs, u64 reg_id, i = fadump_gpr_index(reg_id); if (i >= 0) regs->gpr[i] = (unsigned long)reg_val; - else if (reg_id == REG_ID("NIA")) + else if (reg_id == fadump_str_to_u64("NIA")) regs->nip = (unsigned long)reg_val; - else if (reg_id == REG_ID("MSR")) + else if (reg_id == fadump_str_to_u64("MSR")) regs->msr = (unsigned long)reg_val; - else if (reg_id == REG_ID("CTR")) + else if (reg_id == fadump_str_to_u64("CTR")) regs->ctr = (unsigned long)reg_val; - else if (reg_id == REG_ID("LR")) + else if (reg_id == fadump_str_to_u64("LR")) regs->link = (unsigned long)reg_val; - else if (reg_id == REG_ID("XER")) + else if (reg_id == fadump_str_to_u64("XER")) regs->xer = (unsigned long)reg_val; - else if (reg_id == REG_ID("CR")) + else if (reg_id == fadump_str_to_u64("CR")) regs->ccr = (unsigned long)reg_val; - else if (reg_id == REG_ID("DAR")) + else if (reg_id == fadump_str_to_u64("DAR")) regs->dar = (unsigned long)reg_val; - else if (reg_id == REG_ID("DSISR")) + else if (reg_id == fadump_str_to_u64("DSISR")) regs->dsisr = (unsigned long)reg_val; } -static struct fadump_reg_entry* 
-fadump_read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs) +static struct rtas_fadump_reg_entry* +fadump_read_registers(struct rtas_fadump_reg_entry *reg_entry, struct pt_regs *regs) { memset(regs, 0, sizeof(struct pt_regs)); - while (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUEND")) { + while (be64_to_cpu(reg_entry->reg_id) != fadump_str_to_u64("CPUEND")) { fadump_set_regval(regs, be64_to_cpu(reg_entry->reg_id), be64_to_cpu(reg_entry->reg_value)); reg_entry++; @@ -850,10 +851,10 @@ void fadump_free_cpu_notes_buf(void) * state from fadump crash info structure populated by first kernel at the * time of crash. */ -static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm) +static int __init fadump_build_cpu_notes(const struct rtas_fadump_mem_struct *fdm) { - struct fadump_reg_save_area_header *reg_header; - struct fadump_reg_entry *reg_entry; + struct rtas_fadump_reg_save_area_header *reg_header; + struct rtas_fadump_reg_entry *reg_entry; struct fadump_crash_info_header *fdh = NULL; void *vaddr; unsigned long addr; @@ -868,7 +869,8 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm) vaddr = __va(addr); reg_header = vaddr; - if (be64_to_cpu(reg_header->magic_number) != REGSAVE_AREA_MAGIC) { + if (be64_to_cpu(reg_header->magic_number) != + fadump_str_to_u64("REGSAVE")) { printk(KERN_ERR "Unable to read register save area.\n"); return -ENOENT; } @@ -880,7 +882,7 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm) num_cpus = be32_to_cpu(*((__be32 *)(vaddr))); pr_debug("NumCpus : %u\n", num_cpus); vaddr += sizeof(u32); - reg_entry = (struct fadump_reg_entry *)vaddr; + reg_entry = (struct rtas_fadump_reg_entry *)vaddr; rc = fadump_setup_cpu_notes_buf(num_cpus); if (rc != 0) @@ -892,22 +894,22 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm) fdh = __va(fw_dump.fadumphdr_addr); for (i = 0; i < num_cpus; i++) { - if (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUSTRT")) { + if (be64_to_cpu(reg_entry->reg_id) != fadump_str_to_u64("CPUSTRT")) { printk(KERN_ERR "Unable to read CPU state data\n"); rc = -ENOENT; goto error_out; } /* Lower 4 bytes of reg_value contains logical cpu id */ - cpu = be64_to_cpu(reg_entry->reg_value) & FADUMP_CPU_ID_MASK; + cpu = be64_to_cpu(reg_entry->reg_value) & RTAS_FADUMP_CPU_ID_MASK; if (fdh && !cpumask_test_cpu(cpu, &fdh->online_mask)) { - SKIP_TO_NEXT_CPU(reg_entry); + RTAS_FADUMP_SKIP_TO_NEXT_CPU(reg_entry); continue; } pr_debug("Reading register data for cpu %d...\n", cpu); if (fdh && fdh->crashing_cpu == cpu) { regs = fdh->regs; note_buf = fadump_regs_to_elf_notes(note_buf, ®s); - SKIP_TO_NEXT_CPU(reg_entry); + RTAS_FADUMP_SKIP_TO_NEXT_CPU(reg_entry); } else { reg_entry++; reg_entry = fadump_read_registers(reg_entry, ®s); @@ -933,7 +935,7 @@ error_out: * Validate and process the dump data stored by firmware before exporting * it through '/proc/vmcore'. */ -static int __init process_fadump(const struct fadump_mem_struct *fdm_active) +static int __init process_fadump(const struct rtas_fadump_mem_struct *fdm_active) { struct fadump_crash_info_header *fdh; int rc = 0; @@ -942,7 +944,7 @@ static int __init process_fadump(const struct fadump_mem_struct *fdm_active) return -EINVAL; /* Check if the dump data is valid. 
*/ - if ((be16_to_cpu(fdm_active->header.dump_status_flag) == FADUMP_ERROR_FLAG) || + if ((be16_to_cpu(fdm_active->header.dump_status_flag) == RTAS_FADUMP_ERROR_FLAG) || (fdm_active->cpu_state_data.error_flags != 0) || (fdm_active->rmr_region.error_flags != 0)) { printk(KERN_ERR "Dump taken by platform is not valid\n"); @@ -1273,7 +1275,7 @@ static unsigned long init_fadump_header(unsigned long addr) fdh->magic_number = FADUMP_CRASH_INFO_MAGIC; fdh->elfcorehdr_addr = addr; /* We will set the crashing cpu id in crash_fadump() during crash. */ - fdh->crashing_cpu = CPU_UNKNOWN; + fdh->crashing_cpu = FADUMP_CPU_UNKNOWN; return addr; } @@ -1307,7 +1309,7 @@ static int register_fadump(void) return register_fw_dump(&fdm); } -static int fadump_unregister_dump(struct fadump_mem_struct *fdm) +static int fadump_unregister_dump(struct rtas_fadump_mem_struct *fdm) { int rc = 0; unsigned int wait_time; @@ -1318,7 +1320,7 @@ static int fadump_unregister_dump(struct fadump_mem_struct *fdm) do { rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL, FADUMP_UNREGISTER, fdm, - sizeof(struct fadump_mem_struct)); + sizeof(struct rtas_fadump_mem_struct)); wait_time = rtas_busy_delay_time(rc); if (wait_time) @@ -1334,7 +1336,7 @@ static int fadump_unregister_dump(struct fadump_mem_struct *fdm) return 0; } -static int fadump_invalidate_dump(const struct fadump_mem_struct *fdm) +static int fadump_invalidate_dump(const struct rtas_fadump_mem_struct *fdm) { int rc = 0; unsigned int wait_time; @@ -1345,7 +1347,7 @@ static int fadump_invalidate_dump(const struct fadump_mem_struct *fdm) do { rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL, FADUMP_INVALIDATE, fdm, - sizeof(struct fadump_mem_struct)); + sizeof(struct rtas_fadump_mem_struct)); wait_time = rtas_busy_delay_time(rc); if (wait_time) @@ -1561,7 +1563,7 @@ unlock_out: static int fadump_region_show(struct seq_file *m, void *private) { - const struct fadump_mem_struct *fdm_ptr; + const struct rtas_fadump_mem_struct *fdm_ptr; if (!fw_dump.fadump_enabled) return 0; -- cgit v1.2.3 From d3833a7010817f82bff373e26d146e6401c695f4 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:18:40 +0530 Subject: powerpc/fadump: introduce callbacks for platform specific operations Introduce callback functions for platform specific operations like register, unregister, invalidate & such. Also, define place-holders for the same on pSeries platform. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821330286.5656.15538934400074110770.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 45 +------------------------------------------- 1 file changed, 1 insertion(+), 44 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 03f2708cd954..aa342ee53acb 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -112,24 +112,10 @@ static int __init fadump_cma_init(void) { return 1; } int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, int depth, void *data) { - const __be32 *sections; - int i, num_sections; - int size; - const __be32 *token; - if (depth != 1 || strcmp(uname, "rtas") != 0) return 0; - /* - * Check if Firmware Assisted dump is supported. if yes, check - * if dump has been initiated on last reboot. 
- */ - token = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL); - if (!token) - return 1; - - fw_dump.fadump_supported = 1; - fw_dump.ibm_configure_kernel_dump = be32_to_cpu(*token); + rtas_fadump_dt_scan(&fw_dump, node); /* * The 'ibm,kernel-dump' rtas node is present only if there is @@ -139,35 +125,6 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, if (fdm_active) fw_dump.dump_active = 1; - /* Get the sizes required to store dump data for the firmware provided - * dump sections. - * For each dump section type supported, a 32bit cell which defines - * the ID of a supported section followed by two 32 bit cells which - * gives teh size of the section in bytes. - */ - sections = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes", - &size); - - if (!sections) - return 1; - - num_sections = size / (3 * sizeof(u32)); - - for (i = 0; i < num_sections; i++, sections += 3) { - u32 type = (u32)of_read_number(sections, 1); - - switch (type) { - case RTAS_FADUMP_CPU_STATE_DATA: - fw_dump.cpu_state_data_size = - of_read_ulong(§ions[1], 2); - break; - case RTAS_FADUMP_HPTE_REGION: - fw_dump.hpte_region_size = - of_read_ulong(§ions[1], 2); - break; - } - } - return 1; } -- cgit v1.2.3 From 41a65d1618238e63be1439871eaf44dc3c6a737c Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:18:57 +0530 Subject: pseries/fadump: define RTAS register/un-register callback functions Move platform specific register/un-register code, the RTAS calls, to register/un-register callback functions. This would also mean moving code that initializes and prints the platform specific FADump data. Signed-off-by: Hari Bathini Reviewed-by: Mahesh Salgaonkar Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821332856.5656.16380417702046411631.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 165 ++++--------------------------------------- 1 file changed, 15 insertions(+), 150 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index aa342ee53acb..3cf621080427 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -36,7 +36,6 @@ #include "../platforms/pseries/rtas-fadump.h" static struct fw_dump fw_dump; -static struct rtas_fadump_mem_struct fdm; static const struct rtas_fadump_mem_struct *fdm_active; static DEFINE_MUTEX(fadump_mutex); @@ -229,61 +228,6 @@ static void fadump_show_config(void) pr_debug("Boot memory size : %lx\n", fw_dump.boot_memory_size); } -static unsigned long init_fadump_mem_struct(struct rtas_fadump_mem_struct *fdm, - unsigned long addr) -{ - if (!fdm) - return 0; - - memset(fdm, 0, sizeof(struct rtas_fadump_mem_struct)); - addr = addr & PAGE_MASK; - - fdm->header.dump_format_version = cpu_to_be32(0x00000001); - fdm->header.dump_num_sections = cpu_to_be16(3); - fdm->header.dump_status_flag = 0; - fdm->header.offset_first_dump_section = - cpu_to_be32((u32)offsetof(struct rtas_fadump_mem_struct, cpu_state_data)); - - /* - * Fields for disk dump option. - * We are not using disk dump option, hence set these fields to 0. - */ - fdm->header.dd_block_size = 0; - fdm->header.dd_block_offset = 0; - fdm->header.dd_num_blocks = 0; - fdm->header.dd_offset_disk_path = 0; - - /* set 0 to disable an automatic dump-reboot. */ - fdm->header.max_time_auto = 0; - - /* Kernel dump sections */ - /* cpu state data section. 
*/ - fdm->cpu_state_data.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG); - fdm->cpu_state_data.source_data_type = cpu_to_be16(RTAS_FADUMP_CPU_STATE_DATA); - fdm->cpu_state_data.source_address = 0; - fdm->cpu_state_data.source_len = cpu_to_be64(fw_dump.cpu_state_data_size); - fdm->cpu_state_data.destination_address = cpu_to_be64(addr); - addr += fw_dump.cpu_state_data_size; - - /* hpte region section */ - fdm->hpte_region.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG); - fdm->hpte_region.source_data_type = cpu_to_be16(RTAS_FADUMP_HPTE_REGION); - fdm->hpte_region.source_address = 0; - fdm->hpte_region.source_len = cpu_to_be64(fw_dump.hpte_region_size); - fdm->hpte_region.destination_address = cpu_to_be64(addr); - addr += fw_dump.hpte_region_size; - - /* RMA region section */ - fdm->rmr_region.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG); - fdm->rmr_region.source_data_type = cpu_to_be16(RTAS_FADUMP_REAL_MODE_REGION); - fdm->rmr_region.source_address = cpu_to_be64(RMA_START); - fdm->rmr_region.source_len = cpu_to_be64(fw_dump.boot_memory_size); - fdm->rmr_region.destination_address = cpu_to_be64(addr); - addr += fw_dump.boot_memory_size; - - return addr; -} - /** * fadump_calculate_reserve_size(): reserve variable boot area 5% of System RAM * @@ -546,61 +490,6 @@ static int __init early_fadump_reserve_mem(char *p) } early_param("fadump_reserve_mem", early_fadump_reserve_mem); -static int register_fw_dump(struct rtas_fadump_mem_struct *fdm) -{ - int rc, err; - unsigned int wait_time; - - pr_debug("Registering for firmware-assisted kernel dump...\n"); - - /* TODO: Add upper time limit for the delay */ - do { - rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL, - FADUMP_REGISTER, fdm, - sizeof(struct rtas_fadump_mem_struct)); - - wait_time = rtas_busy_delay_time(rc); - if (wait_time) - mdelay(wait_time); - - } while (wait_time); - - err = -EIO; - switch (rc) { - default: - pr_err("Failed to register. Unknown Error(%d).\n", rc); - break; - case -1: - printk(KERN_ERR "Failed to register firmware-assisted kernel" - " dump. Hardware Error(%d).\n", rc); - break; - case -3: - if (!is_fadump_boot_mem_contiguous()) - pr_err("Can't have holes in boot memory area while registering fadump\n"); - else if (!is_fadump_reserved_mem_contiguous()) - pr_err("Can't have holes in reserved memory area while" - " registering fadump\n"); - - printk(KERN_ERR "Failed to register firmware-assisted kernel" - " dump. 
Parameter Error(%d).\n", rc); - err = -EINVAL; - break; - case -9: - printk(KERN_ERR "firmware-assisted kernel dump is already " - " registered."); - fw_dump.dump_registered = 1; - err = -EEXIST; - break; - case 0: - printk(KERN_INFO "firmware-assisted kernel dump registration" - " is successful\n"); - fw_dump.dump_registered = 1; - err = 0; - break; - } - return err; -} - void crash_fadump(struct pt_regs *regs, const char *str) { struct fadump_crash_info_header *fdh = NULL; @@ -643,8 +532,7 @@ void crash_fadump(struct pt_regs *regs, const char *str) fdh->online_mask = *cpu_online_mask; - /* Call ibm,os-term rtas call to trigger firmware assisted dump */ - rtas_os_term((char *)str); + fw_dump.ops->fadump_trigger(fdh, str); } #define GPR_MASK 0xffffff0000000000 @@ -1129,7 +1017,7 @@ static int fadump_setup_crash_memory_ranges(void) static inline unsigned long fadump_relocate(unsigned long paddr) { if (paddr > RMA_START && paddr < fw_dump.boot_memory_size) - return be64_to_cpu(fdm.rmr_region.destination_address) + paddr; + return fw_dump.boot_mem_dest_addr + paddr; else return paddr; } @@ -1202,7 +1090,7 @@ static int fadump_create_elfcore_headers(char *bufp) * to the specified destination_address. Hence set * the correct offset. */ - phdr->p_offset = be64_to_cpu(fdm.rmr_region.destination_address); + phdr->p_offset = fw_dump.boot_mem_dest_addr; } phdr->p_paddr = mbase; @@ -1254,7 +1142,8 @@ static int register_fadump(void) if (ret) return ret; - addr = be64_to_cpu(fdm.rmr_region.destination_address) + be64_to_cpu(fdm.rmr_region.source_len); + addr = fw_dump.fadumphdr_addr; + /* Initialize fadump crash info header. */ addr = init_fadump_header(addr); vaddr = __va(addr); @@ -1263,34 +1152,8 @@ static int register_fadump(void) fadump_create_elfcore_headers(vaddr); /* register the future kernel dump with firmware. */ - return register_fw_dump(&fdm); -} - -static int fadump_unregister_dump(struct rtas_fadump_mem_struct *fdm) -{ - int rc = 0; - unsigned int wait_time; - - pr_debug("Un-register firmware-assisted dump\n"); - - /* TODO: Add upper time limit for the delay */ - do { - rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL, - FADUMP_UNREGISTER, fdm, - sizeof(struct rtas_fadump_mem_struct)); - - wait_time = rtas_busy_delay_time(rc); - if (wait_time) - mdelay(wait_time); - } while (wait_time); - - if (rc) { - printk(KERN_ERR "Failed to un-register firmware-assisted dump." - " unexpected error(%d).\n", rc); - return rc; - } - fw_dump.dump_registered = 0; - return 0; + pr_debug("Registering for firmware-assisted kernel dump...\n"); + return fw_dump.ops->fadump_register(&fw_dump); } static int fadump_invalidate_dump(const struct rtas_fadump_mem_struct *fdm) @@ -1328,7 +1191,7 @@ void fadump_cleanup(void) fadump_invalidate_dump(fdm_active); } else if (fw_dump.dump_registered) { /* Un-register Firmware-assisted dump if it was registered. */ - fadump_unregister_dump(&fdm); + fw_dump.ops->fadump_unregister(&fw_dump); free_crash_memory_ranges(); } } @@ -1433,7 +1296,7 @@ static void fadump_invalidate_release_mem(void) fadump_free_cpu_notes_buf(); /* Initialize the kernel dump memory structure for FAD registration. 
*/ - init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start); + fw_dump.ops->fadump_init_mem_struct(&fw_dump); } static ssize_t fadump_release_memory_store(struct kobject *kobj, @@ -1498,12 +1361,13 @@ static ssize_t fadump_register_store(struct kobject *kobj, goto unlock_out; } /* Un-register Firmware-assisted dump */ - fadump_unregister_dump(&fdm); + pr_debug("Un-register firmware-assisted dump\n"); + fw_dump.ops->fadump_unregister(&fw_dump); break; case 1: if (fw_dump.dump_registered == 1) { /* Un-register Firmware-assisted dump */ - fadump_unregister_dump(&fdm); + fw_dump.ops->fadump_unregister(&fw_dump); } /* Register Firmware-assisted dump */ ret = register_fadump(); @@ -1530,7 +1394,8 @@ static int fadump_region_show(struct seq_file *m, void *private) fdm_ptr = fdm_active; else { mutex_unlock(&fadump_mutex); - fdm_ptr = &fdm; + fw_dump.ops->fadump_region_show(&fw_dump, m); + return 0; } seq_printf(m, @@ -1651,7 +1516,7 @@ int __init setup_fadump(void) } /* Initialize the kernel dump memory structure for FAD registration. */ else if (fw_dump.reserve_dump_area_size) - init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start); + fw_dump.ops->fadump_init_mem_struct(&fw_dump); fadump_init_files(); return 1; -- cgit v1.2.3 From 8255da95e54519bb74638c2448ac17f4b34fe6f5 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:19:28 +0530 Subject: powerpc/fadump: release all the memory above boot memory size Except for the reserved dump area (see Documentation/powerpc/firmware-assisted-dump.rst), which is permanently reserved, all memory above boot memory size is released when the dump is invalidated. Here, boot memory size is the memory required for the kernel to boot successfully when booted with restricted memory (the memory reserved for the capture kernel). Make this a bit more explicit in the code. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821336092.5656.1079046285366041687.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 34 ++++++++++------------------------ 1 file changed, 10 insertions(+), 24 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 3cf621080427..bff909151b44 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -391,6 +391,8 @@ int __init fadump_reserve_mem(void) else memory_boundary = memblock_end_of_DRAM(); + size = get_fadump_area_size(); + fw_dump.reserve_dump_area_size = size; if (fw_dump.dump_active) { pr_info("Firmware-assisted dump is active.\n"); @@ -416,11 +418,14 @@ int __init fadump_reserve_mem(void) be64_to_cpu(fdm_active->rmr_region.destination_address) + be64_to_cpu(fdm_active->rmr_region.source_len); pr_debug("fadumphdr_addr = %pa\n", &fw_dump.fadumphdr_addr); - fw_dump.reserve_dump_area_start = base; - fw_dump.reserve_dump_area_size = size; - } else { - size = get_fadump_area_size(); + /* + * Start address of reserve dump area (permanent reservation) + * for re-registering FADump after dump capture. + */ + fw_dump.reserve_dump_area_start = + be64_to_cpu(fdm_active->cpu_state_data.destination_address); + } else { /* * Reserve memory at an offset closer to bottom of the RAM to * minimize the impact of memory hot-remove operation. 
We can't @@ -447,7 +452,6 @@ int __init fadump_reserve_mem(void) (unsigned long)(memblock_phys_mem_size() >> 20)); fw_dump.reserve_dump_area_start = base; - fw_dump.reserve_dump_area_size = size; return fadump_cma_init(); } return 1; @@ -1265,34 +1269,16 @@ static void fadump_release_memory(unsigned long begin, unsigned long end) static void fadump_invalidate_release_mem(void) { - unsigned long reserved_area_start, reserved_area_end; - unsigned long destination_address; - mutex_lock(&fadump_mutex); if (!fw_dump.dump_active) { mutex_unlock(&fadump_mutex); return; } - destination_address = be64_to_cpu(fdm_active->cpu_state_data.destination_address); fadump_cleanup(); mutex_unlock(&fadump_mutex); - /* - * Save the current reserved memory bounds we will require them - * later for releasing the memory for general use. - */ - reserved_area_start = fw_dump.reserve_dump_area_start; - reserved_area_end = reserved_area_start + - fw_dump.reserve_dump_area_size; - /* - * Setup reserve_dump_area_start and its size so that we can - * reuse this reserved memory for Re-registration. - */ - fw_dump.reserve_dump_area_start = destination_address; - fw_dump.reserve_dump_area_size = get_fadump_area_size(); - - fadump_release_memory(reserved_area_start, reserved_area_end); + fadump_release_memory(fw_dump.boot_memory_size, memblock_end_of_DRAM()); fadump_free_cpu_notes_buf(); /* Initialize the kernel dump memory structure for FAD registration. */ -- cgit v1.2.3 From f35120115b767c49ad8de56dd78c86540a14df5b Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:19:44 +0530 Subject: pseries/fadump: move out platform specific support from generic code Move code that supports processing the crash'ed kernel's memory preserved by firmware to platform specific callback functions. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821337690.5656.13050665924800177744.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 329 ++----------------------------------------- 1 file changed, 14 insertions(+), 315 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index bff909151b44..9d9f7c384a71 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -28,15 +28,11 @@ #include #include #include -#include #include #include #include -#include "../platforms/pseries/rtas-fadump.h" - static struct fw_dump fw_dump; -static const struct rtas_fadump_mem_struct *fdm_active; static DEFINE_MUTEX(fadump_mutex); struct fad_crash_memory_ranges *crash_memory_ranges; @@ -108,22 +104,13 @@ static int __init fadump_cma_init(void) { return 1; } #endif /* CONFIG_CMA */ /* Scan the Firmware Assisted dump configuration details. */ -int __init early_init_dt_scan_fw_dump(unsigned long node, - const char *uname, int depth, void *data) +int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, + int depth, void *data) { if (depth != 1 || strcmp(uname, "rtas") != 0) return 0; rtas_fadump_dt_scan(&fw_dump, node); - - /* - * The 'ibm,kernel-dump' rtas node is present only if there is - * dump data waiting for us. - */ - fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL); - if (fdm_active) - fw_dump.dump_active = 1; - return 1; } @@ -358,9 +345,7 @@ int __init fadump_reserve_mem(void) * If dump is active then we have already calculated the size during * first kernel. 
*/ - if (fdm_active) - fw_dump.boot_memory_size = be64_to_cpu(fdm_active->rmr_region.source_len); - else { + if (!fw_dump.dump_active) { fw_dump.boot_memory_size = fadump_calculate_reserve_size(); #ifdef CONFIG_CMA if (!fw_dump.nocma) @@ -414,17 +399,9 @@ int __init fadump_reserve_mem(void) size = memory_boundary - base; fadump_reserve_crash_area(base, size); - fw_dump.fadumphdr_addr = - be64_to_cpu(fdm_active->rmr_region.destination_address) + - be64_to_cpu(fdm_active->rmr_region.source_len); - pr_debug("fadumphdr_addr = %pa\n", &fw_dump.fadumphdr_addr); - - /* - * Start address of reserve dump area (permanent reservation) - * for re-registering FADump after dump capture. - */ - fw_dump.reserve_dump_area_start = - be64_to_cpu(fdm_active->cpu_state_data.destination_address); + pr_debug("fadumphdr_addr = %#016lx\n", fw_dump.fadumphdr_addr); + pr_debug("Reserve dump area start address: 0x%lx\n", + fw_dump.reserve_dump_area_start); } else { /* * Reserve memory at an offset closer to bottom of the RAM to @@ -539,66 +516,6 @@ void crash_fadump(struct pt_regs *regs, const char *str) fw_dump.ops->fadump_trigger(fdh, str); } -#define GPR_MASK 0xffffff0000000000 -static inline int fadump_gpr_index(u64 id) -{ - int i = -1; - char str[3]; - - if ((id & GPR_MASK) == fadump_str_to_u64("GPR")) { - /* get the digits at the end */ - id &= ~GPR_MASK; - id >>= 24; - str[2] = '\0'; - str[1] = id & 0xff; - str[0] = (id >> 8) & 0xff; - sscanf(str, "%d", &i); - if (i > 31) - i = -1; - } - return i; -} - -static inline void fadump_set_regval(struct pt_regs *regs, u64 reg_id, - u64 reg_val) -{ - int i; - - i = fadump_gpr_index(reg_id); - if (i >= 0) - regs->gpr[i] = (unsigned long)reg_val; - else if (reg_id == fadump_str_to_u64("NIA")) - regs->nip = (unsigned long)reg_val; - else if (reg_id == fadump_str_to_u64("MSR")) - regs->msr = (unsigned long)reg_val; - else if (reg_id == fadump_str_to_u64("CTR")) - regs->ctr = (unsigned long)reg_val; - else if (reg_id == fadump_str_to_u64("LR")) - regs->link = (unsigned long)reg_val; - else if (reg_id == fadump_str_to_u64("XER")) - regs->xer = (unsigned long)reg_val; - else if (reg_id == fadump_str_to_u64("CR")) - regs->ccr = (unsigned long)reg_val; - else if (reg_id == fadump_str_to_u64("DAR")) - regs->dar = (unsigned long)reg_val; - else if (reg_id == fadump_str_to_u64("DSISR")) - regs->dsisr = (unsigned long)reg_val; -} - -static struct rtas_fadump_reg_entry* -fadump_read_registers(struct rtas_fadump_reg_entry *reg_entry, struct pt_regs *regs) -{ - memset(regs, 0, sizeof(struct pt_regs)); - - while (be64_to_cpu(reg_entry->reg_id) != fadump_str_to_u64("CPUEND")) { - fadump_set_regval(regs, be64_to_cpu(reg_entry->reg_id), - be64_to_cpu(reg_entry->reg_value)); - reg_entry++; - } - reg_entry++; - return reg_entry; -} - u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs) { struct elf_prstatus prstatus; @@ -686,147 +603,6 @@ void fadump_free_cpu_notes_buf(void) fw_dump.cpu_notes_buf_size = 0; } -/* - * Read CPU state dump data and convert it into ELF notes. - * The CPU dump starts with magic number "REGSAVE". NumCpusOffset should be - * used to access the data to allow for additional fields to be added without - * affecting compatibility. Each list of registers for a CPU starts with - * "CPUSTRT" and ends with "CPUEND". Each register entry is of 16 bytes, - * 8 Byte ASCII identifier and 8 Byte register value. The register entry - * with identifier "CPUSTRT" and "CPUEND" contains 4 byte cpu id as part - * of register value. For more details refer to PAPR document. 
- * - * Only for the crashing cpu we ignore the CPU dump data and get exact - * state from fadump crash info structure populated by first kernel at the - * time of crash. - */ -static int __init fadump_build_cpu_notes(const struct rtas_fadump_mem_struct *fdm) -{ - struct rtas_fadump_reg_save_area_header *reg_header; - struct rtas_fadump_reg_entry *reg_entry; - struct fadump_crash_info_header *fdh = NULL; - void *vaddr; - unsigned long addr; - u32 num_cpus, *note_buf; - struct pt_regs regs; - int i, rc = 0, cpu = 0; - - if (!fdm->cpu_state_data.bytes_dumped) - return -EINVAL; - - addr = be64_to_cpu(fdm->cpu_state_data.destination_address); - vaddr = __va(addr); - - reg_header = vaddr; - if (be64_to_cpu(reg_header->magic_number) != - fadump_str_to_u64("REGSAVE")) { - printk(KERN_ERR "Unable to read register save area.\n"); - return -ENOENT; - } - pr_debug("--------CPU State Data------------\n"); - pr_debug("Magic Number: %llx\n", be64_to_cpu(reg_header->magic_number)); - pr_debug("NumCpuOffset: %x\n", be32_to_cpu(reg_header->num_cpu_offset)); - - vaddr += be32_to_cpu(reg_header->num_cpu_offset); - num_cpus = be32_to_cpu(*((__be32 *)(vaddr))); - pr_debug("NumCpus : %u\n", num_cpus); - vaddr += sizeof(u32); - reg_entry = (struct rtas_fadump_reg_entry *)vaddr; - - rc = fadump_setup_cpu_notes_buf(num_cpus); - if (rc != 0) - return rc; - - note_buf = (u32 *)fw_dump.cpu_notes_buf_vaddr; - - if (fw_dump.fadumphdr_addr) - fdh = __va(fw_dump.fadumphdr_addr); - - for (i = 0; i < num_cpus; i++) { - if (be64_to_cpu(reg_entry->reg_id) != fadump_str_to_u64("CPUSTRT")) { - printk(KERN_ERR "Unable to read CPU state data\n"); - rc = -ENOENT; - goto error_out; - } - /* Lower 4 bytes of reg_value contains logical cpu id */ - cpu = be64_to_cpu(reg_entry->reg_value) & RTAS_FADUMP_CPU_ID_MASK; - if (fdh && !cpumask_test_cpu(cpu, &fdh->online_mask)) { - RTAS_FADUMP_SKIP_TO_NEXT_CPU(reg_entry); - continue; - } - pr_debug("Reading register data for cpu %d...\n", cpu); - if (fdh && fdh->crashing_cpu == cpu) { - regs = fdh->regs; - note_buf = fadump_regs_to_elf_notes(note_buf, ®s); - RTAS_FADUMP_SKIP_TO_NEXT_CPU(reg_entry); - } else { - reg_entry++; - reg_entry = fadump_read_registers(reg_entry, ®s); - note_buf = fadump_regs_to_elf_notes(note_buf, ®s); - } - } - final_note(note_buf); - - if (fdh) { - pr_debug("Updating elfcore header (%llx) with cpu notes\n", - fdh->elfcorehdr_addr); - fadump_update_elfcore_header((char *)__va(fdh->elfcorehdr_addr)); - } - return 0; - -error_out: - fadump_free_cpu_notes_buf(); - return rc; - -} - -/* - * Validate and process the dump data stored by firmware before exporting - * it through '/proc/vmcore'. - */ -static int __init process_fadump(const struct rtas_fadump_mem_struct *fdm_active) -{ - struct fadump_crash_info_header *fdh; - int rc = 0; - - if (!fdm_active || !fw_dump.fadumphdr_addr) - return -EINVAL; - - /* Check if the dump data is valid. 
*/ - if ((be16_to_cpu(fdm_active->header.dump_status_flag) == RTAS_FADUMP_ERROR_FLAG) || - (fdm_active->cpu_state_data.error_flags != 0) || - (fdm_active->rmr_region.error_flags != 0)) { - printk(KERN_ERR "Dump taken by platform is not valid\n"); - return -EINVAL; - } - if ((fdm_active->rmr_region.bytes_dumped != - fdm_active->rmr_region.source_len) || - !fdm_active->cpu_state_data.bytes_dumped) { - printk(KERN_ERR "Dump taken by platform is incomplete\n"); - return -EINVAL; - } - - /* Validate the fadump crash info header */ - fdh = __va(fw_dump.fadumphdr_addr); - if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { - printk(KERN_ERR "Crash info header is not valid.\n"); - return -EINVAL; - } - - rc = fadump_build_cpu_notes(fdm_active); - if (rc) - return rc; - - /* - * We are done validating dump info and elfcore header is now ready - * to be exported. set elfcorehdr_addr so that vmcore module will - * export the elfcore header through '/proc/vmcore'. - */ - elfcorehdr_addr = fdh->elfcorehdr_addr; - - return 0; -} - static void free_crash_memory_ranges(void) { kfree(crash_memory_ranges); @@ -1116,7 +892,6 @@ static unsigned long init_fadump_header(unsigned long addr) if (!addr) return 0; - fw_dump.fadumphdr_addr = addr; fdh = __va(addr); addr += sizeof(struct fadump_crash_info_header); @@ -1160,39 +935,12 @@ static int register_fadump(void) return fw_dump.ops->fadump_register(&fw_dump); } -static int fadump_invalidate_dump(const struct rtas_fadump_mem_struct *fdm) -{ - int rc = 0; - unsigned int wait_time; - - pr_debug("Invalidating firmware-assisted dump registration\n"); - - /* TODO: Add upper time limit for the delay */ - do { - rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL, - FADUMP_INVALIDATE, fdm, - sizeof(struct rtas_fadump_mem_struct)); - - wait_time = rtas_busy_delay_time(rc); - if (wait_time) - mdelay(wait_time); - } while (wait_time); - - if (rc) { - pr_err("Failed to invalidate firmware-assisted dump registration. Unexpected error (%d).\n", rc); - return rc; - } - fw_dump.dump_active = 0; - fdm_active = NULL; - return 0; -} - void fadump_cleanup(void) { /* Invalidate the registration only if dump is active. */ if (fw_dump.dump_active) { - /* pass the same memory dump structure provided by platform */ - fadump_invalidate_dump(fdm_active); + pr_debug("Invalidating firmware-assisted dump registration\n"); + fw_dump.ops->fadump_invalidate(&fw_dump); } else if (fw_dump.dump_registered) { /* Un-register Firmware-assisted dump if it was registered. 
*/ fw_dump.ops->fadump_unregister(&fw_dump); @@ -1333,7 +1081,7 @@ static ssize_t fadump_register_store(struct kobject *kobj, int ret = 0; int input = -1; - if (!fw_dump.fadump_enabled || fdm_active) + if (!fw_dump.fadump_enabled || fw_dump.dump_active) return -EPERM; if (kstrtoint(buf, 0, &input)) @@ -1346,6 +1094,7 @@ static ssize_t fadump_register_store(struct kobject *kobj, if (fw_dump.dump_registered == 0) { goto unlock_out; } + /* Un-register Firmware-assisted dump */ pr_debug("Un-register firmware-assisted dump\n"); fw_dump.ops->fadump_unregister(&fw_dump); @@ -1370,63 +1119,12 @@ unlock_out: static int fadump_region_show(struct seq_file *m, void *private) { - const struct rtas_fadump_mem_struct *fdm_ptr; - if (!fw_dump.fadump_enabled) return 0; mutex_lock(&fadump_mutex); - if (fdm_active) - fdm_ptr = fdm_active; - else { - mutex_unlock(&fadump_mutex); - fw_dump.ops->fadump_region_show(&fw_dump, m); - return 0; - } - - seq_printf(m, - "CPU : [%#016llx-%#016llx] %#llx bytes, " - "Dumped: %#llx\n", - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address), - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) + - be64_to_cpu(fdm_ptr->cpu_state_data.source_len) - 1, - be64_to_cpu(fdm_ptr->cpu_state_data.source_len), - be64_to_cpu(fdm_ptr->cpu_state_data.bytes_dumped)); - seq_printf(m, - "HPTE: [%#016llx-%#016llx] %#llx bytes, " - "Dumped: %#llx\n", - be64_to_cpu(fdm_ptr->hpte_region.destination_address), - be64_to_cpu(fdm_ptr->hpte_region.destination_address) + - be64_to_cpu(fdm_ptr->hpte_region.source_len) - 1, - be64_to_cpu(fdm_ptr->hpte_region.source_len), - be64_to_cpu(fdm_ptr->hpte_region.bytes_dumped)); - seq_printf(m, - "DUMP: [%#016llx-%#016llx] %#llx bytes, " - "Dumped: %#llx\n", - be64_to_cpu(fdm_ptr->rmr_region.destination_address), - be64_to_cpu(fdm_ptr->rmr_region.destination_address) + - be64_to_cpu(fdm_ptr->rmr_region.source_len) - 1, - be64_to_cpu(fdm_ptr->rmr_region.source_len), - be64_to_cpu(fdm_ptr->rmr_region.bytes_dumped)); - - if (!fdm_active || - (fw_dump.reserve_dump_area_start == - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address))) - goto out; - - /* Dump is active. Show reserved memory region. */ - seq_printf(m, - " : [%#016llx-%#016llx] %#llx bytes, " - "Dumped: %#llx\n", - (unsigned long long)fw_dump.reserve_dump_area_start, - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) - 1, - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) - - fw_dump.reserve_dump_area_start, - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) - - fw_dump.reserve_dump_area_start); -out: - if (fdm_active) - mutex_unlock(&fadump_mutex); + fw_dump.ops->fadump_region_show(&fw_dump, m); + mutex_unlock(&fadump_mutex); return 0; } @@ -1497,12 +1195,13 @@ int __init setup_fadump(void) * if dump process fails then invalidate the registration * and release memory before proceeding for re-registration. */ - if (process_fadump(fdm_active) < 0) + if (fw_dump.ops->fadump_process(&fw_dump) < 0) fadump_invalidate_release_mem(); } /* Initialize the kernel dump memory structure for FAD registration. */ else if (fw_dump.reserve_dump_area_size) fw_dump.ops->fadump_init_mem_struct(&fw_dump); + fadump_init_files(); return 1; -- cgit v1.2.3 From 41df5928721ff4b5f83767cd5e8b77862fc62bb3 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:20:26 +0530 Subject: powerpc/fadump: add fadump support on powernv Add basic callback functions for FADump on PowerNV platform. 
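For reference, the fw_dump.ops callback table that both the pseries (RTAS) and powernv (OPAL) backends fill in looks roughly as follows. This is a sketch only: the members and signatures are inferred from the fw_dump.ops->... call sites in this series, and the authoritative definition lives in a header that is not part of these hunks.

	/*
	 * Sketch: platform callback table driven by the generic fadump code.
	 * Members inferred from fw_dump.ops->... usage in this series; the
	 * real definition may differ in order, types and member count.
	 */
	struct fadump_ops {
		u64	(*fadump_init_mem_struct)(struct fw_dump *fadump_conf);
		u64	(*fadump_get_metadata_size)(void);
		int	(*fadump_setup_metadata)(struct fw_dump *fadump_conf);
		int	(*fadump_register)(struct fw_dump *fadump_conf);
		int	(*fadump_unregister)(struct fw_dump *fadump_conf);
		int	(*fadump_invalidate)(struct fw_dump *fadump_conf);
		void	(*fadump_cleanup)(struct fw_dump *fadump_conf);
		int	(*fadump_process)(struct fw_dump *fadump_conf);
		void	(*fadump_region_show)(struct fw_dump *fadump_conf,
					      struct seq_file *m);
		void	(*fadump_trigger)(struct fadump_crash_info_header *fdh,
					  const char *msg);
	};

With this split, early_init_dt_scan_fw_dump() only has to decide whether a "rtas" or an "ibm,opal" node is present and let the matching *_fadump_dt_scan() helper point fw_dump.ops at its platform's table; everything else in fadump.c stays platform-agnostic.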
Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821342072.5656.4346362203141486452.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 9d9f7c384a71..b17673d8d50b 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -107,11 +107,20 @@ static int __init fadump_cma_init(void) { return 1; } int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, int depth, void *data) { - if (depth != 1 || strcmp(uname, "rtas") != 0) + if (depth != 1) return 0; - rtas_fadump_dt_scan(&fw_dump, node); - return 1; + if (strcmp(uname, "rtas") == 0) { + rtas_fadump_dt_scan(&fw_dump, node); + return 1; + } + + if (strcmp(uname, "ibm,opal") == 0) { + opal_fadump_dt_scan(&fw_dump, node); + return 1; + } + + return 0; } /* -- cgit v1.2.3 From 6abec12c65e8870d8cafe154a86240fe0bcdd4f7 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:20:41 +0530 Subject: powerpc/fadump: improve fadump_reserve_mem() Some code clean-up like using minimal assignments and updating printk messages. Also, add an 'error_out' label for handling error cleanup at one place. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821343485.5656.10202857091553646948.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 48 +++++++++++++++++++++++--------------------- 1 file changed, 25 insertions(+), 23 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index b17673d8d50b..7d47d4bb7d6e 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -338,16 +338,15 @@ static void __init fadump_reserve_crash_area(unsigned long base, int __init fadump_reserve_mem(void) { - unsigned long base, size, memory_boundary; + u64 base, size, mem_boundary; + int ret = 1; if (!fw_dump.fadump_enabled) return 0; if (!fw_dump.fadump_supported) { - printk(KERN_INFO "Firmware-assisted dump is not supported on" - " this hardware\n"); - fw_dump.fadump_enabled = 0; - return 0; + pr_info("Firmware-Assisted Dump is not supported on this hardware\n"); + goto error_out; } /* * Initialize boot memory size @@ -355,7 +354,8 @@ int __init fadump_reserve_mem(void) * first kernel. */ if (!fw_dump.dump_active) { - fw_dump.boot_memory_size = fadump_calculate_reserve_size(); + fw_dump.boot_memory_size = + PAGE_ALIGN(fadump_calculate_reserve_size()); #ifdef CONFIG_CMA if (!fw_dump.nocma) fw_dump.boot_memory_size = @@ -381,10 +381,11 @@ int __init fadump_reserve_mem(void) " dump, now %#016llx\n", memory_limit); } if (memory_limit) - memory_boundary = memory_limit; + mem_boundary = memory_limit; else - memory_boundary = memblock_end_of_DRAM(); + mem_boundary = memblock_end_of_DRAM(); + base = fw_dump.boot_memory_size; size = get_fadump_area_size(); fw_dump.reserve_dump_area_size = size; if (fw_dump.dump_active) { @@ -404,8 +405,7 @@ int __init fadump_reserve_mem(void) * dump is written to disk by userspace tool. This memory * will be released for general use once the dump is saved. 
*/ - base = fw_dump.boot_memory_size; - size = memory_boundary - base; + size = mem_boundary - base; fadump_reserve_crash_area(base, size); pr_debug("fadumphdr_addr = %#016lx\n", fw_dump.fadumphdr_addr); @@ -418,29 +418,31 @@ int __init fadump_reserve_mem(void) * use memblock_find_in_range() here since it doesn't allocate * from bottom to top. */ - for (base = fw_dump.boot_memory_size; - base <= (memory_boundary - size); - base += size) { + while (base <= (mem_boundary - size)) { if (memblock_is_region_memory(base, size) && !memblock_is_region_reserved(base, size)) break; + + base += size; } - if ((base > (memory_boundary - size)) || + + if ((base > (mem_boundary - size)) || memblock_reserve(base, size)) { - pr_err("Failed to reserve memory\n"); - return 0; + pr_err("Failed to reserve memory!\n"); + goto error_out; } - pr_info("Reserved %ldMB of memory at %ldMB for firmware-" - "assisted dump (System RAM: %ldMB)\n", - (unsigned long)(size >> 20), - (unsigned long)(base >> 20), - (unsigned long)(memblock_phys_mem_size() >> 20)); + pr_info("Reserved %lldMB of memory at %#016llx (System RAM: %lldMB)\n", + (size >> 20), base, (memblock_phys_mem_size() >> 20)); fw_dump.reserve_dump_area_start = base; - return fadump_cma_init(); + ret = fadump_cma_init(); } - return 1; + + return ret; +error_out: + fw_dump.fadump_enabled = 0; + return 0; } unsigned long __init arch_reserved_kernel_pages(void) -- cgit v1.2.3 From 742a265accd3e3afcc8e7b17f409c93c1de8be85 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:20:57 +0530 Subject: powerpc/fadump: register kernel metadata address with opal OPAL allows registering address with it in the first kernel and retrieving it after MPIPL. Setup kernel metadata and register its address with OPAL to use it for processing the crash dump. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821345011.5656.13567765019032928471.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 7d47d4bb7d6e..7e7056382d98 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -313,6 +313,10 @@ static unsigned long get_fadump_area_size(void) size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2); size = PAGE_ALIGN(size); + + /* This is to hold kernel metadata on platforms that support it */ + size += (fw_dump.ops->fadump_get_metadata_size ? + fw_dump.ops->fadump_get_metadata_size() : 0); return size; } @@ -348,6 +352,7 @@ int __init fadump_reserve_mem(void) pr_info("Firmware-Assisted Dump is not supported on this hardware\n"); goto error_out; } + /* * Initialize boot memory size * If dump is active then we have already calculated the size during @@ -426,8 +431,21 @@ int __init fadump_reserve_mem(void) base += size; } - if ((base > (mem_boundary - size)) || - memblock_reserve(base, size)) { + if (base > (mem_boundary - size)) { + pr_err("Failed to find memory chunk for reservation!\n"); + goto error_out; + } + fw_dump.reserve_dump_area_start = base; + + /* + * Calculate the kernel metadata address and register it with + * f/w if the platform supports. 
+ */ + if (fw_dump.ops->fadump_setup_metadata && + (fw_dump.ops->fadump_setup_metadata(&fw_dump) < 0)) + goto error_out; + + if (memblock_reserve(base, size)) { pr_err("Failed to reserve memory!\n"); goto error_out; } @@ -435,7 +453,6 @@ pr_info("Reserved %lldMB of memory at %#016llx (System RAM: %lldMB)\n", (size >> 20), base, (memblock_phys_mem_size() >> 20)); - fw_dump.reserve_dump_area_start = base; ret = fadump_cma_init(); } -- cgit v1.2.3 From 2790d01d1e1d22735d848eec55668f7d44417e22 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:21:16 +0530 Subject: powerpc/fadump: reset metadata address during cleanup During kexec boot, the metadata address needs to be reset to avoid errors from interpreting a stale metadata address, in case the kexec'ed kernel crashes before the metadata address can be set up again. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821346629.5656.10783321582005237813.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 7e7056382d98..aab9b4db0363 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -965,6 +965,9 @@ static int register_fadump(void) void fadump_cleanup(void) { + if (!fw_dump.fadump_supported) + return; + /* Invalidate the registration only if dump is active. */ if (fw_dump.dump_active) { pr_debug("Invalidating firmware-assisted dump registration\n"); @@ -974,6 +977,9 @@ void fadump_cleanup(void) fw_dump.ops->fadump_unregister(&fw_dump); free_crash_memory_ranges(); } + + if (fw_dump.ops->fadump_cleanup) + fw_dump.ops->fadump_cleanup(&fw_dump); } static void fadump_free_reserved_memory(unsigned long start_pfn, -- cgit v1.2.3 From a4e2e2ca2f7bddf6d5d788033cc56f40af6e9c5a Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:23:28 +0530 Subject: powerpc/fadump: handle invalidation of crashdump and re-registration Make an OPAL call to indicate that the dump is processed and the metadata area in OPAL can be cleared/released. Also, set up and initialize FADump for re-registration. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821356046.5656.12270927048195494911.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index aab9b4db0363..852ac4761e90 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -1063,7 +1063,13 @@ static void fadump_invalidate_release_mem(void) fadump_release_memory(fw_dump.boot_memory_size, memblock_end_of_DRAM()); fadump_free_cpu_notes_buf(); - /* Initialize the kernel dump memory structure for FAD registration. */ + /* + * Setup kernel metadata and initialize the kernel dump + * memory structure for FADump re-registration. 
+ */ + if (fw_dump.ops->fadump_setup_metadata && + (fw_dump.ops->fadump_setup_metadata(&fw_dump) < 0)) + pr_warn("Failed to setup kernel metadata!\n"); fw_dump.ops->fadump_init_mem_struct(&fw_dump); } -- cgit v1.2.3 From 579ca1a27675485a99da50cd7fedc14232f817c3 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:24:28 +0530 Subject: powerpc/fadump: make use of memblock's bottom up allocation mode Earlier, memblock_find_in_range() was not used to find the memory to be reserved for FADump as bottom up allocation mode was not supported. But since commit 79442ed189acb8b ("mm/memblock.c: introduce bottom-up allocation mode") bottom up allocation mode is supported for memblock. So, use it to find the memory to be reserved for FADump. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821364211.5656.14336025460336135194.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 852ac4761e90..da751402c649 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -342,7 +342,8 @@ static void __init fadump_reserve_crash_area(unsigned long base, int __init fadump_reserve_mem(void) { - u64 base, size, mem_boundary; + bool is_memblock_bottom_up = memblock_bottom_up(); + u64 base, size, mem_boundary, align = PAGE_SIZE; int ret = 1; if (!fw_dump.fadump_enabled) @@ -362,10 +363,11 @@ int __init fadump_reserve_mem(void) fw_dump.boot_memory_size = PAGE_ALIGN(fadump_calculate_reserve_size()); #ifdef CONFIG_CMA - if (!fw_dump.nocma) + if (!fw_dump.nocma) { + align = FADUMP_CMA_ALIGNMENT; fw_dump.boot_memory_size = - ALIGN(fw_dump.boot_memory_size, - FADUMP_CMA_ALIGNMENT); + ALIGN(fw_dump.boot_memory_size, align); + } #endif } @@ -419,19 +421,15 @@ int __init fadump_reserve_mem(void) } else { /* * Reserve memory at an offset closer to bottom of the RAM to - * minimize the impact of memory hot-remove operation. We can't - * use memblock_find_in_range() here since it doesn't allocate - * from bottom to top. + * minimize the impact of memory hot-remove operation. */ - while (base <= (mem_boundary - size)) { - if (memblock_is_region_memory(base, size) && - !memblock_is_region_reserved(base, size)) - break; + memblock_set_bottom_up(true); + base = memblock_find_in_range(base, mem_boundary, size, align); - base += size; - } + /* Restore the previous allocation mode */ + memblock_set_bottom_up(is_memblock_bottom_up); - if (base > (mem_boundary - size)) { + if (!base) { pr_err("Failed to find memory chunk for reservation!\n"); goto error_out; } -- cgit v1.2.3 From e4fc48fb4d34f7e7d42eb980a9c130bb93aba3b9 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:25:05 +0530 Subject: powerpc/fadump: make crash memory ranges array allocation generic Make allocate_crash_memory_ranges() and free_crash_memory_ranges() functions generic to reuse them for memory management of all types of dynamic memory range arrays. This change helps in memory management of reserved ranges array to be added later. 
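As a sketch, the bookkeeping type these generic helpers operate on looks roughly like this. The field names follow the { "crash", NULL, 0, 0, 0 } initializer and the accesses in the diff below; the exact field types and the size of the name buffer are assumptions:

	/* Sketch: matches the { "crash", NULL, 0, 0, 0 } initializer below. */
	struct fadump_mrange_info {
		char				name[16];	/* e.g. "crash", "reserved" */
		struct fadump_memory_range	*mem_ranges;	/* krealloc'ed array */
		u32				mem_ranges_sz;	/* bytes allocated */
		u32				mem_range_cnt;	/* entries in use */
		u32				max_mem_ranges;	/* entries that fit */
	};

Any number of such instances (the crash ranges here, the reserved ranges in a later patch) can then share fadump_alloc_mem_ranges(), fadump_free_mem_ranges() and fadump_add_mem_range().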
Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821369863.5656.4375667005352155892.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 113 +++++++++++++++++++++++-------------------- 1 file changed, 60 insertions(+), 53 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index da751402c649..f95ec1fd797a 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -35,10 +35,7 @@ static struct fw_dump fw_dump; static DEFINE_MUTEX(fadump_mutex); -struct fad_crash_memory_ranges *crash_memory_ranges; -int crash_memory_ranges_size; -int crash_mem_ranges; -int max_crash_mem_ranges; +struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0 }; #ifdef CONFIG_CMA static struct cma *fadump_cma; @@ -629,46 +626,48 @@ void fadump_free_cpu_notes_buf(void) fw_dump.cpu_notes_buf_size = 0; } -static void free_crash_memory_ranges(void) +static void fadump_free_mem_ranges(struct fadump_mrange_info *mrange_info) { - kfree(crash_memory_ranges); - crash_memory_ranges = NULL; - crash_memory_ranges_size = 0; - max_crash_mem_ranges = 0; + kfree(mrange_info->mem_ranges); + mrange_info->mem_ranges = NULL; + mrange_info->mem_ranges_sz = 0; + mrange_info->max_mem_ranges = 0; } /* - * Allocate or reallocate crash memory ranges array in incremental units + * Allocate or reallocate mem_ranges array in incremental units * of PAGE_SIZE. */ -static int allocate_crash_memory_ranges(void) +static int fadump_alloc_mem_ranges(struct fadump_mrange_info *mrange_info) { - struct fad_crash_memory_ranges *new_array; + struct fadump_memory_range *new_array; u64 new_size; - new_size = crash_memory_ranges_size + PAGE_SIZE; - pr_debug("Allocating %llu bytes of memory for crash memory ranges\n", - new_size); + new_size = mrange_info->mem_ranges_sz + PAGE_SIZE; + pr_debug("Allocating %llu bytes of memory for %s memory ranges\n", + new_size, mrange_info->name); - new_array = krealloc(crash_memory_ranges, new_size, GFP_KERNEL); + new_array = krealloc(mrange_info->mem_ranges, new_size, GFP_KERNEL); if (new_array == NULL) { - pr_err("Insufficient memory for setting up crash memory ranges\n"); - free_crash_memory_ranges(); + pr_err("Insufficient memory for setting up %s memory ranges\n", + mrange_info->name); + fadump_free_mem_ranges(mrange_info); return -ENOMEM; } - crash_memory_ranges = new_array; - crash_memory_ranges_size = new_size; - max_crash_mem_ranges = (new_size / - sizeof(struct fad_crash_memory_ranges)); + mrange_info->mem_ranges = new_array; + mrange_info->mem_ranges_sz = new_size; + mrange_info->max_mem_ranges = (new_size / + sizeof(struct fadump_memory_range)); return 0; } -static inline int fadump_add_crash_memory(unsigned long long base, - unsigned long long end) +static inline int fadump_add_mem_range(struct fadump_mrange_info *mrange_info, + u64 base, u64 end) { - u64 start, size; + struct fadump_memory_range *mem_ranges = mrange_info->mem_ranges; bool is_adjacent = false; + u64 start, size; if (base == end) return 0; @@ -677,38 +676,41 @@ static inline int fadump_add_crash_memory(unsigned long long base, * Fold adjacent memory ranges to bring down the memory ranges/ * PT_LOAD segments count. 
*/ - if (crash_mem_ranges) { - start = crash_memory_ranges[crash_mem_ranges - 1].base; - size = crash_memory_ranges[crash_mem_ranges - 1].size; + if (mrange_info->mem_range_cnt) { + start = mem_ranges[mrange_info->mem_range_cnt - 1].base; + size = mem_ranges[mrange_info->mem_range_cnt - 1].size; if ((start + size) == base) is_adjacent = true; } if (!is_adjacent) { /* resize the array on reaching the limit */ - if (crash_mem_ranges == max_crash_mem_ranges) { + if (mrange_info->mem_range_cnt == mrange_info->max_mem_ranges) { int ret; - ret = allocate_crash_memory_ranges(); + ret = fadump_alloc_mem_ranges(mrange_info); if (ret) return ret; + + /* Update to the new resized array */ + mem_ranges = mrange_info->mem_ranges; } start = base; - crash_memory_ranges[crash_mem_ranges].base = start; - crash_mem_ranges++; + mem_ranges[mrange_info->mem_range_cnt].base = start; + mrange_info->mem_range_cnt++; } - crash_memory_ranges[crash_mem_ranges - 1].size = (end - start); - pr_debug("crash_memory_range[%d] [%#016llx-%#016llx], %#llx bytes\n", - (crash_mem_ranges - 1), start, end - 1, (end - start)); + mem_ranges[mrange_info->mem_range_cnt - 1].size = (end - start); + pr_debug("%s_memory_range[%d] [%#016llx-%#016llx], %#llx bytes\n", + mrange_info->name, (mrange_info->mem_range_cnt - 1), + start, end - 1, (end - start)); return 0; } -static int fadump_exclude_reserved_area(unsigned long long start, - unsigned long long end) +static int fadump_exclude_reserved_area(u64 start, u64 end) { - unsigned long long ra_start, ra_end; + u64 ra_start, ra_end; int ret = 0; ra_start = fw_dump.reserve_dump_area_start; @@ -716,18 +718,22 @@ static int fadump_exclude_reserved_area(unsigned long long start, if ((ra_start < end) && (ra_end > start)) { if ((start < ra_start) && (end > ra_end)) { - ret = fadump_add_crash_memory(start, ra_start); + ret = fadump_add_mem_range(&crash_mrange_info, + start, ra_start); if (ret) return ret; - ret = fadump_add_crash_memory(ra_end, end); + ret = fadump_add_mem_range(&crash_mrange_info, + ra_end, end); } else if (start < ra_start) { - ret = fadump_add_crash_memory(start, ra_start); + ret = fadump_add_mem_range(&crash_mrange_info, + start, ra_start); } else if (ra_end < end) { - ret = fadump_add_crash_memory(ra_end, end); + ret = fadump_add_mem_range(&crash_mrange_info, + ra_end, end); } } else - ret = fadump_add_crash_memory(start, end); + ret = fadump_add_mem_range(&crash_mrange_info, start, end); return ret; } @@ -772,11 +778,11 @@ static int fadump_init_elfcore_header(char *bufp) static int fadump_setup_crash_memory_ranges(void) { struct memblock_region *reg; - unsigned long long start, end; + u64 start, end; int ret; pr_debug("Setup crash memory ranges.\n"); - crash_mem_ranges = 0; + crash_mrange_info.mem_range_cnt = 0; /* * add the first memory chunk (RMA_START through boot_memory_size) as @@ -785,13 +791,14 @@ static int fadump_setup_crash_memory_ranges(void) * specified during fadump registration. We need to create a separate * program header for this chunk with the correct offset. 
*/ - ret = fadump_add_crash_memory(RMA_START, fw_dump.boot_memory_size); + ret = fadump_add_mem_range(&crash_mrange_info, + RMA_START, fw_dump.boot_memory_size); if (ret) return ret; for_each_memblock(memory, reg) { - start = (unsigned long long)reg->base; - end = start + (unsigned long long)reg->size; + start = (u64)reg->base; + end = start + (u64)reg->size; /* * skip the first memory chunk that is already added (RMA_START @@ -876,11 +883,11 @@ static int fadump_create_elfcore_headers(char *bufp) /* setup PT_LOAD sections. */ - for (i = 0; i < crash_mem_ranges; i++) { - unsigned long long mbase, msize; - mbase = crash_memory_ranges[i].base; - msize = crash_memory_ranges[i].size; + for (i = 0; i < crash_mrange_info.mem_range_cnt; i++) { + u64 mbase, msize; + mbase = crash_mrange_info.mem_ranges[i].base; + msize = crash_mrange_info.mem_ranges[i].size; if (!msize) continue; @@ -973,7 +980,7 @@ void fadump_cleanup(void) } else if (fw_dump.dump_registered) { /* Un-register Firmware-assisted dump if it was registered. */ fw_dump.ops->fadump_unregister(&fw_dump); - free_crash_memory_ranges(); + fadump_free_mem_ranges(&crash_mrange_info); } if (fw_dump.ops->fadump_cleanup) -- cgit v1.2.3 From dda9dbfeeb7a855a75965b8ba7269f4edb35cde7 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:25:36 +0530 Subject: powerpc/fadump: consider reserved ranges while releasing memory Commit 0962e8004e97 ("powerpc/prom: Scan reserved-ranges node for memory reservations") enabled support to parse the 'reserved-ranges' DT node and reserve kernel memory falling in these ranges for firmware purposes. Along with the preserved area memory, ensure that memory in reserved ranges does not overlap with memory released by the capture kernel after saving the vmcore. Also, fix the off-by-one error in the fadump_release_reserved_area() function while releasing memory. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821371358.5656.6061214942558818661.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 167 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 146 insertions(+), 21 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index f95ec1fd797a..502e49ab4b98 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -36,6 +36,7 @@ static struct fw_dump fw_dump; static DEFINE_MUTEX(fadump_mutex); struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0 }; +struct fadump_mrange_info reserved_mrange_info = { "reserved", NULL, 0, 0, 0 }; #ifdef CONFIG_CMA static struct cma *fadump_cma; @@ -1009,49 +1010,173 @@ static void fadump_free_reserved_memory(unsigned long start_pfn, /* * Skip memory holes and free memory that was actually reserved. 
*/ -static void fadump_release_reserved_area(unsigned long start, unsigned long end) +static void fadump_release_reserved_area(u64 start, u64 end) { + u64 tstart, tend, spfn, epfn; struct memblock_region *reg; - unsigned long tstart, tend; - unsigned long start_pfn = PHYS_PFN(start); - unsigned long end_pfn = PHYS_PFN(end); + spfn = PHYS_PFN(start); + epfn = PHYS_PFN(end); for_each_memblock(memory, reg) { - tstart = max(start_pfn, memblock_region_memory_base_pfn(reg)); - tend = min(end_pfn, memblock_region_memory_end_pfn(reg)); + tstart = max_t(u64, spfn, memblock_region_memory_base_pfn(reg)); + tend = min_t(u64, epfn, memblock_region_memory_end_pfn(reg)); if (tstart < tend) { fadump_free_reserved_memory(tstart, tend); - if (tend == end_pfn) + if (tend == epfn) break; - start_pfn = tend + 1; + spfn = tend; } } } /* - * Release the memory that was reserved in early boot to preserve the memory - * contents. The released memory will be available for general use. + * Sort the mem ranges in-place and merge adjacent ranges + * to minimize the memory ranges count. */ -static void fadump_release_memory(unsigned long begin, unsigned long end) +static void sort_and_merge_mem_ranges(struct fadump_mrange_info *mrange_info) { - unsigned long ra_start, ra_end; + struct fadump_memory_range *mem_ranges; + struct fadump_memory_range tmp_range; + u64 base, size; + int i, j, idx; + + if (!reserved_mrange_info.mem_range_cnt) + return; + + /* Sort the memory ranges */ + mem_ranges = mrange_info->mem_ranges; + for (i = 0; i < mrange_info->mem_range_cnt; i++) { + idx = i; + for (j = (i + 1); j < mrange_info->mem_range_cnt; j++) { + if (mem_ranges[idx].base > mem_ranges[j].base) + idx = j; + } + if (idx != i) { + tmp_range = mem_ranges[idx]; + mem_ranges[idx] = mem_ranges[i]; + mem_ranges[i] = tmp_range; + } + } + + /* Merge adjacent reserved ranges */ + idx = 0; + for (i = 1; i < mrange_info->mem_range_cnt; i++) { + base = mem_ranges[i-1].base; + size = mem_ranges[i-1].size; + if (mem_ranges[i].base == (base + size)) + mem_ranges[idx].size += mem_ranges[i].size; + else { + idx++; + if (i == idx) + continue; + + mem_ranges[idx] = mem_ranges[i]; + } + } + mrange_info->mem_range_cnt = idx + 1; +} + +/* + * Scan reserved-ranges to consider them while reserving/releasing + * memory for FADump. + */ +static inline int fadump_scan_reserved_mem_ranges(void) +{ + struct device_node *root; + const __be32 *prop; + int len, ret = -1; + unsigned long i; + + root = of_find_node_by_path("/"); + if (!root) + return ret; + + prop = of_get_property(root, "reserved-ranges", &len); + if (!prop) + return ret; + + /* + * Each reserved range is an (address,size) pair, 2 cells each, + * totalling 4 cells per range. + */ + for (i = 0; i < len / (sizeof(*prop) * 4); i++) { + u64 base, size; + + base = of_read_number(prop + (i * 4) + 0, 2); + size = of_read_number(prop + (i * 4) + 2, 2); + + if (size) { + ret = fadump_add_mem_range(&reserved_mrange_info, + base, base + size); + if (ret < 0) { + pr_warn("some reserved ranges are ignored!\n"); + break; + } + } + } + + return ret; +} + +/* + * Release the memory that was reserved during early boot to preserve the + * crash'ed kernel's memory contents except reserved dump area (permanent + * reservation) and reserved ranges used by F/W. The released memory will + * be available for general use. 
+ */ +static void fadump_release_memory(u64 begin, u64 end) +{ + u64 ra_start, ra_end, tstart; + int i, ret; + + fadump_scan_reserved_mem_ranges(); ra_start = fw_dump.reserve_dump_area_start; ra_end = ra_start + fw_dump.reserve_dump_area_size; /* - * exclude the dump reserve area. Will reuse it for next - * fadump registration. + * Add reserved dump area to reserved ranges list + * and exclude all these ranges while releasing memory. */ - if (begin < ra_end && end > ra_start) { - if (begin < ra_start) - fadump_release_reserved_area(begin, ra_start); - if (end > ra_end) - fadump_release_reserved_area(ra_end, end); - } else - fadump_release_reserved_area(begin, end); + ret = fadump_add_mem_range(&reserved_mrange_info, ra_start, ra_end); + if (ret != 0) { + /* + * Not enough memory to setup reserved ranges but the system is + * running shortage of memory. So, release all the memory except + * Reserved dump area (reused for next fadump registration). + */ + if (begin < ra_end && end > ra_start) { + if (begin < ra_start) + fadump_release_reserved_area(begin, ra_start); + if (end > ra_end) + fadump_release_reserved_area(ra_end, end); + } else + fadump_release_reserved_area(begin, end); + + return; + } + + /* Get the reserved ranges list in order first. */ + sort_and_merge_mem_ranges(&reserved_mrange_info); + + /* Exclude reserved ranges and release remaining memory */ + tstart = begin; + for (i = 0; i < reserved_mrange_info.mem_range_cnt; i++) { + ra_start = reserved_mrange_info.mem_ranges[i].base; + ra_end = ra_start + reserved_mrange_info.mem_ranges[i].size; + + if (tstart >= ra_end) + continue; + + if (tstart < ra_start) + fadump_release_reserved_area(tstart, ra_start); + tstart = ra_end; + } + + if (tstart < end) + fadump_release_reserved_area(tstart, end); } static void fadump_invalidate_release_mem(void) -- cgit v1.2.3 From b2a815a554a34f0e6fab4526ae762d5528783600 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:25:49 +0530 Subject: powerpc/fadump: improve how crashed kernel's memory is reserved The size parameter to fadump_reserve_crash_area() function is not needed as all the memory above boot memory size must be preserved anyway. Update the function by dropping this redundant parameter. 
Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821374440.5656.2945512543806951766.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 53 ++++++++++++++++++++++++-------------------- 1 file changed, 29 insertions(+), 24 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 502e49ab4b98..6f52a60bc212 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -34,6 +34,8 @@ static struct fw_dump fw_dump; +static void __init fadump_reserve_crash_area(u64 base); + static DEFINE_MUTEX(fadump_mutex); struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0 }; struct fadump_mrange_info reserved_mrange_info = { "reserved", NULL, 0, 0, 0 }; @@ -318,26 +320,6 @@ static unsigned long get_fadump_area_size(void) return size; } -static void __init fadump_reserve_crash_area(unsigned long base, - unsigned long size) -{ - struct memblock_region *reg; - unsigned long mstart, mend, msize; - - for_each_memblock(memory, reg) { - mstart = max_t(unsigned long, base, reg->base); - mend = reg->base + reg->size; - mend = min(base + size, mend); - - if (mstart < mend) { - msize = mend - mstart; - memblock_reserve(mstart, msize); - pr_info("Reserved %ldMB of memory at %#016lx for saving crash dump\n", - (msize >> 20), mstart); - } - } -} - int __init fadump_reserve_mem(void) { bool is_memblock_bottom_up = memblock_bottom_up(); @@ -406,12 +388,11 @@ int __init fadump_reserve_mem(void) #endif /* * If last boot has crashed then reserve all the memory - * above boot_memory_size so that we don't touch it until + * above boot memory size so that we don't touch it until * dump is written to disk by userspace tool. This memory - * will be released for general use once the dump is saved. + * can be released for general use by invalidating fadump. */ - size = mem_boundary - base; - fadump_reserve_crash_area(base, size); + fadump_reserve_crash_area(base); pr_debug("fadumphdr_addr = %#016lx\n", fw_dump.fadumphdr_addr); pr_debug("Reserve dump area start address: 0x%lx\n", @@ -1377,3 +1358,27 @@ int __init setup_fadump(void) return 1; } subsys_initcall(setup_fadump); + +/* Preserve everything above the base address */ +static void __init fadump_reserve_crash_area(u64 base) +{ + struct memblock_region *reg; + u64 mstart, msize; + + for_each_memblock(memory, reg) { + mstart = reg->base; + msize = reg->size; + + if ((mstart + msize) < base) + continue; + + if (mstart < base) { + msize -= (base - mstart); + mstart = base; + } + + pr_info("Reserving %lluMB of memory at %#016llx for preserving crash data", + (msize >> 20), mstart); + memblock_reserve(mstart, msize); + } +} -- cgit v1.2.3 From bec53196adf4791d466adf0e339b61186c7b5283 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:26:03 +0530 Subject: powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel Add a new kernel config option, CONFIG_PRESERVE_FA_DUMP, that ensures crash data from a previously crashed kernel is preserved. This helps in cases where FADump is not enabled but the subsequent memory-preserving kernel boot is likely to process this crash data. One typical use case for this config option is the petitboot kernel. As OPAL allows registering an address with it in the first kernel and retrieving it after MPIPL, use it to store the top of boot memory. A kernel that intends to preserve crash data retrieves it and avoids using memory beyond this address. 
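Conceptually, that handshake looks like the following sketch. The opal_mpipl_* names, tag and signatures here are assumptions based on the OPAL MPIPL tag API; the actual plumbing lives in the powernv platform code, which is not part of the hunks shown below:

	/* Sketch only, not part of this patch. */
	__be64 addr;

	/* First kernel: publish the top of boot memory for the next boot. */
	opal_mpipl_register_tag(OPAL_MPIPL_TAG_BOOT_MEM, fw_dump.boot_mem_top);

	/*
	 * Kernel booted after MPIPL (FADump or PRESERVE_FA_DUMP): retrieve
	 * it and keep everything above that address untouched until the
	 * crash data is processed.
	 */
	if (opal_mpipl_query_tag(OPAL_MPIPL_TAG_BOOT_MEM, &addr) == OPAL_SUCCESS)
		fw_dump.boot_mem_top = be64_to_cpu(addr);
	fadump_reserve_crash_area(fw_dump.boot_mem_top);
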
Move arch_reserved_kernel_pages() function as it is needed for both FA_DUMP and PRESERVE_FA_DUMP configurations. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821375751.5656.11459483669542541602.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/Makefile | 4 +++- arch/powerpc/kernel/fadump.c | 44 +++++++++++++++++++++++++++++++++++++++----- arch/powerpc/kernel/prom.c | 4 ++-- 3 files changed, 44 insertions(+), 8 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 66c54443187d..21ab769e8530 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -79,7 +79,9 @@ obj-$(CONFIG_EEH) += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \ eeh_driver.o eeh_event.o eeh_sysfs.o obj-$(CONFIG_GENERIC_TBSYNC) += smp-tbsync.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o -obj-$(CONFIG_FA_DUMP) += fadump.o +ifneq ($(CONFIG_FA_DUMP)$(CONFIG_PRESERVE_FA_DUMP),) +obj-y += fadump.o +endif ifdef CONFIG_PPC32 obj-$(CONFIG_E500) += idle_e500.o endif diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 6f52a60bc212..645d9d4d9332 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -36,6 +36,7 @@ static struct fw_dump fw_dump; static void __init fadump_reserve_crash_area(u64 base); +#ifndef CONFIG_PRESERVE_FA_DUMP static DEFINE_MUTEX(fadump_mutex); struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0 }; struct fadump_mrange_info reserved_mrange_info = { "reserved", NULL, 0, 0, 0 }; @@ -439,11 +440,6 @@ error_out: return 0; } -unsigned long __init arch_reserved_kernel_pages(void) -{ - return memblock_reserved_size() / PAGE_SIZE; -} - /* Look for fadump= cmdline option. */ static int __init early_fadump_param(char *p) { @@ -1358,6 +1354,39 @@ int __init setup_fadump(void) return 1; } subsys_initcall(setup_fadump); +#else /* !CONFIG_PRESERVE_FA_DUMP */ + +/* Scan the Firmware Assisted dump configuration details. */ +int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, + int depth, void *data) +{ + if ((depth != 1) || (strcmp(uname, "ibm,opal") != 0)) + return 0; + + opal_fadump_dt_scan(&fw_dump, node); + return 1; +} + +/* + * When dump is active but PRESERVE_FA_DUMP is enabled on the kernel, + * preserve crash data. The subsequent memory preserving kernel boot + * is likely to process this crash data. + */ +int __init fadump_reserve_mem(void) +{ + if (fw_dump.dump_active) { + /* + * If last boot has crashed then reserve all the memory + * above boot memory to preserve crash data. 
+ */ + pr_info("Preserving crash data for processing in next boot.\n"); + fadump_reserve_crash_area(fw_dump.boot_mem_top); + } else + pr_debug("FADump-aware kernel..\n"); + + return 1; +} +#endif /* CONFIG_PRESERVE_FA_DUMP */ /* Preserve everything above the base address */ static void __init fadump_reserve_crash_area(u64 base) @@ -1382,3 +1411,8 @@ static void __init fadump_reserve_crash_area(u64 base) memblock_reserve(mstart, msize); } } + +unsigned long __init arch_reserved_kernel_pages(void) +{ + return memblock_reserved_size() / PAGE_SIZE; +} diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 5828f1c81dc9..6620f37abe73 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -708,7 +708,7 @@ void __init early_init_devtree(void *params) of_scan_flat_dt(early_init_dt_scan_ultravisor, NULL); #endif -#ifdef CONFIG_FA_DUMP +#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP) /* scan tree to see if dump is active during last boot */ of_scan_flat_dt(early_init_dt_scan_fw_dump, NULL); #endif @@ -735,7 +735,7 @@ void __init early_init_devtree(void *params) if (PHYSICAL_START > MEMORY_START) memblock_reserve(MEMORY_START, 0x8000); reserve_kdump_trampoline(); -#ifdef CONFIG_FA_DUMP +#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP) /* * If we fail to reserve memory for firmware-assisted dump then * fallback to kexec based kdump. -- cgit v1.2.3 From 7b1b3b48250acbfd7f15ba950d4654b7f02a8300 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:26:59 +0530 Subject: powerpc/fadump: consider f/w load area OPAL loads kernel & initrd at 512MB offset (256MB size), also exported as ibm,opal/dump/fw-load-area. So, if boot memory size of FADump is less than 768MB, kernel memory to be exported as '/proc/vmcore' would be overwritten by f/w while loading kernel & initrd. To avoid such a scenario, enforce a minimum boot memory size of 768MB on OPAL platform and skip using FADump if a newer F/W version loads kernel & initrd above 768MB. Also, irrespective of RMA size, set the minimum boot memory size expected on pseries platform at 320MB. This is to avoid inflating the minimum memory requirements on systems with 512M/1024M RMA size. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821381414.5656.1592867278535469652.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 645d9d4d9332..bd49b1f200bf 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -240,10 +240,10 @@ static void fadump_show_config(void) * that is required for a kernel to boot successfully. * */ -static inline unsigned long fadump_calculate_reserve_size(void) +static inline u64 fadump_calculate_reserve_size(void) { + u64 base, size, bootmem_min; int ret; - unsigned long long base, size; if (fw_dump.reserve_bootvar) pr_warn("'fadump_reserve_mem=' parameter is deprecated in favor of 'crashkernel=' parameter.\n"); @@ -293,7 +293,8 @@ static inline unsigned long fadump_calculate_reserve_size(void) if (memory_limit && size > memory_limit) size = memory_limit; - return (size > MIN_BOOT_MEM ? size : MIN_BOOT_MEM); + bootmem_min = fw_dump.ops->fadump_get_bootmem_min(); + return (size > bootmem_min ? 
size : bootmem_min); } /* @@ -323,8 +324,8 @@ static unsigned long get_fadump_area_size(void) int __init fadump_reserve_mem(void) { + u64 base, size, mem_boundary, bootmem_min, align = PAGE_SIZE; bool is_memblock_bottom_up = memblock_bottom_up(); - u64 base, size, mem_boundary, align = PAGE_SIZE; int ret = 1; if (!fw_dump.fadump_enabled) @@ -350,6 +351,13 @@ int __init fadump_reserve_mem(void) ALIGN(fw_dump.boot_memory_size, align); } #endif + + bootmem_min = fw_dump.ops->fadump_get_bootmem_min(); + if (fw_dump.boot_memory_size < bootmem_min) { + pr_err("Can't enable fadump with boot memory size (0x%lx) less than 0x%llx\n", + fw_dump.boot_memory_size, bootmem_min); + goto error_out; + } } /* -- cgit v1.2.3 From becd91d9c5467160984a0380df72fdf71fee82f6 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:27:26 +0530 Subject: powerpc/fadump: remove RMA_START and RMA_END macros RMA_START is defined as '0' and there is even a BUILD_BUG_ON() to make sure it is never anything else. Remove this macro and use '0' instead as code change is needed anyway when it has to be something else. Also, remove unused RMA_END macro. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821384096.5656.15026984053970204652.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index bd49b1f200bf..2e139259474d 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -128,18 +128,22 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, * If fadump is registered, check if the memory provided * falls within boot memory area and reserved memory area. */ -int is_fadump_memory_area(u64 addr, ulong size) +int is_fadump_memory_area(u64 addr, unsigned long size) { - u64 d_start = fw_dump.reserve_dump_area_start; - u64 d_end = d_start + fw_dump.reserve_dump_area_size; + u64 d_start, d_end; if (!fw_dump.dump_registered) return 0; + if (!size) + return 0; + + d_start = fw_dump.reserve_dump_area_start; + d_end = d_start + fw_dump.reserve_dump_area_size; if (((addr + size) > d_start) && (addr <= d_end)) return 1; - return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size; + return (addr <= fw_dump.boot_memory_size); } int should_fadump_crash(void) @@ -771,14 +775,14 @@ static int fadump_setup_crash_memory_ranges(void) crash_mrange_info.mem_range_cnt = 0; /* - * add the first memory chunk (RMA_START through boot_memory_size) as + * add the first memory chunk (0 through boot_memory_size) as * a separate memory chunk. The reason is, at the time crash firmware * will move the content of this memory chunk to different location * specified during fadump registration. We need to create a separate * program header for this chunk with the correct offset. */ ret = fadump_add_mem_range(&crash_mrange_info, - RMA_START, fw_dump.boot_memory_size); + 0, fw_dump.boot_memory_size); if (ret) return ret; @@ -787,11 +791,9 @@ static int fadump_setup_crash_memory_ranges(void) end = start + (u64)reg->size; /* - * skip the first memory chunk that is already added (RMA_START - * through boot_memory_size). This logic needs a relook if and - * when RMA_START changes to a non-zero value. + * skip the first memory chunk that is already added + * (0 through boot_memory_size). 
*/ - BUILD_BUG_ON(RMA_START != 0); if (start < fw_dump.boot_memory_size) { if (end > fw_dump.boot_memory_size) start = fw_dump.boot_memory_size; @@ -815,7 +817,7 @@ static int fadump_setup_crash_memory_ranges(void) */ static inline unsigned long fadump_relocate(unsigned long paddr) { - if (paddr > RMA_START && paddr < fw_dump.boot_memory_size) + if ((paddr > 0) && (paddr < fw_dump.boot_memory_size)) return fw_dump.boot_mem_dest_addr + paddr; else return paddr; @@ -883,11 +885,11 @@ static int fadump_create_elfcore_headers(char *bufp) phdr->p_flags = PF_R|PF_W|PF_X; phdr->p_offset = mbase; - if (mbase == RMA_START) { + if (mbase == 0) { /* - * The entire RMA region will be moved by firmware - * to the specified destination_address. Hence set - * the correct offset. + * The entire real memory region will be moved by + * firmware to the specified destination_address. + * Hence set the correct offset. */ phdr->p_offset = fw_dump.boot_mem_dest_addr; } -- cgit v1.2.3 From 7dee93a9a8808b3d8595e1cc79ccb8b1a7bc7a77 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Wed, 11 Sep 2019 20:27:39 +0530 Subject: powerpc/fadump: support holes in kernel boot memory area With support to copy multiple kernel boot memory regions owing to copy size limitation, also handle holes in the memory area to be preserved. Support as many as 128 kernel boot memory regions. This allows having an adequate FADump capture kernel size for different scenarios. Signed-off-by: Hari Bathini Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/156821385448.5656.6124791213910877759.stgit@hbathini.in.ibm.com --- arch/powerpc/kernel/fadump.c | 191 ++++++++++++++++++++++++++++++++++++------- 1 file changed, 162 insertions(+), 29 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 2e139259474d..ed59855430b9 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -143,7 +143,7 @@ int is_fadump_memory_area(u64 addr, unsigned long size) if (((addr + size) > d_start) && (addr <= d_end)) return 1; - return (addr <= fw_dump.boot_memory_size); + return (addr <= fw_dump.boot_mem_top); } int should_fadump_crash(void) @@ -194,7 +194,20 @@ static bool is_fadump_mem_area_contiguous(u64 d_start, u64 d_end) */ bool is_fadump_boot_mem_contiguous(void) { - return is_fadump_mem_area_contiguous(0, fw_dump.boot_memory_size); + unsigned long d_start, d_end; + bool ret = false; + int i; + + for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) { + d_start = fw_dump.boot_mem_addr[i]; + d_end = d_start + fw_dump.boot_mem_sz[i]; + + ret = is_fadump_mem_area_contiguous(d_start, d_end); + if (!ret) + break; + } + + return ret; } /* @@ -213,6 +226,8 @@ bool is_fadump_reserved_mem_contiguous(void) /* Print firmware assisted dump configurations for debugging purpose. */ static void fadump_show_config(void) { + int i; + pr_debug("Support for firmware-assisted dump (fadump): %s\n", (fw_dump.fadump_supported ? 
"present" : "no support")); @@ -226,7 +241,13 @@ static void fadump_show_config(void) pr_debug("Dump section sizes:\n"); pr_debug(" CPU state data size: %lx\n", fw_dump.cpu_state_data_size); pr_debug(" HPTE region size : %lx\n", fw_dump.hpte_region_size); - pr_debug("Boot memory size : %lx\n", fw_dump.boot_memory_size); + pr_debug(" Boot memory size : %lx\n", fw_dump.boot_memory_size); + pr_debug(" Boot memory top : %llx\n", fw_dump.boot_mem_top); + pr_debug("Boot memory regions cnt: %llx\n", fw_dump.boot_mem_regs_cnt); + for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) { + pr_debug("[%03d] base = %llx, size = %llx\n", i, + fw_dump.boot_mem_addr[i], fw_dump.boot_mem_sz[i]); + } } /** @@ -326,6 +347,88 @@ static unsigned long get_fadump_area_size(void) return size; } +static int __init add_boot_mem_region(unsigned long rstart, + unsigned long rsize) +{ + int i = fw_dump.boot_mem_regs_cnt++; + + if (fw_dump.boot_mem_regs_cnt > FADUMP_MAX_MEM_REGS) { + fw_dump.boot_mem_regs_cnt = FADUMP_MAX_MEM_REGS; + return 0; + } + + pr_debug("Added boot memory range[%d] [%#016lx-%#016lx)\n", + i, rstart, (rstart + rsize)); + fw_dump.boot_mem_addr[i] = rstart; + fw_dump.boot_mem_sz[i] = rsize; + return 1; +} + +/* + * Firmware usually has a hard limit on the data it can copy per region. + * Honour that by splitting a memory range into multiple regions. + */ +static int __init add_boot_mem_regions(unsigned long mstart, + unsigned long msize) +{ + unsigned long rstart, rsize, max_size; + int ret = 1; + + rstart = mstart; + max_size = fw_dump.max_copy_size ? fw_dump.max_copy_size : msize; + while (msize) { + if (msize > max_size) + rsize = max_size; + else + rsize = msize; + + ret = add_boot_mem_region(rstart, rsize); + if (!ret) + break; + + msize -= rsize; + rstart += rsize; + } + + return ret; +} + +static int __init fadump_get_boot_mem_regions(void) +{ + unsigned long base, size, cur_size, hole_size, last_end; + unsigned long mem_size = fw_dump.boot_memory_size; + struct memblock_region *reg; + int ret = 1; + + fw_dump.boot_mem_regs_cnt = 0; + + last_end = 0; + hole_size = 0; + cur_size = 0; + for_each_memblock(memory, reg) { + base = reg->base; + size = reg->size; + hole_size += (base - last_end); + + if ((cur_size + size) >= mem_size) { + size = (mem_size - cur_size); + ret = add_boot_mem_regions(base, size); + break; + } + + mem_size -= size; + cur_size += size; + ret = add_boot_mem_regions(base, size); + if (!ret) + break; + + last_end = base + size; + } + fw_dump.boot_mem_top = PAGE_ALIGN(fw_dump.boot_memory_size + hole_size); + + return ret; +} + int __init fadump_reserve_mem(void) { u64 base, size, mem_boundary, bootmem_min, align = PAGE_SIZE; @@ -362,6 +465,11 @@ int __init fadump_reserve_mem(void) fw_dump.boot_memory_size, bootmem_min); goto error_out; } + + if (!fadump_get_boot_mem_regions()) { + pr_err("Too many holes in boot memory area to enable fadump\n"); + goto error_out; + } } /* @@ -385,7 +493,7 @@ int __init fadump_reserve_mem(void) else mem_boundary = memblock_end_of_DRAM(); - base = fw_dump.boot_memory_size; + base = fw_dump.boot_mem_top; size = get_fadump_area_size(); fw_dump.reserve_dump_area_size = size; if (fw_dump.dump_active) { @@ -769,34 +877,35 @@ static int fadump_setup_crash_memory_ranges(void) { struct memblock_region *reg; u64 start, end; - int ret; + int i, ret; pr_debug("Setup crash memory ranges.\n"); crash_mrange_info.mem_range_cnt = 0; /* - * add the first memory chunk (0 through boot_memory_size) as - * a separate memory chunk. 
The reason is, at the time crash firmware - * will move the content of this memory chunk to different location - * specified during fadump registration. We need to create a separate - * program header for this chunk with the correct offset. + * Boot memory region(s) registered with firmware are moved to + * different location at the time of crash. Create separate program + * header(s) for this memory chunk(s) with the correct offset. */ - ret = fadump_add_mem_range(&crash_mrange_info, - 0, fw_dump.boot_memory_size); - if (ret) - return ret; + for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) { + start = fw_dump.boot_mem_addr[i]; + end = start + fw_dump.boot_mem_sz[i]; + ret = fadump_add_mem_range(&crash_mrange_info, start, end); + if (ret) + return ret; + } for_each_memblock(memory, reg) { start = (u64)reg->base; end = start + (u64)reg->size; /* - * skip the first memory chunk that is already added - * (0 through boot_memory_size). + * skip the memory chunk that is already added + * (0 through boot_memory_top). */ - if (start < fw_dump.boot_memory_size) { - if (end > fw_dump.boot_memory_size) - start = fw_dump.boot_memory_size; + if (start < fw_dump.boot_mem_top) { + if (end > fw_dump.boot_mem_top) + start = fw_dump.boot_mem_top; else continue; } @@ -817,17 +926,35 @@ static int fadump_setup_crash_memory_ranges(void) */ static inline unsigned long fadump_relocate(unsigned long paddr) { - if ((paddr > 0) && (paddr < fw_dump.boot_memory_size)) - return fw_dump.boot_mem_dest_addr + paddr; - else - return paddr; + unsigned long raddr, rstart, rend, rlast, hole_size; + int i; + + hole_size = 0; + rlast = 0; + raddr = paddr; + for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) { + rstart = fw_dump.boot_mem_addr[i]; + rend = rstart + fw_dump.boot_mem_sz[i]; + hole_size += (rstart - rlast); + + if (paddr >= rstart && paddr < rend) { + raddr += fw_dump.boot_mem_dest_addr - hole_size; + break; + } + + rlast = rend; + } + + pr_debug("vmcoreinfo: paddr = 0x%lx, raddr = 0x%lx\n", paddr, raddr); + return raddr; } static int fadump_create_elfcore_headers(char *bufp) { - struct elfhdr *elf; + unsigned long long raddr, offset; struct elf_phdr *phdr; - int i; + struct elfhdr *elf; + int i, j; fadump_init_elfcore_header(bufp); elf = (struct elfhdr *)bufp; @@ -870,7 +997,9 @@ static int fadump_create_elfcore_headers(char *bufp) (elf->e_phnum)++; /* setup PT_LOAD sections. */ - + j = 0; + offset = 0; + raddr = fw_dump.boot_mem_addr[0]; for (i = 0; i < crash_mrange_info.mem_range_cnt; i++) { u64 mbase, msize; @@ -885,13 +1014,17 @@ static int fadump_create_elfcore_headers(char *bufp) phdr->p_flags = PF_R|PF_W|PF_X; phdr->p_offset = mbase; - if (mbase == 0) { + if (mbase == raddr) { /* * The entire real memory region will be moved by * firmware to the specified destination_address. * Hence set the correct offset. 
 */
-		phdr->p_offset	= fw_dump.boot_mem_dest_addr;
+		phdr->p_offset = fw_dump.boot_mem_dest_addr + offset;
+			if (j < (fw_dump.boot_mem_regs_cnt - 1)) {
+				offset += fw_dump.boot_mem_sz[j];
+				raddr = fw_dump.boot_mem_addr[++j];
+			}
 		}
 
 		phdr->p_paddr = mbase;
@@ -1177,7 +1310,7 @@ static void fadump_invalidate_release_mem(void)
 	fadump_cleanup();
 	mutex_unlock(&fadump_mutex);
 
-	fadump_release_memory(fw_dump.boot_memory_size, memblock_end_of_DRAM());
+	fadump_release_memory(fw_dump.boot_mem_top, memblock_end_of_DRAM());
 	fadump_free_cpu_notes_buf();
 
 	/*
-- cgit v1.2.3

From e7ca44ed3ba77fc26cf32650bb71584896662474 Mon Sep 17 00:00:00 2001
From: Ganesh Goudar
Date: Wed, 4 Sep 2019 13:29:49 +0530
Subject: powerpc: dump kernel log before carrying out fadump or kdump

Since commit 4388c9b3a6ee ("powerpc: Do not send system reset request
through the oops path"), the pstore dmesg file is not updated when a
dump is triggered from the HMC. That commit modified the system reset
(sreset) handler to invoke fadump or kdump (if configured) without
pushing dmesg to pstore. This leaves pstore with stale dmesg data,
which won't be much help if kdump fails to capture the dump. This
patch fixes that by calling kmsg_dump() before heading to fadump or
kdump.

Fixes: 4388c9b3a6ee ("powerpc: Do not send system reset request through the oops path")
Reviewed-by: Mahesh Salgaonkar
Reviewed-by: Nicholas Piggin
Signed-off-by: Ganesh Goudar
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190904075949.15607-1-ganeshgr@linux.ibm.com
---
 arch/powerpc/kernel/traps.c | 1 +
 1 file changed, 1 insertion(+)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 11caa0291254..82f43535e686 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -472,6 +472,7 @@ void system_reset_exception(struct pt_regs *regs)
 	if (debugger(regs))
 		goto out;
 
+	kmsg_dump(KMSG_DUMP_OOPS);
 	/*
 	 * A system reset is a request to dump, so we always send
 	 * it through the crashdump code (if fadump or kdump are
-- cgit v1.2.3

From 370011a27028d6f05e598ed6211a0ca2dc0213f7 Mon Sep 17 00:00:00 2001
From: "Naveen N. Rao"
Date: Thu, 5 Sep 2019 23:50:29 +0530
Subject: powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR

This associates entries in the ftrace_ret_stack with corresponding
stack frames, enabling more robust stack unwinding. Also update the
only user of ftrace_graph_ret_addr() to pass the stack pointer.

Signed-off-by: Naveen N. Rao
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/0224f2d0971b069c678e2ff678cfc2cd1e114cfe.1567707399.git.naveen.n.rao@linux.vnet.ibm.com
---
 arch/powerpc/kernel/stacktrace.c               | 2 +-
 arch/powerpc/kernel/trace/ftrace.c             | 5 +++--
 arch/powerpc/kernel/trace/ftrace_32.S          | 1 +
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 1 +
 arch/powerpc/kernel/trace/ftrace_64_pg.S       | 1 +
 5 files changed, 7 insertions(+), 3 deletions(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 1e2276963f6d..e2a46cfed5fd 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -182,7 +182,7 @@ static int __save_stack_trace_tsk_reliable(struct task_struct *tsk,
 		 * FIXME: IMHO these tests do not belong in
 		 * arch-dependent code, they are generic.
*/ - ip = ftrace_graph_ret_addr(tsk, &graph_idx, ip, NULL); + ip = ftrace_graph_ret_addr(tsk, &graph_idx, ip, stack); #ifdef CONFIG_KPROBES /* * Mark stacktraces with kretprobed functions on them diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c index be1ca98fce5c..7ea0ca044b65 100644 --- a/arch/powerpc/kernel/trace/ftrace.c +++ b/arch/powerpc/kernel/trace/ftrace.c @@ -944,7 +944,8 @@ int ftrace_disable_ftrace_graph_caller(void) * Hook the return address and push it in the stack of return addrs * in current thread info. Return the address we want to divert to. */ -unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip) +unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip, + unsigned long sp) { unsigned long return_hooker; @@ -956,7 +957,7 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip) return_hooker = ppc_function_entry(return_to_handler); - if (!function_graph_enter(parent, ip, 0, NULL)) + if (!function_graph_enter(parent, ip, 0, (unsigned long *)sp)) parent = return_hooker; out: return parent; diff --git a/arch/powerpc/kernel/trace/ftrace_32.S b/arch/powerpc/kernel/trace/ftrace_32.S index 183f608efb81..e023ae59c429 100644 --- a/arch/powerpc/kernel/trace/ftrace_32.S +++ b/arch/powerpc/kernel/trace/ftrace_32.S @@ -50,6 +50,7 @@ _GLOBAL(ftrace_stub) #ifdef CONFIG_FUNCTION_GRAPH_TRACER _GLOBAL(ftrace_graph_caller) + addi r5, r1, 48 /* load r4 with local address */ lwz r4, 44(r1) subi r4, r4, MCOUNT_INSN_SIZE diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S index 74acbf16a666..f9fd5f743eba 100644 --- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S +++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S @@ -294,6 +294,7 @@ _GLOBAL(ftrace_graph_caller) std r2, 24(r1) ld r2, PACATOC(r13) /* get kernel TOC in r2 */ + addi r5, r1, 112 mfctr r4 /* ftrace_caller has moved local addr here */ std r4, 40(r1) mflr r3 /* ftrace_caller has restored LR from stack */ diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.S b/arch/powerpc/kernel/trace/ftrace_64_pg.S index e41a7d13c99c..6708e24db0ab 100644 --- a/arch/powerpc/kernel/trace/ftrace_64_pg.S +++ b/arch/powerpc/kernel/trace/ftrace_64_pg.S @@ -41,6 +41,7 @@ _GLOBAL(ftrace_stub) #ifdef CONFIG_FUNCTION_GRAPH_TRACER _GLOBAL(ftrace_graph_caller) + addi r5, r1, 112 /* load r4 with local address */ ld r4, 128(r1) subi r4, r4, MCOUNT_INSN_SIZE -- cgit v1.2.3 From 7c1bb6bbf75d8ca5ec878627d3170effcaf54f27 Mon Sep 17 00:00:00 2001 From: "Naveen N. Rao" Date: Thu, 5 Sep 2019 23:50:30 +0530 Subject: powerpc: Use ftrace_graph_ret_addr() when unwinding With support for HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, ftrace_graph_ret_addr() provides more robust unwinding when function graph is in use. Update show_stack() to use the same. 
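For background, when an architecture selects
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, the generic helper matches on the
stack slot that holds the return address rather than on a fragile
per-call counter. A simplified sketch of the generic
kernel/trace/fgraph.c logic (abridged for illustration; not part of
this patch, and the function-descriptor handling is elided):

  unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
                                      unsigned long ret, unsigned long *retp)
  {
          int i, index = task->curr_ret_stack;

          /* Not a return_to_handler hit: nothing to translate. */
          if (ret != (unsigned long)return_to_handler || index < 0)
                  return ret;

          /* Trust only the entry whose saved stack slot (retp)
           * matches the frame currently being unwound. */
          for (i = 0; i <= index; i++)
                  if (task->ret_stack[i].retp == retp)
                          return task->ret_stack[i].ret;

          return ret;
  }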
With dump_stack() added to sysrq_sysctl_handler(), before this patch: root@(none):/sys/kernel/debug/tracing# cat /proc/sys/kernel/sysrq CPU: 0 PID: 218 Comm: cat Not tainted 5.3.0-rc7-00868-g8453ad4a078c-dirty #20 Call Trace: [c0000000d1e13c30] [c00000000006ab98] return_to_handler+0x0/0x40 (dump_stack+0xe8/0x164) (unreliable) [c0000000d1e13c80] [c000000000145680] sysrq_sysctl_handler+0x48/0xb8 [c0000000d1e13cd0] [c00000000006ab98] return_to_handler+0x0/0x40 (proc_sys_call_handler+0x274/0x2a0) [c0000000d1e13d60] [c00000000006ab98] return_to_handler+0x0/0x40 (return_to_handler+0x0/0x40) [c0000000d1e13d80] [c00000000006ab98] return_to_handler+0x0/0x40 (__vfs_read+0x3c/0x70) [c0000000d1e13dd0] [c00000000006ab98] return_to_handler+0x0/0x40 (vfs_read+0xb8/0x1b0) [c0000000d1e13e20] [c00000000006ab98] return_to_handler+0x0/0x40 (ksys_read+0x7c/0x140) After this patch: Call Trace: [c0000000d1e33c30] [c00000000006ab58] return_to_handler+0x0/0x40 (dump_stack+0xe8/0x164) (unreliable) [c0000000d1e33c80] [c000000000145680] sysrq_sysctl_handler+0x48/0xb8 [c0000000d1e33cd0] [c00000000006ab58] return_to_handler+0x0/0x40 (proc_sys_call_handler+0x274/0x2a0) [c0000000d1e33d60] [c00000000006ab58] return_to_handler+0x0/0x40 (__vfs_read+0x3c/0x70) [c0000000d1e33d80] [c00000000006ab58] return_to_handler+0x0/0x40 (vfs_read+0xb8/0x1b0) [c0000000d1e33dd0] [c00000000006ab58] return_to_handler+0x0/0x40 (ksys_read+0x7c/0x140) [c0000000d1e33e20] [c00000000006ab58] return_to_handler+0x0/0x40 (system_call+0x5c/0x68) Signed-off-by: Naveen N. Rao Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/dc89c9a887121342d9c7819482c3dabdece2a323.1567707399.git.naveen.n.rao@linux.vnet.ibm.com --- arch/powerpc/kernel/process.c | 19 ++++++------------- 1 file changed, 6 insertions(+), 13 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 24621e7e5033..f289bdd2b562 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -2047,10 +2047,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack) int count = 0; int firstframe = 1; #ifdef CONFIG_FUNCTION_GRAPH_TRACER - struct ftrace_ret_stack *ret_stack; - extern void return_to_handler(void); - unsigned long rth = (unsigned long)return_to_handler; - int curr_frame = 0; + unsigned long ret_addr; + int ftrace_idx = 0; #endif if (tsk == NULL) @@ -2079,15 +2077,10 @@ void show_stack(struct task_struct *tsk, unsigned long *stack) if (!firstframe || ip != lr) { printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip); #ifdef CONFIG_FUNCTION_GRAPH_TRACER - if ((ip == rth) && curr_frame >= 0) { - ret_stack = ftrace_graph_get_ret_stack(current, - curr_frame++); - if (ret_stack) - pr_cont(" (%pS)", - (void *)ret_stack->ret); - else - curr_frame = -1; - } + ret_addr = ftrace_graph_ret_addr(current, + &ftrace_idx, ip, stack); + if (ret_addr != ip) + pr_cont(" (%pS)", (void *)ret_addr); #endif if (firstframe) pr_cont(" (unreliable)"); -- cgit v1.2.3 From d9101bfa6adc831bda8836c4d774820553c14942 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Wed, 18 Sep 2019 20:23:28 +0530 Subject: powerpc/mm/mce: Keep irqs disabled during lockless page table walk __find_linux_mm_pte() returns a page table entry pointer after walking the page table without holding locks. To make it safe against a THP split and/or collapse, we disable interrupts around the lockless page table walk. However we need to keep interrupts disabled as long as we use the page table entry pointer that is returned. 
Fix addr_to_pfn() to do that. Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors") Signed-off-by: Aneesh Kumar K.V [mpe: Rearrange code slightly and tweak change log wording] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190918145328.28602-1-aneesh.kumar@linux.ibm.com --- arch/powerpc/kernel/mce_power.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index 356e7b99f661..1cbf7f1a4e3d 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -29,7 +29,7 @@ unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) { pte_t *ptep; unsigned int shift; - unsigned long flags; + unsigned long pfn, flags; struct mm_struct *mm; if (user_mode(regs)) @@ -39,18 +39,22 @@ unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) local_irq_save(flags); ptep = __find_linux_pte(mm->pgd, addr, NULL, &shift); - local_irq_restore(flags); - if (!ptep || pte_special(*ptep)) - return ULONG_MAX; + if (!ptep || pte_special(*ptep)) { + pfn = ULONG_MAX; + goto out; + } - if (shift > PAGE_SHIFT) { + if (shift <= PAGE_SHIFT) + pfn = pte_pfn(*ptep); + else { unsigned long rpnmask = (1ul << shift) - PAGE_SIZE; - - return pte_pfn(__pte(pte_val(*ptep) | (addr & rpnmask))); + pfn = pte_pfn(__pte(pte_val(*ptep) | (addr & rpnmask))); } - return pte_pfn(*ptep); +out: + local_irq_restore(flags); + return pfn; } /* flush SLBs and reload */ -- cgit v1.2.3 From 13c7bb3c57dcfe779ea5b4b083f6c47753cc5327 Mon Sep 17 00:00:00 2001 From: Jordan Niethe Date: Tue, 17 Sep 2019 10:46:05 +1000 Subject: powerpc/64s: Set reserved PCR bits Currently the reserved bits of the Processor Compatibility Register (PCR) are cleared as per the Programming Note in Section 1.3.3 of version 3.0B of the Power ISA. This causes all new architecture features to be made available when running on newer processors with new architecture features added to the PCR as bits must be set to disable a given feature. For example to disable new features added as part of Version 2.07 of the ISA the corresponding bit in the PCR needs to be set. As new processor features generally require explicit kernel support they should be disabled until such support is implemented. Therefore kernels should set all unknown/reserved bits in the PCR such that any new architecture features which the kernel does not currently know about get disabled. An update is planned to the ISA to clarify that the PCR is an exception to the Programming Note on reserved bits in Section 1.3.3. 
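The mask itself is added to reg.h, which is outside this log's
arch/powerpc/kernel diffstat; the idea, as a hedged sketch (bit
positions and values here are illustrative, the authoritative
definitions are in reg.h):

  /* Compatibility-disable bits the kernel knows how to manage: */
  #define PCR_VEC_DIS     (1ul << (63 - 0))  /* illustrative: VMX disable */
  #define PCR_VSX_DIS     (1ul << (63 - 1))  /* illustrative: VSX disable */
  #define PCR_TM_DIS      (1ul << (63 - 2))  /* illustrative: TM disable */
  #define PCR_HIGH_BITS   (PCR_VEC_DIS | PCR_VSX_DIS | PCR_TM_DIS)
  #define PCR_ARCH_207    0x8ul              /* illustrative: ISA v2.07 compat */
  #define PCR_ARCH_206    0x4ul              /* illustrative: ISA v2.06 compat */
  #define PCR_ARCH_205    0x2ul              /* illustrative: ISA v2.05 compat */
  #define PCR_LOW_BITS    (PCR_ARCH_207 | PCR_ARCH_206 | PCR_ARCH_205)

  /* Everything else is reserved and must stay set so that unknown
   * features default to disabled: */
  #define PCR_MASK        ~(PCR_HIGH_BITS | PCR_LOW_BITS)

With that in place, every PCR write in the patch below stores PCR_MASK
(or a compat value OR'd with it) instead of zero.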
Signed-off-by: Alistair Popple Signed-off-by: Jordan Niethe Tested-by: Joel Stanley Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190917004605.22471-2-alistair@popple.id.au --- arch/powerpc/kernel/cpu_setup_power.S | 6 ++++++ arch/powerpc/kernel/dt_cpu_ftrs.c | 3 ++- 2 files changed, 8 insertions(+), 1 deletion(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S index 3239a9fe6c1c..a460298c7ddb 100644 --- a/arch/powerpc/kernel/cpu_setup_power.S +++ b/arch/powerpc/kernel/cpu_setup_power.S @@ -23,6 +23,7 @@ _GLOBAL(__setup_cpu_power7) beqlr li r0,0 mtspr SPRN_LPID,r0 + LOAD_REG_IMMEDIATE(r0, PCR_MASK) mtspr SPRN_PCR,r0 mfspr r3,SPRN_LPCR li r4,(LPCR_LPES1 >> LPCR_LPES_SH) @@ -37,6 +38,7 @@ _GLOBAL(__restore_cpu_power7) beqlr li r0,0 mtspr SPRN_LPID,r0 + LOAD_REG_IMMEDIATE(r0, PCR_MASK) mtspr SPRN_PCR,r0 mfspr r3,SPRN_LPCR li r4,(LPCR_LPES1 >> LPCR_LPES_SH) @@ -54,6 +56,7 @@ _GLOBAL(__setup_cpu_power8) beqlr li r0,0 mtspr SPRN_LPID,r0 + LOAD_REG_IMMEDIATE(r0, PCR_MASK) mtspr SPRN_PCR,r0 mfspr r3,SPRN_LPCR ori r3, r3, LPCR_PECEDH @@ -76,6 +79,7 @@ _GLOBAL(__restore_cpu_power8) beqlr li r0,0 mtspr SPRN_LPID,r0 + LOAD_REG_IMMEDIATE(r0, PCR_MASK) mtspr SPRN_PCR,r0 mfspr r3,SPRN_LPCR ori r3, r3, LPCR_PECEDH @@ -98,6 +102,7 @@ _GLOBAL(__setup_cpu_power9) mtspr SPRN_PSSCR,r0 mtspr SPRN_LPID,r0 mtspr SPRN_PID,r0 + LOAD_REG_IMMEDIATE(r0, PCR_MASK) mtspr SPRN_PCR,r0 mfspr r3,SPRN_LPCR LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE | LPCR_HEIC) @@ -123,6 +128,7 @@ _GLOBAL(__restore_cpu_power9) mtspr SPRN_PSSCR,r0 mtspr SPRN_LPID,r0 mtspr SPRN_PID,r0 + LOAD_REG_IMMEDIATE(r0, PCR_MASK) mtspr SPRN_PCR,r0 mfspr r3,SPRN_LPCR LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE | LPCR_HEIC) diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index bd95318d2202..bceee2fde885 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -101,7 +101,7 @@ static void __restore_cpu_cpufeatures(void) if (hv_mode) { mtspr(SPRN_LPID, 0); mtspr(SPRN_HFSCR, system_registers.hfscr); - mtspr(SPRN_PCR, 0); + mtspr(SPRN_PCR, PCR_MASK); } mtspr(SPRN_FSCR, system_registers.fscr); @@ -144,6 +144,7 @@ static void __init cpufeatures_setup_cpu(void) mtspr(SPRN_HFSCR, 0); } mtspr(SPRN_FSCR, 0); + mtspr(SPRN_PCR, PCR_MASK); /* * LPCR does not get cleared, to match behaviour with secondaries -- cgit v1.2.3 From 3a83f677a6eeff65751b29e3648d7c69c3be83f3 Mon Sep 17 00:00:00 2001 From: Michael Roth Date: Wed, 11 Sep 2019 17:31:55 -0500 Subject: KVM: PPC: Book3S HV: use smp_mb() when setting/clearing host_ipi flag On a 2-socket Power9 system with 32 cores/128 threads (SMT4) and 1TB of memory running the following guest configs: guest A: - 224GB of memory - 56 VCPUs (sockets=1,cores=28,threads=2), where: VCPUs 0-1 are pinned to CPUs 0-3, VCPUs 2-3 are pinned to CPUs 4-7, ... 
VCPUs 54-55 are pinned to CPUs 108-111 guest B: - 4GB of memory - 4 VCPUs (sockets=1,cores=4,threads=1) with the following workloads (with KSM and THP enabled in all): guest A: stress --cpu 40 --io 20 --vm 20 --vm-bytes 512M guest B: stress --cpu 4 --io 4 --vm 4 --vm-bytes 512M host: stress --cpu 4 --io 4 --vm 2 --vm-bytes 256M the below soft-lockup traces were observed after an hour or so and persisted until the host was reset (this was found to be reliably reproducible for this configuration, for kernels 4.15, 4.18, 5.0, and 5.3-rc5): [ 1253.183290] rcu: INFO: rcu_sched self-detected stall on CPU [ 1253.183319] rcu: 124-....: (5250 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=1941 [ 1256.287426] watchdog: BUG: soft lockup - CPU#105 stuck for 23s! [CPU 52/KVM:19709] [ 1264.075773] watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [worker:19913] [ 1264.079769] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [worker:20331] [ 1264.095770] watchdog: BUG: soft lockup - CPU#45 stuck for 23s! [worker:20338] [ 1264.131773] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [avocado:19525] [ 1280.408480] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791] [ 1316.198012] rcu: INFO: rcu_sched self-detected stall on CPU [ 1316.198032] rcu: 124-....: (21003 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=8243 [ 1340.411024] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791] [ 1379.212609] rcu: INFO: rcu_sched self-detected stall on CPU [ 1379.212629] rcu: 124-....: (36756 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=14714 [ 1404.413615] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791] [ 1442.227095] rcu: INFO: rcu_sched self-detected stall on CPU [ 1442.227115] rcu: 124-....: (52509 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=21403 [ 1455.111787] INFO: task worker:19907 blocked for more than 120 seconds. [ 1455.111822] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.111833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.111884] INFO: task worker:19908 blocked for more than 120 seconds. [ 1455.111905] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.111925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.111966] INFO: task worker:20328 blocked for more than 120 seconds. [ 1455.111986] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.111998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112048] INFO: task worker:20330 blocked for more than 120 seconds. [ 1455.112068] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112138] INFO: task worker:20332 blocked for more than 120 seconds. [ 1455.112159] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112179] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112210] INFO: task worker:20333 blocked for more than 120 seconds. [ 1455.112231] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112282] INFO: task worker:20335 blocked for more than 120 seconds. [ 1455.112303] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112372] INFO: task worker:20336 blocked for more than 120 seconds. 
[ 1455.112392] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 CPUs 45, 24, and 124 are stuck on spin locks, likely held by CPUs 105 and 31. CPUs 105 and 31 are stuck in smp_call_function_many(), waiting on target CPU 42. For instance: # CPU 105 registers (via xmon) R00 = c00000000020b20c R16 = 00007d1bcd800000 R01 = c00000363eaa7970 R17 = 0000000000000001 R02 = c0000000019b3a00 R18 = 000000000000006b R03 = 000000000000002a R19 = 00007d537d7aecf0 R04 = 000000000000002a R20 = 60000000000000e0 R05 = 000000000000002a R21 = 0801000000000080 R06 = c0002073fb0caa08 R22 = 0000000000000d60 R07 = c0000000019ddd78 R23 = 0000000000000001 R08 = 000000000000002a R24 = c00000000147a700 R09 = 0000000000000001 R25 = c0002073fb0ca908 R10 = c000008ffeb4e660 R26 = 0000000000000000 R11 = c0002073fb0ca900 R27 = c0000000019e2464 R12 = c000000000050790 R28 = c0000000000812b0 R13 = c000207fff623e00 R29 = c0002073fb0ca808 R14 = 00007d1bbee00000 R30 = c0002073fb0ca800 R15 = 00007d1bcd600000 R31 = 0000000000000800 pc = c00000000020b260 smp_call_function_many+0x3d0/0x460 cfar= c00000000020b270 smp_call_function_many+0x3e0/0x460 lr = c00000000020b20c smp_call_function_many+0x37c/0x460 msr = 900000010288b033 cr = 44024824 ctr = c000000000050790 xer = 0000000000000000 trap = 100 CPU 42 is running normally, doing VCPU work: # CPU 42 stack trace (via xmon) [link register ] c00800001be17188 kvmppc_book3s_radix_page_fault+0x90/0x2b0 [kvm_hv] [c000008ed3343820] c000008ed3343850 (unreliable) [c000008ed33438d0] c00800001be11b6c kvmppc_book3s_hv_page_fault+0x264/0xe30 [kvm_hv] [c000008ed33439d0] c00800001be0d7b4 kvmppc_vcpu_run_hv+0x8dc/0xb50 [kvm_hv] [c000008ed3343ae0] c00800001c10891c kvmppc_vcpu_run+0x34/0x48 [kvm] [c000008ed3343b00] c00800001c10475c kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm] [c000008ed3343b90] c00800001c0f5a78 kvm_vcpu_ioctl+0x470/0x7c8 [kvm] [c000008ed3343d00] c000000000475450 do_vfs_ioctl+0xe0/0xc70 [c000008ed3343db0] c0000000004760e4 ksys_ioctl+0x104/0x120 [c000008ed3343e00] c000000000476128 sys_ioctl+0x28/0x80 [c000008ed3343e20] c00000000000b388 system_call+0x5c/0x70 --- Exception: c00 (System Call) at 00007d545cfd7694 SP (7d53ff7edf50) is in userspace It was subsequently found that ipi_message[PPC_MSG_CALL_FUNCTION] was set for CPU 42 by at least 1 of the CPUs waiting in smp_call_function_many(), but somehow the corresponding call_single_queue entries were never processed by CPU 42, causing the callers to spin in csd_lock_wait() indefinitely. 
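The helpers this patch ends up splitting kvmppc_set_host_ipi() into
are sketched below for reference while reading the analysis that
follows; the real definitions live in kvm_ppc.h, which is outside this
log's arch/powerpc/kernel diffstat:

  static inline void kvmppc_set_host_ipi(int cpu)
  {
          /* Order previous stores (e.g. to message[]) before the
           * host_ipi flag becomes visible to the target CPU. */
          smp_mb();
          paca_ptrs[cpu]->kvm_hstate.host_ipi = 1;
  }

  static inline void kvmppc_clear_host_ipi(int cpu)
  {
          paca_ptrs[cpu]->kvm_hstate.host_ipi = 0;
          /* Order the clear before any subsequent processing of IPI
           * messages, pairing with the barrier in the setter. */
          smp_mb();
  }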
Nick Piggin suggested something similar to the following sequence as a
possible explanation (interleaving of CALL_FUNCTION/RESCHEDULE IPI
messages seems to be most common, but any mix of CALL_FUNCTION and
!CALL_FUNCTION messages could trigger it):

  CPU
    X: smp_muxed_ipi_set_message():
    X:   smp_mb()
    X:   message[RESCHEDULE] = 1
    X: doorbell_global_ipi(42):
    X:   kvmppc_set_host_ipi(42, 1)
    X:   ppc_msgsnd_sync()/smp_mb()
    X:   ppc_msgsnd() -> 42
   42: doorbell_exception(): // from CPU X
   42:   ppc_msgsync()
  105: smp_muxed_ipi_set_message():
  105:   smp_mb()
         // STORE DEFERRED DUE TO RE-ORDERING
--105:   message[CALL_FUNCTION] = 1
| 105: doorbell_global_ipi(42):
| 105:   kvmppc_set_host_ipi(42, 1)
|  42:   kvmppc_set_host_ipi(42, 0)
|  42: smp_ipi_demux_relaxed()
|  42: // returns to executing guest
|      // RE-ORDERED STORE COMPLETES
->105:   message[CALL_FUNCTION] = 1
  105:   ppc_msgsnd_sync()/smp_mb()
  105:   ppc_msgsnd() -> 42
   42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
  105: // hangs waiting on 42 to process messages/call_single_queue

This can be prevented with an smp_mb() at the beginning of
kvmppc_set_host_ipi(), such that stores to message[] (or other state
indicated by the host_ipi flag) are ordered vs. the store to host_ipi.

However, doing so might still allow for the following scenario (not
yet observed):

  CPU
    X: smp_muxed_ipi_set_message():
    X:   smp_mb()
    X:   message[RESCHEDULE] = 1
    X: doorbell_global_ipi(42):
    X:   kvmppc_set_host_ipi(42, 1)
    X:   ppc_msgsnd_sync()/smp_mb()
    X:   ppc_msgsnd() -> 42
   42: doorbell_exception(): // from CPU X
   42:   ppc_msgsync()
         // STORE DEFERRED DUE TO RE-ORDERING
-- 42:   kvmppc_set_host_ipi(42, 0)
|  42: smp_ipi_demux_relaxed()
| 105: smp_muxed_ipi_set_message():
| 105:   smp_mb()
| 105:   message[CALL_FUNCTION] = 1
| 105: doorbell_global_ipi(42):
| 105:   kvmppc_set_host_ipi(42, 1)
|      // RE-ORDERED STORE COMPLETES
-> 42:   kvmppc_set_host_ipi(42, 0)
   42: // returns to executing guest
  105:   ppc_msgsnd_sync()/smp_mb()
  105:   ppc_msgsnd() -> 42
   42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
  105: // hangs waiting on 42 to process messages/call_single_queue

Fixing this scenario would require an smp_mb() *after* clearing the
host_ipi flag in kvmppc_set_host_ipi() to order the store vs.
subsequent processing of IPI messages.

To handle both cases, this patch splits kvmppc_set_host_ipi() into
separate set/clear functions, where we execute smp_mb() prior to
setting the host_ipi flag, and after clearing it. These functions pair
with each other to synchronize the sender and receiver sides.

With that change in place the above workload ran for 20 hours without
triggering any lock-ups.

Fixes: 755563bc79c7 ("powerpc/powernv: Fixes for hypervisor doorbell handling") # v4.0
Signed-off-by: Michael Roth
Acked-by: Paul Mackerras
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190911223155.16045-1-mdroth@linux.vnet.ibm.com
---
 arch/powerpc/kernel/dbell.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
index 804b1a6196fa..f17ff1200eaa 100644
--- a/arch/powerpc/kernel/dbell.c
+++ b/arch/powerpc/kernel/dbell.c
@@ -33,7 +33,7 @@ void doorbell_global_ipi(int cpu)
 {
 	u32 tag = get_hard_smp_processor_id(cpu);
 
-	kvmppc_set_host_ipi(cpu, 1);
+	kvmppc_set_host_ipi(cpu);
 	/* Order previous accesses vs. msgsnd, which is treated as a store */
 	ppc_msgsnd_sync();
 	ppc_msgsnd(PPC_DBELL_MSGTYPE, 0, tag);
@@ -48,7 +48,7 @@ void doorbell_core_ipi(int cpu)
 {
 	u32 tag = cpu_thread_in_core(cpu);
 
-	kvmppc_set_host_ipi(cpu, 1);
+	kvmppc_set_host_ipi(cpu);
 	/* Order previous accesses vs. msgsnd, which is treated as a store */
 	ppc_msgsnd_sync();
 	ppc_msgsnd(PPC_DBELL_MSGTYPE, 0, tag);
@@ -84,7 +84,7 @@ void doorbell_exception(struct pt_regs *regs)
 
 	may_hard_irq_enable();
 
-	kvmppc_set_host_ipi(smp_processor_id(), 0);
+	kvmppc_clear_host_ipi(smp_processor_id());
 	__this_cpu_inc(irq_stat.doorbell_irqs);
 
 	smp_ipi_demux_relaxed(); /* already performed the barrier */
-- cgit v1.2.3

From 677733e296b5c7a37c47da391fc70a43dc40bd67 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V"
Date: Tue, 24 Sep 2019 09:22:51 +0530
Subject: powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisions

The store ordering vs tlbie issue mentioned in commit a5d4b5891c2f
("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") is fixed
for Nimbus 2.3 and Cumulus 1.3 revisions. We don't need to apply the
fixup if we are running on them.

We can only do this on PowerNV. On pseries guests with KVM we still
don't support redoing the feature fixup after migration, so we should
keep enabling all the workarounds needed, because we can possibly
migrate between DD 2.3 and DD 2.2.

Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190924035254.24612-1-aneesh.kumar@linux.ibm.com
---
 arch/powerpc/kernel/dt_cpu_ftrs.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index bceee2fde885..5fc1b527de46 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -692,9 +692,35 @@ static bool __init cpufeatures_process_feature(struct dt_cpu_feature *f)
 	return true;
 }
 
+/*
+ * Handle POWER9 broadcast tlbie invalidation issue using
+ * cpu feature flag.
+ */
+static __init void update_tlbie_feature_flag(unsigned long pvr)
+{
+	if (PVR_VER(pvr) == PVR_POWER9) {
+		/*
+		 * Set the tlbie feature flag for anything below
+		 * Nimbus DD 2.3 and Cumulus DD 1.3
+		 */
+		if ((pvr & 0xe000) == 0) {
+			/* Nimbus */
+			if ((pvr & 0xfff) < 0x203)
+				cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+		} else if ((pvr & 0xc000) == 0) {
+			/* Cumulus */
+			if ((pvr & 0xfff) < 0x103)
+				cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+		} else {
+			WARN_ONCE(1, "Unknown PVR");
+			cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+		}
+	}
+}
+
 static __init void cpufeatures_cpu_quirks(void)
 {
-	int version = mfspr(SPRN_PVR);
+	unsigned long version = mfspr(SPRN_PVR);
 
 	/*
 	 * Not all quirks can be derived from the cpufeatures device tree.
@@ -713,10 +739,10 @@ static __init void cpufeatures_cpu_quirks(void) if ((version & 0xffff0000) == 0x004e0000) { cur_cpu_spec->cpu_features &= ~(CPU_FTR_DAWR); - cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; cur_cpu_spec->cpu_features |= CPU_FTR_P9_TIDR; } + update_tlbie_feature_flag(version); /* * PKEY was not in the initial base or feature node * specification, but it should become optional in the next -- cgit v1.2.3 From 09ce98cacd51fcd0fa0af2f79d1e1d3192f4cbb0 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Tue, 24 Sep 2019 09:22:52 +0530 Subject: powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag Rename the #define to indicate this is related to store vs tlbie ordering issue. In the next patch, we will be adding another feature flag that is used to handles ERAT flush vs tlbie ordering issue. Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Aneesh Kumar K.V Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com --- arch/powerpc/kernel/dt_cpu_ftrs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch/powerpc/kernel') diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 5fc1b527de46..a86486390c70 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -706,14 +706,14 @@ static __init void update_tlbie_feature_flag(unsigned long pvr) if ((pvr & 0xe000) == 0) { /* Nimbus */ if ((pvr & 0xfff) < 0x203) - cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; + cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG; } else if ((pvr & 0xc000) == 0) { /* Cumulus */ if ((pvr & 0xfff) < 0x103) - cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; + cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG; } else { WARN_ONCE(1, "Unknown PVR"); - cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; + cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG; } } } -- cgit v1.2.3 From 047e6575aec71d75b765c22111820c4776cd1c43 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Tue, 24 Sep 2019 09:22:53 +0530 Subject: powerpc/mm: Fixup tlbie vs mtpidr/mtlpidr ordering issue on POWER9 On POWER9, under some circumstances, a broadcast TLB invalidation will fail to invalidate the ERAT cache on some threads when there are parallel mtpidr/mtlpidr happening on other threads of the same core. This can cause stores to continue to go to a page after it's unmapped. The workaround is to force an ERAT flush using PID=0 or LPID=0 tlbie flush. This additional TLB flush will cause the ERAT cache invalidation. Since we are using PID=0 or LPID=0, we don't get filtered out by the TLB snoop filtering logic. We need to still follow this up with another tlbie to take care of store vs tlbie ordering issue explained in commit: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9"). The presence of ERAT cache implies we can still get new stores and they may miss store queue marking flush. 
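The flush-side workaround sits in the radix TLB code under
arch/powerpc/mm, so it does not show up in this arch/powerpc/kernel
log. A hedged sketch of its shape, assuming helpers along the lines of
the real __tlbie_pid():

  static inline void fixup_tlbie_pid(unsigned long pid)
  {
          if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
                  /* A PID=0 tlbie is not filtered by the TLB snoop
                   * logic, so it forces the ERAT flush. */
                  asm volatile("ptesync" : : : "memory");
                  __tlbie_pid(0, RIC_FLUSH_TLB);
          }
          if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
                  /* Follow up with one more tlbie for the store vs.
                   * tlbie ordering issue (commit a5d4b5891c2f). */
                  asm volatile("ptesync" : : : "memory");
                  __tlbie_pid(pid, RIC_FLUSH_TLB);
          }
  }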
Cc: stable@vger.kernel.org
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190924035254.24612-3-aneesh.kumar@linux.ibm.com
---
 arch/powerpc/kernel/dt_cpu_ftrs.c | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index a86486390c70..180b3a5d1001 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -715,6 +715,8 @@ static __init void update_tlbie_feature_flag(unsigned long pvr)
 			WARN_ONCE(1, "Unknown PVR");
 			cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG;
 		}
+
+		cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_ERAT_BUG;
 	}
 }
-- cgit v1.2.3

From 253c892193ab58da6b1d94371285971b22c63260 Mon Sep 17 00:00:00 2001
From: Oliver O'Halloran
Date: Thu, 26 Sep 2019 22:25:02 +1000
Subject: powerpc/eeh: Fix eeh_debugfs_break_device() with SRIOV devices

s/CONFIG_IOV/CONFIG_PCI_IOV/

Whoops.

Fixes: bd6461cc7b3c ("powerpc/eeh: Add a eeh_dev_break debugfs interface")
Signed-off-by: Oliver O'Halloran
[mpe: Fixup the #endif comment as well]
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190926122502.14826-1-oohall@gmail.com
---
 arch/powerpc/kernel/eeh.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 0a91dee51245..bc8a551013be 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1960,7 +1960,7 @@ static int eeh_debugfs_break_device(struct pci_dev *pdev)
 	pci_err(pdev, "Going to break: %pR\n", bar);
 
 	if (pdev->is_virtfn) {
-#ifndef CONFIG_IOV
+#ifndef CONFIG_PCI_IOV
 		return -ENXIO;
 #else
 		/*
@@ -1980,7 +1980,7 @@ static int eeh_debugfs_break_device(struct pci_dev *pdev)
 	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
 	pos += PCI_SRIOV_CTRL;
 	bit = PCI_SRIOV_CTRL_MSE;
-#endif /* !CONFIG_IOV */
+#endif /* !CONFIG_PCI_IOV */
 	} else {
 		bit = PCI_COMMAND_MEMORY;
 		pos = PCI_COMMAND;
-- cgit v1.2.3

From 05d9a952832cb206a32e3705eff6edebdb2207e7 Mon Sep 17 00:00:00 2001
From: Thiago Jung Bauermann
Date: Wed, 11 Sep 2019 13:34:33 -0300
Subject: powerpc/prom_init: Undo relocation before entering secure mode

The ultravisor will do an integrity check of the kernel image but we
relocated it so the check will fail. Restore the original image by
relocating it back to the kernel virtual base address. This works
because during build vmlinux is linked with an expected virtual runtime
address of KERNELBASE.

Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Thiago Jung Bauermann
Tested-by: Michael Anderson
[mpe: Add IS_ENABLED() to fix the CONFIG_RELOCATABLE=n build]
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190911163433.12822-1-bauerman@linux.ibm.com
---
 arch/powerpc/kernel/prom_init.c        | 13 +++++++++++++
 arch/powerpc/kernel/prom_init_check.sh |  3 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

(limited to 'arch/powerpc/kernel')

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index a4e7762dd286..100f1b57ec2f 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -3249,7 +3249,20 @@ static void setup_secure_guest(unsigned long kbase, unsigned long fdt)
 	/* Switch to secure mode. */
 	prom_printf("Switching to secure mode.\n");
 
+	/*
+	 * The ultravisor will do an integrity check of the kernel image but we
+	 * relocated it so the check will fail.
Restore the original image by + * relocating it back to the kernel virtual base address. + */ + if (IS_ENABLED(CONFIG_RELOCATABLE)) + relocate(KERNELBASE); + ret = enter_secure_mode(kbase, fdt); + + /* Relocate the kernel again. */ + if (IS_ENABLED(CONFIG_RELOCATABLE)) + relocate(kbase); + if (ret != U_SUCCESS) { prom_printf("Returned %d from switching to secure mode.\n", ret); prom_rtas_os_term("Switch to secure mode failed.\n"); diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh index 78bab17b1396..b183ab9c5107 100644 --- a/arch/powerpc/kernel/prom_init_check.sh +++ b/arch/powerpc/kernel/prom_init_check.sh @@ -26,7 +26,8 @@ _end enter_prom $MEM_FUNCS reloc_offset __secondary_hold __secondary_hold_acknowledge __secondary_hold_spinloop __start logo_linux_clut224 btext_prepare_BAT reloc_got2 kernstart_addr memstart_addr linux_banner _stext -__prom_init_toc_start __prom_init_toc_end btext_setup_display TOC." +__prom_init_toc_start __prom_init_toc_end btext_setup_display TOC. +relocate" NM="$1" OBJ="$2" -- cgit v1.2.3
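Taken together, the patched setup_secure_guest() path reduces to the
following sketch (simplified from the diff above; prom_printf() calls
elided):

  static void setup_secure_guest(unsigned long kbase, unsigned long fdt)
  {
          int ret;

          /* Undo the runtime relocation so the image matches the
           * vmlinux linked at KERNELBASE and passes the ultravisor's
           * integrity check. */
          if (IS_ENABLED(CONFIG_RELOCATABLE))
                  relocate(KERNELBASE);

          ret = enter_secure_mode(kbase, fdt);

          /* Relocate the kernel again for where it actually runs. */
          if (IS_ENABLED(CONFIG_RELOCATABLE))
                  relocate(kbase);

          if (ret != U_SUCCESS)
                  prom_rtas_os_term("Switch to secure mode failed.\n");
  }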