summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2018-05-10powerpc: wii_defconfig: Disable Ethernet driver support codeJonathan Neuschäfer
The Wii doesn't have built-in Ethernet and USB Ethernet adapters are in a different menu. Disable CONFIG_ETHERNET to save some space in support code for Ethernet drivers. Note that this patch doesn't disable any Ethernet drivers, because they are not enabled by default. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/watchdog: fix typo 'can by' to 'can be'Wolfram Sang
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/pseries: hcall_exit tracepoint retval should be signedMichael Ellerman
The hcall_exit() tracepoint has retval defined as unsigned long. That leads to humours results like: bash-3686 [009] d..2 854.134094: hcall_entry: opcode=24 bash-3686 [009] d..2 854.134095: hcall_exit: opcode=24 retval=18446744073709551609 It's normal for some hcalls to return negative values, displaying them as unsigned isn't very helpful. So change it to signed. bash-3711 [001] d..2 471.691008: hcall_entry: opcode=24 bash-3711 [001] d..2 471.691008: hcall_exit: opcode=24 retval=-7 Which can be more easily compared to H_NOT_FOUND in hvcall.h Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
2018-05-07Revert "powerpc/powernv: Increase memory block size to 1GB on radix"Balbir Singh
This commit was a stop-gap to prevent crashes on hotunplug, caused by the mismatch between the 1G mappings used for the linear mapping and the memory block size. Those issues are now resolved because we split the linear mapping at hotunplug time if necessary, as implemented in commit 4dd5f8a99e79 ("powerpc/mm/radix: Split linear mapping on hot-unplug"). Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Tested-by: Rashmica Gupta <rashmica.g@gmail.com> Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07powerpc/nohash: Use IS_ENABLED() to simplify __set_pte_at()Christophe Leroy
By using IS_ENABLED() we can simplify __set_pte_at() by removing redundant *ptep = pte. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07powerpc/nohash: Remove _PAGE_BUSYChristophe Leroy
_PAGE_BUSY is always 0, remove it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07powerpc/nohash: Remove hash related code from nohash headers.Christophe Leroy
When nohash and book3s header were split, some hash related stuff remained in the nohash header. This patch removes them. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Duplicate pte_young() to avoid circular header dependency] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc/fadump: Unregister fadump on kexec down path.Mahesh Salgaonkar
Unregister fadump on kexec down path otherwise the fadump registration in new kexec-ed kernel complains that fadump is already registered. This makes new kernel to continue using fadump registered by previous kernel which may lead to invalid vmcore generation. Hence this patch fixes this issue by un-registering fadump in fadump_cleanup() which is called during kexec path so that new kernel can register fadump with new valid values. Fixes: b500afff11f6 ("fadump: Invalidate registration and release reserved memory for general use.") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc/fadump: Do not use hugepages when fadump is activeHari Bathini
FADump capture kernel boots in restricted memory environment preserving the context of previous kernel to save vmcore. Supporting hugepages in such environment makes things unnecessarily complicated, as hugepages need memory set aside for them. This means most of the capture kernel's memory is used in supporting hugepages. In most cases, this results in out-of-memory issues while booting FADump capture kernel. But hugepages are not of much use in capture kernel whose only job is to save vmcore. So, disabling hugepages support, when fadump is active, is a reliable solution for the out of memory issues. Introducing a flag variable to disable HugeTLB support when fadump is active. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc/fadump: exclude memory holes while reserving memory in second kernelMahesh Salgaonkar
The second kernel, during early boot after the crash, reserves rest of the memory above boot memory size to make sure it does not touch any of the dump memory area. It uses memblock_reserve() that reserves the specified memory region irrespective of memory holes present within that region. There are chances where previous kernel would have hot removed some of its memory leaving memory holes behind. In such cases fadump kernel reports incorrect number of reserved pages through arch_reserved_kernel_pages() hook causing kernel to hang or panic. Fix this by excluding memory holes while reserving rest of the memory above boot memory size during second kernel boot after crash. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc: remove retired sbc834x supportPaul Gortmaker
I no longer have a functional version of this board for even the most basic sanity boot testing, and they have not been available for purchase for quite some years now. There is no point in adding a burden to testing coverage that does walk all the possible defconfigs, so with all the above in mind, it makes sense to remove it. Of course it will remain in the git history for anyone who happens to stumble on one and wants to tinker with it. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc: Only support DYNAMIC_FTRACE not staticMichael Ellerman
We've had dynamic ftrace support for over 9 years since Steve first wrote it, all the distros use dynamic, and static is basically untested these days, so drop support for static ftrace. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc64/ftrace: Implement support for ftrace_regs_caller()Naveen N. Rao
With -mprofile-kernel, we always save the full register state in ftrace_caller(). While this works, this is inefficient if we're not interested in the register state, such as when we're using the function tracer. Rename the existing ftrace_caller() as ftrace_regs_caller() and provide a simpler implementation for ftrace_caller() that is used when registers are not required to be saved. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc64/ftrace: Use the generic version of ftrace_replace_code()Naveen N. Rao
Our implementation matches that of the generic version, which also handles FTRACE_UPDATE_MODIFY_CALL. So, remove our implementation in favor of the generic version. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc64/module: Tighten detection of mcount call sites with -mprofile-kernelNaveen N. Rao
For R_PPC64_REL24 relocations, we suppress emitting instructions for TOC load/restore in the relocation stub if the relocation is for _mcount() call when using -mprofile-kernel ABI. To detect this, we check if the preceding instructions are per the standard set of instructions emitted by gcc: either the two instruction sequence of 'mflr r0; std r0,16(r1)', or the more optimized variant of a single 'mflr r0'. This is not sufficient since nothing prevents users from hand coding sequences involving a 'mflr r0' followed by a 'bl'. For removing the toc save instruction from the stub, we additionally check if the symbol is "_mcount". Add the same check here as well. Also rename is_early_mcount_callsite() to is_mprofile_mcount_callsite() since that is what is being checked. The use of "early" is misleading since there is nothing involving this function that qualifies as early. Fixes: 153086644fd1f ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc64/kexec: Hard disable ftrace before switching to the new kernelNaveen N. Rao
If function_graph tracer is enabled during kexec, we see the below exception in the simulator: root@(none):/# kexec -e kvm: exiting hardware virtualization kexec_core: Starting new kernel [ 19.262020070,5] OPAL: Switch to big-endian OS kexec: Starting switchover sequence. Interrupt to 0xC000000000004380 from 0xC000000000004380 ** Execution stopped: Continuous Interrupt, Instruction caused exception, ** Now that we have a more effective way to completely disable ftrace on ppc64, let's also use that before switching to a new kernel during kexec. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc64/ftrace: Disable ftrace during kvm entry/exitNaveen N. Rao
During guest entry/exit, we switch over to/from the guest MMU context and we cannot take exceptions in the hypervisor code. Since ftrace may be enabled and since it can result in us taking a trap, disable ftrace by setting paca->ftrace_enabled to zero. There are two paths through which we enter/exit a guest: 1. If we are the vcore runner, then we enter the guest via __kvmppc_vcore_entry() and we disable ftrace around this. This is always the case for Power9, and for the primary thread on Power8. 2. If we are a secondary thread in Power8, then we would be in nap due to SMT being disabled. We are woken up by an IPI to enter the guest. In this scenario, we enter the guest through kvm_start_guest(). We disable ftrace at this point. In this scenario, ftrace would only get re-enabled on the secondary thread when SMT is re-enabled (via start_secondary()). Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc64/ftrace: Disable ftrace during hotplugNaveen N. Rao
Disable ftrace when a cpu is about to go offline. When the cpu is woken up, ftrace will get enabled in start_secondary(). Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc64/ftrace: Delay enabling ftrace on secondary cpusNaveen N. Rao
On the boot cpu, though we enable paca->ftrace_enabled in early_setup() (via cpu_ready_for_interrupts()), we don't start tracing until much later since ftrace is not initialized yet and since we only support DYNAMIC_FTRACE on powerpc. However, it is possible that ftrace has been initialized by the time some of the secondary cpus start up. In this case, we will try to trace some of the early boot code which can cause problems. To address this, move setting paca->ftrace_enabled from cpu_ready_for_interrupts() to early_setup() for the boot cpu, and towards the end of start_secondary() for secondary cpus. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc64/ftrace: Add helpers to hard disable ftraceNaveen N. Rao
Add some helpers to enable/disable ftrace through paca->ftrace_enabled. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc64/ftrace: Rearrange #ifdef sections in ftrace.hNaveen N. Rao
Re-arrange the last #ifdef section in preparation for a subsequent change. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code pathsNaveen N. Rao
We have some C code that we call into from real mode where we cannot take any exceptions. Though the C functions themselves are mostly safe, if these functions are traced, there is a possibility that we may take an exception. For instance, in certain conditions, the ftrace code uses WARN(), which uses a 'trap' to do its job. For such scenarios, introduce a new field in paca 'ftrace_enabled', which is checked on ftrace entry before continuing. This field can then be set to zero to disable/pause ftrace, and set to a non-zero value to resume ftrace. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-27powerpc/kvm/booke: Fix altivec related build breakLaurentiu Tudor
Add missing "altivec unavailable" interrupt injection helper thus fixing the linker error below: arch/powerpc/kvm/emulate_loadstore.o: In function `kvmppc_check_altivec_disabled': arch/powerpc/kvm/emulate_loadstore.c: undefined reference to `.kvmppc_core_queue_vec_unavail' Fixes: 09f984961c137c4b ("KVM: PPC: Book3S: Add MMIO emulation for VMX instructions") Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-27powerpc: Fix deadlock with multiple calls to smp_send_stopNicholas Piggin
smp_send_stop can lock up the IPI path for any subsequent calls, because the receiving CPUs spin in their handler function. This started becoming a problem with the addition of an smp_send_stop call in the reboot path, because panics can reboot after doing their own smp_send_stop. The NMI IPI variant was fixed with ac61c11566 ("powerpc: Fix smp_send_stop NMI IPI handling"), which leaves the smp_call_function variant. This is fixed by having smp_send_stop only ever do the smp_call_function once. This is a bit less robust than the NMI IPI fix, because any other call to smp_call_function after smp_send_stop could deadlock, but that has always been the case, and it was not been a problem before. Fixes: f2748bdfe1573 ("powerpc/powernv: Always stop secondaries before reboot/shutdown") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-25powerpc: Fix smp_send_stop NMI IPI handlingNicholas Piggin
The NMI IPI handler for a receiving CPU increments nmi_ipi_busy_count over the handler function call, which causes later smp_send_nmi_ipi() callers to spin until the call is finished. The stop_this_cpu() function never returns, so the busy count is never decremeted, which can cause the system to hang in some cases. For example panic() will call smp_send_stop() early on which calls stop_this_cpu() on other CPUs, then later in the reboot path, pnv_restart() will call smp_send_stop() again, which hangs. Fix this by adding a special case to the stop_this_cpu() handler to decrement the busy count, because it will never return. Now that the NMI/non-NMI versions of stop_this_cpu() are different, split them out into separate functions rather than doing #ifdef tricks to share the body between the two functions. Fixes: 6bed3237624e3 ("powerpc: use NMI IPI for smp_send_stop") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out the functions, tweak change log a bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-25rtc: opal: Fix OPAL RTC driver OPAL_BUSY loopsNicholas Piggin
The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware, which causes large scheduling latencies, up to 50 seconds have been observed here when RTC stops responding (BMC reboot can do it). Fix this by converting it to the standard form OPAL_BUSY loop that sleeps. Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks") Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/mce: Fix a bug where mce loops on memory UE.Mahesh Salgaonkar
The current code extracts the physical address for UE errors and then hooks it up into memory failure infrastructure. On successful extraction of physical address it wrongly sets "handled = 1" which means this UE error has been recovered. Since MCE handler gets return value as handled = 1, it assumes that error has been recovered and goes back to same NIP. This causes MCE interrupt again and again in a loop leading to hard lockup. Also, initialize phys_addr to ULONG_MAX so that we don't end up queuing undesired page to hwpoison. Without this patch we see: Severe Machine check interrupt [Recovered] NIP: [000000001002588c] PID: 7109 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffd2755940 Physical address: 000020181a080000 ... Severe Machine check interrupt [Recovered] NIP: [000000001002588c] PID: 7109 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffd2755940 Physical address: 000020181a080000 Severe Machine check interrupt [Recovered] NIP: [000000001002588c] PID: 7109 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffd2755940 Physical address: 000020181a080000 Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned ... Watchdog CPU:38 Hard LOCKUP After this patch we see: Severe Machine check interrupt [Not recovered] NIP: [00007fffaae585f4] PID: 7168 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffaafe28ac Physical address: 00002017c0bd0000 find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4 Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered Fixes: 01eaac2b0591 ("powerpc/mce: Hookup ierror (instruction) UE errors") Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large ↵Alistair Popple
address range The NPU has a limited number of address translation shootdown (ATSD) registers and the GPU has limited bandwidth to process ATSDs. This can result in contention of ATSD registers leading to soft lockups on some threads, particularly when invalidating a large address range in pnv_npu2_mn_invalidate_range(). At some threshold it becomes more efficient to flush the entire GPU TLB for the given MM context (PID) than individually flushing each address in the range. This patch will result in ranges greater than 2MB being converted from 32+ ATSDs into a single ATSD which will flush the TLB for the given PID on each GPU. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_contex() callback ↵Alistair Popple
parameters There is a single npu context per set of callback parameters. Callers should be prevented from overwriting existing callback values so instead return an error if different parameters are passed. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com> Tested-by: Mark Hairgrove <mhairgrove@nvidia.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/npu: Add lock to prevent race in concurrent context init/destroyAlistair Popple
The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions are used to allocate/free contexts to allow address translation and shootdown by the NPU on a particular GPU. Context initialisation is implicitly safe as it is protected by the requirement mmap_sem be held in write mode, however pnv_npu2_destroy_context() does not require mmap_sem to be held and it is not safe to call with a concurrent initialisation for a different GPU. It was assumed the driver would ensure destruction was not called concurrently with initialisation. However the driver may be simplified by allowing concurrent initialisation and destruction for different GPUs. As npu context creation/destruction is not a performance critical path and the critical section is not large a single spinlock is used for simplicity. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com> Tested-by: Mark Hairgrove <mhairgrove@nvidia.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/memtrace: Let the arch hotunplug code flush cacheBalbir Singh
Don't do this via custom code, instead now that we have support in the arch hotplug/hotunplug code, rely on those routines to do the right thing. The existing flush doesn't work because it uses ppc64_caches.l1d.size instead of ppc64_caches.l1d.line_size. Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/mm: Flush cache on memory hot(un)plugBalbir Singh
This patch adds support for flushing potentially dirty cache lines when memory is hot-plugged/hot-un-plugged. The support is currently limited to 64 bit systems. The bug was exposed when mappings for a device were actually hot-unplugged and plugged in back later. A similar issue was observed during the development of memtrace, but memtrace does it's own flushing of region via a custom routine. These patches do a flush both on hotplug/unplug to clear any stale data in the cache w.r.t mappings, there is a small race window where a clean cache line may be created again just prior to tearing down the mapping. The patches were tested by disabling the flush routines in memtrace and doing I/O on the trace file. The system immediately checkstops (quite reliablly if prior to the hot-unplug of the memtrace region, we memset the regions we are about to hot unplug). After these patches no custom flushing is needed in the memtrace code. Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-20proc: fix /proc/loadavg regressionAlexey Dobriyan
Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API") changed last field of /proc/loadavg (last pid allocated) to be off by one: # unshare -p -f --mount-proc cat /proc/loadavg 0.00 0.00 0.00 1/60 2 <=== It should be 1 after first fork into pid namespace. This is formally a regression but given how useless this field is I don't think anyone is affected. Bug was found by /proc testsuite! Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2 Fixes: 95846ecf9dac508 ("pid: replace pid bitmap implementation with IDR API") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gargi Sharma <gs051095@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-19powerpc/kvm: Fix lockups when running KVM guests on Power8Michael Ellerman
When running KVM guests on Power8 we can see a lockup where one CPU stops responding. This often leads to a message such as: watchdog: CPU 136 detected hard LOCKUP on other CPUs 72 Task dump for CPU 72: qemu-system-ppc R running task 10560 20917 20908 0x00040004 And then backtraces on other CPUs, such as: Task dump for CPU 48: ksmd R running task 10032 1519 2 0x00000804 Call Trace: ... --- interrupt: 901 at smp_call_function_many+0x3c8/0x460 LR = smp_call_function_many+0x37c/0x460 pmdp_invalidate+0x100/0x1b0 __split_huge_pmd+0x52c/0xdb0 try_to_unmap_one+0x764/0x8b0 rmap_walk_anon+0x15c/0x370 try_to_unmap+0xb4/0x170 split_huge_page_to_list+0x148/0xa30 try_to_merge_one_page+0xc8/0x990 try_to_merge_with_ksm_page+0x74/0xf0 ksm_scan_thread+0x10ec/0x1ac0 kthread+0x160/0x1a0 ret_from_kernel_thread+0x5c/0x78 This is caused by commit 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle"), which added a check in pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and test of kvm_hstate.hwthread_req. The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when entering the guest, so it can then come out to cede with KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after setting hwthread_req to 1, but because hwthread_state is still KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we wake up from idle and won't go to kvm_start_guest. From there the thread will return somewhere garbage and crash. Fix it by skipping the store of hwthread_state, but not the test of hwthread_req, when coming out of idle. It's OK to skip the sync in that case because hwthread_req will have been set on the same thread, so there is no synchronisation required. Fixes: 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-19powerpc/eeh: Fix enabling bridge MMIO windowsMichael Neuling
On boot we save the configuration space of PCIe bridges. We do this so when we get an EEH event and everything gets reset that we can restore them. Unfortunately we save this state before we've enabled the MMIO space on the bridges. Hence if we have to reset the bridge when we come back MMIO is not enabled and we end up taking an PE freeze when the driver starts accessing again. This patch forces the memory/MMIO and bus mastering on when restoring bridges on EEH. Ideally we'd do this correctly by saving the configuration space writes later, but that will have to come later in a larger EEH rewrite. For now we have this simple fix. The original bug can be triggered on a boston machine by doing: echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound On boston, this PHB has a PCIe switch on it. Without this patch, you'll see two EEH events, 1 expected and 1 the failure we are fixing here. The second EEH event causes the anything under the PHB to disappear (i.e. the i40e eth). With this patch, only 1 EEH event occurs and devices properly recover. Fixes: 652defed4875 ("powerpc/eeh: Check PCIe link after reset") Cc: stable@vger.kernel.org # v3.11+ Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-19powerpc/xive: Fix trying to "push" an already active pool VPBenjamin Herrenschmidt
When setting up a CPU, we "push" (activate) a pool VP for it. However it's an error to do so if it already has an active pool VP. This happens when doing soft CPU hotplug on powernv since we don't tear down the CPU on unplug. The HW flags the error which gets captured by the diagnostics. Fix this by making sure to "pull" out any already active pool first. Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-17powerpc/64s: Default l1d_size to 64K in RFI fallback flushMadhavan Srinivasan
If there is no d-cache-size property in the device tree, l1d_size could be zero. We don't actually expect that to happen, it's only been seen on mambo (simulator) in some configurations. A zero-size l1d_size leads to the loop in the asm wrapping around to 2^64-1, and then walking off the end of the fallback area and eventually causing a page fault which is fatal. Just default to 64K which is correct on some CPUs, and sane enough to not cause a crash on others. Fixes: aa8a5e0062ac9 ('powerpc/64s: Add support for RFI flush of L1-D cache') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Rewrite comment and change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-17powerpc/lib: Fix off-by-one in alternate feature patchingMichael Ellerman
When we patch an alternate feature section, we have to adjust any relative branches that branch out of the alternate section. But currently we have a bug if we have a branch that points to past the last instruction of the alternate section, eg: FTR_SECTION_ELSE 1: b 2f or 6,6,6 2: ALT_FTR_SECTION_END(...) nop This will result in a relative branch at 1 with a target that equals the end of the alternate section. That branch does not need adjusting when it's moved to the non-else location. Currently we do adjust it, resulting in a branch that goes off into the link-time location of the else section, which is junk. The fix is to not patch branches that have a target == end of the alternate section. Fixes: d20fe50a7b3c ("KVM: PPC: Book3S HV: Branch inside feature section") Fixes: 9b1a735de64c ("powerpc: Add logic to patch alternative feature sections") Cc: stable@vger.kernel.org # v2.6.27+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-15Merge tag 'powerpc-4.17-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix crashes when loading modules built with a different CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic. - Fix busy loops in the OPAL NVRAM driver if we get certain error conditions from firmware. - Remove tlbie trace points from KVM code that's called in real mode, because it causes crashes. - Fix checkstops caused by invalid tlbiel on Power9 Radix. - Ensure the set of CPU features we "know" are always enabled is actually the minimal set when we build with support for firmware supplied CPU features. Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin. * tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features powerpc/mm/radix: Fix checkstops caused by invalid tlbiel KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode powerpc/8xx: Fix build with hugetlbfs enabled powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops powerpc/fscr: Enable interrupts earlier before calling get_user() powerpc/64s: Fix section mismatch warnings from setup_rfi_flush() powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
2018-04-13kernel/kexec_file.c: allow archs to set purgatory load addressPhilipp Rudo
For s390 new kernels are loaded to fixed addresses in memory before they are booted. With the current code this is a problem as it assumes the kernel will be loaded to an 'arbitrary' address. In particular, kexec_locate_mem_hole searches for a large enough memory region and sets the load address (kexec_bufer->mem) to it. Luckily there is a simple workaround for this problem. By returning 1 in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off. This allows the architecture to set kbuf->mem by hand. While the trick works fine for the kernel it does not for the purgatory as here the architectures don't have access to its kexec_buffer. Give architectures access to the purgatories kexec_buffer by changing kexec_load_purgatory to take a pointer to it. With this change architectures have access to the buffer and can edit it as they need. A nice side effect of this change is that we can get rid of the purgatory_info->purgatory_load_address field. As now the information stored there can directly be accessed from kbuf->mem. Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kexec_file,x86,powerpc: factor out kexec_file_ops functionsAKASHI Takahiro
As arch_kexec_kernel_image_{probe,load}(), arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig() are almost duplicated among architectures, they can be commonalized with an architecture-defined kexec_file_ops array. So let's factor them out. Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kexec_file: make use of purgatory optionalAKASHI Takahiro
Patch series "kexec_file, x86, powerpc: refactoring for other architecutres", v2. This is a preparatory patchset for adding kexec_file support on arm64. It was originally included in a arm64 patch set[1], but Philipp is also working on their kexec_file support on s390[2] and some changes are now conflicting. So these common parts were extracted and put into a separate patch set for better integration. What's more, my original patch#4 was split into a few small chunks for easier review after Dave's comment. As such, the resulting code is basically identical with my original, and the only *visible* differences are: - renaming of _kexec_kernel_image_probe() and _kimage_file_post_load_cleanup() - change one of types of arguments at prepare_elf64_headers() Those, unfortunately, require a couple of trivial changes on the rest (#1, #6 to #13) of my arm64 kexec_file patch set[1]. Patch #1 allows making a use of purgatory optional, particularly useful for arm64. Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load, verify_sig}() and arch_kimage_file_post_load_cleanup() across architectures. Patches #3-#7 are also intended to generalize parse_elf64_headers(), along with exclude_mem_range(), to be made best re-use of. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html [2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html This patch (of 7): On arm64, crash dump kernel's usable memory is protected by *unmapping* it from kernel virtual space unlike other architectures where the region is just made read-only. It is highly unlikely that the region is accidentally corrupted and this observation rationalizes that digest check code can also be dropped from purgatory. The resulting code is so simple as it doesn't require a bit ugly re-linking/relocation stuff, i.e. arch_kexec_apply_relocations_add(). Please see: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html All that the purgatory does is to shuffle arguments and jump into a new kernel, while we still need to have some space for a hash value (purgatory_sha256_digest) which is never checked against. As such, it doesn't make sense to have trampline code between old kernel and new kernel on arm64. This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and allows related code to be compiled in only if necessary. [takahiro.akashi@linaro.org: fix trivial screwup] Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU featuresMichael Ellerman
The cpu_has_feature() mechanism has an optimisation where at build time we construct a mask of the CPU feature bits that will always be true for the given .config, based on the platform/bitness/etc. that we are building for. That is incompatible with DT CPU features, where the set of CPU features is dependent on feature flags that are given to us by firmware. The result is that some feature bits can not be *disabled* by DT CPU features. Or more accurately, they can be disabled but they will still appear in the ALWAYS mask, meaning cpu_has_feature() will always return true for them. In the past this hasn't really been a problem because on Book3S 64 (where we support DT CPU features), the set of ALWAYS bits has been very small. That was because we always built for POWER4 and later, meaning the set of common bits was small. The only bit that could be cleared by DT CPU features that was also in the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in the alignment handler to create a fake DSISR. That code was itself deleted in 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults") (Sep 2017). However the set of ALWAYS features changed with the recent commit db5ae1c155af ("powerpc/64s: Refine feature sets for little endian builds") which restricted the set of feature flags when building little endian to Power7 or later. That caused the ALWAYS mask to become much larger for little endian builds. The result is that the following feature bits can currently not be *disabled* by DT CPU features: CPU_FTR_REAL_LE, CPU_FTR_MMCRA, CPU_FTR_CTRL, CPU_FTR_SMT, CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR, CPU_FTR_PKEY, CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR. To fix it we need to mask the set of ALWAYS features with the base set of DT CPU features, ie. the features that are always enabled by DT CPU features. That way there are no bits in the ALWAYS mask that are not also always set by DT CPU features. Fixes: db5ae1c155af ("powerpc/64s: Refine feature sets for little endian builds") Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-12powerpc/mm/radix: Fix checkstops caused by invalid tlbielMichael Ellerman
In tlbiel_radix_set_isa300() we use the PPC_TLBIEL() macro to construct tlbiel instructions. The instruction takes 5 fields, two of which are registers, and the others are constants. But because it's constructed with inline asm the compiler doesn't know that. We got the constraint wrong on the 'r' field, using "r" tells the compiler to put the value in a register. The value we then get in the macro is the *register number*, not the value of the field. That means when we mask the register number with 0x1 we get 0 or 1 depending on which register the compiler happens to put the constant in, eg: li r10,1 tlbiel r8,r9,2,0,0 li r7,1 tlbiel r10,r6,0,0,1 If we're unlucky we might generate an invalid instruction form, for example RIC=0, PRS=1 and R=0, tlbiel r8,r7,0,1,0, this has been observed to cause machine checks: Oops: Machine check, sig: 7 [#1] CPU: 24 PID: 0 Comm: swapper NIP: 00000000000385f4 LR: 000000000100ed00 CTR: 000000000000007f REGS: c00000000110bb40 TRAP: 0200 MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48002222 XER: 20040000 CFAR: 00000000000385d0 DAR: 0000000000001c00 DSISR: 00000200 SOFTE: 1 If the machine check happens early in boot while we have MSR_ME=0 it will escalate into a checkstop and kill the box entirely. To fix it we could change the inline asm constraint to "i" which tells the compiler the value is a constant. But a better fix is to just pass a literal 1 into the macro, which bypasses any problems with inline asm constraints. Fixes: d4748276ae14 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-04-11exec: pass stack rlimit into mm layout functionsKees Cook
Patch series "exec: Pin stack limit during exec". Attempts to solve problems with the stack limit changing during exec continue to be frustrated[1][2]. In addition to the specific issues around the Stack Clash family of flaws, Andy Lutomirski pointed out[3] other places during exec where the stack limit is used and is assumed to be unchanging. Given the many places it gets used and the fact that it can be manipulated/raced via setrlimit() and prlimit(), I think the only way to handle this is to move away from the "current" view of the stack limit and instead attach it to the bprm, and plumb this down into the functions that need to know the stack limits. This series implements the approach. [1] 04e35f4495dd ("exec: avoid RLIMIT_STACK races with prlimit()") [2] 779f4e1c6c7c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"") [3] to security@kernel.org, "Subject: existing rlimit races?" This patch (of 3): Since it is possible that the stack rlimit can change externally during exec (either via another thread calling setrlimit() or another process calling prlimit()), provide a way to pass the rlimit down into the per-architecture mm layout functions so that the rlimit can stay in the bprm structure instead of sitting in the signal structure until exec is finalized. Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Willy Tarreau <w@1wt.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Rik van Riel <riel@redhat.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Greg KH <greg@kroah.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11mm, migrate: remove reason argument from new_page_tMichal Hocko
No allocation callback is using this argument anymore. new_page_node used to use this parameter to convey node_id resp. migration error up to move_pages code (do_move_page_to_node_array). The error status never made it into the final status field and we have a better way to communicate node id to the status field now. All other allocation callbacks simply ignored the argument so we can drop it finally. [mhocko@suse.com: fix migration callback] Link: http://lkml.kernel.org/r/20180105085259.GH2801@dhcp22.suse.cz [akpm@linux-foundation.org: fix alloc_misplaced_dst_page()] [mhocko@kernel.org: fix build] Link: http://lkml.kernel.org/r/20180103091134.GB11319@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20180103082555.14592-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11KVM: PPC: Book3S HV: trace_tlbie must not be called in realmodeNicholas Piggin
This crashes with a "Bad real address for load" attempting to load from the vmalloc region in realmode (faulting address is in DAR). Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1] LE SMP NR_CPUS=2048 NUMA PowerNV CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted 4.16.0-01530-g43d1859f0994 NIP: c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580 REGS: c000000fff76dd80 TRAP: 0200 Not tainted (4.16.0-01530-g43d1859f0994) MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48082222 XER: 00000000 CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3 NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0 LR [c0000000000c2430] do_tlbies+0x230/0x2f0 I suspect the reason is the per-cpu data is not in the linear chunk. This could be restored if that was able to be fixed, but for now, just remove the tracepoints. Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-11powerpc/8xx: Fix build with hugetlbfs enabledAneesh Kumar K.V
8xx uses the slice code when hugetlbfs is enabled. We missed a header include on 8xx which resulted in the below build failure: config: mpc885_ads_defconfig + CONFIG_HUGETLBFS arch/powerpc/mm/slice.c: In function 'slice_get_unmapped_area': arch/powerpc/mm/slice.c:655:2: error: implicit declaration of function 'need_extra_context' arch/powerpc/mm/slice.c:656:3: error: implicit declaration of function 'alloc_extended_context' on PPC64 the mmu_context.h was included via linux/pkeys.h Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-11powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loopsNicholas Piggin
The OPAL NVRAM driver does not sleep in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware, which causes large scheduling latencies, and various lockup errors to trigger (again, BMC reboot can cause it). Fix this by converting it to the standard form OPAL_BUSY loop that sleeps. Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks") Depends-on: 34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops") Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-10Merge tag 'libnvdimm-for-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit d2c997c0f145 ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...