summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2018-03-23powerpc/mm: Add tracking of the number of coprocessors using a contextBenjamin Herrenschmidt
Currently, when using coprocessors (which use the Nest MMU), we simply increment the active_cpu count to force all TLB invalidations to be come broadcast. Unfortunately, due to an errata in POWER9, we will need to know more specifically that coprocessors are in use. This maintains a separate copros counter in the MMU context for that purpose. NB. The commit mentioned in the fixes tag below is not at fault for the bug we're fixing in this commit and the next, but this fix applies on top the infrastructure it introduced. Fixes: 03b8abedf4f4 ("cxl: Enable global TLBIs for cxl contexts") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-23powerpc/64s: Fix lost pending interrupt due to race causing lost update to ↵Nicholas Piggin
irq_happened force_external_irq_replay() can be called in the do_IRQ path with interrupts hard enabled and soft disabled if may_hard_irq_enable() set MSR[EE]=1. It updates local_paca->irq_happened with a load, modify, store sequence. If a maskable interrupt hits during this sequence, it will go to the masked handler to be marked pending in irq_happened. This update will be lost when the interrupt returns and the store instruction executes. This can result in unpredictable latencies, timeouts, lockups, etc. Fix this by ensuring hard interrupts are disabled before modifying irq_happened. This could cause any maskable asynchronous interrupt to get lost, but it was noticed on P9 SMP system doing RDMA NVMe target over 100GbE, so very high external interrupt rate and high IPI rate. The hang was bisected down to enabling doorbell interrupts for IPIs. These provided an interrupt type that could run at high rates in the do_IRQ path, stressing the race. Fixes: 1d607bb3bd60 ("powerpc/irq: Add mechanism to force a replay of interrupts") Cc: stable@vger.kernel.org # v4.8+ Reported-by: Carol L. Soto <clsoto@us.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-14powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU featuresMichael Ellerman
When running virtualised the powerpc kernel is able to run the system in "compat mode" - which means the kernel and hardware are pretending to userspace that the CPU is an older version than it actually is. AT_BASE_PLATFORM is an AUXV entry that we export to userspace for use when we're running in that mode, which tells userspace the "platform" string for the real CPU version, as opposed to the faked version. Although we don't support compat mode when using DT CPU features, and arguably don't need to set AT_BASE_PLATFORM, the existing cputable based code always sets it even when we're running bare metal. That means the lack of AT_BASE_PLATFORM is a user-visible artifact of the fact that the kernel is using DT CPU features, which we don't want. So set it in the DT CPU features code also. This results in eg: $ LD_SHOW_AUXV=1 /bin/true | grep "AT_.*PLATFORM" AT_PLATFORM: power9 AT_BASE_PLATFORM:power9 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-03-06powerpc/pseries: Fix vector5 in ibm architecture vector tableBharata B Rao
With ibm,dynamic-memory-v2 and ibm,drc-info coming around the same time, byte22 in vector5 of ibm architecture vector table got set twice separately. The end result is that guest kernel isn't advertising support for ibm,dynamic-memory-v2. Fix this by removing the duplicate assignment of byte22. Fixes: 02ef6dd8109b ("powerpc: Enable support for ibm,drc-info devtree property") Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-28powerpc/boot: Fix random libfdt related build errorsGuenter Roeck
Once in a while I see build errors similar to the following when building images from a clean tree. Building powerpc:virtex-ml507:44x/virtex5_defconfig ... failed ------------ Error log: arch/powerpc/boot/treeboot-akebono.c:37:20: fatal error: libfdt.h: No such file or directory Building powerpc:bamboo:smpdev:44x/bamboo_defconfig ... failed ------------ Error log: arch/powerpc/boot/treeboot-akebono.c:37:20: fatal error: libfdt.h: No such file or directory arch/powerpc/boot/treeboot-currituck.c:35:20: fatal error: libfdt.h: No such file or directory Rebuilds will succeed. Turns out that several source files in arch/powerpc/boot/ include libfdt.h, but Makefile dependencies are incomplete. Let's fix that. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-23powerpc/powernv: Support firmware disable of RFI flushMichael Ellerman
Some versions of firmware will have a setting that can be configured to disable the RFI flush, add support for it. Fixes: 6e032b350cd1 ("powerpc/powernv: Check device-tree for RFI flush settings") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-23powerpc/pseries: Support firmware disable of RFI flushMichael Ellerman
Some versions of firmware will have a setting that can be configured to disable the RFI flush, add support for it. Fixes: 8989d56878a7 ("powerpc/pseries: Query hypervisor for RFI flush settings") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-23powerpc/mm/drmem: Fix unexpected flag value in ibm,dynamic-memory-v2Bharata B Rao
Memory addtion and removal by count and indexed-count methods temporarily mark the LMBs that are being added/removed by a special flag value DRMEM_LMB_RESERVED. Accessing flags value directly at a few places without proper accessor method is causing two unexpected side-effects: - DRMEM_LMB_RESERVED bit is becoming part of the flags word of drconf_cell_v2 entries in ibm,dynamic-memory-v2 DT property. - This results in extra drconf_cell entries in ibm,dynamic-memory-v2. For example if 1G memory is added, it leads to one entry for 3 LMBs and 1 separate entry for the last LMB. All the 4 LMBs should be defined by one entry here. Fix this by always accessing the flags by its accessor method drmem_lmb_flags(). Fixes: 2b31e3aec1db ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property") Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-22powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data accessMark Lord
I am using SECCOMP to filter syscalls on a ppc32 platform, and noticed that the JIT compiler was failing on the BPF even though the interpreter was working fine. The issue was that the compiler was missing one of the instructions used by SECCOMP, so here is a patch to enable JIT for that instruction. Fixes: eb84bab0fb38 ("ppc: Kconfig: Enable BPF JIT on ppc32") Signed-off-by: Mark Lord <mlord@pobox.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-22powerpc/pseries: Revert support for ibm,drc-info devtree propertyMichael Bringmann
This reverts commit 02ef6dd8109b581343ebeb1c4c973513682535d6. The earlier patch tried to enable support for a new property "ibm,drc-info" on powerpc systems. Unfortunately, some errors in the associated patch set break things in some of the DLPAR operations. In particular when attempting to hot-add a new CPU or set of CPUs, the original patch failed to properly calculate the available resources, and aborted the operation. In addition, the original set missed several opportunities to compress and reuse common code. As the associated patch set was meant to provide an optimization of storage and performance of a set of device-tree properties for future systems with large amounts of resources, reverting just restores the previous behavior for existing systems. It seems unnecessary to enable this feature and introduce the consequent problems in the field that it will cause at this time, so please revert it for now until testing of the corrections are finished properly. Fixes: 02ef6dd8109b ("powerpc: Enable support for ibm,drc-info devtree property") Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-22powerpc/pseries: Fix duplicate firmware feature for DRC_INFOMichael Ellerman
We had a mid-air collision between two new firmware features, DRMEM_V2 and DRC_INFO, and they ended up with the same value. No one's actually reported any problems, presumably because the new firmware that supports both properties is not widely available, and the two properties tend to be enabled together. Still if we ever had one enabled but not the other, the bugs that could result are many and varied. So fix it. Fixes: 3f38000eda48 ("powerpc/firmware: Add definitions for new drc-info firmware feature") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
2018-02-21powerpc/eeh: Fix crashes in eeh_report_resume()Juan J. Alvarez
The notify_resume() callback in eeh_ops is NULL on powernv, leading to crashes: NIP (null) LR eeh_report_resume+0x218/0x220 Call Trace: eeh_report_resume+0x1f0/0x220 (unreliable) eeh_pe_dev_traverse+0x98/0x170 eeh_handle_normal_event+0x3f4/0x650 eeh_handle_event+0x54/0x380 eeh_event_handler+0x14c/0x210 kthread+0x168/0x1b0 ret_from_kernel_thread+0x5c/0xb4 Fix it by adding a check before calling it. Fixes: 856e1eb9bdd4 ("PCI/AER: Add uevents in AER and EEH error/resume") Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Reviewed-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Tested-by: Carol L. Soto <clsoto@us.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Tested-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> [mpe: Rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-16powerpc/pseries: Check for zero filled ibm,dynamic-memory propertyNathan Fontenot
Some versions of QEMU will produce an ibm,dynamic-reconfiguration-memory node with a ibm,dynamic-memory property that is zero-filled. This causes the drmem code to oops trying to parse this property. The fix for this is to validate that the property does contain LMB entries before trying to parse it and bail if the count is zero. Oops: Kernel access of bad area, sig: 11 [#1] DAR: 0000000000000010 NIP read_drconf_v1_cell+0x54/0x9c LR read_drconf_v1_cell+0x48/0x9c Call Trace: __param_initcall_debug+0x0/0x28 (unreliable) drmem_init+0x144/0x2f8 do_one_initcall+0x64/0x1d0 kernel_init_freeable+0x298/0x38c kernel_init+0x24/0x160 ret_from_kernel_thread+0x5c/0xb4 The ibm,dynamic-reconfiguration-memory device tree property generated that causes this: ibm,dynamic-reconfiguration-memory { ibm,lmb-size = <0x0 0x10000000>; ibm,memory-flags-mask = <0xff>; ibm,dynamic-memory = <0x0 0x0 0x0 0x0 0x0 0x0>; linux,phandle = <0x7e57eed8>; ibm,associativity-lookup-arrays = <0x1 0x4 0x0 0x0 0x0 0x0>; ibm,memory-preservation-time = <0x0>; }; Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Tested-by: Daniel Black <daniel@linux.vnet.ibm.com> [mpe: Trim oops report] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-15powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=nCorentin Labbe
When CONFIG_NUMA is not set, the build fails with: arch/powerpc/platforms/pseries/hotplug-cpu.c:335:4: error: déclaration implicite de la fonction « update_numa_cpu_lookup_table » So we have to add update_numa_cpu_lookup_table() as an empty function when CONFIG_NUMA is not set. Fixes: 1d9a090783be ("powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove") Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-15powerpc/powernv: IMC fix out of bounds memory access at shutdownNicholas Piggin
The OPAL IMC driver's shutdown handler disables nest PMU counters by walking nodes and taking the first CPU out of their cpumask, which is used to index into the paca (get_hard_smp_processor_id()). This does not always do the right thing, and in particular for CPU-less nodes it returns NR_CPUS and that overruns the paca and dereferences random memory. Fix it by being more careful about checking returned CPU, and only using online CPUs. It's not clear this shutdown code makes sense after commit 885dcd709b ("powerpc/perf: Add nest IMC PMU support"), but this should not make things worse Currently the bug causes us to call OPAL with a junk CPU number. A separate patch in development to change the way pacas are allocated escalates this bug into a crash: Unable to handle kernel paging request for data at address 0x2a21af1eeb000076 Faulting instruction address: 0xc0000000000a5468 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP opal_imc_counters_shutdown+0x148/0x1d0 LR opal_imc_counters_shutdown+0x134/0x1d0 Call Trace: opal_imc_counters_shutdown+0x134/0x1d0 (unreliable) platform_drv_shutdown+0x44/0x60 device_shutdown+0x1f8/0x350 kernel_restart_prepare+0x54/0x70 kernel_restart+0x28/0xc0 SyS_reboot+0x1d0/0x2c0 system_call+0x58/0x6c Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-15powerpc/xive: Use hw CPU ids when configuring the CPU queuesCédric Le Goater
The CPU event notification queues on sPAPR should be configured using a hardware CPU identifier. The problem did not show up on the Power Hypervisor because pHyp supports 8 threads per core which keeps CPU number contiguous. This is not the case on all sPAPR virtual machines, some use SMT=1. Also improve error logging by adding the CPU number. Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-15powerpc: Expose TSCR via sysfs only on powernvCyril Bur
The TSCR can only be accessed in hypervisor mode. Fixes: 88b5e12eeb11 ("powerpc: Expose TSCR via sysfs") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-14Merge tag 'powerpc-4.16-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A larger batch of fixes than we'd like. Roughly 1/3 fixes for new code, 1/3 fixes for stable and 1/3 minor things. There's four commits fixing bugs when using 16GB huge pages on hash, caused by some of the preparatory changes for pkeys. Two fixes for bugs in the enhanced IRQ soft masking for local_t, one of which broke KVM in some circumstances. Four fixes for Power9. The most bizarre being a bug where futexes stopped working because a NULL pointer dereference didn't trap during early boot (it aliased the kernel mapping). A fix for memory hotplug when using the Radix MMU, and a fix for live migration of guests using the Radix MMU. Two fixes for hotplug on pseries machines. One where we weren't correctly updating NUMA info when CPUs are added and removed. And the other fixes crashes/hangs seen when doing memory hot remove during boot, which is apparently a thing people do. Finally a handful of build fixes for obscure configs and other minor fixes. Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Balbir Singh, Colin Ian King, Daniel Henrique Barboza, Florian Weimer, Guenter Roeck, Harish, Laurent Vivier, Madhavan Srinivasan, Mauricio Faria de Oliveira, Nathan Fontenot, Nicholas Piggin, Sam Bobroff" * tag 'powerpc-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Fix to use ucontext_t instead of struct ucontext powerpc/kdump: Fix powernv build break when KEXEC_CORE=n powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplug powerpc/mm/hash64: Zero PGD pages on allocation powerpc/mm/hash64: Store the slot information at the right offset for hugetlb powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabled powerpc/mm: Fix crashes with 16G huge pages powerpc/mm: Flush radix process translations when setting MMU type powerpc/vas: Don't set uses_vas for kernel windows powerpc/pseries: Enable RAS hotplug events later powerpc/mm/radix: Split linear mapping on hot-unplug powerpc/64s/radix: Boot-time NULL pointer protection using a guard-PID ocxl: fix signed comparison with less than zero powerpc/64s: Fix may_hard_irq_enable() for PMI soft masking powerpc/64s: Fix MASKABLE_RELON_EXCEPTION_HV_OOL macro powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove
2018-02-13powerpc/kdump: Fix powernv build break when KEXEC_CORE=nGuenter Roeck
If KEXEC_CORE is not enabled, powernv builds fail as follows. arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self': arch/powerpc/platforms/powernv/smp.c:236:4: error: implicit declaration of function 'crash_ipi_callback' Add dummy function calls, similar to kdump_in_progress(), to solve the problem. Fixes: 4145f358644b ("powernv/kdump: Fix cases where the kdump kernel can get HMI's") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplugGuenter Roeck
Commit e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes") adds an unconditional call to find_and_online_cpu_nid(), which is only declared if CONFIG_PPC_SPLPAR is enabled. This results in the following build error if this is not the case. arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu': arch/powerpc/platforms/pseries/hotplug-cpu.c:369: undefined reference to `.find_and_online_cpu_nid' Follow the guideline provided by similar functions and provide a dummy function if CONFIG_PPC_SPLPAR is not enabled. This also moves the external function declaration into an include file where it should be. Fixes: e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes") Signed-off-by: Guenter Roeck <linux@roeck-us.net> [mpe: Change subject to emphasise the build fix] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/mm/hash64: Zero PGD pages on allocationAneesh Kumar K.V
On powerpc we allocate page table pages from slab caches of different sizes. Currently we have a constructor that zeroes out the objects when we allocate them for the first time. We expect the objects to be zeroed out when we free the the object back to slab cache. This happens in the unmap path. For hugetlb pages we call huge_pte_get_and_clear() to do that. With the current configuration of page table size, both PUD and PGD level tables are allocated from the same slab cache. At the PUD level, we use the second half of the table to store the slot information. But we never clear that when unmapping. When such a freed object is then allocated for a PGD page, the second half of the page table page will not be zeroed as expected. This results in a kernel crash. Fix it by always clearing PGD pages when they're allocated. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Change log wording and formatting, add whitespace] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/mm/hash64: Store the slot information at the right offset for hugetlbAneesh Kumar K.V
The hugetlb pte entries are at the PMD and PUD level, so we can't use PTRS_PER_PTE to find the second half of the page table. Use the right offset for PUD/PMD to get to the second half of the table. Fixes: bf9a95f9a648 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabledAneesh Kumar K.V
We use the second half of the page table to store slot information, so we must allocate it always if hugetlb is possible. Fixes: bf9a95f9a648 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/mm: Fix crashes with 16G huge pagesAneesh Kumar K.V
To support memory keys, we moved the hash pte slot information to the second half of the page table. This was ok with PTE entries at level 4 (PTE page) and level 3 (PMD). We already allocate larger page table pages at those levels to accomodate extra details. For level 4 we already have the extra space which was used to track 4k hash page table entry details and at level 3 the extra space was allocated to track the THP details. With hugetlbfs PTE, we used this extra space at the PMD level to store the slot details. But we also support hugetlbfs PTE at PUD level for 16GB pages and PUD level page didn't allocate extra space. This resulted in memory corruption. Fix this by allocating extra space at PUD level when HUGETLB is enabled. Fixes: bf9a95f9a648 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/mm: Flush radix process translations when setting MMU typeAlexey Kardashevskiy
Radix guests do normally invalidate process-scoped translations when a new pid is allocated but migrated guests do not invalidate these so migrated guests crash sometime, especially easy to reproduce with migration happening within first 10 seconds after the guest boot start on the same machine. This adds the "Invalidate process-scoped translations" flush to fix radix guests migration. Fixes: 2ee13be34b13 ("KVM: PPC: Book3S HV: Update kvmppc_set_arch_compat() for ISA v3.00") Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/vas: Don't set uses_vas for kernel windowsNicholas Piggin
cp_abort is only required for user windows, because kernel context must not be preempted between a copy/paste pair. Without this patch, the init task gets used_vas set when it runs the nx842_powernv_init initcall, which opens windows for kernel usage. used_vas is then never cleared anywhere, so it gets propagated into all other tasks. It's a property of the address space, so it should really be cleared when a new mm is created (or in dup_mmap if the mmaps are marked as VM_DONTCOPY). For now we seem to have no such driver, so leave that for another patch. Fixes: 6c8e6bb2a52d ("powerpc/vas: Add support for user receive window") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/pseries: Enable RAS hotplug events laterSam Bobroff
Currently if the kernel receives a memory hot-unplug event early enough, it may get stuck in an infinite loop in dissolve_free_huge_pages(). This appears as a stall just after: pseries-hotplug-mem: Attempting to hot-remove XX LMB(s) at YYYYYYYY It appears to be caused by "minimum_order" being uninitialized, due to init_ras_IRQ() executing before hugetlb_init(). To correct this, extract the part of init_ras_IRQ() that enables hotplug event processing and place it in the machine_late_initcall phase, which is guaranteed to be after hugetlb_init() is called. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> [mpe: Reorder the functions to make the diff readable] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-11vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds
This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-10Merge tag 'pci-v4.16-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Fix a POWER9/powernv INTx regression from the merge window (Alexey Kardashevskiy)" * tag 'pci-v4.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: powerpc/pci: Fix broken INTx configuration via OF
2018-02-10Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Radim Krčmář: "ARM: - icache invalidation optimizations, improving VM startup time - support for forwarded level-triggered interrupts, improving performance for timers and passthrough platform devices - a small fix for power-management notifiers, and some cosmetic changes PPC: - add MMIO emulation for vector loads and stores - allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without requiring the complex thread synchronization of older CPU versions - improve the handling of escalation interrupts with the XIVE interrupt controller - support decrement register migration - various cleanups and bugfixes. s390: - Cornelia Huck passed maintainership to Janosch Frank - exitless interrupts for emulated devices - cleanup of cpuflag handling - kvm_stat counter improvements - VSIE improvements - mm cleanup x86: - hypervisor part of SEV - UMIP, RDPID, and MSR_SMI_COUNT emulation - paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit - allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512 features - show vcpu id in its anonymous inode name - many fixes and cleanups - per-VCPU MSR bitmaps (already merged through x86/pti branch) - stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)" * tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits) KVM: PPC: Book3S: Add MMIO emulation for VMX instructions KVM: PPC: Book3S HV: Branch inside feature section KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code KVM: PPC: Book3S PR: Fix broken select due to misspelling KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs() KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled KVM: PPC: Book3S HV: Drop locks before reading guest memory kvm: x86: remove efer_reload entry in kvm_vcpu_stat KVM: x86: AMD Processor Topology Information x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested kvm: embed vcpu id to dentry of vcpu anon inode kvm: Map PFN-type memory regions as writable (if possible) x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n KVM: arm/arm64: Fixup userspace irqchip static key optimization KVM: arm/arm64: Fix userspace_irqchip_in_use counting KVM: arm/arm64: Fix incorrect timer_is_pending logic MAINTAINERS: update KVM/s390 maintainers MAINTAINERS: add Halil as additional vfio-ccw maintainer MAINTAINERS: add David as a reviewer for KVM/s390 ...
2018-02-10powerpc/pci: Fix broken INTx configuration via OFAlexey Kardashevskiy
59f47eff03a0 ("powerpc/pci: Use of_irq_parse_and_map_pci() helper") replaced of_irq_parse_pci() + irq_create_of_mapping() with of_irq_parse_and_map_pci(), but neglected to capture the virq returned by irq_create_of_mapping(), so virq remained zero, which caused INTx configuration to fail. Save the virq value returned by of_irq_parse_and_map_pci() and correct the virq declaration to match the of_irq_parse_and_map_pci() signature. Fixes: 59f47eff03a0 "powerpc/pci: Use of_irq_parse_and_map_pci() helper" Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-02-09Merge tag 'kvm-ppc-next-4.16-2' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc Second PPC KVM update for 4.16 Seven fixes that are either trivial or that address bugs that people are actually hitting. The main ones are: - Drop spinlocks before reading guest memory - Fix a bug causing corruption of VCPU state in PR KVM with preemption enabled - Make HPT resizing work on POWER9 - Add MMIO emulation for vector loads and stores, because guests now use these instructions in memcpy and similar routines.
2018-02-09KVM: PPC: Book3S: Add MMIO emulation for VMX instructionsJose Ricardo Ziviani
This patch provides the MMIO load/store vector indexed X-Form emulation. Instructions implemented: lvx: the quadword in storage addressed by the result of EA & 0xffff_ffff_ffff_fff0 is loaded into VRT. stvx: the contents of VRS are stored into the quadword in storage addressed by the result of EA & 0xffff_ffff_ffff_fff0. Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-09KVM: PPC: Book3S HV: Branch inside feature sectionAlexander Graf
We ended up with code that did a conditional branch inside a feature section to code outside of the feature section. Depending on how the object file gets organized, that might mean we exceed the 14bit relocation limit for conditional branches: arch/powerpc/kvm/built-in.o:arch/powerpc/kvm/book3s_hv_rmhandlers.S:416:(__ftr_alt_97+0x8): relocation truncated to fit: R_PPC64_REL14 against `.text'+1ca4 So instead of doing a conditional branch outside of the feature section, let's just jump at the end of the same, making the branch very short. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-09KVM: PPC: Book3S HV: Make HPT resizing work on POWER9David Gibson
This adds code to enable the HPT resizing code to work on POWER9, which uses a slightly modified HPT entry format compared to POWER8. On POWER9, we convert HPTEs read from the HPT from the new format to the old format so that the rest of the HPT resizing code can work as before. HPTEs written to the new HPT are converted to the new format as the last step before writing them into the new HPT. This takes out the checks added by commit bcd3bb63dbc8 ("KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now", 2017-02-18), now that HPT resizing works on POWER9. On POWER9, when we pivot to the new HPT, we now call kvmppc_setup_partition_table() to update the partition table in order to make the hardware use the new HPT. [paulus@ozlabs.org - added kvmppc_setup_partition_table() call, wrote commit message.] Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-09KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing codePaul Mackerras
This fixes the computation of the HPTE index to use when the HPT resizing code encounters a bolted HPTE which is stored in its secondary HPTE group. The code inverts the HPTE group number, which is correct, but doesn't then mask it with new_hash_mask. As a result, new_pteg will be effectively negative, resulting in new_hptep pointing before the new HPT, which will corrupt memory. In addition, this removes two BUG_ON statements. The condition that the BUG_ONs were testing -- that we have computed the hash value incorrectly -- has never been observed in testing, and if it did occur, would only affect the guest, not the host. Given that BUG_ON should only be used in conditions where the kernel (i.e. the host kernel, in this case) can't possibly continue execution, it is not appropriate here. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-08powerpc/mm/radix: Split linear mapping on hot-unplugBalbir Singh
This patch splits the linear mapping if the hot-unplug range is smaller than the mapping size. The code detects if the mapping needs to be split into a smaller size and if so, uses the stop machine infrastructure to clear the existing mapping and then remap the remaining range using a smaller page size. The code will skip any region of the mapping that overlaps with kernel text and warn about it once. We don't want to remove a mapping where the kernel text and the LMB we intend to remove overlap in the same TLB mapping as it may affect the currently executing code. I've tested these changes under a kvm guest with 2 vcpus, from a split mapping point of view, some of the caveats mentioned above applied to the testing I did. Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()") Signed-off-by: Balbir Singh <bsingharora@gmail.com> [mpe: Tweak change log to match updated behaviour] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-08powerpc/64s/radix: Boot-time NULL pointer protection using a guard-PIDNicholas Piggin
This change restores and formalises the behaviour that access to NULL or other user addresses by the kernel during boot should fault rather than succeed and modify memory. This was inadvertently broken when fixing another bug, because it was previously not well defined and only worked by chance. powerpc/64s/radix uses high address bits to select an address space "quadrant", which determines which PID and LPID are used to translate the rest of the address (effective PID, effective LPID). The kernel mapping at 0xC... selects quadrant 3, which uses PID=0 and LPID=0. So the kernel page tables are installed in the PID 0 process table entry. An address at 0x0... selects quadrant 0, which uses PID=PIDR for translating the rest of the address (that is, it uses the value of the PIDR register as the effective PID). If PIDR=0, then the translation is performed with the PID 0 process table entry page tables. This is the kernel mapping, so we effectively get another copy of the kernel address space at 0. A NULL pointer access will access physical memory address 0. To prevent duplicating the kernel address space in quadrant 0, this patch allocates a guard PID containing no translations, and initializes PIDR with this during boot, before the MMU is switched on. Any kernel access to quadrant 0 will use this guard PID for translation and find no valid mappings, and therefore fault. After boot, this PID will be switchd away to user context PIDs, but those contain user mappings (and usually NULL pointer protection) rather than kernel mapping, which is much safer (and by design). It may be in future this is tightened further, which the guard PID could be used for. Commit 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before setting partition table"), introduced this problem because it zeroes PIDR at boot. However previously the value was inherited from firmware or kexec, which is not robust and can be zero (e.g., mambo). Fixes: 371b80447ff3 ("powerpc/64s: Initialize ISAv3 MMU registers before setting partition table") Cc: stable@vger.kernel.org # v4.15+ Reported-by: Florian Weimer <fweimer@redhat.com> Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-08powerpc/64s: Fix may_hard_irq_enable() for PMI soft maskingNicholas Piggin
The soft IRQ masking code has to hard-disable interrupts in cases where the exception is not cleared by the masked handler. External interrupts used this approach for soft masking. Now recently PMU interrupts do the same thing. The soft IRQ masking code additionally allowed for interrupt handlers to hard-enable interrupts after soft-disabling them. The idea is to allow PMU interrupts through to profile interrupt handlers. So when interrupts are being replayed when there is a pending interrupt that requires hard-disabling, there is a test to prevent those handlers from hard-enabling them if there is a pending external interrupt. may_hard_irq_enable() handles this. After f442d00480 ("powerpc/64s: Add support to mask perf interrupts and replay them"), may_hard_irq_enable() could prematurely enable MSR[EE] when a PMU exception exists, which would result in the interrupt firing again while masked, and MSR[EE] being disabled again. I haven't seen that this could cause a serious problem, but it's more consistent to handle these soft-masked interrupts in the same way. So introduce a define for all types of interrupts that require MSR[EE] masking in their soft-disable handlers, and use that in may_hard_irq_enable(). Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-08powerpc/64s: Fix MASKABLE_RELON_EXCEPTION_HV_OOL macroMadhavan Srinivasan
Commit f14e953b191f ("powerpc/64s: Add support to take additional parameter in MASKABLE_* macro") messed up MASKABLE_RELON_EXCEPTION_HV_OOL macro by adding the wrong SOFTEN test which caused guest kernel crash at boot. Patch to fix the macro to use SOFTEN_TEST_HV instead of SOFTEN_NOTEST_HV. Fixes: f14e953b191f ("powerpc/64s: Add support to take additional parameter in MASKABLE_* macro") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Fix-Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-08powerpc/numa: Invalidate numa_cpu_lookup_table on cpu removeNathan Fontenot
When DLPAR removing a CPU, the unmapping of the cpu from a node in unmap_cpu_from_node() should also invalidate the CPUs entry in the numa_cpu_lookup_table. There is not a guarantee that on a subsequent DLPAR add of the CPU the associativity will be the same and thus could be in a different node. Invalidating the entry in the numa_cpu_lookup_table causes the associativity to be read from the device tree at the time of the add. The current behavior of not invalidating the CPUs entry in the numa_cpu_lookup_table can result in scenarios where the the topology layout of CPUs in the partition does not match the device tree or the topology reported by the HMC. This bug looks like it was introduced in 2004 in the commit titled "ppc64: cpu hotplug notifier for numa", which is 6b15e4e87e32 in the linux-fullhist tree. Hence tag it for all stable releases. Cc: stable@vger.kernel.org Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-08KVM: PPC: Book3S PR: Fix broken select due to misspellingUlf Magnusson
Commit 76d837a4c0f9 ("KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms") added a reference to the globally undefined symbol PPC_SERIES. Looking at the rest of the commit, PPC_PSERIES was probably intended. Change PPC_SERIES to PPC_PSERIES. Discovered with the https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py script. Fixes: 76d837a4c0f9 ("KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-06Merge branch 'linus' into sched/urgent, to resolve conflictsIngo Molnar
Conflicts: arch/arm64/kernel/entry.S arch/x86/Kconfig include/linux/sched/mm.h kernel/fork.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06Merge tag 'libnvdimm-for-4.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Ross Zwisler: - Require struct page by default for filesystem DAX to remove a number of surprising failure cases. This includes failures with direct I/O, gdb and fork(2). - Add support for the new Platform Capabilities Structure added to the NFIT in ACPI 6.2a. This new table tells us whether the platform supports flushing of CPU and memory controller caches on unexpected power loss events. - Revamp vmem_altmap and dev_pagemap handling to clean up code and better support future future PCI P2P uses. - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}. - Enhance nfit_test so we can test some of the new things added in version 1.6 of the DSM specification. This includes testing firmware download and simulating the Last Shutdown State (LSS) status. * tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits) libnvdimm, namespace: remove redundant initialization of 'nd_mapping' acpi, nfit: fix register dimm error handling libnvdimm, namespace: make min namespace size 4K tools/testing/nvdimm: force nfit_test to depend on instrumented modules libnvdimm/nfit_test: adding support for unit testing enable LSS status libnvdimm/nfit_test: add firmware download emulation nfit-test: Add platform cap support from ACPI 6.2a to test libnvdimm: expose platform persistence attribute for nd_region acpi: nfit: add persistent memory control flag for nd_region acpi: nfit: Add support for detect platform CPU cache flush on power loss device-dax: Fix trailing semicolon libnvdimm, btt: fix uninitialized err_lock dax: require 'struct page' by default for filesystem dax ext2: auto disable dax instead of failing mount ext4: auto disable dax instead of failing mount mm, dax: introduce pfn_t_special() mm: Fix devm_memremap_pages() collision handling mm: Fix memory size alignment in devm_memremap_pages_release() memremap: merge find_dev_pagemap into get_dev_pagemap memremap: change devm_memremap_pages interface to use struct dev_pagemap ...
2018-02-06Merge tag 'pci-v4.16-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - skip AER driver error recovery callbacks for correctable errors reported via ACPI APEI, as we already do for errors reported via the native path (Tyler Baicar) - fix DPC shared interrupt handling (Alex Williamson) - print full DPC interrupt number (Keith Busch) - enable DPC only if AER is available (Keith Busch) - simplify DPC code (Bjorn Helgaas) - calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn Helgaas) - enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn Helgaas) - move ASPM internal interfaces out of public header (Bjorn Helgaas) - allow hot-removal of VGA devices (Mika Westerberg) - speed up unplug and shutdown by assuming Thunderbolt controllers don't support Command Completed events (Lukas Wunner) - add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling, Jay Cornwall) - expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes) - clean up PCI DMA interface usage (Christoph Hellwig) - remove PCI pool API (replaced with DMA pool) (Romain Perier) - deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan Kaya) - move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring) - add PCI-specific wrappers for dev_info(), etc (Frederick Lawler) - remove warnings on sysfs mmap failure (Bjorn Helgaas) - quiet ROM validation messages (Alex Deucher) - remove redundant memory alloc failure messages (Markus Elfring) - fill in types for compile-time VGA and other I/O port resources (Bjorn Helgaas) - make "pci=pcie_scan_all" work for Root Ports as well as Downstream Ports to help AmigaOne X1000 (Bjorn Helgaas) - add SPDX tags to all PCI files (Bjorn Helgaas) - quirk Marvell 9128 DMA aliases (Alex Williamson) - quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas) - fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas Cassel) - use DMA API to get MSI address for DesignWare IP (Niklas Cassel) - fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I) - fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun) - add support for ARTPEC-7 SoC (Niklas Cassel) - add endpoint-mode support for ARTPEC (Niklas Cassel) - add Cadence PCIe host and endpoint controller driver (Cyrille Pitchen) - handle multiple INTx status bits being set in dra7xx (Vignesh R) - translate dra7xx hwirq range to fix INTD handling (Vignesh R) - remove deprecated Exynos PHY initialization code (Jaehoon Chung) - fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu) - fix NULL pointer dereference in iProc BCMA driver (Ray Jui) - fix Keystone interrupt-controller-node lookup (Johan Hovold) - constify qcom driver structures (Julia Lawall) - rework Tegra config space mapping to increase space available for endpoints (Vidya Sagar) - simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy) - remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy) - add support for Global Fabric Manager Server (GFMS) event to Microsemi Switchtec switch driver (Logan Gunthorpe) - add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao) * tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits) PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller PCI: endpoint: Fix EPF device name to support multi-function devices PCI: endpoint: Add the function number as argument to EPC ops PCI: cadence: Add host driver for Cadence PCIe controller dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller PCI: Add vendor ID for Cadence PCI: Add generic function to probe PCI host controllers PCI: generic: fix missing call of pci_free_resource_list() PCI: OF: Add generic function to parse and allocate PCI resources PCI: Regroup all PCI related entries into drivers/pci/Makefile PCI/DPC: Reformat DPC register definitions PCI/DPC: Add and use DPC Status register field definitions PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error() PCI/DPC: Remove unnecessary RP PIO register structs PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info() PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info() PCI/DPC: Make RP PIO log size check more generic PCI/DPC: Rename local "status" to "dpc_status" PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error() ...
2018-02-05membarrier: Provide GLOBAL_EXPEDITED commandMathieu Desnoyers
Allow expedited membarrier to be used for data shared between processes through shared memory. Processes wishing to receive the membarriers register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. Those which want to issue membarrier invoke MEMBARRIER_CMD_GLOBAL_EXPEDITED. This allows extremely simple kernel-level implementation: we have almost everything we need with the PRIVATE_EXPEDITED barrier code. All we need to do is to add a flag in the mm_struct that will be used to check whether we need to send the IPI to the current thread of each CPU. There is a slight downside to this approach compared to targeting specific shared memory users: when performing a membarrier operation, all registered "global" receivers will get the barrier, even if they don't share a memory mapping with the sender issuing MEMBARRIER_CMD_GLOBAL_EXPEDITED. This registration approach seems to fit the requirement of not disturbing processes that really deeply care about real-time: they simply should not register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. In order to align the membarrier command names, the "MEMBARRIER_CMD_SHARED" command is renamed to "MEMBARRIER_CMD_GLOBAL", keeping an alias of MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_GLOBAL for UAPI header backward compatibility. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-5-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05powerpc, membarrier: Skip memory barrier in switch_mm()Mathieu Desnoyers
Allow PowerPC to skip the full memory barrier in switch_mm(), and only issue the barrier when scheduling into a task belonging to a process that has registered to use expedited private. Threads targeting the same VM but which belong to different thread groups is a tricky case. It has a few consequences: It turns out that we cannot rely on get_nr_threads(p) to count the number of threads using a VM. We can use (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) instead to skip the synchronize_sched() for cases where the VM only has a single user, and that user only has a single thread. It also turns out that we cannot use for_each_thread() to set thread flags in all threads using a VM, as it only iterates on the thread group. Therefore, test the membarrier state variable directly rather than relying on thread flags. This means membarrier_register_private_expedited() needs to set the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows private expedited membarrier commands to succeed. membarrier_arch_switch_mm() now tests for the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20180129202020.8515-3-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-03Merge branch 'for-4.16/nfit' into libnvdimm-for-nextRoss Zwisler
2018-02-02Merge tag 'powerpc-4.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Enable support for memory protection keys aka "pkeys" on Power7/8/9 when using the hash table MMU. - Extend our interrupt soft masking to support masking PMU interrupts as well as "normal" interrupts, and then use that to implement local_t for a ~4x speedup vs the current atomics-based implementation. - A new driver "ocxl" for "Open Coherent Accelerator Processor Interface (OpenCAPI)" devices. - Support for new device tree properties on PowerVM to describe hotpluggable memory and devices. - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit VDSO. - Freescale updates from Scott: fixes for CPM GPIO and an FSL PCI erratum workaround, plus a minor cleanup patch. As well as quite a lot of other changes all over the place, and small fixes and cleanups as always. Thanks to: Alan Modra, Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual, Anton Blanchard, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Bhaktipriya Shridhar, Bryant G. Ly, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Cyril Bur, David Gibson, Desnes A. Nunes do Rosario, Dmitry Torokhov, Frederic Barrat, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo A. R. Silva, Gustavo Romero, Ivan Mikhaylov, Joakim Tjernlund, Joe Perches, Josh Poimboeuf, Juan J. Alvarez, Julia Cartwright, Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee, Simon Guo, Stewart Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann, Vaibhav Jain, Vasyl Gomonovych" * tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (199 commits) powerpc/mm/radix: Fix build error when RADIX_MMU=n macintosh/ams-input: Use true and false for boolean values macintosh: change some data types from int to bool powerpc/watchdog: Print the NIP in soft_nmi_interrupt() powerpc/watchdog: regs can't be null in soft_nmi_interrupt() powerpc/watchdog: Tweak watchdog printks powerpc/cell: Remove axonram driver rtc-opal: Fix handling of firmware error codes, prevent busy loops powerpc/mpc52xx_gpt: make use of raw_spinlock variants macintosh/adb: Properly mark continued kernel messages powerpc/pseries: Fix cpu hotplug crash with memoryless nodes powerpc/numa: Ensure nodes initialized for hotplug powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes powerpc/kernel: Block interrupts when updating TIDR powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn powerpc/mm/nohash: do not flush the entire mm when range is a single page powerpc/pseries: Add Initialization of VF Bars powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV powerpc/eeh: Add EEH notify resume sysfs powerpc/eeh: Add EEH operations to notify resume ...
2018-02-01Merge tag 'clk-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "The core framework has a handful of patches this time around, mostly due to the clk rate protection support added by Jerome Brunet. This feature will allow consumers to lock in a certain rate on the output of a clk so that things like audio playback don't hear pops when the clk frequency changes due to shared parent clks changing rates. Currently the clk API doesn't guarantee the rate of a clk stays at the rate you request after clk_set_rate() is called, so this new API will allow drivers to express that requirement. Beyond this, the core got some debugfs pretty printing patches and a couple minor non-critical fixes. Looking outside of the core framework diff we have some new driver additions and the removal of a legacy TI clk driver. Both of these hit high in the dirstat. Also, the removal of the asm-generic/clkdev.h file causes small one-liners in all the architecture Kbuild files. Overall, the driver diff seems to be the normal stuff that comes all the time to fix little problems here and there and to support new hardware. Summary: Core: - Clk rate protection - Symbolic clk flags in debugfs output - Clk registration enabled clks while doing bookkeeping updates New Drivers: - Spreadtrum SC9860 - HiSilicon hi3660 stub - Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS - Amlogic Meson-AXG - ASPEED BMC Removed Drivers: - TI OMAP 3xxx legacy clk (non-DT) support - asm*/clkdev.h got removed (not really a driver) Updates: - Renesas FDP1-0 module clock on R-Car M3-W - Renesas LVDS module clock on R-Car V3M - Misc fixes to pr_err() prints - Qualcomm MSM8916 audio fixes - Qualcomm IPQ8074 rounded out support for more peripherals - Qualcomm Alpha PLL variants - Divider code was using container_of() on bad pointers - Allwinner DE2 clks on H3 - Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED - Mediatek clk driver compile test support - AT91 PMC clk suspend/resume restoration support - PLL issues fixed on si5351 - Broadcom IProc PLL calculation updates - DVFS support for Armada mvebu CPU clks - Allwinner fixed post-divider support - TI clkctrl fixes and support for newer SoCs" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits) clk: aspeed: Handle inverse polarity of USB port 1 clock gate clk: aspeed: Fix return value check in aspeed_cc_init() clk: aspeed: Add reset controller clk: aspeed: Register gated clocks clk: aspeed: Add platform driver and register PLLs clk: aspeed: Register core clocks clk: Add clock driver for ASPEED BMC SoCs clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built clk: fix reentrancy of clk_enable() on UP systems clk: meson-axg: fix potential NULL dereference in axg_clkc_probe() clk: Simplify debugfs registration clk: Fix debugfs_create_*() usage clk: Show symbolic clock flags in debugfs clk: renesas: r8a7796: Add FDP clock clk: Move __clk_{get,put}() into private clk.h API clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks clk: Improve flags doc for of_clk_detect_critical() arch: Remove clkdev.h asm-generic from Kbuild clk: sunxi-ng: a83t: Add M divider to TCON1 clock clk: Prepare to remove asm-generic/clkdev.h ...