summaryrefslogtreecommitdiff
path: root/arch/powerpc/platforms
AgeCommit message (Collapse)Author
2019-08-30powerpc/pseries: Introduce option to build secure virtual machinesThiago Jung Bauermann
Introduce CONFIG_PPC_SVM to control support for secure guests and include Ultravisor-related helpers when it is selected Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-3-bauerman@linux.ibm.com
2019-08-30Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge our ppc-kvm topic branch to bring in the Ultravisor support patches.
2019-08-30powerpc/powernv: Add ultravisor message log interfaceClaudio Carvalho
The ultravisor (UV) provides an in-memory console which follows the OPAL in-memory console structure. This patch extends the OPAL msglog code to initialize the UV memory console and provide the "/sys/firmware/ultravisor/msglog" interface for userspace to view the UV message log. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190828130521.26764-2-mpe@ellerman.id.au
2019-08-30powerpc/powernv/opal-msglog: Refactor memcons codeClaudio Carvalho
This patch refactors the code in opal-msglog that operates on the OPAL memory console in order to make it cleaner and also allow the reuse of the new memcons_* functions. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190828130521.26764-1-mpe@ellerman.id.au
2019-08-30powerpc/powernv: Access LDBAR only if ultravisor disabledClaudio Carvalho
LDBAR is a per-thread SPR populated and used by the thread-imc pmu driver to dump the data counter into memory. It contains memory along with few other configuration bits. LDBAR is populated and enabled only when any of the thread imc pmu events are monitored. In ultravisor enabled systems, LDBAR becomes ultravisor privileged and an attempt to write to it will cause a Hypervisor Emulation Assistance interrupt. In ultravisor enabled systems, the ultravisor is responsible to maintain the LDBAR (e.g. save and restore it). This restricts LDBAR access to only when ultravisor is disabled. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Reviewed-by: Ryan Grimm <grimm@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-7-cclaudio@linux.ibm.com
2019-08-30powerpc/powernv: Introduce FW_FEATURE_ULTRAVISORClaudio Carvalho
In PEF enabled systems, some of the resources which were previously hypervisor privileged are now ultravisor privileged and controlled by the ultravisor firmware. This adds FW_FEATURE_ULTRAVISOR to indicate if PEF is enabled. The host kernel can use FW_FEATURE_ULTRAVISOR, for instance, to skip accessing resources (e.g. PTCR and LDBAR) in case PEF is enabled. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> [ andmike: Device node name to "ibm,ultravisor" ] Signed-off-by: Michael Anderson <andmike@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-4-cclaudio@linux.ibm.com
2019-08-30powerpc/powernv/ioda: Remove obsolete iommu_table_ops::exchange callbacksAlexey Kardashevskiy
As now we have xchg_no_kill/tce_kill, these are not used anymore so remove them. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190829085252.72370-6-aik@ozlabs.ru
2019-08-30powerpc/pseries/iommu: Switch to xchg_no_killAlexey Kardashevskiy
This is the last implementation of iommu_table_ops::exchange() which we are about to remove. This implements xchg_no_kill() for pseries. Since it is paravirtual platform, the hypervisor does TCE invalidations and we do not have to deal with it here, hence no tce_kill() hook. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190829085252.72370-5-aik@ozlabs.ru
2019-08-30powerpc/powernv/ioda: Split out TCE invalidation from TCE updatesAlexey Kardashevskiy
At the moment updates in a TCE table are made by iommu_table_ops::exchange which update one TCE and invalidates an entry in the PHB/NPU TCE cache via set of registers called "TCE Kill" (hence the naming). Writing a TCE is a simple xchg() but invalidating the TCE cache is a relatively expensive OPAL call. Mapping a 100GB guest with PCI+NPU passed through devices takes about 20s. Thankfully we can do better. Since such big mappings happen at the boot time and when memory is plugged/onlined (i.e. not often), these requests come in 512 pages so we call call OPAL 512 times less which brings 20s from the above to less than 10s. Also, since TCE caches can be flushed entirely, calling OPAL for 512 TCEs helps skiboot [1] to decide whether to flush the entire cache or not. This implements 2 new iommu_table_ops callbacks: - xchg_no_kill() to update a single TCE with no TCE invalidation; - tce_kill() to invalidate multiple TCEs. This uses the same xchg_no_kill() callback for IODA1/2. This implements 2 new wrappers on top of the new callbacks similar to the existing iommu_tce_xchg(). This does not use the new callbacks yet, the next patches will; so this should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190829085252.72370-2-aik@ozlabs.ru
2019-08-28powerpc: use the generic dma coherent remap allocatorChristoph Hellwig
This switches to using common code for the DMA allocations, including potential use of the CMA allocator if configured. Switching to the generic code enables DMA allocations from atomic context, which is required by the DMA API documentation, and also adds various other minor features drivers start relying upon. It also makes sure we have on tested code base for all architectures that require uncached pte bits for coherent DMA allocations. Another advantage is that consistent memory allocations now share the general vmalloc pool instead of needing an explicit careout from it. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # tested on 8xx Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190814132230.31874-2-hch@lst.de
2019-08-28powerpc/32s: get rid of CPU_FTR_601 featureChristophe Leroy
Now that 601 is exclusive from other 6xx, CPU_FTR_601 and associated fixups are useless. Drop this feature and use #ifdefs instead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ecdb7194a17dbfa01865df6a82979533adc2c70b.1566834712.git.christophe.leroy@c-s.fr
2019-08-28powerpc/32s: add an option to exclusively select powerpc 601Christophe Leroy
Powerpc 601 is rather old powerpc which as some important limitations compared to other book3s/32 powerpcs: - No Timebase. - Common BATs for instruction and data. - No execution protection in segment registers. - No RI bit in MSR - ... It is starting to be difficult and cumbersome to maintain kernels that are compatible both with 601 and other 6xx cores. Create a compiletime option to exclusively select either powerpc 601 or other 6xx. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d644eaf7dff8cc149260066802af230bdf34fded.1566834712.git.christophe.leroy@c-s.fr
2019-08-27powerpc/ps3: replace __ioremap() by ioremap_prot()Christophe Leroy
__ioremap() is similar to ioremap_prot() except that ioremap_prot() does a few sanity changes in addition. The flags used by PS3 are not impacted by those changes so for PS3 both functions are equivalent. At the same time, drop parts of the comment that have been invalid since commit e58e87adc8bf ("powerpc/mm: Update _PAGE_KERNEL_RO") Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/36bff5d875ff562889c5e12dab63e5d7c5d1fbd8.1566309262.git.christophe.leroy@c-s.fr
2019-08-27powerpc: remove the ppc44x ocm.c fileChristoph Hellwig
The on chip memory allocator is entirely unused in the kernel tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7b1668941ad1041d08b19167030868de5840b153.1566309262.git.christophe.leroy@c-s.fr
2019-08-22powerpc/eeh: Convert log messages to eeh_edev_* macrosSam Bobroff
Convert existing messages, where appropriate, to use the eeh_edev_* logging macros. The only effect should be minor adjustments to the log messages, apart from: - A new message in pseries_eeh_probe() "Probing device" to match the powernv case. - The "Probing device" message in pnv_eeh_probe() is now generated slightly later, which will mean that it is no longer emitted for devices that aren't probed due to the initial checks. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ce505a0a7a4a5b0367f0f40f8b26e7c0a9cf4cb7.1565930772.git.sbobroff@linux.ibm.com
2019-08-22powerpc/eeh: Refactor around eeh_probe_devices()Sam Bobroff
Now that EEH support for all devices (on PowerNV and pSeries) is provided by the pcibios bus add device hooks, eeh_probe_devices() and eeh_addr_cache_build() are redundant and can be removed. Move the EEH enabled message into it's own function so that it can be called from multiple places. Note that previously on pSeries, useless EEH sysfs files were created for some devices that did not have EEH support and this change prevents them from being created. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/33b0a6339d5ac88693de092d6fba984f2a5add66.1565930772.git.sbobroff@linux.ibm.com
2019-08-22powerpc/eeh: EEH for pSeries hot plugSam Bobroff
On PowerNV and pSeries, devices currently acquire EEH support from several different places: Boot-time devices from eeh_probe_devices() and eeh_addr_cache_build(), Virtual Function devices from the pcibios bus add device hooks and hot plugged devices from pci_hp_add_devices() (with other platforms using other methods as well). Unfortunately, pSeries machines currently discover hot plugged devices using pci_rescan_bus(), not pci_hp_add_devices(), and so those devices do not receive EEH support. Rather than adding another case for pci_rescan_bus(), this change widens the scope of the pcibios bus add device hooks so that they can handle all devices. As a side effect this also supports devices discovered after manually rescanning via /sys/bus/pci/rescan. Note that on PowerNV, this change allows the EEH subsystem to become enabled after boot as long as it has not been forced off, which was not previously possible (it was already possible on pSeries). Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/72ae8ae9c54097158894a52de23690448de38ea9.1565930772.git.sbobroff@linux.ibm.com
2019-08-22powerpc/eeh: Improve debug messages around device additionSam Bobroff
Also remove useless comment. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/59db84f4bf94718a12f206bc923ac797d47e4cc1.1565930772.git.sbobroff@linux.ibm.com
2019-08-20powerpc/pseries/mobility: use cond_resched when updating device treeNathan Lynch
After a partition migration, pseries_devicetree_update() processes changes to the device tree communicated from the platform to Linux. This is a relatively heavyweight operation, with multiple device tree searches, memory allocations, and conversations with partition firmware. There's a few levels of nested loops which are bounded only by decisions made by the platform, outside of Linux's control, and indeed we have seen RCU stalls on large systems while executing this call graph. Use cond_resched() in these loops so that the cpu is yielded when needed. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190802192926.19277-4-nathanl@linux.ibm.com
2019-08-19powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU pagesAlexey Kardashevskiy
At the moment we create a small window only for 32bit devices, the window maps 0..2GB of the PCI space only. For other devices we either use a sketchy bypass or hardware bypass but the former can only work if the amount of RAM is no bigger than the device's DMA mask and the latter requires devices to support at least 59bit DMA. This extends the default DMA window to the maximum size possible to allow a wider DMA mask than just 32bit. The default window size is now limited by the the iommu_table::it_map allocation bitmap which is a contiguous array, 1 bit per an IOMMU page. This increases the default IOMMU page size from hard coded 4K to the system page size to allow wider DMA masks. This increases the level number to not exceed the max order allocation limit per TCE level. By the same time, this keeps minimal levels number as 2 in order to save memory. As the extended window now overlaps the 32bit MMIO region, this adds an area reservation to iommu_init_table(). After this change the default window size is 0x80000000000==1<<43 so devices limited to DMA mask smaller than the amount of system RAM can still use more than just 2GB of memory for DMA. This is an optimization and not a bug fix for DMA API usage. With the on-demand allocation of indirect TCE table levels enabled and 2 levels, the first TCE level size is just 1<<ceil((log2(0x7ffffffffff+1)-16)/2)=16384 TCEs or 2 system pages. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190718051139.74787-5-aik@ozlabs.ru
2019-08-19powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA ↵Alexey Kardashevskiy
window We allocate only the first level of multilevel TCE tables for KVM already (alloc_userspace_copy==true), and the rest is allocated on demand. This is not enabled though for bare metal. This removes the KVM limitation (implicit, via the alloc_userspace_copy parameter) and always allocates just the first level. The on-demand allocation of missing levels is already implemented. As from now on DMA map might happen with disabled interrupts, this allocates TCEs with GFP_ATOMIC; otherwise lockdep reports errors 1]. In practice just a single page is allocated there so chances for failure are quite low. To save time when creating a new clean table, this skips non-allocated indirect TCE entries in pnv_tce_free just like we already do in the VFIO IOMMU TCE driver. This changes the default level number from 1 to 2 to reduce the amount of memory required for the default 32bit DMA window at the boot time. The default window size is up to 2GB which requires 4MB of TCEs which is unlikely to be used entirely or at all as most devices these days are 64bit capable so by switching to 2 levels by default we save 4032KB of RAM per a device. While at this, add __GFP_NOWARN to alloc_pages_node() as the userspace can trigger this path via VFIO, see the failure and try creating a table again with different parameters which might succeed. [1]: === BUG: sleeping function called from invalid context at mm/page_alloc.c:4596 in_atomic(): 1, irqs_disabled(): 1, pid: 1038, name: scsi_eh_1 2 locks held by scsi_eh_1/1038: #0: 000000005efd659a (&host->eh_mutex){+.+.}, at: ata_eh_acquire+0x34/0x80 #1: 0000000006cf56a6 (&(&host->lock)->rlock){....}, at: ata_exec_internal_sg+0xb0/0x5c0 irq event stamp: 500 hardirqs last enabled at (499): [<c000000000cb8a74>] _raw_spin_unlock_irqrestore+0x94/0xd0 hardirqs last disabled at (500): [<c000000000cb85c4>] _raw_spin_lock_irqsave+0x44/0x120 softirqs last enabled at (0): [<c000000000101120>] copy_process.isra.4.part.5+0x640/0x1a80 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 73 PID: 1038 Comm: scsi_eh_1 Not tainted 5.2.0-rc6-le_nv2_aikATfstn1-p1 #634 Call Trace: [c000003d064cef50] [c000000000c8e6c4] dump_stack+0xe8/0x164 (unreliable) [c000003d064cefa0] [c00000000014ed78] ___might_sleep+0x2f8/0x310 [c000003d064cf020] [c0000000003ca084] __alloc_pages_nodemask+0x2a4/0x1560 [c000003d064cf220] [c0000000000c2530] pnv_alloc_tce_level.isra.0+0x90/0x130 [c000003d064cf290] [c0000000000c2888] pnv_tce+0x128/0x3b0 [c000003d064cf360] [c0000000000c2c00] pnv_tce_build+0xb0/0xf0 [c000003d064cf3c0] [c0000000000bbd9c] pnv_ioda2_tce_build+0x3c/0xb0 [c000003d064cf400] [c00000000004cfe0] ppc_iommu_map_sg+0x210/0x550 [c000003d064cf510] [c00000000004b7a4] dma_iommu_map_sg+0x74/0xb0 [c000003d064cf530] [c000000000863944] ata_qc_issue+0x134/0x470 [c000003d064cf5b0] [c000000000863ec4] ata_exec_internal_sg+0x244/0x5c0 [c000003d064cf700] [c0000000008642d0] ata_exec_internal+0x90/0xe0 [c000003d064cf780] [c0000000008650ac] ata_dev_read_id+0x2ec/0x640 [c000003d064cf8d0] [c000000000878e28] ata_eh_recover+0x948/0x16d0 [c000003d064cfa10] [c00000000087d760] sata_pmp_error_handler+0x480/0xbf0 [c000003d064cfbc0] [c000000000884624] ahci_error_handler+0x74/0xe0 [c000003d064cfbf0] [c000000000879fa8] ata_scsi_port_error_handler+0x2d8/0x7c0 [c000003d064cfca0] [c00000000087a544] ata_scsi_error+0xb4/0x100 [c000003d064cfd00] [c000000000802450] scsi_error_handler+0x120/0x510 [c000003d064cfdb0] [c000000000140c48] kthread+0x1b8/0x1c0 [c000003d064cfe20] [c00000000000bd8c] ret_from_kernel_thread+0x5c/0x70 ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) irq event stamp: 2305 ======================================================== hardirqs last enabled at (2305): [<c00000000000e4c8>] fast_exc_return_irq+0x28/0x34 hardirqs last disabled at (2303): [<c000000000cb9fd0>] __do_softirq+0x4a0/0x654 WARNING: possible irq lock inversion dependency detected 5.2.0-rc6-le_nv2_aikATfstn1-p1 #634 Tainted: G W softirqs last enabled at (2304): [<c000000000cba054>] __do_softirq+0x524/0x654 softirqs last disabled at (2297): [<c00000000010f278>] irq_exit+0x128/0x180 -------------------------------------------------------- swapper/0/0 just changed the state of lock: 0000000006cf56a6 (&(&host->lock)->rlock){-...}, at: ahci_single_level_irq_intr+0xac/0x120 but this lock took another, HARDIRQ-unsafe lock in the past: (fs_reclaim){+.+.} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); local_irq_disable(); lock(&(&host->lock)->rlock); lock(fs_reclaim); <Interrupt> lock(&(&host->lock)->rlock); *** DEADLOCK *** no locks held by swapper/0/0. the shortest dependencies between 2nd lock and 1st lock: -> (fs_reclaim){+.+.} ops: 167579 { HARDIRQ-ON-W at: lock_acquire+0xf8/0x2a0 fs_reclaim_acquire.part.23+0x44/0x60 kmem_cache_alloc_node_trace+0x80/0x590 alloc_desc+0x64/0x270 __irq_alloc_descs+0x2e4/0x3a0 irq_domain_alloc_descs+0xb0/0x150 irq_create_mapping+0x168/0x2c0 xics_smp_probe+0x2c/0x98 pnv_smp_probe+0x40/0x9c smp_prepare_cpus+0x524/0x6c4 kernel_init_freeable+0x1b4/0x650 kernel_init+0x2c/0x148 ret_from_kernel_thread+0x5c/0x70 SOFTIRQ-ON-W at: lock_acquire+0xf8/0x2a0 fs_reclaim_acquire.part.23+0x44/0x60 kmem_cache_alloc_node_trace+0x80/0x590 alloc_desc+0x64/0x270 __irq_alloc_descs+0x2e4/0x3a0 irq_domain_alloc_descs+0xb0/0x150 irq_create_mapping+0x168/0x2c0 xics_smp_probe+0x2c/0x98 pnv_smp_probe+0x40/0x9c smp_prepare_cpus+0x524/0x6c4 kernel_init_freeable+0x1b4/0x650 kernel_init+0x2c/0x148 ret_from_kernel_thread+0x5c/0x70 INITIAL USE at: lock_acquire+0xf8/0x2a0 fs_reclaim_acquire.part.23+0x44/0x60 kmem_cache_alloc_node_trace+0x80/0x590 alloc_desc+0x64/0x270 __irq_alloc_descs+0x2e4/0x3a0 irq_domain_alloc_descs+0xb0/0x150 irq_create_mapping+0x168/0x2c0 xics_smp_probe+0x2c/0x98 pnv_smp_probe+0x40/0x9c smp_prepare_cpus+0x524/0x6c4 kernel_init_freeable+0x1b4/0x650 kernel_init+0x2c/0x148 ret_from_kernel_thread+0x5c/0x70 } === Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190718051139.74787-4-aik@ozlabs.ru
2019-08-19powerpc/powernv/ioda: Fix race in TCE level allocationAlexey Kardashevskiy
pnv_tce() returns a pointer to a TCE entry and originally a TCE table would be pre-allocated. For the default case of 2GB window the table needs only a single level and that is fine. However if more levels are requested, it is possible to get a race when 2 threads want a pointer to a TCE entry from the same page of TCEs. This adds cmpxchg to handle the race. Note that once TCE is non-zero, it cannot become zero again. Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand") CC: stable@vger.kernel.org # v4.19+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190718051139.74787-2-aik@ozlabs.ru
2019-08-19powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()Gautham R. Shenoy
The calls to arch_add_memory()/arch_remove_memory() are always made with the read-side cpu_hotplug_lock acquired via memory_hotplug_begin(). On pSeries, arch_add_memory()/arch_remove_memory() eventually call resize_hpt() which in turn calls stop_machine() which acquires the read-side cpu_hotplug_lock again, thereby resulting in the recursive acquisition of this lock. In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system lockup during a memory hotplug operation because cpus_read_lock() is a per-cpu rwsem read, which, in the fast-path (in the absence of the writer, which in our case is a CPU-hotplug operation) simply increments the read_count on the semaphore. Thus a recursive read in the fast-path doesn't cause any problems. However, we can hit this problem in practice if there is a concurrent CPU-Hotplug operation in progress which is waiting to acquire the write-side of the lock. This will cause the second recursive read to block until the writer finishes. While the writer is blocked since the first read holds the lock. Thus both the reader as well as the writers fail to make any progress thereby blocking both CPU-Hotplug as well as Memory Hotplug operations. Memory-Hotplug CPU-Hotplug CPU 0 CPU 1 ------ ------ 1. down_read(cpu_hotplug_lock.rw_sem) [memory_hotplug_begin] 2. down_write(cpu_hotplug_lock.rw_sem) [cpu_up/cpu_down] 3. down_read(cpu_hotplug_lock.rw_sem) [stop_machine()] Lockdep complains as follows in these code-paths. swapper/0/1 is trying to acquire lock: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60 but task is already holding lock: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(cpu_hotplug_lock.rw_sem); lock(cpu_hotplug_lock.rw_sem); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by swapper/0/1: #0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0 #1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50 #2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x54/0x1a0 stack backtrace: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty #166 Call Trace: dump_stack+0xe8/0x164 (unreliable) __lock_acquire+0x1110/0x1c70 lock_acquire+0x240/0x290 cpus_read_lock+0x64/0xf0 stop_machine+0x2c/0x60 pseries_lpar_resize_hpt+0x19c/0x2c0 resize_hpt_for_hotplug+0x70/0xd0 arch_add_memory+0x58/0xfc devm_memremap_pages+0x5e8/0x8f0 pmem_attach_disk+0x764/0x830 nvdimm_bus_probe+0x118/0x240 really_probe+0x230/0x4b0 driver_probe_device+0x16c/0x1e0 __driver_attach+0x148/0x1b0 bus_for_each_dev+0x90/0x130 driver_attach+0x34/0x50 bus_add_driver+0x1a8/0x360 driver_register+0x108/0x170 __nd_driver_register+0xd0/0xf0 nd_pmem_driver_init+0x34/0x48 do_one_initcall+0x1e0/0x45c kernel_init_freeable+0x540/0x64c kernel_init+0x2c/0x160 ret_from_kernel_thread+0x5c/0x68 Fix this issue by 1) Requiring all the calls to pseries_lpar_resize_hpt() be made with cpu_hotplug_lock held. 2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked() as a consequence of 1) 3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt() with cpu_hotplug_lock held. Fixes: dbcf929c0062 ("powerpc/pseries: Add support for hash table resizing") Cc: stable@vger.kernel.org # v4.11+ Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1557906352-29048-1-git-send-email-ego@linux.vnet.ibm.com
2019-08-19Merge branch 'fixes' into nextMichael Ellerman
Merge in our fixes branch, which brings in clone3() as well as some implicit fallthrough fixes we want in next.
2019-08-05powerpc/pseries/hotplug-memory.c: Replace nested ifs by switch-caseLeonardo Bras
I noticed these nested ifs can be easily replaced by switch-cases, which can improve readability. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190801225251.17864-1-leonardo@linux.ibm.com
2019-08-05powerpc/powernv: Restrict OPAL symbol map to only be readable by rootAndrew Donnellan
Currently the OPAL symbol map is globally readable, which seems bad as it contains physical addresses. Restrict it to root. Fixes: c8742f85125d ("powerpc/powernv: Expose OPAL firmware symbol map") Cc: stable@vger.kernel.org # v3.19+ Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190503075253.22798-1-ajd@linux.ibm.com
2019-08-05powerpc/powernv: Fix checkpatch warnings in opal-xscom.cAndrew Donnellan
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190509051119.7694-4-ajd@linux.ibm.com
2019-08-05powerpc/powernv: Get rid of old scom_controller abstractionAndrew Donnellan
Once upon a time, the SCOM access code was used by the WSP platform as well as powernv. Thus it made sense to have a generic SCOM access interface to abstract between different platforms. Now that it's just powernv, with no other platforms currently on the horizon, let's rip out scom_controller and make everything much simpler and more direct. While we're here, fix up the comment block at the top. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190509051119.7694-3-ajd@linux.ibm.com
2019-08-05powerpc/powernv: Remove dead SCOM access codeAndrew Donnellan
Nothing is using scom_map_device() or scom_find_parent(). Remove them. Also don't export scom_controller, there are no other users of it. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190509051119.7694-2-ajd@linux.ibm.com
2019-08-05powerpc/powernv: Move SCOM access code into powernv platformAndrew Donnellan
The powernv platform is the only one that directly accesses SCOMs. Move the support code to platforms/powernv, and get rid of the PPC_SCOM Kconfig option, as SCOM support is always selected when compiling for powernv. This also means that the Kconfig item for CONFIG_SCOM_DEBUGFS will show up in menuconfig in the platform menu, rather than at the root, which is a much better location. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190509051119.7694-1-ajd@linux.ibm.com
2019-07-30powerpc/nvdimm: Pick nearby online node if the device node is not onlineAneesh Kumar K.V
Currently, nvdimm subsystem expects the device numa node for SCM device to be an online node. It also doesn't try to bring the device numa node online. Hence if we use a non-online numa node as device node we hit crashes like below. This is because we try to access uninitialized NODE_DATA in different code paths. cpu 0x0: Vector: 300 (Data Access) at [c0000000fac53170] pc: c0000000004bbc50: ___slab_alloc+0x120/0xca0 lr: c0000000004bc834: __slab_alloc+0x64/0xc0 sp: c0000000fac53400 msr: 8000000002009033 dar: 73e8 dsisr: 80000 current = 0xc0000000fabb6d80 paca = 0xc000000003870000 irqmask: 0x03 irq_happened: 0x01 pid = 7, comm = kworker/u16:0 Linux version 5.2.0-06234-g76bd729b2644 (kvaneesh@ltc-boston123) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #135 SMP Thu Jul 11 05:36:30 CDT 2019 enter ? for help [link register ] c0000000004bc834 __slab_alloc+0x64/0xc0 [c0000000fac53400] c0000000fac53480 (unreliable) [c0000000fac53500] c0000000004bc818 __slab_alloc+0x48/0xc0 [c0000000fac53560] c0000000004c30a0 __kmalloc_node_track_caller+0x3c0/0x6b0 [c0000000fac535d0] c000000000cfafe4 devm_kmalloc+0x74/0xc0 [c0000000fac53600] c000000000d69434 nd_region_activate+0x144/0x560 [c0000000fac536d0] c000000000d6b19c nd_region_probe+0x17c/0x370 [c0000000fac537b0] c000000000d6349c nvdimm_bus_probe+0x10c/0x230 [c0000000fac53840] c000000000cf3cc4 really_probe+0x254/0x4e0 [c0000000fac538d0] c000000000cf429c driver_probe_device+0x16c/0x1e0 [c0000000fac53950] c000000000cf0b44 bus_for_each_drv+0x94/0x130 [c0000000fac539b0] c000000000cf392c __device_attach+0xdc/0x200 [c0000000fac53a50] c000000000cf231c bus_probe_device+0x4c/0xf0 [c0000000fac53a90] c000000000ced268 device_add+0x528/0x810 [c0000000fac53b60] c000000000d62a58 nd_async_device_register+0x28/0xa0 [c0000000fac53bd0] c0000000001ccb8c async_run_entry_fn+0xcc/0x1f0 [c0000000fac53c50] c0000000001bcd9c process_one_work+0x46c/0x860 [c0000000fac53d20] c0000000001bd4f4 worker_thread+0x364/0x5f0 [c0000000fac53db0] c0000000001c7260 kthread+0x1b0/0x1c0 [c0000000fac53e20] c00000000000b954 ret_from_kernel_thread+0x5c/0x68 The patch tries to fix this by picking the nearest online node as the SCM node. This does have a problem of us losing the information that SCM node is equidistant from two other online nodes. If applications need to understand these fine-grained details we should express then like x86 does via /sys/devices/system/node/nodeX/accessY/initiators/ With the patch we get # numactl -H available: 2 nodes (0-1) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 node 1 size: 130865 MB node 1 free: 129130 MB node distances: node 0 1 0: 10 20 1: 20 10 # cat /sys/bus/nd/devices/region0/numa_node 0 # dmesg | grep papr_scm [ 91.332305] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Region registered with target node 2 and online node 0 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190729095128.23707-1-aneesh.kumar@linux.ibm.com
2019-07-24Merge tag 'powerpc-5.3-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "An assortment of non-regression fixes that have accumulated since the start of the merge window. - A fix for a user triggerable oops on machines where transactional memory is disabled, eg. Power9 bare metal, Power8 with TM disabled on the command line, or all Power7 or earlier machines. - Three fixes for handling of PMU and power saving registers when running nested KVM on Power9. - Two fixes for bugs found while stress testing the XIVE interrupt controller code, also on Power9. - A fix to allow guests to boot under Qemu/KVM on Power9 using the the Hash MMU with >= 1TB of memory. - Two fixes for bugs in the recent DMA cleanup, one of which could lead to checkstops. - And finally three fixes for the PAPR SCM nvdimm driver. Thanks to: Alexey Kardashevskiy, Andrea Arcangeli, Cédric Le Goater, Christoph Hellwig, David Gibson, Gautham R. Shenoy, Michael Neuling, Oliver O'Halloran, Satheesh Rajendran, Shawn Anastasio, Suraj Jitindar Singh, Vaibhav Jain" * tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL powerpc/pseries: Update SCM hcall op-codes in hvcall.h powerpc/tm: Fix oops on sigreturn on systems without TM powerpc/dma: Fix invalid DMA mmap behavior KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create fails powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries powerpc/pmu: Set pmcregs_in_use in paca when running as LPAR KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting powerpc/mm: Limit rma_size to 1TB when running without HV mode
2019-07-22powerpc/papr_scm: Force a scm-unbind if initial scm-bind failsVaibhav Jain
In some cases initial bind of scm memory for an lpar can fail if previously it wasn't released using a scm-unbind hcall. This situation can arise due to panic of the previous kernel or forced lpar fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error. To mitigate such cases the patch updates papr_scm_probe() to force a call to drc_pmem_unbind() in case the initial bind of scm memory fails with EBUSY error. In case scm-bind operation again fails after the forced scm-unbind then we follow the existing error path. We also update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp and indicate it as a EBUSY error back to the caller. Suggested-by: "Oliver O'Halloran" <oohall@gmail.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190629160610.23402-4-vaibhav@linux.ibm.com
2019-07-22powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALLVaibhav Jain
The new hcall named H_SCM_UNBIND_ALL has been introduce that can unbind all or specific scm memory assigned to an lpar. This is more efficient than using H_SCM_UNBIND_MEM as currently we don't support partial unbind of scm memory. Hence this patch proposes following changes to drc_pmem_unbind(): * Update drc_pmem_unbind() to replace hcall H_SCM_UNBIND_MEM to H_SCM_UNBIND_ALL. * Update drc_pmem_unbind() to handles cases when PHYP asks the guest kernel to wait for specific amount of time before retrying the hcall via the 'LONG_BUSY' return value. * Ensure appropriate error code is returned back from the function in case of an error. Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190629160610.23402-3-vaibhav@linux.ibm.com
2019-07-18mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of ↵David Hildenbrand
pfns walk_memory_range() was once used to iterate over sections. Now, it iterates over memory blocks. Rename the function, fixup the documentation. Also, pass start+size instead of PFNs, which is what most callers already have at hand. (we'll rework link_mem_sections() most probably soon) Follow-up patches will rework, simplify, and move walk_memory_blocks() to drivers/base/memory.c. Note: walk_memory_blocks() only works correctly right now if the start_pfn is aligned to a section start. This is the case right now, but we'll generalize the function in a follow up patch so the semantics match the documentation. [akpm@linux-foundation.org: remove unused variable] Link: http://lkml.kernel.org/r/20190614100114.311-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Rashmica Gupta <rashmica.g@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Arun KS <arunks@codeaurora.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-13Merge tag 'powerpc-5.3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Removal of the NPU DMA code, used by the out-of-tree Nvidia driver, as well as some other functions only used by drivers that haven't (yet?) made it upstream. - A fix for a bug in our handling of hardware watchpoints (eg. perf record -e mem: ...) which could lead to register corruption and kernel crashes. - Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for vmalloc when using the Radix MMU. - A large but incremental rewrite of our exception handling code to use gas macros rather than multiple levels of nested CPP macros. And the usual small fixes, cleanups and improvements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz, Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj Jitindar Singh, Thiago Jung Bauermann, YueHaibing" * tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits) powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state. powerpc/eeh: Handle hugepages in ioremap space ocxl: Update for AFU descriptor template version 1.1 powerpc/boot: pass CONFIG options in a simpler and more robust way powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h powerpc/irq: Don't WARN continuously in arch_local_irq_restore() powerpc/module64: Use symbolic instructions names. powerpc/module32: Use symbolic instructions names. powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h powerpc/module64: Fix comment in R_PPC64_ENTRY handling powerpc/boot: Add lzo support for uImage powerpc/boot: Add lzma support for uImage powerpc/boot: don't force gzipped uImage powerpc/8xx: Add microcode patch to move SMC parameter RAM. powerpc/8xx: Use IO accessors in microcode programming. powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c powerpc/8xx: refactor programming of microcode CPM params. powerpc/8xx: refactor printing of microcode patch name. powerpc/8xx: Refactor microcode write powerpc/8xx: refactor writing of CPM microcode arrays ...
2019-07-12Merge tag 'driver-core-5.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and debugfs updates from Greg KH: "Here is the "big" driver core and debugfs changes for 5.3-rc1 It's a lot of different patches, all across the tree due to some api changes and lots of debugfs cleanups. Other than the debugfs cleanups, in this set of changes we have: - bus iteration function cleanups - scripts/get_abi.pl tool to display and parse Documentation/ABI entries in a simple way - cleanups to Documenatation/ABI/ entries to make them parse easier due to typos and other minor things - default_attrs use for some ktype users - driver model documentation file conversions to .rst - compressed firmware file loading - deferred probe fixes All of these have been in linux-next for a while, with a bunch of merge issues that Stephen has been patient with me for" * tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits) debugfs: make error message a bit more verbose orangefs: fix build warning from debugfs cleanup patch ubifs: fix build warning after debugfs cleanup patch driver: core: Allow subsystems to continue deferring probe drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT arch_topology: Remove error messages on out-of-memory conditions lib: notifier-error-inject: no need to check return value of debugfs_create functions swiotlb: no need to check return value of debugfs_create functions ceph: no need to check return value of debugfs_create functions sunrpc: no need to check return value of debugfs_create functions ubifs: no need to check return value of debugfs_create functions orangefs: no need to check return value of debugfs_create functions nfsd: no need to check return value of debugfs_create functions lib: 842: no need to check return value of debugfs_create functions debugfs: provide pr_fmt() macro debugfs: log errors when something goes wrong drivers: s390/cio: Fix compilation warning about const qualifiers drivers: Add generic helper to match by of_node driver_find_device: Unify the match function with class_find_device() bus_find_device: Unify the match callback with class_find_device ...
2019-07-12powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state.Athira Rajeev
commit 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") reimplemented book3S code to pltform/powernv/idle.c. But when doing so missed to add the per-thread LDBAR update in the core_woken path of the power9_idle_stop(). Patch fixes the same. Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190702105836.26695-1-maddy@linux.vnet.ibm.com
2019-07-08Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull force_sig() argument change from Eric Biederman: "A source of error over the years has been that force_sig has taken a task parameter when it is only safe to use force_sig with the current task. The force_sig function is built for delivering synchronous signals such as SIGSEGV where the userspace application caused a synchronous fault (such as a page fault) and the kernel responded with a signal. Because the name force_sig does not make this clear, and because the force_sig takes a task parameter the function force_sig has been abused for sending other kinds of signals over the years. Slowly those have been fixed when the oopses have been tracked down. This set of changes fixes the remaining abusers of force_sig and carefully rips out the task parameter from force_sig and friends making this kind of error almost impossible in the future" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits) signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus signal: Remove the signal number and task parameters from force_sig_info signal: Factor force_sig_info_to_task out of force_sig_info signal: Generate the siginfo in force_sig signal: Move the computation of force into send_signal and correct it. signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal signal: Remove the task parameter from force_sig_fault signal: Use force_sig_fault_to_task for the two calls that don't deliver to current signal: Explicitly call force_sig_fault on current signal/unicore32: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from ptrace_break signal/nds32: Remove tsk parameter from send_sigtrap signal/riscv: Remove tsk parameter from do_trap signal/sh: Remove tsk parameter from force_sig_info_fault signal/um: Remove task parameter from send_sigtrap signal/x86: Remove task parameter from send_sigtrap signal: Remove task parameter from force_sig_mceerr signal: Remove task parameter from force_sig signal: Remove task parameter from force_sigsegv ...
2019-07-05powerpc/8xx: Add microcode patch to move SMC parameter RAM.Christophe Leroy
Some SCC functions like the QMC requires an extended parameter RAM. On modern 8xx (ie 866 and 885), SPI area can already be relocated, allowing the use of those functions on SCC2. But SCC3 and SCC4 parameter RAM collide with SMC1 and SMC2 parameter RAMs. This patch adds microcode to allow the relocation of both SMC1 and SMC2, and relocate them at offsets 0x1ec0 and 0x1fc0. Those offsets are by default for the CPM1 DSP1 and DSP2, but there is no kernel driver using them at the moment so this area can be reused. This microcode is provided by Freescale/NXP in Engineering Bulletin EB662 ("MPC8xx I2C/SPI and SMC Relocation Microcode Packages") dated 2006. The binary code is public. The source is not available. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/8xx: Use IO accessors in microcode programming.Christophe Leroy
Change microcode functions to use IO accessors and get rid of volatile attributes. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.cChristophe Leroy
Reduce #ifdef mess by using IS_ENABLED() instead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/8xx: refactor programming of microcode CPM params.Christophe Leroy
The CPM registers RCCR and CPMCR1..4 registers has to be set in accordance with the microcode patch beeing programmed. Lets define them as part of the patch set and refactor their programming from that definition. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/8xx: refactor printing of microcode patch name.Christophe Leroy
Define patch name together with the patch code, and refactor the associated printk() while replacing it by a pr_info() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/8xx: Refactor microcode writeChristophe Leroy
Add empty microcode tables so that all tables are defined all the time. Regroup the writing of the 3 tables regardless of the selected microcode. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/8xx: refactor writing of CPM microcode arraysChristophe Leroy
Create a function to refactor the writing of CPM microcode arrays. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/8xx: compact microcode arraysChristophe Leroy
Compact obscure microcode arrays by putting 4 values per line in order to reduce number of lines in the file to increase readability. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/8xx: drop verify_patch()Christophe Leroy
verify_patch() has been opted out since many years, and the comment suggests it doesn't work. So drop it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/8xx: move CPM1 related files from sysdev/ to platforms/8xxChristophe Leroy
Only 8xx selects CPM1 and related CONFIG options are already in platforms/8xx/Kconfig Move the related C files to platforms/8xx/. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Minor formatting fixes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/mm: Remove radix dependency on HugeTLB pageAneesh Kumar K.V
Now that we have switched the page table walk to use pmd_is_leaf we can now revert commit 8adddf349fda ("powerpc/mm/radix: Make Radix require HUGETLB_PAGE") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>