summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-05-28powerpc/powernv/pci: Re-work bus PE configurationOliver O'Halloran
For normal PHBs IODA PEs are handled on a per-bus basis so all the devices on that bus will share a PE. Which PE specificly is determined by the location of the MMIO BARs for the devices on the bus so we can't actually configure the bus PEs until after MMIO resources are allocated. As a result PEs are currently configured by pcibios_setup_bridge(), which is called just before the bridge windows are programmed into the bus' parent bridge. Configuring the bus PE here causes a few problems: 1. The root bus doesn't have a parent bridge so setting up the PE for the root bus requires some hacks. 2. The PELT-V isn't setup correctly because pnv_ioda_set_peltv() assumes that PEs will be configured in root-to-leaf order. This assumption is broken because resource assignment is performed depth-first so the leaf bridges are setup before their parents are. The hack mentioned in 1) results in the "correct" PELT-V for busses immediately below the root port, but not for devices below a switch. 3. It's possible to break the sysfs PCI rescan feature by removing all the devices on a bus. When the last device is removed from a PE its will be de-configured. Rescanning the devices on a bus does not cause the bridge to be reconfigured rendering the devices on that bus unusable. We can address most of these problems by moving the PE setup out of pcibios_setup_bridge() and into pcibios_bus_add_device(). This fixes 1) and 2) because pcibios_bus_add_device() is called on each device in root-to-leaf order so PEs for parent buses will always be configured before their children. It also fixes 3) by ensuring the PE is configured before initialising DMA for the device. In the event the PE was de-configured due to removing all the devices in that PE it will now be reconfigured when a new device is added since there's no dependecy on the bridge_setup() hook being called. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200417073508.30356-3-oohall@gmail.com
2020-05-28powerpc/powernv/pci: Add helper to find ioda_pe from BDFNOliver O'Halloran
For each PHB we maintain a reverse-map that can be used to find the PE that a BDFN is currently mapped to. Add a helper for doing this lookup so we can check if a PE has been configured without looking at pdn->pe_number. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200417073508.30356-2-oohall@gmail.com
2020-05-28powerpc/powernv/pci: Add an explaination for PNV_IODA_PE_BUS_ALLOliver O'Halloran
It's pretty obsecure and confused me for a long time so I figured it's worth documenting properly. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200414233502.758-1-oohall@gmail.com
2020-05-28powerpc/powernv: Add a print indicating when an IODA PE is releasedOliver O'Halloran
Quite useful to know in some cases. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200408112213.5549-1-oohall@gmail.com
2020-05-28powerpc/powernv/npu: Move IOMMU group setup into npu-dma.cOliver O'Halloran
The NVlink IOMMU group setup is only relevant to NVLink devices so move it into the NPU containment zone. This let us remove some prototypes in pci.h and staticfy some function definitions. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200406030745.24595-8-oohall@gmail.com
2020-05-28powerpc/powernv/pci: Move tce size parsing to pci-ioda-tce.cOliver O'Halloran
Move it in with the rest of the TCE wrangling rather than carting around a static prototype in pci-ioda.c Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200406030745.24595-7-oohall@gmail.com
2020-05-28powerpc/powernv/pci: Delete old iommu recursive iommu setupOliver O'Halloran
No longer used. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200406030745.24595-6-oohall@gmail.com
2020-05-28powerpc/powernv/pci: Add device to iommu group during dma_dev_setup()Oliver O'Halloran
Historically adding devices to their respective iommu group has been handled by the post-init phb fixup for most devices. This was done because: 1) The IOMMU group is tied to the PE (usually) so we can only setup the iommu groups after we've done resource allocation since BAR location determines the device's PE, and: 2) The sysfs directory for the pci_dev needs to be available since iommu_add_device() wants to add an attribute for the iommu group. However, since commit 30d87ef8b38d ("powerpc/pci: Fix pcibios_setup_device() ordering") both conditions are met when hose->ops->dma_dev_setup() is called so there's no real need to do this in the fixup. Moving the call to iommu_add_device() into pnv_pci_ioda_dma_setup_dev() is a nice cleanup since it puts all the per-device IOMMU setup into one place. It also results in all (non-nvlink) devices getting their iommu group via a common path rather than relying on the bus notifier hack in pnv_tce_iommu_bus_notifier() to handle the adding VFs and hotplugged devices to their group. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200406030745.24595-5-oohall@gmail.com
2020-05-28powerpc/powernv/pci: Register iommu group at PE DMA setupOliver O'Halloran
Move the registration of IOMMU groups out of the post-phb init fixup and into when we configure DMA for a PE. For most devices this doesn't result in any functional changes, but for NVLink attached GPUs it requires a bit of care. When the GPU is probed an IOMMU group would be created for the PE that contains it. We need to ensure that group is removed before we add the PE to the compound group that's used to keep the translations see by the PCIe and NVLink buses the same. No functional changes. Probably. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200406030745.24595-4-oohall@gmail.com
2020-05-28powerpc/powernv/iov: Don't add VFs to iommu group during PE configOliver O'Halloran
In pnv_ioda_setup_vf_PE() we register an iommu group for the VF PE then call pnv_ioda_setup_bus_iommu_group() to add devices to that group. However, this function is called before the VFs are scanned so there's no devices to add. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200406030745.24595-3-oohall@gmail.com
2020-05-28powerpc/powernv/npu: Clean up compound table group initialisationOliver O'Halloran
Re-work the control flow a bit so what's going on is a little clearer. This also ensures the table_group is only initialised once in the P9 case. This shouldn't be a functional change since all the GPU PCI devices should have the same table_group configuration, but it does look strange. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200406030745.24595-2-oohall@gmail.com
2020-05-28powerpc/64s/kuap: Conditionally restore AMR in kuap_restore_amr asmNicholas Piggin
Similar to the C code change, make the AMR restore conditional on whether the register has changed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429065654.1677541-7-npiggin@gmail.com
2020-05-28powerpc/64/kuap: Conditionally restore AMR in interrupt exitNicholas Piggin
The AMR update is made conditional on AMR actually changing, which should be the less common case on most workloads (though kernel page faults on uaccess could be frequent, this doesn't significantly slow down that case). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429065654.1677541-4-npiggin@gmail.com
2020-05-28powerpc/64s/kuap: Add missing isync to KUAP restore pathsNicholas Piggin
Writing the AMR register is documented to require context synchronizing operations before and after, for it to take effect as expected. The KUAP restore at interrupt exit time deliberately avoids the isync after the AMR update because it only needs to take effect after the context synchronizing RFID that soon follows. Add a comment for this. The missing isync before the update doesn't have an obvious justification, and seems it could theoretically allow a rogue user access to leak past the AMR update. Add isyncs for these. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429065654.1677541-3-npiggin@gmail.com
2020-05-28powerpc/4xx: Don't unmap NULL mbasehuhai
Signed-off-by: huhai <huhai@tj.kylinos.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200521072648.1254699-1-mpe@ellerman.id.au
2020-05-28powerpc/40x: Don't save CR in SPRN_SPRG_SCRATCH6Christophe Leroy
We have r12 available, use it to keep CR around and don't save it in SPRN_SPRG_SCRATCH6. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/019f314a98c107c4ca46e46c1cf402e9a44114a7.1590079969.git.christophe.leroy@csgroup.eu
2020-05-28powerpc/40x: Avoid using r12 in TLB miss handlersChristophe Leroy
Let's reduce the number of registers used in TLB miss handlers. We have both r9 and r12 available for any temporary use. r9 is enough, avoid using r12. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7f330e971952abb2645fb9ca4310c0f527e84dcb.1590079969.git.christophe.leroy@csgroup.eu
2020-05-28powerpc: Remove IBM405 Erratum #77Christophe Leroy
This erratum is dedicated to IBM 405GP and STB03xxx which are now gone. Remove this erratum. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/44dbc08e9034681eb28324cbabc086e97044c36c.1590079969.git.christophe.leroy@csgroup.eu
2020-05-28powerpc/40x: Remove IBM405 Erratum #51Christophe Leroy
This erratum was for IBM 403GCX, 405EP and STB03xxx which are now gone. Remove this erratum. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1b6c9916514ef3e084bba57925ad9eb444627566.1590079969.git.christophe.leroy@csgroup.eu
2020-05-28powerpc/40x: Remove support for IBM 405GPChristophe Leroy
All platforms selecting the obsolete processor are gone now. Remove support for it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/906c6a6df710f2826e332b8a0cd5d2859a913a1c.1590079969.git.christophe.leroy@csgroup.eu
2020-05-28powerpc/40x: Remove support for ISS SimulatorChristophe Leroy
ISS4xx has support for 405GP which is obsolete. Remote it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7380974bf5952af825ae2552d0a987c0c1c8b506.1590079969.git.christophe.leroy@csgroup.eu
2020-05-28powerpc/40x: Remove EP405Christophe Leroy
EP405 is an old type of board based on a 405GP which is obsolete. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e9534caa51f327c841b3db5f48043a47ad70d246.1590079968.git.christophe.leroy@csgroup.eu
2020-05-28powerpc/40x: Remove WALNUTChristophe Leroy
CONFIG_WALNUT is not selected by any config and is based on 405GP which is obsolete. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ab46013d8d33346af68faf30a719a586c3befad9.1590079968.git.christophe.leroy@csgroup.eu
2020-05-28powerpc/40x: Remove STB03xxxChristophe Leroy
CONFIG_STB03xxx is not user selectable and is not selected by any config. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d7d73f9a8ee3a890566abace568101e9b4836016.1590079968.git.christophe.leroy@csgroup.eu
2020-05-28powerpc/40x: Remove support for IBM 403GCXChristophe Leroy
CONFIG_403GCX is not user selectable and is not selected by any platform. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/635f8f5ce9d1f761b3bd8dc3e8ddad500cea26c4.1590079968.git.christophe.leroy@csgroup.eu
2020-05-28powerpc/pgtable: Drop PTE_ATOMIC_UPDATESChristophe Leroy
40x was the last user of PTE_ATOMIC_UPDATES. Drop everything related to PTE_ATOMIC_UPDATES. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/dbe8438fd1ed3e500132c8ab70269d4e6cc84531.1590079968.git.christophe.leroy@csgroup.eu
2020-05-28powerpc/40x: Rework 40x PTE access and TLB missChristophe Leroy
Commit 1bc54c03117b ("powerpc: rework 4xx PTE access and TLB miss") reworked 44x PTE access to avoid atomic pte updates, and left 8xx, 40x and fsl booke with atomic pte updates. Commit 6cfd8990e27d ("powerpc: rework FSL Book-E PTE access and TLB miss") removed atomic pte updates on fsl booke. It went away on 8xx with commit ddfc20a3b9ae ("powerpc/8xx: Remove PTE_ATOMIC_UPDATES"). 40x is the last platform setting PTE_ATOMIC_UPDATES. Rework PTE access and TLB miss to remove PTE_ATOMIC_UPDATES for 40x: - Always handle DSI as a fault. - Bail out of TLB miss handler when CONFIG_SWAP is set and _PAGE_ACCESSED is not set. - Bail out of ITLB miss handler when _PAGE_EXEC is not set. - Only set WR bit when both _PAGE_RW and _PAGE_DIRTY are set. - Remove _PAGE_HWWRITE - Don't require PTE_ATOMIC_UPDATES anymore Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/99a0fcd337ef67088140d1647d75fea026a70413.1590079968.git.christophe.leroy@csgroup.eu
2020-05-28sound: ac97: Remove sound driver for ancient platformMichal Simek
Xilinx PowerPC platforms are no longer supported and none is really testing these platforms that's why remove them. If someone has any issue with it these patches can be reverted. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/31a3b884dde2c47a30bb2b92355978b97ea70f86.1585575111.git.michal.simek@xilinx.com
2020-05-28powerpc: Remove Xilinx PPC405/PPC440 supportMichal Simek
The latest Xilinx design tools called ISE and EDK has been released in October 2013. New tool doesn't support any PPC405/PPC440 new designs. These platforms are no longer supported and tested. PowerPC 405/440 port is orphan from 2013 by commit cdeb89943bfc ("MAINTAINERS: Fix incorrect status tag") and commit 19624236cce1 ("MAINTAINERS: Update Grant's email address and maintainership") that's why it is time to remove the support fot these platforms. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8c593895e2cb57d232d85ce4d8c3a1aa7f0869cc.1590079968.git.christophe.leroy@csgroup.eu
2020-05-28powerpc/64: Refactor interrupt exit irq disabling sequenceNicholas Piggin
The same complicated sequence for juggling EE, RI, soft mask, and irq tracing is repeated 3 times, tidy these up into one function. This differs qiute a bit between sub architectures, so this makes the ppc32 port cleaner as well. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429062421.1675400-1-npiggin@gmail.com
2020-05-28powerpc/64s/radix: Don't prefetch DAR in update_mmu_cacheNicholas Piggin
The idea behind this prefetch was to kick off a page table walk before returning from the fault, getting some pipelining advantage. But this never showed up any noticable performance advantage, and in fact with KUAP the prefetches are actually blocked and cause some kind of micro-architectural fault. Removing this improves page fault microbenchmark performance by about 9%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Keep the early return in update_mmu_cache()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200504122907.49304-1-npiggin@gmail.com
2020-05-28input: i8042 - Remove special PowerPC handlingNathan Chancellor
This causes a build error with CONFIG_WALNUT because kb_cs and kb_data were removed in commit 917f0af9e5a9 ("powerpc: Remove arch/ppc and include/asm-ppc"). ld.lld: error: undefined symbol: kb_cs > referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28) > input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a > referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28) > input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a > referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28) > input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a ld.lld: error: undefined symbol: kb_data > referenced by i8042.c:309 (drivers/input/serio/i8042.c:309) > input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a > referenced by i8042-ppcio.h:33 (drivers/input/serio/i8042-ppcio.h:33) > input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a > referenced by i8042.c:319 (drivers/input/serio/i8042.c:319) > input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a > referenced 15 more times Presumably since nobody has noticed this for the last 12 years, there is not anyone actually trying to use this driver so we can just remove this special walnut code and use the generic header so it builds for all configurations. Fixes: 917f0af9e5a9 ("powerpc: Remove arch/ppc and include/asm-ppc") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Link: https://lore.kernel.org/r/20200518181043.3363953-1-natechancellor@gmail.com
2020-05-28macintosh/ams-input: switch to using input device polling modeDmitry Torokhov
Now that instances of input_dev support polling mode natively, we no longer need to create input_polled_dev instance. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191002214854.GA114387@dtor-ws
2020-05-28powerpc/xive: Do not expose a debugfs file when XIVE is disabledCédric Le Goater
The XIVE interrupt mode can be disabled with the "xive=off" kernel parameter, in which case there is nothing to present to the user in the associated /sys/kernel/debug/powerpc/xive file. Fixes: 930914b7d528 ("powerpc/xive: Add a debugfs file to dump internal XIVE state") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429075122.1216388-4-clg@kaod.org
2020-05-26powerpc/xive: Clear the page tables for the ESB IO mappingCédric Le Goater
Commit 1ca3dec2b2df ("powerpc/xive: Prevent page fault issues in the machine crash handler") fixed an issue in the FW assisted dump of machines using hash MMU and the XIVE interrupt mode under the POWER hypervisor. It forced the mapping of the ESB page of interrupts being mapped in the Linux IRQ number space to make sure the 'crash kexec' sequence worked during such an event. But it didn't handle the un-mapping. This mapping is now blocking the removal of a passthrough IO adapter under the POWER hypervisor because it expects the guest OS to have cleared all page table entries related to the adapter. If some are still present, the RTAS call which isolates the PCI slot returns error 9001 "valid outstanding translations". Remove these mapping in the IRQ data cleanup routine. Under KVM, this cleanup is not required because the ESB pages for the adapter interrupts are un-mapped from the guest by the hypervisor in the KVM XIVE native device. This is now redundant but it's harmless. Fixes: 1ca3dec2b2df ("powerpc/xive: Prevent page fault issues in the machine crash handler") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429075122.1216388-2-clg@kaod.org
2020-05-26powerpc: Add ppc_inst_as_u64()Michael Ellerman
The code patching code wants to get the value of a struct ppc_inst as a u64 when the instruction is prefixed, so we can pass the u64 down to __put_user_asm() and write it with a single store. The optprobes code wants to load a struct ppc_inst as an immediate into a register so it is useful to have it as a u64 to use the existing helper function. Currently this is a bit awkward because the value differs based on the CPU endianness, so add a helper to do the conversion. This fixes the usage in arch_prepare_optimized_kprobe() which was previously incorrect on big endian. Fixes: 650b55b707fd ("powerpc: Add prefixed instructions to instruction data type") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Jordan Niethe <jniethe5@gmail.com> Link: https://lore.kernel.org/r/20200526072630.2487363-1-mpe@ellerman.id.au
2020-05-26powerpc: Add ppc_inst_next()Michael Ellerman
In a few places we want to calculate the address of the next instruction. Previously that was simple, we just added 4 bytes, or if using a u32 * we incremented that pointer by 1. But prefixed instructions make it more complicated, we need to advance by either 4 or 8 bytes depending on the actual instruction. We also can't do pointer arithmetic using struct ppc_inst, because it is always 8 bytes in size on 64-bit, even though we might only need to advance by 4 bytes. So add a ppc_inst_next() helper which calculates the location of the next instruction, if the given instruction was located at the given address. Note the instruction doesn't need to actually be at the address in memory. Although it would seem natural for the value to be passed by value, that makes it too easy to write a loop that will read off the end of a page, eg: for (; src < end; src = ppc_inst_next(src, *src), dest = ppc_inst_next(dest, *dest)) As noticed by Christophe and Jordan, if end is the exact end of a page, and the next page is not mapped, this will fault, because *dest will read 8 bytes, 4 bytes into the next page. So value is passed by reference, so the helper can be careful to use ppc_inst_read() on it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Jordan Niethe <jniethe5@gmail.com> Link: https://lore.kernel.org/r/20200522133318.1681406-1-mpe@ellerman.id.au
2020-05-26Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch from this cycle. It contains several important fixes we need in next for testing purposes, and also some that will conflict with upcoming changes.
2020-05-26Merge "Use hugepages to map kernel mem on 8xx" into nextMichael Ellerman
Merge Christophe's large series to use huge pages for the linear mapping on 8xx. From his cover letter: The main purpose of this big series is to: - reorganise huge page handling to avoid using mm_slices. - use huge pages to map kernel memory on the 8xx. The 8xx supports 4 page sizes: 4k, 16k, 512k and 8M. It uses 2 Level page tables, PGD having 1024 entries, each entry covering 4M address space. Then each page table has 1024 entries. At the time being, page sizes are managed in PGD entries, implying the use of mm_slices as it can't mix several pages of the same size in one page table. The first purpose of this series is to reorganise things so that standard page tables can also handle 512k pages. This is done by adding a new _PAGE_HUGE flag which will be copied into the Level 1 entry in the TLB miss handler. That done, we have 2 types of pages: - PGD entries to regular page tables handling 4k/16k and 512k pages - PGD entries to hugepd tables handling 8M pages. There is no need to mix 8M pages with other sizes, because a 8M page will use more than what a single PGD covers. Then comes the second purpose of this series. At the time being, the 8xx has implemented special handling in the TLB miss handlers in order to transparently map kernel linear address space and the IMMR using huge pages by building the TLB entries in assembly at the time of the exception. As mm_slices is only for user space pages, and also because it would anyway not be convenient to slice kernel address space, it was not possible to use huge pages for kernel address space. But after step one of the series, it is now more flexible to use huge pages. This series drop all assembly 'just in time' handling of huge pages and use huge pages in page tables instead. Once the above is done, then comes icing on the cake: - Use huge pages for KASAN shadow mapping - Allow pinned TLBs with strict kernel rwx - Allow pinned TLBs with debug pagealloc Then, last but not least, those modifications for the 8xx allows the following improvement on book3s/32: - Mapping KASAN shadow with BATs - Allowing BATs with debug pagealloc All this allows to considerably simplify TLB miss handlers and associated initialisation. The overhead of reading page tables is negligible compared to the reduction of the miss handlers. While we were at touching pte_update(), some cleanup was done there too. Tested widely on 8xx and 832x. Boot tested on QEMU MAC99.
2020-05-26powerpc/32s: Implement dedicated kasan_init_region()Christophe Leroy
Implement a kasan_init_region() dedicated to book3s/32 that allocates KASAN regions using BATs. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/709e821602b48a1d7c211a9b156da26db98c3e9d.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26powerpc/32s: Allow mapping with BATs with DEBUG_PAGEALLOCChristophe Leroy
DEBUG_PAGEALLOC only manages RW data. Text and RO data can still be mapped with BATs. In order to map with BATs, also enforce data alignment. Set by default to 256M which is a good compromise for keeping enough BATs for also KASAN and IMMR. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fd29c1718ee44d82115d0e835ced808eb4ccbf51.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26powerpc/8xx: Implement dedicated kasan_init_region()Christophe Leroy
Implement a kasan_init_region() dedicated to 8xx that allocates KASAN regions using huge pages. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d2d60202a8821dc81cffe6ff59cc13c15b7e4bb6.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26powerpc/8xx: Allow large TLBs with DEBUG_PAGEALLOCChristophe Leroy
DEBUG_PAGEALLOC only manages RW data. Text and RO data can still be mapped with hugepages and pinned TLB. In order to map with hugepages, also enforce a 512kB data alignment minimum. That's a trade-off between size of speed, taking into account that DEBUG_PAGEALLOC is a debug option. Anyway the alignment is still tunable. We also allow tuning of alignment for book3s to limit the complexity of the test in Kconfig that will anyway disappear in the following patches once DEBUG_PAGEALLOC is handled together with BATs. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c13256f2d356a316715da61fe089b3623ef217a5.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26powerpc/8xx: Allow STRICT_KERNEL_RwX with pinned TLBChristophe Leroy
Pinned TLB are 8M. Now that there is no strict boundary anymore between text and RO data, it is possible to use 8M pinned executable TLB that covers both text and RO data. When PIN_TLB_DATA or PIN_TLB_TEXT is selected, enforce 8M RW data alignment and allow STRICT_KERNEL_RWX. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c535fc97bf0dd8693192e25feeed8088701e00c6.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26powerpc/8xx: Map linear memory with huge pagesChristophe Leroy
Map linear memory space with 512k and 8M pages whenever possible. Three mappings are performed: - One for kernel text - One for RO data - One for the rest Separating the mappings is done to be able to update the protection later when using STRICT_KERNEL_RWX. The ITLB miss handler now need to also handle huge TLBs unless kernel text in pinned. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c44f0ab5510474f25123d904cd1f4e5c6aa3c1ac.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26powerpc/8xx: Map IMMR with a huge pageChristophe Leroy
Map the IMMR area with a single 512k huge page. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9495dba06669da40e133f24607758fa6dcc65f66.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26powerpc/8xx: Add a function to early map kernel via huge pagesChristophe Leroy
Add a function to early map kernel memory using huge pages. For 512k pages, just use standard page table and map in using 512k pages. For 8M pages, create a hugepd table and populate the two PGD entries with it. This function can only be used to create page tables at startup. Once the regular SLAB allocation functions replace memblock functions, this function cannot allocate new pages anymore. However it can still update existing mappings with new protections. hugepd_none() macro is moved into asm/hugetlb.h to be usable outside of mm/hugetlbpage.c early_pte_alloc_kernel() is made visible. _PAGE_HUGE flag is now displayed by ptdump. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Change ptdump display to use "huge"] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/68325bcd3b6f93127f7810418a2352c3519066d6.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26powerpc/8xx: Refactor kernel address boundary comparisonChristophe Leroy
Now that linear and IMMR dedicated TLB handling is gone, kernel boundary address comparison is similar in ITLB miss handler and in DTLB miss handler. Create a macro named compare_to_kernel_boundary. When TASK_SIZE is strictly below 0x80000000 and PAGE_OFFSET is above 0x80000000, it is enough to compare to 0x8000000, and this can be done with a single instruction. Using not. instruction, we get to use 'blt' conditional branch as when doing a regular comparison: 0x00000000 <= addr <= 0x7fffffff ==> 0xffffffff >= NOT(addr) >= 0x80000000 The above test corresponds to a 'blt' Otherwise, do a regular comparison using two instructions. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6312575d06a8813105e6564a3b12e1d373aa1b2f.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26powerpc/mm: Don't be too strict with _etext alignment on PPC32Christophe Leroy
Similar to PPC64, accept to map RO data as ROX as a trade off between between security and memory usage. Having RO data executable is not a high risk as RO data can't be modified to forge an exploit. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8c4a0d89d944eed984dd941e509614031a5ace2b.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26powerpc/8xx: Move DTLB perf handling closer.Christophe Leroy
Now that space have been freed next to the DTLB miss handler, it's associated DTLB perf handling can be brought back in the same place. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/97f48cc1a2ea6b895bfac0752cbe59deaf2eecda.1589866984.git.christophe.leroy@csgroup.eu