summaryrefslogtreecommitdiff
path: root/arch/powerpc/platforms/powernv
AgeCommit message (Collapse)Author
2020-05-20powerpc/powernv: add NULL check after kzallocChen Zhou
Fixes coccicheck warning: ./arch/powerpc/platforms/powernv/opal.c:813:1-5: alloc with no test, possible model on line 814 Add NULL check after kzalloc. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200509020838.121660-1-chenzhou10@huawei.com
2020-05-11powerpc: Replace _ALIGN_UP() by ALIGN()Christophe Leroy
_ALIGN_UP() is specific to powerpc ALIGN() is generic and does the same Replace _ALIGN_UP() by ALIGN() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/8a6d7e45f7904c73a0af539642d3962e2a3c7268.1587407777.git.christophe.leroy@c-s.fr
2020-05-11powerpc: Replace _ALIGN_DOWN() by ALIGN_DOWN()Christophe Leroy
_ALIGN_DOWN() is specific to powerpc ALIGN_DOWN() is generic and does the same Replace _ALIGN_DOWN() by ALIGN_DOWN() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/3911a86d6b5bfa7ad88cd7c82416fbe6bb47e793.1587407777.git.christophe.leroy@c-s.fr
2020-05-11powerpc/powernv: Fix a warning messageChristophe JAILLET
Fix a cut'n'paste error in a warning message. This should be 'cpu-idle-state-residency-ns' to match the property searched in the previous 'of_property_read_u32_array()' Fixes: 9c7b185ab2fe ("powernv/cpuidle: Parse dt idle properties into global structure") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200502115949.139000-1-christophe.jaillet@wanadoo.fr
2020-04-20powerpc/vas: Add VAS user space APIHaren Myneni
On power9, userspace can send GZIP compression requests directly to NX once kernel establishes NX channel / window with VAS. This patch provides user space API which allows user space to establish channel using open VAS_TX_WIN_OPEN ioctl, mmap and close operations. Each window corresponds to file descriptor and application can open multiple windows. After the window is opened, VAS_TX_WIN_OPEN icoctl to open a window on specific VAS instance, mmap() system call to map the hardware address of engine's request queue into the application's virtual address space. Then the application can then submit one or more requests to the the engine by using the copy/paste instructions and pasting the CRBs to the virtual address (aka paste_address) returned by mmap(). Only NX GZIP coprocessor type is supported right now and allow GZIP engine access via /dev/crypto/nx-gzip device node. Thanks to Michael Ellerman for his changes and suggestions to make the ioctl generic to support any coprocessor type. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587114121.2275.1109.camel@hbabu-laptop
2020-04-20powerpc/vas: Initialize window attributes for GZIP coprocessor typeHaren Myneni
Initialize send and receive window attributes for GZIP high and normal priority types. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587114029.2275.1103.camel@hbabu-laptop
2020-04-20powerpc: Use mm_context vas_windows counter to issue CP_ABORTHaren Myneni
set_thread_uses_vas() sets used_vas flag for a process that opened VAS window and issue CP_ABORT during context switch for only that process. In multi-thread application, windows can be shared. For example Thread A can open a window and Thread B can run COPY/PASTE instructions to send NX request which may cause corruption or snooping or a covert channel Also once this flag is set, continue to run CP_ABORT even the VAS window is closed. So define vas-windows counter in process mm_context, increment this counter for each window open and decrement it for window close. If vas-windows is set, issue CP_ABORT during context switch. It means clear the foreign real address mapping only if the process / thread uses COPY/PASTE. Then disable it for that process if windows are not open. Moved set_thread_uses_vas() code to vas_tx_win_open() as this functionality is needed only for userspace open windows. We are adding VAS userspace support along with this fix. So no need to include this fix in stable releases. Fixes: 9d2a4d71332c ("powerpc: Define set_thread_uses_vas()") Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reported-by: Nicholas Piggin <npiggin@gmail.com> Suggested-by: Milton Miller <miltonm@us.ibm.com> Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587017291.2275.1077.camel@hbabu-laptop
2020-04-20powerpc/vas: Free send window in VAS instance after credits returnedHaren Myneni
NX may be processing requests while trying to close window. Wait until all credits are returned and then free send window from VAS instance. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587017256.2275.1076.camel@hbabu-laptop
2020-04-20powerpc/vas: Display process stuck messageHaren Myneni
Process can not close send window until all requests are processed. Means wait until window state is not busy and send credits are returned. Display debug messages in case taking longer to close the window. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587017219.2275.1073.camel@hbabu-laptop
2020-04-20powerpc/vas: Do not use default credits for receive windowHaren Myneni
System checkstops if RxFIFO overruns with more requests than the maximum possible number of CRBs allowed in FIFO at any time. So max credits value (rxattr.wcreds_max) is set and is passed to vas_rx_win_open() by the the driver. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587017136.2275.1070.camel@hbabu-laptop
2020-04-20powerpc/vas: Print CRB and FIFO valuesHaren Myneni
Dump FIFO entries if could not find send window and print CRB for debugging. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587017099.2275.1067.camel@hbabu-laptop
2020-04-20powerpc/vas: Return credits after handling faultHaren Myneni
NX uses credit mechanism to control the number of requests issued on a specific window at any point of time. Only send windows and fault window are used credits. When the request is issued on a given window, a credit is taken. This credit will be returned after that request is processed. If credits are not available, returns RMA_Busy for send window and RMA_Reject for fault window. NX expects OS to return credit for send window after processing fault CRB. Also credit has to be returned for fault window after handling the fault. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587017059.2275.1064.camel@hbabu-laptop
2020-04-20powerpc/vas: Update CSB and notify process for fault CRBsHaren Myneni
Applications polls on CSB for the status update after requests are issued. NX process these requests and update the CSB with the status. If it encounters translation error, pastes CRB in fault FIFO and raises an interrupt. The kernel handles fault by reading CRB from fault FIFO and process the fault CRB. For each fault CRB, update fault address in CRB (fault_storage_addr) and translation error status in CSB so that user space can touch the fault address and resend the request. If the user space passed invalid CSB address send signal to process with SIGSEGV. In the case of multi-thread applications, child thread may not be available. So if the task is not running, send signal to tgid. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587017022.2275.1063.camel@hbabu-laptop
2020-04-20powerpc/vas: Setup thread IRQ handler per VAS instanceHaren Myneni
When NX encounters translation error on CRB and any request buffer, raises an interrupt on the CPU to handle the fault. It can raise one interrupt for multiple faults. Expects OS to handle these faults and return credits for fault window after processing faults. Setup thread IRQ handler and IRQ thread function per each VAS instance. IRQ handler checks if the thread is already woken up and can handle new faults. If so returns with IRQ_HANDLED, otherwise wake up thread to process new faults. The thread functions reads each CRB entry from fault FIFO until sees invalid entry. After reading each CRB, determine the corresponding send window using pswid (from CRB) and process fault CRB. Then invalidate the entry and return credit. Processing fault CRB and return credit is described in subsequent patches. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587016982.2275.1060.camel@hbabu-laptop
2020-04-20powerpc/vas: Take reference to PID and mm for user space windowsHaren Myneni
When process opens a window, its pid and tgid will be saved in the vas_window struct. This window will be closed when the process exits. The kernel handles NX faults by updating CSB or send SEGV signal to pid of the process if the userspace csb addr is invalid. In multi-thread applications, a window can be opened by a child thread, but it will not be closed when this thread exits. It is expected that the parent will clean up all resources including NX windows opened by child threads. A child thread can send NX requests using this window and could be killed before completion is reported. If the pid assigned to this thread is reused while requests are pending, a failure SEGV would be directed to the wrong place. To prevent reusing the pid, take references to pid and mm when the window is opened and release them when when the window is closed. Then if child thread is not running, SEGV signal will be sent to thread group leader (tgid). Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587016936.2275.1057.camel@hbabu-laptop
2020-04-20powerpc/vas: Register NX with fault window ID and IRQ port valueHaren Myneni
For each user space send window, register NX with fault window ID and port value so that NX paste CRBs in this fault FIFO when it sees fault on the request buffer. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587016888.2275.1054.camel@hbabu-laptop
2020-04-20powerpc/vas: Setup fault window per VAS instanceHaren Myneni
Setup fault window for each VAS instance. When NX gets a fault on request buffer, pastes fault CRB in the corresponding fault FIFO and then raises an interrupt to the OS. The kernel handles this fault and process faults CRB from this FIFO. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587016846.2275.1053.camel@hbabu-laptop
2020-04-20powerpc/vas: Alloc and setup IRQ and trigger port addressHaren Myneni
Allocate a xive irq on each chip with a vas instance. The NX coprocessor raises a host CPU interrupt via vas if it encounters page fault on user space request buffer. Subsequent patches register the trigger port with the NX coprocessor, and create a vas fault handler for this interrupt mapping. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587016806.2275.1050.camel@hbabu-laptop
2020-04-09Merge tag 'powerpc-5.7-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc updates from Michael Ellerman: "The bulk of this is the series to make CONFIG_COMPAT user-selectable, it's been around for a long time but was blocked behind the syscall-in-C series. Plus there's also a few fixes and other minor things. Summary: - A fix for a crash in machine check handling on pseries (ie. guests) - A small series to make it possible to disable CONFIG_COMPAT, and turn it off by default for ppc64le where it's not used. - A few other miscellaneous fixes and small improvements. Thanks to: Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann, Christophe Leroy, Dan Carpenter, Ganesh Goudar, Geert Uytterhoeven, Geoff Levand, Mahesh Salgaonkar, Markus Elfring, Michal Suchanek, Nicholas Piggin, Stephen Boyd, Wen Xiong" * tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Always build the tm-poison test 64-bit powerpc: Improve ppc_save_regs() Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled" powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h> powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory powerpc/perf: split callchain.c by bitness powerpc/64: Make COMPAT user-selectable disabled on littleendian by default. powerpc/64: make buildable without CONFIG_COMPAT powerpc/perf: consolidate valid_user_sp -> invalid_user_sp powerpc/perf: consolidate read_user_stack_32 powerpc: move common register copy functions from signal_32.c to signal.c powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig powerpc/ps3: Remove an unneeded NULL check powerpc/ps3: Remove duplicate error message powerpc/powernv: Re-enable imc trace-mode in kernel powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events. powerpc/pseries: Fix MCE handling on pseries selftests/eeh: Skip ahci adapters powerpc/64s: Fix doorbell wakeup msgclr optimisation
2020-04-07powernv/memtrace: always online added memory blocksDavid Hildenbrand
Let's always try to online the re-added memory blocks. In case add_memory() already onlined the added memory blocks, the first device_online() call will fail and stop processing the remaining memory blocks. This avoids manually having to check memhp_auto_online. Note: PPC always onlines all hotplugged memory directly from the kernel as well - something that is handled by user space on other architectures. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Yumei Huang <yuhuang@redhat.com> Link: http://lkml.kernel.org/r/20200317104942.11178-5-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-03powerpc/powernv: Re-enable imc trace-mode in kernelAnju T Sudhakar
commit <249fad734a25> ""powerpc/perf: Disable trace_imc pmu" disables IMC(In-Memory Collection) trace-mode in kernel, since frequent mode switching between accumulation mode and trace mode via the spr LDBAR in the hardware can trigger a checkstop(system crash). Patch to re-enable imc-trace mode in kernel. The previous patch(1/2) in this series will address the mode switching issue by implementing a global lock, and will restrict the usage of accumulation and trace-mode at a time. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200313055238.8656-2-anju@linux.vnet.ibm.com
2020-03-25powerpc/eeh: Rework eeh_ops->probe()Oliver O'Halloran
With the EEH early probe now being pseries specific there's no need for eeh_ops->probe() to take a pci_dn. Instead, we can make it take a pci_dev and use the probe function to map a pci_dev to an eeh_dev. This allows the platform to implement it's own method for finding (or creating) an eeh_dev for a given pci_dev which also removes a use of pci_dn in generic EEH code. This patch also renames eeh_device_add_late() to eeh_device_probe(). This better reflects what it does does and removes the last vestiges of the early/late EEH probe split. Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306073904.4737-6-oohall@gmail.com
2020-03-25powerpc/eeh: Make early EEH init pseries specificOliver O'Halloran
The eeh_ops->probe() function is called from two different contexts: 1. On pseries, where we set EEH_PROBE_MODE_DEVTREE, it's called in eeh_add_device_early() which is supposed to run before we create a pci_dev. 2. On PowerNV, where we set EEH_PROBE_MODE_DEV, it's called in eeh_device_add_late() which is supposed to run *after* the pci_dev is created. The "early" probe is required because PAPR requires that we perform an RTAS call to enable EEH support on a device before we start interacting with it via config space or MMIO. This requirement doesn't exist on PowerNV and shoehorning two completely separate initialisation paths into a common interface just results in a convoluted code everywhere. Additionally the early probe requires the probe function to take an pci_dn rather than a pci_dev argument. We'd like to make pci_dn a pseries specific data structure since there's no real requirement for them on PowerNV. To help both goals move the early probe into the pseries containment zone so the platform depedence is more explicit. Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306073904.4737-5-oohall@gmail.com
2020-03-25powerpc/eeh: Add sysfs files in late probeOliver O'Halloran
Move creating the EEH specific sysfs files into eeh_add_device_late() rather than being open-coded all over the place. Calling the function is generally done immediately after calling eeh_add_device_late() anyway. This is also a correctness fix since currently the sysfs files will be added even if the EEH probe happens to fail. Similarly, on pseries we currently add the sysfs files before calling eeh_add_device_late(). This is flat-out broken since the sysfs files require the pci_dev->dev.archdata.edev pointer to be set, and that is done in eeh_add_device_late(). Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306073904.4737-1-oohall@gmail.com
2020-03-04powerpc/powernv: Add explicit fast-reboot supportOliver O'Halloran
Add a way to manually invoke a fast-reboot rather than setting the NVRAM flag. The idea is to allow userspace to invoke a fast-reboot using the optional string argument to the reboot() system call, or using the xmon zr command so we don't need to leave around a persistent changes on a system to use the feature. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200217024833.30580-2-oohall@gmail.com
2020-03-04powerpc/powernv: Treat an empty reboot string as defaultOliver O'Halloran
Treat an empty reboot cmd string the same as a NULL string. This squashes a spurious unsupported reboot message that sometimes gets out when using xmon. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200217024833.30580-1-oohall@gmail.com
2020-03-04powerpc/powernv: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200209105901.1620958-6-gregkh@linuxfoundation.org
2020-02-19powerpc/powernv: Move core and fadump_release_opalcore under new kobjectSourabh Jain
The /sys/firmware/opal/core and /sys/kernel/fadump_release_opalcore sysfs files are used to export and release the OPAL memory on PowerNV platform. let's organize them into a new kobject under /sys/firmware/opal/mpipl/ directory. A symlink is added to maintain the backward compatibility for /sys/firmware/opal/core sysfs file. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191211160910.21656-5-sourabhjain@linux.ibm.com
2020-01-23powernv/pci: Move pnv_pci_dma_bus_setup() to pci-ioda.cOliver O'Halloran
This is only used in pci-ioda.c so move it there and rename it to match. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200110070207.439-6-oohall@gmail.com
2020-01-23powernv/pci: Fold pnv_pci_dma_dev_setup() into the pci-ioda.c versionOliver O'Halloran
pnv_pci_dma_dev_setup() does nothing but call the phb->dma_dev_setup() callback, if one exists. That callback is only set for normal PCIe PHBs so we can remove the layer of indirection and use the ioda version in the pci_controller_ops. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200110070207.439-5-oohall@gmail.com
2020-01-23powerpc/iov: Move VF pdev fixup into pcibios_fixup_iov()Oliver O'Halloran
An ioda_pe for each VF is allocated in pnv_pci_sriov_enable() before the pci_dev for the VF is created. We need to set the pe->pdev pointer at some point after the pci_dev is created. Currently we do that in: pcibios_bus_add_device() pnv_pci_dma_dev_setup() (via phb->ops.dma_dev_setup) /* fixup is done here */ pnv_pci_ioda_dma_dev_setup() (via pnv_phb->dma_dev_setup) The fixup needs to be done before setting up DMA for for the VF's PE, but there's no real reason to delay it until this point. Move the fixup into pnv_pci_ioda_fixup_iov() so the ordering is: pcibios_add_device() pnv_pci_ioda_fixup_iov() (via ppc_md.pcibios_fixup_sriov) pcibios_bus_add_device() ... This isn't strictly required, but it's a slightly more logical place to do the fixup and it simplifies pnv_pci_dma_dev_setup(). Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200110070207.439-4-oohall@gmail.com
2020-01-23powernv/pci: Remove dma_dev_setup() for NPU PHBsOliver O'Halloran
The pnv_pci_dma_dev_setup() only does something when: 1) There PHB contains VFs, or 2) The PHB defines a dma_dev_setup() callback in the pnv_phb structure. Neither is true for NPU PHBs so there's no reason to set the callback. Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200110070207.439-3-oohall@gmail.com
2020-01-23powerpc/powernv: Allow manually invoking special rebootsOliver O'Halloran
OPAL provides several different kinds of reboot for the kernel to use, namely forcing a full reboot, platform error reboot and MPIPL. Right now triggering the alternative resets requires some ad-hoc method such as triggering a kernel crash and hoping the stars align. It's sometimes handy to be able to trigger one of these resets directly, so add a way to do that. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191101085522.3055-2-oohall@gmail.com
2020-01-23powerpc/powernv: Use common code for the symbol_map exportOliver O'Halloran
Long before we had a generic way for firmware to export memory ranges of interest we added a special case for the skiboot symbol map. The code is pretty much identical to the generic export so re-use the code. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191101062611.32610-2-oohall@gmail.com
2020-01-23powerpc/powernv: Rework exports to support subnodesOliver O'Halloran
Originally we only had a handful of exported memory ranges, but we'd to export the per-core trace buffers. This results in a lot of files in the exports directory which is a but unfortunate. We can clean things up a bit by turning subnodes into subdirectories of the exports directory. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191101062611.32610-1-oohall@gmail.com
2020-01-23powernv/pci: Add a debugfs entry to dump PHB's IODA PE stateOliver O'Halloran
Add a debugfs entry to dump the state of the active IODA PEs. The IODA PE state reflects how the PHB's internal concept of a PE is configured. This is separate to the EEH PE state and is managed power the PowerNV PCI backend rather than the EEH core. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> [mpe: Use DEFINE_DEBUGFS_ATTRIBUTE] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190912052945.12589-3-oohall@gmail.com
2020-01-23powernv/pci: Allow any write trigger the diag dumpOliver O'Halloran
Make the dump trigger off any input rather than just '1'. This allows you to write "echo 1> dump_diag_data" and it'll do what you want rather than erroring out pointlessly. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190912052945.12589-2-oohall@gmail.com
2020-01-23powernv/pci: Use pnv_phb as the private data for debugfs entriesOliver O'Halloran
Use the pnv_phb structure as the private data pointer for the debugfs files. This lets us delete some code and an open-coded use of hose->private_data. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190912052945.12589-1-oohall@gmail.com
2020-01-23powerpc/pcidn: Make VF pci_dn management CONFIG_PCI_IOV specificOliver O'Halloran
The powerpc PCI code requires that a pci_dn structure exists for all devices in the system. This is fine for real devices since at boot a pci_dn is created for each PCI device in the DT and it's fine for hotplugged devices since the hotplug slot driver will manage the pci_dn's devices in hotplug slots. For SR-IOV, we need the platform / pcibios to manage the pci_dn for virtual functions since firmware is unaware of VFs, and they aren't "hot plugged" in the traditional sense. Management of the pci_dn is handled by the, poorly named, functions: add_pci_dev_data() and remove_pci_dev_data(). The entire body of these functions is #ifdef`ed around CONFIG_PCI_IOV and they cannot be used in any other context, so make them only available when CONFIG_PCI_IOV is selected, and rename them to reflect their actual usage rather than having them masquerade as generic code. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190821062655.19735-2-oohall@gmail.com
2020-01-23powerpc/powernv/ioda: Find opencapi slot for a device nodeFrederic Barrat
Unlike real PCI slots, opencapi slots are directly associated to the (virtual) opencapi PHB, there's no intermediate bridge. So when looking for a slot ID, we must start the search from the device node itself and not its parent. Also, the slot ID is not attached to a specific bdfn, so let's build it from the PHB ID, like skiboot. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-6-fbarrat@linux.ibm.com
2020-01-23powerpc/powernv/ioda: Release opencapi deviceFrederic Barrat
With hotplug, an opencapi device can now go away. It needs to be released, mostly to clean up its PE state. We were previously not defining any device callback. We can reuse the standard PCI release callback, it does a bit too much for an opencapi device, but it's harmless, and only needs minor tuning. Also separate the undo of the PELT-V code in a separate function, it is not needed for NPU devices and it improves a bit the readability of the code. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-5-fbarrat@linux.ibm.com
2020-01-23powerpc/powernv/ioda: set up PE on opencapi device when enablingFrederic Barrat
The PE for an opencapi device was set as part of a late PHB fixup operation, when creating the PHB. To use the PCI hotplug framework, this is not going to work, as the PHB stays the same, it's only the devices underneath which are updated. For regular PCI devices, it is done as part of the reconfiguration of the bridge, but for opencapi PHBs, we don't have an intermediate bridge. So let's define the PE when the device is enabled. PEs are meaningless for opencapi, the NPU doesn't define them and opal is not doing anything with them. Reviewed-by: Alastair D'Silva <alastair@d-silva.org> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-4-fbarrat@linux.ibm.com
2020-01-23powerpc/powernv/ioda: Protect PE listFrederic Barrat
Protect the PHB's list of PE. Probably not needed as long as it was populated during PHB creation, but it feels right and will become required once we can add/remove opencapi devices on hotplug. Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-3-fbarrat@linux.ibm.com
2020-01-23powerpc/powernv/ioda: Fix ref count for devices with their own PEFrederic Barrat
The pci_dn structure used to store a pointer to the struct pci_dev, so taking a reference on the device was required. However, the pci_dev pointer was later removed from the pci_dn structure, but the reference was kept for the npu device. See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn"). We don't need to take a reference on the device when assigning the PE as the struct pnv_ioda_pe is cleaned up at the same time as the (physical) device is released. Doing so prevents the device from being released, which is a problem for opencapi devices, since we want to be able to remove them through PCI hotplug. Now the ugly part: nvlink npu devices are not meant to be released. Because of the above, we've always leaked a reference and simply removing it now is dangerous and would likely require more work. There's currently no release device callback for nvlink devices for example. So to be safe, this patch leaks a reference on the npu device, but only for nvlink and not opencapi. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-2-fbarrat@linux.ibm.com
2020-01-07powerpc/powernv: use resource_sizeJulia Lawall
Use resource_size rather than a verbose computation on the end and start fields. The semantic patch that makes these changes is as follows: (http://coccinelle.lip6.fr/) <smpl> @@ struct resource ptr; @@ - (ptr.end - ptr.start + 1) + resource_size(&ptr) </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1577900990-8588-11-git-send-email-Julia.Lawall@inria.fr
2020-01-06powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE numberOliver O'Halloran
On pseries there is a bug with adding hotplugged devices to an IOMMU group. For a number of dumb reasons fixing that bug first requires re-working how VFs are configured on PowerNV. For background, on PowerNV we use the pcibios_sriov_enable() hook to do two things: 1. Create a pci_dn structure for each of the VFs, and 2. Configure the PHB's internal BARs so the MMIO range for each VF maps to a unique PE. Roughly speaking a PE is the hardware counterpart to a Linux IOMMU group since all the devices in a PE share the same IOMMU table. A PE also defines the set of devices that should be isolated in response to a PCI error (i.e. bad DMA, UR/CA, AER events, etc). When isolated all MMIO and DMA traffic to and from devicein the PE is blocked by the root complex until the PE is recovered by the OS. The requirement to block MMIO causes a giant headache because the P8 PHB generally uses a fixed mapping between MMIO addresses and PEs. As a result we need to delay configuring the IOMMU groups for device until after MMIO resources are assigned. For physical devices (i.e. non-VFs) the PE assignment is done in pcibios_setup_bridge() which is called immediately after the MMIO resources for downstream devices (and the bridge's windows) are assigned. For VFs the setup is more complicated because: a) pcibios_setup_bridge() is not called again when VFs are activated, and b) The pci_dev for VFs are created by generic code which runs after pcibios_sriov_enable() is called. The work around for this is a two step process: 1. A fixup in pcibios_add_device() is used to initialised the cached pe_number in pci_dn, then 2. A bus notifier then adds the device to the IOMMU group for the PE specified in pci_dn->pe_number. A side effect fixing the pseries bug mentioned in the first paragraph is moving the fixup out of pcibios_add_device() and into pcibios_bus_add_device(), which is called much later. This results in step 2. failing because pci_dn->pe_number won't be initialised when the bus notifier is run. We can fix this by removing the need for the fixup. The PE for a VF is known before the VF is even scanned so we can initialise pci_dn->pe_number pcibios_sriov_enable() instead. Unfortunately, moving the initialisation causes two problems: 1. We trip the WARN_ON() in the current fixup code, and 2. The EEH core clears pdn->pe_number when recovering a VF and relies on the fixup to correctly re-set it. The only justification for either of these is a comment in eeh_rmv_device() suggesting that pdn->pe_number *must* be set to IODA_INVALID_PE in order for the VF to be scanned. However, this comment appears to have no basis in reality. Both bugs can be fixed by just deleting the code. Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191028085424.12006-1-oohall@gmail.com
2019-12-05powerpc/perf: Disable trace_imc pmuMadhavan Srinivasan
When a root user or a user with CAP_SYS_ADMIN privilege uses any trace_imc performance monitoring unit events, to monitor application or KVM threads, it may result in a checkstop (System crash). The cause is frequent switching of the "trace/accumulation" mode of the In-Memory Collection hardware (LDBAR). This patch disables the trace_imc PMU unit entirely to avoid triggering the checkstop. A future patch will reenable it at a later stage once a workaround has been developed. Fixes: 012ae244845f ("powerpc/perf: Trace imc PMU functions") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Tested-by: Hariharan T.S. <hari@linux.ibm.com> [mpe: Add pr_info_once() so dmesg shows the PMU has been disabled] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191118034452.9939-1-maddy@linux.vnet.ibm.com
2019-12-05powerpc/powernv: Avoid re-registration of imc debugfs directoryAnju T Sudhakar
export_imc_mode_and_cmd() function which creates the debugfs interface for imc-mode and imc-command, is invoked when each nest pmu units is registered. When the first nest pmu unit is registered, export_imc_mode_and_cmd() creates 'imc' directory under `/debug/powerpc/`. In the subsequent invocations debugfs_create_dir() function returns, since the directory already exists. The recent commit <c33d442328f55> (debugfs: make error message a bit more verbose), throws a warning if we try to invoke `debugfs_create_dir()` with an already existing directory name. Address this warning by making the debugfs directory registration in the opal_imc_counters_probe() function, i.e invoke export_imc_mode_and_cmd() function from the probe function. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Tested-by: Nageswara R Sastry <nasastry@in.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191127072035.4283-1-anju@linux.vnet.ibm.com
2019-11-21powerpc/powernv: Disable native PCIe port managementOliver O'Halloran
On PowerNV the PCIe topology is (currently) managed by the powernv platform code in Linux in cooperation with the platform firmware. Linux's native PCIe port service drivers operate independently of both and this can cause problems. The main issue is that the portbus driver will conflict with the platform specific hotplug driver (pnv_php) over ownership of the MSI used to notify the host when a hotplug event occurs. The portbus driver claims this MSI on behalf of the individual port services because the same interrupt is used for hotplug events, PMEs (on root ports), and link bandwidth change notifications. The portbus driver will always claim the interrupt even if the individual port service drivers, such as pciehp, are compiled out. The second, bigger, problem is that the hotplug port service driver fundamentally does not work on PowerNV. The platform assumes that all PCI devices have a corresponding arch-specific handle derived from the DT node for the device (pci_dn) and without one the platform will not allow a PCI device to be enabled. This problem is largely due to historical baggage, but it can't be resolved without significant re-factoring of the platform PCI support. We can fix these problems in the interim by setting the "pcie_ports_disabled" flag during platform initialisation. The flag indicates the platform owns the PCIe ports which stops the portbus driver from being registered. This does have the side effect of disabling all port services drivers that is: AER, PME, BW notifications, hotplug, and DPC. However, this is not a huge disadvantage on PowerNV since these services are either unused or handled through other means. Fixes: 66725152fb9f ("PCI/hotplug: PowerPC PowerNV PCI hotplug driver") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191118065553.30362-1-oohall@gmail.com
2019-11-13powerpc/powernv/ioda: using kfree_rcu() to simplify the codeYueHaibing
The callback function of call_rcu() just calls a kfree(), so we can use kfree_rcu() instead of call_rcu() + callback function. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190711141818.18044-1-yuehaibing@huawei.com