summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel/eeh_driver.c
AgeCommit message (Collapse)Author
2016-07-15Merge tag 'powerpc-4.7-5' into nextMichael Ellerman
Pull in the fixes we sent during 4.7, we have code we want to merge into next that depends on some of them.
2016-06-28powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()Gavin Shan
When calling eeh_rmv_device() in eeh_reset_device() for partial hotplug case, @rmv_data instead of its address is the proper argument. Otherwise, the stack frame is corrupted when writing to @rmv_data (actually its address) in eeh_rmv_device(). It results in kernel crash as observed. This fixes the issue by passing @rmv_data, not its address to eeh_rmv_device() in eeh_reset_device(). Fixes: 67086e32b564 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE") Reported-by: Pridhiviraj Paidipeddi <ppaidipe@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-17powerpc/eeh: Fix invalid cached PE primary busGavin Shan
The PE primary bus cannot be got from its child devices when having full hotplug in error recovery. The PE primary bus is cached, which is done in commit <05ba75f84864> ("powerpc/eeh: Fix stale cached primary bus"). In eeh_reset_device(), the flag (EEH_PE_PRI_BUS) is cleared before the PCI hot remove. eeh_pe_bus_get() then returns NULL as the PE primary bus in pnv_eeh_reset() and it crashes the kernel eventually. This fixes the issue by clearing the flag (EEH_PE_PRI_BUS) before the PCI hot add. With it, the PowerNV EEH reset backend (pnv_eeh_reset()) can get valid PE primary bus through eeh_pe_bus_get(). Fixes: 67086e32b564 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE") Reported-by: Pridhiviraj Paidipeddi <ppaiddipe@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-14powerpc: Various typo fixesMichael Ellerman
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-12powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover()Gavin Shan
The function eeh_pe_reset_and_recover() is used to recover EEH error when the passthrough device are transferred to guest and backwards, meaning the device's driver is vfio-pci or none. In both cases, the handlers triggered by eeh_report_reset() and eeh_report_resume() shouldn't be called. This ignores the error handlers from eeh_report_reset() and eeh_report_resume(). Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-12powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover()Gavin Shan
The function eeh_pe_reset_and_recover() is used to recover EEH error when the passthrou device are transferred to guest and backwards. The content in the device's config space will be lost on PE reset issued in the middle of the recovery. The function saves/restores it before/after the reset. However, config access to some adapters like Broadcom BCM5719 at this point will causes fenced PHB. The config space is always blocked and we save 0xFF's that are restored at late point. The memory BARs are totally corrupted, causing another EEH error upon access to one of the memory BARs. This restores the config space on those adapters like BCM5719 from the content saved to the EEH device when it's populated, to resolve above issue. Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices") Cc: stable@vger.kernel.org #v3.18+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-12powerpc/eeh: Don't report error in eeh_pe_reset_and_recover()Gavin Shan
The function eeh_pe_reset_and_recover() is used to recover EEH error when the passthrough device are transferred to guest and backwards, meaning the device's driver is vfio-pci or none. When the driver is vfio-pci that provides error_detected() error handler only, the handler simply stops the guest and it's not expected behaviour. On the other hand, no error handlers will be called if we don't have a bound driver. This ignores the error handler in eeh_pe_reset_and_recover() that reports the error to device driver to avoid the exceptional behaviour. Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices") Cc: stable@vger.kernel.org #v3.18+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11powerpc/pci: Rename pcibios_{add, remove}_pci_devices()Gavin Shan
This renames pcibios_{add,remove}_pci_devices() to avoid conflicts with names of the weak functions in PCI subsystem, which have the prefix "pcibios". No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: Don't remove passed VFsGavin Shan
When we have partial hotplug as part of the error recovery on PF, the VFs that are bound with vfio-pci driver will experience hotplug. That's not allowed. This checks if the VF PE is passed or not. If it does, we leave the VF without removing it. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: Don't propagate error to guestGavin Shan
When EEH error happened to the parent PE of those PEs that have been passed through to guest, the error is propagated to guest domain and the VFIO driver's error handlers are called. It's not correct as the error in the host domain shouldn't be propagated to guests and affect them. This adds one more limitation when calling EEH error handlers. If the PE has been passed through to guest, the error handlers won't be called. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: powerpc/eeh: Support error recovery for VF PEWei Yang
PFs are enumerated on PCI bus, while VFs are created by PF's driver. In EEH recovery, it has two cases: 1. Device and driver is EEH aware, error handlers are called. 2. Device and driver is not EEH aware, un-plug the device and plug it again by enumerating it. The special thing happens on the second case. For a PF, we could use the original pci core to enumerate the bus, while for VF we need to record the VFs which aer un-plugged then plug it again. Also The patch caches the VF index in pci_dn, which can be used to calculate VF's bus, device and function number. Those information helps to locate the VF's PCI device instance when doing hotplug during EEH recovery if necessary. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-25Merge tag 'powerpc-4.5-4' into nextMichael Ellerman
Pull in our current fixes from 4.5, in particular the "Fix Multi hit ERAT" bug is causing folks some grief when testing next.
2016-02-22powerpc/eeh: Fix partial hotplug criterionGavin Shan
During error recovery, the device could be removed as part of the partial hotplug. The criterion used to come with partial hotplug is: if the device driver provides error_detected(), slot_reset() and resume() callbacks, it's immune from hotplug. Otherwise, it's going to experience partial hotplug during EEH recovery. But the criterion isn't correct enough: mlx4_core driver for Mellanox adapters provides error_detected(), slot_reset() callbacks, but resume() isn't there. Those Mellanox adapters won't be to involved in the partial hotplug. This fixes the criterion to a practical one: adpater with driver that provides error_detected(), slot_reset() will be immune from partial hotplug. resume() isn't mandatory. Fixes: f2da4ccf ("powerpc/eeh: More relaxed hotplug criterion") Cc: stable@vger.kernel.org #v4.4+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-15powerpc/eeh: Fix stale cached primary busGavin Shan
When PE is created, its primary bus is cached to pe->bus. At later point, the cached primary bus is returned from eeh_pe_bus_get(). However, we could get stale cached primary bus and run into kernel crash in one case: full hotplug as part of fenced PHB error recovery releases all PCI busses under the PHB at unplugging time and recreate them at plugging time. pe->bus is still dereferencing the PCI bus that was released. This adds another PE flag (EEH_PE_PRI_BUS) to represent the validity of pe->bus. pe->bus is updated when its first child EEH device is online and the flag is set. Before unplugging in full hotplug for error recovery, the flag is cleared. Fixes: 8cdb2833 ("powerpc/eeh: Trace PCI bus from PE") Cc: stable@vger.kernel.org #v3.11+ Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reported-by: Pradipta Ghosh <pradghos@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-21Merge tag 'pci-v4.5-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v4.5 merge window: Enumeration: - Simplify config space size computation (Bjorn Helgaas) - Avoid iterating through ROM outside the resource window (Edward O'Callaghan) - Support PCIe devices with short cfg_size (Jason S. McMullan) - Add Netronome vendor and device IDs (Jason S. McMullan) - Limit config space size for Netronome NFP6000 family (Jason S. McMullan) - Add Netronome NFP4000 PF device ID (Simon Horman) - Limit config space size for Netronome NFP4000 (Simon Horman) - Print warnings for all invalid expansion ROM headers (Vladis Dronov) Resource management: - Fix minimum allocation address overwrite (Christoph Biedl) PCI device hotplug: - acpiphp_ibm: Fix null dereferences on null ibm_slot (Colin Ian King) - pciehp: Always protect pciehp_disable_slot() with hotplug mutex (Guenter Roeck) - shpchp: Constify hpc_ops structure (Julia Lawall) - ibmphp: Remove unneeded NULL test (Julia Lawall) Power management: - Make ASPM sysfs link_state_store() consistent with link_state_show() (Andy Lutomirski) Virtualization - Add function 1 DMA alias quirk for Lite-On/Plextor M6e/Marvell 88SS9183 (Tim Sander) MSI: - Remove empty pci_msi_init_pci_dev() (Bjorn Helgaas) - Mark PCIe/PCI (MSI) IRQ cascade handlers as IRQF_NO_THREAD (Grygorii Strashko) - Initialize MSI capability for all architectures (Guilherme G. Piccoli) - Relax msi_domain_alloc() to support parentless MSI irqdomains (Liu Jiang) ARM Versatile host bridge driver: - Remove unused pci_sys_data structures (Lorenzo Pieralisi) Broadcom iProc host bridge driver: - Hide CONFIG_PCIE_IPROC (Arnd Bergmann) - Do not use 0x in front of %pap (Dmitry V. Krivenok) - Update iProc PCIe device tree binding (Ray Jui) - Add PAXC interface support (Ray Jui) - Add iProc PCIe MSI device tree binding (Ray Jui) - Add iProc PCIe MSI support (Ray Jui) Freescale i.MX6 host bridge driver: - Use gpio_set_value_cansleep() (Fabio Estevam) - Add support for active-low reset GPIO (Petr Štetiar) HiSilicon host bridge driver: - Add support for HiSilicon Hip06 PCIe host controllers (Gabriele Paoloni) Intel VMD host bridge driver: - Export irq_domain_set_info() for module use (Keith Busch) - x86/PCI: Allow DMA ops specific to a PCI domain (Keith Busch) - Use 32 bit PCI domain numbers (Keith Busch) - Add driver for Intel Volume Management Device (VMD) (Keith Busch) Qualcomm host bridge driver: - Document PCIe devicetree bindings (Stanimir Varbanov) - Add Qualcomm PCIe controller driver (Stanimir Varbanov) - dts: apq8064: add PCIe devicetree node (Stanimir Varbanov) - dts: ifc6410: enable PCIe DT node for this board (Stanimir Varbanov) Renesas R-Car host bridge driver: - Add support for R-Car H3 to pcie-rcar (Harunobu Kurokawa) - Allow DT to override default window settings (Phil Edworthy) - Convert to DT resource parsing API (Phil Edworthy) - Revert "PCI: rcar: Build pcie-rcar.c only on ARM" (Phil Edworthy) - Remove unused pci_sys_data struct from pcie-rcar (Phil Edworthy) - Add runtime PM support to pcie-rcar (Phil Edworthy) - Add Gen2 PHY setup to pcie-rcar (Phil Edworthy) - Add gen2 fallback compatibility string for pci-rcar-gen2 (Simon Horman) - Add gen2 fallback compatibility string for pcie-rcar (Simon Horman) Synopsys DesignWare host bridge driver: - Simplify control flow (Bjorn Helgaas) - Make config accessor override checking symmetric (Bjorn Helgaas) - Ensure ATU is enabled before IO/conf space accesses (Stanimir Varbanov) Miscellaneous: - Add of_pci_get_host_bridge_resources() stub (Arnd Bergmann) - Check for PCI_HEADER_TYPE_BRIDGE equality, not bitmask (Bjorn Helgaas) - Fix all whitespace issues (Bogicevic Sasa) - x86/PCI: Simplify pci_bios_{read,write} (Geliang Tang) - Use to_pci_dev() instead of open-coding it (Geliang Tang) - Use kobj_to_dev() instead of open-coding it (Geliang Tang) - Use list_for_each_entry() to simplify code (Geliang Tang) - Fix typos in <linux/msi.h> (Thomas Petazzoni) - x86/PCI: Clarify AMD Fam10h config access restrictions comment (Tomasz Nowicki)" * tag 'pci-v4.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (58 commits) PCI: Add function 1 DMA alias quirk for Lite-On/Plextor M6e/Marvell 88SS9183 PCI: Limit config space size for Netronome NFP4000 PCI: Add Netronome NFP4000 PF device ID x86/PCI: Add driver for Intel Volume Management Device (VMD) PCI/AER: Use 32 bit PCI domain numbers x86/PCI: Allow DMA ops specific to a PCI domain irqdomain: Export irq_domain_set_info() for module use PCI: host: Add of_pci_get_host_bridge_resources() stub genirq/MSI: Relax msi_domain_alloc() to support parentless MSI irqdomains PCI: rcar: Add Gen2 PHY setup to pcie-rcar PCI: rcar: Add runtime PM support to pcie-rcar PCI: designware: Make config accessor override checking symmetric PCI: ibmphp: Remove unneeded NULL test ARM: dts: ifc6410: enable PCIe DT node for this board ARM: dts: apq8064: add PCIe devicetree node PCI: hotplug: Use list_for_each_entry() to simplify code PCI: rcar: Remove unused pci_sys_data struct from pcie-rcar PCI: hisi: Add support for HiSilicon Hip06 PCIe host controllers PCI: Avoid iterating through memory outside the resource window PCI: acpiphp_ibm: Fix null dereferences on null ibm_slot ...
2015-12-10PCI: Check for PCI_HEADER_TYPE_BRIDGE equality, not bitmaskBjorn Helgaas
Bit 7 of the "Header Type" register indicates a multi-function device when set. Bits 0-6 contain encoded values, where 0x1 indicates a PCI-PCI bridge. It is incorrect to test this as though it were a mask. For example, while the PCI 3.0 spec only defines values 0x0, 0x1, and 0x2, it's conceivable that a future spec could define 0x3 to mean something else; then tests for "(hdr_type & 0x7f) & PCI_HEADER_TYPE_BRIDGE" would incorrectly succeed for this new 0x3 header type. Test bits 0-6 of the Header Type for equality with PCI_HEADER_TYPE_BRIDGE. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2015-12-09Revert "powerpc/eeh: Don't unfreeze PHB PE after reset"Andrew Donnellan
This reverts commit 527d10ef3a315d3cb9dc098dacd61889a6c26439. The reverted commit breaks cxlflash devices following an EEH reset (and possibly other cxl devices, however this has not been tested). The reverted commit changed the behaviour of eeh_reset_device() so that PHB PEs are not unfrozen following the completion of the reset. This should not be problematic, as no device resources should have been associated with the PHB PE. However, when attempting to load the cxlflash driver after a reset, the driver attempts to read Vital Product Data through a call to pci_read_vpd() (which is called on the physical cxl device, not on the virtual AFU device). pci_read_vpd() in turn attempts to read from the cxl device's config space. This fails, as the PE it's trying to read from is still frozen. In turn, the driver gets an -ENODEV and fails to initialise. It appears this issue only affects some parts of the VPD area, as "lspci -vvv", which only reads a subset of the VPD bytes, is not broken by the original patch. At this stage, we don't fully understand why we're trying to read a frozen PE, and we don't know how this affects other cxl devices. It is possible that there is an underlying bug in the cxl driver or the powerpc CAPI support code, or alternatively a bug in the PCI resource allocation/mapping code that is incorrectly mapping resources to PE#0. As such, this fix is incomplete, however it is necessary to prevent a serious regression in CAPI support. In the meantime, revert the commit, especially as it was intended to be a non-functional change. Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Daniel Axtens <dja@axtens.net> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21powerpc/eeh: Force reset on fenced PHBGavin Shan
On fenced PHB, the error handlers in the drivers of its subordinate devices could return PCI_ERS_RESULT_CAN_RECOVER, indicating no reset will be issued during the recovery. It's conflicting with the fact that fenced PHB won't be recovered without reset. This limits the return value from the error handlers in the drivers of the fenced PHB's subordinate devices to PCI_ERS_RESULT_NEED_NONE or PCI_ERS_RESULT_NEED_RESET, to ensure reset will be issued during recovery. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21powerpc/eeh: More relaxed hotplug criterionGavin Shan
Currently, we rely on the existence of struct pci_driver::err_handler to decide if the corresponding PCI device should be unplugged during EEH recovery (partially hotplug case). However that check is not sufficient. Some device drivers implement only some of the EEH error handlers to collect diag-data. That means the driver still expects a hotplug to recover from the EEH error. This makes the hotplug criterion more relaxed: if the device driver doesn't provide all necessary EEH error handlers, it will experience hotplug during EEH recovery. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> [mpe: Minor change log rewording] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21powerpc/eeh: Don't unfreeze PHB PE after resetGavin Shan
On PowerNV platform, the PE is kept in frozen state until the PE reset is completed to avoid recursive EEH error caused by MMIO access during the period of EEH reset. The PE's frozen state is cleared after BARs of PCI device included in the PE are restored and enabled. However, we needn't clear the frozen state for PHB PE explicitly at this point as there is no real PE for PHB PE. As the PHB PE is always binding with PE#0, we actually clear PE#0, which is wrong. It doesn't incur any problem though. This checks if the PE is PHB PE and doesn't clear the frozen state if it is. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-13powerpc/eeh: fix comment for wait_state()Wei Yang
To retrieve the PCI slot state, EEH driver would set a timeout for that. While current comment is not aligned to what the code does. This patch fixes those comments according to the code. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-24powerpc/eeh: Remove device_node dependencyGavin Shan
The patch removes struct eeh_dev::dn and the corresponding helper functions: eeh_dev_to_of_node() and of_node_to_eeh_dev(). Instead, eeh_dev_to_pdn() and pdn_to_eeh_dev() should be used to get the pdn, which might contain device_node on PowerNV platform. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-01-23powerpc/eeh: Allow to set maximal frozen timesGavin Shan
When PE's frozen count hits maximal allowed frozen times, which is 5 currently, it will be forced to be offline permanently. Once the PE is removed permanently, rebooting machine is required to bring the PE back. It's not convienent when testing EEH functionality. The patch exports the maximal allowed frozen times through debugfs entry (/sys/kernel/debug/powerpc/eeh_max_freezes). Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23powerpc/eeh: Introduce flag EEH_PE_REMOVEDGavin Shan
The conditions that one specific PE's frozen count exceeds the maximal allowed times (EEH_MAX_ALLOWED_FREEZES) and it's in isolated or recovery state indicate the PE was removed permanently implicitly. The patch introduces flag EEH_PE_REMOVED to indicate that explicitly so that we don't depend on the fixed maximal allowed times, which can be varied as we do in subsequent patch. Flag EEH_PE_REMOVED is expected to be marked for the PE whose frozen count exceeds the maximal allowed times, or just failed from recovery. Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-02powerpc/eeh: Set EEH_PE_RESET on PE resetGavin Shan
The patch introduces additional flag EEH_PE_RESET to indicate the corresponding PE is under reset. In turn, the PE retrieval bakcend on PowerNV platform can return unfrozen state for the EEH core to moving forward. Flag EEH_PE_CFG_BLOCKED isn't the correct one for the purpose. In PCI passthrou case, the problem is more worse: Guest doesn't recover 6th EEH error. The PE is left in isolated (frozen) and config blocked state on Broadcom adapters. We can't retrieve the PE's state correctly any more, even from the host side via sysfs /sys/bus/pci/devices/xxx/eeh_pe_state. Reported-by: Rajeshkumar Subramanian <rajeshkumars@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-10-15powerpc/eeh: Rename flag EEH_PE_RESET to EEH_PE_CFG_BLOCKEDGavin Shan
The flag EEH_PE_RESET indicates blocking config space of the PE during reset time. We potentially need block PE's config space other than reset time. So it's reasonable to replace it with EEH_PE_CFG_BLOCKED to indicate its usage. There are no substantial code or logic changes in this patch. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Emulate EEH recovery for VFIO devicesGavin Shan
When enabling EEH functionality on passed through devices (PE) with VFIO, the devices in the PE would be removed permanently from guest side. In that case, the PE remains frozen state. When returning PE to host, or restarting the guest again, we had mechanism unfreezing the PE by clearing PESTA/B frozen bits. However, that's not enough for some adapters, which are indicated as following "lspci" shows. Those adapters require hot reset on the parent bus to bring their firmware back to workable state. Otherwise, those adaptrs won't be operative and the host (for returning case) or the guest will fail to load the drivers for those adapters without exception. 0000:01:00.0 Ethernet controller: Emulex Corporation OneConnect \ 10Gb NIC (be3) (rev 02) 0000:01:00.0 0200: 19a2:0710 (rev 02) 0001:03:00.0 Ethernet controller: Emulex Corporation OneConnect \ NIC (Lancer) (rev 10) 0001:03:00.0 0200: 10df:e220 (rev 10) The patch adds mechanism to emulate EEH recovery (for hot reset on parent PCI bus) on 3 gates to fix the issue: open/release one adapter of the PE, enable EEH functionality on one adapter of the PE. Reported-by: Murilo Fossa Vicentini <muvic@br.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Use eeh_unfreeze_pe()Gavin Shan
The patch uses eeh_unfreeze_pe() to replace the logic clearing frozen IO and DMA, in order to simplify the code. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-08-05powerpc/eeh: Replace pr_warning() with pr_warn()Gavin Shan
pr_warn() is equal to pr_warning(), but the former is a bit more formal according to commit fc62f2f ("kernel.h: add pr_warn for symmetry to dev_warn, netdev_warn"). The patch replaces pr_warning() with pr_warn(). Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11powerpc/powernv: Fix killed EEH eventGavin Shan
On PowerNV platform, EEH errors are reported by IO accessors or poller driven by interrupt. After the PE is isolated, we won't produce EEH event for the PE. The current implementation has possibility of EEH event lost in this way: The interrupt handler queues one "special" event, which drives the poller. EEH thread doesn't pick the special event yet. IO accessors kicks in, the frozen PE is marked as "isolated" and EEH event is queued to the list. EEH thread runs because of special event and purge all existing EEH events. However, we never produce an other EEH event for the frozen PE. Eventually, the PE is marked as "isolated" and we don't have EEH event to recover it. The patch fixes the issue to keep EEH events for PEs that have been marked as "isolated" with the help of additional "force" help to eeh_remove_event(). Reported-by: Rolf Brudeseth <rolfb@us.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11powerpc/eeh: Clear frozen state for child PEGavin Shan
Since commit cb523e09 ("powerpc/eeh: Avoid I/O access during PE reset"), the PE is kept as frozen state on hardware level until the PE reset is done completely. After that, we explicitly clear the frozen state of the affected PE. However, there might have frozen child PEs of the affected PE and we also need clear their frozen state as well. Otherwise, the recovery is going to fail. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28powerpc/eeh: Can't recover from non-PE-reset caseGavin Shan
When PCI_ERS_RESULT_CAN_RECOVER returned from device drivers, the EEH core should enable I/O and DMA for the affected PE. However, it was missed to have DMA enabled in eeh_handle_normal_event(). Besides, the frozen state of the affected PE should be cleared after successful recovery, but we didn't. The patch fixes both of the issues as above. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28powerpc/eeh: No hotplug on permanently removed devGavin Shan
The issue was detected in a bit complicated test case where we have multiple hierarchical PEs shown as following figure: +-----------------+ | PE#3 p2p#0 | | p2p#1 | +-----------------+ | +-----------------+ | PE#4 pdev#0 | | pdev#1 | +-----------------+ PE#4 (have 2 PCI devices) is the child of PE#3, which has 2 p2p bridges. We accidentally had less-known scenario: PE#4 was removed permanently from the system because of permanent failure (e.g. exceeding the max allowd failure times in last hour), then we detects EEH errors on PE#3 and tried to recover it. However, eeh_dev instances for pdev#0/1 were not detached from PE#4, which was still connected to PE#3. All of that was because of the fact that we rely on count-based pcibios_release_device(), which isn't reliable enough. When doing recovery for PE#3, we still apply hotplug on PE#4 and pdev#0/1, which are not valid any more. Eventually, we run into kernel crash. The patch fixes above issue from two aspects. For unplug, we simply skip those permanently removed PE, whose state is (EEH_PE_STATE_ISOLATED && !EEH_PE_STATE_RECOVERING) and its frozen count should be greater than EEH_MAX_ALLOWED_FREEZES. For plug, we marked all permanently removed EEH devices with EEH_DEV_REMOVED and return 0xFF's on read its PCI config so that PCI core will omit them. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28powerpc/eeh: Avoid I/O access during PE resetGavin Shan
We have suffered recrusive frozen PE a lot, which was caused by IO accesses during the PE reset. Ben came up with the good idea to keep frozen PE until recovery (BAR restore) gets done. With that, IO accesses during PE reset are dropped by hardware and wouldn't incur the recrusive frozen PE any more. The patch implements the idea. We don't clear the frozen state until PE reset is done completely. During the period, the EEH core expects unfrozen state from backend to keep going. So we have to reuse EEH_PE_RESET flag, which has been set during PE reset, to return normal state from backend. The side effect is we have to clear frozen state for towice (PE reset and clear it explicitly), but that's harmless. We have some limitations on pHyp. pHyp doesn't allow to enable IO or DMA for unfrozen PE. So we don't enable them on unfrozen PE in eeh_pci_enable(). We have to enable IO before grabbing logs on pHyp. Otherwise, 0xFF's is always returned from PCI config space. Also, we had wrong return value from eeh_pci_enable() for EEH_OPT_THAW_DMA case. The patch fixes it too. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28powerpc/eeh: Block PCI-CFG access during PE resetGavin Shan
We've observed multiple PE reset failures because of PCI-CFG access during that period. Potentially, some device drivers can't support EEH very well and they can't put the device to motionless state before PE reset. So those device drivers might produce PCI-CFG accesses during PE reset. Also, we could have PCI-CFG access from user space (e.g. "lspci"). Since access to frozen PE should return 0xFF's, we can block PCI-CFG access during the period of PE reset so that we won't get recrusive EEH errors. The patch adds flag EEH_PE_RESET, which is kept during PE reset. The PowerNV/pSeries PCI-CFG accessors reuse the flag to block PCI-CFG accordingly. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28powerpc/eeh: Remove EEH_PE_PHB_DEADGavin Shan
The PE state (for eeh_pe instance) EEH_PE_PHB_DEAD is duplicate to EEH_PE_ISOLATED. Originally, those PHBs (PHB PE) with EEH_PE_PHB_DEAD would be removed from the system. However, it's safe to replace that with EEH_PE_ISOLATED. The patch also clear EEH_PE_RECOVERING after fenced PHB has been handled, either failure or success. It makes the PHB PE state consistent with: PHB functions normally NONE PHB has been removed EEH_PE_ISOLATED PHB fenced, recovery in progress EEH_PE_ISOLATED | RECOVERING Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-05powerpc: eeh: Fixup the brown paperbag fallout of the "cleanup"Thomas Gleixner
Commit b8a9a11b9 (powerpc: eeh: Kill another abuse of irq_desc) is missing some brackets ..... It's not a good idea to write patches in grumpy mode and then forget to at least compile test them or rely on the few eyeballs discussing that patch to spot it..... Reported-by: fengguang.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: ppc <linuxppc-dev@lists.ozlabs.org>
2014-03-04powerpc: Eeh: Kill another abuse of irq_descThomas Gleixner
commit 91150af3a (powerpc/eeh: Fix unbalanced enable for IRQ) is another brilliant example of trainwreck engineering. The patch "fixes" the issue of an unbalanced call to irq_enable() which causes a prominent warning by checking the disabled state of the interrupt line and call conditionally into the core code. This is wrong in two aspects: 1) The warning is there to tell users, that they need to fix their asymetric enable/disable patterns by finding the root cause and solving it there. It's definitely not meant to work around it by conditionally calling into the core code depending on the random state of the irq line. Asymetric irq_disable/enable calls are a clear sign of wrong usage of the interfaces which have to be cured at the root and not by somehow hacking around it. 2) The abuse of core internal data structure instead of using the proper interfaces for retrieving the information for the 'hack around' irq_desc is core internal and it's clear enough stated. Replace at least the irq_desc abuse with the proper functions and add a big fat comment why this is absurd and completely wrong. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: ppc <linuxppc-dev@lists.ozlabs.org> Link: http://lkml.kernel.org/r/20140223212736.562906212@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-11powerpc/eeh: Drop taken reference to driver on eeh_rmv_deviceThadeu Lima de Souza Cascardo
Commit f5c57710dd62dd06f176934a8b4b8accbf00f9f8 ("powerpc/eeh: Use partial hotplug for EEH unaware drivers") introduces eeh_rmv_device, which may grab a reference to a driver, but not release it. That prevents a driver from being removed after it has gone through EEH recovery. This patch drops the reference if it was taken. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-27Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "So here's my next branch for powerpc. A bit late as I was on vacation last week. It's mostly the same stuff that was in next already, I just added two patches today which are the wiring up of lockref for powerpc, which for some reason fell through the cracks last time and is trivial. The highlights are, in addition to a bunch of bug fixes: - Reworked Machine Check handling on kernels running without a hypervisor (or acting as a hypervisor). Provides hooks to handle some errors in real mode such as TLB errors, handle SLB errors, etc... - Support for retrieving memory error information from the service processor on IBM servers running without a hypervisor and routing them to the memory poison infrastructure. - _PAGE_NUMA support on server processors - 32-bit BookE relocatable kernel support - FSL e6500 hardware tablewalk support - A bunch of new/revived board support - FSL e6500 deeper idle states and altivec powerdown support You'll notice a generic mm change here, it has been acked by the relevant authorities and is a pre-req for our _PAGE_NUMA support" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits) powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked() powerpc: Add support for the optimised lockref implementation powerpc/powernv: Call OPAL sync before kexec'ing powerpc/eeh: Escalate error on non-existing PE powerpc/eeh: Handle multiple EEH errors powerpc: Fix transactional FP/VMX/VSX unavailable handlers powerpc: Don't corrupt transactional state when using FP/VMX in kernel powerpc: Reclaim two unused thread_info flag bits powerpc: Fix races with irq_work Move precessing of MCE queued event out from syscall exit path. pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines powerpc: Make add_system_ram_resources() __init powerpc: add SATA_MV to ppc64_defconfig powerpc/powernv: Increase candidate fw image size powerpc: Add debug checks to catch invalid cpu-to-node mappings powerpc: Fix the setup of CPU-to-Node mappings during CPU online powerpc/iommu: Don't detach device without IOMMU group powerpc/eeh: Hotplug improvement powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space powerpc/eeh: Add restore_config operation ...
2014-01-15powerpc/eeh: Use global PCI rescan-remove lockingRafael J. Wysocki
Race conditions are theoretically possible between the PCI device addition and removal in the PPC64 PCI error recovery driver and the generic PCI bus rescan and device removal that can be triggered via sysfs. To avoid those race conditions make PPC64 PCI error recovery driver use global PCI rescan-remove locking around PCI device addition and removal. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-15powerpc/eeh: Handle multiple EEH errorsGavin Shan
For one PCI error relevant OPAL event, we possibly have multiple EEH errors for that. For example, multiple frozen PEs detected on different PHBs. Unfortunately, we didn't cover the case. The patch enumarates the return value from eeh_ops::next_error() and change eeh_handle_special_event() and eeh_ops::next_error() to handle all existing EEH errors. As Ben pointed out, we needn't list_for_each_entry_safe() since we are not deleting any PHB from the hose_list and the EEH serialized lock should be held while purging EEH events. The patch covers those suggestions as well. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/eeh: Hotplug improvementGavin Shan
When EEH error comes to one specific PCI device before its driver is loaded, we will apply hotplug to recover the error. During the plug time, the PCI device will be probed and its driver is loaded. Then we wrongly calls to the error handlers if the driver supports EEH explicitly. The patch intends to fix by introducing flag EEH_DEV_NO_HANDLER and set it before we remove the PCI device. In turn, we can avoid wrongly calls the error handlers of the PCI device after its driver loaded. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-05powerpc: Increase EEH recovery timeout for SR-IOVBrian King
In order to support concurrent adapter firmware download to SR-IOV adapters on pSeries, each VF will see an EEH event where the slot will remain in the unavailable state for the duration of the adapter firmware update, which can take as long as 5 minutes. Extend the EEH recovery timeout to account for this. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24powerpc/eeh: Fix unbalanced enable for IRQGavin Shan
The patch fixes following issue: Unbalanced enable for IRQ 23 ------------[ cut here ]------------ WARNING: at kernel/irq/manage.c:437 : NIP [c00000000016de8c] .__enable_irq+0x11c/0x140 LR [c00000000016de88] .__enable_irq+0x118/0x140 Call Trace: [c000003ea1f23880] [c00000000016de88] .__enable_irq+0x118/0x140 (unreliable) [c000003ea1f23910] [c00000000016df08] .enable_irq+0x58/0xa0 [c000003ea1f239a0] [c0000000000388b4] .eeh_enable_irq+0xc4/0xe0 [c000003ea1f23a30] [c000000000038a28] .eeh_report_reset+0x78/0x130 [c000003ea1f23ac0] [c000000000037508] .eeh_pe_dev_traverse+0x98/0x170 [c000003ea1f23b60] [c0000000000391ac] .eeh_handle_normal_event+0x2fc/0x3d0 [c000003ea1f23bf0] [c000000000039538] .eeh_handle_event+0x2b8/0x2c0 [c000003ea1f23c90] [c000000000039600] .eeh_event_handler+0xc0/0x170 [c000003ea1f23d30] [c0000000000da9a0] .kthread+0xf0/0x100 [c000003ea1f23e30] [c00000000000a1dc] .ret_from_kernel_thread+0x5c/0x80 Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24powerpc/eeh: Use partial hotplug for EEH unaware driversGavin Shan
When EEH error happens to one specific PE, some devices with drivers supporting EEH won't except hotplug on the device. However, there might have other deivces without driver, or with driver without EEH support. For the case, we need do partial hotplug in order to make sure that the PE becomes absolutely quite during reset. Otherise, the PE reset might fail and leads to failure of error recovery. The current code doesn't handle that 'mixed' case properly, it either uses the error callbacks to the drivers, or tries hotplug, but doesn't handle a PE (EEH domain) composed of a combination of the two. The patch intends to support so-called "partial" hotplug for EEH: Before we do reset, we stop and remove those PCI devices without EEH sensitive driver. The corresponding EEH devices are not detached from its PE, but with special flag. After the reset is done, those EEH devices with the special flag will be scanned one by one. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24powerpc/eeh: Keep PE during hotplugGavin Shan
When we do normal hotplug, the PE (shadow EEH structure) shouldn't be kept around. However, we need to keep it if the hotplug an artifial one caused by EEH errors recovery. Since we remove EEH device through the PCI hook pcibios_release_device(), the flag "purge_pe" passed to various functions is meaningless. So the patch removes the meaningless flag and introduce new flag "EEH_PE_KEEP" to save the PE while doing hotplug during EEH error recovery. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01powerpc/eeh: Refactor the output messageGavin Shan
We needn't the the whole backtrace other than one-line message in the error reporting interrupt handler. For errors triggered by access PCI config space or MMIO, we replace "WARN(1, ...)" with pr_err() and dump_stack(). The patch also adds more output messages to indicate what EEH core is doing. Besides, some printk() are replaced with pr_warning(). Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc/eeh: EEH core to handle special eventGavin Shan
On PowerNV platform, the EEH event caused by interrupt won't have binding PE. The patch enables EEH core to handle the special event. To avoid the current logic we have, The eeh_handle_event() is renamed to eeh_handle_normal_event(), and the eeh_handle_special_event() is introduced. The function eeh_handle_event() dispatches to above two functions according to the input parameter. Besides, new backend "next_error" added to eeh_ops and it's expected to have following return values: 4 - Dead IOC 3 - Dead PHB 2 - Fenced PHB 1 - Frozen PE 0 - No error found Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc/eeh: Trace time on first error for PEGavin Shan
We're not expecting that one specific PE got frozen for over 5 times in last hour. Otherwise, the PE will be removed from the system upon newly coming EEH errors. The patch introduces time stamp to trace the first error on specific PE in last hour and function to update that accordingly. Besides, the time stamp is recovered during PE hotplug path as we did for frozen count. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>