powerpc/eeh: Only dump stack once if an MMIO loop is detected

Many drivers don't check for errors when they get a 0xFFs response from an MMIO load. As a result, after an EEH event occurs a driver can get stuck in a polling loop unless it has some kind of internal timeout logic.

Currently EEH tries to detect and report stuck drivers by dumping a stack trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump would occur every few seconds if the driver was spinning in a loop. This results in a lot of spurious stack traces in the kernel log.

Fix this by limiting it to printing one stack trace for each PE freeze. If the driver is truly stuck, the kernel's hung task detector is better suited to reporting the problem anyway.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
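
To illustrate the behavioural change, here is a minimal user-space sketch (not kernel code) that counts how many stack dumps each condition would trigger for a given number of failed checks on a frozen PE. DEMO_MAX_FAILS and both function names are hypothetical and chosen for illustration only; the real EEH_MAX_FAILS is far larger, and the sketch assumes the counter starts from zero for each freeze.

```c
/* Hypothetical small threshold; stands in for the kernel's EEH_MAX_FAILS. */
#define DEMO_MAX_FAILS 5

/* Old condition: a dump fires every DEMO_MAX_FAILS checks. */
static int dumps_with_modulo(int checks)
{
	int check_count = 0;
	int dumps = 0;

	for (int i = 0; i < checks; i++) {
		check_count++;
		if (check_count % DEMO_MAX_FAILS == 0)
			dumps++;
	}
	return dumps;
}

/* New condition: a dump fires only when the count first reaches the threshold. */
static int dumps_with_equality(int checks)
{
	int check_count = 0;
	int dumps = 0;

	for (int i = 0; i < checks; i++) {
		check_count++;
		if (check_count == DEMO_MAX_FAILS)
			dumps++;
	}
	return dumps;
}
```

With 23 failed checks, the modulo form reports at counts 5, 10, 15 and 20, while the equality form reports only at count 5. A driver that stops polling before the threshold produces no dump in either case.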
author: Oliver O'Halloran <oohall@gmail.com> 2019-10-16 12:25:36 +1100
committer: Michael Ellerman <mpe@ellerman.id.au> 2020-01-23 21:31:20 +1100
commit: 4e0942c0302b5ad76b228b1a7b8c09f658a1d58a (patch)
tree: 55653c2bc697c1108034e1ac6ade3560124073ce
parent: 18697d2b08622b35278fc4312e86e6b5d3cd758d (diff)
1 file changed, 1 insertion, 1 deletion
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index daf9ff34a255..17cb3e9b5697 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -503,7 +503,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	rc = 1;
 	if (pe->state & EEH_PE_ISOLATED) {
 		pe->check_count++;
-		if (pe->check_count % EEH_MAX_FAILS == 0) {
+		if (pe->check_count == EEH_MAX_FAILS) {
 			dn = pci_device_to_OF_node(dev);
 			if (dn)
 				location = of_get_property(dn, "ibm,loc-code",