summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)Author
2016-09-13EDAC/mce_amd: Print syndrome register value on SMCA systemsYazen Ghannam
Print SyndV bit status and print the raw value of the MCA_SYND register. Further decoding of the syndrome from struct mce.synd can be done in other places where appropriate, e.g. DRAM ECC. Boris: make the error stanza more compact by putting the error address and syndrome on the same line: [Hardware Error]: Corrected error, no action required. [Hardware Error]: CPU:2 (17:0:0) MC4_STATUS[-|CE|-|PCC|AddrV|-|-|SyndV|CECC]: 0x96204100001e0117 [Hardware Error]: Error Addr: 0x000000007f4c52e3, Syndrome: 0x0000000000000000 [Hardware Error]: Invalid IP block specified. [Hardware Error]: cache level: L3/GEN, tx: DATA, mem-tx: RD Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1467633035-32080-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-12EDAC, sb_edac: Remove NULL pointer check on array pci_tadColin Ian King
pvt->pci_tad is a NUM_CHANNELS array of struct pci_dev pointers and hence cannot be NULL, so the NULL pointer check on pci_tad is redundant. Remove it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20160908083801.14766-1-colin.king@canonical.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-12EDAC: Remove NO_IRQ from powerpc-only driversMichael Ellerman
We'd like to eventually remove NO_IRQ on powerpc, so remove usages of it from powerpc-only drivers. The pdata structs are kzalloc'ed, so we don't need to initialise those to 0, we can just drop the assignments entirely. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Johannes Thumshirn <morbidrsa@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linuxppc-dev@ozlabs.org Link: http://lkml.kernel.org/r/1473674436-19467-1-git-send-email-mpe@ellerman.id.au Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-09EDAC, fsl_ddr: Fix error return code in fsl_mc_err_probe()Wei Yongjun
Return negative error code from the edac_mc_add_mc() error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: York Sun <york.sun@nxp.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1473350284-26482-1-git-send-email-weiyj.lk@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01EDAC, fsl_ddr: Replace simple_strtoul() with kstrtoul()York Sun
Replace obsolete simple_strtoul() with kstrtoul(). Signed-off-by: York Sun <york.sun@nxp.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1471990593-27536-1-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01EDAC, layerscape: Add Layerscape EDAC supportYork Sun
Add DDR EDAC driver for ARM-based compatible controllers. Both big-endian and little-endian are supported, as specified in device tree. Signed-off-by: York Sun <york.sun@nxp.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1471990465-27443-1-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01EDAC, fsl_ddr: Fix IRQ dispose warning when module is removedYork Sun
When compiled as a module, removing it causes kernel warnings when irq_dispose_mapping() is called. Instead of calling irq_of_parse_and_map(), use platform_get_irq() to acquire the IRQ number. Signed-off-by: York Sun <york.sun@nxp.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: morbidrsa@gmail.com Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-8-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01EDAC, fsl_ddr: Add support for little endianYork Sun
Get endianness from device tree. Both big endian and little endian are supported. Default to big endian for backwards compatibility to MPC85xx. Signed-off-by: York Sun <york.sun@nxp.com> Acked-by: Rob Herring <robh+dt@kernel.org> Cc: devicetree@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: morbidrsa@gmail.com Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-7-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01EDAC, fsl_ddr: Add missing DDR DRAM typesYork Sun
The compatible DDR controllers may support DDR, DDR2, DDR3, DDR4 DRAM. An individual controller doesn't support all of them. The EDAC driver reads SDRAM_CFG to determine which mode is configured. Add DDR4 and drop the defines used only in the mtype assignment. Signed-off-by: York Sun <york.sun@nxp.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: morbidrsa@gmail.com Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-6-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01EDAC, fsl_ddr: Rename macros and namesYork Sun
Use FSL-specific prefix for macros, variables and functions. Signed-off-by: York Sun <york.sun@nxp.com> Cc: Johannes Thumshirn <morbidrsa@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-5-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xxYork Sun
The mpc85xx-compatible DDR controllers are used on ARM-based SoCs too. Carve out the DDR part from the mpc85xx EDAC driver in preparation to support both architectures. Signed-off-by: York Sun <york.sun@nxp.com> Cc: Johannes Thumshirn <morbidrsa@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470946525-3410-1-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01EDAC, mpc85xx: Replace printk() with pr_* formatYork Sun
Replace printk() with pr_err/pr_warn/pr_info macros. Signed-off-by: York Sun <york.sun@nxp.com> Cc: Johannes Thumshirn <morbidrsa@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-3-git-send-email-york.sun@nxp.com [ Boris: unbreak strings for easier greppability. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01EDAC, mpc85xx: Drop setting/clearing RFXE bit in HID1York Sun
On e500v1, read fault exception enable (RFXE) controls whether assertion of core_fault_in causes a machine check interrupt. Assertion of core_fault_in can result from uncorrectable data error, such as an L2 multi-bit ECC error. It can also occur from a system error if logic on the integrated device signals a fault for nonfatal errors. RFXE bit is cleared out of reset, and should be left clear for normal operation. Assertion of core_fault_in does not cause a machine check. RFXE is set specifically for RIO (Rapid IO) and PCI for book E to catch the errors by machine check. With this bit set, the EDAC driver can't get the interrupt in case of uncorrectable error. So this bit is cleared in favor of EDAC. However, the benefit of catching such uncorrectable error doesn't outweigh the other errors which may hang the system. Besides, e500v2 has different errors masked by RFXE, and e500mc doesn't support this bit. It is more reasonable to leave RFXE as is in the EDAC driver, and leave the uncorrectable errors triggering machine check for e500v1. Suggested-by: Scott Wood <oss@buserror.net> Signed-off-by: York Sun <york.sun@nxp.com> Cc: Johannes Thumshirn <morbidrsa@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-2-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01EDAC, altera: Rename MC trigger to common nameThor Thayer
Rename the Memory Controller debug trigger to the same common name as the EDAC devices. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1471622666-15197-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01EDAC, altera: Rename device trigger to common nameThor Thayer
The L2 and OCRAM devices have different ecc trigger names than the other EDAC devices (FIFO peripherals). Make them all the same and remove the character array from the device structure. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1471622666-15197-2-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-21EDAC, skx_edac: Add EDAC driver for SkylakeTony Luck
This is an entirely new driver instead of yet another set of patches to sb_edac.c because: 1) Mapping from PCI devices to socket/memory controller is significantly different. Skylake scatters devices on a socket across a number of PCI buses. 2) There is an extra level of interleaving via the "mcroute" register that would be a little messy to squeeze into the old driver. 3) Validation is getting too expensive. Changes to sb_edac need to be checked against Sandy Bridge, Ivy Bridge, Haswell, Broadwell and Knights Landing. Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-18EDAC, mpc85xx: Fix PCIe error captureTillmann Heidsieck
According to the reference manual of MPC8572 and T4240, bit 31 of PEX_ERR_CAP_STAT is W1C (write 1 to clear). Add the corresponding write to PEX_ERR_CAP_STAT in order to fix the PCIe error capture. Tested on a T4240 processor. Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20160815190849.29327-1-theidsieck@leenox.de Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-15EDAC, wq: Remove deprecated create_singlethread_workqueue()Bhaktipriya Shridhar
Replace the deprecated create_singlethread_workqueue() with alloc_ordered_workqueue() with WQ_MEM_RECLAIM. This is the identity conversion. It's not recommended to stall it from memory pressure. Hence, WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20160813164124.GA9077@Karyakshetra Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-10EDAC, altera: Make a10_eccmgr_ic_ops staticWei Yongjun
Fix the following sparse warning: drivers/edac/altera_edac.c:1649:23: warning: symbol 'a10_eccmgr_ic_ops' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Reviewed-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lkml <linux-kernel@vger.kernel.org> Link: http://lkml.kernel.org/r/1470836667-11822-1-git-send-email-weiyj.lk@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-10EDAC, altera: Add Arria10 SD-MMC EDAC supportThor Thayer
Add Altera Arria10 SD-MMC FIFO memory EDAC support. The SD-MMC is a dual port RAM implementation which is different than any of the other peripherals and therefore requires additional code. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1470753653-23465-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08EDAC, amd64: Fix channel decode on Fam15hMod60h systemsYazen Ghannam
Fam15hMod60h systems are using the channel decode of Fam15hMod30h which gives incorrect results. Fam15hMod60h systems should use the generic channel decode method plus a couple more cases. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1470236355-30039-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08EDAC, altera: Add Arria10 QSPI supportThor Thayer
Add Altera Arria10 QSPI FIFO memory support. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1468512408-5156-9-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08EDAC, altera: Add Arria10 USB supportThor Thayer
Add Altera Arria10 USB FIFO memory support. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1468512408-5156-8-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08EDAC, altera: Add Arria10 DMA supportThor Thayer
Add Altera Arria10 DMA FIFO memory support. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1468512408-5156-7-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08EDAC, altera: Add Arria10 NAND supportThor Thayer
Add Altera Arria10 NAND FIFO memory support. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1468512408-5156-6-git-send-email-tthayer@opensource.altera.com [ Reformat loop in altr_edac_a10_probe() for better readability. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08EDAC, sb_edac: Fix channel reporting on Knights LandingLukasz Odzioba
On Intel Xeon Phi Knights Landing processor family the channels of the memory controller have untypical arrangement - MC0 is mapped to CH3,4,5 and MC1 is mapped to CH0,1,2. This causes the EDAC driver to report the channel name incorrectly. We missed this change earlier, so the code already contains similar comment, but the translation function is incorrect. Without this patch: errors in DIMM_A and DIMM_D were reported in DIMM_D errors in DIMM_B and DIMM_E were reported in DIMM_E errors in DIMM_C and DIMM_F were reported in DIMM_F Correct this. Hubert Chrzaniuk: - rebased to 4.8 - comments and code cleanup Fixes: d0cdf9003140 ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support") Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Cc: lukasz.odzioba@intel.com Cc: mchehab@kernel.org Cc: <stable@vger.kernel.org> # v4.5.. Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@intel.com Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com> [ Boris: Simplify a bit by removing char mc. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-07-27Merge tag 'edac_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds
Pull EDAC updates from Borislav Petkov: "This last cycle, Thor was busy adding Arria10 eth FIFO support to the altera_edac driver along with other improvements. We have two cleanups/fixes too. Summary: - Altera Arria10 ethernet FIFO buffer support (Thor Thayer) - Minor cleanups" * tag 'edac_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: ARM: dts: Add Arria10 Ethernet EDAC devicetree entry EDAC, altera: Add Arria10 Ethernet EDAC support EDAC, altera: Add Arria10 ECC memory init functions Documentation: dt: socfpga: Add Arria10 Ethernet binding EDAC, altera: Drop some ifdeffery EDAC, altera: Add panic flag check to A10 IRQ EDAC, altera: Check parent status for Arria10 EDAC block EDAC, altera: Make all private data structures static EDAC: Correct channel count limit EDAC, amd64_edac: Init opstate at the proper time during init EDAC, altera: Handle Arria10 SDRAM child node EDAC, altera: Add ECC Manager IRQ controller support Documentation: dt: socfpga: Add interrupt-controller to ecc-manager
2016-07-16EDAC, sb_edac: Fix Knights LandingTony Luck
In commit 2c1ea4c700af ("EDAC, sb_edac: Use cpu family/model in driver detection") I broke Knights Landing because I failed to notice that it called a wrapper macro "sbridge_get_all_devices_knl" instead of "sbridge_get_all_devices" like all the other types. Now that we include the processor type in the pci_id_table structure we can skip the wrappers and just have the sbridge_get_all_devices() check the type to decide whether to allow duplicate devices and controllers to have registers spread across buses. Fixes: 2c1ea4c700af ("EDAC, sb_edac: Use cpu family/model in driver detection") Tested-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-25EDAC, altera: Add Arria10 Ethernet EDAC supportThor Thayer
Add Altera Arria10 Ethernet FIFO memory EDAC support. Update to support a common compatibility string for all Ethernet FIFOs in the DT. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-8-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-24EDAC, altera: Add Arria10 ECC memory init functionsThor Thayer
In preparation for additional memory module ECCs, add the memory initialization functions and helpers. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-7-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-24EDAC, altera: Drop some ifdefferyThor Thayer
Make the IRQ and check_deps() functions available to all the memory buffers by moving them outside of the OCRAM only area. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-5-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-24EDAC, altera: Add panic flag check to A10 IRQThor Thayer
In preparation for additional memory module ECCs, the IRQ function will check a panic flag before doing a kernel panic on double bit errors. OCRAM uncorrectable errors cause a panic because sleep/resume functions and FPGA contents during sleep are stored in OCRAM. ECCs on peripheral FIFO buffers will not cause a kernel panic on DBERRs because the packet can be retried and therefore recovered. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-24EDAC, altera: Check parent status for Arria10 EDAC blockThor Thayer
In preparation for the Arria10 ECC modules, check the status of the parent in the device tree to ensure the block is enabled. Skip if no parent phandle is set in the device tree. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-2-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-24EDAC, altera: Make all private data structures staticThor Thayer
The device private data structures are used only here so make them static. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-4-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-16EDAC: Correct channel count limitBorislav Petkov
c44696fff04f ("EDAC: Remove arbitrary limit on number of channels") lifted the arbitrary limit on memory controller channels in EDAC. However, the dynamic channel attributes dynamic_csrow_dimm_attr and dynamic_csrow_ce_count_attr remained 6. This wasn't a problem except channels 6 and 7 weren't visible in sysfs on machines with more than 6 channels after the conversion to static attr groups with 2c1946b6d629 ("EDAC: Use static attribute groups for managing sysfs entries") [ without that, we're exploding in edac_create_sysfs_mci_device() because we're dereferencing out of the bounds of the dynamic_csrow_dimm_attr array. ] Add attributes for channels 6 and 7 along with a guard for the future, should more channels be required and/or to sanity check for misconfigured machines. We still need to check against the number of channels present on the MC first, as Thor reported. Signed-off-by: Borislav Petkov <bp@suse.de> Reported-by: Hironobu Ishii <ishii.hironobu@jp.fujitsu.com> Tested-by: Thor Thayer <tthayer@opensource.altera.com> Cc: <stable@vger.kernel.org> # 4.2
2016-06-16EDAC, amd64_edac: Init opstate at the proper time during initBorislav Petkov
It is useless to do it if we're loaded on unsupported hardware so do that only after we have detected at least 1 supported AMD northbridge. Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-08EDAC, altera: Handle Arria10 SDRAM child nodeThor Thayer
Separate the device match arrays for each platform to prevent CycloneV matches when calling of_platform_populate() on the Arria10 ECC manager node. If the SDRAM is a child node of ECC manager, call probe function via of_platform_populate(). Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1464193783-5071-4-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-08EDAC, altera: Add ECC Manager IRQ controller supportThor Thayer
To better support child devices, the ECC manager needs to be implemented as an IRQ controller. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1465331757-10227-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-03EDAC, sb_edac: Readd accidentally dropped Broadwell-D supportTony Luck
In commit 2c1ea4c700af ("EDAC, sb_edac: Use cpu family/model in driver detection") we switched from using PCI ids to determine which platform we are running on to using CPU model instead. I forgot that Broadwell-DE has its own distinct model number different from Broadwell-EP or -EX. Fixing this isn't just adding a line to the array of cpuids - the exising code assumed a 1:1 mapping between entries in that array and the "enum type" values. Added the type to pci_id_table structure to remove this dependency and allows two Broadwell cpu models. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: 2c1ea4c700af ("EDAC, sb_edac: Use cpu family/model in driver detection") Link: http://lkml.kernel.org/r/b3cffe40dec6dfe0235a5d52a504f0ba86a07ce7.1464902605.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-03EDAC: Fix workqueues poll period resettingNicholas Krause
After the workqueue cleanup, we're registering workqueues based on the presence of an ->edac_check function. When that is the case, we're setting OP_RUNNING_POLL. But we forgot to check that in edac_mc_reset_delay_period(), leading to: BUG: unable to handle kernel paging request at 0000000000015d10 IP: [ .. ] queued_spin_lock_slowpath PGD 3ffcc8067 PUD 3ffc56067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: ... CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1 Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013 Stack: Call Trace: ? _raw_spin_lock_irqsave ? lock_timer_base.isra.34 ? del_timer ? try_to_grab_pending ? mod_delayed_work_on ? edac_mc_reset_delay_period ? edac_set_poll_msec ? param_attr_store ? module_attr_store ? kernfs_fop_write ? __vfs_write ? __vfs_read ? __alloc_fd ? vfs_write ? SyS_write ? entry_SYSCALL_64_fastpath Code: RIP [ .. ] queued_spin_lock_slowpath RSP <> CR2: 0000000000015d10 ---[ end trace 3f286bc71cca15d1 ]--- Kernel panic - not syncing: Fatal exception Fix it. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Cc: <stable@vger.kernel.org> # 4.5 Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com [ Rewrite commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-03EDAC, sb_edac: Fix rank lookup on BroadwellTony Luck
Broadwell made a small change to the rank target register moving the target rank ID field up from bits 16:19 to bits 20:23. Also found that the offset field grew by one bit in the IVY_BRIDGE to HASWELL transition, so fix the RIR_OFFSET() macro too. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org # v3.19+ Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/2943fb819b1f7e396681165db9c12bb3df0e0b16.1464735623.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-05-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits) gitignore: fix wording mfd: ab8500-debugfs: fix "between" in printk memstick: trivial fix of spelling mistake on management cpupowerutils: bench: fix "average" treewide: Fix typos in printk IB/mlx4: printk fix pinctrl: sirf/atlas7: fix printk spelling serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/ w1: comment spelling s/minmum/minimum/ Blackfin: comment spelling s/divsor/divisor/ metag: Fix misspellings in comments. ia64: Fix misspellings in comments. hexagon: Fix misspellings in comments. tools/perf: Fix misspellings in comments. cris: Fix misspellings in comments. c6x: Fix misspellings in comments. blackfin: Fix misspelling of 'register' in comment. avr32: Fix misspelling of 'definitions' in comment. treewide: Fix typos in printk Doc: treewide : Fix typos in DocBook/filesystem.xml ...
2016-05-16Merge tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds
Pull EDAC updates from Borislav Petkov: "It was pretty busy in EDAC land this time: - Altera Arria10 L2 cache and On-Chip RAM ECC handling (Thor Thayer) - Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac (Tony Luck) - Do not register sb_edac with pci_register_driver() (Tony Luck) - Add support for Skylake to ie31200_edac (Jason Baron) - Do not register amd64_edac with pci_register_driver() (Borislav Petkov) ... plus the usual round of cleanups and fixes all over the place" * tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (25 commits) EDAC, amd64_edac: Drop pci_register_driver() use EDAC, ie31200_edac: Add Skylake support EDAC, sb_edac: Use cpu family/model in driver detection EDAC, i7core: Remove double buffering of error records EDAC, amd64_edac: Issue driver banner only on success ARM: socfpga: Initialize Arria10 OCRAM ECC on startup EDAC: Increment correct counter in edac_inc_ue_error() EDAC, sb_edac: Remove double buffering of error records EDAC: Fix used after kfree() error in edac_unregister_sysfs() EDAC, altera: Avoid unused function warnings EDAC, altera: Remove useless casts ARM: socfpga: Enable Arria10 OCRAM ECC on startup EDAC, altera: Add Arria10 OCRAM ECC support Documentation: dt: socfpga: Add Altera Arria10 OCRAM binding EDAC, altera: Make OCRAM ECC dependency check generic EDAC, altera: Add register offset for ECC Enable EDAC, altera: Extract error inject operations to a struct fops ARM: socfpga: Enable Arria10 L2 cache ECC on startup EDAC, altera: Add Arria10 L2 Cache ECC handling Documentation, dt, socfpga: Add Altera Arria10 L2 cache binding ...
2016-05-12EDAC, mce_amd: Detect SMCA using X86_FEATURE_SMCAYazen Ghannam
Use X86_FEATURE_SMCA when detecting if SMCA is available instead of directly using CPUID 0x80000007_EBX. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1462971509-3856-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-09EDAC, amd64_edac: Drop pci_register_driver() useBorislav Petkov
- remove homegrown instances counting. - take F3 PCI device from amd_nb caching instead of F2 which was used with the PCI core. With those changes, the driver doesn't need to register a PCI driver and relies on the northbridges caching which we do anyway on AMD. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com>
2016-05-06EDAC, ie31200_edac: Add Skylake supportJason Baron
Skylake adjusts some register locations, but otherwise follows the existing model quite closely. I was able to verify that the 'ce_count' increments when 'bad dimms' are used. The accounting of 'ce_count' and 'ue_count' is the primary functionality of interest for us. Tested on Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz. Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1462547927-22679-1-git-send-email-jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-05-02EDAC, sb_edac: Use cpu family/model in driver detectionTony Luck
Instead of picking a random PCI ID from the dozen or so we need to access, just use x86_match_cpu() to pick based on CPU model number. The choosing of PCI devices has been problematic in the past, see 11249e739929 ("sb_edac: Fix detection on SNB machines") which fixed problems introduced by d0585cd815fa ("sb_edac: Claim a different PCI device"). This is especially ugly if future hardware might not even have EDAC-relevant registers in PCI config space and we would still be required to choose some "random" PCI devices to scan for just so our driver loads. Is this cleaner/clearer? It deletes much more code than it adds. Only tested on Broadwell. The driver loads/unloads and loads again. Still decodes errors too. Signed-off-by: Tony Luck <tony.luck@intel.com> Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-29EDAC, i7core: Remove double buffering of error recordsTony Luck
In the bad old days the functions from x86_mce_decoder_chain could be called in machine check context. So we used to carefully copy them and defer processing until later. But in f29a7aff4bd60 ("x86/mce: Avoid potential deadlock due to printk() in MCE context") we switched the logging code to save the record in a genpool, and call the functions that registered to be notified later from a work queue. So drop all the double buffering and do all the work we want to do as soon as i7core_mce_check_error() is called. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/29ab2c370915c6e132fc5d88e7b72cb834bedbfe.1461855008.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-29EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callbackTony Luck
Both of these drivers can return NOTIFY_BAD, but this terminates processing other callbacks that were registered later on the chain. Since the driver did nothing to log the error it seems wrong to prevent other interested parties from seeing it. E.g. neither of them had even bothered to check the type of the error to see if it was a memory error before the return NOTIFY_BAD. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-27EDAC, amd64_edac: Issue driver banner only on successBorislav Petkov
... and don't mislead users into thinking that the driver has loaded successfully. Signed-off-by: Borislav Petkov <bp@suse.de>