path: root/drivers/edac/amd64_edac.c
Age | Commit message | Author
2020-10-12 | Merge tag 'edac_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras | Linus Torvalds
Pull EDAC updates from Borislav Petkov:
- Add Amazon's Annapurna Labs memory controller EDAC driver (Talel Shenhar)
- New AMD CPUs support (Yazen Ghannam)
- The usual misc fixes and cleanups all over the subsystem
* tag 'edac_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/amd64: Set proper family type for Family 19h Models 20h-2Fh
  EDAC/mc_sysfs: Add missing newlines when printing {max,dimm}_location
  EDAC/aspeed: Use module_platform_driver() to simplify
  EDAC, sb_edac: Simplify switch statement
  EDAC/ti: Fix handling of platform_get_irq() error
  EDAC/aspeed: Fix handling of platform_get_irq() error
  EDAC/i5100: Fix error handling order in i5100_init_one()
  EDAC/highbank: Handover Calxeda Highbank maintenance to Andre Przywara
  EDAC/socfpga: Transfer SoCFPGA EDAC maintainership
  EDAC/thunderx: Make symbol lmc_dfs_ents static
  EDAC/al-mc-edac: Add Amazon's Annapurna Labs Memory Controller driver
  dt-bindings: EDAC: Add Amazon's Annapurna Labs Memory Controller binding
  EDAC/mce_amd: Add new error descriptions for existing types
  EDAC: Replace HTTP links with HTTPS ones
2020-10-09 | EDAC/amd64: Set proper family type for Family 19h Models 20h-2Fh | Yazen Ghannam
AMD Family 19h Models 20h-2Fh use the same PCI IDs as Family 17h Models 70h-7Fh. The same family ops and number of channels also apply. Use the Family17h Model 70h family_type and ops for Family 19h Models 20h-2Fh. Update the controller name to match the system. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201009171803.3214354-1-Yazen.Ghannam@amd.com
2020-08-23 | treewide: Use fallthrough pseudo-keyword | Gustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
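The conversion can be illustrated with a minimal userspace sketch, assuming GCC or Clang. The macro below wraps the compiler's statement attribute in the way the pseudo-keyword is commonly described; `feature_flags()` and its flag values are hypothetical and exist only to show an intentional fall-through.

```c
/* Assumption: the compiler understands the fallthrough statement attribute. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)	/* fallback: plain no-op statement */
#endif

/* Hypothetical helper: a model inherits the flags of all lower models. */
static int feature_flags(int model)
{
	int flags = 0;

	switch (model) {
	case 2:
		flags |= 0x4;
		fallthrough;	/* replaces the old fall-through comment */
	case 1:
		flags |= 0x2;
		fallthrough;
	case 0:
		flags |= 0x1;
		break;
	default:
		break;
	}
	return flags;
}
```

Unlike a comment, the macro is a real statement, so the compiler can check that every unannotated fall-through is flagged.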
2020-06-18 | EDAC/amd64: Read back the scrub rate PCI register on F15h | Borislav Petkov
Commit da92110dfdfa ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h") added support for F15h, model 0x60 CPUs but in doing so, failed to read back the SCRCTRL PCI config register on F15h CPUs which are *not* model 0x60. Add that read so that doing $ cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate can show the previously set DRAM scrub rate. Fixes: da92110dfdfa ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h") Reported-by: Anders Andersson <pipatron@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> #v4.4.. Link: https://lkml.kernel.org/r/CAKkunMbNWppx_i6xSdDHLseA2QQmGJqj_crY=NF-GZML5np4Vw@mail.gmail.com
2020-06-11 | Merge branch 'x86/entry' into ras/core | Thomas Gleixner
to fixup conflicts in arch/x86/kernel/cpu/mce/core.c so MCE specific follow up patches can be applied without creating a horrible merge conflict afterwards.
2020-05-29 | EDAC/amd64: Remove redundant assignment to variable ret in hw_info_get() | Colin Ian King
The variable ret is being assigned with a value that is never read and it is being updated later with a new value. The initialization is redundant so remove it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200429154847.287001-1-colin.king@canonical.com
2020-05-22 | EDAC/amd64: Add AMD family 17h model 60h PCI IDs | Alexander Monakov
Add support for AMD Renoir (4000-series Ryzen CPUs). Signed-off-by: Alexander Monakov <amonakov@ispras.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20200510204842.2603-4-amonakov@ispras.ru
2020-04-14 | x86/mce/amd, edac: Remove report_gart_errors | Borislav Petkov
... because no one should be interested in spurious MCEs anyway. Make the filtering unconditional and move it to amd_filter_mce(). Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200407163414.18058-2-bp@alien8.de
2020-03-24 | EDAC: Convert to new X86 CPU match macros | Thomas Gleixner
The new macro set has a consistent namespace and uses C99 initializers instead of the crufty C89 ones. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200320131509.673579000@linutronix.de
2020-01-27 | Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull RAS updates from Borislav Petkov:
- Misc fixes to the MCE code all over the place, by Jan H. Schönherr.
- Initial support for AMD F19h and other cleanups to amd64_edac, by Yazen Ghannam.
- Other small cleanups.
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/mce_amd: Make fam_ops static global
  EDAC/amd64: Drop some family checks for newer systems
  EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh
  x86/amd_nb: Add Family 19h PCI IDs
  EDAC/mce_amd: Always load on SMCA systems
  x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType
  x86/mce: Fix use of uninitialized MCE message string
  x86/mce: Fix mce=nobootlog
  x86/mce: Take action on UCNA/Deferred errors again
  x86/mce: Remove mce_inject_log() in favor of mce_log()
  x86/mce: Pass MCE message to mce_panic() on failed kernel recovery
  x86/mce/therm_throt: Mark throttle_active_work() as __maybe_unused
2020-01-17 | EDAC/amd64: Do not warn when removing instances | Borislav Petkov
On machines which do not populate all nodes with DIMMs, the driver doesn't initialize an instance there. However, the instance removal remove_one_instance() path will warn unconditionally, which is wrong. Remove the WARN_ON() even if the warning is innocent because it causes a splat in dmesg. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200117115939.5524-1-bp@alien8.de
2020-01-16 | EDAC/amd64: Drop some family checks for newer systems | Yazen Ghannam
In general, "pvt->umc != NULL" is used to check if the system is Family 17h+. However, there are a few places that are using direct family checks. Replace the remaining family checks with a check for "pvt->umc != NULL". Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-6-Yazen.Ghannam@amd.com
2020-01-16 | EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh | Yazen Ghannam
Add family ops to support AMD Family 19h systems. Existing Family 17h functions can be used. Also, add Family 19h to the list of families to automatically load the module. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-5-Yazen.Ghannam@amd.com
2019-11-09 | EDAC/amd64: Get rid of the ECC disabled long message | Borislav Petkov
This message keeps flooding dmesg on boxes where ECC is disabled or the DIMMs do not support ECC but the module gets auto-probed. What's even worse is that autoprobing happens on every CPU due to the CPU-family matching the driver does and uevent being generated for each CPU device. What is more, this message is becoming even more useless on newer systems where forcing ECC is not recommended and it should be done in the BIOS so the BIOS can do all the necessary work, i.e., just setting a bit in an MSR is not enough anymore. So get rid of it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20191106160607.GC28380@zn.tnic
2019-11-06 | EDAC/amd64: Check for memory before fully initializing an instance | Yazen Ghannam
Return early before checking for ECC if the node does not have any populated memory. Free any cached hardware data before returning. Also, return 0 in this case since this is not a failure. Other nodes may have memory and the module should attempt to load an instance for them. Move printing of hardware information to after the instance is initialized, so that the information is only printed for nodes with memory. Return an error code when ECC is disabled. This check happens after checking for memory. The module should explicitly fail to load if memory is populated on a node and ECC is disabled. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106012448.243970-6-Yazen.Ghannam@amd.com
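The ordering described in this commit can be sketched as follows. This is a simplified userspace model, not the driver's code: `struct node_hw` and `probe_node()` are hypothetical names, and `-ENODEV` is an assumed choice of error code for the "ECC disabled" case.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-node state gathered before the checks run. */
struct node_hw {
	bool has_memory;	/* any DIMMs populated on this node? */
	bool ecc_enabled;	/* ECC enabled for the populated memory? */
	bool registered;	/* set when an instance is initialized */
};

static int probe_node(struct node_hw *n)
{
	n->registered = false;

	/* No memory is not a failure: skip the node, let others load. */
	if (!n->has_memory)
		return 0;

	/* Memory present but ECC disabled: fail explicitly (assumed code). */
	if (!n->ecc_enabled)
		return -ENODEV;

	/* Only now initialize the instance and print hardware info. */
	n->registered = true;
	return 0;
}
```

Checking for memory first keeps an unpopulated node from being reported as an ECC failure.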
2019-11-06 | EDAC/amd64: Use cached data when checking for ECC | Yazen Ghannam
...now that the data is available earlier. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106012448.243970-5-Yazen.Ghannam@amd.com
2019-11-06 | EDAC/amd64: Save max number of controllers to family type | Yazen Ghannam
The maximum number of memory controllers is fixed within a family/model group. In most cases, this has been fixed at 2, but some systems may have up to 8. The struct amd64_family_type already contains family/model-specific information, and this can be used rather than adding model checks to various functions. Create a new field in struct amd64_family_type for max_mcs. Set this when setting other family type information, and use this when needing the maximum number of memory controllers possible for a system. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106012448.243970-4-Yazen.Ghannam@amd.com
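A rough sketch of the idea, with a deliberately reduced structure: the real struct amd64_family_type carries more fields, and the table entries, names, and helper below are hypothetical. Only the values 2 and 8 come from the commit message.

```c
/* Reduced stand-in for the family type; only the new field matters here. */
struct family_type {
	const char *ctl_name;
	unsigned int max_mcs;	/* max memory controllers for this group */
};

enum { EXAMPLE_TWO_MC, EXAMPLE_EIGHT_MC };	/* hypothetical table indices */

static const struct family_type family_types[] = {
	[EXAMPLE_TWO_MC]   = { .ctl_name = "example-2mc", .max_mcs = 2 },
	[EXAMPLE_EIGHT_MC] = { .ctl_name = "example-8mc", .max_mcs = 8 },
};

/* Callers consult the family type instead of open-coding model checks. */
static int mc_index_valid(const struct family_type *ft, unsigned int idx)
{
	return idx < ft->max_mcs;
}
```

Keeping the limit next to the other per-family data means a new model group only needs a new table entry.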
2019-11-06 | EDAC/amd64: Gather hardware information early | Yazen Ghannam
Split out gathering hardware information from init_one_instance() into a separate function hw_info_get(). This is necessary so that the information can be cached earlier and used to check if memory is populated and if ECC is enabled on a node. Also, define a function hw_info_put() to back out changes made in hw_info_get(). Check for an allocated PCI device (Function 0 for Family 17h or Function 1 for pre-Family 17h) before freeing, since hw_info_put() may be called before PCI siblings are reserved. Drop the family check when freeing pvt->umc. This will be NULL on pre-Family 17h systems. However, kfree() is safe and will check for a NULL pointer before freeing. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106012448.243970-3-Yazen.Ghannam@amd.com
2019-11-06 | EDAC/amd64: Make struct amd64_family_type global | Yazen Ghannam
The struct amd64_family_type doesn't change between multiple nodes and instances of the module, so make it global. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106012448.243970-2-Yazen.Ghannam@amd.com
2019-10-25 | EDAC/amd64: Set grain per DIMM | Yazen Ghannam
The following commit introduced a warning on error reports without a non-zero grain value. 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation") The amd64_edac_mod module does not provide a value, so the warning will be given on the first reported memory error. Set the grain per DIMM to cacheline size (64 bytes). This is the current recommendation. Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
2019-09-07 | EDAC/amd64: Add PCI device IDs for family 17h, model 70h | Isaac Vaughn
Add the new Family 17h Model 70h PCI IDs (device 18h functions 0 and 6) to the AMD64 EDAC module. [ bp: s/f17_base_addr_to_cs_size/f17_addr_mask_to_cs_size/g ] Signed-off-by: Isaac Vaughn <isaac.vaughn@knights.ucf.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: linux-edac@vger.kernel.org Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190906192131.8ced0ca112146f32d82b6cae@knights.ucf.edu
2019-08-23 | EDAC/amd64: Support asymmetric dual-rank DIMMs | Yazen Ghannam
Future AMD systems will support asymmetric dual-rank DIMMs. These are DIMMs where the ranks are of different sizes. The even rank will use the Primary Even Chip Select registers and the odd rank will use the Secondary Odd Chip Select registers. Recognize if a Secondary Odd Chip Select is being used. Use the Secondary Odd Address Mask when calculating the chip select size. [ bp: move csrow_sec_enabled() to the header, fix CS_ODD define and tone-down the capitalized words spelling. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-8-Yazen.Ghannam@amd.com
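The mask selection described here might look roughly like the sketch below. The structure and function names are hypothetical, and the flag telling whether a Secondary Odd Chip Select is in use is passed in as a plain boolean rather than read from hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache of one DIMM's primary and secondary address masks. */
struct dimm_masks {
	uint32_t primary;
	uint32_t secondary;
};

/*
 * Odd ranks of an asymmetric DIMM are sized from the secondary mask;
 * even ranks, and all ranks of symmetric DIMMs, use the primary mask.
 */
static uint32_t pick_addr_mask(const struct dimm_masks *m, int csrow_nr,
			       bool odd_secondary_used)
{
	if ((csrow_nr & 1) && odd_secondary_used)
		return m->secondary;
	return m->primary;
}
```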
2019-08-23 | EDAC/amd64: Cache secondary Chip Select registers | Yazen Ghannam
AMD Family 17h systems have a set of secondary Chip Select Base Addresses and Address Masks. These do not represent unique Chip Selects, rather they are used in conjunction with the primary Chip Select registers in certain cases. Cache these secondary Chip Select registers for future use. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-7-Yazen.Ghannam@amd.com
2019-08-23 | EDAC/amd64: Decode syndrome before translating address | Yazen Ghannam
AMD Family 17h systems currently require address translation in order to report the system address of a DRAM ECC error. This is currently done before decoding the syndrome information. The syndrome information does not depend on the address translation, so the proper EDAC csrow/channel reporting can function without the address. However, the syndrome information will not be decoded if the address translation fails. Decode the syndrome information before doing the address translation. The syndrome information is architecturally defined in MCA_SYND and can be considered robust. The address translation is system-specific and may fail on newer systems without proper updates to the translation algorithm. Fixes: 713ad54675fd ("EDAC, amd64: Define and register UMC error decode function") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-6-Yazen.Ghannam@amd.com
2019-08-23 | EDAC/amd64: Find Chip Select memory size using Address Mask | Yazen Ghannam
Chip Select memory size reporting on AMD Family 17h was recently fixed in order to account for interleaving. However, the current method is not robust. The Chip Select Address Mask can be used to find the memory size. There are a couple of cases.
1) For single-rank and dual-rank non-interleaved, use the address mask plus 1 as the size.
2) For dual-rank interleaved, do #1 but "de-interleave" the address mask first.
Always "de-interleave" the address mask in order to simplify the code flow. Bit mask manipulation is necessary to check for interleaving, so just go ahead and do the de-interleaving. In the non-interleaved case, the original and de-interleaved address masks will be the same.
To de-interleave the mask, count the number of zero bits in the middle of the mask and swap them with the most significant bits. For example, Original=0xFFFF9FE, De-interleaved=0x3FFFFFE
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-5-Yazen.Ghannam@amd.com
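The de-interleaving step can be sketched in portable userspace C. This is an illustration of the bit manipulation described in the commit message, not the driver's function; the helper names are made up, and the code assumes bit 0 of the mask is always clear, as it is in the example value.

```c
#include <stdint.h>

/* Index of the most significant set bit; v must be non-zero. */
static unsigned int msb_index(uint32_t v)
{
	unsigned int i = 0;

	while (v > 1) {
		v >>= 1;
		i++;
	}
	return i;
}

/* Number of set bits. */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Assumes bit 0 of mask is clear, so a full mask spans bits 1..msb. */
static uint32_t deinterleave_mask(uint32_t mask)
{
	unsigned int msb, zeros;

	if (mask == 0)
		return 0;

	msb = msb_index(mask);
	/* A full mask has msb set bits; whatever is missing is a hole. */
	zeros = msb - popcount32(mask);
	/* Drop that many bits from the top: contiguous bits 1..(msb - zeros). */
	return (uint32_t)((1ull << (msb - zeros + 1)) - 2);
}
```

With the commit's example, `0xFFFF9FE` has its top bit at position 27 and 25 set bits, so two holes are moved to the top and the result is `0x3FFFFFE`. A mask without holes is returned unchanged, which is why the step can be applied unconditionally.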
2019-08-23 | EDAC/amd64: Initialize DIMM info for systems with more than two channels | Yazen Ghannam
Currently, the DIMM info for AMD Family 17h systems is initialized in init_csrows(). This function is shared with legacy systems, and it has a limit of two channel support. This prevents initialization of the DIMM info for a number of ranks, so there will be missing ranks in the EDAC sysfs. Create a new init_csrows_df() for Family17h+ and revert init_csrows() back to pre-Family17h support. Loop over all channels in the new function in order to support systems with more than two channels. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-4-Yazen.Ghannam@amd.com
2019-08-23 | EDAC/amd64: Recognize DRAM device type ECC capability | Yazen Ghannam
AMD Family 17h systems support x4 and x16 DRAM devices. However, the device type is not checked when setting mci.edac_ctl_cap. Set the appropriate capability flag based on the device type. Default to x8 DRAM device when neither the x4 nor the x16 bit is set. [ bp: reverse cpk_en check to save an indentation level. ] Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-3-Yazen.Ghannam@amd.com
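The selection logic, including the early return mentioned in the bracketed note, could be modeled like this. The `CAP_*` values are placeholders loosely named after EDAC capability flags, and treating single-error-correct as the baseline capability is an assumption of this sketch, not something the commit message states.

```c
#include <stdbool.h>

/* Placeholder capability bits; names and values are illustrative only. */
enum {
	CAP_SECDED     = 1 << 0,
	CAP_S4ECD4ED   = 1 << 1,
	CAP_S8ECD8ED   = 1 << 2,
	CAP_S16ECD16ED = 1 << 3,
};

static unsigned int ecc_caps(bool chipkill_en, bool dev_x4, bool dev_x16)
{
	unsigned int caps = CAP_SECDED;	/* assumed baseline */

	/* Reversed check: bail out early instead of nesting the rest. */
	if (!chipkill_en)
		return caps;

	if (dev_x4)
		caps |= CAP_S4ECD4ED;
	else if (dev_x16)
		caps |= CAP_S16ECD16ED;
	else
		caps |= CAP_S8ECD8ED;	/* neither bit set: default to x8 */

	return caps;
}
```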
2019-08-22 | EDAC/amd64: Support more than two controllers for chip selects handling | Yazen Ghannam
The struct chip_select array that's used for saving chip select bases and masks is fixed at length of two. There should be one struct chip_select for each controller, so this array should be increased to support systems that may have more than two controllers. Increase the size of the struct chip_select array to eight, which is the largest number of controllers per die currently supported on AMD systems. Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per channel. Also, carve out the Family 17h+ reading of the bases/masks into a separate function. This effectively reverts the original bases/masks reading code to before Family 17h support was added. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-2-Yazen.Ghannam@amd.com
2019-05-21 | treewide: Add SPDX license identifier for more missed files | Thomas Gleixner
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25 | Revert "EDAC/amd64: Support more than two controllers for chip select handling" | Borislav Petkov
This reverts commit 0a227af521d6df5286550b62f4b591417170b4ea. Unfortunately, this commit caused wrong detection of chip select sizes on some F17h client machines:
--- 00-rc6+ 2019-02-14 14:28:03.126622904 +0100
+++ 01-rc4+ 2019-04-14 21:06:16.060614790 +0200
 EDAC amd64: MC: 0: 0MB 1: 0MB
-EDAC amd64: MC: 2: 16383MB 3: 16383MB
+EDAC amd64: MC: 2: 0MB 3: 2097151MB
 EDAC amd64: MC: 4: 0MB 5: 0MB
 EDAC amd64: MC: 6: 0MB 7: 0MB
 EDAC MC: UMC1 chip selects:
 EDAC amd64: MC: 0: 0MB 1: 0MB
-EDAC amd64: MC: 2: 16383MB 3: 16383MB
+EDAC amd64: MC: 2: 0MB 3: 2097151MB
 EDAC amd64: MC: 4: 0MB 5: 0MB
 EDAC amd64: MC: 6: 0MB 7: 0M
Revert it for now until it has been solved properly.
Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com>
2019-03-27 | EDAC/amd64: Adjust printed chip select sizes when interleaved | Yazen Ghannam
AMD systems may support chip select interleaving. However, on family 17h+ this was not taken into account when printing the chip select sizes. Add support to detect if chip selects are interleaved on family 17h+, and adjust the sizes accordingly. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-6-Yazen.Ghannam@amd.com
2019-03-27 | EDAC/amd64: Support more than two controllers for chip select handling | Yazen Ghannam
The struct chip_select array that's used for saving chip select bases and masks is fixed at length of two. There should be one struct chip_select for each controller, so this array should be increased to support systems that may have more than two controllers. Increase the size of the struct chip_select array to eight, which is the largest number of controllers per die currently supported on AMD systems. Also, carve out the Family 17h+ reading of the bases/masks into a separate function. This effectively reverts the original bases/masks reading code to before Family 17h support was added. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-5-Yazen.Ghannam@amd.com
2019-03-27 | EDAC/amd64: Recognize x16 symbol size | Yazen Ghannam
Future AMD systems may support x16 symbol sizes. Recognize if a system is using x16 symbol size. Also, simplify the print statement. Note that a x16 syndrome vector table is not necessary like with x4 or x8 syndromes. This is because systems that support x16 symbol sizes are SMCA systems and in that case, the syndrome can be directly extracted from the MCA_SYND[Syndrome] field. [ bp: massage. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-4-Yazen.Ghannam@amd.com
2019-03-27 | EDAC/amd64: Set maximum channel layer size depending on family | Yazen Ghannam
The AMD64 EDAC module currently hardcodes the EDAC channel layer size count to two. Future AMD systems may have more channels than this. Set the EDAC channel layer size equal to the maximum number of channels possible for the system. On Family 17h and later, this is set in the num_umcs variable. Older systems will continue to use two as the default. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190325203319.7603-1-Yazen.Ghannam@amd.com
2019-03-27 | EDAC/amd64: Support more than two Unified Memory Controllers | Yazen Ghannam
The first few models of Family 17h all had 2 Unified Memory Controllers per Die, so this was treated as a fixed value. However, future systems may have more Unified Memory Controllers per Die. Related to this, the channel number and base address of a Unified Memory Controller were found by matching on fixed, known values. However, current and future systems follow this pattern for the channel number and base address of a Unified Memory Controller: 0xYXXXXX, where Y is the channel number. So matching on hardcoded values is not necessary. Set the number of Unified Memory Controllers at driver init time based on the family/model. Also, update the functions that find the channel number and base address of a Unified Memory Controller to support more than two. [ bp: Move num_umcs into the .c file and simplify comment. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-3-Yazen.Ghannam@amd.com
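The `0xYXXXXX` pattern lends itself to simple shifts instead of a lookup of known values. The sketch below assumes the channel number occupies the single hex digit at bits 23:20; the `0x50000` offset standing in for the `XXXXX` part is an assumed placeholder, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed placeholder for the XXXXX part of the 0xYXXXXX pattern. */
#define UMC_OFFSET_ASSUMED 0x50000u

/* Build the base address for a channel: Y goes into bits 23:20. */
static uint32_t umc_base(unsigned int channel)
{
	return UMC_OFFSET_ASSUMED + ((uint32_t)channel << 20);
}

/* Recover the channel number (the Y digit) from an address. */
static unsigned int umc_channel(uint32_t addr)
{
	return (addr >> 20) & 0xF;
}
```

Because the channel is computed rather than matched, adding a third or eighth controller needs no new table entries.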
2019-03-27 | EDAC/amd64: Use a macro for iterating over Unified Memory Controllers | Yazen Ghannam
Define and use a macro for looping over the number of Unified Memory Controllers. No functional change. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-2-Yazen.Ghannam@amd.com
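An iteration macro of this kind is typically a thin wrapper around a `for` loop. The sketch below assumes a file-scope `num_umcs` counter, as mentioned in the neighbouring commits; the macro body and the `enabled_umc_mask()` helper are illustrative, not copied from the driver.

```c
/* Assumed to be set once at init time based on family/model. */
static int num_umcs = 2;

/* Loop over every Unified Memory Controller index. */
#define for_each_umc(i) \
	for ((i) = 0; (i) < num_umcs; (i)++)

/* Hypothetical user: build a bitmask of the controllers that are enabled. */
static unsigned int enabled_umc_mask(const int *umc_enabled)
{
	unsigned int mask = 0;
	int i;

	for_each_umc(i) {
		if (umc_enabled[i])
			mask |= 1u << i;
	}
	return mask;
}
```

Hiding the bound inside the macro means call sites do not change when the controller count later becomes family-dependent.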
2019-03-27 | EDAC/amd64: Add Family 17h Model 30h PCI IDs | Yazen Ghannam
Add the new Family 17h Model 30h PCI IDs to the AMD64 EDAC module. This also fixes a probe failure that appeared when some other PCI IDs for Family 17h Model 30h were added to the AMD NB code. Fixes: be3518a16ef2 (x86/amd_nb: Add PCI device IDs for family 17h, model 30h) Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-1-Yazen.Ghannam@amd.com
2018-09-27 | EDAC, amd64: Add Hygon Dhyana support | Pu Wen
Add support for Hygon Dhyana CPU to EDAC. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: mchehab@kernel.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: thomas.lendacky@amd.com Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/9d71061301177822bc55b3bfd44f91057458d886.1537533369.git.puwen@hygon.cn
2018-08-27 | EDAC, amd64: Add Family 17h, models 10h-2fh support | Michael Jin
Add new device IDs for family 17h, models 10h-2fh. This is required by amd64_edac_mod in order to properly detect PCI device functions 0 and 6. Signed-off-by: Michael Jin <mikhail.jin@gmail.com> Reviewed-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20180816192840.31166-1-mikhail.jin@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2018-06-12 | treewide: kzalloc() -> kcalloc() | Kees Cook
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a, b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kcalloc(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was:
// Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) 
| - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) 
| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) 
| - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
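The reason for moving the multiplication into the allocator can be illustrated with a small userspace sketch. It is not the kernel implementation: `mul_fits()`, `zalloc_array()` and `size3_saturating()` are hypothetical names, and `calloc()` stands in for the kernel allocator. The sketch only assumes the general behaviour the message relies on, namely that the two-factor form can reject a product that would wrap around, and that a three-factor size helper saturates instead of wrapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: store n * size in *out, return false on overflow. */
static bool mul_fits(size_t n, size_t size, size_t *out)
{
	assert(out != NULL);
	if (size != 0 && n > SIZE_MAX / size)
		return false;
	*out = n * size;
	return true;
}

/* Two-factor zeroing allocation, in the spirit of kcalloc(n, size, gfp). */
static void *zalloc_array(size_t n, size_t size)
{
	size_t bytes;

	if (!mul_fits(n, size, &bytes))
		return NULL;	/* refuse instead of allocating a short buffer */
	return calloc(1, bytes);
}

/* Saturating three-factor size, in the spirit of array3_size(a, b, c). */
static size_t size3_saturating(size_t a, size_t b, size_t c)
{
	size_t bytes;

	if (!mul_fits(a, b, &bytes) || !mul_fits(bytes, c, &bytes))
		return SIZE_MAX;	/* a size no allocator can satisfy */
	return bytes;
}
```

With an open-coded `n * size`, a wrapped product would silently yield a buffer that is too small; here the oversized request fails outright.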
2018-02-15x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_steppingJia Zhang
x86_mask is a confusing name which is hard to associate with the processor's stepping. Additionally, correct an indent issue in lib/cpu.c. Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com> [ Updated it to more recent kernels. ] Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-25EDAC: Add owner check to the x86 platform driversToshi Kani
Change x86 EDAC platform drivers to verify the module owner at the beginning of their module init functions. This allows them to fail their init immediately when ghes_edac is enabled. Similar change can be made to other edac drivers if necessary. Also, remove ".c" from module names of pnd2_edac, sb_edac, and skx_edac. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Suggested-by: Borislav Petkov <bp@alien8.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170823225447.15608-6-toshi.kani@hpe.com Signed-off-by: Borislav Petkov <bp@suse.de>
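The owner check described above can be sketched in userspace roughly as follows. This is an illustration only: `current_owner` and `driver_init_check()` are hypothetical stand-ins for the EDAC core's ownership record and a driver's init function, and the choice of `-EBUSY` as the failure code is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the EDAC core's record of the current owner. */
static const char *current_owner;	/* NULL: nobody has claimed EDAC yet */

/*
 * First step of a driver's module init: bail out at once if a different
 * module (for example ghes_edac) already owns EDAC reporting.
 */
static int driver_init_check(const char *mod_name)
{
	assert(mod_name != NULL);
	if (current_owner && strcmp(current_owner, mod_name) != 0)
		return -EBUSY;
	return 0;	/* unowned, or owned by this very module */
}
```

Doing the check first means the driver fails before it allocates anything or touches hardware.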
2017-07-17EDAC: Get rid of mci->mod_verBorislav Petkov
It is a write-only variable so get rid of it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Robert Richter <rric@kernel.org> Acked-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Thor Thayer <thor.thayer@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Loc Ho <lho@apm.com> Cc: linux-edac@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org
2017-05-03EDAC, amd64: Fix reporting of Chip Select sizes on Fam17hYazen Ghannam
The wrong index into the csbases/csmasks arrays was being passed to the function to compute the chip select sizes, which resulted in the wrong size being computed. Address that so that the correct values are computed and printed. Also, redo how we calculate the number of pages in a CS row. Reported-by: Benjamin Bennett <benbennett@gmail.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: <stable@vger.kernel.org> # 4.10.x Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1493313114-11260-1-git-send-email-Yazen.Ghannam@amd.com [ Remove unneeded integer math comment, minor cleanups. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28EDAC, amd64: Add x86cpuid sanity check during initYazen Ghannam
Match one of the devices in amd64_cpuids[] before loading the module. This is an additional sanity check against users trying to load amd64_edac_mod on unsupported systems. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-9-git-send-email-Yazen.Ghannam@amd.com [ Get rid of err_ret label, make it a bit more readable this way. ] Signed-off-by: Borislav Petkov <bp@suse.de>
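A minimal userspace sketch of such a table match is shown below. The structure, the vendor constants, the family list and the `-ENODEV` return value are all assumptions made for illustration; the real driver matches against amd64_cpuids[] using the kernel's own CPU-ID types.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified CPU identifier: vendor plus family only. */
struct cpu_id {
	int vendor;
	int family;
};

enum { VENDOR_NONE = 0, VENDOR_AMD = 1, VENDOR_OTHER = 2 };

/* Illustrative table; the last entry is a sentinel that ends the scan. */
static const struct cpu_id supported_ids[] = {
	{ VENDOR_AMD, 0x0f },
	{ VENDOR_AMD, 0x10 },
	{ VENDOR_AMD, 0x15 },
	{ VENDOR_AMD, 0x16 },
	{ VENDOR_AMD, 0x17 },
	{ VENDOR_NONE, 0 },
};

/* Return the matching table entry, or NULL if the CPU is not listed. */
static const struct cpu_id *match_cpu(const struct cpu_id *table,
				      int vendor, int family)
{
	assert(table != NULL);
	for (; table->vendor != VENDOR_NONE; table++)
		if (table->vendor == vendor && table->family == family)
			return table;
	return NULL;
}

/* Sanity check run before any other module initialisation. */
static int module_init_check(int vendor, int family)
{
	return match_cpu(supported_ids, vendor, family) ? 0 : -ENODEV;
}
```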
2017-01-28EDAC, amd64: Don't treat ECC disabled as failureYazen Ghannam
Having ECC disabled on a node doesn't necessarily mean that it's disabled for the entire system. So let's return a non-failing code when ECC is disabled on a node. This way we can skip initialization for the node but still continue with the remaining nodes. After probing all instances, make sure we have at least one MC device allocated. This issue is seen and fix tested on Fam15h and Fam17h MCM systems. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-8-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
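The control flow this message describes might look like the following userspace sketch. The function names, the boolean array standing in for per-node ECC state, and the `-ENODEV` result when nothing was registered are assumptions for illustration, not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-node probe: a node with ECC disabled is skipped and
 * reported as success, so the caller keeps going with the other nodes.
 */
static int probe_node(bool ecc_enabled, bool *registered)
{
	assert(registered != NULL);
	*registered = false;
	if (!ecc_enabled)
		return 0;	/* not a failure: nothing to set up here */
	*registered = true;	/* stands in for allocating an MC device */
	return 0;
}

/* Probe every node, then insist that at least one MC device exists. */
static int init_all_nodes(const bool *ecc_enabled, size_t nr_nodes)
{
	size_t i, nr_registered = 0;

	for (i = 0; i < nr_nodes; i++) {
		bool registered;
		int err = probe_node(ecc_enabled[i], &registered);

		if (err)
			return err;
		if (registered)
			nr_registered++;
	}
	return nr_registered ? 0 : -ENODEV;
}
```

A system with ECC enabled on only some nodes therefore still loads, while a system with ECC disabled everywhere is rejected after the loop rather than at the first node.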
2017-01-28EDAC, amd64: Rework messages in ecc_enabled()Yazen Ghannam
Print the node number when informing that DRAM ECC is disabled so that we can show which nodes have DRAM ECC disabled. Also, print more detailed system information as edac_dbg(), so as to not bother general users. Switch amd64_notice to amd64_info to match the message above it. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-5-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28EDAC, amd64: Move global code out of instance functionsYazen Ghannam
We have a few functions that register/unregister an ECC error decoding routine. These functions are called when we init/remove instances. However, they are global and so don't need to be registered/unregistered multiple times. So move them out of the init/remove instance functions and into the module init/exit routines. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485297149-13733-4-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28EDAC, amd64: Free unused memory when init_one_instance() failsYazen Ghannam
Jump to memory freeing routines when init_one_instance() fails. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485297149-13733-3-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-16EDAC, amd64: Save and return err code from probe_one_instance()Yazen Ghannam
We should save the return code from probe_one_instance() so that it can be returned from the module init function. Otherwise, we'll be returning the -ENOMEM from above. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1484322741-41884-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
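The difference between the two behaviours can be shown with a small userspace sketch. Everything here is hypothetical: `probe_stub()` simulates a probe that fails on one chosen node, and the two init functions only mimic the shape of a module init routine that preloads `-ENOMEM` before its allocations.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical probe: fails with failure_code on one chosen node. */
static int probe_stub(int node, int failing_node, int failure_code)
{
	return node == failing_node ? failure_code : 0;
}

/* Before: the probe result is only tested, so the stale value leaks out. */
static int init_stale(int nr_nodes, int failing_node, int failure_code)
{
	int err = -ENOMEM;	/* left over from the allocation step */
	int i;

	assert(nr_nodes >= 0);
	for (i = 0; i < nr_nodes; i++)
		if (probe_stub(i, failing_node, failure_code))
			goto out;
	err = 0;
out:
	return err;
}

/* After: the probe result is saved and is what the caller sees. */
static int init_saved(int nr_nodes, int failing_node, int failure_code)
{
	int err = -ENOMEM;
	int i;

	assert(nr_nodes >= 0);
	for (i = 0; i < nr_nodes; i++) {
		err = probe_stub(i, failing_node, failure_code);
		if (err)
			goto out;
	}
	err = 0;
out:
	return err;
}
```

In the first variant a probe failure is misreported as an out-of-memory condition; in the second the real cause reaches the caller.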