
FRU Memory Poison Manager

From:  Yazen Ghannam <yazen.ghannam-AT-amd.com>
To:  <bp-AT-alien8.de>, <tony.luck-AT-intel.com>, <linux-edac-AT-vger.kernel.org>
Subject:  [PATCH 0/2] FRU Memory Poison Manager
Date:  Tue, 13 Feb 2024 21:35:14 -0600
Message-ID:  <20240214033516.1344948-1-yazen.ghannam@amd.com>
Cc:  <linux-kernel-AT-vger.kernel.org>, <avadhut.naik-AT-amd.com>, <john.allen-AT-amd.com>, <muralidhara.mk-AT-amd.com>, <naveenkrishna.chatradhi-AT-amd.com>, <sathyapriya.k-AT-amd.com>, Yazen Ghannam <yazen.ghannam-AT-amd.com>

Hi all,

This set adds a new module to manage error records on persistent
storage.

Patch 1 moves a function from AMD64 EDAC to the AMD Address Translation
Library. This is needed for patch 2.

Patch 2 adds the new module. This is a near total rewrite based on patch
2 from the following set:
https://lore.kernel.org/r/20231129075034.2159223-1-murali...

I included questions in code comments where I think more attention is
needed.

I'd like to add Murali and Naveen as Co-developers, since this is based
on their work. Also, I kept Naveen as a maintainer in case he's still
interested.

Regarding the old set:
 * Patch 1 exports a new function from the ERST driver. This is not
   necessary.

 * Patch 3 adds a new sysfs interface. This needs more work.

 * Patch 4 adds documentation. This needs updating.

I did some basic testing on a 2P server system without ERST support.
Mostly I tried to check out the memory layout of the structures. And I
did some memory error injections to check out the record updating flow.
I did some fixups after testing, so I apologize if I missed anything.

Thanks,
Yazen

Yazen Ghannam (2):
  RAS/AMD/ATL, EDAC/amd64: Move MI300 Row Retirement to ATL
  RAS: Introduce the FRU Memory Poison Manager

 MAINTAINERS                 |   7 +
 drivers/edac/Kconfig        |   1 -
 drivers/edac/amd64_edac.c   |  48 ---
 drivers/ras/Kconfig         |  13 +
 drivers/ras/Makefile        |   1 +
 drivers/ras/amd/atl/Kconfig |   1 +
 drivers/ras/amd/atl/umc.c   |  51 +++
 drivers/ras/amd/fmpm.c      | 776 ++++++++++++++++++++++++++++++++++++
 include/linux/ras.h         |   2 +
 9 files changed, 851 insertions(+), 49 deletions(-)
 create mode 100644 drivers/ras/amd/fmpm.c


base-commit: c2064388aa8765abd7c2c5785e7bfe266a2f6cd3
-- 
2.34.1



