
drm/xe/xe_drm_client: Add per drm client reset stats

From:  Jonathan Cavitt <jonathan.cavitt-AT-intel.com>
To:  intel-xe-AT-lists.freedesktop.org
Subject:  [PATCH v3 0/6] drm/xe/xe_drm_client: Add per drm client reset stats
Date:  Wed, 19 Feb 2025 16:23:34 +0000
Message-ID:  <20250219162340.116499-1-jonathan.cavitt@intel.com>
Cc:  saurabhg.gupta-AT-intel.com, alex.zuo-AT-intel.com, jonathan.cavitt-AT-intel.com, joonas.lahtinen-AT-linux.intel.com, tvrtko.ursulin-AT-ursulin.net, lucas.demarchi-AT-intel.com, matthew.brost-AT-intel.com, dri-devel-AT-lists.freedesktop.org, simona.vetter-AT-ffwll.ch

Extend the drm client so that it can report the last 50 relevant exec
queues that were banned on it, along with the last pagefault seen at the
time each of those exec queues was banned. Since a pagefault cannot
reasonably be associated with a specific exec queue, we currently report
the last pagefault seen on the associated hw engine instead.
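
As a rough illustration of the "last 50" bookkeeping described above,
a fixed-size ring buffer is one way to keep only the most recent ban
records. This is a userspace sketch, not code from the series: the
names ban_record, client_ban_log, ban_log_add and ban_log_get are
hypothetical, the record fields are assumed, and the locking a kernel
implementation would need is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BAN_RECORDS 50	/* "last 50" from the cover letter */

/* Hypothetical record; the real struct layout is not shown in this letter. */
struct ban_record {
	uint32_t exec_queue_id;	/* ID of the banned exec queue */
	int has_pagefault;	/* nonzero if a pagefault was captured */
	uint64_t fault_addr;	/* assumed field: faulting address */
};

struct client_ban_log {
	struct ban_record ring[MAX_BAN_RECORDS];
	size_t head;	/* next slot to write */
	size_t count;	/* number of valid entries, at most MAX_BAN_RECORDS */
};

/* Record a ban, overwriting the oldest entry once the ring is full. */
static void ban_log_add(struct client_ban_log *log, struct ban_record rec)
{
	log->ring[log->head] = rec;
	log->head = (log->head + 1) % MAX_BAN_RECORDS;
	if (log->count < MAX_BAN_RECORDS)
		log->count++;
}

/* Return the i-th oldest retained record (0 is the oldest). */
static const struct ban_record *
ban_log_get(const struct client_ban_log *log, size_t i)
{
	size_t start;

	assert(i < log->count);
	start = (log->head + MAX_BAN_RECORDS - log->count) % MAX_BAN_RECORDS;
	return &log->ring[(start + i) % MAX_BAN_RECORDS];
}
```

With this layout, adding a 51st record silently drops the oldest one,
so memory use per client stays bounded regardless of how many bans
occur.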

The pagefault reported for each banned exec queue is saved on the hw
engine and is updated during pagefault handling in xe_gt_pagefault. The
saved pagefault is cleared when the engine is reset, because exec queue
bans that occur after the reset were likely not caused by that
pagefault.
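
The lifecycle described above (save on fault, clear on reset, read at
ban time) can be sketched as follows. This is a simplified userspace
model under stated assumptions: fault_info, engine_state and the three
helper functions are hypothetical names, the fault fields are
illustrative, and synchronization is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a pagefault; fields are assumptions. */
struct fault_info {
	uint64_t addr;
	uint8_t access_type;
};

/* Hypothetical per-engine state holding the last seen pagefault. */
struct engine_state {
	struct fault_info last_fault;
	bool has_fault;
};

/* Called from the pagefault handling path: remember the newest fault. */
static void engine_note_fault(struct engine_state *e, struct fault_info f)
{
	e->last_fault = f;
	e->has_fault = true;
}

/* Called on engine reset: a fault from before the reset is now stale. */
static void engine_reset(struct engine_state *e)
{
	e->has_fault = false;
}

/*
 * Called when an exec queue is banned: copy out the last fault, if any.
 * Returns false when no fault has been seen since the last reset.
 */
static bool engine_snapshot_fault(const struct engine_state *e,
				  struct fault_info *out)
{
	if (!e->has_fault)
		return false;
	*out = e->last_fault;
	return true;
}
```

Clearing the flag on reset, rather than zeroing the whole record, is
just one possible choice; either way a ban recorded after a reset
would carry no pagefault information.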

Also add a tracker that counts the number of times the drm client has
experienced an engine reset.

Finally, add a new query to xe_query that reports these drm client reset
stats back to the user.

v2: Report the per drm client reset stats as a query, rather than
    co-opting xe_drm_client_fdinfo (Joonas)

v3: Report EOPNOTSUPP during the reset stats query if CONFIG_PROC_FS
    is not set in the kernel config, as it is required to track the
    reset count and exec queue bans.
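
The v3 behaviour can be pictured with a compile-time guard like the one
below. This is a userspace sketch only: SKETCH_HAVE_PROC_FS stands in
for the kernel's CONFIG_PROC_FS, and reset_stats and query_reset_stats
are hypothetical names. The size negotiation (a first call with size 0
returns the required size) is an assumption about how such a query
might behave, not something stated in this cover letter.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for CONFIG_PROC_FS; build with -DSKETCH_HAVE_PROC_FS=0 to disable. */
#ifndef SKETCH_HAVE_PROC_FS
#define SKETCH_HAVE_PROC_FS 1
#endif

/* Hypothetical payload; the real uapi struct is defined by the series. */
struct reset_stats {
	unsigned long long reset_count;
};

/*
 * Copy the stats to the caller's buffer.
 * Returns 0 on success, -EOPNOTSUPP when client tracking is compiled
 * out, and -EINVAL when the buffer or its size is not usable.
 */
static int query_reset_stats(const struct reset_stats *src, void *buf,
			     size_t *size)
{
#if !SKETCH_HAVE_PROC_FS
	/* Without client tracking there is nothing to report. */
	(void)src;
	(void)buf;
	(void)size;
	return -EOPNOTSUPP;
#else
	if (*size == 0) {
		/* Assumed convention: tell the caller how much to allocate. */
		*size = sizeof(*src);
		return 0;
	}
	if (*size != sizeof(*src) || !buf)
		return -EINVAL;
	memcpy(buf, src, sizeof(*src));
	return 0;
#endif
}
```

Returning a distinct error code lets userspace tell "stats are
unavailable on this kernel build" apart from a malformed request.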

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@ursulin.net>
CC: Lucas de Marchi <lucas.demarchi@intel.com>
CC: Matthew Brost <matthew.brost@intel.com>
CC: Simona Vetter <simona.vetter@ffwll.ch>

Jonathan Cavitt (6):
  drm/xe/xe_exec_queue: Add ID param to exec queue struct
  drm/xe/xe_gt_pagefault: Migrate pagefault struct to header
  drm/xe/xe_drm_client: Add per drm client pagefault info
  drm/xe/xe_drm_client: Add per drm client reset stats
  drm/xe/xe_query: Pass drm file to query funcs
  drm/xe/xe_query: Add support for per-drm-client reset stat querying

 drivers/gpu/drm/xe/xe_drm_client.c       |  66 ++++++++++++++
 drivers/gpu/drm/xe/xe_drm_client.h       |  44 +++++++++
 drivers/gpu/drm/xe/xe_exec_queue.c       |   8 ++
 drivers/gpu/drm/xe/xe_exec_queue_types.h |   2 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c     |  46 ++++------
 drivers/gpu/drm/xe/xe_gt_pagefault.h     |  28 ++++++
 drivers/gpu/drm/xe/xe_guc_submit.c       |  19 ++++
 drivers/gpu/drm/xe/xe_hw_engine.c        |   4 +
 drivers/gpu/drm/xe/xe_hw_engine_types.h  |   8 ++
 drivers/gpu/drm/xe/xe_query.c            | 109 ++++++++++++++++++++---
 include/uapi/drm/xe_drm.h                |  50 +++++++++++
 11 files changed, 345 insertions(+), 39 deletions(-)

-- 
2.43.0


