drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4)

From:  Dave Airlie <airlied-AT-gmail.com>
To:  dri-devel-AT-lists.freedesktop.org, tj-AT-kernel.org, christian.koenig-AT-amd.com, Johannes Weiner <hannes-AT-cmpxchg.org>, Michal Hocko <mhocko-AT-kernel.org>, Roman Gushchin <roman.gushchin-AT-linux.dev>, Shakeel Butt <shakeel.butt-AT-linux.dev>, Muchun Song <muchun.song-AT-linux.dev>
Subject:  drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4)
Date:  Thu, 16 Oct 2025 12:31:28 +1000
Message-ID:  <20251016023205.2303108-1-airlied@gmail.com>
Cc:  cgroups-AT-vger.kernel.org, Dave Chinner <david-AT-fromorbit.com>, Waiman Long <longman-AT-redhat.com>, simona-AT-ffwll.ch

Hi all,

This is another repost with some fixes and cleanups. I've added Christian's acks/reviews from the
previous round. I've moved the obj_cgroup_put into the core, instead of doing it in the drivers.

I'd really like to land this in drm-next. I've added Maarten's xe support patch to this series,
and I'd like to get any missing acks/reviews.

Christian, I think you said patch 4 got lost last time, hopefully you get it this time.

Patches still needing ack/review:
ttm/pool: drop numa specific pools
ttm/pool: track allocated_pages per numa node.
ttm: add objcg pointer to bo and tt (v2)
ttm/pool: enable memcg tracking and shrinker. (v2)
amdgpu: add support for memory cgroups

Differences since v3 posting:
1. added ttm_bo_set_cgroup wrapper - the cgroup reference is passed to the ttm object.
2. put the cgroup reference in ttm object release
3. rebase onto 6.18-rc1
4. added xe support patch from Maarten.
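
As a rough illustration of items 1 and 2, here is a userspace toy model of the reference handoff: the driver passes a cgroup reference to the object, and the core drops it on release. All toy_* names are hypothetical stand-ins, not the kernel's obj_cgroup or ttm API, and the real ttm_bo_set_cgroup signature is not shown in this letter.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a refcounted cgroup object; NOT the kernel's struct obj_cgroup. */
struct toy_objcg {
    int refcount;
};

/* Toy stand-in for a buffer object that may own one cgroup reference. */
struct toy_bo {
    struct toy_objcg *objcg;
};

/* Take an extra reference on the cgroup. */
static struct toy_objcg *toy_objcg_get(struct toy_objcg *cg)
{
    cg->refcount++;
    return cg;
}

/* Drop a reference; a NULL pointer is ignored. */
static void toy_objcg_put(struct toy_objcg *cg)
{
    if (cg)
        cg->refcount--;
}

/* Driver side (assumed semantics): ownership of the reference moves to the object. */
static void toy_bo_set_cgroup(struct toy_bo *bo, struct toy_objcg *cg)
{
    bo->objcg = cg;
}

/* Core side: release drops the reference, so drivers never call put themselves. */
static void toy_bo_release(struct toy_bo *bo)
{
    toy_objcg_put(bo->objcg);
    bo->objcg = NULL;
}
```

With this split, a driver only ever acquires and hands over a reference; the matching put lives in one place in the core.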

Differences since v2 posting:
1. Squashed exports into where they are used (Shakeel)
2. Fixed bug in the memcg uncharge path
3. Fixed config bug in the module option.

Differences since 1st posting:
1. Added patch 18: add a module option to allow pooled pages to not be stored in the lru per-memcg
   (Requested by Christian König)
2. Converged the naming and stats between vmstat and memcg (Suggested by Shakeel Butt)
3. Cleaned up the charge/uncharge code and some other bits.

Dave.

Original cover letter:
tl;dr: start using list_lru/numa/memcg in GPU driver core and amdgpu driver for now.

This is a complete series of patches, some of which have been sent before and reviewed,
but I want to get the complete picture for others, and try to figure out how best to land this.

There are 3 pieces to this:
01->02: add support for global gpu stat counters (previously posted, patch 2 is newer)
03->06: port ttm pools to list_lru for numa awareness
07->13: add memcg stats + gpu apis, then port ttm pools to memcg aware list_lru and shrinker
14: enable amdgpu to use new functionality.
15: add a module option to turn it all off.
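
To make the direction of pieces 03->13 a little more concrete, here is a userspace toy model of a page pool whose cached pages are bucketed by NUMA node and by owning cgroup, with a shrinker-like scan. It is only a sketch: the toy_* names and the fixed-size arrays are hypothetical and do not reflect the kernel's list_lru or shrinker interfaces.

```c
#include <assert.h>

#define TOY_NODES   2   /* hypothetical NUMA node count */
#define TOY_CGROUPS 3   /* hypothetical cgroup count */

/* Cached (pooled) page counts, indexed by NUMA node and by owning cgroup. */
struct toy_pool {
    unsigned long cached[TOY_NODES][TOY_CGROUPS];
};

/* Return a freed page to the pool bucket for its node and cgroup. */
static void toy_pool_add(struct toy_pool *p, int node, int cg)
{
    p->cached[node][cg]++;
}

/* Reuse a cached page for this node/cgroup; returns 1 on hit, 0 on miss. */
static int toy_pool_take(struct toy_pool *p, int node, int cg)
{
    if (p->cached[node][cg] == 0)
        return 0;
    p->cached[node][cg]--;
    return 1;
}

/* Shrinker-like scan: free up to nr_to_scan pages from one node/cgroup bucket. */
static unsigned long toy_pool_shrink(struct toy_pool *p, int node, int cg,
                                     unsigned long nr_to_scan)
{
    unsigned long n = p->cached[node][cg];

    if (n > nr_to_scan)
        n = nr_to_scan;
    p->cached[node][cg] -= n;
    return n;
}
```

Keeping the buckets separate per node and per cgroup means reclaim pressure on one cgroup only touches pages that cgroup had pooled.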

The biggest difference in the memcg code from previous versions is that I discovered what
obj cgroups were designed for, and I'm reusing the page/objcg integration that
already exists, to avoid reinventing that wheel right now.

There are some igt-gpu-tools tests I've written at:
https://gitlab.freedesktop.org/airlied/igt-gpu-tools/-/tr...

One problem is that there is a lot of delayed action, which probably means the testing
needs a bit more robustness, but the tests validate all the basic paths.

Regards,
Dave.