From: Harry Yoo <harry.yoo-AT-oracle.com>
To: akpm-AT-linux-foundation.org, vbabka-AT-suse.cz
Subject: [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space
Date: Mon, 27 Oct 2025 21:28:40 +0900
Message-ID: <20251027122847.320924-1-harry.yoo@oracle.com>
Cc: andreyknvl-AT-gmail.com, cl-AT-linux.com, dvyukov-AT-google.com, glider-AT-google.com, hannes-AT-cmpxchg.org, linux-mm-AT-kvack.org, mhocko-AT-kernel.org, muchun.song-AT-linux.dev, rientjes-AT-google.com, roman.gushchin-AT-linux.dev, ryabinin.a.a-AT-gmail.com, shakeel.butt-AT-linux.dev, surenb-AT-google.com, vincenzo.frascino-AT-arm.com, yeoreum.yun-AT-arm.com, harry.yoo-AT-oracle.com, tytso-AT-mit.edu, adilger.kernel-AT-dilger.ca, linux-ext4-AT-vger.kernel.org, linux-kernel-AT-vger.kernel.org
RFC v2: https://lore.kernel.org/linux-mm/20250827113726.707801-1-...
RFC v2 -> v3:
- RFC v3 now depends on the patch "[PATCH V2] mm/slab: ensure all metadata
in slab object are word-aligned"
- During the merge window, the ext4 inode cache shrank and could no
  longer benefit from the change, as the unused space became smaller.
  But I somehow found a way to shrink the ext4 inode object by a word...
  With the new patches 1 and 2, it can benefit from the optimization again.
- As suggested by Andrey, SLUB now disables KASAN and KMSAN and resets
  the KASAN tag instead of unpoisoning the slabobj_ext metadata (Patch 5).
When CONFIG_MEMCG and CONFIG_MEM_ALLOC_PROFILING are enabled,
the kernel allocates two pointers per object: one for the memory cgroup
(obj_cgroup) to which it belongs, and another for the code location
that requested the allocation.
In two special cases, this overhead can be eliminated by allocating
slabobj_ext metadata from unused space within a slab:
Case 1. The "leftover" space after the last slab object is larger than
the size of an array of slabobj_ext.
Case 2. The per-object alignment padding is larger than
sizeof(struct slabobj_ext).
For these two cases, one or two pointers can be saved per slab object.
Examples: ext4 inode cache (case 1) and xfs inode cache (case 2).
That's approximately 0.7-0.8% (memcg) or 1.5-1.6% (memcg + mem profiling)
of the total inode cache size.
Implementing case 2 is not straightforward, because the existing code
assumes that slab->obj_exts is an array of slabobj_ext, and case 2
breaks that assumption.
As suggested by Vlastimil, abstract access to individual slabobj_ext
metadata via a new helper named slab_obj_ext():
  static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
                                                 unsigned long obj_exts,
                                                 unsigned int index)
  {
          return (struct slabobj_ext *)(obj_exts +
                                        slab_get_stride(slab) * index);
  }
In the normal case (including case 1), slab->obj_exts points to an array
of slabobj_ext, and the stride is sizeof(struct slabobj_ext).
In case 2, the stride is s->size and
slab->obj_exts = slab_address(slab) + s->red_left_pad + (offset of slabobj_ext)
With this approach, the memcg charging fastpath doesn't need to care
about how slabobj_ext is stored.
Harry Yoo (7):
mm/slab: allow specifying freepointer offset when using constructor
ext4: specify the free pointer offset for ext4_inode_cache
mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
mm/slab: use stride to access slabobj_ext
mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison
mm/slab: save memory by allocating slabobj_ext array from leftover
mm/slab: place slabobj_ext metadata in unused space within s->size
fs/ext4/super.c | 20 ++-
include/linux/slab.h | 9 ++
mm/memcontrol.c | 34 +++--
mm/slab.h | 94 ++++++++++++-
mm/slab_common.c | 8 +-
mm/slub.c | 304 ++++++++++++++++++++++++++++++++++++-------
6 files changed, 398 insertions(+), 71 deletions(-)
--
2.43.0