memcg: Support per-memcg KSM metrics
From: xu.xin16-AT-zte.com.cn
To: <akpm-AT-linux-foundation.org>, <shakeel.butt-AT-linux.dev>, <hannes-AT-cmpxchg.org>, <mhocko-AT-kernel.org>, <roman.gushchin-AT-linux.dev>
Subject: [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics
Date: Sun, 21 Sep 2025 23:07:26 +0800
Message-ID: <20250921230726978agBBWNsPLi2hCp9Sxed1Y@zte.com.cn>
Cc: <david-AT-redhat.com>, <chengming.zhou-AT-linux.dev>, <xu.xin16-AT-zte.com.cn>, <muchun.song-AT-linux.dev>, <linux-kernel-AT-vger.kernel.org>, <linux-mm-AT-kvack.org>, <cgroups-AT-vger.kernel.org>

From: xu xin <xu.xin16@zte.com.cn>

v2->v3:
------
Fix compilation errors caused by a missing header inclusion or a missing
function definition under some kernel configs.
https://lore.kernel.org/all/202509142147.WQI0impC-lkp@int...
https://lore.kernel.org/all/202509142046.QatEaTQV-lkp@int...

v1->v2:
------
Following Shakeel's suggestion, expose these metrics in memory.stat
instead of adding a new interface.
https://lore.kernel.org/all/ir2s6sqi6hrbz7ghmfngbif6fbgms...

Background
==========

With the enablement of container-level KSM (e.g., via prctl [1]), there is
a growing demand for container-level observability of KSM behavior.
However, current cgroup implementations lack support for exposing
KSM-related metrics.

So add the counters to the existing memory.stat without adding a new
interface. To display the per-memcg KSM statistic counters, we traverse all
processes of a memcg and sum the processes' ksm_rmap_items counters,
instead of adding enum items to memcg_stat_item or node_stat_item and
updating the corresponding counter whenever ksmd manipulates pages.

Now Linux users can look up all per-memcg KSM counters with:

# cat /sys/fs/cgroup/xuxin/memory.stat | grep ksm
ksm_rmap_items 0
ksm_zero_pages 0
ksm_merging_pages 0
ksm_profit 0

Q&A
====

Why don't I add enum items to memcg_stat_item or node_stat_item, like the
other items in memory.stat?

I tried adding enum items to memcg_stat_item and updating them when ksmd
manipulates pages, but the resulting per-memcg KSM counters were wrong,
for the following reasons:

1) The KSM counter of a memcg can be correctly incremented, but cannot be
   properly decremented. E.g., when ksmd scans the pages of a process, it
   can use the mm_struct of the struct ksm_rmap_item to reverse-lookup the
   memcg and then increase the value via
   mod_memcg_state(memcg, MEMCG_KSM_COUNT, 1). However, when the process
   exits abruptly, ksmd, which scans the mmslot list asynchronously in the
   background, can no longer locate the original memcg through the
   mm_struct with get_mem_cgroup_from_mm(), because the task_struct has
   already been freed.

2) The first issue could potentially be addressed by adding a memcg
   pointer directly to the ksm_rmap_item structure. However, this
   increases memory overhead, especially when the system holds a large
   number of ksm_rmap_items (due to a high volume of pages being scanned
   by ksmd). Moreover, this approach does not resolve the same problem for
   ksm_zero_pages, because updates to ksm_zero_pages do not go through
   ksm_rmap_item; they happen directly during unmap or page table entry
   (pte) faults, based on the mm_struct. At that point, if the process has
   already exited, the corresponding memcg can no longer be accurately
   identified.

xu xin (6):
  memcg: add per-memcg ksm_rmap_items stat
  memcg: show ksm_zero_pages count in memory.stat
  memcg: show ksm_merging_pages in memory.stat
  ksm: make ksm_process_profit available on CONFIG_PROCFS=n
  memcg: add per-memcg ksm_profit
  Documentation: add KSM statistic counters description in cgroup-v2.rst

 Documentation/admin-guide/cgroup-v2.rst | 17 ++++++
 include/linux/ksm.h                     |  1 +
 mm/ksm.c                                | 70 ++++++++++++++++++++++---
 mm/memcontrol.c                         |  5 ++
 4 files changed, 85 insertions(+), 8 deletions(-)

--
2.25.1
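
For readers who want to cross-check the new memory.stat values, the aggregation described in the Background section (walk the processes of a memcg and sum their per-process KSM counters) can be approximated from userspace. The following is only a minimal sketch, not code from this series: it assumes each process's counters are available as "name value" text lines, the format used by /proc/<pid>/ksm_stat and by memory.stat, and the helper names stat_lookup() and stat_sum() are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch series): look up counter
 * `key` in a buffer of "name value" lines. Returns 1 and stores the
 * value in *out when the key is found, 0 otherwise.
 */
static int stat_lookup(const char *text, const char *key, long long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Require a full-name match: the key followed by a space. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
            return sscanf(line + klen + 1, "%lld", out) == 1;

        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next line */
    }
    return 0;
}

/*
 * Hypothetical helper: sum one counter over the stat buffers of
 * several processes. Buffers that lack the key contribute nothing.
 */
static long long stat_sum(const char *const *texts, size_t n,
                          const char *key)
{
    long long total = 0;

    for (size_t i = 0; i < n; i++) {
        long long v;

        if (stat_lookup(texts[i], key, &v))
            total += v;
    }
    return total;
}
```

A caller would read /proc/<pid>/ksm_stat for every PID listed in the cgroup's cgroup.procs file, pass the buffers to stat_sum(), and compare the result with the matching line in memory.stat. Note that the per-process file names the profit counter ksm_process_profit, while the memory.stat output shown above uses ksm_profit. Because processes can fork or exit between reads, such a userspace sum is only a rough cross-check.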