
The misc control group

By Jake Edge
May 18, 2021

Control groups (cgroups) are meant to limit access to a shared resource among processes in the system. One such resource is the values used to specify an encrypted-memory region for a virtual machine, such as the address-space identifiers (ASIDs) used by the AMD Secure Encrypted Virtualization (SEV) feature. Vipin Sharma set out to add a control group for these ASIDs back in September; based on the feedback, though, he expanded the idea into a controller to track and limit any countable resource. The patch set became the controller for the misc control group and has been merged for Linux 5.13.

The underlying idea is to allow administrators (or cloud orchestration systems) to enforce limits on the number of these IDs that can be consumed by the processes in a control group. In a cloud setting, those processes could correspond to virtual machines being run under KVM. The initial posting for ASIDs was met with a suggestion from Sean Christopherson to expand the reach of the controller to govern more types of encryption IDs beyond just those used by AMD SEV. Intel has an analogous Trust Domain Extensions (TDX) feature that uses key IDs, which are also a resource that may need limiting. The s390 architecture has its secure execution IDs (SEIDs), as well; those are far less scarce than the others, but could still benefit from a controller to limit the consumption of them.

All of that led Sharma to make the controller more generic so that it could be used by TDX IDs and SEIDs as well. By January, the "encryption ID controller" patch set had reached version 4, but maintainer Tejun Heo was concerned about enshrining hardware-specific control knobs in the control-groups subsystem:

I'm very reluctant to ack vendor specific interfaces for a few reasons but most importantly because they usually indicate abstraction and/or the underlying feature not being sufficiently developed and they tend to become baggages after a while.

He and Sharma talked past each other a bit in the discussion, but eventually Heo said that because the landscape for encryption IDs is still immature, he would prefer a different approach. Instead of tying the controller to encryption IDs, it would be for miscellaneous (misc) items that can be tracked by number up to a maximum for the control group (or system as a whole). "So, behavior-wise, not that different from the proposed code. Just made generic into a misc controller."

Sharma agreed with that plan and posted an RFC patch for a misc controller in mid-February. There have been three subsequent versions posted, but the form of the controller has stayed basically the same throughout. The patches also add SEV ASIDs and the related (but distinct) SEV Encrypted State (SEV-ES) IDs as two quantities to be controlled. In a kernel built with CONFIG_CGROUP_MISC on a suitably equipped AMD CPU, the root control group will have a misc.capacity file that shows the number of available IDs in each category:

    $ cat misc.capacity
    sev 50
    sev_es 10

More generally, a system that has two resources managed by the controller with the names "res_a" and "res_b" will display them in the files in the control-group hierarchy. The misc.capacity root file is read-only and reflects the amount of the resources for the whole system; two other files appear in the non-root control groups, which can be used to limit the resources and to monitor their use:

    $ cat misc.current
    res_a 3
    res_b 0
    
    $ cat misc.max
    res_a 10
    res_b 4

As might be guessed, misc.current reports on the current usage by the group, while misc.max reports the setting for the maximum allowed for the group. misc.max is a read-write file, unlike the other two, so setting the maximum can be done as follows:

    # echo res_a 1 > misc.max
    # echo res_b max > misc.max

The first sets the maximum for res_a to one, while the second sets res_b to the maximum allowed for the group (which could be less than the system maximum due to limits in one of its parent control groups).
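
An orchestration tool would typically make the same change from a program rather than a shell. What follows is a minimal sketch of that, assuming only the "name value" line format shown above; the helper names misc_set_max() and misc_read_value() are hypothetical, as is the convention of passing a negative limit to mean "max".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<resource> <limit>" to a misc.max file.
 * A negative limit is this sketch's own convention for writing "max".
 * Returns 0 on success, -1 on failure.
 */
static int misc_set_max(const char *path, const char *resource, long limit)
{
    FILE *f;
    int ok;

    assert(path && resource);
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (limit < 0)
        ok = fprintf(f, "%s max\n", resource) > 0;
    else
        ok = fprintf(f, "%s %ld\n", resource, limit) > 0;
    /* fclose() flushes the buffer, so a failed write surfaces here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Look up one resource in a file that uses the "name value" format of
 * misc.capacity, misc.current, and misc.max.
 * Stores the value in *out ("max" is reported as -1).
 * Returns 0 if the resource was found and parsed, -1 otherwise.
 */
static int misc_read_value(const char *path, const char *resource, long *out)
{
    char name[64], value[64];
    FILE *f;
    int found = -1;

    assert(path && resource && out);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fscanf(f, "%63s %63s", name, value) == 2) {
        if (strcmp(name, resource) != 0)
            continue;
        if (strcmp(value, "max") == 0) {
            *out = -1;
            found = 0;
        } else if (sscanf(value, "%ld", out) == 1) {
            found = 0;
        }
        break;
    }
    fclose(f);
    return found;
}
```

On a real system the path would point into the mounted cgroup hierarchy (commonly under /sys/fs/cgroup, though that location is an assumption here), and each resource would be set with its own write, as in the shell example.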

The patch adding the two types of SEV ASIDs shows the steps needed to add other resources, such as TDX IDs or SEIDs, to the controller. An entry gets added to the misc_res_type enum and a corresponding name is added to the misc_res_name array. Before the resource can be used, the initialization code must set the system-wide capacity using:

    int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity);

When one of the resources is needed or released, the charge and uncharge API is used:

    int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
                           unsigned long amount);
    void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg,
                          unsigned long amount);
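
To make the semantics of those calls more concrete, here is a small user-space model of capacity, charge, and uncharge. It is a sketch under stated assumptions, not the kernel implementation: the names model_cg, model_set_capacity(), model_try_charge(), and model_uncharge() are all hypothetical, there is no locking or atomics, and the rule it illustrates is simply that a charge must fit within the limit of the group and of every ancestor, as well as within the system-wide capacity.

```c
#include <assert.h>
#include <stddef.h>

#define LIMIT_MAX (~0UL)   /* stands in for "max": no limit at this level */

enum res_type { RES_A, RES_B, RES_TYPES };

/* Hypothetical stand-in for a control group; not the kernel's struct misc_cg. */
struct model_cg {
    struct model_cg *parent;          /* NULL for the root group */
    unsigned long max[RES_TYPES];     /* limit set on this group */
    unsigned long usage[RES_TYPES];   /* charged in this group and below */
};

static unsigned long capacity[RES_TYPES];   /* system-wide totals */

static void model_set_capacity(enum res_type type, unsigned long cap)
{
    capacity[type] = cap;
}

/* Give back 'amount' in cg and every ancestor up to the root. */
static void model_uncharge(enum res_type type, struct model_cg *cg,
                           unsigned long amount)
{
    for (; cg; cg = cg->parent) {
        assert(cg->usage[type] >= amount);
        cg->usage[type] -= amount;
    }
}

/*
 * Charge 'amount' to cg and every ancestor. If any level would exceed its
 * limit (or the system capacity), undo the partial charge and return -1.
 */
static int model_try_charge(enum res_type type, struct model_cg *cg,
                            unsigned long amount)
{
    struct model_cg *i, *j;

    for (i = cg; i; i = i->parent) {
        unsigned long limit = i->max[type];

        if (limit > capacity[type])
            limit = capacity[type];
        /* Written this way to avoid overflow in usage + amount. */
        if (amount > limit || i->usage[type] > limit - amount)
            goto undo;
        i->usage[type] += amount;
    }
    return 0;

undo:
    /* Roll back only the levels that were charged before the failure. */
    for (j = cg; j != i; j = j->parent)
        j->usage[type] -= amount;
    return -1;
}
```

In this model, a failed charge leaves every counter as it was, which is why the caller can treat a failure as "no ID available for this group" and retry later.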

One thing to note is that, in keeping with the ideas behind version 2 of control groups, migrating a process to a different control group does not change the accounting. The control group that contained the process when the resource was acquired will continue to be charged for it until it is freed. The caller needs to track the control group that was charged so that the uncharge can be done for the proper group. Having the charge follow the process as it migrates came up in review; Jacob Pan asked about adding charge migration because he was looking at using the misc controller to limit ASIDs used for I/O via DMA (IOASIDs). Heo was clear that adding charge migration to the misc controller was unlikely:

Please note that cgroup2 by and large don't really like or support charge migration or even migrations themselves. We tried that w/ memcg on cgroup1 and it turned out horrible. The expected usage model as [described] in the doc is using migration to seed a cgroup (or even better, use the new clone call to start in the target cgroup) and then stay there until exit. All existing controllers assume this usage model and I'm likely to nack deviation unless there are some super strong justifications.

As it turns out, there may be no real use cases for migrating processes after they have acquired IOASIDs, so Pan plans to use the misc controller, at least for now.
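
The practical consequence of the no-migration rule is that the holder of a resource has to remember which group paid for it. A minimal user-space sketch of that bookkeeping follows; the types model_cg and model_vm and the vm_*() helpers are hypothetical names invented for illustration, and the model is deliberately flat (no hierarchy) to keep the focus on the migration case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat model of a control group with one counted resource. */
struct model_cg {
    unsigned long usage;   /* IDs currently charged to this group */
    unsigned long max;     /* limit for this group */
};

/* Hypothetical resource holder, such as a virtual machine. */
struct model_vm {
    struct model_cg *current_cg;   /* group the process lives in now */
    struct model_cg *charged_cg;   /* group that paid for the ID, or NULL */
};

/* Charge one ID to the group the VM is in right now and remember it. */
static int vm_acquire_id(struct model_vm *vm)
{
    struct model_cg *cg = vm->current_cg;

    if (vm->charged_cg || cg->usage >= cg->max)
        return -1;
    cg->usage++;
    vm->charged_cg = cg;
    return 0;
}

/* Moving the process changes where it lives, not who is charged. */
static void vm_migrate(struct model_vm *vm, struct model_cg *to)
{
    vm->current_cg = to;
}

/* Uncharge the group recorded at acquire time, not the current one. */
static void vm_release_id(struct model_vm *vm)
{
    if (!vm->charged_cg)
        return;
    assert(vm->charged_cg->usage > 0);
    vm->charged_cg->usage--;
    vm->charged_cg = NULL;
}
```

If vm_release_id() used current_cg instead, a migrated VM would uncharge a group that was never charged and leave the original group's count permanently too high.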

In truth, the misc controller is "a bit of cop-out", as Heo put it. He does not believe that these hardware features are necessarily going to be around "forever" so he is loath to tie the control-groups subsystem to them for the long term:

My take is that the underlying hardware feature isn't mature enough to have reasonable abstraction built on top of them. Given time, maybe future iterations will get there or maybe it's a passing fad and people will mostly forget about these.

But, cop-out or no, the misc controller is now open for business. It would seem there are several other candidates for being added to it; others may well arise in the coming months. For simple resources that just need to be tracked and limited based on their count, the misc controller seems like it will do the job.




Posted May 20, 2021 8:37 UTC (Thu) by jengelh (subscriber, #33263) [Link] (1 responses)

>to track and limit any countable resource

Do computers even have uncountable resources at this point? And even if we do, it does not seem like that would be a showstopper, as one would then track and limit using ranges rather than counts, or something along that line.

Posted May 20, 2021 14:50 UTC (Thu) by zlynx (guest, #2285) [Link]

Memory and cache bandwidth use.

Yes you can find it through performance counters, but most people don't.

Posted May 5, 2022 15:02 UTC (Thu) by ahmet (subscriber, #117402) [Link]

Hi Jake, hope you write about cgroups v2 sometime. More users are migrating that way now.


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds