Cache sizes
Posted Nov 5, 2024 8:06 UTC (Tue) by anton (subscriber, #25547)
In reply to: Cache sizes by himi
Parent article: Kernel optimization with BOLT
The L2 cache of many cores is dedicated to the core, too (e.g., on Intel's P-cores for over a decade and on AMD's Zen through Zen 5 cores).
The reason for keeping the L1 cache small is latency. If the cache grows, the miss rate decreases, but the latency increases. You can see the longer latency nicely in the comparison of L2 sizes and latencies in this article.
One reason is that the wires get longer, which increases signal propagation time.
You also want to use a virtually-indexed physically-tagged (VIPT) cache as the L1 cache, which allows the TLB access and the cache access to be performed in parallel, i.e., with low latency. But that means the size of a cache way can be at most the size of a page. The number of ways is limited (you typically don't see more than 16-way set-associative caches, and a lower number of ways is common in L1 caches), and the page size is 4KB on AMD64, which limits the L1 cache size to 64KB (32KB or 48KB is more common). Apple's Firestorm (M1 P-core) has larger caches (192KB I-cache, 128KB D-cache), but also 16KB pages, which allows a VIPT implementation with a 12-way (I-cache) or 8-way (D-cache) set-associative cache.
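The VIPT size ceiling is just arithmetic: maximum cache size = page size × associativity. A quick sketch checking the numbers above (the function name is mine, just for illustration):

```python
def max_vipt_size(page_size: int, ways: int) -> int:
    """Largest alias-free VIPT cache: each way can be at most one page."""
    return page_size * ways

KiB = 1024

# AMD64 with 4KB pages: even at 16 ways you top out at 64KB.
assert max_vipt_size(4 * KiB, 16) == 64 * KiB
assert max_vipt_size(4 * KiB, 8) == 32 * KiB    # a common L1 configuration
assert max_vipt_size(4 * KiB, 12) == 48 * KiB   # e.g., recent 48KB L1s

# Apple Firestorm with 16KB pages: much more headroom.
assert max_vipt_size(16 * KiB, 12) == 192 * KiB  # I-cache
assert max_vipt_size(16 * KiB, 8) == 128 * KiB   # D-cache
```

The larger base page is what lets Apple grow the L1 without giving up the parallel TLB/cache lookup.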