This article has a lot of good information, much of which I was completely unfamiliar with until
now. It seldom, however, talks about non-x86 architectures. I've been doing a lot of PowerPC
work over the past few years, and the cache implementation is somewhat different. (It also differs
between PowerPC families.) I've been using systems that have L1i, L1d, L2i and L2d caches.
That's right, the L2 is still divided between i and d. If you want to do self-modifying code, you
have to explicitly invalidate the instruction cache over that region, or you have a good chance of
executing the stale instructions if there were previously instructions in those locations.
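
As a rough sketch of that invalidation step, assuming a GCC or Clang toolchain: the helper name publish_code below is hypothetical, but __builtin___clear_cache is a real compiler builtin that performs whatever flush and invalidate sequence the target needs for the given range (on x86 it typically does nothing).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy freshly generated instructions into place,
 * then make the instruction side of the cache hierarchy see them. */
static void publish_code(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    /* These stores travel through the data cache. */
    memcpy(dst, src, len);

    /* GCC/Clang builtin covering [begin, end). On targets with split
     * instruction and data caches this is where the stale instruction
     * cache contents for the region get discarded. */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
}
```

Only after publish_code returns is it safe to jump into the region. The region must also be mapped executable, which this sketch does not handle.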
The discussion of how the cache tags/lines/sets are managed is pretty close, but the 32-bit PPCs
in the 7450 line, such as the 7448, have three address layers: virtual, effective, and physical.
The 32-bit virtual address maps to an effective address that is 54 bits (I think) wide, which is then
mapped to a 36-bit physical address, which is then passed to the system controller. The cache is
associated with the effective address, so if two tasks are sharing the same data at different
virtual addresses that map to the same effective address, and that address is in a cacheable
region, you don't end up with two copies of the data in the cache. There are a lot of other
variations on other architectures. The PPC7450 series also provides very lightweight advisory
instructions that give hints to the cache controller to pre-fetch data before the instructions that
need that data are reached. These instructions get serviced out of order and (if I read the
documentation correctly) do not occupy a slot in the pipeline.
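
A minimal sketch of using such a hint from C, assuming GCC or Clang: __builtin_prefetch is a real builtin that compilers typically lower to the target's cache-touch instruction where one exists. The function name and the PREFETCH_AHEAD distance are hypothetical and would need tuning on real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to request. */
enum { PREFETCH_AHEAD = 16 };

/* Sum an array while hinting that data a few iterations ahead will be needed. */
static long sum_with_prefetch(const long *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* Arguments: address, rw = 0 (read), locality = 3 (keep cached).
             * This is advisory only; the result is the same without it. */
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

Because the hint is advisory, removing it never changes the result, only (possibly) the timing.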
Quite a few other things covered don't apply to non-x86 systems. This is not, in itself, a
failure on the part of the author, though he ought to make it explicit that he's only covering the
Intel/AMD/Cyrix/VIA world, not the PPC, SPARC, or Alpha.
The only thing I was disappointed in is that he appears to have skipped write-through vs. write-back
cache strategies and cache locking. Some cache systems give you a choice of how to
handle writes (a choice made by the OS, not the application), and some let the system lock
a range of addresses into the i or d cache (L1 and/or L2). At least some of the AMCC PPC440 models
(I think all) allow some or all of the L2 cache to be used as static RAM instead of as cache.
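
To make the write-through vs. write-back distinction concrete, here is a toy model, not a description of any real controller: a single cache line that only counts how many writes reach memory. All names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one cache line, counting writes that reach memory. */
enum write_policy { WRITE_THROUGH, WRITE_BACK };

struct toy_line {
    enum write_policy policy;
    bool dirty;                  /* only meaningful for WRITE_BACK */
    unsigned long memory_writes; /* writes that reached memory */
};

static void toy_store(struct toy_line *line)
{
    assert(line != NULL);
    if (line->policy == WRITE_THROUGH)
        line->memory_writes++;   /* every store is forwarded to memory */
    else
        line->dirty = true;      /* memory is updated later, on eviction */
}

static void toy_evict(struct toy_line *line)
{
    assert(line != NULL);
    if (line->dirty) {
        line->memory_writes++;   /* one write-back covers all earlier stores */
        line->dirty = false;
    }
}

/* Memory writes caused by `stores` stores to one line followed by an eviction. */
static unsigned long toy_memory_writes(enum write_policy policy, unsigned stores)
{
    struct toy_line line = { policy, false, 0 };
    for (unsigned i = 0; i < stores; i++)
        toy_store(&line);
    toy_evict(&line);
    return line.memory_writes;
}
```

In this model, 100 stores to the same line cost 100 memory writes under write-through and a single one under write-back. The price of write-back is that memory is stale until the eviction happens.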
Overall, it's a good article. I will be emailing my typographical comments, as requested.
Copyright © 2017, Eklektix, Inc.