
Format-specific compression with OpenZL

By Jake Edge
January 14, 2026

OSS Japan

Lossless data compression is an important tool for reducing the storage requirements of the world's ever-growing data sets. Yann Collet developed the LZ4 algorithm and designed the Zstandard (or Zstd) algorithm; he came to the 2025 Open Source Summit Japan in Tokyo to talk about where data compression goes from here. It turns out that we have reached a point where general-purpose algorithms are only going to provide limited improvement; significant increases in compression, while keeping computation costs within reason for data-center use, will require turning to format-specific techniques.

[Yann Collet]

Zstandard was introduced ten years ago and "it offered really much better performance tradeoffs than what existed before", Collet began. The alternatives were zlib, which was a "very good middle ground for decent speed and decent compression ratio", but was not fast enough, and LZ4, which provided much better compression speed but did not compress the data enough. Zstandard quickly supplanted the others because it was fundamentally better for size and speed. In the years since, Zstandard has improved, especially in its decompression speed, but those advances are still fairly modest. "We are reaching the limits of that technology."

In looking at what can be done to improve things, there are other problems beyond just the diminishing returns. The Zstandard format is limiting; with a new format, gains of 2-3% for compression ratio and 10-20% for speed are possible. "Is it worth it?", he asked. It is not really about the time needed to develop the new format, but that there is a huge ecosystem of Zstandard users that would need to change, which is extremely costly. He does not think there would be a serious shift to a new format unless it offered overwhelming advantages. "If we introduce a new compressor, it has to be vastly better."

There are other options, such as copy-based algorithms (e.g. LZ78), which copy repeated data from the compression dictionary to reconstruct the original; they can meet the needs for data-center compression, but they converge toward the same limits as Zstandard. That convergence was surprising, Collet said, because the techniques are quite different, but it stems from the fact that all of them make no assumptions about the data and simply treat it as a stream of undifferentiated bytes. There are high-compression algorithms that can achieve better results (e.g. PPM) but they run too slowly for data-center applications.

Format specific

Compressors that are only concerned with a specific format can do much better. For a trivial example, a simple array of consecutive integer values cannot be compressed by algorithms like LZ because there are no repetitions. A simple delta transformation turns that into something that can be heavily compressed, however. "If we know what we are compressing, it's not just a bunch of bytes, [...] it opens more options and, because we have more options, we should be able to compress better."
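That delta idea can be sketched in a few lines of Python (illustrative only, not OpenZL code; zlib stands in here for a generic LZ-style compressor):

```python
import struct
import zlib

# 10,000 consecutive 32-bit integers, packed as a raw byte array.
values = list(range(100_000, 110_000))
raw = struct.pack("<10000I", *values)

# Delta transform: store each value's difference from its predecessor.
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
delta_bytes = struct.pack("<10000i", *deltas)

# The raw array offers few exact repetitions for an LZ-style matcher,
# while the delta stream is almost entirely the value 1, repeated.
print(len(zlib.compress(raw, 9)), len(zlib.compress(delta_bytes, 9)))
```

The delta stream compresses to a small fraction of what the raw array does, even though both contain exactly the same information.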

A more realistic example is a compressor for the Smithsonian Astrophysical Observatory star catalog format, known as "SAO". It is part of the Silesia compression corpus, which consists of data sets that are used to compare compression algorithms. "It's very well defined", with a header followed by an array of 28-byte structures with fixed fields and types.

Turning the array of structures into a structure of arrays is a "trivial transformation"; each array is homogeneous and it can be analyzed separately. For example, the first two fields in the structure are 64-bit X and Y positions. The X values are mostly sorted, so a delta compression gives good results; the Y values are bounded and have a limited number of values compared to the range, so a transpose transformation can focus on compressing the high (largely unchanging) bytes, while other techniques can be applied to the subset of all the possible values for the low bytes. Other fields have properties that can be exploited as well.
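A rough sketch of those two transformations, with invented values standing in for the real SAO fields (this is not OpenZL's actual pipeline, just the idea):

```python
import struct
import zlib

# Invented stand-ins for the talk's two fields: 64-bit X (mostly sorted)
# and 64-bit Y (bounded, so its high bytes barely change).
records = [(1_000 + 3 * i, 500_000 + (i * 37) % 1_000) for i in range(5_000)]

# Array-of-structures: X and Y interleaved, as in the file on disk.
aos = b"".join(struct.pack("<QQ", x, y) for x, y in records)

# Structure-of-arrays: each field becomes its own homogeneous stream.
xs = [x for x, _ in records]
ys = [y for _, y in records]

# X: a delta transform exploits the mostly-sorted order.
deltas = [xs[0]] + [b - a for a, b in zip(xs, xs[1:])]
x_stream = struct.pack(f"<{len(deltas)}q", *deltas)

# Y: a byte transpose groups the largely-unchanging high bytes together.
y_packed = struct.pack(f"<{len(ys)}Q", *ys)
y_stream = b"".join(y_packed[i::8] for i in range(8))

soa = x_stream + y_stream
print(len(zlib.compress(aos, 9)), len(zlib.compress(soa, 9)))
```

Even with a generic back-end compressor, the transformed streams come out markedly smaller than the interleaved original.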

He compared the results of a few different compressors on the SAO file. Zstandard using its default (i.e. zstd -3) reduced the 7.2MB file to 5.5MB for a 1.31 compression factor, which is not great. The data is numeric, which Zstandard is not particularly good at compressing, and the SAO file is "packed with information", lacking zeroes and repeating sequences, so "it is difficult to compress". But the speed of Zstandard is good, Collet said, compressing at 100MB per second and decompressing at 750MB/s; "if you want to deploy something in a data center, you want this kind of speed".

He compared the "best of the best" widely available compression (lzma -9), which got much better compression (4.4MB or 1.64 compression factor), but the speed was not adequate for deployment (2.9MB/s compression, 45MB/s decompression). For another data point, he used cmix, which is an experimental compressor by Byron Knoll; you would not deploy it, he said, but "it's recognized as the best compressor out there". It reduced the SAO file to 3.7MB, which is almost a factor of two, but compressing and decompressing can only be done at 0.001MB/s.

Those results set the goals for the SAO-specific compressor: a factor of around two and speed like that of Zstandard. It achieves those goals easily, with a compressed size of 3.5MB (2.06 compression factor) and speeds faster than those of Zstandard (215MB/s compression, 800MB/s decompression). "Here we have enough gains to justify deploying something new in our data centers; this is the next step we were looking for." It turns out that knowing anything about the data gives a major advantage in compression; it is "an insane advantage, a way too large advantage to ignore".
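The compression factors he quoted are simply the original size divided by the compressed size; checking them against the numbers above:

```python
# Sizes from the talk, in MB; factor = original size / compressed size.
original = 7.2
for name, compressed in [("zstd -3", 5.5), ("lzma -9", 4.4),
                         ("cmix", 3.7), ("SAO-specific", 3.5)]:
    print(f"{name}: {original / compressed:.2f}")
# prints factors of 1.31, 1.64, 1.95, and 2.06
```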

Drawbacks

There are some problems in switching to format-specific compression, starting with the need to design algorithms for the formats. It will require engineers, hopefully ones with data-compression experience, to spend time understanding the format and devising an algorithm for it. That typically takes around 18 months, he said, "and you don't know in advance what you will get"; it is not just time and money, but uncertainty as well.

Once a good algorithm has been found, there will be a need to optimize it and to safeguard it against attacks. "Every codec [compressor/decompressor] is an injection point." Since there are lots of formats, and a need to be cost-effective in developing these compressors, developers may rely on only handling "safe" data instead of spending the effort on fuzzing and other hardening techniques. After a while, however, the codec may slowly start being used on less-safe data, resulting in vulnerabilities and attacks.

Once a codec is ready for deployment, there are still hurdles to overcome. Decompressors must be deployed everywhere the data may need to be accessed, which is not necessarily as easy as it sounds. That may include thousands (or hundreds of thousands) of servers all over the world, clients of various sorts, and so on; it is not uncommon that it takes longer to deploy a new compression algorithm than it did to develop it, Collet said.

There is also a large maintenance cost associated with format-specific compression. In addition, if the format needs to change, the compressor will also need to, and all of the deployment woes arise again. The original developers may well have moved on to other things, so finding people to work on it may be hard and take time. This becomes a "silent velocity obstacle": even when there would be large benefits to changing the format, no one wants to consider it because doing so is so daunting.

Enter OpenZL

So there is a tension between the promise of format-specific compression and the problems that can come from using it. But the truth is that those problems already exist, Collet said, because in every large organization there are already groups using these compression techniques; "the gains are so huge" that they get adopted piecemeal. "OpenZL is our answer to this tension; we believe that this solution solves all the problems that were just mentioned."

OpenZL has a core library and tools that allow creating specialized compressors. He likened it to the OpenGL graphics API, which "is not a 3D app but is a set of primitives to do a 3D app"; similarly, the OpenZL library gives users primitives to build their own compressor. The idea is to define compressors as graphs of pre-validated codecs, so that these different pieces can be combined in a myriad of different ways to produce compressors—"pretty much like Lego".

Using those codecs will allow creating new compressors in a matter of days, instead of months. The graphs provide an enormous search space, by human standards, but that space is not particularly large for computers, so it can be systematically searched. "We can provide tools that will do this work of finding the best arrangement of codecs and will give you an answer in minutes." That is a game-changer, he said; users can know quickly whether it even makes sense to pursue a format-specific compressor.

Assuming that it does make sense, the "deployment bottleneck" will soon rear its head. OpenZL avoids that by having a unified decompression engine that can handle any graph, so there is only one program that needs to be deployed. Updates and changes to the compressor are simply new configurations; transitions can be handled by supporting multiple graphs for a format. In addition, graphs can even be changed dynamically during compression if desired. The maintenance headaches are reduced, as well, since there is only a single code base that needs attention for bug fixes, performance improvements, and security upgrades.

It is natural to think of these graphs as being static, but that is not the reality. These compressors have a selector that chooses a graph by analyzing the data, so the graph for a format can change based on the input. The intent is to maintain performance, he said, but, more importantly, to handle exceptions. If an integer array is expected, but text is found, using a numeric compressor "is going to end badly"; that should be detected and a switch made to Zstandard, which is the fallback codec.
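The selector idea might look something like the following conceptual sketch; this is not the OpenZL API, and both the heuristic and the use of zlib as a stand-in for the Zstandard fallback are invented for illustration:

```python
import struct
import zlib

def looks_like_int64_array(data: bytes) -> bool:
    """Cheap heuristic: 8-byte-aligned length and mostly-zero high bytes."""
    if len(data) == 0 or len(data) % 8 != 0:
        return False
    high = data[7::8]  # high byte of each little-endian 64-bit word
    return high.count(0) / len(high) > 0.9

def compress(data: bytes) -> bytes:
    # Conceptual selector: route integer-like input through a delta
    # pipeline; anything else falls back to a generic codec, so text
    # arriving where an integer array was expected does not "end badly".
    if looks_like_int64_array(data):
        n = len(data) // 8
        vals = struct.unpack(f"<{n}q", data)
        deltas = (vals[0],) + tuple(b - a for a, b in zip(vals, vals[1:]))
        return b"D" + zlib.compress(struct.pack(f"<{n}q", *deltas))
    return b"G" + zlib.compress(data)  # stand-in for the Zstandard fallback
```

The one-byte tag records which graph was taken, so a single decompressor can undo either path.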

The first step to generating an OpenZL compressor is describing the data format. There are already around a dozen formats supported by OpenZL and dozens more will be added over the next few months, he said. Those will only cover common formats, however, so others will need to be described, either by providing a parser function or by using the Simple Data Description Language (SDDL) compiler.

SDDL can describe straightforward formats easily; it can also handle more complex formats, "but at some point, it is no longer the right tool". If creating the SDDL becomes too difficult, the work can be outsourced to an LLM "and it actually works", he said. There is one prompt that teaches the LLM about the SDDL syntax and then it can be asked to generate the SDDL. "If it's a good LLM, it should work well; like every LLM, you should read it." It is approaching the point where no programming at all will be needed to do this, Collet said.

OpenZL has tools that will use the description of the data and some sample files to create multiple compressors in a few minutes. Those different compressors allow users to choose the tradeoffs that matter to them: faster speed or more compression. In order to compress a file using one of them, the description of it, called a serialized compressor, is specified along with the file to compress. Decompression does not need to specify the compressor because the description is stored in the compressed data.

Any of the steps can be done manually, which might be somewhat painful, but means that everything about the compressor can be examined. "We can observe it, we can change it, we can see if we can find something better". That is important for debugging and research into compression techniques.

He showed some graphs comparing OpenZL compression to existing tools, but noted that "it's not a fair fight". The graphs show OpenZL doing much better than the competition. That's the whole point of OpenZL, he said, "if you know something about your data, why not use it to get better performance?"

OpenZL is already deployed widely at his employer, Meta. One of the main workloads at Meta is LLMs, so there is a lot of data to handle. The Meta system is set up to constantly monitor the data being generated, periodically retrain the compressors based on that, and then deploy the resulting compressed files immediately—the decompressor can always handle the result. He noted that compression is not only about saving storage, it is also about transmission time savings for moving data around—to and from GPUs, for example. That directly translates to higher compute utilization.

OpenZL is open source and available on GitHub (under the three-clause BSD license). The quick start instructions are straightforward, Collet said; following those steps will introduce all of the new concepts and tools. "It's not Zstandard++, this thing is different", so there are more steps and users need to invest some time to come up to speed. If they do, they will get better compression and more speed, however; "the difference is stark".

It has not yet reached a 1.0 release, because the OpenZL developers believe the final wire protocol needs to be built with the community. Over the next few years, the idea is to engage with the community to ensure that all of the different use cases are covered. In addition, there is work on getting OpenZL acceleration working directly in various types of hardware: CPUs, GPUs, and ASICs. That will take some time, "but we expect to see the result of that before the end of the decade", he concluded.

Interested readers may wish to view the YouTube video of the talk or look at Collet's slides.

[ I would like to thank the Linux Foundation, LWN's travel sponsor, for assistance with traveling to Tokyo for Open Source Summit Japan. ]

Index entries for this article
Conference: Open Source Summit Japan/2025



Prior similar art (?) - ZPAQ

Posted Jan 14, 2026 19:17 UTC (Wed) by Hobart (subscriber, #59974) [Link]

Matt Mahoney's ZPAQ (2009) is a similar lossless-compression implementation language that this reminded me of.

https://en.wikipedia.org/wiki/ZPAQ

Collet is GOAT in compression algorithms

Posted Jan 14, 2026 21:15 UTC (Wed) by NHO (subscriber, #104320) [Link] (1 responses)

Collet's blog about compression and his development of Zstandard is fantastic and is recommended reading, I feel. https://fastcompression.blogspot.com/

I would love to read a similar series of articles on OpenZL.

Collet is GOAT in compression algorithms

Posted Jan 15, 2026 5:52 UTC (Thu) by wtarreau (subscriber, #51152) [Link]

Seconded! I had been following his XXH, LZ4, FSE and Zstd development years ago on his blog, as well as on encode.su (.ru by then), and his approach has always been very pragmatic, never strongly tied to his code, only reusing proven useful pieces and making forward progress in directions that deserved being explored. I, too, encourage reading his blog for anyone interested in data compression theory and the cache hierarchy's impact on data access times.

Integration into file formats.

Posted Jan 14, 2026 22:21 UTC (Wed) by himi (subscriber, #340) [Link] (9 responses)

Given the concerns about applying a format-specific compression algorithm to generalised data (both in terms of performance and safety), perhaps it would make sense to apply this kind of approach to the file formats themselves? i.e. integrate data compression into the file format itself rather than handing it off to a generalised compression tool. Obviously that would mean designing new file formats, or new versions of existing ones, as well as tooling transition from old to new formats (where that makes sense), but the payoffs could be pretty significant - there's a *lot* of big data sets being shuffled around the world these days, better compression of that data would save on both transfer time/network utilisation and storage requirements; better compression/decompression performance would be a nice cherry on top.

This seems like it would be entirely compatible with the OpenZL approach, though I think you'd need additional tooling to support this kind of use case. You'd also want to make sure there was lots of information about how to design file formats to suit this model, particularly the trade-offs between different data layouts; probably also consideration of archival versus live data formats (with archival being designed for maximum compression efficiency, versus the live format optimising for whatever IO patterns your active use case requires), and streaming versus random-access, and probably a bunch of other considerations I haven't thought of . . . In fact, the world in general could benefit quite a bit from having a readily available knowledge-base about designing good file formats, particularly if that was supported by high quality tooling and libraries.

Of course you'd still need to support the generalised use cases, and the current OpenZL model of special-case with fallback to general also makes lots of sense (there's a lot of uncompressed data already out there, after all), but building good support for compression into the file formats themselves seems like a reasonable next step, and supporting the development of better file formats in general would be a pretty good end goal.

Integration into file formats.

Posted Jan 14, 2026 22:45 UTC (Wed) by jepsis (subscriber, #130218) [Link] (3 responses)

OpenZL is not a file format. It is a universal and self-describing compression layer and does not need a format-specific compression format.
File format design is still important. OpenZL works on top of existing formats and removes the need for custom compression codecs, while still using the data structure.

Integration into file formats.

Posted Jan 15, 2026 2:19 UTC (Thu) by himi (subscriber, #340) [Link]

Yes, I was suggesting that the lessons learned from creating data-format-specific compression logic could feed into designing file formats that incorporate that logic from the start, with the various OpenZL components being used in the actual implementation of the tooling for the new file formats. So rather than taking an existing '.foo' file, compressing it with a foo-specific profile, and storing the compressed stream in a '.foo.zl' file, you'd incorporate the foo-specific compression logic into the libfoo library (with your implementation making use of OpenZL components), and create a new '.fooz' file format that directly integrated the compressed stream(s) of data. After all, since libfoo is obviously specialised in handling this particular type of data, it seems like a good place to put specialised knowledge about how best to handle compressing that data - at least, in a world where there's tooling which can make it relatively easy to do that.

You can do something similar as it stands with existing compression libraries, but it's a lot of work for not much gain over using a general tool for whole-file compression. What the OpenZL project brings is a body of knowledge about data compression in general that can be used to inform the way that you set up your data streams to allow the best possible results, and a bunch of code that makes it easy to create a highly specialised compression pipeline - if that gets you something two thirds the size of the old '.foo.gz' files that can be compressed and decompressed in half the time it may well be worth the effort.

The body of knowledge could also feed more broadly into file format design choices - if laying your data out one way versus another costs you (say) 10% in terms of zstd compressed file size, that's kind of useful to know even if you're not going to try and make a super-specialised compression tool. As far as I know that sort of knowledge base doesn't exist at present.

Integration into file formats.

Posted Jan 15, 2026 2:48 UTC (Thu) by jepsis (subscriber, #130218) [Link] (1 responses)

Automatic decompression for such a file format is easy. Compression is the hard part. To write efficient representation you need clear intent i.e. how the data is expected to be used (streaming, random access, read-heavy, write-heavy), what the lifecycle looks like (archival or live data or if recompression is expected), and how the data is structured internally (schema, value distributions, chunking and ordering). Without this information any attempt to choose compression automatically is mostly guesswork and likely ends up with suboptimal result.

Integration into file formats.

Posted Jan 15, 2026 14:46 UTC (Thu) by willy (subscriber, #9762) [Link]

The two of you may be talking past each other a little. Whether building compression into the file format is a good idea depends on whether this is archival data or working set. There's value in "today's data is stored in foo, last year's data is stored in foo.gz". But sometimes we're dealing with data that always needs to be compressed, and then it's worth building it into the file format.

Integration into file formats.

Posted Jan 15, 2026 6:55 UTC (Thu) by martinfick (subscriber, #4455) [Link] (4 responses)

As enticing as this may sound, this has a major drawback that it will not work very well with object stores. If the file/object data is already compressed when it is inserted, then it makes it much harder to perform any sort of cross file or version deltafication, such as what git can do. With many compression formats, if a single byte is altered in the raw data, it may drastically change the compressed output. When this happens, deltafication across file versions becomes almost impossible, or not very useful. It is much better to perform deltafication on the raw data first, and to then compress the deltas.

Another problem you will encounter, perhaps even worse, is with content addressable object stores, here once again git comes to mind. Inserting already compressed data makes it almost impossible to improve upon the original compression, and thus freezes/ossifies the compression since any hashes of the content would be performed on the compressed content instead of the raw data. This leaves the storage at the whim of the original compression algorithm and speed settings without ever being able to change things if better algorithms are developed. If the compression were to be changed, the hash of the compressed data would change, and the object store would not see it as the same object even though the raw data would be the same! Instead, if the compression is left up to the storage, the storage will be able to take advantage of new compression techniques as they are developed, or even just the availability of more CPU cycles.

Integration into file formats.

Posted Jan 15, 2026 9:08 UTC (Thu) by himi (subscriber, #340) [Link] (2 responses)

That is indeed an issue . . . and one that probably doesn't have any resolution - if you want to have smarts in the storage layer (underneath the filesystem abstraction), you really need to make sure those smarts can see the raw data rather than any kind of processed format.

But there are definitely scenarios where that approach doesn't work for some reason. The use case that I was thinking of is one that we deal with where I work: we're continuously pulling down large amounts of satellite data (we run a regional hub for the ESA's Copernicus program) - basically a big collection of files, each one unique and unchanging; new data means new files, old files never get touched; if the underlying raw data gets reprocessed (e.g. reprocessing data from older satellites to be consistent with the processing done with current satellites, which happens occasionally) that results in a set of new files *alongside* the old ones. By its very nature the raw data pretty much *has* to have little to no commonality between files - it's sensor data, essentially long strings of numbers with a sizeable random noise component alongside the signal; if your storage layer can do any kind of meaningful deduplication or similar something's probably gone seriously wrong with the satellites. The only thing that's worth doing is compression - improved compression in this use case, both at rest and in flight, would be a major win.

That's what immediately came to my mind, but there's a whole lot of other scientific data sets that will have similar properties, and ideally we'd hang onto those raw data sets essentially indefinitely - there's always potential for extracting new information from data that's already been collected. One nice example is research extracting historical climate data from Royal Navy log books going back more than two hundred years; there's also lots of astronomical research beng done that's mostly reprocessing old raw data, and programs like JWST build that into their foundations - every bit of observational data from JWST will eventually be available for anyone to access and use for their own research.

Which all kind of agrees with your basic argument, I guess - the raw data is critical, you want to process it as little and as late as possible, at the point where you can gain as much value out of it as you can . . . but that means different things for different types of data.

All that said, one of the standard complaints from the data storage team where I work is researchers who keep ten copies of identical data because they can't keep track of where they put things (and then complain about hitting their quota . . . ) - magic in the storage layer to handle that kind of deduplication would definitely be nice.

Integration into file formats.

Posted Jan 15, 2026 11:50 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

> magic in the storage layer to handle that kind of deduplication would definitely be nice.

Isn't this inherent in one of the file-systems? ZFS springs to mind?

Some filesystems I believe keep a hash of disk blocks, and if two blocks have the same contents, the overlying files will be changed to point to the same block. Within this, they can either "check on write" and so dedup on the fly, or they do a post-hoc dedupe pass. Either way, I'm sure this functionality is available in at least one regular linux file system.

Cheers,
Wol

Integration into file formats.

Posted Jan 15, 2026 14:51 UTC (Thu) by willy (subscriber, #9762) [Link]

This is the kind of thing that sounds seductively attractive and then you actually try to do it and the metadata needed to keep track of everything blows up exponentially (literally, not in the modern meaning of "a lot"). And fragmentation increases massively, which turns out to matter even on NVMe drives.

There's specialist cases where this makes sense, but it's no free meal. Or maybe it is a free meal, in the sense that the drinks now cost 50% more.

Integration into file formats.

Posted Jan 17, 2026 22:45 UTC (Sat) by cesarb (subscriber, #6266) [Link]

> If the file/object data is already compressed when it is inserted, then it makes it much harder to perform any sort of cross file or version deltafication, such as what git can do. [...] Another problem you will encounter, perhaps even worse, is with content addressable object stores, here once again git comes to mind. Inserting already compressed data makes it almost impossible to improve upon the original compression, and thus freezes/osifies the compression since any hashes of the content would be performed on the compressed content instead of the raw data.

Funny you mention git. Very early in the git history, it worked exactly like that: the object identifier was the hash of the *compressed* data. See https://github.com/git/git/commit/d98b46f8d9a3daf965a39f8... ("Do SHA1 hash _before_ compression.") and https://github.com/git/git/commit/f18ca731663191477613645... ("The recent hash/compression switch-over missed the blob creation."), where it was changed to the current behavior of using the hash of the *uncompressed* data.

Also improves wav compression.

Posted Jan 15, 2026 22:37 UTC (Thu) by gmatht (subscriber, #58961) [Link]

About 30 years back I found that I could zip WAV files better by doing exactly what the OP suggests. Replacing each value in the WAV with the difference from the previous value produced a WAV file of the exact same size that zipped to a smaller size.

Of course, a more audio-specific format like FLAC should give even better compression again.

Great !

Posted Jan 17, 2026 9:03 UTC (Sat) by matp75 (subscriber, #45699) [Link]

zstd is really excellent at compression and decompression in general.
It is great that other alternatives are being explored for more format specific data (as of course we already have things specialized for video/audio even if it is usually not lossless)

Consider the resources required for decompression

Posted Jan 18, 2026 16:55 UTC (Sun) by jreiser (subscriber, #11027) [Link] (4 responses)

In the 1970's a friend worked for a US company that designed and built large power transformers and related equipment, then shipped it overseas to Africa for use in the electrical grid of developing countries. I learned that the most important design factor was not volts, amperes, electrical phases, materials, or operating efficiency. Rather, the most important constraint was the width of the railroad tunnels between the receiving seaport and the installation. If the equipment could not be transported to the end destination, then nothing else mattered.

Analogously for lossless data compression, the most important design constraint is the resources required for decompression. If the receiving environment does not have enough RAM for working storage, or enough ROM to store the decompression program, then the compression ratio does not matter. Perhaps a datacenter has vastly more than enough RAM and ROM, but many other environments do not. A child's toy, household appliance, home internet router, electric bicycle, industrial automaton, etc., often operate in resource-poor environments. Think of a 16-bit microcontroller with a total of 128kB of RAM and 256kB of ROM. Even a low-end cellphone with 3MB of RAM and 1MB of ROM can be a tight fit.

Because the output of OpenZL must name (or include) the schema, there is an opportunity to approach this situation by labeling each compressed output. "This data was compressed by version 1.2.3 of implementation FOO of Standard BAR. The reference decompressor requires 10MB of working storage and 38kB of code." For any compression system, it would be a great improvement if the constraints of decompression could be expressed as explicit parameters to the compressor. "For decompression, my embedded device allows 20kB of RAM and 10kB of ROM. Please meet these constraints, or tell me how close you can come."

Consider the resources required for decompression

Posted Jan 19, 2026 1:26 UTC (Mon) by excors (subscriber, #95769) [Link]

> If the receiving environment does not have enough RAM for working storage, or enough ROM to store the decompression program, then the compression ratio does not matter. ... Think of a 16-bit microcontroller with a total of 128kB of RAM and 256kB of ROM.

In that case (or even smaller), I've found Zlib is surprisingly good. The uzlib implementation takes about 2KB of ROM (on Cortex-M4), and about 1KB RAM + window size (max 32KB but you can probably reduce it to 16KB or 8KB without much impact on compression ratio). If you're storing compressed data in flash (e.g. when downloading a firmware update) then you want to minimise `code_size + uncompressed_size * compression_ratio`, and I suspect it's going to be hard to beat uzlib with a more sophisticated algorithm until you're getting up in the hundreds of KBs of data.

It would be nice if there were more algorithms (with size-optimised implementations) competing in that space, and benchmarks showing what's the best tradeoff in different ranges. It sounds like that's not what OpenZL is interested in though, since it's currently designed with a universal decompressor that will presumably be huge.

Consider the resources required for decompression

Posted Jan 23, 2026 6:44 UTC (Fri) by kmeyer (subscriber, #50720) [Link] (2 responses)

I don't think OpenZL is trying to be a solution for this increasingly irrelevant class of microcontroller. You can get relatively sophisticated ARM SOCs for cheap these days.

Consider the resources required for decompression

Posted Jan 23, 2026 9:12 UTC (Fri) by farnz (subscriber, #17727) [Link] (1 responses)

While the CPU core has become much better since then (typically a Cortex-M0 or a RISC-V thing, CPU core clock speeds below 40 MHz are now virtually non-existent), the cheap end is still very constrained for RAM and Flash.

Once you're looking at the under $0.25 per MCU range, 16 KiB flash is still common, but 2 KiB RAM isn't that uncommon - and these MCUs are remaining relevant because they're incredibly cheap in high volumes, use very little power and are still reasonably capable.

Consider the resources required for decompression

Posted Jan 23, 2026 11:54 UTC (Fri) by excors (subscriber, #95769) [Link]

Yes, I wouldn't personally care about 16-bit processors nowadays; but there's plenty of low-cost battery-powered IoT devices that can't afford megabytes of always-on DRAM, and I don't imagine energy efficiency will improve enough to change that any time soon. They'll probably have a (relatively) fast 32-bit core that can do plenty of sophisticated computation and data processing, but very little storage, so compression algorithms are both possible and useful.

(E.g. the "mainstream" STM32G0 series has 64MHz Cortex-M0+, and ranges from 8KB RAM / 32KB flash ($0.60 direct from ST) up to 144KB RAM / 512KB flash ($1.75).)

I wouldn't be surprised if this was becoming _more_ relevant over time, since many people see value in monitoring and controlling every device over the network, so they need more microcontrollers.


Copyright © 2026, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds