Recoding existing JPEGs
Posted Apr 9, 2024 6:54 UTC (Tue)
by epa (subscriber, #39769)
Parent article: Introducing Jpegli: A New JPEG Coding Library (Google Open Source Blog)
Posted Apr 9, 2024 7:58 UTC (Tue)
by smurf (subscriber, #17840)
[Link] (6 responses)
… well, except … you could train a GAN on a heap of before-and-after-bad-JPEG-compression image pairs, thereby encoding in the neural net the information that typically gets lost when JPEGging images. Thus you could recover something that at least looks like a plausible original file, and then re-compress that with jpegli.
Whether this is worth the effort is another question.
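A minimal sketch of generating such before/after training pairs, assuming Pillow and a hypothetical originals/ directory of clean source images (the GAN itself is left out):

    import io
    from pathlib import Path
    from PIL import Image

    def make_pair(path, quality=30):
        """Return (degraded, original) RGB images for one clean source file."""
        original = Image.open(path).convert("RGB")
        buf = io.BytesIO()
        original.save(buf, "JPEG", quality=quality)   # introduce the JPEG artifacts
        buf.seek(0)
        degraded = Image.open(buf).convert("RGB")
        return degraded, original

    pairs = [make_pair(p) for p in Path("originals").glob("*.png")]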
Posted Apr 9, 2024 10:41 UTC (Tue)
by chris_se (subscriber, #99706)
[Link] (2 responses)
Interesting, especially since standard JPEG has a fixed block size of 8x8, so you'd only need 64x3 (for color) inputs to the network, which is not even that expensive. (Though the question is whether you'd want to train different networks depending on the chroma subsampling used.) Plus you could even auto-generate the training data if you have raw images.
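For concreteness, a rough sketch of that per-block framing in NumPy, treating each 8x8 RGB block as a 192-value input vector (the array layout is an illustrative assumption, not anything from Jpegli):

    import numpy as np

    def blocks_as_vectors(img):
        """img: HxWx3 uint8 array with H and W multiples of 8."""
        h, w, _ = img.shape
        blocks = (img.reshape(h // 8, 8, w // 8, 8, 3)
                     .transpose(0, 2, 1, 3, 4)      # regroup samples per 8x8 block
                     .reshape(-1, 8 * 8 * 3))       # one row of 192 values per block
        return blocks.astype(np.float32) / 255.0

    vectors = blocks_as_vectors(np.zeros((64, 64, 3), dtype=np.uint8))
    print(vectors.shape)   # (64, 192): 64 blocks, 192 inputs each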
Posted Apr 9, 2024 11:27 UTC (Tue)
by smurf (subscriber, #17840)
[Link]
Well, no. Consider a blue sky that JPEG has deconstructed into monochromatic 8x8 blocks (and even if it didn't, it definitely lost the lower bits of color information). In order to recover the original image, i.e. a continuous 10-bit "hazy horizon to somewhat-deep blue" gradient, you need more context.
Posted Apr 9, 2024 11:35 UTC (Tue)
by excors (subscriber, #95769)
[Link]
(I believe JPEG XL has a deblocking filter as part of the standard decoder pipeline (as do many video codecs, and image codecs derived from them, such as WebP and AVIF), to reduce that problem. But Jpegli is designed for compatibility with existing JPEG decoders, so it can't do that, and its compression performance is significantly worse than JPEG XL's, even if you start with the original uncompressed images.)
Posted Apr 9, 2024 11:41 UTC (Tue)
by gmgod (guest, #143864)
[Link] (2 responses)
And before you object to that: if you don't need to look that closely, then most of the time you don't need an artifact remover at all. And if you do need one, non-AI approaches will prove more reliable and good enough (or better, actually; I've seen pretty neat ones iteratively reconstruct the underlying image, leveraging our knowledge of how JPEG works).
Posted Apr 9, 2024 12:24 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (1 response)
IME, the interesting use of AI is at the encoding stage, as part of the encoder heuristics for determining where to spend the bit budget.
All codecs decompose naturally (for the purposes of analysis, although not normally implementation) into three stages: a set of nominally lossless transforms (in practice, limited by quantization error in the number of bits used) that make the lossy part do better (e.g. RGB → YCrCb, a DCT, reconstruction of the picture via motion vectors from reference pictures and a residual); a lossy part where decisions are made to discard some data to reduce the output bit consumption (e.g. sub-sampling Cr and Cb components, quantizing DCT coefficients, quantizing motion vectors, quantizing a residual picture); and a lossless output compression stage (such as arithmetic coding).
The lossy part is the part that's interesting - because the lossless stages are tightly defined, there's not a lot of room to change them to reduce output data sizes - and the lossy part depends crucially on minimising the degree to which a human will notice the data that's been discarded. And the bit where AI is interesting is in quantifying how likely it is that a human will notice a given chunk of discarded data in the lossy phase; that allows the encoder as a whole to maximize the human-useful information content per output bit.
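As a toy illustration of that split for a single 8x8 block (the flat quantization table is made up, and perceptual_weight() is a placeholder for whatever visibility model, learned or hand-tuned, an encoder might use):

    import numpy as np
    from scipy.fft import dctn, idctn

    BASE_QUANT = np.full((8, 8), 16.0)    # illustrative flat quantization table

    def perceptual_weight(block):
        # Placeholder visibility model: pretend busy (high-variance) blocks
        # can hide more quantization error than flat ones.
        return 1.0 + block.std() / 64.0

    def encode_block(block):
        coeffs = dctn(block - 128.0, norm="ortho")      # nominally lossless transform
        step = BASE_QUANT * perceptual_weight(block)    # the lossy decision
        return np.round(coeffs / step).astype(np.int32), step

    def decode_block(quantized, step):
        return idctn(quantized * step, norm="ortho") + 128.0

    block = np.random.default_rng(0).integers(0, 256, (8, 8)).astype(np.float64)
    quantized, step = encode_block(block)
    print(np.abs(decode_block(quantized, step) - block).max())   # error from the lossy step

The lossless entropy-coding stage is omitted; everything interesting happens in how step is chosen.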
Posted Apr 9, 2024 14:47 UTC (Tue)
by atnot (subscriber, #124910)
[Link]
I think this is still true, but there is a type of encoding that could pretty straightforwardly benefit: predictive encoders. These work by feeding your data into some sort of model that predicts what the next value is going to be, and then noting down the difference from the actual value. To decode, you just add those differences back onto the predictions the same model gives from the data you've decoded so far. If your model is good, those differences will hopefully be very small and highly compressible. I understand this is mostly being used in the audio world right now, but with stuff like AI frame interpolation being used in games, it doesn't seem too far-fetched to use it for video too? Whether it's at all computationally or size efficient compared to manually written codecs is another question, though, of course.
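A minimal sketch of that scheme, with the simplest possible "model" (predict that the next sample equals the previous one); a learned predictor would slot in the same way, only the predictions change:

    import numpy as np

    def encode(samples):
        residuals = np.empty_like(samples)
        prev = 0
        for i, s in enumerate(samples):
            residuals[i] = s - prev     # store how wrong the prediction was
            prev = s                    # predictor: "next sample equals this one"
        return residuals

    def decode(residuals):
        samples = np.empty_like(residuals)
        prev = 0
        for i, r in enumerate(residuals):
            prev = prev + r             # prediction + correction
            samples[i] = prev
        return samples

    x = np.array([100, 101, 103, 104, 104, 102], dtype=np.int64)
    assert np.array_equal(decode(encode(x)), x)
    print(encode(x))   # small residuals compress much better than raw samples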
Posted Apr 9, 2024 18:05 UTC (Tue)
by fraetor (subscriber, #161147)
[Link]
It does this by taking the image back to its DCT coefficients (which is how images are stored internally in JPEG) and re-encoding them using improved lossless compression techniques developed over the last 30 years. When decoding them, it is also more exact about the calculations, which removes some rounding errors and gives you more effective bit depth. This last step is also used in jpegli to produce nicer decodes of existing JPEG images.
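To illustrate the rounding-error point with invented numbers (the quantization table and gradient block below are made up for the example): the same stored coefficients decoded in floating point carry fractional detail that a classic 8-bit output path rounds away.

    import numpy as np
    from scipy.fft import dctn, idctn

    quant = np.full((8, 8), 4.0)                          # invented quantization table
    block = np.linspace(100.0, 140.0, 64).reshape(8, 8)   # a gentle gradient

    stored = np.round(dctn(block - 128.0, norm="ortho") / quant)   # what the JPEG file keeps
    exact = idctn(stored * quant, norm="ortho") + 128.0            # high-precision decode
    eight_bit = np.clip(np.round(exact), 0, 255)                   # classic 8-bit output

    print(np.abs(exact - eight_bit).max())   # fractional detail lost to 8-bit rounding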