The strange story of the ARM Meltdown-fix backport
The Meltdown vulnerability is most prominent in the x86 world, but it is not an Intel-only problem; some (but not all) 64-bit ARM processors suffer from it as well. The answer to Meltdown is the same in the ARM world as it is for x86 processors: kernel page-table isolation (KPTI), though the details of its implementation necessarily differ. The arm64 KPTI patches entered the mainline during the 4.16 merge window. ARM-based systems notoriously run older kernels, though, so it is natural to want to protect those kernels from these vulnerabilities as well.
When Alex Shi posted a backport of these patches to the 4.9 kernel, stable-kernel maintainer Greg Kroah-Hartman responded with a pair of questions: why has a separate backport been done when the Android Common kernel tree already contains the Meltdown work, and what sort of testing has been done on this backport? In both cases, the answer illustrated some interesting aspects of how the ARM vendor ecosystem works.
Android Common and LTS kernels
The Android Common kernels are maintained by Google as part of the Android Open-Source Project; they are meant to serve as a base for vendors to use when creating their device-specific kernels. These kernels start with the long-term support (LTS) kernels, but then add a number of Android-specific features, including the energy-aware scheduling work, features that haven't made it into the mainline for a number of reasons, and more. They also contain backports of important features and fixes, including the Meltdown fixes.
The Meltdown-fix backport was quite a bit of work, and it has gone through extensive testing in the Android kernel. Kroah-Hartman worried that the new backport may not have all of the necessary pieces or have been as extensively validated as the Android work; as such, it may not be something that should appear in the LTS kernels. The analogous effort for x86 should not be an example to follow, he said:
The problem with this idea is that not every ARM system is running Android, and pulling from the Android kernel will not work for vendors whose kernels are closer to the mainline. As Mark Brown put it:
Those vendors would like to have a long-term supported version of the Meltdown mitigations that does not require dragging in all of the other changes that accumulate in the Android kernels. As Brown pointed out, there are increasing numbers of vendors that are doing what the community has been asking for years and staying closer to the mainline. Not providing a proper backport of these important fixes could be seen as breaking the promise that the community has made: run the officially supported stable kernels and you will get the fixes for significant problems.
There is, thus, a reasonable argument to be made that a proper set of backports for the Meltdown fixes should find its way into the LTS kernels. One little problem remains, though: a proper backport should be known to actually work.
Testing deemed optional
Shi's response to Kroah-Hartman's question about testing was, in its entirety: "Oh, I have no A73/A75 cpu, so I can not reproduce meltdown bug." Reproducing the bug on the A73 would be a bit of a challenge, since that processor does not suffer from Meltdown, but A75 does, so asking for testing results on that CPU does not seem entirely out of line. When Kroah-Hartman repeated his request for testing, though, Ard Biesheuvel responded:
Upon receipt of that message, Kroah-Hartman dropped the patch series entirely, complaining that: "I can't believe we are having the argument of 'Test that your patches actually work'". He later added that if the developers working on the backport don't have both the hardware and the exploit code, "then someone is doing something seriously wrong". He urged them to complain to ARM Ltd to get that problem fixed.
At that point, the conversation stopped. Whether the testing problem is on its way toward a solution has not been revealed. It does seem right that the fixes should be merged into the LTS kernels; otherwise the promises that the community has made regarding those kernels will start to look hollow. But the vendors depending on the LTS kernels also have a right to fixes that somebody has actually bothered to test; anybody who has worked in system software for any period of time knows that just checking for adherence to a specification is no guarantee of a working solution.
Yes, all software changes should be tested

Posted Mar 15, 2018 17:35 UTC (Thu) by david.a.wheeler (subscriber, #72896)

Posted Mar 15, 2018 19:08 UTC (Thu) by sjfriedl (✭ supporter ✭, #10111)
Right? :-)

Posted Mar 15, 2018 23:37 UTC (Thu) by ken (subscriber, #625)
That is the nature of weird corner cases. Now, was there even a Meltdown test case published?

Posted Mar 16, 2018 0:53 UTC (Fri) by rahvin (guest, #16953)

Posted Mar 16, 2018 13:13 UTC (Fri) by hkario (subscriber, #94864)
I wouldn't say this shows that. It's not uncommon to just reason whether some bug in software is possible or not and fix based on that. I'd say it's likely that people who have the design documents for the hardware can do exactly the same, just based on the public description of the vulnerability.

Posted Mar 16, 2018 6:53 UTC (Fri) by epa (subscriber, #39769)
As the developer says, there is a magic sequence of processor instructions that has to be done at particular points. That’s the spec the vendor has provided - just as they might make similar black-box pronouncements about buggy floating point division or other hardware bugs to work around. Taking this on trust has the same problems as taking any other vendor guarantee on trust. Lots of code that theoretically conforms to some spec turns out not to work in practice. But to do nothing instead?
It’s also bizarre to recommend using the Android tree after years of preaching about sticking to the mainline.

Posted Mar 16, 2018 9:23 UTC (Fri) by tchernobog (guest, #73595)
Not testing code before merging it into an LTS kernel is inexcusable.

Posted Mar 16, 2018 10:24 UTC (Fri) by ardbiesheuvel (subscriber, #89747)
The debate is about whether it is sufficient to test whether a mitigation such as KPTI in fact does what is expected of it, i.e., unmap the kernel while running in userland, or whether it is mandatory to go all the way to the beginning and test whether unmapping the v4.9 kernel blocks Meltdown attacks just like unmapping the v4.16 kernel does.

Posted Mar 17, 2018 19:21 UTC (Sat) by epa (subscriber, #39769)

Posted Mar 18, 2018 14:03 UTC (Sun) by gregkh (subscriber, #8)
In this way, I am applying the exact same standard to these ARM patches as I have with the other architecture patches of this nature. For me to not apply that same standard would not be very fair, don't you think?

Posted Mar 18, 2018 20:49 UTC (Sun) by epa (subscriber, #39769)

Posted Mar 18, 2018 20:51 UTC (Sun) by epa (subscriber, #39769)

The strange story of the ARM Meltdown-fix backport

Posted Mar 15, 2018 19:58 UTC (Thu) by flussence (guest, #85566)

Posted Mar 15, 2018 20:58 UTC (Thu) by pizza (subscriber, #46)

Posted Mar 16, 2018 2:09 UTC (Fri) by atelszewski (guest, #111673)
> It's bewildering to think ARM still exists
Remember that Arm isn't only the big and complex SoCs.
It's a big player in the microcontrollers area.
And as much as I hate what they do in the Linux ecosystem, I enjoy working with Cortex-M{0,3} cores.
--
Best regards,
Andrzej Telszewski

Posted Mar 16, 2018 2:57 UTC (Fri) by pizza (subscriber, #46)
Over time Arm has provided more and more building blocks, including reference designs, and has increasingly encouraged more standardization in how things are put together (the Cortex-M's CMSIS framework is a good example, along with the SBSA stuff for the ARMv8 servers), but it's still ultimately up to the licensee to put together and support an appropriate CSP/BSP. Because the many licensees typically end up differentiating themselves into corners, things tend to fall apart rapidly, resulting in the current less-than-ideal situation.

The strange story of the ARM Meltdown-fix backport

Posted Mar 16, 2018 0:07 UTC (Fri) by hjames (subscriber, #14925)

Posted Mar 16, 2018 2:05 UTC (Fri) by viro (subscriber, #7872)

Posted Mar 23, 2018 18:16 UTC (Fri) by raven667 (subscriber, #5198)
Strangely enough, I can tell right away that this is unlikely to be true. Just because the code works OK in one physical device, regardless of how many copies of that device are manufactured, doesn't make it suitable for inclusion in millions of other different devices; the code needs to work for a wide range of devices and usage, not just one.

Posted Mar 30, 2018 19:47 UTC (Fri) by flussence (guest, #85566)
Let a million CVEs bloom...