Inline encryption support for block devices
In a combined storage and filesystem session at LSFMM 2017, Ted Ts'o led a discussion of support for the inline cryptographic engines (ICEs) that are being used in mobile phones. Android device makers have accumulated a number of hacks over the last few years to use these engines to encrypt filesystem data, but Ts'o would like to create something that can go into the mainline kernel. He was looking for thoughts on how to make that happen.
Doing AES encryption on the ARM cores used in mobile phones is fairly power-hungry, so manufacturers are increasingly turning to ICE devices to encrypt the data on the device. These ICE devices sit between the CPU and the flash storage; to use one, the CPU must provide a key ID, so there is a need to tell the engine which key to use for each I/O request. In the future, Ts'o said, the keys themselves may come from a secure element, such that the CPU and kernel will not have access to them at all.
Qualcomm has been trying to get support for ICE devices upstream for some time, but the code is "rather unspeakable". It blindly assumes an ext4 filesystem and roots through private pointers to access inode structures in order to associate key IDs with I/O requests. The Qualcomm code is not what is in the Pixel phones, he was quick to note; Qualcomm started with the Pixel code and "did horrible things to it".
His goal is to find upstream-acceptable changes to support ICE. A "nice to have" would be a way to remove the hacks in the ext4 and f2fs filesystems as well, and to add a filesystem and block-encryption mechanism that does not require a device-mapper layer. For the desktop and server case, having a device-mapper layer makes it easier for users, he said, but with hardware crypto there is no reason to have one.
Ts'o proposed adding a 32-bit key ID field to struct bio, which is what Universal Flash Storage (UFS) has. Key IDs are integer values that refer to keys that have been stored into "slots" in the ICE device. He believes that most ICE devices will have far fewer slots than 32 bits would allow, though.
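As a rough illustration, such a field might look like the following. This is a hypothetical sketch; the field name and helper function are illustrative, not taken from any posted patch:

```c
#include <linux/bio.h>

/* Hypothetical sketch: a key ID carried in struct bio so the driver
 * can tell the ICE hardware which key slot applies to this I/O.
 * bi_crypt_key_id is an illustrative name, not a real kernel field. */
struct bio {
	/* ... existing struct bio fields ... */
	u32	bi_crypt_key_id;	/* key slot in the ICE device */
	/* ... */
};

/* A filesystem would tag each bio with the file's key ID before
 * handing it to the block layer: */
static void submit_encrypted_bio(struct bio *bio, u32 key_id)
{
	bio->bi_crypt_key_id = key_id;
	submit_bio(bio);
}
```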
James Bottomley suggested using the Data Integrity Field and Data Integrity Extensions (DIF/DIX) support for the key IDs. Martin Petersen said there is a union that holds DIF/DIX or copy offload information; another field could be added for the key ID. Ts'o said he would look into that.
There will also be a need for a key slot manager of some kind. Since there will be a limited number of key slots for an ICE device, there can only be that many BIOs with different key IDs in flight at any given time. So the device will need to request a key slot, which might block if there are none available. The slots will need to be reference counted; they would be incremented when a BIO with an ID is submitted and decremented when it completes.
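A keyslot manager along those lines might look roughly like this. Everything here (names, structure layout, the slot count) is a hypothetical sketch of the scheme described above, not merged kernel code:

```c
#include <linux/spinlock.h>
#include <linux/wait.h>
#include <linux/types.h>

/* Hypothetical keyslot manager: a small fixed pool of hardware key
 * slots, each refcounted by the bios that reference it.  Callers
 * block when every slot is pinned by in-flight I/O. */
struct ice_keyslot {
	u8		key[32];
	unsigned int	refcount;	/* in-flight bios using this slot */
};

struct ice_keyslot_manager {
	struct ice_keyslot	slots[8];	/* assume 8 hardware slots */
	spinlock_t		lock;
	wait_queue_head_t	wait;
};

/* Assumed helpers: reuse a slot already programmed with @key, or load
 * it into an idle slot; return -1 if every slot is busy. */
static int ice_find_or_program_slot(struct ice_keyslot_manager *ksm,
				    const u8 *key);
static bool ice_slot_available(struct ice_keyslot_manager *ksm);

/* Return a slot holding @key, sleeping until one is free. */
static int ice_request_keyslot(struct ice_keyslot_manager *ksm,
			       const u8 *key)
{
	int slot;

	spin_lock(&ksm->lock);
	while ((slot = ice_find_or_program_slot(ksm, key)) < 0) {
		/* All slots pinned: wait for a completion to free one. */
		spin_unlock(&ksm->lock);
		wait_event(ksm->wait, ice_slot_available(ksm));
		spin_lock(&ksm->lock);
	}
	ksm->slots[slot].refcount++;	/* bio submitted with this slot */
	spin_unlock(&ksm->lock);
	return slot;
}

/* Called from the bio completion path. */
static void ice_release_keyslot(struct ice_keyslot_manager *ksm, int slot)
{
	spin_lock(&ksm->lock);
	if (--ksm->slots[slot].refcount == 0)
		wake_up(&ksm->wait);	/* slot can now be reprogrammed */
	spin_unlock(&ksm->lock);
}
```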
All of the key slot management would be hidden from the filesystem. The drivers will manage the slots, but the filesystem will need to identify the key that goes with a particular request. It is important that two BIOs with different keys do not get merged. David Howells asked about superblock encryption and whether mount() needs to know about keys, but Ts'o said that the metadata for the ext4 and f2fs filesystems is not encrypted on Android devices. There is some rough prototype code that Michael Halcrow has been working on that should come out soon, Ts'o said.
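The merge restriction mentioned above could be enforced with a simple check in the block layer's merge path; again, a hypothetical sketch assuming the bi_crypt_key_id field from earlier:

```c
/* Hypothetical: refuse to merge bios encrypted with different keys,
 * since the hardware applies exactly one key per request. */
static bool bio_crypt_mergeable(const struct bio *a, const struct bio *b)
{
	return a->bi_crypt_key_id == b->bi_crypt_key_id;
}
```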
In something of a side note, he also mentioned that, right now, filesystem encryption on desktops or servers uses a per-process or per-session key ring. Users can set and remove their own keys from those rings, but that doesn't work for hardware devices because there is no concept of a key owner. Once a key gets into an ICE device, there are no further checks and anyone can use the key; it is the host operating system that allows or prevents access to files using Unix permissions.
It would be useful to have a kind of global key ring for software crypto that could be used like an ICE device, he said. Keys would be added or removed only by root, but once they are added, those keys can be used by anyone on the system. Someone in the audience asked about containers, where there may be multiple "root" users due to user namespaces. Ts'o said he hadn't thought about it. Someone suggested tying the key ring to the user namespace where it was created, so that a global key ring created by root in a container would only be accessible to other users in that container/namespace.
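One way to picture that namespace-scoped variant: a global key ring hung off the user namespace, so a container's root sees only its own keys. This is a hypothetical sketch; struct user_namespace has no such hw_keyring field:

```c
#include <linux/user_namespace.h>
#include <linux/key.h>

/* Hypothetical: each user namespace owns one "hardware" key ring.
 * Keys added by a container's root land in that container's ring
 * and are invisible outside the namespace. */
static struct key *lookup_hw_keyring(void)
{
	struct user_namespace *ns = current_user_ns();

	return ns->hw_keyring;	/* assumed new field; does not exist today */
}
```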
| Index entries for this article | |
|---|---|
| Kernel | Block layer |
| Kernel | Security |
| Security | Encryption |
| Conference | Storage, Filesystem, and Memory-Management Summit/2017 |
Posted Mar 23, 2017 1:34 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
wait... how do other systems such as APFS and ZFS do acceleration?
Posted Mar 24, 2017 3:05 UTC (Fri)
by johnjones (guest, #5462)
[Link] (5 responses)
Also, don't OCF (the OpenBSD Crypto Framework) and other accelerators already do something like this?
thanks
John
Posted Mar 26, 2017 2:00 UTC (Sun)
by tytso (subscriber, #9993)
[Link] (4 responses)
Historically, most accelerators work by sitting on the host CPU's bus and talking to main memory. An example of this is an SHA256 accelerator that sits on the PCIe bus and checksums data in memory. See the slides from a presentation[1] at the 2014 OpenZFS summit.
[1] http://open-zfs.org/w/images/6/63/Lightning_Talk-Zacodi_L...
What some ARM SOC (system-on-chip) vendors have done is to put an encryption engine in between the host CPU and the storage device. This isn't actually new; IBM mainframes can do something similar. Interestingly, one thing that ARM CPUs on handsets and mainframe CPUs have in common is that they tend to be relatively underpowered compared to the rest of the system. So while having a storage-specific accelerator between the CPU and the storage device is less flexible, it reduces the overhead on the CPU and memory. (For example, an encryption engine that can also be used for IPsec would read from memory and then write the ciphertext back to memory --- but then the data would have to be sent from memory to the storage or networking device. Compare this architecture with one where you have a crypto engine just for storage, and a different crypto engine just for networking that lives on the NIC.)
I predict that in the future we'll see this architecture on server platforms. Since we can no longer double the CPU frequency every 18 months, it makes more sense to speed up the system by pushing more transistors away from the CPU into more specialized hardware accelerators. And if that means specialized crypto engines for storage and networking --- that's just fine.
Posted Mar 27, 2017 8:15 UTC (Mon)
by ortalo (guest, #4654)
[Link] (2 responses)
However, it seems to me we are seeing the problem of sharing these devices between multiple users appear now; especially in the use cases where different users expect to use different keys.
Admittedly, it is already very nice to have host encryption, if only to have host-level protection against "outsiders".
However, I wonder if simple additions to this hardware would not allow association of different sets of keys with different user contexts (similarly to the way page tables are associated with specific address spaces in the classical MMU case)?
Another question: is all this new hardware targeting encryption only? Nothing with respect to signing (or hashing)?
Posted Mar 28, 2017 21:04 UTC (Tue)
by tytso (subscriber, #9993)
[Link] (1 responses)
Right now, all of the inline crypto acceleration hardware that I am familiar with targets encryption only, unfortunately.
The challenge with doing authenticated encryption (e.g., AES-GCM) is that you need to store the per-block authentication tag somewhere. Doing that would mean we would need flash chips with page sizes of 4k plus 32 bytes for the IV and AES-GCM authentication tag, but there aren't any eMMC flash devices with 4128-byte pages. And a non-standard, custom eMMC storage device would be extremely pricey, so it would probably not be commercially viable. :-(
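For the record, the arithmetic behind that 4128-byte figure, assuming a 16-byte IV and a 16-byte GCM tag as one plausible split of the 32 bytes of per-block metadata:

```c
/* Per-block metadata for authenticated encryption, as described:
 * 4096-byte data block + 16-byte IV + 16-byte AES-GCM tag = 4128
 * bytes per flash page -- a size no standard eMMC device offers. */
#define DATA_BLOCK_BYTES	4096
#define GCM_IV_BYTES		16	/* assumed split of the 32 bytes */
#define GCM_TAG_BYTES		16	/* AES-GCM tags are 16 bytes */
#define PAGE_BYTES_NEEDED	(DATA_BLOCK_BYTES + GCM_IV_BYTES + GCM_TAG_BYTES)
```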
Posted Mar 28, 2017 22:56 UTC (Tue)
by tytso (subscriber, #9993)
[Link]
But a TPM is really not a crypto *accelerator*; in fact, pretty much all TPMs are incredibly S-L-O-W at doing crypto operations. So why does a TPM exist? Because it will only encrypt, decrypt, or sign messages on your behalf if you give it the correct password or PIN, and if you try too many bad passwords, the TPM will lock you out. So this is useful if you want protection against someone stealing your password, since the password by itself won't be enough; they will also need to get your laptop. It's also really good at making Jim Comey, head of the FBI, really angry. Oh, well. :-)
You can certainly use a TPM, or some other secure element, in conjunction with an inline crypto engine. The two technologies are very complementary. This is why I want to make sure that whatever inline crypto encryption support lands in the upstream kernel can be easily extended to support device designs where the host CPU does not have access to the encryption keys, and where the keys are provisioned via some kind of secure element directly to the inline crypto engine.
[1] https://blog.hansenpartnership.com/using-your-tpm-as-a-se...
Posted Mar 29, 2017 8:36 UTC (Wed)
by robbe (guest, #16131)
[Link]
Isn’t this the case for most HW features, that mainframes had it already in the 1970s? IBM is the Beatles of computer hardware.
Device mapper and hardware encryption
Posted Mar 30, 2017 8:14 UTC (Thu)
by Wol (subscriber, #4433)
[Link]
The problem is basically that data gets passed to the encryption engine in ~4K chunks. Each chunk incurs a setup/teardown overhead. And while the hardware could quite happily handle much larger chunks, the device-mapper code needs major changes to be able to deal with such larger chunks. The impact is actually very noticeable once you look for it.
Cheers,
Wol
"Adding a 32-bit key ID field to struct bio" for Inline Encryption
Posted Jun 6, 2018 17:24 UTC (Wed)
by ladvine (guest, #124882)
[Link]
I am not sure whether providing a reference in the bio to the crypto context has security risks associated with it.