A netlink-based user-space crypto API

By Jake Edge
October 20, 2010

User-space access to the kernel cryptography subsystem has reared its head several times of late. We looked at one proposal back in August that had a /dev/crypto interface patterned after similar functionality in OpenBSD. There is another related effort, known as the NCR API, and crypto API maintainer Herbert Xu has recently posted an RFC for yet another. But giving user space the ability to request that the kernel perform its computation-intensive crypto operations is not uncontroversial.

As noted back in August, some kernel hackers are skeptical that there would be any performance gains by moving user-space crypto into the kernel. But there are a number of systems, especially embedded systems, with dedicated cryptographic hardware. Allowing user space to access that hardware will likely result in performance gains; in fact, 50-100x performance improvements have been reported.

Another problem with both the /dev/crypto and NCR APIs (collectively known as the cryptodev-linux modules) is the addition of an enormous amount of code to the kernel to support crypto algorithms beyond those that are already available. Those two modules have adapted user-space libraries for crypto and multi-precision integers and included them in the kernel. That code is needed to support some government crypto standards and certifications that require a separation between user space and crypto processing. So, the cryptodev-linux modules are trying to solve two separate (or potentially separate) problems: user-space access to crypto hardware acceleration and security standards compliance.

When Xu first put out an RFC on his idea for the API (without any accompanying code) back in September, Christoph Hellwig had a rather strongly worded reaction:

doing crypto in kernel for userspace consumers [is] simply insane. It's computational intensive code which has no business in kernel space unless absolutely required (e.g. for kernel consumers). In addition to that adding the context switch overhead and address space transitions is god [awful] too.

Xu more or less agrees with Hellwig, but sees his API as a way to provide access to the hardware crypto devices. Because Xu's API is based on netlink sockets (as opposed to the ioctl()-based and brand-new interfaces that the cryptodev-linux modules introduce), he is clearly hoping that it will provide a way forward without requiring such large changes to the kernel:

FWIW I don't care about user-space using kernel software crypto at all. It's the security people that do.

The purpose of the user-space API is to export the hardware crypto devices to user-space. This means PCI devices mostly, as things like aesni-intel [Intel AES instructions] can already be used without kernel help.

Now as a side-effect if this means that we can shut the security people up about adding another interface then all the better. But I will certainly not go out of the way to add more crap to the kernel for that purpose.

The netlink-based interface uses a new AF_ALG address family that gets passed to the initial socket() call. There is also a new struct sockaddr_alg that contains information about what type of algorithm (e.g. "hash" or "skcipher") is to be used as well as the specific algorithm name (e.g. "sha1" or "cbc(aes)") that is being requested. That structure is then passed in the bind() call on the socket.
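
A minimal sketch of that first step might look like the following. It assumes the AF_ALG interface as it later appeared in mainline Linux (the <linux/if_alg.h> header and a SOCK_SEQPACKET socket type), which may differ in detail from the RFC discussed here; alg_bind() is a hypothetical helper name, not part of any API.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38   /* Linux value; assumption for libc headers that lack it */
#endif

/*
 * Create an AF_ALG socket and bind it to an algorithm, for example
 * ("hash", "sha1") or ("skcipher", "cbc(aes)").
 * Returns the bound socket, or -1 if the address family or the
 * requested algorithm is not available.
 */
int alg_bind(const char *type, const char *name)
{
    struct sockaddr_alg sa;
    int fd;

    memset(&sa, 0, sizeof(sa));
    sa.salg_family = AF_ALG;
    /* Both fields are fixed-size byte arrays; keep them NUL-terminated. */
    strncpy((char *)sa.salg_type, type, sizeof(sa.salg_type) - 1);
    strncpy((char *)sa.salg_name, name, sizeof(sa.salg_name) - 1);

    fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because bind() fails when the kernel cannot provide the named algorithm, a helper like this can also serve as a simple availability probe before any data is sent.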

For things like hashing, where there is little or no additional information needed, an accept() is done on the socket, which yields an operation file descriptor. The data to be hashed is written to that descriptor and, when there is no more data to be hashed, the appropriate number of bytes (20 for sha1) is then read from the descriptor.
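
The hashing sequence can be sketched roughly as follows, again assuming the mainline form of the interface (<linux/if_alg.h>, SOCK_SEQPACKET) rather than the exact RFC code. kernel_sha1() is a hypothetical name, and the sketch only handles input that fits in a single write().

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38   /* Linux value; assumption for libc headers that lack it */
#endif

/*
 * Hash len bytes of buf with the kernel's "sha1" implementation.
 * Returns 0 and fills digest on success, -1 on any failure
 * (including a kernel without AF_ALG support).
 */
int kernel_sha1(const void *buf, size_t len, unsigned char digest[20])
{
    struct sockaddr_alg sa = {
        .salg_family = AF_ALG,
        .salg_type = "hash",
        .salg_name = "sha1",
    };
    int rc = -1;
    int tfm, op;

    tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (tfm < 0)
        return -1;

    if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) == 0) {
        /* accept() yields the operation descriptor. */
        op = accept(tfm, NULL, NULL);
        if (op >= 0) {
            /* Write the data, then read back the 20-byte digest. */
            if (write(op, buf, len) == (ssize_t)len &&
                read(op, digest, 20) == 20)
                rc = 0;
            close(op);
        }
    }
    close(tfm);
    return rc;
}
```

Calling kernel_sha1("abc", 3, digest) should leave the well-known SHA-1 test vector a9993e36... in digest when the kernel supports the interface.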

It is a bit more complicated for ciphers. Before accepting the connection on the socket, a key needs to be established for a symmetric key cipher. That is done with a setsockopt() call using the new SOL_ALG level and ALG_SET_KEY option name and passing the key data and its length. But there are additional parameters that need to be set up for ciphers, and those are done using sendmsg().

A cipher will need to know which direction it is operating in (i.e. encrypting or decrypting) and may need an initialization vector. Those are specified with the ALG_SET_OP and ALG_SET_IV messages. Once the accept() has been done, those messages are sent to the operational descriptor and the cipher is ready for use. Data can be sent as messages or written to the operational descriptor, and the resulting data can then be read from that descriptor.
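
Putting the cipher steps together, a sketch for AES-128 in CBC mode could look like this. It assumes the mainline form of the interface (struct af_alg_iv and ALG_OP_ENCRYPT from <linux/if_alg.h>), which may not match the RFC exactly; kernel_aes_cbc_encrypt() is a hypothetical name, and the fallback values for AF_ALG and SOL_ALG are assumptions for older libc headers.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38
#endif
#ifndef SOL_ALG
#define SOL_ALG 279
#endif

/* Encrypt len bytes (a multiple of 16) with AES-128-CBC. Returns 0 or -1. */
int kernel_aes_cbc_encrypt(const unsigned char key[16],
                           const unsigned char iv[16],
                           const unsigned char *in,
                           unsigned char *out, size_t len)
{
    struct sockaddr_alg sa = {
        .salg_family = AF_ALG,
        .salg_type = "skcipher",
        .salg_name = "cbc(aes)",
    };
    /* Room for two control messages: the operation and the IV. */
    union {
        char buf[CMSG_SPACE(sizeof(uint32_t)) +
                 CMSG_SPACE(sizeof(struct af_alg_iv) + 16)];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = (void *)in, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *c;
    struct af_alg_iv *ivm;
    uint32_t op_type = ALG_OP_ENCRYPT;
    int tfm, op = -1, rc = -1;

    tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (tfm < 0)
        return -1;

    /* The key is set on the bound socket, before accept(). */
    if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) == 0 &&
        setsockopt(tfm, SOL_ALG, ALG_SET_KEY, key, 16) == 0)
        op = accept(tfm, NULL, NULL);
    if (op < 0) {
        close(tfm);
        return -1;
    }

    /* Zero the control buffer so CMSG_NXTHDR() sees sane lengths. */
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    c = CMSG_FIRSTHDR(&msg);            /* direction: encrypt */
    c->cmsg_level = SOL_ALG;
    c->cmsg_type = ALG_SET_OP;
    c->cmsg_len = CMSG_LEN(sizeof(uint32_t));
    memcpy(CMSG_DATA(c), &op_type, sizeof(op_type));

    c = CMSG_NXTHDR(&msg, c);           /* initialization vector */
    c->cmsg_level = SOL_ALG;
    c->cmsg_type = ALG_SET_IV;
    c->cmsg_len = CMSG_LEN(sizeof(struct af_alg_iv) + 16);
    ivm = (struct af_alg_iv *)CMSG_DATA(c);
    ivm->ivlen = 16;
    memcpy(ivm->iv, iv, 16);

    /* Send the plaintext with both messages, then read the ciphertext. */
    if (sendmsg(op, &msg, 0) == (ssize_t)len &&
        read(op, out, len) == (ssize_t)len)
        rc = 0;

    close(op);
    close(tfm);
    return rc;
}
```

Sending the operation, the IV, and the data in one sendmsg() call is a choice made here for brevity; the control messages could equally be sent first, with the data written to the descriptor afterward.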

There is an additional wrinkle for the "authenticated encryption with associated data" (AEAD) block cipher mode, which can include authentication information (i.e. a message authentication code or MAC) in the ciphertext stream. Because of that, AEAD requires two data streams, one containing the data itself and another with the associated authentication data (the MAC). This is handled in Xu's API by doing two accept() calls, the first for the operational descriptor, and the second for the associated data. If the cipher is operating in encryption mode, both descriptors will be written to, while the encrypted data is read from the operational descriptor. For decryption, the ciphertext is written to the operational descriptor, while the plaintext and authentication data are read from the two descriptors.

There hasn't been much discussion, yet, of the actual code posting, but Xu's September posting elicited a number of complaints about performance, mostly from proponents of the cryptodev-linux modules. But it would seem that there is some real resistance to adding completely new APIs (as NCR does) or to adding a complicated ioctl()-based API (as /dev/crypto does). Now there are three competing solutions available, but it isn't at all clear that any interface to the kernel crypto subsystem will be acceptable to the kernel community at large. We will have to wait to see how it all plays out.

A netlink-based user-space crypto API

Posted Oct 21, 2010 6:16 UTC (Thu) by neilbrown (subscriber, #359) [Link] (10 responses)

I must say I'm not a big fan of sockets or netlink. You cannot access them with shell scripts...

I much prefer the filesystem model.

{ cat myfile >&0 ; read hash ; } <> /random-mountpoint/crypto/hash/sha1

So the name of the algorithm is passed as part of the file name, and the content is written to the file descriptor. The hash is read from that same file descriptor. The hash state is stored attached to the 'struct file'. See "Transaction based IO" in fs/libfs.c. It would need to be extended to work with writing a large file, but the concept is sound.

For encrypting.. using the same 'fd' for both read and write is problematic in a way that it isn't (so much) for the above. The original (Unix 6) pipe syscall returned only one fd which you could read from and write to. One problem with that was that it can be awkward to detect when the 'write' end has been closed (so the read end should get EOF), as there is no distinction between the two. If you happen to have two processes with the 'read' end open you never see EOF.

If we can either ignore that or work around it, then
/mountpoint/crypto/cypher/$direction/$cyphertype/$key/$iv
is a promising file name to write to/ read from, except that there is a risk that the key would get stuck in the dcache and appear in /proc/$N/fd/$FD. I'm sure that is solvable though. The key would be HEX or BASE64 encoded of course.

The need to multiplex cyphertext and MAC is certainly a complication. I suspect there was a reason Herbert suggested 2 sockets rather than a simple multiplexing scheme. Without knowing that reason it is pointless trying to refine the design.

If it was to be done with sockets, it would seem to make much more sense to use 'socketpair(AF_ALG, SOCK_STREAM, ....)' rather than the sockets + accept model. Then you have distinct 'read' and 'write' ends. I would also use MSG_OOB to send the MAC beside the cyphertext rather than having two separate streams (not that I am a big fan of MSG_OOB, but it does seem to be a shoe that fits).

A netlink-based user-space crypto API

Posted Oct 21, 2010 14:23 UTC (Thu) by ken (subscriber, #625) [Link] (1 responses)

I have to confess that I do not understand what problem the open() ioctl() interface has that the socket() setsockopt() bind() accept() solves.

To me it looks like you just transform magic ioctl number into magic socket options and magic sendmsg() commands.

Where is the benefit over /dev/crypto?

A netlink-based user-space crypto API

Posted Nov 1, 2010 4:49 UTC (Mon) by kevinm (guest, #69913) [Link]

The advantage that the sockets API has over ioctl is that the former provides a single, standard, already implemented and tested method of marshalling and unmarshalling parameters kernel-side.

The fundamental problem of the ioctl interface is that every implementer of the interface must re-implement that parameter marshalling - for every arch. There are plenty of ioctl()s that *still* don't work properly for IA32 callers on x86-64 arch.

No way to access sockets from a shell script?

Posted Oct 21, 2010 15:04 UTC (Thu) by rvfh (guest, #31018) [Link] (4 responses)

What about netcat?

No way to access sockets from a shell script?

Posted Oct 21, 2010 15:12 UTC (Thu) by jengelh (guest, #33263) [Link] (3 responses)

What you would want is socat, anyway.

No way to access sockets from a shell script?

Posted Oct 21, 2010 16:29 UTC (Thu) by nye (subscriber, #51576) [Link] (1 responses)

HFS. I've never seen socat before, but from reading the manpage I'm pretty certain it must have the option to Dominate All Humans, if only I can figure out the right command line arguments.

No way to access sockets from a shell script?

Posted Oct 28, 2010 19:11 UTC (Thu) by oak (guest, #2786) [Link]

I mistyped that a few years ago and the operation seems to be non-reversible. Sorry.

Hmmm. On second thought, I would assume socat author to have tested all the documented options. Maybe I used it with the --simulate option after all.

(That could also explain the dancing ping elephants and Jumbo frame's enormous flapping ears...)

No way to access sockets from a shell script?

Posted Oct 22, 2010 1:42 UTC (Fri) by neilbrown (subscriber, #359) [Link]

While I'm sure socat is an absolutely awesome program, I'm guessing it isn't precognitive, and so cannot support new address families and socket options until someone goes to the trouble of coding them in.

With suitably chosen file names, no such extra coding for the shell is needed.

A netlink-based user-space crypto API

Posted Oct 21, 2010 15:13 UTC (Thu) by jengelh (guest, #33263) [Link] (2 responses)

I would prefer not to have the IV in the filename, for that would potentially be visible in ps output.

A netlink-based user-space crypto API

Posted Oct 22, 2010 1:54 UTC (Fri) by neilbrown (subscriber, #359) [Link] (1 responses)

It is long past time that /proc/*/cmdline were not world-readable.

A netlink-based user-space crypto API

Posted Oct 22, 2010 9:09 UTC (Fri) by nix (subscriber, #2304) [Link]

Unfortunately there are a *lot* of scripts out there that depend on 'ps -o args' and similar commands working for users other than the current one. Really a lot. This would, of course, break them all.

Don't use netlink

Posted Oct 21, 2010 18:28 UTC (Thu) by daniel (guest, #3181) [Link]

Netlink is the wrong way to do anything. It is a kernel interface designed in the image of a network. However the kernel is not a network node. The kernel supports a nice simple and generic model of file methods, from which any desired stream interface to the kernel may be derived in a straightforward efficient way that does not bring along the addressing baggage of netlink. Please just don't spam the kernel with new uses of netlink. Ever.

A netlink-based user-space crypto API

Posted Oct 21, 2010 20:54 UTC (Thu) by alonz (subscriber, #815) [Link] (6 responses)

Well, speaking as the architect of a hardware cryptography device…

I also dislike Xu's proposal. Sorry.

My issues with this API (unlike the previous commenters) relate to function, not form:

1) It creates unnatural semantic linkages between sockets (most importantly these pairs of sockets used for AEAD, which need to be written to/read from in a very particular ordering).
2) There is no way to achieve zero-copy cipher operation with this API (at least one of the sendmsg()/recv() will have to copy data to/from an skbuff).

I don't really have a good alternative API; crypto just doesn't appear to map cleanly to the Unix abstractions. Maybe a specialized system call ("sendrecvmsg()"/"servercall()" or somesuch) could help with the second point.

A netlink-based user-space crypto API

Posted Oct 22, 2010 0:59 UTC (Fri) by SEJeff (guest, #51588) [Link] (2 responses)

Just curious, what hardware crypto device? Can you disclose that and is any information public?

A netlink-based user-space crypto API

Posted Oct 24, 2010 21:50 UTC (Sun) by alonz (subscriber, #815) [Link] (1 responses)

I'm the lead architect for Discretix' CryptoCell embedded security platform (which is also the basis for the Intel Moorestown security subsystem).

A netlink-based user-space crypto API

Posted Oct 25, 2010 12:13 UTC (Mon) by SEJeff (guest, #51588) [Link]

Ah thanks. That gives much better context.

A netlink-based user-space crypto API

Posted Oct 22, 2010 1:52 UTC (Fri) by neilbrown (subscriber, #359) [Link] (2 responses)

When you say "zero-copy", do you need the transformation to happen in-place, or is it OK to transform (e.g. encrypt) from one buffer to another buffer as long as the app chooses the buffers?

I'm imagining you do an aio_read() to identify the buffer for the transformed data to be written to, then an O_DIRECT write to identify the buffer containing data to be transformed. The underlying implementation would need to notice the presence of a pending aio_read and place the result directly there rather than in the page cache.

I guess the same thing could do an in-place transformation, but it could get messy.

Of course if O_DIRECT wasn't used it would fall back to the simple case of copying to the page cache, transforming, and copying back out.

A netlink-based user-space crypto API

Posted Oct 24, 2010 21:58 UTC (Sun) by alonz (subscriber, #815) [Link] (1 responses)

I refer to "zero-copy" in rather loose terms—not copying more than is necessary. In particular, if the application chooses input/output buffers that are suitable for DMA, I would like to perform a single DMA translation (many cryptography engines have dual-channel DMA engines, so they can read the source buffer via DMA, transform it, and write the output to the target buffer in a single pass).

As for the specific API—all proposals I have seen so far look like hacks, and are rather brittle (e.g. the aio_read solution would require the driver to keep userspace pointers for longer than a single system call, which is generally considered bad taste AFAIK).

A netlink-based user-space crypto API

Posted Oct 24, 2010 23:31 UTC (Sun) by neilbrown (subscriber, #359) [Link]

One man's hack is another man's elegant design :-)

aio, by its very nature, requires the kernel to hold on to user-space pointers for longer than a single system call. This is OK because 'aio_cancel' exists to reclaim the pointer if needed.

What about PKI? And other comments.

Posted Oct 29, 2010 15:49 UTC (Fri) by deviantmaru (guest, #70901) [Link] (1 responses)

1) I see no mention of PKI algorithms. How would they be implemented?
2) I would have to agree with ken and alonz in that the netlink-based system
seems more like a hack than a proper design.
3) I would also agree with alonz that crypto operations don't seem to fit
well into any of the current Unix abstractions.
4) I am new to ioctl-based programming, so can anyone please tell me what is
awful about it?

Disclaimer: I am a kernel-driver [developer] who is currently hacking (learning) on an
ioctl-based, /dev/blah driver for a hardware (PCI) crypto device.

What about PKI? And other comments.

Posted Nov 1, 2010 15:26 UTC (Mon) by eparis (guest, #33060) [Link]

4) I am new to ioctl-based programming, so can anyone please tell me what is
awful about it?

The biggest problem with ioctl is by FAR that people get it wrong. ioctl is the equivalent of typing everything in C as void * and wondering why your program isn't behaving correctly. Look at ioctl() vs getsockopt() and setsockopt():

int ioctl(int d, int request, ...);

int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);

They provide the same ability to be generic and to move data back and forth but the socket functions encode size and direction into the call. It means you can easily do sane checks in the kernel.
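
A minimal sketch of how that plays out from user space, assuming an ordinary socket and the standard SO_TYPE and SO_REUSEADDR options (query_socket_type() and set_reuseaddr() are hypothetical helper names):

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Read SO_TYPE. The caller states how much room it has via *len,
 * and the kernel reports back how many bytes it actually wrote.
 * Returns 0 on success, -1 on failure.
 */
int query_socket_type(int fd, int *type_out, socklen_t *len_out)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    *type_out = type;
    *len_out = len;
    return 0;
}

/*
 * Set SO_REUSEADDR while claiming the value is claimed_len bytes long.
 * Returns 0 on success or the errno value on failure. Since the length
 * travels with the call, the kernel is able to reject a value that is
 * too short (Linux is expected to answer EINVAL for that case).
 */
int set_reuseaddr(int fd, socklen_t claimed_len)
{
    int one = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, claimed_len) < 0)
        return errno;
    return 0;
}
```

With ioctl(), by contrast, the size and direction of the argument are at best encoded in the request number by convention, so an equivalent check depends on each driver doing it by hand.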

Linus has recently pushed a bit that syscalls are the right way to go (not in this discussion, just in general discussions about kernel/userspace ABI). A good syscall is going to provide size, direction, and strong typing of arguments.

The more information an interface encodes and enforces the more likely it is that the interface will be used correctly.