Network security in the microservice environment

April 12, 2017

This article was contributed by Tom Yates

We have seen that a microservice architecture is intimately tied to the use of a TCP/IP network as the interconnecting fabric, so when Bernard Van De Walle from Aporeto gave a talk at CloudNativeCon and KubeCon Europe 2017 on why we shouldn't bother securing that network, it seemed a pretty provocative idea.

Back in the old days, said Van De Walle, the enterprise had its computing infrastructure in data centers — big racks of servers, all with workloads running on them. The interconnecting network was divided into zones according to functional and security requirements. The zones were separated by big firewalls which filtered traffic based on source and destination IP address (or subnet) and port range. Modulo a few disasters, this worked OK.

But then came microservices, which took away workload predictability. No longer could one point at a box and say "this is a Java box, it runs the JVM", or "this is an Apache box, it runs the web server". Now any given pod may end up running on any node, at least by default, and this is a big challenge to the traditional model of firewalls. One could run one firewall per node, but people have started to deploy VPNs and software-defined networks, and once you do that, each new pod deployment requires updating the node firewalls across the entire deployment. The traditional firewall-as-gatekeeper model has real problems in this world.

So we should step back and think about this again; perhaps the network is not the best place to provide our interconnection security. Indeed, Van De Walle went on, we could embrace the contrary and assume that the network provides no security at all. This changes our threat model; the internal network is now just as untrusted as the external one. We cannot now rely on IP addresses and port numbers as authenticators; each flow must be authorized some other way. Kubernetes, he said, defaults to zero-trust networking; that is, by default the network is flat from the ground up. The IP address is dynamically assigned and carries no real information. Instead, Kubernetes has objects with associated identifiers — name, namespace, and labels — these are the foundation for your identity.

Kubernetes has made attempts to address the flatness of the network before. In v1.4, noted Van De Walle, network policies were introduced. The policy engine was based on a whitelist model; that is, everything was forbidden unless you explicitly authorized it. Policies are ingress-only, so you get to police traffic coming into your pod, but not that going out of it. Network policies are not active by default; they are activated on a per-namespace basis by the use of an annotation in the relevant YAML file. (For those who are not familiar with Kubernetes, I should point out that nearly everything involves YAML. The creation, modification, or destruction of just about any kind of resource involves the changes being specified in a YAML file, which is then referenced on the command line. If you're going to start using Kubernetes, prepare to start dreaming in YAML.)

In Van De Walle's opinion, Kubernetes network policies are fairly good. Rules are applied to specific pods, which are selected by a role label. Although the filtering is ingress filtering, you specify which traffic is allowed to enter a pod based on the role of the sender. For example, a policy might apply to all nodes whose role was backend; that policy would permit any traffic originating from a pod whose role was frontend. Rules are additive; each rule allows a new class of traffic, and traffic need only match any one rule in order to be permitted ingress.

There are already a number of projects working with (and extending) the network policies mechanism on the market; Van De Walle named Project Calico as an example, and there are others. But these implementations are tied to the networking backend, because, at the end of the day, policing is still based on IP addresses. So Aporeto has developed Trireme, a way of securing data flows that is independent of IP address information or the network fabric, and based entirely around the pod label.

Trireme adds a signed-identity-exchange phase on top of TCP's three-way handshake. The signed identity is then used to implement Kubernetes's native network policies with an enhanced degree of reliability. The iptables "mangle" table is used to send packets involved in the handshake via Trireme's user-space daemon, which must be installed on each authenticating node and which adds the identity exchange on top of the TCP handshake. Signatures are authenticated via a pre-shared key, or full PKI. If the pre-shared key option is used — only recommended for small deployments — Aporeto suggests distributing the key as a Kubernetes secret; if PKI is used, each node must be supplied with its private key and the relevant certificates.

Because this authentication is implemented on the node level, the pod is completely unaware that it is happening; it just never hears incoming traffic that fails the Trireme test. At this point, Van De Walle did a pleasingly quirky demo where a server pod was deployed that took requests over the network for the "Beer of the Day", which it returned from a random list of German beers. Two client pods were also deployed, both running a tiny application that continuously retrieved Beers of the Day from the server; one client possessed the tokens to assert an identity via Trireme, and one did not. When no network policy was in force, both clients were able to retrieve the beer of the day; when a network policy allowing only the approved client to connect to the server was applied, the non-approved client could no longer retrieve the daily beer.

Trireme is particularly helpful when not everything is happening in a single Kubernetes cluster. The ability to federate clusters across multiple data centers is coming; because this will almost inherently involve network address translation (NAT), authentication via source IP becomes extremely difficult. But as long as a TCP connection can be made, Trireme can layer its identity exchange on top. Future plans for Trireme include the ability to require encryption on-demand on a connection-by-connection basis, though this will slow data flows as now all packets in a flow will need to go via the user-space daemon, for encryption to be applied/removed.

There are problems, or at least corner cases. Because TCP is stateful, and the netfilter state engine is used to recognize the packets involved in a new connection in order to send those (and only those) via the Trireme user-space daemon, every connection set up before any policies are applied remains valid after policy application, even if the policy should have forbidden it. Aporeto is experimenting with flushing the TCP connection table in order to address this problem.

The slides from Van De Walle's talk are available those who are interested. Trireme is an elegant implementation of a clever idea, but for me its greatest value may be in encouraging me to recognize that zero-trust networking is a good way to think in a containerized microservice environment; that the old days, when access to the private network bolstered or indeed established your right to access the information stored thereon, might just be passing away.

[Thanks to the Linux Foundation, LWN's travel sponsor, for assistance in getting to Berlin for CNC and KubeCon.]

Index entries for this article
Security	Containers
GuestArticles	Yates, Tom
Conference	CloudNativeCon+KubeCon/2017

Network security in the microservice environment

Posted Apr 13, 2017 0:29 UTC (Thu) by shemminger (subscriber, #5739) [Link] (1 responses)

This makes a lot of sense. I wonder if the security architecture could be tied into faster packet filtering like XDP rather than getting bogged down with flow management in user space.

Network security in the microservice environment

Posted Apr 13, 2017 13:15 UTC (Thu) by SEJeff (guest, #51588) [Link]

The flow management can be offloaded to hardware directly via DMA and can actually be faster when it is all in userspace than in kernel space. If that wasn't true, things like Mellanox's VMA, SolarFlare's OpenOnload, Intel's DPTK, Chelsio's FCOE Offload, etc. Finally, if you think networking in userspace is slower, may I politely suggest you read up on Van Jacobson's network channels (it was covered here on LWN maybe 10 or so years ago?).

Network security in the microservice environment

Posted Apr 13, 2017 6:31 UTC (Thu) by alonz (subscriber, #815) [Link]

I wonder whether a similar mechanism could be built on top of standard IPsec (perhaps as some variation on “opportunistic” IPsec). This would reduce the cost of encryption, as it would be carried out on the kernel level.

Network security in the microservice environment

Posted Apr 13, 2017 6:54 UTC (Thu) by Kamilion (guest, #42576) [Link]

Huh, this seems like a good fit for wireguard's model, too...

NAT?

Posted Apr 13, 2017 7:21 UTC (Thu) by zdzichu (subscriber, #17118) [Link] (1 responses)

> this will almost inherently involve network address translation (NAT)

Excuse me, NAT? Why even bother with that when you have IPv6? Ah yes, it's 2017 and Kubernetes DOES NOT support IPv6.

NAT?

Posted Apr 13, 2017 15:13 UTC (Thu) by smoogen (subscriber, #97) [Link]

The fact that kubernetes doesn't support ipv6 is probably tied to the fact that the majority of the environments it is currently deployed don't either. Lots of new and existing hardware still doesn't support IPv6 out of the box. At the usual rate of hardware obsolescence, IPv4 will be around in large amounts til 2028. [I don't say this as something I find happy or joyful.. :/]

Network security in the microservice environment

Posted Apr 13, 2017 22:02 UTC (Thu) by davidstrauss (guest, #85867) [Link] (7 responses)

> if PKI is used, each node must be supplied with its private key and the relevant certificates.

Perhaps I'm assuming distribution because of the word "supplied" being used for both the key and the certificate, but private keys ought to be generated on hardware local to the machine that needs them -- and never sent anywhere. The only certificate setup steps that should involve a network hop are (1) the certificate signing request and (2) the certificate returned for it.

Network security in the microservice environment

Posted Apr 14, 2017 15:52 UTC (Fri) by madhatter (subscriber, #4665) [Link] (6 responses)

Firstly, it's my article, so that's my fault. Trireme's README on github states only that "when using PKI, the PrivateKey and corresponding CA certificate must be available locally on the node", so I have added an implication that was not there by my choice of words, for which I apologise.

That said, I don't agree that there are no valid reasons whatsoever to generate a keypair off-host. You can find me attempting to consider both pros and cons in my answer to this ServerFault question from a few years back.

Network security in the microservice environment

Posted Apr 14, 2017 19:11 UTC (Fri) by davidstrauss (guest, #85867) [Link]

> That said, I don't agree that there are no valid reasons whatsoever to generate a keypair off-host. You can find me attempting to consider both pros and cons in my answer to this ServerFault question from a few years back.

I agree that a design like that can make sense, but it's more of a way to package and ship entropy than part of the actual PKI architecture. You could create a file with random data, send it to the server, and pump it into the kernel's entropy sources to get a similar effect.

Network security in the microservice environment

Posted Apr 14, 2017 19:23 UTC (Fri) by davidstrauss (guest, #85867) [Link] (3 responses)

I should add to my previous comment that shipping entropy ought to be better than shipping the keypair, not just comparable. When you ship entropy to the keypair-generating system (KGS, to spare myself later verbosity), the result should be *at least* as random as shipping the key itself. That is, if the KGS has no entropy or predictable "entropy," it shouldn't undermine the entropy of the data shipped to it; xor-ing random data with predictable data (in equal parts) still produces an equally random result. But, the upside is that, if the KGS has any usable entropy, the private key will not be known or guessable to other systems (to whatever degree that KGS has entropy).

So, I think the worst case has similar attack surface to the "ship the keypair" design, but the best case has less.

Network security in the microservice environment

Posted Apr 14, 2017 19:31 UTC (Fri) by madhatter (subscriber, #4665) [Link] (2 responses)

Yes, and no. A communications mechanism that can be trusted to ship a block of entropy undamaged and secure, from a secure endpoint (for who else should be allowed to generate this) can be trusted to do the same for a key.

I don't mean to suggest that it's done deal either way, merely to note that the matter is capable of reasonable argument, and that to have a preference either way isn't necessarily foolish.

Network security in the microservice environment

Posted Apr 14, 2017 19:53 UTC (Fri) by davidstrauss (guest, #85867) [Link] (1 responses)

> A communications mechanism that can be trusted to ship a block of entropy undamaged and secure, from a secure endpoint (for who else should be allowed to generate this) can be trusted to do the same for a key.

I'm not referring so much to the trustworthiness of the transport as this distinction:

* If Machine P generates the keypair and sends it to Machine Q, both have had access to the full private key.
* If Machine P generates a batch of entropy and ships it to Machine Q (which has less or less trustworthy entropy), and Q generates the keypair, then only Q knows the full private key. Machine P may have lots of useful data to start guessing the private key, but any local entropy on Q -- as well as which parts of P's entropy that Q chooses to use for the keypair (versus other entropy-consuming tasks) -- make that much harder.

The distinction for attack surface could matter in various scenarios:

* Machine P is directly compromised.
* Machine P is a VM, and the host gets compromised.
* Machine P accidentally logs the data it's sending, and that data gets sent elsewhere (like an ELK stack).
* Machine P turns out to use predictable entropy sources.
* The transport from P to Q is compromised. (Even with all the security that's possible, it's still worse to leak the private key than just entropy used in creating it.)

And, unlike how you can use an HSM to sign CSRs on the certificate authority (preventing many attacks from getting the private key data), Machine P necessarily has to handle the entropy or keypair data that it's sending to Machine Q.

Network security in the microservice environment

Posted Apr 14, 2017 19:59 UTC (Fri) by madhatter (subscriber, #4665) [Link]

As I said, I think the matter is capable of reasonable argument, and I'm certainly not saying you don't have a point. I think that what I'm doing is defensible, but I wouldn't dream of suggesting that anyone else do it. I would dream of suggesting that anyone who's got to generate a keypair should think about the threat model they think they face, and take their own position on what to do, rather than blindly following anyone's stern prescriptions.

Network security in the microservice environment

Posted Apr 14, 2017 20:18 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

Generate them on a TPM - if you know the node's EK in advance, this even gives you a mechanism to prove identity.

Network security in the microservice environment

Posted Apr 14, 2017 13:37 UTC (Fri) by mcmanus (guest, #4569) [Link]

this seems to claim authentication for the flow, but it is really authentication for just the tcp handshake packets, right? And those aren't usually the interesting ones :)

Network security in the microservice environment

Posted Apr 17, 2017 19:21 UTC (Mon) by ncm (guest, #165) [Link]

Say what you will about kerberos, they thought all this through to the end and came out with tickets. You don't have to use kerberos, but you will end up in the same place when you're finished, only incompatible with everybody else.

The problem that made people roll their eyes about kerberos was key management. Solve that the modern way, and kerberos fixes everything else. It will be complicated because some things cannot be simple without being broken.