Network security in the microservice environment
Back in the old days, said Van De Walle, the enterprise had its computing infrastructure in data centers — big racks of servers, all with workloads running on them. The interconnecting network was divided into zones according to functional and security requirements. The zones were separated by big firewalls which filtered traffic based on source and destination IP address (or subnet) and port range. Modulo a few disasters, this worked OK.
But then came microservices, which took away workload predictability. No longer could one point at a box and say "this is a Java box, it runs the JVM", or "this is an Apache box, it runs the web server". Now any given pod may end up running on any node, at least by default, and this is a big challenge to the traditional firewall model. One could run a firewall on each node, but with pods scheduled dynamically, and with VPNs and software-defined networks increasingly in the path, every new pod deployment would require updating node firewalls across the entire cluster. The traditional firewall-as-gatekeeper model has real problems in this world.
So we should step back and think about this again; perhaps the network is not the best place to provide our interconnection security. Indeed, Van De Walle went on, we could embrace the contrary position and assume that the network provides no security at all. This changes our threat model; the internal network is now just as untrusted as the external one. We can no longer rely on IP addresses and port numbers as authenticators; each flow must be authorized some other way. Kubernetes, he said, defaults to zero-trust networking; that is, the network is flat from the ground up. The IP address of a pod is dynamically assigned and carries no real information. Instead, Kubernetes has objects with associated identifiers — name, namespace, and labels — and these are the foundation for identity.
Kubernetes has made attempts to address the flatness of the network before. In v1.4, noted Van De Walle, network policies were introduced. The policy engine was based on a whitelist model; that is, everything was forbidden unless you explicitly authorized it. Policies are ingress-only, so you get to police traffic coming into your pod, but not that going out of it. Network policies are not active by default; they are activated on a per-namespace basis by the use of an annotation in the relevant YAML file. (For those who are not familiar with Kubernetes, I should point out that nearly everything involves YAML. The creation, modification, or destruction of just about any kind of resource involves the changes being specified in a YAML file, which is then referenced on the command line. If you're going to start using Kubernetes, prepare to start dreaming in YAML.)
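For namespaces that should enforce policies, the beta-era activation looked something like the sketch below; the namespace name is invented, and later Kubernetes releases dropped this annotation in favor of the NetworkPolicy API itself.

```yaml
# Opt a namespace into the whitelist model: with this annotation in place,
# ingress traffic to pods in the namespace is denied unless a network
# policy explicitly allows it.
kind: Namespace
apiVersion: v1
metadata:
  name: demo
  annotations:
    net.beta.kubernetes.io/network-policy: |
      {
        "ingress": {
          "isolation": "DefaultDeny"
        }
      }
```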
In Van De Walle's opinion, Kubernetes network policies are fairly good. Rules are applied to specific pods, which are selected by a role label. Although the filtering is ingress filtering, you specify which traffic is allowed to enter a pod based on the role of the sender. For example, a policy might apply to all pods whose role was backend; that policy would permit any traffic originating from a pod whose role was frontend. Rules are additive; each rule allows a new class of traffic, and traffic need only match any one rule in order to be permitted ingress.
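A policy along those lines might have been written roughly as follows; this is a sketch only, the names and namespace are invented, and it uses the extensions/v1beta1 API group that was current at the time:

```yaml
# Allow ingress to backend pods only from frontend pods; all other
# inbound traffic to the selected pods stays blocked.
apiVersion: extensions/v1beta1
kind: NetworkPolicy
metadata:
  name: backend-allow-from-frontend
  namespace: demo
spec:
  podSelector:
    matchLabels:
      role: backend          # the pods this policy protects
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend     # senders that are permitted in
```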
There are already a number of projects on the market working with (and extending) the network-policy mechanism; Van De Walle named Project Calico as an example, and there are others. But these implementations are tied to the networking backend because, at the end of the day, policing is still based on IP addresses. So Aporeto has developed Trireme, a way of securing data flows that is independent of IP address information and the network fabric, and is based entirely on pod labels.
Trireme adds a signed-identity-exchange phase on top of TCP's three-way handshake. The signed identity is then used to implement Kubernetes's native network policies with an enhanced degree of reliability. The iptables "mangle" table is used to divert the packets involved in the handshake to Trireme's user-space daemon, which must be installed on each authenticating node and which performs the identity exchange. Signatures are authenticated with either a pre-shared key or a full PKI. If the pre-shared-key option is used — recommended only for small deployments — Aporeto suggests distributing the key as a Kubernetes secret; if PKI is used, each node must be supplied with its private key and the relevant certificates.
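As a rough illustration of the pre-shared-key route, the key could be stored as an ordinary Kubernetes secret for the node daemons to read; the secret name and key below are hypothetical rather than anything Trireme specifically requires:

```yaml
# A pre-shared key held as a Kubernetes secret; the value is base64-encoded
# and purely illustrative.
apiVersion: v1
kind: Secret
metadata:
  name: trireme-psk          # hypothetical name
  namespace: kube-system
type: Opaque
data:
  psk: c3VwZXItc2VjcmV0LXNoYXJlZC1rZXk=
```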
Because this authentication is implemented on the node level, the pod is completely unaware that it is happening; it just never hears incoming traffic that fails the Trireme test. At this point, Van De Walle did a pleasingly quirky demo where a server pod was deployed that took requests over the network for the "Beer of the Day", which it selected at random from a list of German beers. Two client pods were also deployed, both running a tiny application that continuously retrieved Beers of the Day from the server; one client possessed the tokens to assert an identity via Trireme, and one did not. When no network policy was in force, both clients were able to retrieve the beer of the day; when a network policy allowing only the approved client to connect to the server was applied, the non-approved client could no longer retrieve the daily beer.
Trireme is particularly helpful when not everything is happening in a single Kubernetes cluster. The ability to federate clusters across multiple data centers is coming; because this will almost inherently involve network address translation (NAT), authentication via source IP becomes extremely difficult. But as long as a TCP connection can be made, Trireme can layer its identity exchange on top. Future plans for Trireme include the ability to require encryption on demand, on a connection-by-connection basis, though this will slow data flows, since every packet in such a flow will then need to pass through the user-space daemon to have encryption applied or removed.
There are problems, or at least corner cases. Because TCP is stateful, and the netfilter state engine is used to recognize the packets involved in a new connection in order to send those (and only those) via the Trireme user-space daemon, every connection set up before any policies are applied remains valid after policy application, even if the policy should have forbidden it. Aporeto is experimenting with flushing the TCP connection table in order to address this problem.
The slides from Van De Walle's talk are available for those who are interested. Trireme is an elegant implementation of a clever idea, but for me its greatest value may be in encouraging me to recognize that zero-trust networking is a good way to think in a containerized microservice environment; that the old days, when access to the private network bolstered or indeed established your right to access the information stored thereon, might just be passing away.
[Thanks to the Linux Foundation, LWN's travel sponsor, for assistance in getting to Berlin for CNC and KubeCon.]
| Index entries for this article | |
|---|---|
| Security | Containers |
| GuestArticles | Yates, Tom |
| Conference | CloudNativeCon+KubeCon/2017 |
Posted Apr 13, 2017 6:31 UTC (Thu) by alonz (subscriber, #815):

I wonder whether a similar mechanism could be built on top of standard IPsec (perhaps as some variation on “opportunistic” IPsec). This would reduce the cost of encryption, as it would be carried out on the kernel level.

Posted Apr 13, 2017 7:21 UTC (Thu) by zdzichu (subscriber, #17118):

Excuse me, NAT? Why even bother with that when you have IPv6? Ah yes, it's 2017 and Kubernetes DOES NOT support IPv6.

Posted Apr 13, 2017 22:02 UTC (Thu) by davidstrauss (guest, #85867):

Perhaps I'm assuming distribution because of the word "supplied" being used for both the key and the certificate, but private keys ought to be generated on hardware local to the machine that needs them -- and never sent anywhere. The only certificate setup steps that should involve a network hop are (1) the certificate signing request and (2) the certificate returned for it.

Posted Apr 14, 2017 15:52 UTC (Fri) by madhatter (subscriber, #4665):

Firstly, it's my article, so that's my fault. Trireme's README on github states only that "when using PKI, the PrivateKey and corresponding CA certificate must be available locally on the node", so I have added an implication that was not there by my choice of words, for which I apologise.

That said, I don't agree that there are no valid reasons whatsoever to generate a keypair off-host. You can find me attempting to consider both pros and cons in my answer to this ServerFault question from a few years back.

Posted Apr 14, 2017 19:11 UTC (Fri) by davidstrauss (guest, #85867):

I agree that a design like that can make sense, but it's more of a way to package and ship entropy than part of the actual PKI architecture. You could create a file with random data, send it to the server, and pump it into the kernel's entropy sources to get a similar effect.

Posted Apr 14, 2017 19:23 UTC (Fri) by davidstrauss (guest, #85867):

* If Machine P generates a batch of entropy and ships it to Machine Q (which has less or less trustworthy entropy), and Q generates the keypair, then only Q knows the full private key. Machine P may have lots of useful data to start guessing the private key, but any local entropy on Q -- as well as which parts of P's entropy that Q chooses to use for the keypair (versus other entropy-consuming tasks) -- make that much harder.

So, I think the worst case has similar attack surface to the "ship the keypair" design, but the best case has less.

Posted Apr 14, 2017 19:31 UTC (Fri) by madhatter (subscriber, #4665):

I don't mean to suggest that it's a done deal either way, merely to note that the matter is capable of reasonable argument, and that to have a preference either way isn't necessarily foolish.

Posted Apr 14, 2017 19:53 UTC (Fri) by davidstrauss (guest, #85867):

I'm not referring so much to the trustworthiness of the transport as this distinction:

* If Machine P generates the keypair and sends it to Machine Q, both have had access to the full private key.

The distinction for attack surface could matter in various scenarios:

* Machine P is directly compromised.
* Machine P is a VM, and the host gets compromised.
* Machine P accidentally logs the data it's sending, and that data gets sent elsewhere (like an ELK stack).
* Machine P turns out to use predictable entropy sources.
* The transport from P to Q is compromised. (Even with all the security that's possible, it's still worse to leak the private key than just entropy used in creating it.)

And, unlike how you can use an HSM to sign CSRs on the certificate authority (preventing many attacks from getting the private key data), Machine P necessarily has to handle the entropy or keypair data that it's sending to Machine Q.

Posted Apr 17, 2017 19:21 UTC (Mon) by ncm (guest, #165):

The problem that made people roll their eyes about kerberos was key management. Solve that the modern way, and kerberos fixes everything else. It will be complicated because some things cannot be simple without being broken.
