The container orchestrator landscape
Docker and other container engines can greatly simplify many aspects of deploying a server-side application, but most applications consist of more than one container. Managing a group of containers only gets harder as additional applications and services are deployed; this has led to the development of a class of tools called container orchestrators. The best-known of these by far is Kubernetes; the history of container orchestration can be divided into what came before it and what came after.
The convenience offered by containers comes with some trade-offs; someone who adheres strictly to Docker's idea that each service should have its own container will end up running a large number of them. Even a simple web interface to a database might require running separate containers for the database server and the application; it might also include a separate container for a web server to handle serving static files, a proxy server to terminate SSL/TLS connections, a key-value store to serve as a cache, or even a second application container to handle background jobs and scheduled tasks.
An administrator who is responsible for several such applications will quickly find themselves wishing for a tool to make their job easier; this is where container orchestrators step in. A container orchestrator is a tool that can manage a group of multiple containers as a single unit. Instead of operating on a single server, orchestrators allow combining multiple servers into a cluster, and automatically distribute container workloads among the cluster nodes.
Docker Compose and Swarm
Docker Compose is not quite an orchestrator, but it was Docker's first attempt at a tool to make it easier to manage applications that are made up of several containers. It consumes a YAML-formatted file, which is almost always named docker-compose.yml. Compose reads this file and uses the Docker API to create the resources that it declares; Compose also adds labels to all of the resources, so that they can be managed as a group after they are created. In effect, it is an alternative to the Docker command-line interface (CLI) that operates on groups of containers. Three types of resources can be defined in a Compose file (a minimal example follows the list):
- services contains declarations of containers to be launched. Each entry in services is equivalent to a docker run command.
- networks declares networks that can be attached to the containers defined in the Compose file. Each entry in networks is equivalent to a docker network create command.
- volumes defines named volumes that can be attached to the containers. In Docker parlance, a volume is persistent storage that is mounted into the container. Named volumes are managed by the Docker daemon. Each entry in volumes is equivalent to a docker volume create command.
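To make this concrete, here is a minimal sketch of a Compose file along the lines described above; the application image, ports, and volume name are hypothetical placeholders rather than anything prescribed by Compose itself:

```yaml
# docker-compose.yml; image names and ports are illustrative placeholders
services:
  app:
    image: example/webapp:latest   # stand-in for a real application image
    ports:
      - "8080:8080"
    depends_on:
      - db
    networks:
      - backend
  db:
    image: postgres:15
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend

networks:
  backend: {}

volumes:
  db-data: {}
```

Running docker compose up -d (or docker-compose up -d with the older standalone tool) against such a file creates the network and volume, then starts both containers attached to the network, all labeled so that docker compose down can later tear them down as a group.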
Networks and volumes can be directly connected to networks and filesystems on the host that Docker is running on, or they can be provided by a plugin. Network plugins allow things like connecting containers to VPNs; a volume plugin might allow storing a volume on an NFS server or an object storage service.
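As one illustration of host-provided storage, a named volume can be backed by an NFS export through options passed to the built-in local driver; the server address and export path below are made up:

```yaml
volumes:
  shared-data:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.0.2.10,rw,nfsvers=4"   # made-up NFS server address
      device: ":/exports/shared"          # made-up export path
```

A third-party volume plugin would be configured in much the same way, with its own driver name and options.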
Compose provides a much more convenient way to manage an application that consists of multiple containers, but, at least in its original incarnation, it only worked with a single host; all of the containers that it created were run on the same machine. To extend its reach across multiple hosts, Docker introduced Swarm mode in 2016. This is actually the second product from Docker to bear the name "Swarm" — a product from 2014 implemented a completely different approach to running containers across multiple hosts, but it is no longer maintained. It was replaced by SwarmKit, which provides the underpinnings of the current version of Docker Swarm.
Swarm mode is included in Docker; no additional software is required. Creating a cluster is a simple matter of running docker swarm init on an initial node, and then docker swarm join on each additional node to be added. Swarm clusters contain two types of nodes. Manager nodes provide an API to launch containers on the cluster, and communicate with each other using a protocol based on the Raft Consensus Algorithm in order to synchronize the state of the cluster across all managers. Worker nodes do the actual work of running containers. It is unclear how large these clusters can be; Docker's documentation says that a cluster should have no more than 7 manager nodes but does not specify a limit on the number of worker nodes. Bridging container networks across nodes is built-in, but sharing storage between nodes is not; third-party volume plugins need to be used to provide shared persistent storage across nodes.
Services are deployed on a swarm using Compose files. Swarm extended the Compose format by adding a deploy key to each service that specifies how many instances of the service should be running and which nodes they should run on. Unfortunately, this led to a divergence between Compose and Swarm, which caused some confusion because options like CPU and memory quotas needed to be specified in different ways depending on which tool was being used. During this period of divergence, a file intended for Swarm was referred to as a "stack file" instead of a Compose file in an attempt to disambiguate the two; thankfully, these differences appear to have been smoothed over in the current versions of Swarm and Compose, and any references to a stack file being distinct from a Compose file seem to have largely been scoured from the Internet. The Compose format now has an open specification and its own GitHub organization providing reference implementations.
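As a sketch of the deploy key mentioned above (the service name, image, and limits are placeholders), a stack file might contain something like:

```yaml
version: "3.8"
services:
  app:
    image: example/webapp:latest   # hypothetical image
    deploy:
      replicas: 3                  # run three copies across the swarm
      placement:
        constraints:
          - node.role == worker    # only schedule on worker nodes
      resources:
        limits:
          cpus: "0.50"
          memory: 256M
```

A file like this would be deployed to a swarm with a command along the lines of docker stack deploy -c stack.yml myapp; Compose running against a single host simply ignores most of the deploy section.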
There is some level of uncertainty about the future of Swarm. It once formed the backbone of a service called Docker Cloud, but the service was suddenly shut down in 2018. It was also touted as a key feature of Docker's Enterprise Edition, but that product has since been sold to another company and is now marketed as Mirantis Kubernetes Engine. Meanwhile, recent versions of Compose have gained the ability to deploy containers to services hosted by Amazon and Microsoft. There has been no deprecation announcement, but there also hasn't been any announcement of any other type in recent memory; searching for the word "Swarm" on Docker's website only turns up passing mentions.
Kubernetes
Kubernetes (sometimes known as k8s) is a project inspired by an internal Google tool called Borg. Kubernetes manages resources and coordinates running workloads on clusters of up to thousands of nodes; it dominates container orchestration like Google dominates search. Google wanted to collaborate with Docker on Kubernetes development in 2014, but Docker decided to go its own way with Swarm. Instead, Kubernetes grew up under the auspices of the Cloud Native Computing Foundation (CNCF). By 2017, Kubernetes had grown so popular that Docker announced that it would be integrated into Docker's own product.
Aside from its popularity, Kubernetes is primarily known for its complexity. Setting up a new cluster by hand is an involved task, which requires the administrator to select and configure several third-party components in addition to Kubernetes itself. Much like the Linux kernel needs to be combined with additional software to make a complete operating system, Kubernetes is only an orchestrator and needs to be combined with additional software to make a complete cluster. It needs a container engine to run its containers; it also needs plugins for networking and persistent volumes.
Kubernetes distributions exist to fill this gap. Like a Linux distribution, a Kubernetes distribution bundles Kubernetes with an installer and a curated selection of third-party components. Different distributions exist to fill different niches; seemingly every tech company of a certain size has its own distribution and/or hosted offering to cater to enterprises. The minikube project offers an easier on-ramp for developers looking for a local environment to experiment with. Unlike their Linux counterparts, Kubernetes distributions are certified for conformance by the CNCF; each distribution must implement the same baseline of functionality in order to obtain the certification, which allows them to use the "Certified Kubernetes" badge.
A Kubernetes cluster contains several software components. Every node in the cluster runs a container engine, an agent called the kubelet that maintains the node's membership in the cluster and accepts work from it, and kube-proxy, which enables network communication with containers running on other nodes.
The components that maintain the state of the cluster and make decisions about resource allocations are collectively referred to as the control plane — these include a distributed key-value store called etcd, a scheduler that assigns work to cluster nodes, and one or more controller processes that react to changes in the state of the cluster and trigger any actions needed to make the actual state match the desired state. Users and cluster nodes interact with the control plane through the Kubernetes API server. To effect changes, users set the desired state of the cluster through the API server, while the kubelet reports the actual state of each cluster node to the controller processes.
Kubernetes runs containers inside an abstraction called a pod, which can contain one or more containers, although running containers for more than one service in a pod is discouraged. Instead, a pod will generally have a single main container that provides a service, and possibly one or more "sidecar" containers that collect metrics or logs from the service running in the main container. All of the containers in a pod will be scheduled together on the same machine, and will share a network namespace — containers running within the same pod can communicate with each other over the loopback interface. Each pod receives its own unique IP address within the cluster. Containers running in different pods can communicate with each other using their cluster IP addresses.
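A minimal pod definition with a main container and a logging sidecar might look like the following sketch; the names and image tags are purely illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: app                  # the main service container
      image: nginx:1.25
      ports:
        - containerPort: 80
    - name: log-forwarder        # sidecar; shares the pod's network namespace
      image: fluent/fluent-bit:2.1
```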
A pod specifies a set of containers to run, but the definition of a pod says nothing about where to run those containers, or how long to run them for — without this information, Kubernetes will start the containers somewhere on the cluster, but will not restart them when they exit, and may abruptly terminate them if the control plane decides the resources they are using are needed by another workload. For this reason, pods are rarely used alone; instead, the definition of a pod is usually wrapped in a Deployment object, which is used to define a persistent service. Like Compose and Swarm, the objects managed by Kubernetes are declared in YAML; for Kubernetes, the YAML declarations are submitted to the cluster using the kubectl tool.
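A minimal Deployment wrapping a pod template might look like this sketch (again with hypothetical names), which would be submitted to the cluster with kubectl apply -f:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                    # keep three copies of the pod running
  selector:
    matchLabels:
      app: web
  template:                      # the wrapped pod definition
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: app
          image: nginx:1.25
          ports:
            - containerPort: 80
```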
In addition to pods and Deployments, Kubernetes can manage many other types of objects, like load balancers and authorization policies. The list of supported APIs is continually evolving, and will vary depending on which version of Kubernetes and which distribution a cluster is running. Custom resources can be used to add APIs to a cluster to manage additional types of objects. KubeVirt adds APIs to enable Kubernetes to run virtual machines, for example. The complete list of APIs supported by a particular cluster can be discovered with the kubectl api-versions command.
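As a rough sketch of what adding a custom resource involves, the following CustomResourceDefinition declares a hypothetical backups.example.com API; once it is applied, the API server will accept Backup objects like any built-in type:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com        # hypothetical custom resource
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Backup
    plural: backups
    singular: backup
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:
                  type: string     # e.g. a cron expression
```

The CRD alone only teaches the API server about the new type; a controller that watches for Backup objects and acts on them would still need to be deployed separately.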
Unlike Compose, each of these objects is declared in a separate YAML document, although multiple YAML documents can be inlined in the same file by separating them with "---", as seen in the Kubernetes documentation. A complex application might consist of many objects with their definitions spread across multiple files; keeping all of these definitions in sync with each other when maintaining such an application can be quite a chore. In order to make this easier, some Kubernetes administrators have turned to templating tools like Jsonnet.
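To illustrate the multi-document format mentioned above, a Service and a ConfigMap for the hypothetical web application from the earlier examples could share a single file:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web              # matches the pods created by the Deployment above
  ports:
    - port: 80
      targetPort: 80
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-config
data:
  MESSAGE: "hello from the cluster"
```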
Helm takes the templating approach a step further. Like Kubernetes, development of Helm takes place under the aegis of the CNCF; it is billed as "the package manager for Kubernetes". Helm generates YAML configurations for Kubernetes from a collection of templates and variable declarations called a chart. Its template language is distinct from the Jinja templates used by Ansible but looks fairly similar to them; people who are familiar with Ansible Roles will likely feel at home with Helm Charts.
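A fragment of a hypothetical chart shows the general shape: a values.yaml file declares the customization points, and the templates reference them (the names and defaults here are invented for illustration):

```yaml
# values.yaml; the chart's customization points
replicaCount: 2
image:
  repository: nginx
  tag: "1.25"
```

```yaml
# templates/deployment.yaml; excerpt of a templated manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-web
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-web
    spec:
      containers:
        - name: web
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```

Helm renders the templates by merging the chart's defaults with any overrides the user supplies, for example with a command like helm install myapp ./chart --set replicaCount=3.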
Collections of Helm charts can be published in Helm repositories; Artifact Hub provides a large directory of public Helm repositories. Administrators can add these repositories to their Helm configuration and use the ready-made Helm charts to deploy prepackaged versions of popular applications to their cluster. Recent versions of Helm also support pushing and pulling charts to and from container registries, giving administrators the option to store charts in the same place that they store container images.
Kubernetes shows no signs of losing momentum any time soon. It is designed to manage any type of resource; this flexibility, as demonstrated by the KubeVirt virtual-machine controller, gives it the potential to remain relevant even if containerized workloads should eventually fall out of favor. Development proceeds at a healthy clip and new major releases come out regularly. Releases are supported for a year; there doesn't seem to be a long-term support version available. Upgrading a cluster is supported, but some prefer to bring up a new cluster and migrate their services over to it.
Nomad
Nomad is an orchestrator from HashiCorp that is marketed as a simpler alternative to Kubernetes. Like Docker and Kubernetes, Nomad is an open-source project. It consists of a single binary called nomad, which can be used to start a daemon called the agent and also serves as a CLI to communicate with an agent. Depending on how it is configured, the agent process can run in one of two modes. Agents running in server mode accept jobs and allocate cluster resources for them. Agents running in client mode contact the servers to receive jobs, run them, and report their status back to the servers. The agent can also run in development mode, where it takes on the role of both client and server to form a single-node cluster that can be used for testing purposes.
Creating a Nomad cluster can be quite simple. In Nomad's most basic mode of operation, an initial server agent is started, and additional nodes are then added to the cluster using the nomad server join command. HashiCorp also provides Consul, a general-purpose service mesh and discovery tool. While Nomad can be used standalone, it is probably at its best when combined with Consul: the Nomad agent can use Consul to automatically discover and join a cluster, and Consul can also perform health checks, serve DNS records, and provide HTTPS proxies for services running on the cluster.
Nomad supports complex cluster topologies. Each cluster is divided into one or more "data centers". Like Swarm, server agents within a single data center communicate with each other using a protocol based on Raft; this protocol has tight latency requirements, but multiple data centers may be linked together using a gossip protocol that allows information to propagate through the cluster without each server having to maintain a direct connection to every other. Data centers linked together in this way can act as one cluster from a user's perspective. This architecture gives Nomad an advantage when scaled up to enormous clusters. Kubernetes officially supports up to 5,000 nodes and 300,000 containers, whereas Nomad's documentation cites examples of clusters containing over 10,000 nodes and 2,000,000 containers.
Like Kubernetes, Nomad doesn't include a container engine or runtime. It uses task drivers to run jobs. Task drivers that use Docker and Podman to run containers are included; community-supported drivers are available for other container engines. Also like Kubernetes, Nomad's ambitions are not limited to containers; there are also task drivers for other types of workloads, including a fork/exec driver that simply runs a command on the host, a QEMU driver for running virtual machines, and a Java driver for launching Java applications. Community-supported task drivers connect Nomad to other types of workloads.
Unlike Docker or Kubernetes, Nomad eschews YAML in favor of the HashiCorp Configuration Language (HCL), which was originally created for Terraform, another HashiCorp project that provisions cloud resources. HCL is used across the HashiCorp product line, although it has seen limited adoption elsewhere. Documents written in HCL can easily be converted to JSON, but the language aims to provide a syntax that is more finger-friendly than JSON and less error-prone than YAML.
HashiCorp's equivalent to Helm is called Nomad Pack. Like Helm, Nomad Pack processes a directory full of templates and variable declarations to generate job configurations. Nomad also has a community registry of pre-packaged applications, but the selection is much smaller than what is available for Helm at Artifact Hub.
Nomad does not have the same level of popularity as Kubernetes. Like Swarm, its development appears to be primarily driven by its creators; although it has been deployed by many large companies, HashiCorp is still very much the center of the community around Nomad. At this point, it seems unlikely the project has gained enough momentum to have a life independent from its corporate parent. Users can perhaps find assurance in the fact that HashiCorp is much more clearly committed to the development and promotion of Nomad than Docker is to Swarm.
Conclusion
Swarm, Kubernetes, and Nomad are not the only container orchestrators, but they are the three most viable. Apache Mesos can also be used to run containers, but it was nearly mothballed in 2021; DC/OS is based on Mesos, but much like Docker Enterprise Edition, the company that backed its development is now focused on Kubernetes. Most "other" container orchestration projects, like OpenShift and Rancher, are actually just enhanced (and certified) Kubernetes distributions, even if they don't have Kubernetes in their name.
Despite (or perhaps, because of) its complexity, Kubernetes currently enjoys the most popularity by far, but HashiCorp's successes with Nomad show that there is still room for alternatives. Some users remain loyal to the simplicity of Docker Swarm, but its future is uncertain. Other alternatives appear to be largely abandoned at this point. It would seem that the landscape has largely settled around these three players, but container orchestration is still a relatively immature area. Ten years ago, very little of this technology even existed, and things are still evolving quickly. There are likely many exciting new ideas and developments in container orchestration that are still to come.
[Special thanks to Guinevere Saenger for educating me with regard to some of the finer points of Kubernetes and providing some important corrections for this article.]
Index entries for this article
GuestArticles: Webb, Jordan
Posted Aug 23, 2022 19:08 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (4 responses)
The reverse also happens. Borg doesn't let you submit individual alloc instances (pods) or tasks (containers) without wrapping them up in an alloc (ReplicaSet) or job (see above), so if you just want one copy of something, you have to give Borg a template and say "make one copy of it" instead of submitting the individual object directly, and so in practice we mostly speak of "jobs and allocs" rather than "tasks and alloc instances." But in k8s, you can configure one pod at a time if you really want to.
(For more specifics on how Borg works, read the paper: https://research.google/pubs/pub43438/)
* k8s also defines something called a "job," but it's a completely different thing, not relevant here.
Posted Aug 23, 2022 19:12 UTC (Tue)
by jordan (subscriber, #110573)
[Link] (2 responses)
Posted Aug 23, 2022 20:39 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Aug 23, 2022 20:43 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link]
Posted Aug 23, 2022 20:23 UTC (Tue)
by dw (subscriber, #12017)
[Link]
Posted Aug 23, 2022 19:21 UTC (Tue)
by zyga (subscriber, #81533)
[Link]
Posted Aug 23, 2022 20:18 UTC (Tue)
by dw (subscriber, #12017)
[Link] (8 responses)
I stopped looking at or caring for alternatives, ECS has just the right level of complexity and it's a real shame nobody has found the time to do a free software clone of its control plane.
Posted Aug 23, 2022 21:22 UTC (Tue)
by beagnach (guest, #32987)
[Link]
Posted Aug 23, 2022 22:34 UTC (Tue)
by k8to (guest, #15413)
[Link] (2 responses)
It's funny, when "open source" meant Linux and Samba to me, it seemed like a world of down to earth implementations that might be clunky in some ways but were focused on comprehensible goals. Now in a world of Kubernetes, Spark, and Solr, I associate it more with engineer-created balls of hair, that you have to take care of with specialists to keep them working. More necessary evils than amplifying enablers.
Posted Aug 23, 2022 23:14 UTC (Tue)
by dw (subscriber, #12017)
[Link]
As for ECS lock-in, the time saved on a 1 liner SSM deploy of on-prem nodes easily covers the risk at some future date of having to port container definitions to pretty much any other system. Optimistically, assuming 3 days of one person's time to set up a local k8s, ECS offers about 450 node-months before reaching breakeven (450 / 5 node cluster = 90 months, much longer than many projects last before reaching the scrapheap). Of course ECS setup isn't completely free, but relatively speaking it may as well be considered free.
Posted Aug 25, 2022 1:59 UTC (Thu)
by milesrout (subscriber, #126894)
[Link]
For most people it still is. People that just run things normally, the way they always have, just carry on as normal. You don't hear from them because there's nothing to blog about it. It's business as normal. People think that kubernetes and docker and that whole "ecosystem" is far more prevalent than it really is, because when you use such overcomplicated enterpriseware you inevitably have issues and they get talked about. There's just nothing to blog about when it comes to just running a few servers with nginx reverse proxying some internet daemon. It Just Works.
Posted Aug 24, 2022 1:12 UTC (Wed)
by rjones (subscriber, #159862)
[Link] (1 responses)
One of the problems with self-hosting Kubernetes is that the typical approach naively mixes the Kubernetes API components (API/Scheduler/etcd/etc) with infrastructure components (networking/storage/ingress controllers/etc) and with applications, all on the same set of nodes.
So you have all these containers operating at different levels all mixing together, which means that your "blast radius" for the cluster is very bad. If you mess up a network controller configuration you can take your Kubernetes offline. If an application freaks out then it can take your cluster offline. Memory resources could be exhausted by a bad deploy or misbehaving application, which then takes out your storage, etc. etc.
This makes upgrades irritating and difficult and full of pitfalls and the cluster very vulnerable to misconfigurations.
You can mitigate these issues by separating out 'admin' nodes from 'etcd', 'storage', and 'worker' nodes. This greatly reduces the chances of outages and makes management easier, but it also adds a lot of extra complexity and setup. This is a lot of configuring and messing around if you are interested in just hosting 1-5 node kubernetes cluster for personal lab or specific project or whatever.
With K0s (and similar approaches with k3s and RancherOS) you have a single Unix-style service that provides the Kubernetes API components. You can cluster if you want, but the simplest setup just uses sqlite as the backend, which works fine for small or single use clusters. This runs in a separate VM or small machine from the rest of the cluster. Even if it's a single point of failure it's not too bad. The cluster will happily hum right along as you reboot your k0s controller node.
In this way managing the cluster is much more like how AWS EKS or Azure AKS cluster works. With those the API services are managed by the cloud provider separate from what you manage.
This is a massive improvement over what you may have experienced with something like OpenShift, Kubespray, or even really simple kubeadm-based deploys, and most other approaches. It may not seem like a big deal, but for what most people are interested in with self-hosted Kubernetes clusters, I think it is.
Also I think that having numerous smaller k8s clusters is preferable to having very large multi-tenant clusters. Just having things split up solves a lot of potential issues.
Posted Aug 24, 2022 6:30 UTC (Wed)
by dw (subscriber, #12017)
[Link]
The problem with kubernetes starts and ends with its design, it's horrible to work with in concept never mind any particular implementation
Posted Aug 24, 2022 16:05 UTC (Wed)
by sbheinlein (guest, #160469)
[Link] (1 responses)
That's enough of a mention for me.
Posted Aug 25, 2022 4:45 UTC (Thu)
by samuelkarp (subscriber, #131165)
[Link]
Posted Aug 23, 2022 20:55 UTC (Tue)
by cry_regarder (subscriber, #50545)
[Link]
https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-...
IIRC it is the default orchestrator for things like Samza. https://samza.apache.org/learn/documentation/latest/deplo...
Posted Aug 23, 2022 22:12 UTC (Tue)
by onlyben (guest, #132784)
[Link] (2 responses)
I try and avoid Kubernetes if I can. I do appreciate the problem it solves and think it is extremely useful, but it hasn't quite captured the magic for me that other tools have (including original docker, and probably even docker-compose). I'd be curious to know what people feel about Nomad.
Posted Aug 24, 2022 3:27 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
I personally would avoid Nomad right now. It's an "open core" system, with the "enterprise" version locking in some very useful features like multi-region support.
With K8s you can also use EKS on AWS or AKS on Azure to offload running the control plane to AWS/Azure. It's still very heavy on infrastructure that you need to configure, but at least it's straightforward and needs to be done once.
Posted Aug 24, 2022 16:42 UTC (Wed)
by schmichael (guest, #160476)
[Link]
Quick point of clarification: Multi-region federation is open source. You can federate Nomad clusters to create a single global control plane.
Multi-region deployments (where you can deploy a single job to multiple regions) are enterprise. Single-region jobs and deployments are open source.
Disclaimer: I'm the HashiCorp Nomad Engineering Team Lead
Posted Aug 23, 2022 23:19 UTC (Tue)
by bartoc (guest, #124262)
[Link] (13 responses)
One thing that always really annoyed me about k8s is the whole networking stack and networking requirements. My servers have real ipv6 addresses, that are routable from everywhere and I really, really do not want to deal with some insane BGP overlay. Each host can good and well get (at least) a /60 that can be further subdivided for each container.
The whole process just felt like figuring out exactly how the k8s people had reimplemented any number of existing utilities. It all gave me the impression the whole thing was abstraction for abstraction's sake. I feel the same way about stuff like ansible, so maybe I just really care about what code is actually executing on my servers more than most people.
I found Hashicorp's offerings (in general tbh, not just Nomad) to be a lot of shiny websites on top of very basic tools that ended up adding relatively little value compared to just using whatever it was they were abstracting over.
Posted Aug 24, 2022 1:10 UTC (Wed)
by jordan (subscriber, #110573)
[Link] (3 responses)
Posted Aug 26, 2022 14:37 UTC (Fri)
by mdaverde (guest, #151459)
[Link] (2 responses)
I believe Facebook/Meta's infra is heavily systemd-based with their in-house Twine cluster manager but I don't know how much of the internals are available.
Posted Aug 26, 2022 15:20 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted Aug 26, 2022 15:46 UTC (Fri)
by paulj (subscriber, #341)
[Link]
Twine (the external name, but more often called 'Tupperware' - probably the better name to use in searches) would be hard-to-impossible to make available to non-FB use, and probably mostly pointless. It is very heavily integrated in with lots of other Facebook infrastructure, from the CI system, to the automated fleet roll-out system of services, to the service discovery and routing system, etc., etc.
Posted Aug 24, 2022 1:28 UTC (Wed)
by rjones (subscriber, #159862)
[Link] (6 responses)
Kubernetes and its networking-stack complexity are the result of the original target for these sorts of clusters.
The idea is that you needed to have a way for Kubernetes to easily adapt to a wide variety of different cloud architectures. The people that are running them don't have control over the addresses they get, addresses are very expensive, and they don't have control over any of the network infrastructure. IPv6 isn't even close to an option for most of these types of setup.
So it makes a lot of sense to take advantage of tunnelling over TCP for the internal networking. This way it works completely independently of whatever physical or logical network configuration the Kubernetes cluster might be hosted on. You can even make it work between multiple cloud providers if you want.
> One thing that always really annoyed me about k8s is the whole networking stack and networking requirements. My servers have real ipv6 addresses, that are routable from everywhere and I really, really do not want to deal with some insane BGP overlay. Each host can good and well get (at least) a /60 that can be further subdivided for each container.
You don't have to use the tunneling network approach if you want. For example if you have physical servers with multiple network ports you can just use those separate lans instead.
Generally speaking you'll want to have 3 LANs. One for the pod network, one for the service network, and one for external network. More sophisticated setups might want to have a dedicated network for storage on top of that, and I am sure that people can find uses for even more then that.
I don't know how mature K8s IPv6 support is nowadays, but I can see why that would be preferable.
It could be that a lot of people are not in a position to micro-manage things on that level and must depend on the expertise of other people to accomplish things in a reasonable manner.
Posted Aug 24, 2022 6:55 UTC (Wed)
by dw (subscriber, #12017)
[Link] (2 responses)
Take as a simple example the network abstraction; it's maybe 20%+ of the whole Kubernetes conceptual overhead. K8s more or less mandates some kind of mapping at the IP and naming layers, so you usually have at a minimum some variation of a custom DNS server and a few hundred ip/nf/xdp rules or whatnot to implement routing. Docker's solution to the same problem was simply a convention for dumping network addresses into environment variables. No custom DNS, no networking nonsense.
It's one of a thousand baked-in choices made in k8s that really didn't need to be that way. The design itself is bad.
No conversation of Kubernetes complexity is complete without mention of their obsolescent-by-design approach to API contracts. We've just entered a period where Ingresses went from marked beta, to stable, to about-to-be-deprecated by gateways. How many million lines of YAML toil across all k8s users needed trivial updates when the interface became stable, and how many million more will be wasted by the time gateways are fashionable? How long will gateways survive? That's a meta-design problem, and a huge red flag. Once you see it in a team you can expect it time and time again. Not only is it overcomplicated by design, it's also quicksand, and nothing you build on it can be expected to have any permanence.
Posted Aug 25, 2022 18:44 UTC (Thu)
by Depereo (guest, #104565)
[Link]
It's quite frustrating to go from the infrastructure world of VMs, which are extremely backwards and forwards compatible, to kubernetes, where the necessary major upgrades every few months will break several deployment pipelines, or deprecate APIs, or do various other things that require your clients to endlessly scramble to 'keep up'. And you're right, it's usually to do with network requirements (or sometimes storage which is somewhat related to network design anyway).
Committing to deployment on k8s is a commitment to a much higher degree of required ongoing updates for and probably unexpected issues with deployment than I'm used to with for example virtual machine orchestration. Unless you're at a certain and very large size I have come to think it's not worth it at all.
Posted Aug 26, 2022 1:38 UTC (Fri)
by thockin (guest, #158217)
[Link]
Last I looked in depth, docker had a DNS server built in, too. Publishing IPs via env vars is a TERRIBLE solution for a bunch of reasons. DNS is better, but still has a lot of historical problems (and yeah, kube sort of tickles it wrong sometimes). DNS + VIP is much better, which is what k8s implements. Perfect? No. But pretty functional.
> No conversation of Kubernetes complexity is complete without mention of their obsolescent-by-design approach to API contracts. We've just entered a period where Ingresses went from marked beta, to stable, to about-to-be-deprecated by gateways.
I know of no plan to formally deprecate Ingress, and I would be the approver of that, so....FUD. Also, deprecate != EOL. We have made a public statement that we have NO PLANS to remove GA APIs. Perhaps some future circumstance could cause us to re-evaluate that, but for now, no.
> How many million lines of YAML toil across all k8s users needed trivial updates when the interface became stable
The long-beta of Ingress is a charge I will accept. That sucked and we have taken action to prevent that from ever happening again.
> and how many million more will be wasted by the time gateways are fashionable?
Nobody HAS to adopt gateway, but hopefully they will want to. It's a much more functional API than Ingress.
> How long will gateways survive? That's a meta-design problem, and a huge red flag.
APIs are forever. That's how long. Once it hits GA, we will keep supporting it. No FUD required.
> nothing you build on it can be expected to have any permanence.
We have a WHOLE LOT of evidence to the contrary. If you have specific issues, I'd love to hear them.
I don't claim kubernetes is perfect or fits every need, but you seem to have had a bad experience that is not quite the norm.
Posted Aug 24, 2022 7:58 UTC (Wed)
by bartoc (guest, #124262)
[Link] (1 responses)
Well, I don't care about any cloud architectures except mine :). More seriously though the people running clouds absolutely do have control over the addresses they get! And tunneling works just as well if you want to provide access to the ipv6 internet on container hosts that only have ipv4, except in that situation you have some hope of getting rid of the tunnels once you no longer need ipv4.
> Generally speaking you'll want to have 3 LANs. One for the pod network, one for the service network, and one for external network. More sophisticated setups might want to have a dedicated network for storage on top of that, and I am sure that people can find uses for even more then that.
IMO this is _nuts_, I want _ONE_ network and I want that network to be the internet (with stateful firewalls, obviously).
Posted Aug 26, 2022 1:41 UTC (Fri)
by thockin (guest, #158217)
[Link]
You DO need to think about addressing and how you want your cluster(s) to interact with everything else.
Posted Aug 26, 2022 1:26 UTC (Fri)
by thockin (guest, #158217)
[Link]
Should work fine.
Posted Aug 26, 2022 1:25 UTC (Fri)
by thockin (guest, #158217)
[Link] (1 responses)
You don't need an overlay if you already have a decent sized range of IPs per node. Just use those IPs.
I don't know where the idea that you NEED an overlay comes from. If you have IPs, just use those. That's what it was designed for.
Posted Aug 26, 2022 23:18 UTC (Fri)
by bartoc (guest, #124262)
[Link]
Or you could use a virtual switch, that would probably "just work"
Posted Aug 24, 2022 0:26 UTC (Wed)
by denton (guest, #159595)
[Link] (2 responses)
One thing that k8s got right was it gave the ability for users to define new resource types, via CustomResourceDefinitions (CRDs). So for example, if you wanted a Postgres database in your k8s cluster, you could install a CRD + Postgres Controller and have access to that new API. It's led to a large number of Operators that can enable advanced functionality in the cluster, without the user needing to understand how they work. This is similar to managed services on cloud providers, like Aurora or RDS in AWS.
I'm wondering if nomad has a similar functionality?
Posted Aug 24, 2022 1:11 UTC (Wed)
by jordan (subscriber, #110573)
[Link]
Posted Aug 24, 2022 17:09 UTC (Wed)
by schmichael (guest, #160476)
[Link]
No, Nomad has chosen not to implement CRDs/Controllers/Operators as seen in Kubernetes. Many users use the Nomad API to build their own service control planes, and the Nomad Autoscaler - https://github.com/hashicorp/nomad-autoscaler/ - is an example of a generic version of this: it's a completely external project and service that runs in your Nomad cluster to provide autoscaling of your other Nomad managed services and their infrastructure. Projects like Patroni also work with Nomad, so similar projects to controllers do exist: https://github.com/ccakes/nomad-pgsql-patroni
The reason (pros) for this decision is largely that it lets Nomad focus on core scheduling problems. Many of our users build a platform on top of Nomad and appreciate the clear distinction between Nomad placing workloads and their higher level platform tooling managing the specific orchestration needs of their systems using Nomad's APIs. This should feel similar to the programming principles of encapsulation and composition.
The cons we've observed are: (1) you likely have to manage state for your control plane ... somewhere ... this makes it difficult to write generic open source controllers, and (2) your API will be distinct from Nomad's and require its own security, discovery, UI, etc.
I don't want to diminish the pain of forcing our users to solve those themselves. I could absolutely see Nomad gaining CRD-like capabilities someday, but in the short term you should plan on having to manage controller state and APIs yourself.
Disclaimer: I am the HashiCorp Nomad Engineering Team Lead
Posted Aug 24, 2022 11:15 UTC (Wed)
by jezuch (subscriber, #52988)
[Link]
Posted Aug 24, 2022 17:40 UTC (Wed)
by jordan (subscriber, #110573)
[Link] (6 responses)
Posted Aug 30, 2022 12:57 UTC (Tue)
by kleptog (subscriber, #1183)
[Link] (5 responses)
Our next projects will not use Swarm. We've experimented with K8s (on EKS) and you can make it do amazing things. But ECS is really easy to use and basically does what you want, just like Swarm. Nomad is something to look into.
Posted Aug 31, 2022 20:54 UTC (Wed)
by rorycl (guest, #151214)
[Link] (4 responses)
Our SaaS outfit is considering moving from a traditional Linux environment across a few tens of servers to containerisation, predominantly to allow a better development experience and testing, particularly for groups of covalent apps, but also to help divorce OS and machine maintenance from app deployment.
Having built our business on reading the classic O'Reilly texts to pick up both concepts and implementation details, that combination seems difficult to find in books about orchestration. That is probably the fault of old age, but perhaps the proprietary beginnings of some of these technologies means marketing has confused purpose.
A guru pointed me to the Poulton "Docker Deep Dive" book (I read the May 2020 edition) and the last few chapters are devoted to Swarm. Despite the curious dissimilarities between Compose and Swarm, Swarm seems perfect for our sort of environment and a reasonable translation from our familiar Linux setup in production, but one where the Swarm manager makes the hosts behave like one large host by utilizing overlay networks on which apps can conveniently be scaled.
For a smallish outfit the benefits of Swarm seems straight-forward. Poulton summarises the situation like this: "Docker Swarm competes directly with Kubernetes -- they both orchestrate containerized applications. While it's true that Kubernetes has more momentum and a more active community and ecosystem, Docker Swarm is an excellent technology and a lot easier to configure and deploy. It's an excellent technology for small to medium businesses and application deployments".
Apart from concerns such as @kleptog's, it isn't clear to me why many more businesses aren't using Swarm.
Posted Sep 1, 2022 13:01 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
Because Swarm is too simplistic. It's kinda like writing in BASIC. It's OK for beginners, but you quickly reach its limits once you start using it seriously.
So people avoid it and jump straight into a more complex solution.
Posted Sep 2, 2022 14:47 UTC (Fri)
by rorycl (guest, #151214)
[Link] (1 responses)
...for what, for example?
(I've made a longer comment below, by the way.)
Posted Sep 3, 2022 5:00 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Sep 4, 2022 11:35 UTC (Sun)
by kleptog (subscriber, #1183)
[Link]
So if you have a complicated application where the resources (say networks, or services) depend on configuration settings, you have to write a kind of wrapper which reads the configuration and then uses that to update the Swarm configuration. And that configuration is stored separately to Swarm itself. This is annoying and error prone. Because the tool to do this is complex you get the situation where you distribute the tool in a container and then start it up passing the Docker control socket in.
So Swarm can work well if your application is simple enough to deploy via a Docker Compose file. But if you're getting to the point where you're thinking "I need to make a tool to generate the Compose file for me" you're basically at the point where you need something more powerful than Swarm can offer.
That said: for our CI environment, and local testing Swarm works fine. But for production it's too weak. Fortunately for containers, they don't care what tool they're running under.
Posted Aug 24, 2022 17:58 UTC (Wed)
by dskoll (subscriber, #1630)
[Link] (1 responses)
I don't have much to add, but reading this hurt my brain and I now understand a second meaning of the term "Cluster****"
I am so glad I'm nearing the end of my career and not starting out in tech today.
Posted Aug 26, 2022 15:38 UTC (Fri)
by flussence (guest, #85566)
[Link]
It would be nice if there was a consistent definition of what a "container" is though so I can copy the interesting bits. My entire motivation for that is getting better pretty-printed output in things like htop/atop/glances; those have to use a bunch of ad-hoc detection heuristics for all these competing container formats which is unfortunate.
Posted Aug 26, 2022 3:18 UTC (Fri)
by smitty_one_each (subscriber, #28989)
[Link]
Qualitatively, it seems that we're basically eating all of the networking and orchestration capability that the cloud provider handles. We're trading the "cloud" for the "puff".
Analogies are all like something that sucks, but I use this to curb the enthusiasm of those who think some Magic Wand Of Technical Debt Retirement exists. No, dudes: we're going to have to put in the hard work of un-jacking the architecture.
Paraphrasing Zawinski: `Some people, when confronted with a problem, think "I know, I'll use Kubernetes." Now they have two problems.`
Posted Sep 1, 2022 11:22 UTC (Thu)
by zoobab (guest, #9945)
[Link]
Well, having done some Terraform with their HCL language, I will happily stay with Yaml :-)
Posted Sep 1, 2022 21:40 UTC (Thu)
by brianeray (guest, #129476)
[Link] (3 responses)
k8s neophyte here, courtesy of "GitOps and Kubernetes: Continuous Deployment [..]" (Yuen, Matyushentsev, et al) a few years ago.
At one point the book pitched --kustomize as an alternative to at least some of the functionality provided by Helm. I was pressed for time so skipped the Helm content and stuck with the --kustomize content since hey, it's right there in `kubectl`.
Does --kustomize obviate the need for Helm? Is it widely used?
Posted Sep 3, 2022 17:22 UTC (Sat)
by rra (subscriber, #99804)
[Link] (2 responses)
The way I would explain it is that, when using kustomize, you write your Kubernetes manifests directly, and then you use kustomize to "poke" changes into them. It's akin to maintaining a core set of resources and then a set of diffs that you layer on top. As such, it has the problem of all diff systems: it's great and very convenient and easy to understand if the diffs you need are small, but it quickly becomes unwieldy if there are a lot of differences between deployments.
Because of that, if you're maintaining a big wad of flexible open source software (think Grafana, Redis, InfluxDB, that sort of thing), you are not going to have your downstream use kustomize; it would be a nightmare.
Helm can be used the same way, but I think it's best thought of as having an entirely different philosophy: you write a Helm chart that deploys your thing, you pick and choose exactly where that deployment can be customized, and you present an API to the consumers of your chart. (This API is in the form of your values.yaml file, which enumerates all of the supported customization points). Then, your downstream provides their own values.yaml to selectively override the default values, and Helm assembles the result. This has all the advantages that an API always has: you can hide complexity and separate concerns, which is much harder to do with kustomize (and any other patch system). And it has the disadvantages that any API has: more flexibility means more complexity, you have to learn the templating system (which is moderately annoying and tends to produce hideously confusing error messages), and you have to think hard about the API to provide a good one (and mostly people provide bad APIs with too many untested options).
Overall, having used both extensively, I went all in for Helm and haven't regretted it. I really like the clean separation of concerns of a proper API. But using kustomize is not wrong, and for smaller-scale projects than the fairly complex Kubernetes-based ecosystem I work on it may be the right choice.
Posted Sep 3, 2022 18:29 UTC (Sat)
by brianeray (guest, #129476)
[Link]
Posted Sep 10, 2022 16:47 UTC (Sat)
by Lennie (subscriber, #49641)
[Link]
The newest approach seems to be kpt. Any idea if they are on the right track?
Posted Sep 2, 2022 14:27 UTC (Fri)
by rorycl (guest, #151214)
[Link] (4 responses)
When I think of "orchestration" as a word outside of its devops usage I think of scoring music for band or orchestra, with the implicit idea that the resulting performance will be conducted by a Herbert von Karajan type figure who helps balance the strings with the brass, percussion with woodwind.
Based on my admittedly inexpert research it is difficult to see how the concept of devops orchestration brings together the idea of creating a performance from containers in a way that makes sense in different cloud environments and equally in one's own racks.
For a small company turning over less than, say, $10m and til now able to work quite comfortably running services on dedicated machines without a dedicated sysadmin/devops team and enjoying the simplicity and stability of Debian, the "orchestration" component seems to be the fly in the ointment of containerisation.
Containerisation itself offers considerable benefits through modularisation and automated testing and deployment. What is very alluring about orchestration tech is that it would allow us to turn a group of servers into a virtual box using overlay networks, with neat scaling features. But that isn't so different from running our proxies to address certain servers rather than others. The overlay tech would allow us to more easily drop and add servers, for example to upgrade machine firmware or OS, but there seem few other advantages at the cost of considerably more complexity. Features often seen as orchestration features, such as secrets or configuration management, can be managed fine in the "spinning rust" environment we currently use (ironically often using tools such as vault or etcd).
Another major issue that hasn't been discussed is how data is handled. What is sometimes called "persistent storage" in the containerisation world, as if it was a side issue rather than the main point of providing SaaS in the first place, seems to have an uneasy relationship with orchestration. Does Herbert ensure that we didn't just mount the postgres 15 container on the postgres 12 mount point? The article doesn't cover this aspect.
So to this luddite it seems that orchestration is really just different approaches to using largely proprietary systems in the way those proprietary systems were made to be sold to you. It makes about as much sense as the software programmer I was interviewing when I asked him about his python skills and he responded "I don't know python, but I'm good with django".
Posted Sep 3, 2022 3:22 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link]
There is no simplicity in Debian on bare metal if you want to deploy complicated applications there. Especially for deployments for more than one machine (e.g a clustered server).
Containers, first and foremost, simplify _deployment_ of applications. And this very much includes small companies.
Posted Sep 3, 2022 17:38 UTC (Sat)
by rra (subscriber, #99804)
[Link] (2 responses)
It's interesting that you would say this because this is exactly the problem that my job solves with Kubernetes.
Our mission is to provide a reusable platform for scientific astronomy, initially targeted at the needs of our specific project, but hopefully generalizable to similar problems. This is a complex set of interrelated services and, perhaps more importantly, underlying infrastructure that handles such things as authentication and authorization and makes it easy to deploy additional astronomy services. And, vitally, we have to be able to run copies of the entire platform both in the cloud and in private data centers. The team I'm part of currently maintains six separate deployments, half in the cloud and half on prem, in addition to developing the infrastructure for the platform as a whole, and the same underlying infrastructure is deployed in three other on-prem data centers by other groups.
We went all in for Kubernetes and it was the best decision we ever made and the only way in which any of this is possible. Kubernetes abstracts away the differences in hosting environments, so that we can develop the hosting platform targeting Kubernetes and anyone who can deploy Kubernetes can deploy a copy of it. It works exactly the same on a cloud Kubernetes environment as it does in a private data center, with only minor changes required to customize things like underlying storage methods. It gives us a fairly tight interface and set of requirements for any new hosting environment: we can just say "give us Kubernetes of at least this version" with a few other requirements, and then we know our entire platform will deploy and work. There is absolutely no way that we could have done this as quickly or consistently, with a very tiny team, while trying to deploy directly on Debian, or even using something like Terraform. We need the additional layer of abstraction and it saves us an absolutely IMMENSE amount of work and debugging.
I'm saying this as someone who has been in this industry for approaching 30 years now and has done just about every type of system administration from hand-compiled GNU software trees in shared file systems through hand-rolled configuration management systems, Puppet, Chef, AWS at scale, and proprietary container orchestration systems; I'm not some neophile who has no experience with other ways of doing things. Kubernetes has its problems to be sure, and sometimes can be quite frustrating, but that orchestration layer lets you define very complex ecosystems of related applications in declarative code and deploy it in a hosting-agnostic way and that solves a critical problem for us.
Posted Sep 3, 2022 21:53 UTC (Sat)
by rorycl (guest, #151214)
[Link] (1 responses)
> [Kubernetes provides an] orchestration layer lets you define very complex ecosystems of related applications in declarative code and deploy it in a hosting-agnostic way...
Thank you for these very helpful descriptions of the benefits of Kubernetes, particularly its use across heterogenous environments at scale and your comments about the time it has saved your team.
I would be grateful to know how your team deals with local development and if it uses automated testing with Kubernetes, possibly as part of continuous integration workflows. It would also be great to know what reference material you and your team have found most useful in implementing Kubernetes, particularly from a conceptual perspective.
Posted Sep 3, 2022 22:37 UTC (Sat)
by rra (subscriber, #99804)
[Link]
Mostly for development beyond the basic unit test sort of stuff we use a dev cluster in the cloud (on Google Kubernetes Engine to be precise). It's just easier and less fiddly than a local install, and GKE is rock-solid. That of course comes with a monetary cost, although IMO it's pretty small compared to the cost of developers. But not being able to test locally easily is a bit of a gap that does occasionally cause problems, and while minikube is in theory an answer to this, in practice it's tricky to get all the pieces working happily on a laptop for typical local development (particularly on modern macOS, which a lot of people like to use but which adds the wrinkle of not being x86-based).
In terms of reference material, honestly I mostly just read the Kubernetes reference and tutorial pages on kubernetes.io (and of course implementation-specific guidance for specific cloud providers), plus the Helm documentation. But I joined a team that was already doing Kubernetes, so a lot of my training was from watching what other people were doing and asking questions, so I'm maybe not the best person to ask about initial reference material.
We use Argo CD to automate our Kubernetes deployment and maintenance, and I cannot recommend it highly enough. It makes it so much easier to automate the deployment process end-to-end and then be able to easily see what the cluster is doing and debug problems (and upgrade things, which is very important since we have a very fast development pace and are usually updating five or more times a week). I'll fall back on kubectl for some specific problems, but the Argo CD interface is usually more useful, and I say this as someone who almost always prefers command lines to any graphical tools.
Posted Oct 4, 2022 9:21 UTC (Tue)
by Klavs (guest, #10563)
[Link]
It saves me and my colleagues soo much time - and actually gives huge peace of mind, knowing that our growing infrastructure is not beyond us: we can actually do a recovery test that we have a decent chance of believing will work for all the services we operate. And we're a small company. I've consulted for many larger corps, and k8s to me enables the delivery of a "pre-determined but flexible enough" solution, enabling automatic consumption of "operations services" by development teams, where the ops team has an actual chance of ensuring that the ops quality is maintained.
As opposed to the old world of just handing out VMs and really "hoping for the best and otherwise blaming the developer teams".
It is definitely complex though, and you should definitely be aware of your choices and their cost in complexity.
That would cover EKS, Amazon's hosted Kubernetes offering. ECS isn't Kubernetes.
I find myself mourning fleet with some regularity.
It's worth noting that, while Docker's website is largely devoid of any mention of Swarm, Mirantis reaffirmed its commitment to Swarm in April of this year. It seems like Swarm will continue to be supported in Mirantis's product, but it's unclear to me what that might mean for users of the freely-available version of Docker, which is developed and distributed by an entirely different company.
It allows me to deliver, to ALL users, standard apps such as databases (postgresql, mongodb, etc.) and other types of services, which can be very complicated to deliver in a scalable and highly-available manner, while enabling me to ensure that backup and recovery is HANDLED - and I need only have ONE set of procedures/documentation for handling this - and it works for ALL users of this (as we then use this operator for ALL places where we need, for example, postgresql).