Repost from the Nebulaworks blog…for your reading pleasure :)
Over the last year, Docker has seen a rapid rise in popularity. Developers have realized that a tool like Docker greatly simplifies the instantiation of the services their applications depend on. No longer do you need to spend time doing all of the ops legwork just to develop. Cool, eh? Not so fast…
As most of us know, what works in dev is not necessarily what works in production. Each environment has different requirements: production ops demands far greater levels of uptime and security, along with visibility into performance metrics and logging. That takes us down the road of figuring out how to run containers under these requirements, which at a minimum means choosing a framework to solve these challenges.
At Nebulaworks, we’ve found that there are a few approaches. In this multi-post series I will review the various orchestration/cluster frameworks that we have recommended to our customers, as well as the considerations behind why and when we would recommend them. As with all things, not all tools are created equal: some are complicated and feature-rich, while others are simple and straightforward. Depending on your shop, one will likely fit better than the others. But first, let’s define some terms.
What is an Orchestration/Cluster Framework?
Before we dive into specific orchestration frameworks and the functionality they provide, it’s important to understand what they are and what they include. At a high level, they offer scheduling and networking of containers across multiple compute nodes. You could (and some vendors do) call them cluster frameworks; however, I tend to stay away from that description because it carries a legacy IT definition and, with some of our clients, a preconceived idea of what such tools do. In my opinion, these tools are actually more closely aligned with high performance computing clusters (grids), whereby all compute resources look like a single machine for scheduling. In fact, a couple of the tools that I will review go on to define themselves as a “Datacenter OS” and “Single System.”
In order to schedule workloads across multiple compute nodes, these frameworks are composed of, at a minimum, the following tools and services:
Command Line Tool: Self-explanatory. These tools are run from a shell to configure, launch, and manage container workloads or the orchestration framework itself. They use the framework APIs to accomplish this.
Controller/Scheduler: A centralized service which either understands the underlying available resources itself and schedules workload instantiation, or calls on other infrastructure tools or frameworks to determine how and where to instantiate workloads.
Compute/Container Runtime: These compute instances are where containerized workloads are launched and scheduled. Today, the primary technology for running workloads is Docker, with some recent announcements by vendors to support Rocket.
Service Discovery/Registry: Typically a key/value datastore that holds critical information about the containers that have been launched: their configuration and status, where they are running, state information, and other data. This datastore also backs the discovery services used with newly instantiated containers.
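To make the relationship between the controller/scheduler, the compute nodes, and the registry a little more concrete, here is a minimal sketch in Python. It is not how any particular framework works: the Node and Scheduler classes, the "most free memory" placement rule, and the "/containers/<name>" key layout are all assumptions made up for illustration, and a plain dict stands in for the key/value datastore.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A compute node with a fixed amount of memory (hypothetical unit: MB)."""
    name: str
    total_mem: int
    used_mem: int = 0

    @property
    def free_mem(self) -> int:
        return self.total_mem - self.used_mem


@dataclass
class Scheduler:
    """Toy controller: places each container on the node with the most free memory."""
    nodes: list
    # A plain dict stands in for the key/value service registry (an assumption).
    registry: dict = field(default_factory=dict)

    def launch(self, container: str, mem: int) -> str:
        """Pick a node for the container, reserve memory, and record the placement."""
        candidates = [n for n in self.nodes if n.free_mem >= mem]
        if not candidates:
            raise RuntimeError(f"no node has {mem} MB free for {container}")
        node = max(candidates, key=lambda n: n.free_mem)
        node.used_mem += mem
        # Store where the container runs and its state so it can be discovered later.
        self.registry[f"/containers/{container}"] = {
            "node": node.name,
            "mem": mem,
            "state": "running",
        }
        return node.name


if __name__ == "__main__":
    sched = Scheduler(nodes=[Node("node-a", 2048), Node("node-b", 4096)])
    for name, mem in [("web-1", 1024), ("db-1", 2048), ("cache-1", 1536)]:
        print(name, "->", sched.launch(name, mem))
    print(sched.registry)
```

The point of the sketch is the division of labor: the caller (think of the command line tool) only asks for a container to be launched, the scheduler decides where it goes based on the resources it knows about, and the registry ends up holding the placement and state for anything that needs to find the container afterward.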
Depending on the framework, other services can be configured as well: proxy services (to map load-balanced ports to container ports), name services (which can be part of the service registry) to address new container workloads by name and simplify tying services together, container health check services, and distributed storage.
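As a rough illustration of how a name service, health checks, and a load-balancing proxy can fit together, here is a small Python sketch. Everything in it is hypothetical: the ServiceDirectory class, its method names, the round-robin policy, and the example addresses are assumptions for illustration, not the API of any real framework.

```python
import itertools


class ServiceDirectory:
    """Toy name service: maps a service name to the addresses of its containers."""

    def __init__(self):
        self._backends = {}  # service name -> list of "host:port" strings
        self._healthy = {}   # "host:port" -> bool, updated by health checks
        self._cursors = {}   # service name -> round-robin counter

    def register(self, service, address):
        """Add a container address under a service name; assume healthy at first."""
        self._backends.setdefault(service, []).append(address)
        self._healthy[address] = True

    def report_health(self, address, healthy):
        """Record the result of a health check for one container address."""
        self._healthy[address] = healthy

    def resolve(self, service):
        """Return the next healthy backend for a service, in round-robin order."""
        backends = [a for a in self._backends.get(service, []) if self._healthy[a]]
        if not backends:
            raise LookupError(f"no healthy backends for service {service!r}")
        counter = self._cursors.setdefault(service, itertools.count())
        return backends[next(counter) % len(backends)]


if __name__ == "__main__":
    directory = ServiceDirectory()
    directory.register("web", "10.0.0.1:8080")  # example addresses, made up
    directory.register("web", "10.0.0.2:8080")
    print(directory.resolve("web"))
    print(directory.resolve("web"))
    directory.report_health("10.0.0.1:8080", False)
    print(directory.resolve("web"))
```

A client asks for "web" by name and never needs to know which node or port a container landed on. When a health check marks a backend as failed, it simply drops out of the rotation.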
In my next post, I’ll cover some of the more popular frameworks, their benefits and shortcomings.