I read and watch a lot of science fiction, both contemporary and classic — and one of my favorites is Frank Herbert’s Dune. I even have a soft spot for the oft-maligned David Lynch movie adaptation. The opening line of the movie always captured my imagination: “A beginning is a very delicate time.”
Since I joined Pixability 4 short months ago, I’ve been reflecting on that quote. Starting a new job is always a particularly delicate experience — one wants to demonstrate value early, and the fastest way to do that is to fall back on familiar patterns, tools, languages, etc. But are those familiar patterns still the BEST way to accomplish a task? Are they even a fit for the current environment? It speaks to the quality of the Engineering culture here at Pixability that engineers are allowed the time and resources to explore these important questions.
The Birth of a Drop Pod
My colleague Martin Kerr wrote a great post a few months back on the work he’s doing with Terraform and Terragrunt. When I joined Pixability, I wasn’t too familiar with Terraform or Ansible. However, I was eager to explore them instead of falling back on infrastructures I’m more familiar with.
I feel it’s important to have infrastructure code tightly coupled with the application code that depends on it (wherever practical). To my mind that means embedding infrastructure code (be it Ansible playbooks or Terraform files) directly into your application repositories, and running it through automation and tests as with other types of code. As Terraform lets you reference modules from external sources, and has the ability to generate a static plan file, it seemed like adapting it into that pattern should be totally possible. So, my plan was to have our builds generate a static Terraform plan file, which I would later consume and apply as part of our application deployment tooling.
My first attempt was a false start, due in large part to me not bothering to RTFM. Simply running “terraform plan --out=tfplan.out” within my builds to generate a Terraform “artifact” was not going to be enough. When you generate a Terraform plan, the plan file itself contains absolute paths to modules, files, etc. So, if you run the plan on an ephemeral build slave and then later want to apply that plan from a different host (or even your own laptop), it’s not going to work because the module paths referenced in the plan will be invalid.
HashiCorp’s suggestion to get around this is to run terraform plan inside a Docker container, so that you maintain some control over the file paths. This still seemed clumsy to me: even if the Docker container gives me predictable paths for modules (/tmp, for instance), I still have to make sure my deployment tools assert that location, and also deal with tarring and untarring the plan files to boot.
So I returned to the drawing board. My ultimate goal was to generate a Terraform plan that I could treat as a stand-alone, portable artifact. And what are Docker images if not a type of artifact? If I’m already running Terraform inside a container to generate the plan, then why not just build a container with all the plan files and required modules, and use that as my “artifact”?
And thus was our first configuration “drop pod” born: a fully self-contained bundle, with everything it needs stored on-image to accomplish its mission.
Our builds now have a stage (implemented via a Jenkins shared library so anything can plug easily into it) that does the following:
Pulls down a HashiCorp Terraform image of whatever version is specified by the invoking build
Launches a container with the current working directory mounted
Copies the Terraform files to a staging path within the container itself
Runs terraform init from within the container
Switches to a workspace that corresponds to the environment
Runs the Terraform plan, generating a plan file that exists within the container
Runs docker commit to save the image, including plan files and all the modules that were pulled down
Tags the image with build/branch info and pushes it to the registry.
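The steps above could be sketched in shell roughly as follows. This is an illustration, not the actual Jenkins shared library: the registry name, the /src and /pod paths, the tfplan.out file name, the tag layout, and the DRY_RUN switch are all assumptions made for the example. It relies on the official hashicorp/terraform image using terraform as its entrypoint, which is why the build swaps in a shell and then restores the entrypoint at commit time. Credentials for the plan are left out.

```shell
#!/bin/sh
# Sketch of a Terraform drop pod build stage. REGISTRY_REPO, /src, /pod,
# tfplan.out and the tag layout are illustrative assumptions.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical tag layout: <repo>:<branch>-<build>-<environment>
pod_image() {
  printf '%s:%s-%s-%s\n' \
    "${REGISTRY_REPO:-registry.example.com/myapp-terraform}" "$1" "$2" "$3"
}

build_terraform_pod() {
  tf_version=$1
  branch=$2
  build=$3
  environment=$4
  base="hashicorp/terraform:$tf_version"
  container="tf-pod-$build"
  image=$(pod_image "$branch" "$build" "$environment")

  # Steps executed inside the container. Files are copied out of the
  # read-only mount because docker commit does not capture mounted volumes.
  steps='mkdir -p /pod && cp -R /src/. /pod/ && cd /pod'
  steps=$steps' && terraform init -input=false'
  steps=$steps' && (terraform workspace select "$TF_ENV" || terraform workspace new "$TF_ENV")'
  steps=$steps' && terraform plan -input=false -out=tfplan.out'

  run docker pull "$base" || return 1
  # The base image's entrypoint is terraform, so swap in a shell for the build.
  run docker run --name "$container" -e "TF_ENV=$environment" \
    -v "$PWD:/src:ro" --entrypoint /bin/sh "$base" -c "$steps" || return 1
  # Restore terraform as the entrypoint and start in the staged directory.
  run docker commit --change 'ENTRYPOINT ["terraform"]' --change 'WORKDIR /pod' \
    "$container" "$image" || return 1
  run docker rm "$container"
  run docker push "$image"
}
```

Calling build_terraform_pod 1.5.7 main 42 staging with DRY_RUN=1 prints the docker commands instead of running them, which is a cheap way to review what a build would do. Here docker commit writes the tagged image directly, so there is no separate docker tag step.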
In the end, we have a runnable Docker container, with a pinned version of Terraform on board, along with everything else needed to execute a given Terraform plan. You can probably imagine how this solves other occasionally annoying problems as well. For instance, if we really need to leverage a feature in a newer version of Terraform for a given application, it’s very easy for us to build a fully self-contained drop pod that uses the newer version without impacting anything else.
We have a nice wrapper supporting this pattern which accepts arguments of “app”, “branch”, “build” and “environment”, and handles the docker pull and the subsequent docker run that applies the plan, along with associated cleanup. And, because every plan is rendered as part of a build, we can clearly see a history of both proposed and applied changes. To provide a final set of linkages, all our Terraform code applies a ‘terraform-stack’ tag to every resource it manages, indicating the branch and build that produced the plan.
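A wrapper along those lines might look like the sketch below. The registry host, the image naming scheme, and the DRY_RUN switch are hypothetical, and the image is assumed to use terraform as its entrypoint and to start in the directory that holds a plan file named tfplan.out.

```shell
#!/bin/sh
# Sketch of an apply wrapper for a Terraform drop pod. REGISTRY and the
# image naming are hypothetical placeholders.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

apply_terraform_pod() {
  app=$1
  branch=$2
  build=$3
  environment=$4
  image="${REGISTRY:-registry.example.com}/$app-terraform:$branch-$build-$environment"

  run docker pull "$image" || return 1
  # Cloud credentials and state backend access are omitted here; they would
  # typically be passed with -e or come from the host's instance role.
  run docker run --rm "$image" apply -input=false tfplan.out
  status=$?
  # Cleanup: --rm already removed the container; drop the local image too.
  run docker rmi "$image"
  return "$status"
}
```

For example, apply_terraform_pod myapp main 42 staging pulls the pod for build 42 of the main branch and applies its saved plan. The exit status of the apply is preserved so a calling deployment tool can react to a failed run even though cleanup still happens.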
Extending Drop Pods to Ansible
I understand the appeal of Ansible — the simplicity, the procedural ordering of tasks, the lack of a host ‘agent’, etc. Unfortunately, I think that simplicity breeds some questionable practices, not unlike what I wanted to avoid with Terraform (e.g., operators checking out a repo, and executing code out of it with no central system of record or guardrails). Ansible is very powerful, but I think often gets used as a glorified scripting tool and nothing more.
So I found myself asking, could I build an Ansible drop pod that builds on the Terraform approach we took?
There were three problems to solve to adapt this pattern to Ansible.
The need to share and version roles so we could embed our Ansible playbooks alongside our application code in a portable way
A method to bundle everything into a container
A way to actually run the playbooks once bundled into a container.
The answer to the first problem turned out to be ansible-galaxy. Galaxy lets you store your roles separately from your playbooks, and they don’t necessarily have to be uploaded to the Galaxy repository either; you can store and version them as part of your own SCM. Sadly, storing roles in Git requires every one of them to have its own repository. That seemed like a non-starter to me (I don’t really want dozens of tiny GitHub repositories eating up our license space). However, you can also pull your roles from a web location if they are in tar format. So, we took the approach of storing roles in a single repo, giving each of them a unique tag (e.g., role1-1.0), and then creating a build job that automatically creates tar bundles for them and uploads those to an S3 bucket with web hosting enabled. Essentially, we created our own “poor man’s” package repository.
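The bundling job could be sketched like this. It assumes each role lives under roles/<name> in the shared repository and that tags follow the <role>-<version> pattern mentioned above; the bucket name and the DRY_RUN switch are placeholders invented for the example.

```shell
#!/bin/sh
# Sketch of the role bundling job. The roles/<name> layout and ROLES_BUCKET
# are hypothetical.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

bundle_role() {
  tag=$1            # e.g. role1-1.0
  role=${tag%-*}    # strip the trailing -<version>, leaving role1
  tarball="$tag.tar.gz"

  # <tag>:<path> archives only that role's subtree as it was at the tag,
  # placed under a single top-level directory named after the tag.
  run git archive --format=tar.gz --prefix="$tag/" -o "$tarball" \
    "$tag:roles/$role" || return 1
  run aws s3 cp "$tarball" "s3://${ROLES_BUCKET:-example-ansible-roles}/$tarball"
}
```

Running bundle_role role1-1.0 from a checkout of the roles repository would produce role1-1.0.tar.gz and upload it. Archiving from the tag rather than the working tree means the bundle always matches what was tagged, regardless of what is checked out.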
Ansible-galaxy also provided the answer to the second problem. We can embed a ‘requirements.yml’ listing any shared role dependencies into our application repos, alongside a playbook that defines the specific configuration tasks. By running “ansible-galaxy install -r requirements.yml” from within our build container, we’re producing an Ansible analog to the Terraform drop pod. We now end up with a container that “internally” has a filesystem structure that looks like this:
/root
  /ansible
    Playbook.yml
    requirements.yml
    /roles
      RoleA-1.0\
        tasks\main.yml
        meta\...
      RoleB-1.0\
        tasks\main.yml
        meta\...
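To make the requirements file concrete, here is a small sketch that generates one for roles hosted as tarballs. The base URL is a placeholder; the src and name keys are the standard ansible-galaxy role requirement fields, and the -p option shown in the usage comment sets the directory roles are installed into.

```shell
#!/bin/sh
# Sketch: emit requirements.yml entries for roles hosted as tarballs.
# The base URL and role names are placeholders.

write_requirements() {
  base_url=$1
  shift
  for tag in "$@"; do
    printf '%s\n' "- src: $base_url/$tag.tar.gz" "  name: $tag"
  done
}

# Usage inside the build container (not executed here):
#   write_requirements https://roles.example.com RoleA-1.0 RoleB-1.0 > requirements.yml
#   ansible-galaxy install -r requirements.yml -p roles
```

Naming each installed role after its versioned tag is what yields directories such as roles/RoleA-1.0 in the layout above.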
Now onto question three. We can’t just run the playbook from within the container, because it would end up running against the container itself. That isn’t what we want; we want to execute the playbook against an instance EXTERNAL to the container, specifically the host EC2 instance where we’ve downloaded the drop pod.
The trick was to run the container in “net=host” mode, and configure /etc/ansible/hosts (within the container of course) to reference “localhost”. This means executing the playbook from within the container will actually treat localhost as if it’s a remote host.
There are some additional ingredients to making this work. We need to pre-bake these Ansible drop pods with an SSH private key that allows the ‘ansible-runner’ user to connect over SSH (but ONLY from localhost).
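The two small files involved could be produced as in the sketch below. How the “localhost only” restriction is enforced is not spelled out here, so the from= option in authorized_keys is an assumption: it is one standard OpenSSH way to limit where a key may be used from. The function names are invented for the example.

```shell
#!/bin/sh
# Sketch of the two files behind the localhost trick. Using from= to enforce
# "localhost only" is an assumption, not necessarily the method used.

# Inventory baked into the image at /etc/ansible/hosts. Listing localhost
# explicitly should make Ansible connect to it over SSH like any other host.
pod_inventory() {
  printf '%s\n' 'localhost'
}

# authorized_keys line for ansible-runner on the host. $1 is the public half
# of the key pair whose private half is baked into the drop pod.
authorized_keys_entry() {
  printf 'from="127.0.0.1,::1" %s\n' "$1"
}

# Usage (not executed here):
#   pod_inventory > /etc/ansible/hosts                        # in the image
#   authorized_keys_entry "$(cat id_rsa.pub)" \
#     >> /home/ansible-runner/.ssh/authorized_keys            # on the host
```

Because the container shares the host’s network namespace, its SSH connection arrives from the loopback address, so a key restricted this way still works for the drop pod while being rejected from anywhere else.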
Applying the playbook thus looks like this (keeping in mind that the image’s entry point is ‘ansible-playbook’):
docker run -e ANSIBLE_HOST_KEY_CHECKING=False --net='host' --rm pixability/application_repository:some_versioned_drop_pod --private-key /root/.ssh/id_rsa --user ansible-runner playbook.yml
The container will run this playbook as if you’d provided it an external host to connect to. It just so happens that the external host is actually the host operating system where the container is running.
We now have a fully self-contained “package” that not only contains the playbook and all associated roles on the appropriate version, but also the full Ansible execution environment. This too is wrapped in a simple script that takes arguments of “app”, “branch,” and “build”, and handles pulling the container and executing it with the correct arguments.
Drop pods as a pattern are still being refined here, with plans to extend them to our Terragrunt-managed infrastructure coming down the road as well. By leveraging Docker to create fully self-contained “packages” with everything needed to apply a Terraform plan or execute an Ansible playbook, we get a system that is highly portable, versionable, and repeatable. We also sidestep version compatibility issues entirely, and can tightly couple our configuration and infrastructure definitions with the application builds that depend on them.
For Pixability, this system will be most useful for managing our shared infrastructure components, since we run most of our web applications and microservices as containers. As a means for configuring and enforcing the underlying state of the container scheduling and orchestration layer itself, however, it will work very well indeed. From beginning a new job, to developing a new infrastructure package, these new beginnings hold a lot of promise for Pixability and its Engineering team.