08 Jul 2016, 08:10

What is the point of all of this effort? After all, this is a very specific, personal work environment. It seems like an inordinate amount of time spent tweaking small, specific items, so where is the value in that? Is there a wider applicability to this effort?

Learning

In order to learn something new, such as an automation framework, you need an itch to scratch, a genuine purpose. Mine was twofold. First, I had some legitimate work activities that called out for automation. Otherwise I would have spent hours in the middle of the night checking host after individual host and fixing by hand whatever problems I found. So it was worth diving in and doing the up-front work to automate. I’ll detail that activity in a shorter, single post.

Second, I had decided to move my site and some tools to Amazon, and I had not moved it in a number of years. It had been built on an older Ubuntu release, and many of the tools I had come to depend on were newer. I’d rebuilt my tooling environment enough times that it occurred to me it might be worth automating.

As should be apparent from what is now a five-part series of posts, I covered a lot of ground in this effort and learned a great deal of detail about Ansible. I did that intentionally, partly because meeting my specific needs required it, but also because there is value in doing things the hard way. There are pre-made ansible-galaxy roles for many of the things I’ve done. Doing them by hand helped me understand not just the what, but the how. I can do this again for others.

One of the great uses for ansible-galaxy roles is to just try something out and see if it has value. If it does, you can dig in deeper if needed and understand how that role works and start using it for real.

But isn’t this just Rule 11?

RFC 1925, Rule 11 says:

Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.
(11a) (corollary). See rule 6a.

For completeness, the corollary in 6a is

(6a) (corollary). It is always possible to add another level of indirection.

A lot of what Ansible and similar tools do has been done before. In a previous work environment, we had an in-house, Perl-based tool that handled much of our configuration management in an automated fashion. I’ve also used parallel ssh-based tools for many years, which is one of the reasons I picked Ansible over Chef or Puppet: it felt a little familiar. cloud-init for early-stage provisioning isn’t all that different from Solaris JumpStart, Red Hat Kickstart, etc. New tools continue to be developed to solve seemingly the same problems.

Rule 11 can be read purely as snark, and it is true. But that doesn’t mean these efforts lack value. These tools build on lessons learned and leverage the capabilities that newer and different languages bring.

Rule 11 also lets you think, “Hey, I’ve seen something like this before,” and recognize that a seemingly new and unfamiliar product is using familiar patterns and techniques. I was able to map my experience onto these tools and understand both how they were similar to what I’d done before and how the new tooling might be different and better.

One way in which they are better is that they are designed for environments that are vastly more dynamic than they used to be. A key design center of Ansible is helping manage that unstable, dynamic and potentially vast scale.
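One concrete mechanism Ansible offers for dynamic environments is the dynamic inventory script: an executable that Ansible calls with `--list`, which prints JSON describing groups and hosts instead of relying on a static hosts file. The sketch below is a minimal, hypothetical example of that idea, not something taken from this series; `discover_instances()` and its hard-coded data are assumptions standing in for whatever cloud API query you would actually make.

```python
#!/usr/bin/env python3
"""Minimal dynamic inventory sketch for Ansible (hypothetical example).

Ansible runs the script with --list and expects JSON mapping group names
to hosts, plus an optional "_meta" -> "hostvars" section so it does not
need to call the script again with --host for every host.
"""
import json
import sys


def discover_instances():
    # Hypothetical stand-in for a cloud API call (e.g. listing EC2
    # instances). In a dynamic environment this answer changes over time.
    return [
        {"name": "web-1", "role": "web", "ip": "10.0.1.10"},
        {"name": "web-2", "role": "web", "ip": "10.0.1.11"},
        {"name": "db-1", "role": "db", "ip": "10.0.2.20"},
    ]


def build_inventory(instances):
    """Group instances by role and record per-host connection variables."""
    inventory = {"_meta": {"hostvars": {}}}
    for inst in instances:
        group = inventory.setdefault(inst["role"], {"hosts": []})
        group["hosts"].append(inst["name"])
        # ansible_host tells Ansible which address to connect to.
        inventory["_meta"]["hostvars"][inst["name"]] = {"ansible_host": inst["ip"]}
    return inventory


def main(argv):
    if len(argv) > 1 and argv[1] == "--list":
        result = build_inventory(discover_instances())
    else:
        # --host <name>: hostvars were already provided in _meta.
        result = {}
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main(sys.argv)
```

After making it executable, you would point Ansible at it with something like `ansible all -i ./inventory.py -m ping`; each run reflects whatever the inventory source reports at that moment.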

Desired state approach

Another aspect that is truly different, from my perspective, is the desired state approach to provisioning and configuring resources. As a longtime sysadmin and Unix engineer who writes a lot of scripts, I am comfortable and familiar with the imperative, procedural approach: “Do these things in this order, step by step.”

With the desired state approach, you declare what you want at a higher level of indirection, and the tool does the work on your behalf. You’re giving up a level of control, but in exchange you get code reuse. You’re not hand-crafting your own scripts over and over again with small variations.
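To make the contrast concrete, here is a toy Python model of the desired state idea. It illustrates the concept only and is not how Ansible itself is implemented; `Host` and `converge()` are hypothetical names. You describe the end state, and `converge()` compares it with the current state and acts only on the differences, so running it a second time changes nothing, much like Ansible reporting a task as “ok” rather than “changed”.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy model of a host's state: installed packages and managed files."""
    packages: set[str] = field(default_factory=set)
    files: dict[str, str] = field(default_factory=dict)


def converge(host: Host, want_packages: set[str], want_files: dict[str, str]) -> list[str]:
    """Bring `host` to the desired state and return the changes made.

    Items already in the desired state are left alone, so a second run
    with the same inputs returns an empty list (idempotence).
    """
    changes: list[str] = []
    for pkg in sorted(want_packages - host.packages):
        host.packages.add(pkg)
        changes.append(f"install {pkg}")
    for path, content in sorted(want_files.items()):
        if host.files.get(path) != content:
            host.files[path] = content
            changes.append(f"write {path}")
    return changes


if __name__ == "__main__":
    host = Host(packages={"openssh-server"})
    want_pkgs = {"openssh-server", "git", "tmux"}
    want_files = {"/etc/motd": "managed by automation\n"}
    print(converge(host, want_pkgs, want_files))  # ['install git', 'install tmux', 'write /etc/motd']
    print(converge(host, want_pkgs, want_files))  # []
```

The toy only adds what is missing and never removes unlisted items; the point is that the caller states the goal and the tool works out the steps.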

This is the “Infrastructure as Code” paradigm in practice. I had heard that phrase many times recently, but until I started using some of these tools and techniques, it didn’t really drive home what it meant and why it might have value.

Real change management

I am not a fan of ITIL, but I understand what it tries to achieve. Understanding who changed what, when, and why is hugely important; it’s the key to Visible Ops. By using tools like git alongside configuration management, you can apply real change management to infrastructure and capture those details.

Because you’re using desired state and letting automation do the heavy lifting, you can produce a coherent execution plan, and you can quickly back out and revert. To do this right, you would make frequent, small, iterative changes on a rolling basis. Testing is part of that process, and these tools are also a key enabler of testing. As you roll out Ansible changes, you must test and validate them in an automated fashion before pushing a change to larger, more critical components, lest the “blast radius” of a mistake expand far and wide.
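The rolling, test-then-widen pattern described above can be sketched in plain Python. This is a hypothetical illustration, not the author’s tooling or Ansible’s internals: `rolling_rollout()`, `apply`, `validate` and `revert` are made-up names, and in a real deployment those callables would run a playbook, a health check and a rollback. Within Ansible itself, the play-level `serial` keyword (optionally with `max_fail_percentage`) covers the batching and stop-on-failure part.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rolling_rollout(
    hosts: Sequence[str],
    batch_size: int,
    apply: Callable[[str], None],
    validate: Callable[[str], bool],
    revert: Callable[[str], None],
) -> Tuple[List[str], Optional[List[str]]]:
    """Apply a change to `hosts` in small batches, in the order given.

    Order hosts from least to most critical. After each batch, every host
    in it is validated; if any check fails, the whole batch is reverted and
    the rollout stops, so later hosts never see the change.

    Returns (hosts that kept the change, the failed batch or None).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    updated: List[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = list(hosts[start:start + batch_size])
        for host in batch:
            apply(host)
        # Check every host in the batch before deciding.
        results = [validate(host) for host in batch]
        if not all(results):
            for host in batch:
                revert(host)
            return updated, batch
        updated.extend(batch)
    return updated, None


if __name__ == "__main__":
    state = {}
    hosts = ["test-1", "test-2", "web-1", "web-2", "db-1"]

    def apply(h):
        state[h] = "v2"

    def validate(h):
        return h != "web-2"  # pretend web-2 fails its health check

    def revert(h):
        state[h] = "v1"

    updated, failed = rolling_rollout(hosts, 2, apply, validate, revert)
    print(updated)  # ['test-1', 'test-2']
    print(failed)   # ['web-1', 'web-2']
    print(state)    # {'test-1': 'v2', 'test-2': 'v2', 'web-1': 'v1', 'web-2': 'v1'}
```

In the example run, the failure in the second batch is rolled back and `db-1`, the most critical host, is never touched.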

Again, being able to use the same automation to rapidly and reliably back out a change is critical. This approach allows for the “ruthless standardization” you’ve probably always sought but never achieved. The further down this path you go, the further you get from the fear that when super important database server x reboots, it won’t come up and “just work”. If nothing else, you can turn a data center made up of nothing but individually customized one-offs into a cohesive, manageable and scalable system, and relegate those oddballs to the legacy corner of painful pets not to be touched, where they belong.

Expanding outward and final thoughts

My approach has been entirely Linux-based, but Ansible and similar tools can manage many other products as well. Since Ansible is ssh-based, there’s a wide array of devices that, if nothing else, you can manage with the ‘raw’ module. I’ve been experimenting with using Ansible to manage storage arrays such as NetApp, as well as legacy Fibre Channel switches.

Ansible manages Windows as well, and there is continued and extensive development in this area. Microsoft has announced that a native ssh service will be built into Windows Server 2016. Windows Nano Server + sshd + PowerShell + Ansible sounds like goodness to me.

If you try out Ansible, you will of course read its excellent documentation. Google will find other resources, and you can use ansible-galaxy to build on others’ efforts. The Ansible playbook best practices guide is worth rereading after you’ve spent some time digging in; I got a lot more out of it the second time through, after developing some plays and playbooks. The language features repo also has tons of examples that helped me out.

If you’ve found this interesting and stuck with me this far, thanks! I have always believed in the power of automation and am pleased to see the industry as a whole moving in this direction. I am not able to put all of my work on GitHub, as it would be hard to separate out the private items without inadvertently exposing some of them publicly. But if you’re interested in using some of what I’ve done, hopefully I’ve put enough detail in this series. Good luck, and go automate all the things.

Previous Posts in this Series:

Using Ansible to Bootstrap My Work Environment Part 5