Using Ansible to Bootstrap My Work Environment Part 4


Using an AWS EC2 instance as a utility host presents additional interesting wrinkles. With EC2, one can spin up and tear down instances quickly and dynamically. Because charging on AWS is usage based, I can either provision or start a previously provisioned instance as needed. When I decide to add new parts to my work environment I can use Ansible to quickly provision and test a replacement instance environment while my previous known-good configuration is still available to start and use.

I deliberately did not take the approach of hand-crafting an instance and then snapshotting it, as I would pay for the idle snapshot storage on S3. You can also generate your own AMI, but again that’s storage you’d pay for. That said, these are good options to have and they have their uses.

The challenge with working this way is that everything is dynamic, including the external IP addresses. But I can leverage Ansible’s EC2 integration and a dynamic ec2 inventory script to generate the Ansible inventory on the fly at run time. I also use the AWS API along with cloud-init, rather than ssh, to handle some early provisioning items before ssh access is configured.

Since the AWS external IP is also dynamic, instead of burning an EC2 Elastic IP address and paying for it during periods when it’s associated with an instance that is not started, I can dynamically update a DNS CNAME record in AWS Route53 for normal ssh and VPN access to the instance.

Initial provisioning

I deliberately separated initial provisioning and post-provisioning configuration into two playbook runs. The first uses cloud-init and the AWS API with a local connection during the Ansible playbook run. The second uses the dynamic EC2 inventory script to identify our started host and configure it. It would be possible to combine the two so it’s all done with one ansible-playbook invocation, but that struck me as unnecessary.

The first playbook is ~/source/ansible-mystuff/provision_ec2instance.yml

- hosts: localhost
  connection: local
  gather_facts: False
  tags:
    - provision
    - ec2

  tasks:
    - name: Launch instance
      ec2:
        keypair: "{{ keypair }}"
        region: "{{ region }}"
        zone: "{{ az }}"
        image: "{{ image }}"
        instance_type: "{{ instance_type }}"
        vpc_subnet_id: "{{ vpc_subnet_id }}"
        assign_public_ip: true
        group: ['my-security-group-name']
        exact_count: 1
        source_dest_check: no
        count_tag:
          Name: "{{ item.hostname }}"
        instance_tags:
          Name: "{{ item.hostname }}"
          role: "{{ item.role }}"
          environment: "{{ item.environment }}"
        volumes:
          - device_name: /dev/sda1
            volume_size: 16
            device_type: gp2
            delete_on_termination: True
        wait: true
        instance_profile_name: ec2_role_route53
        user_data: "{{ lookup('template', 'roles/ec2/templates/user_data_route53_dns.yml.j2') }}"
      with_items:
        - hostname: "candyapplegrey"
          fqdn: "candyapplegrey.{{ domain }}"
          role: "utilhost"
          environment: "production"
      register: ec2

I’ve got some variables here sourced from roles/ec2/vars/main.yml

keypair: my_handy_dandy_aws_keypair
instance_type: t2.micro
region: us-west-2
az: us-west-2a
image: ami-d06a90b0
vpc_subnet_id: subnet-xyy8899900
zone_id: removed

The AMI comes from Ubuntu’s handy AMI locator. The AWS keypair is sourced by the AWS CLI on my laptop. The zone_id is used for Route53, which I will get to in a minute.

Since I’m using the ec2 Ansible module, the connection here is local for this play, which launches and provisions a fresh instance. In this case I put in my preferred hostname for this version of an EC2 utility host. It’s just a name I use when I want to access the host from the outside.

You may note that I specify delete_on_termination: True as I do want to mop up if I terminate an instance, which is a frequent occurrence when I’m updating and testing. The group item specifies which EC2 security group to apply to the instance, which associates the necessary firewall rules.

Two items come into play to do the dynamic DNS bit at boot. The first is instance_profile_name, which applies a previously created Amazon IAM profile to the instance. This profile has the policy rules that allow this instance to update the relevant DNS records in Route53 via an API call. That API call happens during the cloud-init invocation early in instance startup, which is specified in the line user_data: "{{ lookup('template', 'roles/ec2/templates/user_data_route53_dns.yml.j2') }}"
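For reference, the policy attached to ec2_role_route53 needs little more than permission to change record sets in the one hosted zone. Here’s a sketch of such a policy, generated and validated from the shell. This is my reconstruction of a minimal policy, not the exact one from the post I followed, and the zone ID is a placeholder:

```shell
#!/bin/sh
# Emit a minimal IAM policy allowing Route53 record changes in one zone.
# ZONE_ID is a placeholder -- substitute your real hosted zone ID.
ZONE_ID="Z1EXAMPLE"

route53_policy() {
    cat <<EOT
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["route53:ChangeResourceRecordSets"],
      "Resource": "arn:aws:route53:::hostedzone/${ZONE_ID}"
    }
  ]
}
EOT
}

# Confirm the generated policy document is well-formed JSON
route53_policy | python3 -m json.tool > /dev/null && echo "policy OK"
```

You would create an IAM role with this policy and pass its name as instance_profile_name, which is exactly what the playbook above does with ec2_role_route53.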

The cloud-init bits.

I found that cloud-init wasn’t super well documented. It’s used a lot but seems to be updated frequently. This gigantic example file provides perhaps the most comprehensive detail on what can be done with this. Here then is my annotated roles/ec2/templates/user_data_route53_dns.yml.j2


#cloud-config
users:
  # need our user
  - name: {{ ansible_user }}
    ssh_authorized_keys:
      - ssh-rsa  umm. nope
    groups: [ 'admin', 'adm', 'dialout', 'sudo', 'lxd', 'plugdev', 'netdev' ]
    shell: /bin/bash
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]

Startup is pretty simple. I set up my normal user account which is obviously where all the subsequent configuration needs to go.

# Set the hostname and FQDN
hostname: "{{ item.hostname }}"
fqdn: "{{ item.fqdn }}"
# Set our hostname in /etc/hosts too
manage_etc_hosts: true
timezone: US/Central
package_update: true
package_upgrade: true

This should be straightforward enough. We are grabbing our hostname and FQDN and updating the instance’s /etc/hosts, setting a timezone, and executing apt-get update; apt-get upgrade.

# Our script below depends on this:
packages:
  - awscli
  - python

In order to do the next step, we need Python 2 on our Xenial instance (which ships only Python 3) and the AWS CLI to update a CNAME in Route53. Again, Ansible is idempotent, so we don’t care that subsequent provisioning plays may also ensure these packages are present. cloud-init behaves similarly, so calling all this at instance launch or start doesn’t create problems.

# Write a script that executes on every boot and sets a DNS entry pointing to
# this instance. This requires the instance having an appropriate IAM role set,
# so it has permission to perform the changes to Route53.
write_files:
  - content: |
      #!/bin/bash
      TTL=300
      SELF_META_URL="http://169.254.169.254/latest/meta-data"
      FQDN=`hostname -f`
      ZONE_ID="{{ zone_id }}"
      PUBLIC_DNS=$(curl ${SELF_META_URL}/public-hostname 2>/dev/null)
      cat << EOT > /tmp/aws_r53_batch.json
      {
        "Comment": "Assign AWS Public DNS as a CNAME of hostname",
        "Changes": [
          {
            "Action": "UPSERT",
            "ResourceRecordSet": {
              "Name": "${FQDN}.",
              "Type": "CNAME",
              "TTL": ${TTL},
              "ResourceRecords": [
                { "Value": "${PUBLIC_DNS}" }
              ]
            }
          }
        ]
      }
      EOT
      aws route53 change-resource-record-sets --hosted-zone-id ${ZONE_ID} --change-batch file:///tmp/aws_r53_batch.json
      rm -f /tmp/aws_r53_batch.json
    path: /var/lib/cloud/scripts/per-boot/
    permissions: 0755

Here’s the tricky bit, taken from here. Because I set up the IAM role ec2_role_route53 with the policy applied as documented in that post to update our chosen DNS zone, and assigned it to the instance earlier, the instance is allowed to execute this update-record API call. I like the approach of doing this at cloud-init rather than pushing it into systemd, which is how I initially did this in earlier experiments. The result is that at every start of this instance, the CNAME gets updated to point to the instance’s current AWS A record. I don’t worry about removing the CNAME on instance stop or termination, though I suppose I could do that as well.
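The change-batch JSON that script submits can be tested locally before baking it into cloud-init. Here’s a small standalone version of the generator with example values (the hostnames and TTL below are just for illustration):

```shell
#!/bin/sh
# Build the Route53 UPSERT change batch that the per-boot script submits.
# Arguments: the record FQDN, the CNAME target, and a TTL in seconds.
make_change_batch() {
    fqdn="$1"; target="$2"; ttl="$3"
    cat <<EOT
{
  "Comment": "Assign AWS Public DNS as a CNAME of hostname",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "${fqdn}.",
        "Type": "CNAME",
        "TTL": ${ttl},
        "ResourceRecords": [ { "Value": "${target}" } ]
      }
    }
  ]
}
EOT
}

# Sanity-check: the output must be valid JSON before we hand it to aws route53
make_change_batch candyapplegrey.example.com ec2-host.us-west-2.compute.amazonaws.com 300 \
    | python3 -m json.tool > /dev/null && echo "batch OK"
```

UPSERT is what makes the per-boot script safe to rerun: it creates the record if missing and overwrites it otherwise, which matches the dynamic start/stop lifecycle here.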

The follow-up provisioning playbook

After we launch an instance and do our early configuration, I can follow up with a second invocation of ansible-playbook on ~/source/ansible-mystuff/util_ec2.yml


- hosts: tag_Name_candyapplegrey
  gather_facts: True
  roles:
    - role: ec2
    - role: common
    - role: ansible_mystuff
    - role: openvpn_server

  tasks:
  - name: Reboot system if required
    tags: reboot
    become: yes
    command: /sbin/reboot removes=/var/run/reboot-required
    async: 1
    poll: 0
    ignore_errors: true

  - name: waiting for {{ inventory_hostname }} to reboot
    local_action: wait_for host={{ inventory_hostname }} state=started delay=30 timeout=300
    become: no

The tag_Name_candyapplegrey group is used in conjunction with the dynamic inventory script to apply these plays to the correct host based on the dynamically discovered tag. Also, due to package updates, the newly launched or updated host may need a reboot.
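For the curious, the dynamic inventory script derives that group name from the instance tag by prefixing tag_<key>_ and replacing characters that aren’t safe in a group name with underscores. The real script does this in Python; here’s a rough shell approximation of the behavior:

```shell
#!/bin/sh
# Approximate how the ec2 dynamic inventory script turns an instance tag
# (key/value pair) into an Ansible group name: prefix with tag_<key>_
# and replace any unsafe characters with underscores.
tag_to_group() {
    key="$1"; value="$2"
    printf 'tag_%s_%s\n' "$key" "$value" | sed 's/[^A-Za-z0-9_]/_/g'
}

tag_to_group Name candyapplegrey       # -> tag_Name_candyapplegrey
tag_to_group environment production    # -> tag_environment_production
```

This is why the instance_tags set during provisioning matter: they become the Ansible groups the follow-up playbook targets.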

the ec2 role



- name: Add the instance to known hosts
  local_action: command sh -c 'ssh-keyscan -t rsa {{ ec2_ip_address }} >> $HOME/.ssh/known_hosts'
  when: ec2_ip_address is defined

This play adds the newly launched host’s ssh public key to my known hosts file for subsequent use. Otherwise, Ansible will prompt to accept the key on first ssh if it’s not already in known hosts. This is all that happens in the ec2 role, and it runs only if the IP address is defined and captured during the provisioning playbook’s invocation.
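One wrinkle with blindly appending: repeated provisioning runs can leave duplicate entries in known_hosts. A slightly more careful variant, as a hypothetical helper rather than anything in my playbook, checks the file first:

```shell
#!/bin/sh
# Append a host's rsa key to a known_hosts file only if the host isn't
# already present. Hypothetical helper, not part of the actual playbook.
add_known_host() {
    host="$1"; file="$2"
    if grep -q "^${host} " "$file" 2>/dev/null; then
        echo "already known: ${host}"
    else
        ssh-keyscan -t rsa "$host" >> "$file"
    fi
}

# Usage (mirrors the play above):
#   add_known_host "$ec2_ip_address" "$HOME/.ssh/known_hosts"
```

The play as written works fine for this workflow; duplicates in known_hosts are harmless, just untidy.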

We’ve already seen the common role in my previous post, so I’ll skip that here.

the ansible_mystuff role

There’s a little bit of inception here. In this case I want my (private) ansible-mystuff git repo, with all its playbooks, deployed to the EC2 instance. There’s just a short task file here as well, in roles/ansible_mystuff/tasks/main.yml


- name: install ansible
  become: yes
  apt:
    name: ansible
    state: present

- name: ansible-mystuff repo for deployment of util hosts
  git: dest=~/source/ansible-mystuff

What’s potentially interesting about this is that in a scenario where I entirely lost my laptop and/or Chromebook, I could still jump on a machine somewhere, access my AWS account, and provision a utility host to start working. I can just grab my private git repos by hand and push them out. There are a number of potential scenarios, but the main thing is that it’s there and accessible.

The openvpn_server role

I learned a few additional items creating this role. The usefulness here is that I can use my OpenVPN keys (again, kept in my private configfiles repo) and this OpenVPN server from an insecure location like a coffee shop for web browsing. I can also use it to get to AWS EC2 instances and resources that don’t have an internet gateway defined. So this utility host is an ssh bastion host and, with this additional role applied, also a gateway to my VPC. Because it’s a distinct role, I can always separate those functions later or reuse the role in some other context.

I don’t use the VPN feature for browsing too often as you pay for egress bandwidth usage on AWS. I also have OpenVPN installed on a router running dd-wrt at my house but since that is home internet and a cheap router, it’s not always reliable so it doesn’t hurt to have a secondary option.

The roles/openvpn_server/tasks/main.yml is a bit more involved.


- name: install openvpn
  become: yes
  apt:
    name: openvpn
    state: present

- name: copy openvpn configuration data
  become: yes
  copy: src=openvpn dest=/etc

This installs the openvpn package and recursively copies all the configuration from roles/openvpn_server/files/openvpn to populate /etc/openvpn on the target instance.

- name: set up ip forwarding
  become: yes
  sysctl: name="net.ipv4.ip_forward" value=1 sysctl_set=yes state=present reload=yes

- name: update /etc/rc.local
  become: yes
  copy: src=rc.local dest=/etc/rc.local mode=0755

- name: do iptables
  become: yes
  iptables: state=present table=nat chain=POSTROUTING source= out_interface=eth0 jump=MASQUERADE

I need to ensure ip forwarding is enabled and an iptables NAT rule is present for the VPN traffic so we can push client default route traffic out through the openvpn tunnel. The /etc/rc.local contains this iptables rule as well so it is present on subsequent boots.
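Expressed as the raw commands /etc/rc.local ends up running, the forwarding and NAT setup amounts to the following. The 10.8.0.0/24 subnet here is OpenVPN’s default server network and an assumption on my part, as is the helper name:

```shell
#!/bin/sh
# What the sysctl + iptables tasks amount to on the host itself.
# VPN_NET is assumed to be OpenVPN's default server subnet.
VPN_NET="10.8.0.0/24"

nat_commands() {
    # Enable kernel IP forwarding so the host routes VPN client traffic
    echo "sysctl -w net.ipv4.ip_forward=1"
    # Masquerade VPN client traffic leaving via the instance's eth0
    echo "iptables -t nat -A POSTROUTING -s ${VPN_NET} -o eth0 -j MASQUERADE"
}

nat_commands
# On the instance (as root):  nat_commands | sh
```

Printing the plan rather than executing it keeps this runnable without root; on the actual instance, Ansible’s sysctl and iptables modules do the same thing idempotently.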

- name: fix debian bug in openvpn service
  become: yes
  copy: src=openvpn@.service dest=/lib/systemd/system/

This took a bit of troubleshooting to locate. I’d push everything out, and then after the system rebooted on AWS it was unreachable. It turns out there’s a bug report against openvpn in Xenial: the service that starts openvpn runs too early, and the tun0 interface ends up getting enumerated and used as the default-route interface. This play works around the bug. I’m watching it and the related report for an official fix.

I had to execute several launch-and-terminate sequences to isolate this, which took a while. Yaks were shaved. The whole sequence to push out a new machine takes 10-15 minutes, which is not terribly long, but when you’re doing it repeatedly and often it feels that way. However, the reality is I could spin up and destroy instances at will, and I still had another known-good EC2 instance running an earlier Ubuntu release to use.

If you came across something like this in a production environment, you could do just as I have and apply the workaround, document the bug and watch it. You might push a bug workaround via Ansible to a large number of machines in parallel. And when the bug in the package is corrected, you can run through testing, update the playbook(s), git commit your changes, and push them out.

- name: restart systemd daemon after updating unit config
  become: yes
  command: systemctl daemon-reload

- name: enable openvpn services
  become: yes
  command: systemctl enable openvpn.service

- name: restart openvpn services
  become: yes
  command: systemctl restart openvpn.service

It turns out that the systemd Ansible module appears in version 2.2, and the packaged version on Xenial is 2.0. I could deploy Ansible via git instead of using the package, or I could use pip install, but I’m making the choice for now to use the packaged variant. The point here is that Ansible is a fast-moving target, and it is very good about documenting when a feature or module appears. I can optionally update this method later if and when I move to Ansible 2.2 or higher, but the command module gets the job done.

Launch and configure an instance.

In my current design, this is a two step operation. From my Linux laptop with Ansible installed along with awscli and my AWS keypair, I kick this off. That said, because I have encapsulated all the Ansible playbooks, package requirements and configuration details in my private git repositories, I can do this from another EC2 host, or really any reasonable Linux/Unix host. One of my goals is to be able to bootstrap from a bare host if necessary and I can, in fact, do that with just a little more work than what I’m demonstrating here.
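Both invocations can also be captured in a tiny wrapper so the whole sequence is one command. A sketch using the playbook and inventory paths shown in this post; it prints the plan rather than executing it, so nothing touches AWS until you pipe it to sh:

```shell
#!/bin/sh
# Print the two ansible-playbook invocations, in order, for the
# launch-then-configure sequence described in this post.
plan() {
    # Phase 1: launch the instance (local connection, placeholder inventory)
    echo "ansible-playbook provision_ec2instance.yml -i vars/hosts"
    # Phase 2: configure it via the dynamic EC2 inventory directory
    echo "ansible-playbook -i ../ansible/contrib/inventory/ util_ec2.yml"
}

plan
# To actually run both phases back to back:  plan | sh
```

I keep them as two separate invocations in practice, since the gap between them is a convenient place to check the launch output before configuring.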

Since the EC2 method has a few more parts including the low level provisioning, here’s how that looks.

~/source/ansible-mystuff on  master ⌚ 16:25:03
$ ansible-playbook provision_ec2instance.yml -i vars/hosts
PLAY ***************************************************************************

TASK [ec2 : Add the instance to known hosts] ***********************************
skipping: [localhost]

TASK [Launch instance] *********************************************************
changed: [localhost] => (item={u'environment': u'production', u'hostname': u'candyapplegrey', u'role': u'utilhost', u'fqdn': u''})

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=1    unreachable=0    failed=0

Without --verbose it’s fairly uneventful, but this is concise and what you want to see. Note also that for the provisioning step we source a plain vars/hosts for inventory, since an inventory file (or a dynamic inventory script that generates one) is required. Since the playbook specifies one play that uses a local connection, though, this really isn’t used.

vars/hosts looks like this:

[ec2hosts]
The ec2hosts group is a placeholder. When subsequent plays use the dynamic script to source inventory for launched EC2 instances, tags on those instances will be used to place them into Ansible groups. Those groups can then be used in plays to make decisions about which roles to apply, use of group_vars, etc.

~/source/ansible-mystuff on  master ⌚ 16:35:20
$ ansible-playbook -i ../ansible/contrib/inventory/ util_ec2.yml 
PLAY ***************************************************************************

TASK [setup] *******************************************************************
The authenticity of host ' (' can't be established.
ECDSA key fingerprint is SHA256:WrO/LeWmzPJBoaxxbnsUbGM+00HVa7wkijVNxso2PjA.
Are you sure you want to continue connecting (yes/no)? yes
ok: []

TASK [ec2 : Add the instance to known hosts] ***********************************
changed: [ -> localhost]

TASK [common : install git] ****************************************************
ok: []

TASK [common : send ssh id] ****************************************************
changed: []

TASK [common : git configfiles] ************************************************
changed: [] => (item={u'dest': u'/home/sharney/.gitignore', u'src': u'dot_gitignore'})
ok: [] => (item={u'dest': u'/home/sharney/.gitignore', u'src': u'dot_gitignore'})
changed: [] => (item={u'dest': u'/home/sharney/.git', u'src': u'dot_git'})

TASK [common : bitbucket  key to known_hosts] **********************************
# SSH-2.0-OpenSSH_6.4
changed: []

TASK [common : do configfiles repo] ********************************************
changed: []

TASK [common : ssh id permissions fix] *****************************************
changed: []

TASK [common : github key to known_hosts] **************************************
# SSH-2.0-libssh-0.7.0
changed: []

TASK [common : install packages for emacs and more] ****************************
changed: [] => (item=[u'emacs24', u'emacs24-el', u'pandoc', u'tmux', u'zsh', u'ispell', u'vpnc', u'fonts-hack-ttf', u'ruby', u'ruby-aws-sdk', u'python-pip', u'python-pip-whl', u'virtualenv', u'curl', u'openjdk-8-jre'])

TASK [common : python pip install items] ***************************************
changed: [] => (item=powerline-status)
changed: [] => (item=awscli)
changed: [] => (item=saws)

TASK [common : aws sdk v1 for ruby for awscli] *********************************
ok: []

TASK [common : check for spacemacs already imported] ***************************
ok: []

TASK [common : mv .emacs.d out of the way for spacemacs if spacemacs not yet deployed] ***
changed: []

TASK [common : spacemacs] ******************************************************
changed: []

TASK [common : for spacemacs org-protocol-capture-html] ************************
changed: []

TASK [common : fix .emacs.d/private via source copy] ***************************
changed: []

TASK [common : gen en_US.UTF-8 locale] *****************************************
ok: []

TASK [common : set default locale to en_US.UTF-8] ******************************
changed: []

TASK [common : symlink for dir_colors] *****************************************
changed: []

TASK [common : symlink for .tmux.conf] *****************************************
changed: []

TASK [common : copy vpnc config] ***********************************************
changed: []

TASK [common : git clone] **************************************
changed: []

TASK [common : install ohmyzsh via git] ****************************************
changed: []

TASK [common : fix .zshrc] *****************************************************
ok: []

TASK [ansible_mystuff : install ansible] ***************************************
changed: []

TASK [ansible_mystuff : ansible-mystuff repo for deployment of util hosts] *****
changed: []

TASK [openvpn_server : install openvpn] ****************************************
changed: []

TASK [openvpn_server : copy openvpn configuration data] ************************
changed: []

TASK [openvpn_server : set up ip forwarding] ***********************************
changed: []

TASK [openvpn_server : update /etc/rc.local] ***********************************
changed: []

TASK [openvpn_server : do iptables] ********************************************
changed: []

TASK [openvpn_server : fix debian bug in openvpn service] ***
changed: []

TASK [openvpn_server : restart systemd daemon after updating unit config] ******
changed: []

TASK [openvpn_server : enable openvpn services] ********************************
changed: []

TASK [openvpn_server : restart openvpn services] *******************************
changed: []

TASK [Reboot system if required] ***********************************************
ok: []

TASK [waiting for to reboot] *************************************
ok: [ -> localhost]

PLAY RECAP *********************************************************************
              : ok=38   changed=30   unreachable=0    failed=0

~/source/ansible-mystuff on  master ⌚ 16:46:28

And that’s it. This does quite a bit in a very short period of time, and after the host reboots at the end, it’s ready to go. It does ask me to add the ssh key to known hosts, which I could avoid by pushing a pre-existing, already-trusted key during cloud-init.

The final installment wraps up this series and provides some thoughts on the broader applicability and usefulness of this.

