Overview
Using an AWS EC2 instance as a utility host presents some additional interesting wrinkles. With EC2, I can spin up and tear down instances quickly and dynamically. Because AWS billing is usage-based, I can either provision a new instance or start a previously provisioned one as needed. When I decide to add new parts to my work environment, I can use Ansible to quickly provision and test a replacement instance while my previous known-good configuration is still available to start and use.
I deliberately did not take the approach of hand-crafting an instance and then snapshotting it, as I would pay for the snapshot’s idle storage on S3. You can also generate your own AMI, but again, that’s storage you’d pay for. That said, these are good options to have and they have their uses.
The challenge with working this way is that everything is dynamic, including the external IP addresses. But I can leverage Ansible’s EC2 integration and a dynamic ec2 inventory script to generate the Ansible inventory on the fly at run time. I also use the AWS API instead of ssh when ssh access is not yet configured, along with cloud-init to handle some early provisioning items.
Since the AWS external IP is also dynamic, instead of burning an EC2 Elastic IP address and paying for it during periods when it’s associated with an instance that is not started, I can dynamically update a DNS CNAME record in AWS Route53 for normal ssh and vpn access to the instance.
Initial provisioning
I deliberately separated initial provisioning and post-provisioning configuration into two playbook runs. The first uses cloud-init and the AWS API with a local connection during the Ansible playbook run. The second uses the AWS ec2.py script to identify our started host and configure it. It would be possible to combine the two so it’s all done with one ansible-playbook invocation, but that struck me as unnecessary.
The first playbook is ~/source/ansible-mystuff/provision_ec2instance.yml:
- hosts: localhost
  connection: local
  gather_facts: False
  tags: provision
  roles:
    - ec2
  tasks:
    - name: Launch instance
      ec2:
        keypair: "{{ keypair }}"
        region: "{{ region }}"
        zone: "{{ az }}"
        image: "{{ image }}"
        instance_type: "{{ instance_type }}"
        vpc_subnet_id: "{{ vpc_subnet_id }}"
        assign_public_ip: true
        group: ['my-security-group-name']
        exact_count: 1
        source_dest_check: no
        count_tag:
          Name: "{{ item.hostname }}"
        instance_tags:
          Name: "{{ item.hostname }}"
          role: "{{ item.role }}"
          environment: "{{ item.environment }}"
        volumes:
          - device_name: /dev/sda1
            volume_size: 16
            device_type: gp2
            delete_on_termination: True
        wait: true
        instance_profile_name: ec2_role_route53
        user_data: "{{ lookup('template', 'roles/ec2/templates/user_data_route53_dns.yml.j2') }}"
      with_items:
        - hostname: "candyapplegrey"
          fqdn: "candyapplegrey.{{ domain }}"
          role: "utilhost"
          environment: "production"
      register: ec2
I’ve got some variables here, sourced from roles/ec2/vars/main.yml:
---
keypair: my_handy_dandy_aws_keypair
instance_type: t2.micro
region: us-west-2
az: us-west-2a
image: ami-d06a90b0
vpc_subnet_id: subnet-xyy8899900
zone_id: removed
domain: scottharney.com
The AMI comes from Ubuntu’s handy AMI locator. The AWS keypair is sourced by the aws cli on my laptop. The zone_id is used for Route53, which I will get to in a minute.
Since I’m using the ec2 Ansible module, the connection here is local for the play that launches and provisions a fresh instance. In this case I supply my preferred hostname for this version of an EC2 utility host. It’s just a name I use when I want to access the host from the outside.
You may note that I specify delete_on_termination: True as I do want to mop up if I terminate an instance, which is a frequent occurrence when I’m updating and testing. The group item specifies which EC2 security group to apply to the instance, which associates the necessary firewall rules.
Two items come into play to do the dynamic DNS bit at boot. The first is the instance_profile_name, which applies a previously created Amazon IAM profile to the instance. This profile has the policy rules that allow the instance to update the relevant DNS records in Route53 via an API call. That API call happens during the cloud-init invocation early in instance startup, specified in the line user_data: "{{ lookup('template', 'roles/ec2/templates/user_data_route53_dns.yml.j2') }}"
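For reference, a minimal IAM policy granting that permission looks roughly like the following. This is an illustrative sketch (YOUR_ZONE_ID is a placeholder; the post I link to below has the authoritative version I actually used):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["route53:ChangeResourceRecordSets"],
      "Resource": "arn:aws:route53:::hostedzone/YOUR_ZONE_ID"
    }
  ]
}

Scoping the Resource to the single hosted zone keeps the instance from touching any other DNS records in the account.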
The cloud-init bits
I found that cloud-init wasn’t super well documented. It’s used a lot but seems to be updated frequently. This gigantic example file provides perhaps the most comprehensive detail on what can be done with it.
Here then is my annotated roles/ec2/templates/user_data_route53_dns.yml.j2:
#cloud-config
# need our user
users:
  - name: {{ ansible_user }}
    ssh-authorized-keys:
      - ssh-rsa umm. nope
    groups: [ 'admin', 'adm', 'dialout', 'sudo', 'lxd', 'plugdev', 'netdev' ]
    shell: /bin/bash
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
Startup is pretty simple. I set up my normal user account which is obviously where all the subsequent configuration needs to go.
# Set the hostname and FQDN
hostname: "{{ item.hostname }}"
fqdn: "{{ item.fqdn }}"
# Set our hostname in /etc/hosts too
manage_etc_hosts: true
timezone: US/Central
package_update: true
package_upgrade: true
This should be straightforward enough. We are grabbing our hostname and fqdn and updating the instance’s /etc/hosts, setting a timezone, and executing apt-get update; apt-get upgrade.
# Our script below depends on this:
packages:
- awscli
- python
In order to do the next step, we need python2 on our Xenial instance, which only includes python3, plus the AWS cli to update a CNAME on Route53. Again, Ansible is idempotent, so we don’t care that subsequent provisioning plays may also ensure these packages are present. cloud-init behaves similarly, so calling all this stuff at instance launch or start doesn’t create problems.
# Write a script that executes on every boot and sets a DNS entry pointing to
# this instance. This requires the instance having an appropriate IAM role set,
# so it has permission to perform the changes to Route53.
write_files:
  - content: |
      #!/bin/sh
      FQDN=`hostname -f`
      ZONE_ID="{{ zone_id }}"
      TTL=300
      SELF_META_URL="http://169.254.169.254/latest/meta-data"
      PUBLIC_DNS=$(curl ${SELF_META_URL}/public-hostname 2>/dev/null)
      cat << EOT > /tmp/aws_r53_batch.json
      {
        "Comment": "Assign AWS Public DNS as a CNAME of hostname",
        "Changes": [
          {
            "Action": "UPSERT",
            "ResourceRecordSet": {
              "Name": "${FQDN}.",
              "Type": "CNAME",
              "TTL": ${TTL},
              "ResourceRecords": [
                {
                  "Value": "${PUBLIC_DNS}"
                }
              ]
            }
          }
        ]
      }
      EOT
      aws route53 change-resource-record-sets --hosted-zone-id ${ZONE_ID} --change-batch file:///tmp/aws_r53_batch.json
      rm -f /tmp/aws_r53_batch.json
    path: /var/lib/cloud/scripts/per-boot/set_route53_dns.sh
    permissions: 0755
Here’s the tricky bit, taken from here. Because I set up the IAM role ec2_role_route53 with the policy applied as documented in that post to update our chosen DNS zone, and assigned it to the instance earlier, the instance is allowed to execute this update-record API call. I like the approach of doing this at cloud-init rather than pushing it into systemd, which is how I initially did this in earlier experiments. The result is that at every start of this instance, candyapplegrey.scottharney.com is a CNAME that gets updated to point to the instance’s current AWS A record. I don’t worry about removing the CNAME on instance stop or termination, though I suppose I could do that as well.
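For reference, the change batch the per-boot script assembles can be expressed as a small Python function. This is my own illustrative sketch, not part of the playbooks; with boto3 you could hand the same structure to change_resource_record_sets instead of shelling out to the aws cli:

```python
def route53_upsert_batch(fqdn, public_dns, ttl=300):
    """Build the same UPSERT change batch the per-boot shell script writes.

    fqdn       -- the record to maintain, e.g. "candyapplegrey.scottharney.com"
    public_dns -- the instance's current AWS public hostname (the CNAME target)
    """
    return {
        "Comment": "Assign AWS Public DNS as a CNAME of hostname",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    # Route53 record names are fully qualified, hence the trailing dot
                    "Name": fqdn + ".",
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": public_dns}],
                },
            }
        ],
    }

# Hypothetical boto3 usage (needs credentials or the IAM instance role):
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId=zone_id,
#     ChangeBatch=route53_upsert_batch(fqdn, public_dns),
# )
```

UPSERT means Route53 creates the record if it’s absent and overwrites it otherwise, which is exactly the behavior you want for a record that changes on every instance start.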
Follow-up provisioning playbook
After we launch an instance and do our early configuration, I can follow up with a second invocation of ansible-playbook on ~/source/ansible-mystuff/util_ec2.yml:
---
- hosts: tag_Name_candyapplegrey
  gather_facts: True
  roles:
    - role: ec2
    - role: common
    - role: ansible_mystuff
    - role: openvpn_server
  tasks:
    - name: Reboot system if required
      tags: reboot
      become: yes
      command: /sbin/reboot removes=/var/run/reboot-required
      async: 1
      poll: 0
      ignore_errors: true
    - name: waiting for {{ inventory_hostname }} to reboot
      local_action: wait_for host={{ inventory_hostname }} state=started delay=30 timeout=300
      become: no
The tag_Name_candyapplegrey group is used in conjunction with the dynamic ec2.py script to apply these plays to the correct host based on the dynamically discovered tag. Also, due to package updates, the newly launched or updated host may need a reboot.
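To illustrate where that group comes from, ec2.py --list emits JSON inventory along these lines. This is an abbreviated, made-up sample (the real output contains many more groups, one per region, security group, instance type, tag, and so on):

{
  "tag_Name_candyapplegrey": ["52.44.179.151"],
  "tag_role_utilhost": ["52.44.179.151"],
  "tag_environment_production": ["52.44.179.151"],
  "us-west-2": ["52.44.179.151"],
  "_meta": {
    "hostvars": {
      "52.44.179.151": {
        "ec2_ip_address": "52.44.179.151",
        "ec2_tag_Name": "candyapplegrey"
      }
    }
  }
}

Every tag becomes a tag_key_value group, which is why a playbook can target hosts: tag_Name_candyapplegrey without any static inventory entry existing for the instance.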
The ec2 role
roles/ec2/tasks/main.yml
---
- name: Add the instance to known hosts
  local_action: command sh -c 'ssh-keyscan -t rsa {{ ec2_ip_address }} >> $HOME/.ssh/known_hosts'
  when: ec2_ip_address is defined
This play adds the newly launched host’s ssh public key to my known hosts file for subsequent usage. Regardless, Ansible will prompt to accept the key on first ssh if it’s not already in the known hosts. This is all that happens on the ec2 role and it’s called only if the IP address is defined and captured during the provisioning playbook’s invocation.
We’ve already seen the common role in my previous post, so I’ll skip that here.
The ansible_mystuff role
There’s a little bit of inception here. In this case I want my (private) ansible-mystuff git repo, with all its playbooks, deployed to the EC2 instance. There’s just one play here as well, in roles/ansible_mystuff/tasks/main.yml:
---
- name: install ansible
  become: yes
  apt:
    name: ansible
    state: present
- name: ansible-mystuff repo for deployment of util hosts
  git: repo=git@bitbucket.org:scott_harney/ansible-mystuff.git dest=~/source/ansible-mystuff
What’s potentially interesting about this is that in a scenario where I entirely lost my laptop and/or Chromebook, I could still jump on a machine somewhere, access my AWS account, and provision a utility host to start working. I can just grab my private git repos by hand and push them out. There are a number of potential scenarios, but the main thing is that it’s there and accessible.
The openvpn_server role
I learned a few additional items creating this role. The usefulness here is that I can use my openvpn keys (again kept in my private configfiles repo) and this openvpn server at an insecure location like a coffee shop for web browsing. I can also use it to get to AWS EC2 instances and resources that don’t have an internet gateway defined. So this utility host is an ssh bastion host and also a gateway to my VPC with this additional role applied. Because it’s a distinct role, I can always separate those functions later or reuse the role in some other context.
I don’t use the VPN feature for browsing too often as you pay for egress bandwidth usage on AWS. I also have OpenVPN installed on a router running dd-wrt at my house but since that is home internet and a cheap router, it’s not always reliable so it doesn’t hurt to have a secondary option.
The roles/openvpn_server/tasks/main.yml is a bit more involved.
---
- name: install openvpn
  become: yes
  apt:
    name: openvpn
    state: present
- name: copy openvpn configuration data
  become: yes
  copy: src=openvpn dest=/etc
This installs the openvpn packages and recursively copies all the configuration from roles/openvpn_server/files/openvpn to populate /etc/openvpn on the target instance.
- name: set up ip forwarding
  become: yes
  sysctl: name="net.ipv4.ip_forward" value=1 sysctl_set=yes state=present reload=yes
- name: update /etc/rc.local
  become: yes
  copy: src=rc.local dest=/etc/rc.local mode=0755
- name: do iptables
  become: yes
  iptables: state=present table=nat chain=POSTROUTING source=192.168.3.0/24 out_interface=eth0 jump=MASQUERADE
I need to ensure ip forwarding is enabled and an iptables NAT rule is present
for the VPN traffic so we can push client default route traffic out through the
openvpn tunnel. The /etc/rc.local
contains this iptables rule as well so it
is present on subsequent boots.
- name: fix debian bug in openvpn service https://bugs.launchpad.net/ubuntu/+source/openvpn/+bug/1580356
  become: yes
  copy: src=openvpn@.service dest=/lib/systemd/system/
This took a bit of troubleshooting to locate. I’d push everything out and then, after the system rebooted on AWS, it was unreachable. It turns out there’s a bug report against openvpn in Xenial: the service that starts openvpn runs too early, and the tun0 interface ends up getting enumerated and used as the default route interface. This play works around the bug. I’m watching it and the related report for an official fix.
I had to execute several launch-and-terminate sequences to isolate this, which took a while. Yaks were shaved. The whole sequence to push out a new machine takes 10-15 minutes, which is not terribly long, but when you’re doing it repeatedly and often it feels that way. However, the reality is I could spin up and destroy instances at will, and I still had another known-good EC2 instance running an earlier Ubuntu release to use.
If you came across something like this in a production environment, you could do
just as I have and apply the workaround, document the bug and watch it. You
might push a bug workaround via Ansible to a large number of machines in
parallel. And when the bug in the package is corrected, you can run through
testing, update the playbook(s), git commit
your changes, and push them out.
- name: restart systemd daemon after updating unit config
  become: yes
  command: systemctl daemon-reload
- name: enable openvpn services
  become: yes
  command: systemctl enable openvpn.service
- name: restart openvpn services
  become: yes
  command: systemctl restart openvpn.service
It turns out that the systemd Ansible module appears in version 2.2, and the packaged version on Xenial is 2.0. I could instead deploy Ansible via git or via pip install, but I’m making the choice for now to use the packaged variant. The point here is that Ansible is a fast-moving target, and it is very good about documenting when a feature or module appears. Again, I can optionally update this method later if and when I move to Ansible 2.2 or higher, but the command module gets the job done.
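For what it’s worth, under Ansible 2.2 or later those three command tasks could collapse into a single task with the systemd module. A sketch of that (untested on my side, since I’m staying on the packaged 2.0):

- name: enable and restart openvpn via the systemd module (Ansible >= 2.2)
  become: yes
  systemd:
    name: openvpn.service
    enabled: yes
    state: restarted
    daemon_reload: yes

The daemon_reload parameter folds in the systemctl daemon-reload step after the unit file copy, so the workaround and service management stay in one place.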
Launch and configure an instance
In my current design, this is a two step operation. From my Linux laptop with
Ansible installed along with awscli
and my AWS keypair, I kick this off. That
said, because I have encapsulated all the Ansible playbooks, package
requirements and configuration details in my private git repositories, I can do
this from another EC2 host, or really any reasonable Linux/Unix host. One of my
goals is to be able to bootstrap from a bare host if necessary and I can, in
fact, do that with just a little more work than what I’m demonstrating here.
Since the EC2 method has a few more parts including the low level provisioning, here’s how that looks.
~/source/ansible-mystuff on master ⌚ 16:25:03
$ ansible-playbook provision_ec2instance.yml -i vars/hosts
PLAY ***************************************************************************
TASK [ec2 : Add the instance to known hosts] ***********************************
skipping: [localhost]
TASK [Launch instance] *********************************************************
changed: [localhost] => (item={u'environment': u'production', u'hostname': u'candyapplegrey', u'role': u'utilhost', u'fqdn': u'candyapplegrey.scottharney.com'})
PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0
Without --verbose it’s fairly uneventful, but this is concise and what you want to see. Note also that for the provisioning step, we source a plain vars/hosts for inventory, since an inventory file or a dynamic inventory script that generates one is required. Since the playbook specifies one play that uses a local connection, though, this really isn’t used. vars/hosts looks like this:
[crouton]
192.168.2.160
[ec2hosts]
The ec2hosts group is a placeholder. When subsequent plays use the ec2.py script to dynamically source inventory for launched EC2 instances, tags on those instances will be used to place them into Ansible groups. Ansible groups can be used in those plays to make decisions about which roles to apply, use of group_vars, etc.
~/source/ansible-mystuff on master ⌚ 16:35:20
$ ansible-playbook -i ../ansible/contrib/inventory/ec2.py util_ec2.yml
PLAY ***************************************************************************
TASK [setup] *******************************************************************
The authenticity of host '52.44.179.151 (52.44.179.151)' can't be established.
ECDSA key fingerprint is SHA256:WrO/LeWmzPJBoaxxbnsUbGM+00HVa7wkijVNxso2PjA.
Are you sure you want to continue connecting (yes/no)? yes
ok: [52.44.179.151]
TASK [ec2 : Add the instance to known hosts] ***********************************
changed: [52.44.179.151 -> localhost]
TASK [common : install git] ****************************************************
ok: [52.44.179.151]
TASK [common : send ssh id] ****************************************************
changed: [52.44.179.151]
TASK [common : git configfiles] ************************************************
changed: [52.44.179.151] => (item={u'dest': u'/home/sharney/.gitignore', u'src': u'dot_gitignore'})
ok: [52.44.179.151] => (item={u'dest': u'/home/sharney/.gitignore', u'src': u'dot_gitignore'})
changed: [52.44.179.151] => (item={u'dest': u'/home/sharney/.git', u'src': u'dot_git'})
TASK [common : bitbucket key to known_hosts] **********************************
# bitbucket.org:22 SSH-2.0-OpenSSH_6.4
changed: [52.44.179.151]
TASK [common : do configfiles repo] ********************************************
changed: [52.44.179.151]
TASK [common : ssh id permissions fix] *****************************************
changed: [52.44.179.151]
TASK [common : github key to known_hosts] **************************************
# github.com:22 SSH-2.0-libssh-0.7.0
changed: [52.44.179.151]
TASK [common : install packages for emacs and more] ****************************
changed: [52.44.179.151] => (item=[u'emacs24', u'emacs24-el', u'pandoc', u'tmux', u'zsh', u'ispell', u'vpnc', u'fonts-hack-ttf', u'ruby', u'ruby-aws-sdk', u'python-pip', u'python-pip-whl', u'virtualenv', u'curl', u'openjdk-8-jre'])
TASK [common : python pip install items] ***************************************
changed: [52.44.179.151] => (item=powerline-status)
changed: [52.44.179.151] => (item=awscli)
changed: [52.44.179.151] => (item=saws)
TASK [common : aws sdk v1 for ruby for awscli] *********************************
ok: [52.44.179.151]
TASK [common : check for spacemacs already imported] ***************************
ok: [52.44.179.151]
TASK [common : mv .emacs.d out of the way for spacemacs if spacemacs not yet deployed] ***
changed: [52.44.179.151]
TASK [common : spacemacs] ******************************************************
changed: [52.44.179.151]
TASK [common : for spacemacs org-protocol-capture-html] ************************
changed: [52.44.179.151]
TASK [common : fix .emacs.d/private via source copy] ***************************
changed: [52.44.179.151]
TASK [common : gen en_US.UTF-8 locale] *****************************************
ok: [52.44.179.151]
TASK [common : set default locale to en_US.UTF-8] ******************************
changed: [52.44.179.151]
TASK [common : symlink for dir_colors] *****************************************
changed: [52.44.179.151]
TASK [common : symlink for .tmux.conf] *****************************************
changed: [52.44.179.151]
TASK [common : copy vpnc config] ***********************************************
changed: [52.44.179.151]
TASK [common : git clone scottharney.com] **************************************
changed: [52.44.179.151]
TASK [common : install ohmyzsh via git] ****************************************
changed: [52.44.179.151]
TASK [common : fix .zshrc] *****************************************************
ok: [52.44.179.151]
TASK [ansible_mystuff : install ansible] ***************************************
changed: [52.44.179.151]
TASK [ansible_mystuff : ansible-mystuff repo for deployment of util hosts] *****
changed: [52.44.179.151]
TASK [openvpn_server : install openvpn] ****************************************
changed: [52.44.179.151]
TASK [openvpn_server : copy openvpn configuration data] ************************
changed: [52.44.179.151]
TASK [openvpn_server : set up ip forwarding] ***********************************
changed: [52.44.179.151]
TASK [openvpn_server : update /etc/rc.local] ***********************************
changed: [52.44.179.151]
TASK [openvpn_server : do iptables] ********************************************
changed: [52.44.179.151]
TASK [openvpn_server : fix debian bug in openvpn service https://bugs.launchpad.net/ubuntu/+source/openvpn/+bug/1580356?] ***
changed: [52.44.179.151]
TASK [openvpn_server : restart systemd daemon after updating unit config] ******
changed: [52.44.179.151]
TASK [openvpn_server : enable openvpn services] ********************************
changed: [52.44.179.151]
TASK [openvpn_server : restart openvpn services] *******************************
changed: [52.44.179.151]
TASK [Reboot system if required] ***********************************************
ok: [52.44.179.151]
TASK [waiting for 52.44.179.151 to reboot] *************************************
ok: [52.44.179.151 -> localhost]
PLAY RECAP *********************************************************************
52.44.179.151 : ok=38 changed=30 unreachable=0 failed=0
~/source/ansible-mystuff on master ⌚ 16:46:28
$
And that’s it. This does quite a bit in a very short period of time, and after the host reboots at the end, it’s ready to go. It does ask me to add the ssh key to known hosts, which I could avoid by pushing pre-existing, already-trusted host keys during cloud-init.
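One way to do that is cloud-init’s ssh_keys directive, which replaces the instance’s generated host keys with ones you created in advance and already trust. A sketch, with the key material obviously redacted:

#cloud-config
ssh_keys:
  rsa_private: |
    -----BEGIN RSA PRIVATE KEY-----
    (pre-generated private host key goes here)
    -----END RSA PRIVATE KEY-----
  rsa_public: ssh-rsa AAAA... matching public host key

With the corresponding public key already in known_hosts (or pinned via ssh-keyscan once), every fresh instance presents the same, pre-trusted fingerprint. The trade-off is that you’re now shipping a private host key through user_data, so you’d want to weigh that against the one-time prompt.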
The final installment wraps up this series and provides some thoughts on the broader applicability and usefulness of this.