Intro
My goal is to explore the Nutanix APIs provided by Prism and the Acropolis Hypervisor (AHV) and automate deployment of AHV virtual machines. I'm implementing CoreOS "Container Linux" VMs with an eye toward hosting applications on a Kubernetes (k8s) cluster for testing and learning purposes.
This is similar to working with a public cloud (e.g. EC2) but in a "private cloud" environment. Nutanix provides RESTful APIs to work with, as do CoreOS and all of its component pieces.
Before I can deploy Kubernetes, I need to host it somewhere. In this case I'll use VMs to host three instances of CoreOS. These will also host the etcd2 cluster required for a production CoreOS setup and recommended for k8s as well; the etcd2 cluster should be separate from Kubernetes.
Deploying VMs in Prism via the GUI is quick and easy, but the goal for learning, and for getting closer to a public cloud workflow, is to automate as much as possible, just as I would with EC2.
File Organization
.
├── ansible_create_vm.yml
├── ansible_get_vm.yml
├── cloud-config-coreos-asg0.yml
├── cloud-config-coreos-asg1.yml
├── cloud-config-coreos-asg2.yml
├── create_vm_rest.http
├── playbooks
├── README.org
├── roles
├── templates
└── vars
    ├── group_vars
    └── hosts
Direct RESTful GETs and POSTs
I started out making direct REST queries to build up my understanding of how to generate a POST that creates a VM. I used restclient.el in Spacemacs, but curl, HTTPie, or another client tool would work similarly. The RESTful POST call to create a VM looks like this:
# -*- restclient -*-
#
# Create a CoreOS VM via the Prism v1 REST API.
#
POST https://127.0.0.1:9440/PrismGateway/services/rest/v1/vms/
Authorization: Basic YWRA9xh6QTFzMmczISE=
{
  "name": "coreos-asg0",
  "memoryMb": "1024",
  "numVcpus": "1",
  "hypervisorType": "Acropolis",
  "description": "CoreOS Instance 0",
  "vmDisks": [
    {
      "isCdrom": false,
      "isThinProvisioned": true,
      "vmDiskCreate": {
        "containerUuid": "a84d4b18-6656-43dd-8ca0-acc98c7cf7fb",
        "sizeMb": "20480"
      }
    },
    {
      "isCdrom": true,
      "vmDiskClone": {
        "containerUuid": "5da9f6a9-2b84-46d5-8970-1fb9997752c1",
        "vmDiskUuid": "ac514e35-1d83-4bde-8da8-1190379e0d83"
      }
    }
  ],
  "vmNics": [
    {
      "networkUuid": "205b7475-f572-4330-81d5-a2db4af8bfcf",
      "requestedIpAddress": "172.30.8.66"
    }
  ]
}
The authentication string is a Base64-encoded string containing
"admin:theactualpassword". I've obfuscated the Base64 above so it doesn't decode.
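Generating that header value is a one-liner; a quick sketch (the credentials here are placeholders, not real ones):

```python
# Build the HTTP Basic Authorization header value from a
# username:password pair. Placeholder credentials for illustration.
import base64

def basic_auth_header(username: str, password: str) -> str:
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

print(basic_auth_header("admin", "theactualpassword"))
```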
Executing the above launches a Nutanix task and creates the VM, which can then
be booted so the CoreOS installer can run and install to disk. Everything is
hard-coded in this initial example, but it's easy to see that these items can
be supplied programmatically. I pulled the UUID values from Prism and via other
GET calls piped through jq. The cloned disk above creates a "cdrom" that
holds the CoreOS ISO bootable image and attaches it to the VM.
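That UUID discovery step can also be scripted. A minimal sketch using only the Python standard library; the endpoint paths (/containers, /networks) and container name below are assumptions to verify against your own cluster:

```python
# Look up the UUIDs needed for the VM-create POST from the Prism v1 API.
# Endpoint paths (/containers, /networks) and the container name in the
# usage sketch are assumptions -- verify against your own cluster.
import json
import ssl
import urllib.request

BASE_URL = "https://127.0.0.1:9440/PrismGateway/services/rest/v1"

def uuid_by_name(entities, name):
    """Return the uuid of the entity whose 'name' field matches."""
    for entity in entities:
        if entity.get("name") == name:
            return entity["uuid"]
    raise KeyError(name)

def get_entities(path, auth_header, context=None):
    """GET a v1 collection endpoint and return its 'entities' list."""
    req = urllib.request.Request(BASE_URL + path,
                                 headers={"Authorization": auth_header})
    with urllib.request.urlopen(req, context=context) as resp:
        return json.load(resp)["entities"]

# Usage sketch (requires a reachable cluster; certificate checks relaxed
# to match validate_certs: no in the playbooks further down):
#   ctx = ssl._create_unverified_context()
#   entities = get_entities("/containers", "Basic ...", context=ctx)
#   container_uuid = uuid_by_name(entities, "default")
```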
The only thing I don't demonstrate in the above POST is adding user_data
to the VM, which contains the initial VM customization. This is a key CoreOS
and cloud concept for low-level VM/instance creation; cloud-config is a widely
used and evolving standard. For the CoreOS cloud-config user_data insertion,
you can either point at a file on the underlying Acropolis file system or
insert the YAML "cloud-config" data as a single string. This looks messy in a
raw HTTP call because you have to escape a lot of characters to insert YAML
into JSON. It does work, though, and the result is that you can mount this
data as a second "cdrom" once CoreOS has booted from the ISO image with
sudo mount /dev/sr1 /mnt
and use the data in /mnt/openstack/latest/user_data
for sudo coreos-install
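Rather than hand-escaping the YAML, a small script can do the embedding safely: a JSON serializer handles the quoting of newlines and special characters for you. A minimal sketch (the customization_body helper is mine, for illustration):

```python
# Embed multi-line cloud-config YAML into the JSON POST body.
# json.dumps escapes newlines and quotes for us -- no hand-escaping.
import json

def customization_body(name: str, user_data: str) -> str:
    """Build a create-VM body fragment carrying raw cloud-config text."""
    body = {
        "name": name,
        "vmCustomizationConfig": {"userdata": user_data},
    }
    return json.dumps(body, indent=2)

yaml_text = '#cloud-config\nhostname: "coreos-asg0"\n'
print(customization_body("coreos-asg0", yaml_text))
```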
The main thing here for me though is that I can use what I learned from constructing the above POST and related queries to instead use Ansible's URI module to create VM instances.
Starting with Ansible
Of course I need a vars/hosts file with a few variables set for each VM I'm going to create.
[etcd_hosts]
172.30.8.66 hostname=coreos-asg0 etcd_instance=0 filename=cloud-config-coreos-asg0.yml
172.30.8.67 hostname=coreos-asg1 etcd_instance=1 filename=cloud-config-coreos-asg1.yml
172.30.8.68 hostname=coreos-asg2 etcd_instance=2 filename=cloud-config-coreos-asg2.yml
And then I can do a bit of GET testing with ansible_get_vm.yml
---
- name: Get some info from hosts
  hosts: etcd_hosts
  gather_facts: False
  connection: local
  vars:
    base_url: "https://127.0.0.1:9440/PrismGateway/services/rest/v1"
    username: "{{ lookup('env', 'ANSIBLE_USER') }}"
    password: "{{ lookup('env', 'ANSIBLE_PASSWORD') }}"
  tasks:
    - name: Get hosts
      uri:
        url: "{{ base_url }}/hosts"
        validate_certs: no
        force_basic_auth: yes
        method: GET
        status_code: 200
        user: "{{ username }}"
        password: "{{ password }}"
        body_format: json
        return_content: yes
      register: hostinfo
    - name: Output info
      debug: msg="DEBUG {{ item.name }}"
      with_items: "{{ hostinfo.json.entities }}"
The above is just for me to validate that I can talk to the API with
Ansible's URI module. Note that rather than hardcoding a username and
password, I set them as shell environment variables and use Ansible's lookup
capabilities to read them. I can then call ansible-playbook -i vars/hosts
ansible_get_vm.yml
and validate that things are working as expected. I'll get
a whole bunch of JSON output displayed along with my "DEBUG" string,
demonstrating that I can extract a desired value from the returned JSON.
A little side note: I'm using ssh port tunneling to get to my Nutanix
cluster, which is why the base_url content above references localhost.
Ansible playbook to create the VMs
I can use the URI module and tell it to parse YAML into JSON and then send it through via a POST. This blog post was enormously helpful in learning this detail. The playbook to create my VMs looks like this:
---
- name: Create VM
  hosts: etcd_hosts
  gather_facts: False
  connection: local
  vars:
    base_url: "https://127.0.0.1:9440/PrismGateway/services/rest/v1"
    username: "{{ lookup('env', 'ANSIBLE_USER') }}"
    password: "{{ lookup('env', 'ANSIBLE_PASSWORD') }}"
  tasks:
    - name: create VM
      uri:
        url: "{{ base_url }}/vms"
        validate_certs: no
        force_basic_auth: yes
        method: POST
        status_code: 200
        user: "{{ username }}"
        password: "{{ password }}"
        return_content: yes
        body:
          name: "{{ hostname }}"
          memoryMb: 1024
          numVcpus: 1
          hypervisorType: Acropolis
          description: "CoreOS Instance {{ etcd_instance }}"
          vmCustomizationConfig:
            userdata: "{{ lookup('file', filename) }}"
          vmDisks:
            - isCdrom: false
              isThinProvisioned: true
              vmDiskCreate:
                containerUuid: "a84d4b18-6656-43dd-8ca0-acc98c7cf7fb"
                sizeMb: 20480
            - isCdrom: true
              vmDiskClone:
                containerUuid: "5da9f6a9-2b84-46d5-8970-1fb9997752c1"
                vmDiskUuid: "ac514e35-1d83-4bde-8da8-1190379e0d83"
          vmNics:
            - networkUuid: "205b7475-f572-4330-81d5-a2db4af8bfcf"
              requestedIpAddress: "{{ inventory_hostname }}"
        body_format: json
If you compare the above to the "raw" HTTP JSON query I showed earlier, you
can see how this is constructed. The body_format: json
setting tells Ansible to take the
YAML dictionary content under "body" and format it as JSON.
I'm also able to pass in the cloud-config customization data with a file lookup, so the cloud-config customization can be pulled into the booted CoreOS instance for reference by the installer.
My cloud-config VM customization files
Taking a look at one customization file, cloud-config-coreos-asg0.yml:
#cloud-config
hostname: "coreos-asg0"
ssh_authorized_keys:
  - "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDaq728TKKxYSol4 etc@notgoingputithere.com"
write_files:
  - path: /etc/resolv.conf
    permissions: "0644"
    owner: "root"
    content: |
      nameserver 172.30.1.7
  - path: /etc/ntp.conf
    content: |
      # Common pool
      server 0.pool.ntp.org
      server 1.pool.ntp.org
      # - Allow only time queries, at a limited rate.
      # - Allow all local queries (IPv4, IPv6)
      restrict default nomodify nopeer noquery limited kod
      restrict 127.0.0.1
      restrict [::1]
coreos:
  etcd2:
    name: infra0
    initial-advertise-peer-urls: "http://172.30.8.66:2380"
    initial-cluster-token: "etcd-cluster-1"
    initial-cluster: "infra0=http://172.30.8.66:2380,infra1=http://172.30.8.67:2380,infra2=http://172.30.8.68:2380"
    advertise-client-urls: "http://172.30.8.66:2379"
    listen-client-urls: "http://172.30.8.66:2379,http://127.0.0.1:2379"
    listen-peer-urls: "http://172.30.8.66:2380"
    initial-cluster-state: new
  fleet:
    public-ip: "172.30.8.66"
  update:
    reboot-strategy: "etcd-lock"
  units:
    - name: "etcd2.service"
      command: "start"
      enable: true
    - name: "fleet.service"
      command: "start"
      enable: true
    - name: settimezone.service
      command: start
      content: |
        [Unit]
        Description=Set the time zone

        [Service]
        ExecStart=/usr/bin/timedatectl set-timezone America/Denver ; /usr/bin/timedatectl set-ntp true
        RemainAfterExit=yes
        Type=oneshot
    - name: "ntpd.service"
      command: "start"
      enable: true
I'm doing a lot here. This injects an ssh key into the "core" user's
.ssh/authorized_keys
file. It also does some basic OS provisioning. I also
elected to set up a static etcd2 cluster for now. I experimented with the
automated discovery capabilities described in the docs, but for simplicity I
decided to go back to a statically defined cluster for now.
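A malformed cloud-config fails silently at provisioning time, so it's worth sanity-checking the YAML locally before deploying. A minimal sketch, assuming PyYAML is installed; the check_cloud_config helper is mine, for illustration (the leading #cloud-config marker is a YAML comment, so the document parses as-is):

```python
# Sanity-check a cloud-config document: it must carry the "#cloud-config"
# marker line that cloud-init expects, and parse as a YAML mapping.
import yaml

def check_cloud_config(text: str) -> dict:
    if not text.startswith("#cloud-config"):
        raise ValueError("missing #cloud-config header line")
    data = yaml.safe_load(text)
    if not isinstance(data, dict):
        raise ValueError("cloud-config did not parse to a mapping")
    return data

sample = '#cloud-config\nhostname: "coreos-asg0"\n'
print(check_cloud_config(sample)["hostname"])  # coreos-asg0
```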
Running the playbook and deploying VMs
I run the playbook and Ansible dutifully creates my three VMs.
sharney@zenarcade:~/source/coreos-k8s-lab$ ansible-playbook ansible_create_vm.yml -i vars/hosts
PLAY [Create VM] ***************************************************************
TASK [create VM] ***************************************************************
ok: [172.30.8.68]
ok: [172.30.8.67]
ok: [172.30.8.66]
PLAY RECAP *********************************************************************
172.30.8.66 : ok=1 changed=0 unreachable=0 failed=0
172.30.8.67 : ok=1 changed=0 unreachable=0 failed=0
172.30.8.68 : ok=1 changed=0 unreachable=0 failed=0
After creation, I go into the Prism UI and power on each VM. Installing from
the CoreOS ISO to "bare metal" (a VM, really, for my purposes) is a manual
process. I need to mount the ISO containing the user_data
and pass it to coreos-install.
I launch the console for the VM and do the following:
$ sudo bash
# mount /dev/sr1 /mnt
# coreos-install -c /mnt/openstack/latest/user_data -d /dev/sda
When this completes, I bring the guest down with a "Guest shutdown", remove both cdrom devices from the guest VM configuration, and power it back on. The VMs boot quickly, and at this point I can ssh into each.
sharney@zenarcade:~/source/coreos-k8s-lab$ ssh core@172.30.8.66
The authenticity of host '172.30.8.66 (172.30.8.66)' can't be established.
ECDSA key fingerprint is SHA256:zDa/2I2iY9lpFyFYys5aEeQeaR1xJZp4srTJDJXaEa4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.30.8.66' (ECDSA) to the list of known hosts.
CoreOS stable (1185.3.0)
core@coreos-asg0 ~ $ date
Tue Dec 6 09:40:38 MST 2016
core@coreos-asg0 ~ $ ntpd -q
6 Dec 09:41:47 ntpd[1435]: ntpd 4.2.8p8@1.3265-o Tue Nov 1 01:31:23 UTC 2016 (1): Starting
6 Dec 09:41:47 ntpd[1435]: Command line: ntpd -q
6 Dec 09:41:47 ntpd[1435]: must be run as root, not uid 500
core@coreos-asg0 ~ $ etcdctl cluster-health
member 2346579eee0df4e9 is healthy: got healthy result from http://172.30.8.66:2379
member 98ee296eb6a09283 is healthy: got healthy result from http://172.30.8.67:2379
member cc50d87ad87a552c is healthy: got healthy result from http://172.30.8.68:2379
cluster is healthy
core@coreos-asg0 ~ $ etcdctl member list
2346579eee0df4e9: name=infra0 peerURLs=http://172.30.8.66:2380 clientURLs=http://172.30.8.66:2379 isLeader=true
98ee296eb6a09283: name=infra1 peerURLs=http://172.30.8.67:2380 clientURLs=http://172.30.8.67:2379 isLeader=false
cc50d87ad87a552c: name=infra2 peerURLs=http://172.30.8.68:2380 clientURLs=http://172.30.8.68:2379 isLeader=false
This shows the local timezone was set, ntpd is running, and my static etcd2 cluster is healthy. A final smoke test is to set an etcd value on one host, check it on another, and remove it on the third:
core@coreos-asg0 ~ $ etcdctl set /message "Hello World!"
Hello World!
core@coreos-asg1 ~ $ etcdctl get /message
Hello World!
core@coreos-asg2 ~ $ etcdctl rm /message
PrevNode.Value: Hello World!
Next steps
I learned a lot over a few days doing this exercise. I picked Nutanix because we have it in our lab and they've done a good job with exposing their APIs. It wasn't too hard to take concepts I learned working with EC2 and AWS and transfer them over to the Nutanix setup.
There's much more to do:
- Gain a better understanding of production CoreOS clustering and related concepts:
  - etcd production clusters and the discovery mechanism
  - fleet vs. Kubernetes for low-level items
  - dealing with systemd; I hate it, but it's unavoidable these days
  - managing etcd as well as properly planning the architecture; it should be straightforward to add new nodes with more memory/disk space if needed, and to dynamically retire old ones
- Fully automate the install:
  - Since the install in this case emulates a bare-metal install, it's technically destructive, which is why CoreOS keeps this installer manual
  - But installers via PXE, AWS EC2, etc. fully automate deployment; I should be able to build my own images, use the OEM partition, etc. to create a no-touch deployment using only cloud-init to customize generated VMs
- Move on to Kubernetes
- Ambition vs. time. We shall see…