Automating Nutanix Prism VMs

Intro

My goal is to explore the Nutanix APIs provided by Prism and the Acropolis Hypervisor (AHV) and automate deployment of AHV virtual machines. I'm looking at implementing CoreOS "Container Linux" VMs with an eye toward hosting applications on a Kubernetes (k8s) cluster for testing and learning purposes.

This is similar to working with a public cloud (e.g. EC2) but in a "private cloud" environment. Nutanix provides RESTful APIs to work with, as does CoreOS and all of its component pieces.

Before I can deploy Kubernetes, I need to host it somewhere. In this case I'll use VMs to host three instances of CoreOS. These will also host the etcd2 cluster required for a production CoreOS setup and recommended for k8s as well; the etcd2 cluster should be separate from Kubernetes.

Deploying VMs in Prism via the GUI is quick and easy. But the goal, both for learning and for getting closer to a public cloud workflow, is to automate as much as possible, much as I would with EC2.

File Organization


.
├── ansible_create_vm.yml
├── ansible_get_vm.yml
├── cloud-config-coreos-asg0.yml
├── cloud-config-coreos-asg1.yml
├── cloud-config-coreos-asg2.yml
├── create_vm_rest.http
├── playbooks
├── README.org
├── roles
├── templates
└── vars
    ├── group_vars
    └── hosts

Direct RESTful GETs and POSTs

I started out making direct REST queries to build up my understanding of how to generate a POST that creates a VM. I used restclient.el in Spacemacs, but curl, HTTPie, or another client tool would work similarly. The RESTful POST call to create a VM looks like this:

  

  # -*- restclient -*-
  #
  # Creates a VM via the Prism v1 REST API.
  # The Authorization header carries HTTP Basic credentials.
  #
  POST https://127.0.0.1:9440/PrismGateway/services/rest/v1/vms/
  Authorization: Basic YWRA9xh6QTFzMmczISE=
  {
    "name": "coreos-asg0",
    "memoryMb": "1024",
    "numVcpus": "1",
    "hypervisorType": "Acropolis",
    "description": "CoreOS Instance 0",
    "vmDisks": [
      {
        "isCdrom": false,
        "isThinProvisioned": true,
        "vmDiskCreate": {
          "containerUuid": "a84d4b18-6656-43dd-8ca0-acc98c7cf7fb",
          "sizeMb": "20480"
        }
      },
      {
        "isCdrom": true,
        "vmDiskClone": {
          "containerUuid": "5da9f6a9-2b84-46d5-8970-1fb9997752c1",
          "vmDiskUuid": "ac514e35-1d83-4bde-8da8-1190379e0d83"
        }
      }
    ],
    "vmNics": [
      {
        "networkUuid": "205b7475-f572-4330-81d5-a2db4af8bfcf",
        "requestedIpAddress": "172.30.8.66"
      }
    ]
  }

  

The authentication string is a Base64-encoded string containing "admin:theactualpassword". I've obfuscated the Base64 so it doesn't decode. Executing the above launches a Nutanix task and creates the VM, which can then be booted so the CoreOS installer can run and install to disk. Everything is hard-coded in this initial example, but it's easy to see that each item can be supplied programmatically. I pulled the UUID values from Prism and via other GET calls piped through jq. The cloned disk above creates a "cdrom" that holds the bootable CoreOS ISO image and attaches it to the VM.
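
That Basic header is easy to generate yourself. A minimal Python sketch, using a stand-in password rather than a real one:

```python
import base64

def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value from credentials."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

# "theactualpassword" is a placeholder, not a real credential.
print(basic_auth_header("admin", "theactualpassword"))
```

Decoding works the same way in reverse, which is exactly why an unobfuscated header in a blog post would leak the password.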

The only thing the above POST doesn't demonstrate is adding user_data to the VM, which carries the initial VM customization. This is a key CoreOS and cloud concept for low-level VM/instance creation; cloud-config is a widely used and evolving standard. For the CoreOS cloud-config user_data insertion, you can either point at a file on the underlying Acropolis file system or insert the YAML "cloud-config" data as a single string. This looks messy in a raw HTTP call because you have to escape a lot of characters to embed YAML inside JSON. It does work, though: once CoreOS has booted from the ISO image, you can mount this data as a second "cdrom" with sudo mount /dev/sr1 /mnt and use the data in /mnt/openstack/latest/user_data for sudo coreos-install.
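
If you do build the raw JSON body programmatically, a JSON serializer handles that escaping for you. A small Python sketch (the fragment is abbreviated; in practice the string would be read from a file like cloud-config-coreos-asg0.yml, and vmCustomizationConfig/userdata is the field the Ansible playbook uses):

```python
import json

# A tiny cloud-config fragment standing in for a real file's contents.
cloud_config = '#cloud-config\nhostname: "coreos-asg0"\n'

# json.dumps() performs all the newline and quote escaping that makes a
# hand-written HTTP body so messy.
body = {
    "name": "coreos-asg0",
    "vmCustomizationConfig": {"userdata": cloud_config},
}
payload = json.dumps(body)
print(payload)
```

The round trip is lossless: parsing the payload back yields the original YAML string, newlines and all.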

The main takeaway for me, though, is that I can use what I learned constructing the above POST and related queries with Ansible's URI module to create VM instances.

Starting with Ansible

Of course I need a vars/hosts file with a few variables set for each VM I'm going to create.


[etcd_hosts]
172.30.8.66  hostname=coreos-asg0 etcd_instance=0 filename=cloud-config-coreos-asg0.yml
172.30.8.67  hostname=coreos-asg1 etcd_instance=1 filename=cloud-config-coreos-asg1.yml
172.30.8.68  hostname=coreos-asg2 etcd_instance=2 filename=cloud-config-coreos-asg2.yml

And then I can do a bit of GET testing with ansible_get_vm.yml:


---

- name: Get some info from hosts
  hosts: etcd_hosts
  gather_facts: False
  connection: local
  vars:
    base_url: "https://127.0.0.1:9440/PrismGateway/services/rest/v1"
    username: "{{ lookup('env', 'ANSIBLE_USER') }}"
    password: "{{ lookup('env', 'ANSIBLE_PASSWORD') }}"

  tasks:
    - name: Get hosts
      uri:
        url: "{{ base_url }}/hosts"
        validate_certs: no
        force_basic_auth: yes
        method: GET
        status_code: 200
        user: "{{ username }}"
        password: "{{ password }}"
        body_format: json
        return_content: yes
      register: hostinfo

    - name: Output info
      debug: msg="DEBUG {{ item.name }}"
      with_items: "{{ hostinfo.json.entities }}"

The above is just for me to validate that I can talk to the API with Ansible's URI module. Note that rather than hardcoding a username and password, I set them as shell environment variables and use Ansible's lookup capability to retrieve them. I can then run ansible-playbook -i vars/hosts ansible_get_vm.yml and validate that things are working as expected. I get a whole bunch of JSON output displayed along with my "DEBUG" string, demonstrating that I can extract a desired value from the returned JSON.

A little side note: I'm using ssh port forwarding (something like ssh -N -L 9440:prism-host:9440 user@jumphost, where prism-host and jumphost stand in for my actual addresses) to reach my Nutanix cluster, which is why the base_url references localhost above.

Ansible playbook to create the VMs

I can use the URI module and tell it to render a YAML dictionary as JSON and send it through via a POST. This blog post was enormously helpful in learning this detail. The playbook to create my VMs looks like this:


---

- name: Create VM
  hosts: etcd_hosts
  gather_facts: False
  connection: local
  vars:
    base_url: "https://127.0.0.1:9440/PrismGateway/services/rest/v1"
    username: "{{ lookup('env', 'ANSIBLE_USER') }}"
    password: "{{ lookup('env', 'ANSIBLE_PASSWORD') }}"

  tasks:
    - name: create VM
      uri:
        url: "{{ base_url }}/vms"
        validate_certs: no
        force_basic_auth: yes
        method: POST
        status_code: 200
        user: "{{ username }}"
        password: "{{ password }}"
        return_content: yes
        body: 
          name: "{{ hostname }}"
          memoryMb: 1024
          numVcpus: 1
          hypervisorType: Acropolis
          description: "CoreOS Instance {{ etcd_instance }}"
          vmCustomizationConfig:
            userdata: "{{ lookup('file', filename) }}" 
          vmDisks:
            - isCdrom: false
              isThinProvisioned: true
              vmDiskCreate:
                containerUuid: "a84d4b18-6656-43dd-8ca0-acc98c7cf7fb"
                sizeMb: 20480
            - isCdrom: true
              vmDiskClone:
                 containerUuid: "5da9f6a9-2b84-46d5-8970-1fb9997752c1"
                 vmDiskUuid: "ac514e35-1d83-4bde-8da8-1190379e0d83"
          vmNics:
            - networkUuid: "205b7475-f572-4330-81d5-a2db4af8bfcf"
              requestedIpAddress: "{{ inventory_hostname }}"
        body_format: json

If you compare the above to the "raw" HTTP JSON query shown earlier, you can see how this is constructed. The body_format: json setting tells Ansible to take the YAML dictionary content under body and format it as JSON.

I'm also able to pass in the cloud-config customization data with a file lookup, so the cloud-config customization can be pulled into the booted CoreOS instance for reference by the installer.
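
The same request body can be sketched as a plain Python dictionary, which makes the correspondence between the Ansible YAML and the raw HTTP JSON explicit. This is a sketch, not my actual tooling; the UUIDs are the ones pulled from Prism earlier:

```python
def vm_create_body(hostname, etcd_instance, userdata, ip):
    """Mirror of the Ansible 'body' dictionary for POST {base_url}/vms."""
    return {
        "name": hostname,
        "memoryMb": 1024,
        "numVcpus": 1,
        "hypervisorType": "Acropolis",
        "description": f"CoreOS Instance {etcd_instance}",
        "vmCustomizationConfig": {"userdata": userdata},
        "vmDisks": [
            {
                # 20 GB thin-provisioned boot disk, created fresh.
                "isCdrom": False,
                "isThinProvisioned": True,
                "vmDiskCreate": {
                    "containerUuid": "a84d4b18-6656-43dd-8ca0-acc98c7cf7fb",
                    "sizeMb": 20480,
                },
            },
            {
                # "cdrom" cloned from the CoreOS ISO image.
                "isCdrom": True,
                "vmDiskClone": {
                    "containerUuid": "5da9f6a9-2b84-46d5-8970-1fb9997752c1",
                    "vmDiskUuid": "ac514e35-1d83-4bde-8da8-1190379e0d83",
                },
            },
        ],
        "vmNics": [
            {
                "networkUuid": "205b7475-f572-4330-81d5-a2db4af8bfcf",
                "requestedIpAddress": ip,
            }
        ],
    }

# json.dumps(vm_create_body(...)) would then be POSTed to {base_url}/vms
# with the Basic auth header, exactly as the URI module does.
```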

My cloud-config VM customization files

Taking a look at one customization file, cloud-config-coreos-asg0.yml:



  #cloud-config

  hostname: "coreos-asg0"
  ssh_authorized_keys:
    - "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDaq728TKKxYSol4 etc@notgoingputithere.com"
  write_files:
    - path: /etc/resolv.conf
      permissions: "0644"
      owner: "root"
      content: |
          nameserver 172.30.1.7
    - path: /etc/ntp.conf
      content: |
          # Common pool
          server 0.pool.ntp.org
          server 1.pool.ntp.org
          # - Allow only time queries, at a limited rate.
          # - Allow all local queries (IPv4, IPv6)
          restrict default nomodify nopeer noquery limited kod
          restrict 127.0.0.1
          restrict [::1]
  coreos:
    etcd2:
      name: infra0
      initial-advertise-peer-urls: "http://172.30.8.66:2380"
      initial-cluster-token: "etcd-cluster-1"
      initial-cluster: "infra0=http://172.30.8.66:2380,infra1=http://172.30.8.67:2380,infra2=http://172.30.8.68:2380"
      advertise-client-urls: "http://172.30.8.66:2379"
      listen-client-urls: "http://172.30.8.66:2379,http://127.0.0.1:2379"
      listen-peer-urls: "http://172.30.8.66:2380"
      initial-cluster-state: new
    fleet:
        public-ip: "172.30.8.66"
    update:
      reboot-strategy: "etcd-lock" 
    units:
    - name: "etcd2.service"
      command: "start"
      enable: true
    - name: "fleet.service"
      command: "start"
      enable: true
    - name: settimezone.service
      command: start
      content: |
        [Unit]
        Description=Set the time zone

        [Service]
        ExecStart=/usr/bin/timedatectl set-timezone America/Denver ; /usr/bin/timedatectl set-ntp true
        RemainAfterExit=yes
        Type=oneshot 
    - name: "ntpd.service"
      command: "start"
      enable: true

I'm doing a lot here. This injects an ssh key into the "core" user's ~/.ssh/authorized_keys file and does some basic OS provisioning. I also elected to set up a static etcd2 cluster for now. I experimented with the automated discovery capabilities described in the docs, but for simplicity I went back to a statically defined cluster.
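
One small safeguard worth adding (a check of my own, not an official CoreOS tool): coreos-cloudinit only honors user_data whose first line is the literal #cloud-config header, so a quick check catches the most common mistake before any VM is created.

```python
def looks_like_cloud_config(path):
    """Return True if the file starts with the '#cloud-config' header
    that coreos-cloudinit requires (shebang scripts aside)."""
    with open(path) as f:
        return f.readline().rstrip("\r\n") == "#cloud-config"
```

Running this over each cloud-config-coreos-asg*.yml file before the playbook is a cheap pre-flight test.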

Running the playbook and deploying VMs

I run the playbook and Ansible dutifully creates my three VMs:


    sharney@zenarcade:~/source/coreos-k8s-lab$ ansible-playbook  ansible_create_vm.yml -i vars/hosts 

    PLAY [Create VM] ***************************************************************

    TASK [create VM] ***************************************************************
    ok: [172.30.8.68]
    ok: [172.30.8.67]
    ok: [172.30.8.66]

    PLAY RECAP *********************************************************************
    172.30.8.66                : ok=1    changed=0    unreachable=0    failed=0   
    172.30.8.67                : ok=1    changed=0    unreachable=0    failed=0   
    172.30.8.68                : ok=1    changed=0    unreachable=0    failed=0 

After creation, I go into the Prism UI and power on each VM. Installing from the CoreOS ISO to "bare metal" (really a VM for my purposes) is a manual process: I need to mount my ISO containing the user_data and pass it to coreos-install. I launch the console for the VM and do the following:

$ sudo bash
# mount /dev/sr1 /mnt
# coreos-install -c /mnt/openstack/latest/user_data -d /dev/sda

When this completes, I bring the guest down with a "Guest shutdown", remove both cdrom devices from the guest VM configuration, and power it back on. The VMs boot quickly, and at this point I can ssh into each one:


sharney@zenarcade:~/source/coreos-k8s-lab$ ssh core@172.30.8.66
The authenticity of host '172.30.8.66 (172.30.8.66)' can't be established.
ECDSA key fingerprint is SHA256:zDa/2I2iY9lpFyFYys5aEeQeaR1xJZp4srTJDJXaEa4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.30.8.66' (ECDSA) to the list of known hosts.
CoreOS stable (1185.3.0)
core@coreos-asg0 ~ $ date
Tue Dec  6 09:40:38 MST 2016
core@coreos-asg0 ~ $ ntpd -q
 6 Dec 09:41:47 ntpd[1435]: ntpd 4.2.8p8@1.3265-o Tue Nov  1 01:31:23 UTC 2016 (1): Starting
 6 Dec 09:41:47 ntpd[1435]: Command line: ntpd -q
 6 Dec 09:41:47 ntpd[1435]: must be run as root, not uid 500
core@coreos-asg0 ~ $ etcdctl cluster-health
member 2346579eee0df4e9 is healthy: got healthy result from http://172.30.8.66:2379
member 98ee296eb6a09283 is healthy: got healthy result from http://172.30.8.67:2379
member cc50d87ad87a552c is healthy: got healthy result from http://172.30.8.68:2379
cluster is healthy
core@coreos-asg0 ~ $ etcdctl member list   
2346579eee0df4e9: name=infra0 peerURLs=http://172.30.8.66:2380 clientURLs=http://172.30.8.66:2379 isLeader=true
98ee296eb6a09283: name=infra1 peerURLs=http://172.30.8.67:2380 clientURLs=http://172.30.8.67:2379 isLeader=false
cc50d87ad87a552c: name=infra2 peerURLs=http://172.30.8.68:2380 clientURLs=http://172.30.8.68:2379 isLeader=false

This shows the local timezone was set, ntpd is running, and my static etcd2 cluster is healthy. A final smoke test is to set an etcd value on one host, read it on another, and remove it on the third:

core@coreos-asg0 ~ $ etcdctl set /message "Hello World!"                                                                                                       
Hello World!

core@coreos-asg1 ~ $ etcdctl get /message
Hello World!

core@coreos-asg2 ~ $ etcdctl rm /message
PrevNode.Value: Hello World!
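
etcdctl is essentially a convenience wrapper over etcd's v2 HTTP keys API, so the same smoke test can be scripted. A sketch (the member address comes from my cluster; set_key performs a live PUT, while parse_etcd_node_value just extracts the value from a response body):

```python
import json
import urllib.parse
import urllib.request

def set_key(member, key, value):
    """Set a key via the etcd v2 keys API; member is e.g. '172.30.8.66:2379'."""
    data = urllib.parse.urlencode({"value": value}).encode()
    req = urllib.request.Request(
        f"http://{member}/v2/keys/{key}", data=data, method="PUT")
    return urllib.request.urlopen(req).read().decode()

def parse_etcd_node_value(response_body):
    """Pull node.value out of a v2 keys API JSON response."""
    return json.loads(response_body)["node"]["value"]
```

A plain GET on http://{member}/v2/keys/message reads the key back and a DELETE removes it, mirroring the etcdctl session above.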

Next steps

I learned a lot over a few days doing this exercise. I picked Nutanix because we have it in our lab and they've done a good job of exposing their APIs. It wasn't too hard to take the concepts I learned working with EC2 and AWS and transfer them to the Nutanix setup.

    There's much more to do:
  • Gain a better understanding of production CoreOS clustering and related concepts:
    • etcd production clusters and the discovery mechanism
    • fleet vs Kubernetes for low-level items
    • dealing with systemd. I hate it but it's unavoidable these days.
    • managing etcd as well as properly planning the architecture. It should be straightforward to add new nodes with more memory/disk space if needed, for example, and dynamically retire old ones.
  • Fully automate the install. Since the install in this case emulates a bare metal install, it's technically destructive, so CoreOS keeps the installer manual. But installers via PXE, AWS EC2, etc. fully automate deployment; I should be able to build my own images, use the OEM partition, etc. to create a no-touch deployment using only cloud-init to customize generated VMs.
  • Move on to Kubernetes.
  • Ambition vs time. We shall see....