Friday, April 19, 2013

Cloud-in-a-box-in-a-VM (in a nutshell)

For the past few months, one of my projects at Eucalyptus has been our CentOS 6-based "Silvereye" (a.k.a. FastStart) installer, which assembles multiple repositories, custom install classes, and a few helper scripts onto a DVD-sized ISO image.  One of the challenges has been that I'm a remote employee, and downloading an ISO file from our Jenkins server takes too long.  At the same time, loading the ISO into Cobbler doesn't really test the same code paths as booting a DVD.  So how can I test my cloud installer?  Nested virtualization, of course!

As of Fedora 18, I've found nested KVM to be fairly reliable.  The performance isn't earth-shattering, but it's usable.  There are several things about the setup that may not be obvious, though, so I'll walk through everything I did for my test machine.

The server I'm using has two Xeon CPUs, 8 GB of RAM, and 2 TB of disk.  I've allocated 50 GB of LVM space for the root filesystem, and when I create new VMs, I give them a 100 GB logical volume as their disk ( lvcreate -n vmName -L 100G vg01 ).

The server is on a 10.101.0.0/16 network, and I "own" 10.101.7.0/24 within that, as well as 192.168.56.0/23 for "private" addressing.  I've connected bridge device br0 to my primary ethernet device, em1:
  • /etc/sysconfig/network-scripts/ifcfg-br0
    DEVICE=br0
    ONBOOT=yes
    NM_CONTROLLED=no
    BOOTPROTO=dhcp  # IP is reserved: 10.101.1.25
    TYPE=Bridge
    DELAY=0
    PERSISTENT_DHCLIENT=yes
    
  • /etc/sysconfig/network-scripts/ifcfg-em1
    DEVICE=em1
    ONBOOT=yes
    NM_CONTROLLED=no
    TYPE=Ethernet
    BRIDGE=br0
    
I enabled nested virtualization on the system by placing "options kvm_intel nested=1" in /etc/modprobe.d/kvm.conf and reloading the kvm_intel module.
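
In practice that looks roughly like this (a sketch; the last command just reads the parameter back to confirm nesting is on):

echo "options kvm_intel nested=1" > /etc/modprobe.d/kvm.conf
modprobe -r kvm_intel && modprobe kvm_intel
cat /sys/module/kvm_intel/parameters/nested   # should print Y (or 1, depending on kernel)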

To allow the VMs to be connected to the bridge interface, I've added "allow br0" to /etc/qemu/bridge.conf.
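
That file is just an ACL read by qemu-bridge-helper, and the whole thing is one line:
  • /etc/qemu/bridge.conf
    allow br0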

I also set /proc/sys/net/ipv4/ip_forward to 1, so that packet forwarding works.
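
For reference, the equivalent commands (the second one makes the setting survive a reboot):

sysctl -w net.ipv4.ip_forward=1
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf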

For now, I've completely turned off iptables.  This is not ideal, but I'll leave iptables rule building for another post.

Booting an image into the FastStart installer looks like this:

qemu-kvm -m 2048 -cpu qemu64,+vmx -drive file=/dev/vg01/ciab,if=virtio \
         -net nic,model=virtio,macaddr=52:54:00:12:34:60 -net bridge,br=br0 \
         -cdrom silvereye-nightly-3.3-m5.iso -boot d -vnc :1

Let's look at the individual pieces of that.

1) "-m 2048" is 2GB of RAM
2) "-cpu qemu64,+vmx" specifies that we are emulating a CPU capable of virtualization
3) "-drive file=/dev/vg01/ciab,if=virtio" is our LVM-backed "disk"
4) "-net nic,model=virtio,macaddr=52:54:00:12:34:60" is our virtual network interface. Specifying a unique MAC address is very important!  If you don't do this, every VM will get the same MAC, and they won't be able to communicate with each other.  They will all be able to send and receive other traffic, though, which can be maddening if you don't realize what's going on.
5) "-net bridge,br=br0" says to connect the host TAP device to bridge br0.  This gets you non-NAT access to the physical network, since the bridge is also connected to em1.
6) "-cdrom silvereye-nightly-3.3-m5.iso -boot d" connects our ISO image and boots from it.
7) "-vnc :1" starts a VNC server on the host (on port 5901) for this VM's console.

I connect to 10.101.1.25:5901 with a VNC client and proceed with the install, selecting "Cloud in a Box" at the boot menu.  Since I own 10.101.7.x, I give this VM the IP 10.101.7.1, and configure its "public IP range" to be 10.101.7.10-10.101.7.20.  The subset of IPs here is arbitrary; I wanted to leave some for my other test clouds.  For private IPs, I'll use 192.168.57.0/24, half of my allotted range.  The default gateway and nameserver are exactly the same as for the host system.  The host bridge is just a pass-through, not an extra routing "hop".

Once installed, I can access the cloud through the normal channels -- SSH, the admin UI on port 8443, and the user console on port 8888.  I test by logging into the user console, generating and downloading a new key, launching an instance, and SSHing into the instance's public IP.
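
If you'd rather script that smoke test, euca2ools can do the same thing.  A rough sketch, assuming you've sourced your eucarc for credentials; emi-XXXXXXXX is a hypothetical image ID, and the login user varies by image:

euca-add-keypair mykey > mykey.private && chmod 0600 mykey.private
euca-run-instances emi-XXXXXXXX -k mykey   # substitute a real image ID
euca-describe-instances                    # wait for "running", note the public IP
ssh -i mykey.private root@10.101.7.10      # login user may differ on your image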

The chain of devices through which data flows may not be entirely clear.  When you launch an instance inside the virtual cloud on 10.101.7.1 -- and let's say for the example that it gets an IP of 10.101.7.10 -- here's the list of "devices" through which packets flow when you connect to it from a different physical system:
  1. the physical interface of the host, em1
  2. the host bridge, br0
  3. the host tap interface
  4. the VM's "eth0" interface
  5. the VM's "br0" bridge (which is not enslaving eth0 in this case)
  6. the VM's tap interface (vnet0)
  7. the nested VM's eth0
The only place that NAT happens here is between 4 and 5, and that's only necessary because I chose to use Eucalyptus's "MANAGED-NOVLAN" mode.

So that's a cloud-in-a-box-in-a-VM (in a nutshell).

For a more traditional deployment, simply boot more VMs into the installer using the same qemu-kvm command format mentioned above, choosing "Frontend" for one, and "Node Controller" for the other(s).  For each one you boot, you need to create a new LVM volume and change the MAC address and the VNC port to avoid conflicts.  When running multiple clouds, you should also make sure that your IP ranges never overlap (i.e., don't let two clouds use the same public IPs or private subnet range).
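
For example, a hypothetical node controller VM named "nc1" would just get its own volume, the next MAC address, and the next VNC display:

lvcreate -n nc1 -L 100G vg01
qemu-kvm -m 2048 -cpu qemu64,+vmx -drive file=/dev/vg01/nc1,if=virtio \
         -net nic,model=virtio,macaddr=52:54:00:12:34:61 -net bridge,br=br0 \
         -cdrom silvereye-nightly-3.3-m5.iso -boot d -vnc :2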

Thursday, January 3, 2013

Using aws-cli with Eucalyptus

Just before the holidays, Amazon released awscli, a new command-line interface for managing AWS resources.  The code is based on botocore, the core python library for the next major version of boto.  I took awscli for a spin to see if it worked with the Eucalyptus Community Cloud, and as is often the case, the answer was ... almost.

First, it's useful to understand the fundamental problems that awscli was trying to address.  The most obvious is profiles.  Cloud users deal with multiple regions, accounts, users, etc., and keeping separate configurations for each one is a hassle.  awscli uses a section-based config file format which allows for multiple profiles, each of which can reference its own region, access keys, etc.

Another problem that this new code solves is the centralization of region and service data into JSON files which are easy to read, write, and parse.  See _regions.json and _services.json in botocore for examples.

What I found was that rather than trying to alter the existing data files, what I really wanted was a eucalyptus "provider" with its own JSON files.  I'll spare you all my trial-and-error, and simply explain what worked:

  1. git clone https://github.com/boto/botocore.git
  2. git clone https://github.com/a13m/aws-cli.git (note that this is my fork -- upstream is https://github.com/aws/aws-cli.git )
  3. Install botocore and aws-cli however you prefer (I use "python setup.py install --user" in each directory)
  4. Create a provider data directory, and a "euca" directory inside it.  I'll use /var/tmp/providers as the top directory.
  5. Create _regions.json and _services.json under the "euca" directory (the linked examples here should work for ECC verbatim)
  6. Symlink botocore/data/aws/ec2.json and botocore/data/aws/iam.json into the euca provider directory (a sketch of steps 4-6 follows this list)
  7. Create your ~/.awsconfig file (or whatever you'd like to call it):
     
    [default]
    aws_access_key_id=XXXXXXXXXXXXXXXXXXXX
    aws_secret_access_key=XXXXXXXXXXXXXXXXXX
    region=ecc
    provider_name=euca
     
  8. export AWS_CONFIG_FILE=$HOME/.awsconfig
  9. export AWS_DATA_PATH=/var/tmp/providers
  10. try some commands, such as:
     
    aws ec2 create-volume --size 1 --availability-zone partner01
    aws ec2 describe-volumes
    aws ec2 describe-images
     
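Steps 4 through 6 might look like this in practice (assuming the botocore clone from step 1 landed in your home directory):

mkdir -p /var/tmp/providers/euca
cd /var/tmp/providers/euca
# _regions.json and _services.json go here, then:
ln -s ~/botocore/botocore/data/aws/ec2.json .
ln -s ~/botocore/botocore/data/aws/iam.json .
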
It may take a couple of iterations for the patch I've proposed to be accepted upstream, but in the meantime, I hope this is useful information.  As I've mentioned in the pull request, the solution is not ideal, as it requires that your default profile in a config file reference the euca provider, but I went for the least invasive fix first.  Note that even with this version, you can use profiles to group all of your eucalyptus cloud credentials into a single config file, and then have a second file for AWS profiles.  Switching back and forth is just a matter of setting AWS_CONFIG_FILE.
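
That switch might look like this (the file names here are hypothetical):

export AWS_CONFIG_FILE=$HOME/.awsconfig-euca   # Eucalyptus profiles
aws ec2 describe-images
export AWS_CONFIG_FILE=$HOME/.awsconfig-aws    # AWS profiles
aws ec2 describe-images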

Wednesday, August 15, 2012

Fun with GitHub pull requests

GitHub pull requests are great for contributors.  They offer a very simple way to publicly post a patch against a piece of open source software and get feedback from the maintainers.  For various reasons, though, actually doing the merge of a pull request via GitHub's UI may not be the ideal thing for a project maintainer.  Reasons include:

* a policy of putting the commit through test prior to merging it

* adding a relevant issue ID to a commit message that lacks one (I've not done this to a contribution, but I've considered it)

* the pull request should be merged into a branch other than the one it was filed against

and I'm sure you can think of others.   So the first couple of pull requests that I accepted were a bit painful, because my process was:

* git clone the contributor's repo

* use git format-patch to export the patches

* use git am to import the patches

In particular, the git clone of an entire repo just to get a 5-line patch seemed like a tremendous waste of time and bandwidth, and I knew there had to be a better way.  Obviously GitHub's website displays the diff associated with the pull request, so it has to be stored _somewhere_, right?

And of course it is.  For example, here is a pull request:

https://github.com/eucalyptus/eucalyptus/pull/3/

To see the properly formatted patch associated with it, simply drop the trailing slash and add ".patch":

https://github.com/eucalyptus/eucalyptus/pull/3.patch
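
From there, applying a contribution is a one-liner (curl -L follows any redirect, and git am reads the patch from stdin):

curl -L https://github.com/eucalyptus/eucalyptus/pull/3.patch | git am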

It's worth noting that I found this via a link tag which exists in the source of the pull request page:

<link rel='alternate' type='text/x-patch' href='/eucalyptus/eucalyptus/pull/3.patch' />
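
You can even pull that tag out of the page with a quick grep:

curl -s https://github.com/eucalyptus/eucalyptus/pull/3/ | grep 'text/x-patch'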

So there's probably some code floating around somewhere to simply follow that link instead of knowing how to alter the URL.  Anyway, I'm quite happy to have found this.  I'm not sure why there's not an obvious link to it on each pull request page.  I think it would serve project maintainers well.

Tuesday, July 3, 2012

Jira full text search tips

One of the advantages of using Jira at Eucalyptus is that it has very good Lucene-based full-text search.  It's not necessarily obvious how to use it, though.  If you want to search for a multi-word string, you have to use the advanced search (JQL), and quote it like this:

text ~ "\"disk space\""

If you only want to find the individual words rather than the exact string, remove the escaped quotes:

text ~ "disk space"

The first query returns about 4 results for me, while the second returns dozens, so the subtle difference can be very important.

More search tips here.

Just be aware that most of those tips don't seem to work via "quick search".  You have to use them inside quotes in a JQL search.  I tested things like:

text ~ "\"disk space\"~10"  (disk and space within ten words of each other)
text ~ "behavio*r"               (behavior/behaviour plus other possible but unlikely strings)

and so on.  You can also search specific text fields such as:

summary ~ "behavio*r"

Hopefully this info will make it easier for people to find existing issues in our Jira instance as well as others around the web.

Thursday, May 3, 2012

Sampling GitHub API v3 in Python

Eucalyptus is in the process of moving code to GitHub, and this week I finally decided to look at the available API tools for working with GitHub.  I wanted a tool written in python, since that would be the fastest for me to extend, and I found github2.  Unfortunately, that homepage had a prominent warning that the code only worked with GitHub's old APIs, which were being turned off this week.  So I decided to investigate what I could do from scratch in a small amount of code.  I had already started using restkit in jiranemo, so that seemed to be a reasonable starting point.  Here's what I came up with:


import getpass
import json
from restkit import Resource, BasicAuth, Connection
from socketpool import ConnectionPool

pool = ConnectionPool(factory=Connection)
serverurl = "https://api.github.com"

# Prompt for your username and password (or hard-code them here)
user = raw_input("GitHub username: ")
password = getpass.getpass("GitHub password: ")
auth = BasicAuth(user, password)

# Use your basic auth to request a token
# This is just an example from http://developer.github.com/v3/
authreqdata = { "scopes": [ "public_repo" ], "
                note": "admin script" }
resource = Resource('https://api.github.com/authorizations', 
                    pool=pool, filters=[auth])
response = resource.post(headers={ "Content-Type": "application/json" }, 
                         payload=json.dumps(authreqdata))
token = json.loads(response.body_string())['token']

"""
Once you have a token, you can pass that in the Authorization header
You can store this in a cache and throw away the user/password
This is just an example query.  See http://developer.github.com/v3/ 
for more about the url structure
"""
resource = Resource('https://api.github.com/user/repos', pool=pool)
headers = {'Content-Type' : 'application/json' }
headers['Authorization'] = 'token %s' % token
response = resource.get(headers = headers)
repos = json.loads(response.body_string())

There's not any magic in this code, but it took a couple of reads to wade past all of the OAuth talk in github's docs and realize that for a simple browserless tool, you can avoid using OAuth libraries altogether and still not have to store a hard-coded password.
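
Incidentally, if you just want to sanity-check the token flow without any Python, the same request works from curl (substitute your own username for the hypothetical "youruser"; curl will prompt for the password):

curl -u youruser -d '{"scopes":["public_repo"],"note":"admin script"}' https://api.github.com/authorizations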

Thursday, April 26, 2012

Greenhopper, Jira, and REST

One of the somewhat frustrating problems I'm dealing with in Greenhopper is that I want the ability to treat a linked issue like a subtask, but without all the restrictions of a subtask.  Subtasks have at least three limitations that get in my way:
  1. They must be in the same project as their parent
  2. They must have the same permissions (issue-level security) as their parent
  3. They must be of an issue type that is flagged as a "subtask" type, so for example, a "Feature" cannot be a subtask of a "Story" unless you create a separate "Feature (subtask)" issue type.
Issue #1 is probably the most frustrating, because product management and the exec team at a software company think in terms of high-level features or use cases for which the implementation will often cross project boundaries.  (An example at Eucalyptus is that a new use case may require changes to both Eucalyptus and Euca2ools.)

The Greenhopper UI operates mostly via a REST API, and so far this API is not well documented.  Last night I got around this lack of documentation by using mitmproxy to monitor calls while moving issues up and down the planning page in Greenhopper's Rapid Board.  Then I added a simple REST client class to jiranemo based on restkit.  I made two helper functions: one to get the REST representation of an issue, and another to change the rank of an issue in Greenhopper.  My script looks like this:

#!/usr/bin/python

import sys
import pyjira
from jiranemo import jiracfg

# Set the exception hook to enter a debugger on
# uncaught exceptions
from jiranemo.lib import util
sys.excepthook = util.genExcepthook(debug=True,
                                    debugCtrlC=True)

# Read ${HOME}/.jirarc, and set up clients and auth caches.
cfg = jiracfg.JiraConfiguration(readConfigFiles=True)
authorizer = pyjira.auth.CachingInteractiveAuthorizer(cfg.authCache)
ccAuthorizer = pyjira.auth.CookieCachingInteractiveAuthorizer(cfg.cookieCache)
client = pyjira.JiraClient(cfg.wsdl, 
                           (cfg.user, cfg.password), 
                           authorizer=authorizer, 
                           webAuthorizer=ccAuthorizer)

# Do a simple JQL query via the SOAP client, return 20 results
issues = client.client.getIssuesFromJqlSearch(
    '''project = "system testing 2" order by Rank DESC''', 20)
for x in issues:
    # Get the REST representation of each issue, because links
    # aren't shown in the SOAP representation
    rest_issue = client.restclient.get_issue(x.key)

    for link in rest_issue['fields']['issuelinks']:
        if link['type'].get('inward') == "is blocked by":
            # Rank the linked issue above this one in Greenhopper
            result = client.restclient.gh_rank(link['inwardIssue']['key'],
                                               before=rest_issue['key'])

The code could use some error checking, but this is a pretty simple starting point for doing something that Jira and Greenhopper can't do on their own.

Wednesday, April 25, 2012

Resurrecting Jiranemo

About six years ago, David Christian developed a JIRA CLI called jiranemo (his original blog post is, somewhat surprisingly, still around on the rPath website).  After he left rPath, I spent some time updating the code for Jira 4 and adding some minor features, but it's been mostly stagnant for about two years.  In the meantime, Jira 5 has been released, and the core dependency of jiranemo, SOAPpy, has been declared dead.

This month, Eucalyptus started on the migration path from using a combination of RT and Launchpad to using Jira.  I'm really excited about the change, and it gave me a chance to pick up the jiranemo code again.  I've now converted it from SOAPpy to suds, and on Monday I used it to import 2000 issues from RT into Jira (stay tuned for details on that becoming a publicly-accessible system).  I had database access to RT, but all of the interaction with Jira was done through the SOAP API.  (I realize they now also have a REST API, which looks awesome, but I already had the code for using SOAP.)

I should also note that before I took on this work, I looked at Matt Doar's python-based CLI, which worked well for single commands (and was a reference for some of my jiranemo updates), but it didn't have a library interface, and it seemed very inefficient to keep spawning new python processes for thousands of commands.  Jiranemo's separation of the command-line option handling and config file parsing from the client library and helper functions makes it fantastic for integrating into more complex python apps.

I expect that the next phase of development for jiranemo will be a gradual migration toward the REST APIs.  If this code is useful to you and you'd like to contribute to this effort, feel free to fork my bitbucket repo and send me pull requests.