Friday, April 19, 2013

Cloud-in-a-box-in-a-VM (in a nutshell)

For the past few months, one of my projects at Eucalyptus has been our CentOS 6-based "Silvereye" (a.k.a. FastStart) installer, which assembles multiple repositories, custom install classes, and a few helper scripts onto a DVD-sized ISO image.  One of the challenges has been that I'm a remote employee, and downloading an ISO file from our Jenkins server takes too long.  At the same time, loading the ISO into Cobbler doesn't really test the same code paths as booting a DVD.  So how can I test my cloud installer?  Nested virtualization, of course!

As of Fedora 18, I've found nested KVM to be fairly reliable.  The performance isn't earth-shattering, but it's usable.  There are several things about the setup that may not be obvious, though, so I'll walk through everything I did for my test machine.

The server I'm using has two Xeon CPUs, 8 GB of RAM, and 2 TB of disk.  I've allocated 50 GB of LVM space for the root filesystem, and when I create new VMs, I give each one a 100 GB logical volume as its disk (lvcreate -n vmName -L 100G vg01).

The server is on a 10.101.0.0/16 network, and I "own" 10.101.7.0/24 within that, as well as 192.168.56.0/23 (192.168.56.x and 192.168.57.x) for "private" addressing.  I've connected bridge device br0 to my primary ethernet device, em1:
  • /etc/sysconfig/network-scripts/ifcfg-br0
    DEVICE=br0
    ONBOOT=yes
    NM_CONTROLLED=no
    BOOTPROTO=dhcp  # IP is reserved: 10.101.1.25
    TYPE=Bridge
    DELAY=0
    PERSISTENT_DHCLIENT=yes
    
  • /etc/sysconfig/network-scripts/ifcfg-em1
    DEVICE=em1
    ONBOOT=yes
    NM_CONTROLLED=no
    TYPE=Ethernet
    BRIDGE=br0
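
After restarting the network, it's worth confirming that em1 really ended up attached to br0.  Here is a minimal sketch of such a check, assuming the kernel's usual sysfs layout, where each bridge port appears under /sys/class/net/BRIDGE/brif/.  The function name is my own invention, and the optional third argument exists only so the check can be pointed at a scratch directory.

```shell
# is_bridge_port BRIDGE PORT [SYSFS_NET_DIR]   (hypothetical helper)
# Succeeds if PORT is attached to BRIDGE. The kernel lists each bridge
# port as an entry under /sys/class/net/BRIDGE/brif/.
is_bridge_port() {
    bridge=$1
    port=$2
    net_dir=${3:-/sys/class/net}   # overridable so the check can be tested
    [ -e "$net_dir/$bridge/brif/$port" ]
}

# Example:
#   is_bridge_port br0 em1 && echo "em1 is attached to br0"
```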
    
I enabled nested virtualization on the system by placing "options kvm_intel nested=1" in /etc/modprobe.d/kvm.conf and reloading the kvm_intel module.
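
As a rough sketch, those two steps plus a sanity check could be wrapped up like this.  Both function names are hypothetical.  I'm assuming the module reports its setting in /sys/module/kvm_intel/parameters/nested as either "Y" or "1", depending on the kernel version, and that no VMs are running when the module is reloaded.

```shell
# nested_enabled [PARAM_FILE]   (hypothetical helper)
# Succeeds if the kvm_intel "nested" parameter reads Y or 1.
nested_enabled() {
    param_file=${1:-/sys/module/kvm_intel/parameters/nested}
    [ -r "$param_file" ] || return 1
    case "$(cat "$param_file")" in
        Y|1) return 0 ;;
        *)   return 1 ;;
    esac
}

# enable_nested   (hypothetical helper, must run as root)
# Adds the option line if it is missing, then reloads the module.
# Unloading kvm_intel fails while any VM is still using it.
enable_nested() {
    conf=/etc/modprobe.d/kvm.conf
    grep -qs 'nested=1' "$conf" || echo "options kvm_intel nested=1" >> "$conf"
    modprobe -r kvm_intel && modprobe kvm_intel
}

# Example (as root):
#   enable_nested && nested_enabled && echo "nested KVM is on"
```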

To allow the VMs to be connected to the bridge interface, I've added "allow br0" to /etc/qemu/bridge.conf.
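
If a VM later refuses to attach to the bridge, this file is the first thing I'd look at.  The sketch below only looks for a literal "allow BRIDGE" line; it makes no attempt to interpret any other directives the ACL file may contain, and the function name is hypothetical.

```shell
# bridge_allowed BRIDGE [ACL_FILE]   (hypothetical helper)
# Succeeds if ACL_FILE contains a line of the form "allow BRIDGE".
# Other ACL directives are deliberately ignored in this sketch.
bridge_allowed() {
    bridge=$1
    acl=${2:-/etc/qemu/bridge.conf}
    grep -Eqs "^[[:space:]]*allow[[:space:]]+$bridge[[:space:]]*\$" "$acl"
}

# Example:
#   bridge_allowed br0 || echo "add 'allow br0' to /etc/qemu/bridge.conf"
```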

I also set /proc/sys/net/ipv4/ip_forward to 1, so that packet forwarding works.
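
Writing to /proc only lasts until the next reboot.  A small sketch of the check and the equivalent sysctl call follows; the function names are hypothetical.  To make the setting persistent, the usual approach is a "net.ipv4.ip_forward = 1" line in /etc/sysctl.conf.

```shell
# forwarding_enabled [FLAG_FILE]   (hypothetical helper)
# Succeeds if the IPv4 forwarding flag currently reads 1.
forwarding_enabled() {
    flag=${1:-/proc/sys/net/ipv4/ip_forward}
    [ "$(cat "$flag" 2>/dev/null)" = "1" ]
}

# enable_forwarding   (hypothetical helper, must run as root)
# Same effect as writing 1 to /proc/sys/net/ipv4/ip_forward.
enable_forwarding() {
    sysctl -w net.ipv4.ip_forward=1
}

# Example (as root):
#   forwarding_enabled || enable_forwarding
```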

For now, I've completely turned off iptables.  This is not ideal, but I'll leave iptables rule building for another post.

Booting an image into the FastStart installer looks like this:

qemu-kvm -m 2048 -cpu qemu64,+vmx -drive file=/dev/vg01/ciab,if=virtio \
         -net nic,model=virtio,macaddr=52:54:00:12:34:60 -net bridge,br=br0 \
         -cdrom silvereye-nightly-3.3-m5.iso -boot d -vnc :1

Let's look at the individual pieces of that.

1) "-m 2048" is 2GB of RAM
2) "-cpu qemu64,+vmx" specifies that we are emulating a CPU capable of virtualization
3) "-drive file=/dev/vg01/ciab,if=virtio" is our LVM-backed "disk"
4) "-net nic,model=virtio,macaddr=52:54:00:12:34:60" is our virtual network interface. Specifying a unique MAC address is very important!  If you don't do this, every VM will get the same MAC, and they won't be able to communicate with each other.  They will all be able to send and receive other traffic, though, which can be maddening if you don't realize what's going on.
5) "-net bridge,br=br0" says to connect the host TAP device to bridge br0.  This gets you non-NAT access to the physical network, since the bridge is also connected to em1.
6) "-cdrom silvereye-nightly-3.3-m5.iso -boot d" connects our ISO image and boots from it.
7) "-vnc :1" starts a VNC server on the host (on port 5901) for this VM's console.
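
Since only a few of those pieces change from one VM to the next, the command can be generated rather than retyped.  This is just a sketch: the function names and argument order are my own, the volume group vg01 and bridge br0 are hardcoded to match my setup, and it prints the command instead of running it so it can be inspected first.

```shell
# build_qemu_cmd LV_NAME MAC VNC_DISPLAY ISO   (hypothetical helper)
# Prints the qemu-kvm command line for one installer VM.
build_qemu_cmd() {
    lv=$1; mac=$2; display=$3; iso=$4
    printf 'qemu-kvm -m 2048 -cpu qemu64,+vmx'
    printf ' -drive file=/dev/vg01/%s,if=virtio' "$lv"
    printf ' -net nic,model=virtio,macaddr=%s -net bridge,br=br0' "$mac"
    printf ' -cdrom %s -boot d -vnc :%s\n' "$iso" "$display"
}

# vnc_port DISPLAY   (hypothetical helper)
# VNC display N listens on TCP port 5900 + N.
vnc_port() {
    echo $((5900 + $1))
}

# Example:
#   build_qemu_cmd ciab 52:54:00:12:34:60 1 silvereye-nightly-3.3-m5.iso
#   vnc_port 1
```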

I connect to 10.101.1.25:5901 with a VNC client, and proceed with the install, selecting "Cloud in a Box" at the boot menu.  Since I own 10.101.7.x, I give this VM the IP 10.101.7.1, and configure its "public IP range" to be 10.101.7.10-10.101.7.20.  The subset of IPs here is arbitrary; I wanted to leave some for my other test clouds.  For private IPs, I'll use 192.168.57.0/24, half of my allotted range.  The default gateway and nameserver are exactly the same as for the host system.  The host bridge is just a pass-through, not an extra routing "hop".
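
Because the public range has to stay clear of whatever the other test clouds use, a quick overlap check can save some head-scratching.  Here is a sketch in plain shell arithmetic; both function names are hypothetical, and it assumes well-formed dotted-quad IPv4 addresses with each range given as start and end.

```shell
# ip_to_int A.B.C.D   (hypothetical helper)
# Converts a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ranges_overlap START1 END1 START2 END2   (hypothetical helper)
# Succeeds if the two inclusive address ranges share any address.
ranges_overlap() {
    a1=$(ip_to_int "$1"); b1=$(ip_to_int "$2")
    a2=$(ip_to_int "$3"); b2=$(ip_to_int "$4")
    [ "$a1" -le "$b2" ] && [ "$a2" -le "$b1" ]
}

# Example:
#   ranges_overlap 10.101.7.10 10.101.7.20 10.101.7.21 10.101.7.30 \
#       || echo "ranges are disjoint"
```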




Once installed, I can access the cloud through the normal channels -- SSH, the admin UI on port 8443, and the user console on port 8888.  I test by logging into the user console, generating and downloading a new key, launching an instance, and SSHing into the instance's public IP.

The chain of devices through which data flows may not be entirely clear.  When you launch an instance inside the virtual cloud on 10.101.7.1 -- and let's say for the example that it gets an IP of 10.101.7.10 -- here's the list of "devices" through which packets flow when you connect to it from a different physical system:
  1. the physical interface of the host, em1
  2. the host bridge, br0
  3. the host tap interface
  4. the VM's "eth0" interface
  5. the VM's "br0" bridge (which is not enslaving eth0 in this case)
  6. the VM's tap interface (vnet0)
  7. the nested VM's eth0
The only place that NAT happens here is between devices 4 and 5, and that's only necessary because I chose to use Eucalyptus's "MANAGED-NOVLAN" mode.

So that's a cloud-in-a-box-in-a-VM (in a nutshell).

For a more traditional deployment, simply boot more VMs into the installer using the same qemu-kvm command format mentioned above, choosing "Frontend" for one, and "Node Controller" for the other(s).  For each one you boot, you need to create a new LVM logical volume and change the MAC address and the VNC display number to avoid conflicts.  When running multiple clouds, you should also make sure that your IP ranges never overlap (i.e., don't let two clouds use the same public IPs or private subnet range).
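
One way to keep those per-VM values from colliding is to derive all of them from a single VM index.  The numbering scheme below is purely an assumption of mine (MACs counting up from 52:54:00:12:34:60, VNC displays counting up from :1), and the function names are hypothetical; the lvcreate arguments mirror the ones I use on this host.

```shell
# vm_mac INDEX   (hypothetical helper)
# MAC for VM number INDEX, counting up from 52:54:00:12:34:60.
# Only indexes 0-159 fit in the last octet (0x60 + 159 = 0xff).
vm_mac() {
    idx=$1
    [ "$idx" -ge 0 ] && [ "$idx" -le 159 ] || return 1
    printf '52:54:00:12:34:%02x\n' $((0x60 + $idx))
}

# vm_display INDEX   (hypothetical helper)
# VNC display for VM number INDEX; VM 0 gets display :1 (port 5901).
vm_display() {
    echo $(( $1 + 1 ))
}

# vm_lvcreate_cmd NAME   (hypothetical helper)
# Prints the command that creates the VM's 100 GB disk in vg01.
vm_lvcreate_cmd() {
    echo "lvcreate -n $1 -L 100G vg01"
}

# Example: plan a frontend (index 1) and one node controller (index 2),
# leaving index 0 for the cloud-in-a-box VM.
#   vm_lvcreate_cmd frontend; vm_mac 1; vm_display 1
#   vm_lvcreate_cmd nc1;      vm_mac 2; vm_display 2
```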