Availability Zones and Host Aggregates in OpenStack Compute (Nova)

May 21, 2013

Confusion around Host Aggregates and Availability Zones in Nova seems to be very common. In this post I’ll attempt to show how each is used. All information in this post is based on the way things work in the Grizzly version of Nova.

First, go ahead and forget everything you know about things called Availability Zones in other systems.  They are not the same thing and trying to map Nova’s concept of Availability Zones to what something else calls Availability Zones will only cause confusion.

The high level view is this: A host aggregate is a grouping of hosts with associated metadata.  A host can be in more than one host aggregate.  The concept of host aggregates is only exposed to cloud administrators.

A host aggregate may be exposed to users in the form of an availability zone. When you create a host aggregate, you have the option of providing an availability zone name. If specified, the host aggregate you have created is now available as an availability zone that can be requested.

Here is a tour of some commands.

Create a host aggregate:

$ nova aggregate-create test-aggregate1
+----+-----------------+-------------------+-------+----------+
| Id | Name            | Availability Zone | Hosts | Metadata |
+----+-----------------+-------------------+-------+----------+
| 1  | test-aggregate1 | None              |       |          |
+----+-----------------+-------------------+-------+----------+

Create a host aggregate that is exposed to users as an availability zone. (This is not creating a host aggregate within an availability zone! It is creating a host aggregate that is the availability zone!)

$ nova aggregate-create test-aggregate2 test-az
+----+-----------------+-------------------+-------+----------+
| Id | Name            | Availability Zone | Hosts | Metadata |
+----+-----------------+-------------------+-------+----------+
| 2  | test-aggregate2 | test-az           |       |          |
+----+-----------------+-------------------+-------+----------+

Add a host to a host aggregate, test-aggregate2. Since this host aggregate defines the availability zone test-az, adding a host to this aggregate makes it a part of the test-az availability zone.

$ nova aggregate-add-host 2 devstack
Aggregate 2 has been successfully updated.
+----+-----------------+-------------------+---------------+------------------------------------+
| Id | Name            | Availability Zone | Hosts         | Metadata                           |
+----+-----------------+-------------------+---------------+------------------------------------+
| 2  | test-aggregate2 | test-az           | [u'devstack'] | {u'availability_zone': u'test-az'} |
+----+-----------------+-------------------+---------------+------------------------------------+

Note that the novaclient output shows the availability zone twice. On the backend, the data model stores the availability zone only in the aggregate metadata; there is no separate column for it. The API does return the availability zone separately from the general list of metadata, though, since it’s a special piece of metadata.
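
From the user’s point of view, the new zone should then show up alongside the default zone in the availability zone listing. Depending on your novaclient version, a command along these lines will show it (I’m quoting the command name from memory, so treat it as an assumption):

$ nova availability-zone-list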

Now that the test-az availability zone has been defined and contains one host, a user can boot an instance and request this availability zone.

$ nova boot --flavor 84 --image 64d985ba-2cfa-434d-b789-06eac141c260 \
> --availability-zone test-az testinstance
$ nova show testinstance
+-------------------------------------+----------------------------------------------------------------+
| Property                            | Value                                                          |
+-------------------------------------+----------------------------------------------------------------+
| status                              | BUILD                                                          |
| updated                             | 2013-05-21T19:46:06Z                                           |
| OS-EXT-STS:task_state               | spawning                                                       |
| OS-EXT-SRV-ATTR:host                | devstack                                                       |
| key_name                            | None                                                           |
| image                               | cirros-0.3.1-x86_64-uec (64d985ba-2cfa-434d-b789-06eac141c260) |
| private network                     | 10.0.0.2                                                       |
| hostId                              | f038bdf5ff35e90f0a47e08954938b16f731261da344e87ca7172d3b       |
| OS-EXT-STS:vm_state                 | building                                                       |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000002                                              |
| OS-EXT-SRV-ATTR:hypervisor_hostname | devstack                                                       |
| flavor                              | m1.micro (84)                                                  |
| id                                  | 107d332a-a351-451e-9cd8-aa251ce56006                           |
| security_groups                     | [{u'name': u'default'}]                                        |
| user_id                             | d0089a5a8f5440b587606bc9c5b2448d                               |
| name                                | testinstance                                                   |
| created                             | 2013-05-21T19:45:48Z                                           |
| tenant_id                           | 6c9cfd6c838d4c29b58049625efad798                               |
| OS-DCF:diskConfig                   | MANUAL                                                         |
| metadata                            | {}                                                             |
| accessIPv4                          |                                                                |
| accessIPv6                          |                                                                |
| progress                            | 0                                                              |
| OS-EXT-STS:power_state              | 0                                                              |
| OS-EXT-AZ:availability_zone         | test-az                                                        |
| config_drive                        |                                                                |
+-------------------------------------+----------------------------------------------------------------+

All of the examples so far show how host aggregates provide an API-driven mechanism for cloud administrators to define availability zones. The other use case host aggregates serve is tagging a group of hosts with a type of capability. When creating custom flavors, you can set a requirement for a capability. When a request is made to boot an instance of that flavor, the scheduler will only consider hosts in host aggregates tagged with this capability in their metadata.

We can add some metadata to the original host aggregate we created that was *not* also an availability zone, test-aggregate1.

$ nova aggregate-set-metadata 1 coolhardware=true
Aggregate 1 has been successfully updated.
+----+-----------------+-------------------+-------+----------------------------+
| Id | Name            | Availability Zone | Hosts | Metadata                   |
+----+-----------------+-------------------+-------+----------------------------+
| 1  | test-aggregate1 | None              | []    | {u'coolhardware': u'true'} |
+----+-----------------+-------------------+-------+----------------------------+

A flavor can include a set of key/value pairs called extra_specs. Here’s an example of creating a flavor that will only run on hosts in an aggregate with the coolhardware=true metadata.

$ nova flavor-create --is-public true m1.coolhardware 100 2048 20 2
+-----+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
| ID  | Name            | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+-----+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
| 100 | m1.coolhardware | 2048      | 20   | 0         |      | 2     | 1.0         | True      |
+-----+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
$ nova flavor-key 100 set coolhardware=true
$ nova flavor-show 100
+----------------------------+----------------------------+
| Property                   | Value                      |
+----------------------------+----------------------------+
| name                       | m1.coolhardware            |
| ram                        | 2048                       |
| OS-FLV-DISABLED:disabled   | False                      |
| vcpus                      | 2                          |
| extra_specs                | {u'coolhardware': u'true'} |
| swap                       |                            |
| os-flavor-access:is_public | True                       |
| rxtx_factor                | 1.0                        |
| OS-FLV-EXT-DATA:ephemeral  | 0                          |
| disk                       | 20                         |
| id                         | 100                        |
+----------------------------+----------------------------+
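
One deployment note: for the extra_specs matching described above to actually constrain scheduling, the scheduler needs a filter that understands aggregate metadata, such as AggregateInstanceExtraSpecsFilter. As far as I know it is not in the default filter list, so it has to be added to scheduler_default_filters in nova.conf on the scheduler node. A minimal sketch (the rest of the filter list here is illustrative and may differ in your deployment):

# /etc/nova/nova.conf on the node(s) running nova-scheduler
scheduler_default_filters=AggregateInstanceExtraSpecsFilter,RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter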

Hopefully this provides some useful information on what host aggregates and availability zones are, and how they are used.

Categories: OpenStack

OpenStack Compute (Nova) Roadmap for Havana

May 13, 2013

The Havana design summit was held mid-April.  Since then we have been documenting the Havana roadmap and going full speed ahead on development of these features.  The list of features that developers have committed to completing for the Havana release is tracked using blueprints on Launchpad. At the time of writing, we have 74 blueprints listed that cover a wide range of development efforts.  Here are some highlights in no particular order:

Database Handling

Vish Ishaya made a change at the very beginning of the development cycle that will allow us to backport database migrations to the Grizzly release, in case a backported bug fix requires a migration.

Dan Smith and Chris Behrens are working on a unified object model. One of the things that has been in the way of rolling upgrades of a Nova deployment is that the code and the database schema are very tightly coupled. The primary goal of this effort is to decouple them. It also brings in other improvements, including better object serialization handling for rpc and object versioning.

Boris Pavlovic continues to do a lot of cleanup of database support in Nova.  He’s adding tests (and more tests), adding unique constraints, improving session handling, and improving archiving.

Chris Behrens has been working on a native MySQL database driver that performs much better than the SQLAlchemy driver for use in large-scale deployments.

Mike Wilson is working on supporting read-only database slaves. This will allow distributing some queries to other database servers, which helps with scaling in large deployments.

Bare Metal

The Grizzly release of Nova included the bare metal provisioning driver. Interest in this functionality has been rapidly increasing. Devananda van der Veen proposed that the bare metal provisioning code be split out into a new project called Ironic. The new project was approved for incubation by the OpenStack Technical Committee last week. Once the split has been completed, there will be a driver in Nova that talks to the Ironic API. The Ironic API will expose some additional functionality that doesn’t make sense to present in the Compute API in Nova.

Prior to the focus shift to Ironic, some new features were added to the bare metal driver. USC-ISI added support for Tilera and Devananda added a feature that allows you to request a specific bare metal node when provisioning a server.

Version 3 (v3) of the Compute API

The Havana release will include a new revision of the compute REST API in Nova. This effort is being led by Christopher Yeoh, with help from others. The v3 API will include a new framework for implementing extensions, extension versioning, and a whole bunch of cleanup: (1) (2) (3) (4).

Networking

The OpenStack community has been maintaining two network stacks for some time. Nova includes the nova-network service. Meanwhile, the OpenStack Networking project has been developed from scratch to support much more than nova-network does. Nova currently supports both. OpenStack Networking is expected to reach and surpass feature parity with nova-network in the Havana cycle. As a result, it’s time to deprecate nova-network. Vish Ishaya (from the Nova side) and Gary Kotton (from the OpenStack Networking side) have agreed to take on the challenging task of figuring out how to migrate existing deployments using nova-network to an updated environment that includes OpenStack Networking.

Scheduling

The Havana roadmap includes a mixed bag of scheduler features.

Andrew Laski is going to make the changes required so that the scheduler becomes exclusively a resource that gets queried. Currently, when starting an instance, the request is handed off to the scheduler, which then hands it off to the compute node that is selected. This change will make it so proxying through nova-scheduler is no longer done. This will mean that every operation that uses the scheduler will interact with it the same way, as opposed to some operations querying and others proxying.

Phil Day will be adding an API extension that allows you to discover which scheduler hints are supported.  Phil is also looking at adding a way to allocate an entire host to a single tenant.

Inbar Shapira is looking at allowing multiple scheduling policies to be in effect at the same time.  This will allow you to have different sets of scheduler filters activated depending on some type of criteria (perhaps the requested availability zone).

Rerngvit Yanggratoke is implementing support for weighting scheduling decisions based on the CPU utilization of existing instances on a host.

Migrations

Nova includes support for different types of migrations. We have cold migrations (migrate) and live migrations (live-migrate). We also have resize and evacuate, which are closely related functions. The code paths for all of these features have evolved separately. It turns out that we can rework all of these things to share a lot of code. While we’re at it, we are restructuring the way these operations work to be primarily driven by the nova-conductor service.  This will allow the tasks to be tracked in a single place, as opposed to the flow of control being passed around between compute nodes. Having compute nodes tell each other what to do is also a very bad thing from a security perspective. These efforts are well underway. Tiago Rodrigues de Mello is working on moving cold migrations to nova-conductor and John Garbutt is working on moving live migrations. All of this is tracked under the parent blueprint for unified migrations.

And More!

This post doesn’t include every feature on the roadmap. You can find that here. I fully expect that more will be added to this list as Havana progresses. We don’t always know what features are being worked on in advance. If you have another feature you would like to propose, let’s talk about it on the openstack-dev list!

Categories: OpenStack

Deployment Considerations for nova-conductor Service in OpenStack Grizzly

February 19, 2013

The Grizzly release of OpenStack Nova includes a new service, nova-conductor. Some previous posts about this service can be found here and here. This post is intended to provide some additional insight into how this service should be deployed and how the service should be scaled as load increases.

Smaller OpenStack Compute deployments typically consist of a single controller node and one or more compute (hypervisor) nodes. The nova-conductor service fits in the category of controller services. In this style of deployment you would run nova-conductor on the controller node and not on the compute nodes. Note that most of what nova-conductor does in the Grizzly release is performing database operations on behalf of compute nodes. This means that the controller node will have more work to do than in previous releases. Load should be monitored and the controller services should be scaled out if necessary.

Here is a model of what this size of deployment might look like:

[Diagram: nova-simple deployment model]

As Compute deployments get larger, the controller services are scaled horizontally. For example, there would be multiple instances of the nova-api service on multiple nodes sitting behind a load balancer. You may also be running multiple instances of the nova-scheduler service across multiple nodes. Load balancing is done automatically for the scheduler by the AMQP message broker, RabbitMQ or Qpid. The nova-conductor service should be scaled out in this same way. It can be run multiple times across multiple nodes and the load will be balanced automatically by the message broker.

Here is a second deployment model. This gives an idea of how a deployment grows beyond a single controller node.

[Diagram: nova-complex deployment model]

There are a couple of ways to monitor the performance of nova-conductor to see if it needs to be scaled out. The first is by monitoring CPU load. The second is by monitoring message queue size. If the queues are getting backed up, it is likely time to scale out services.  Both messaging systems provide at least one way to look at the state of message queues. For Qpid, try the qpid-stat command. For RabbitMQ, see the rabbitmqctl list_queues command.
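
For example, a rough way to eyeball queue depths from the command line (the grep pattern below is an assumption based on the default conductor topic name, so adjust it for your deployment):

# RabbitMQ: list queue names and message counts
$ sudo rabbitmqctl list_queues name messages | grep conductor

# Qpid: show per-queue statistics
$ qpid-stat -q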

Categories: OpenStack

Installing Steam for Linux Beta on Fedora 17

December 7, 2012

UPDATE: Since the original post, the download of Team Fortress 2 completed and I hit a problem. The post has been amended with the solution.

It sounds like a lot more people got access to the Steam for Linux beta yesterday, including me. An announcement on steamcommunity.com says:

We’ve just expanded the limited public beta by a large amount – which means another round of email notifications – so check your inbox!

The official download is a deb package for Ubuntu. My laptop runs Fedora 17. I was pleasantly surprised to see that there was an unofficial Fedora repository already ready to use. Here is how I installed the beta on my laptop running Fedora 17:

$ wget http://spot.fedorapeople.org/steam/steam.repo
...
$ sudo mv steam.repo /etc/yum.repos.d/
$ sudo yum install steam
...
$ rpm -q steam
steam-1.0.0.14-3.fc17.i686

Once installation was completed, I ran the steam client from the same terminal:

$ steam

The first time I ran the steam client it automatically created /home/rbryant/Steam and downloaded about 100 kB of updates. Once the updates completed, the login screen came up. I closed the steam client and ran it again. I got a warning dialog that said:

Unable to copy /home/rbryant/Steam/bin_steam.sh to /usr/bin/steam, please contact your system administrator.

This was a bit odd since the app that I had been running was already /usr/bin/steam. I suspect this is just automatically installing a new version based on what was downloaded with the updates. Based on the output in my terminal, I can see that before this warning came up, steam tried to find gksudo, kdesudo, or xterm and then gave up. I went ahead and installed xterm.

$ sudo yum install xterm

When running steam yet again, it popped up an xterm window to ask me to type in my password. This only happened once. Subsequent runs of the steam client in my terminal went straight to the login window.

From there I finally decided to log in using my existing steam account. I confirmed access to my account on a new computer and was in. I kicked off a download of Team Fortress 2 Beta for Linux.

Once the game download was complete, I clicked Play. The first time I tried, it failed with the following error:

Required OpenGL extension “GL_EXT_texture_compression_s3tc” is not supported. Please install S3TC texture support.

To fix this, I had to add the rpmfusion repositories to my machine and install the libtxc_dxtn package.


$ sudo yum localinstall --nogpgcheck http://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-stable.noarch.rpm http://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-stable.noarch.rpm

$ sudo yum install libtxc_dxtn

If you’re running 64-bit Linux, you will actually need the 32-bit version of this library to fix the game. Install it by running:

$ sudo yum install libtxc_dxtn.i686

Once all of that was done, the game launched successfully and I was able to start a training session.

Enjoy!

Categories: Fedora

A new Nova service: nova-conductor

November 19, 2012

The Grizzly release of OpenStack Nova will have a new service, nova-conductor. The service was discussed on the openstack-dev list and it was merged today. There is currently a configuration option that can be turned on to make it optional, but it is possible that by the time Grizzly is released, this service will be required.
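
For reference, the option in question lives in a conductor section of nova.conf. Based on my reading of the code as it was merged, it looks roughly like the following; treat the option name and its default as assumptions, since this could still change before the release:

# nova.conf
[conductor]
# When true, nova-compute keeps talking to the database directly and the
# separate nova-conductor service is not required. When false, those database
# operations go through nova-conductor over rpc.
use_local = true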

One of the efforts that started during Folsom development and is scheduled to be completed in Grizzly is no-db-compute. In short, this effort is to remove direct database access from the nova-compute service. There are two main reasons we are doing this. Compute nodes are the least trusted part of a nova deployment, so removing direct database access is a step toward reducing the potential impact of a compromised compute node. The other benefit of no-db-compute is for upgrades. Direct database access complicates the ability to do live rolling upgrades. We’re working toward eventually making that possible, and this is a part of that.

All of the nova services use a messaging system (usually AMQP based) to communicate with each other. Many of the database accesses in nova-compute can be (and have been) removed by just sending more data in the initial message sent to nova-compute. However, that doesn’t apply to everything. That’s where the new service, nova-conductor, comes in.

The nova-conductor service is key to completing no-db-compute. Conceptually, it implements a new layer on top of nova-compute. It should *not* be deployed on compute nodes, or else the security benefits of removing database access from nova-compute will be negated. Just like other nova services such as nova-api or nova-scheduler, it can be scaled horizontally. You can run multiple instances of nova-conductor on different machines as needed for scaling purposes.

The methods exposed by nova-conductor will initially be relatively simple methods used by nova-compute to offload its database operations. Places where nova-compute previously did database access will now be talking to nova-conductor. However, we have plans in the medium to long term to move more and more of what is currently in nova-compute up to the nova-conductor layer. The compute service will start to look like a less intelligent slave service to nova-conductor. The conductor service will implement long running complex operations, ensuring forward progress and graceful error handling. This will be especially beneficial for operations that cross multiple compute nodes, such as migrations or resizes.

If you have any comments, questions, or suggestions for nova-conductor or the no-db-compute effort in general, please feel free to bring it up on the openstack-dev list.

Categories: OpenStack

OpenStack Design Summit and an Eye on Folsom

I just spent a week in San Francisco at the OpenStack design summit and conference. It was quite an amazing week and I’m really looking forward to the Folsom development cycle.  You can find notes from various sessions held at the design summit on the OpenStack wiki.

Essex was the first release that I contributed to.  One thing I did was add Qpid support to both Nova and Glance as an alternative to using RabbitMQ.  Beyond that, I primarily worked on vulnerability management and other bug fixing.  For Folsom, I’m planning on working on some more improvements involving the inter-service messaging layer, also referred to as the rpc API, in Nova.

1) Moving the rpc API to openstack-common

The rpc API in Nova is used for private communication between nova services. As an example, when a tenant requests that a new virtual machine instance be created (either via the EC2 API or the OpenStack compute REST API), the nova-api service sends a message to the nova-scheduler service via the rpc API.  The nova-scheduler service decides where the instance is going to live and then sends a message to that compute node’s nova-compute service via the rpc API.

The other usage of the rpc API in Nova has been for notifications.  Notifications are asynchronous messages about events that happen within the system.  They can be used for monitoring and billing, among other things.  Strictly speaking, notifications aren’t directly tied to rpc.  A Notifier is an abstraction, of which using rpc is one of the implementations.  Glance also has notifications, including a set of Notifier implementations.  The code was the same at one point but has diverged quite a bit since.

We would like to move the notifiers into openstack-common.  Moving rpc into openstack-common is a prerequisite for that, so I’m going to knock that part out.  I’ve already written a few patches in that direction.  Once the rpc API is in openstack-common, other projects will be able to make use of it.  There was discussion of Quantum using rpc at the design summit, so this will be needed for that, too.  Another benefit is that the Heat project is using a copy of Nova’s rpc API right now, but will be able to migrate over to using the version from openstack-common.

2) Versioning the rpc API interfaces

The existing rpc API is pretty lightweight and seems to work quite well.  One limitation is that there is nothing provided to help with different versions of services talking to each other.  It may work … or it may not.  If it doesn’t, the failure you get could be something obvious, or it could be something really bad and bizarre where an operation fails half-way through, leaving things in a bad state.  I’d like to clean this up.

The end goal with this effort will be to make sure that as you upgrade from Essex to Folsom, any messages originating from an Essex service can and will be correctly processed by a Folsom service.  If that fails, then the failure should be immediate and obvious that a message was rejected due to a version issue.

3) Removing database access from nova-compute

This is by far the biggest effort of the 3 described here, and I won’t be tackling this one alone.  I want to help drive it, though.  This discussion came up in the design summit session about enhancements to Nova security.  By removing direct database access from the nova-compute service, we can help reduce the potential impact if a compute node were to be compromised.  There are two main parts to this effort.

The first part is to make more efficient use of messaging by sending full objects through the rpc API instead of just IDs. For example, there are many cases where the nova-api service gets an instance object from the database, does its thing, and then just sends the instance ID in the rpc message.  On the other side, it has to go pull that same object out of the database again.  We have to go through and change all cases like this to include the full object in the message.  In addition to the security benefit, it should be more efficient as well. This doesn’t sound too complicated, and it isn’t really, but it’s quite a bit of work as there is a lot of code that needs to be changed. There will be some snags to deal with along the way, such as making sure all of the objects can be serialized properly.

Including full objects in messages is only part of the battle here.  The nova-compute service also does database updates.  We will have to come up with a new approach for this.  It will most likely end up being a new service of some sort that handles state changes coming from compute nodes and makes the necessary database updates according to those state changes. I haven’t fully thought through the solution to this part yet. For example, ensuring that this service maintains proper ordering of messages without turning into a bottleneck in the overall system will be important.

Onward!

I’m sure I’ll work on other things in Folsom, as well, but those are three that I have on my mind right now.  OpenStack is a great project and I’m excited to be a part of it!

Categories: OpenStack

Automated Testing of Matahari in a chroot Using Mock

October 6, 2011

While I was at Digium, I helped build out some automated testing for Asterisk (posts on that here and here). We had a continuous integration server that did builds, ran C API unit tests, and ran some functional tests written in Python.

One of the things that we wanted to do with the Asterisk tests is to sandbox each instance of Asterisk. All of this is handled by an Asterisk Python class which creates a new set of directories for each Asterisk instance to store its files. This seems to have worked pretty well. One downside is that there is still the potential for Asterisk instances to step on each other’s files. All instances of Asterisk for all runs of the tests run within the same OS installation. One way to improve on that, which is a bit less heavy-handed than starting up a bunch of new VMs all the time, is to use a chroot.

Over the last week or so, I have been working on a similar setup for Matahari and wanted to share some information on how it works and in particular, some aspects that are different from what I’ve done before, including running all of the tests in a chroot.

What was already in place

Matahari uses Test-AutoBuild as its continuous integration server. The results of the latest build for Fedora can be found here.

When autobuild runs each hour, it clones the Matahari git repo and runs the autobuild.sh script in the top level directory. This script uses mock to build both matahari and mingw32-matahari packages. Mock handles setting up a chroot for the distribution and version you want to build a package for so you can easily do a lot of different builds on one machine. It also does a lot of caching to make this process much faster if you run it multiple times.

To install mock on Fedora, install the mock package. You will also need to add the user that will be running mock to the mock group.
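
Something along these lines should do it (the group change takes effect the next time you log in):

$ sudo yum install mock
$ sudo usermod -a -G mock $USER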

To do a build in mock, you first need an srpm. The autobuild.sh script in Matahari has a make_srpm function that does this. Once you have an srpm, you can do a build with a single call to mock. The --root option specifies the type of chroot you want mock to use.

$ mock --root=fedora-16-x86_64 --rebuild *.src.rpm

The root, fedora-16-x86_64, is defined by a mock profile. Installing mock also installs a set of profiles, which can be found in /etc/mock/.

What’s new

While doing continuous builds is useful (for example, breaking the Windows build is not terribly uncommon), doing tests against these builds adds an enormous amount of additional value to the setup. Similar to what we have for Asterisk, we have two sets of tests for Matahari. We have a suite of C API unit tests, and we have a suite of functional tests written in Python.

The first thing I did was update the setup to run the unit tests. I decided to modify the RPM spec file to optionally run the unit tests as a part of the build process. That patch is here. If the run_unit_tests variable is set, the spec file runs ctest after compilation is complete. The other aspect of this is getting this variable defined, which is pretty easy to do with mock.

$ mock --root=fedora-16-x86_64 --define "run_unit_tests 1" --rebuild *.src.rpm
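
For reference, the spec file side of this takes roughly the following shape. This is a sketch rather than the exact Matahari patch:

# in matahari.spec, after compilation is complete
%if 0%{?run_unit_tests}
ctest
%endif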

Getting the Python-based functional tests running within mock is a bit more tricky, but not too bad. With the mock commands presented so far, a number of things are happening automatically, including setting up and cleaning up the chroot. To get these other tests running, we have to break the process up into smaller steps. The first step is to initialize a chroot. We will also be using another option for all of the mock commands, --resultdir, which lets you specify where logs and the resulting RPMs from --rebuild are stored.

$ mock --root=fedora-16-x86_64 --resultdir=mockresults --init

The next step is to build the RPMs like before. In this case, we already have an initialized chroot, so we need to tell mock not to create a new one. We also need to tell mock not to clean up the chroot after the RPM builds are complete, because we want to perform more operations in there.

$ mock --root=fedora-16-x86_64 --resultdir=mockresults \
    --define="run_unit_tests 1" --no-clean --no-cleanup-after \
    --rebuild *.src.rpm

At this point, we have compiled the source, run the unit tests, and built RPMs. The chroot used to do all of this is still there. We can take the RPMs we just built and install them into the chroot.

$ mock --root=fedora-16-x86_64 --resultdir=mockresults --install mockresults/*.rpm

Now it’s time to set up the functional tests. We need to install some dependencies for the tests and then copy the tests themselves into the chroot.

$ . src/tests/deps.sh
$ mock --root=fedora-16-x86_64 --resultdir=mockresults \
    --install ${MH_TESTS_DEPS}

$ mock --root=fedora-16-x86_64 --resultdir=mockresults \
    --copyin src/tests /matahari-tests

The dependencies of the tests are installed in the chroot and the tests themselves have been copied in. Now mock can be used to execute each set of tests.

$ mock --root=fedora-16-x86_64 --resultdir=mockresults \
    --shell "nosetests -v /matahari-tests/test_host_api.py"

$ mock --root=fedora-16-x86_64 --resultdir=mockresults \
    --shell "nosetests -v /matahari-tests/test_sysconfig_api.py"

$ mock --root=fedora-16-x86_64 --resultdir=mockresults \
    --shell "nosetests -v /matahari-tests/test_resource_api.py"

$ mock --root=fedora-16-x86_64 --resultdir=mockresults \
    --shell "nosetests -v /matahari-tests/test_service_api_minimal.py"

$ mock --root=fedora-16-x86_64 --resultdir=mockresults \
    --shell "nosetests -v /matahari-tests/test_network_api_minimal.py"

That’s it! The autobuild setup now tests compilation, unit tests, building RPMs, installing RPMs, and running and exercising all of the installed applications. It’s fast, too.

Final Thoughts

As with most things, there are some areas for improvement. For example, one glaring issue is that the entire setup is Fedora-specific, or at least specific to a distribution that can use mock. However, I have at least heard of a similar tool to mock called pbuilder for dpkg-based distributions, which could potentially be used in a similar way. I’m not sure.

There are also some issues with this approach specific to the Matahari project. Matahari includes a set of agents that provide systems management APIs. Testing of some of these APIs isn’t necessarily something you want to do on the same machine running the actual tests. To expand to much more complete coverage of these APIs, we’re going to have to break down and adopt an approach of spinning up VMs to run the Matahari agents. At that point, we may not run any of the functional tests from within mock anymore.

Mock is a very handy tool and it helped me to expand the automated build and test setup to get a lot more coverage in a shorter amount of time in a way that I was happy with. I hope this writeup helps you think about some other things that you could do with it.

Categories: Matahari