PTLs and Project Success in OpenStack

We’re in the middle of another PTL change cycle.  Nominations have occurred over the last week.  We’ve also seen several people step down this cycle (Keystone, TripleO, Cinder, Heat, Glance).  This is becoming a regular thing in OpenStack.  The PTL position for most projects has changed hands over time.  The Heat project changes PTLs every cycle.  Nova has its 3rd PTL from a 3rd company (about to enter his 2nd cycle).  With all of this change, some people have expressed discomfort and concern.

I think the change that we see is quite healthy.  This is our open governance model working well.  We should be thrilled that OpenStack is healthy enough that we don’t rely on any one person to move forward.

I’d like to thank everyone who steps up to fill a PTL position.  It is a huge commitment.  The recognition you get is great, but it’s really hard work.  It’s quite a bit more than just technical leadership.  It also involves project management and community management.  It’s a position subject to a lot of criticism, and a lot of the work is thankless.  So, thank you very much to those who are serving as a PTL or have served as one in the past.

It’s also important that everyone realize that it takes a lot more than a PTL to make a project successful.  Our project governance and culture include checks and balances.  There is a Technical Committee (TC) that is responsible for ensuring that OpenStack development overall remains healthy.  A good example of TC influence on projects is the project reviews the TC has been doing over the last cycle, working with projects to apply course corrections (Neutron, Trove, Ceilometer, Horizon, Heat, Glance).  Most importantly, the PTL still must work toward consensus among the project’s members (though they are empowered to make the final call if necessary).

As a contributor to an OpenStack project, there is quite a bit you can do to help ensure project success beyond just writing code.  Here are some of those things:

Help the PTL

While the PTL is held accountable for the project, they do not have to be responsible for all of the work that needs to get done.  The larger projects have started officially delegating responsibilities.  There is talk about formalizing aspects of this, so take a look at that thread for examples.

If you’re involved in a project, you should work to understand these different aspects of keeping the project running.  See if there’s an area you can help out with.  This stuff is critically important.

If you aspire to be a PTL at some point in the future, I would say getting involved in these areas is the best way you can grow your skills, influence, and visibility in the project to make yourself a strong candidate in the future.

Participate in Project Discussions

The direction of a project is the result of many discussions.  You should be aware of these and participate in them as much as you can.  Watch the openstack-dev mailing list for discussions affecting your project.  It’s quite surprising how many core reviewers on a project rarely participate in discussions on the mailing list.

Most projects also have an IRC channel.  Quite a bit of day-to-day discussion happens there.  This can be difficult due to time zones or other commitments.  Just join when you can.  If it’s time zone compatible, you should definitely make it a priority to join weekly project IRC meetings.  This is an important time to discuss current happenings in the project.

Finally, attend and participate in the design summit.  This is the time that projects try to sync up on goals for the cycle.  If you really want to play a role in affecting project direction and ensuring success, it’s important that you attend if possible.  Of course, there are many legitimate reasons some people may not be able to travel and those should be understood and respected by fellow project members.

Also keep in mind that project discussions span more than the project’s technical issues.  There are often project process and structure issues to work out.  Try to raise your awareness of these issues, provide input, and propose new ideas if you have them.  Some good recent examples of contributors doing this include Daniel Berrange putting forth a detailed proposal to split the virt drivers out of Nova, and Joe Gordon and John Garbutt pushing forward on evolving the blueprint handling process.

Do the Dirty Work

Most contributors to OpenStack are contributing on behalf of an employer.  Those employers have customers (which may be internal or external) and those customers have requirements.  It’s understandable that some amount of development time goes toward implementing features or fixing problems that are important to those customers.

It’s also critical that everyone understands that there is a good bit of common work that must get done.  If you want to build goodwill in a project while also helping its success, help in these areas.  Some of those include:

See You in Paris!

I hope this has helped some contributors think of new ways to help ensure the success of OpenStack projects.  I hope to see you on the mailing list, on IRC, and in Paris!

OpenStack Board Meeting – 2014-09-18

I’m not a member of the OpenStack board, but the board meetings are open with the exception of the occasional Executive Session for topics that really do need to be private.  I attended the meeting on September 18, 2014.  Jonathan Bryce has since posted a summary of the meeting, but I wanted to share some additional commentary.

Mid-Cycle Meetups

Rob Hirschfeld raised the topic of mid-cycle meetups.  This wasn’t discussed for too long in the meeting.  The specific request was that the foundation staff evaluate what kind of assistance the foundation could and should be providing these events.  So far they are self-organized by teams within the OpenStack project.

These have been increasing in popularity.  The OpenStack wiki lists 12 such events during the Juno development cycle.  Some developers involved in cross-project efforts attended several of them.  This increase in travel has resulted in some tension in the community, as well.  Some people have legitimate personal reasons for not wanting the additional travel burden.  Others mention the increased impact on company travel budgets.  Overall, the majority opinion is that the events are incredibly productive and useful.

One of the most insightful outcomes of the discussions about meetups has been that the normal design summit is no longer serving its original purpose well enough.  The design summit is supposed to be the time that we sync on project goals.  We need to work on improving the design summit so that it better achieves that goal so that the mid-cycle meetups can be primarily focused on working sessions to execute on the plan from the design summit.

There was a thread on the openstack-dev list discussing improvements we can make to the design summit based on mid-cycle meetup feedback.

Foundation Platinum Membership

Jonathan raised a topic about how the foundation and board should fill a platinum sponsor spot that is opening up soon.  The discussion was primarily about the process the board should use for handling this.

This may actually be quite interesting to watch.  Nebula is stepping down from its platinum sponsor spot.  There is a limit of 8 platinum sponsors.  This is the first time since the foundation launched that a platinum sponsor spot has been opened.  OpenStack has grown an enormous amount since the foundation launched, so it’s reasonable to expect that there will be contention around this spot.

Which companies will apply for the spot?  How will the foundation and board handle contention?  And ultimately, which company will be the new platinum sponsor of the foundation?

DefCore

This topic took up the majority of the board meeting and was quite eventful.  Early in the discussion there was concern about the uncertainty caused by the length of the process so far.  There seemed to be general agreement that this is a real concern.  Naturally, the conversation proceeded to how to move forward.

The goal of DefCore has been to define a single minimal definition in support of a single OpenStack Powered trademark program.  The most important outcome from this discussion was the apparent board consensus that this is causing far too much contention, and that the DefCore committee should work with the foundation to propose a small set of trademark programs instead of trying to have a single one that covers all cases.

There were a couple of different angles discussed on creating additional trademark programs.  The first is about functionality.  So far, everything has been included in one program.  There was discussion of separate compute, storage, and networking trademark programs, for example.

Another angle that seemed quite likely based on the discussion was having OpenStack powered vs. OpenStack compatible trademark programs.  An OpenStack compatible program would only focus on compatibility with capabilities.  The OpenStack powered trademarks would include both capabilities and required code (designated sections).  The nice thing about this angle is that it would allow the board to press forward on the compatibility marks in support of the goal of interoperability, without getting held up by the debate around designated sections.

For more information on this, follow the ongoing discussion on the defcore-committee mailing list.

Next Board Meeting

The next OpenStack board meeting will be Nov. 2, 2014 in Paris, just before the OpenStack Summit.  There may also be an additional board meeting to continue the DefCore discussion.  Watch the foundation list for any news on additional meetings.

How to use the telephone (Phone Etiquette, or How I Learned to Love the Mute Button)

Many of us spend a lot of time on conference calls. Dan Smith and I recently put together an internal wiki page with some helpful information for our fellow conference call attendees. Dan wrote the instructions and I provided some additional visual aids.  Here is the content of that page. Feel free to share this information with anyone that you feel may benefit.

How to use the telephone

  1. Dial the number you wish to call
  2. Press the mute button
  3. If you wish to talk, place your finger on your mute button and press it, but keep your finger poised over the button
  4. Speak
  5. When finished speaking press the mute button before you return your finger to your keyboard
  6. If the call is continuing, go to step 3
  7. If the call is finished, hang up the telephone

Additional Tips

  1. Try to avoid using soft phones. They’re horrible and they often introduce echo and/or noise for everyone else on the call.
  2. Try to avoid using cell phones. They’re horrible and they often introduce echo and/or noise for everyone else on the call.
  3. Use a headset. Speakerphones aren’t as good as you think they are. If you have to use a speakerphone, be even more vigilant about your mute button.
  4. Stick to landline or proper VoIP phones whenever possible. Always use your mute button when you’re not actually speaking.
  5. If you’re using something that resembles but poorly emulates a real telephone, use the *6 mute feature, as it mutes all the audio coming from your line, something that your emulated phone may not do well.
  6. Keep track of your mute status. It’s not that hard. It’s either on or off. Don’t announce to everyone that you forgot whether you were muted or not. If you need help determining whether you’re muted, try the procedure below.
  7. Hold is not mute. Don’t use your phone’s hold function in any way when you’re connected to a conference. Your phone or phone system may play music or beep periodically into the conference, making it unusable until you return.

Steps to determine if you’re muted

  1. If you’re not talking right now, you’re muted (see step 3 in the How to use a telephone instructions)
  2. There is no step two. If you messed up step one and think you need a step two, review step one.

Helpful Images

A phone

Note: This is a Polycom IP 335. Your phone may differ.

[image: polycom_soundpoint_ip_335]

Location of Mute Button

It’s the single big red button.  Note that your phone should remain black and NOT turn grey while locating the mute button.

[image: polycom_soundpoint_ip_335_mute_location]

Light On While Muted

Note that the phone should remain black while muted.  It will not turn grey.

[image: polycom_soundpoint_ip_335_mute_light]

Juno Preview for OpenStack Compute (Nova)

We’re now well into the Juno release cycle. Here’s my take on a preview of some of what you can expect in Juno for Nova.

NFV

One area receiving a lot of focus this cycle is NFV. We’ve started an upstream NFV sub-team for OpenStack that is tracking and helping to drive requirements and development efforts in support of NFV use cases. If you’re not familiar with NFV, here’s a quick overview that was put together by the NFV sub-team:

NFV stands for Network Functions Virtualization. It refers to replacing the stand-alone appliances traditionally used for high- and low-level network functions, such as firewalls, network address translation, intrusion detection, caching, gateways, and accelerators, with a virtual instance or set of virtual instances, called Virtual Network Functions (VNFs). In other words, it can be seen as replacing some hardware network appliances with high-performance software that takes advantage of high-performance para-virtual devices, other acceleration mechanisms, and smart placement of instances. NFV originated in a working group of the European Telecommunications Standards Institute (ETSI) whose work is the basis of most current implementations. The main consumers of NFV are service providers (telecommunication providers and the like) who are looking to accelerate the deployment of new network services, and to do that, need to eliminate the constraint of the slow renewal cycle of hardware appliances, which do not autoscale and limit their innovation.

NFV support for OpenStack aims to provide the best possible infrastructure for such workloads to be deployed in, while respecting the design principles of an IaaS cloud. In order for VNFs to perform correctly in a cloud world, the underlying infrastructure needs to provide a certain number of functionalities, ranging from scheduling to networking and from orchestration to monitoring capabilities. This means that to correctly support NFV use cases in OpenStack, implementations may be required across most, if not all, of the main OpenStack projects, starting with Neutron and Nova.

The opportunities for OpenStack in the NFV space are huge. The work being tracked by the NFV sub-team spans more than just Nova, but here are some of the NFV-related projects for Nova:

Upgrades

The road to live upgrades has been a long one. Progress has been made over the last several releases. The Icehouse release was the first release that supported live upgrades in some form. From Havana to Icehouse, you can do a control plane upgrade with some API downtime without having to upgrade your compute nodes at the same time. You can roll through upgrading the compute nodes with the control plane already upgraded to Icehouse.

For Juno we are continuing to improve on this in several areas. Since Nova is a highly distributed system, one of the biggest requirements for doing this is versioning everything about the interactions between components. First we went through and versioned all of the interfaces between components. Next we have been versioning all of the data passed between components. This versioning of the data is part of what Nova Objects provide. Nova Objects are an internal implementation detail, but are critical to upgrade support. The entire code base had not been converted as of the Icehouse release. For Juno we continue to do conversions over to this new object model.
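
To give a feel for what versioning the data means, here is a heavily simplified sketch of the idea. This is not Nova’s actual object code; the class, fields, and version numbers below are invented for illustration.

# Illustrative only: a toy "versioned object" that knows how to serialize
# itself for an older receiver. Nova's real implementation lives under
# nova/objects/ and is considerably more involved.

class InstanceInfo(object):
    VERSION = '1.1'  # bumped whenever the fields change

    def __init__(self, uuid, host, task_state=None):
        self.uuid = uuid
        self.host = host
        self.task_state = task_state  # hypothetical field added in 1.1

    def to_primitive(self, target_version=None):
        """Build a dict suitable for putting on the message bus."""
        version = target_version or self.VERSION
        data = {'uuid': self.uuid, 'host': self.host,
                'task_state': self.task_state}
        if version == '1.0':
            # A 1.0 receiver has never heard of task_state, so drop it
            # instead of sending a field it can't handle.
            del data['task_state']
        return {'version': version, 'data': data}

The important property is that a newer sender can deliberately downgrade what it puts on the wire, so that a not-yet-upgraded service can still process the message.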

The other major improvement being looked at this release cycle is how we can reduce the downtime needed on the control plane by reducing how long it takes for database schema migrations to run. This is largely about developing new best practices about how migrations need to work going forward.
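
As a rough illustration of the kind of practice involved, purely additive changes (for example, adding a nullable column with no default) tend to be quick, while anything that rewrites a large table is what causes long outages. The sketch below uses Alembic syntax only for brevity; Nova’s migrations at the time used a different tool, and the table and column names here are made up.

# Illustrative schema migration: an additive, upgrade-friendly change.

from alembic import op
import sqlalchemy as sa

# Placeholder revision identifiers for the example.
revision = '000000000001'
down_revision = None


def upgrade():
    # Adding a nullable column with no server default is a cheap, mostly
    # metadata-only operation on common databases, which keeps the control
    # plane downtime window short.  Any backfill can happen later, in
    # batches, while services are running.
    op.add_column('instances',
                  sa.Column('example_new_field', sa.String(255),
                            nullable=True))


def downgrade():
    op.drop_column('instances', 'example_new_field')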

Finally, for the Icehouse release we added some basic testing of the live upgrade scenario to the OpenStack CI system. This testing runs OpenStack using the previous release and then upgrades everything except the nova-compute service. At that point, everything should continue to work. One goal for the Juno cycle is to improve this testing to verify that we can also run an older instance of the nova-network service with an upgraded control plane. This is critical for deployments that use nova-network in multi-host mode. In that case, you have nova-network running on each compute node, so we need to support a mixed version environment for nova-network, as well as nova-compute.

Scheduler

There’s always a lot of interest in improving the way host scheduling works in Nova. In the Icehouse cycle we identified that we wanted to split the scheduler out into a new project (codenamed Gantt). Doing so requires decoupling Nova’s scheduler as much as possible from the rest of Nova. This decoupling effort is the primary goal for the Juno cycle. Once the scheduler is independent of Nova, we can investigate ways to integrate other projects so that scheduling can use information that currently only lives in other projects such as Neutron or Cinder.

Docker

The Docker driver for Nova was moved to Stackforge during the Icehouse development cycle. The primary reason was the lack of CI running for the driver. However, there were also a number of feature gaps that kept CI with tempest from working as it needed to. Moving to Stackforge gave the team working on this driver an opportunity to iterate more quickly and fill these gaps.

There has been a lot of progress on the Docker driver in the Juno cycle. Some of the feature gap work has resulted in improvements to Docker itself, which is really great to see. For example, Docker now supports pause and unpause, which is a feature of the Nova API that the Docker driver is now able to support. Another area that has seen some focus is Cinder support. To make this work, we have to be able to expose block devices to Docker containers at creation time, as well as later on after they are already running. There has been work on Docker itself in this area, which will eventually lead to support in the Nova Docker driver.

Finally, there has been ongoing work to get CI with tempest running. It’s now running in OpenStack’s CI infrastructure. The progress is great to see, but it also seems most likely that the driver will return to Nova in the K release cycle instead of Juno.

Ironic

Nova introduced the baremetal driver in the Grizzly release.  This driver allows you to use Nova’s API to do provisioning of bare metal instead of virtual machines.  There was immediately a lot of interest in this functionality for OpenStack.  Soon after this driver was introduced, it was decided that we should start a new project dedicated to bare metal management.  That project is Ironic.

Ironic has come a long way since then.  The project is currently incubated and could potentially graduate for the K release.  One of the major tasks in moving towards graduation is getting the Ironic driver for Nova merged.  The spec has been approved and the code seems to be in good shape.  I’m very hopeful that we will have this step completed in the Juno release.

Database Integration

OpenStack has been a long time user of the SQLAlchemy library for its integration with relational databases.  More recently, some OpenStack projects have begun using Alembic for managing database schema migrations.  Michael Bayer, author of SQLAlchemy and Alembic, recently joined Red Hat to help with OpenStack, as well as continue to maintain SQLAlchemy and Alembic.  He has been surveying OpenStack’s current usage of SQLAlchemy and identifying areas where we can improve.  He has written up a fascinating wiki page with his findings.  I expect this to result in some very nice improvements to many OpenStack projects, including Nova.

Other

There are many other features being worked on right now for Nova. The best place to get an idea of what’s going on is to look at either the list of approved design specs or the list of specs under review.

Availability Zones and Host Aggregates in OpenStack Compute (Nova)

UPDATE 2014-06-18: There was a talk at the last OpenStack Summit in Atlanta on this topic, Divide and Conquer: Resource Segregation in the OpenStack Cloud.

Confusion around Host Aggregates and Availability Zones in Nova seems to be very common. In this post I’ll attempt to show how each is used. All information in this post is based on the way things work in the Grizzly version of Nova.

First, go ahead and forget everything you know about things called Availability Zones in other systems.  They are not the same thing and trying to map Nova’s concept of Availability Zones to what something else calls Availability Zones will only cause confusion.

The high level view is this: A host aggregate is a grouping of hosts with associated metadata.  A host can be in more than one host aggregate.  The concept of host aggregates is only exposed to cloud administrators.

A host aggregate may be exposed to users in the form of an availability zone. When you create a host aggregate, you have the option of providing an availability zone name. If specified, the host aggregate you have created is now available as an availability zone that can be requested.

Here is a tour of some commands.

Create a host aggregate:

$ nova aggregate-create test-aggregate1
+----+-----------------+-------------------+-------+----------+
| Id | Name            | Availability Zone | Hosts | Metadata |
+----+-----------------+-------------------+-------+----------+
| 1  | test-aggregate1 | None              |       |          |
+----+-----------------+-------------------+-------+----------+

Create a host aggregate that is exposed to users as an availability zone. (This is not creating a host aggregate within an availability zone! It is creating a host aggregate that is the availability zone!)

$ nova aggregate-create test-aggregate2 test-az
+----+-----------------+-------------------+-------+----------+
| Id | Name            | Availability Zone | Hosts | Metadata |
+----+-----------------+-------------------+-------+----------+
| 2  | test-aggregate2 | test-az           |       |          |
+----+-----------------+-------------------+-------+----------+

Add a host to a host aggregate, test-aggregate2. Since this host aggregate defines the availability zone test-az, adding a host to this aggregate makes it a part of the test-az availability zone.

$ nova aggregate-add-host 2 devstack
Aggregate 2 has been successfully updated.
+----+-----------------+-------------------+---------------+------------------------------------+
| Id | Name            | Availability Zone | Hosts         | Metadata                           |
+----+-----------------+-------------------+---------------+------------------------------------+
| 2  | test-aggregate2 | test-az           | [u'devstack'] | {u'availability_zone': u'test-az'} |
+----+-----------------+-------------------+---------------+------------------------------------+

Note that the novaclient output shows the availability zone twice. The data model on the backend only stores the availability zone in the metadata. There is not a separate column for it. The API returns the availability zone separately from the general list of metadata, though, since it’s a special piece of metadata.

Now that the test-az availability zone has been defined and contains one host, a user can boot an instance and request this availability zone.

$ nova boot --flavor 84 --image 64d985ba-2cfa-434d-b789-06eac141c260 \
> --availability-zone test-az testinstance
$ nova show testinstance
+-------------------------------------+----------------------------------------------------------------+
| Property                            | Value                                                          |
+-------------------------------------+----------------------------------------------------------------+
| status                              | BUILD                                                          |
| updated                             | 2013-05-21T19:46:06Z                                           |
| OS-EXT-STS:task_state               | spawning                                                       |
| OS-EXT-SRV-ATTR:host                | devstack                                                       |
| key_name                            | None                                                           |
| image                               | cirros-0.3.1-x86_64-uec (64d985ba-2cfa-434d-b789-06eac141c260) |
| private network                     | 10.0.0.2                                                       |
| hostId                              | f038bdf5ff35e90f0a47e08954938b16f731261da344e87ca7172d3b       |
| OS-EXT-STS:vm_state                 | building                                                       |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000002                                              |
| OS-EXT-SRV-ATTR:hypervisor_hostname | devstack                                                       |
| flavor                              | m1.micro (84)                                                  |
| id                                  | 107d332a-a351-451e-9cd8-aa251ce56006                           |
| security_groups                     | [{u'name': u'default'}]                                        |
| user_id                             | d0089a5a8f5440b587606bc9c5b2448d                               |
| name                                | testinstance                                                   |
| created                             | 2013-05-21T19:45:48Z                                           |
| tenant_id                           | 6c9cfd6c838d4c29b58049625efad798                               |
| OS-DCF:diskConfig                   | MANUAL                                                         |
| metadata                            | {}                                                             |
| accessIPv4                          |                                                                |
| accessIPv6                          |                                                                |
| progress                            | 0                                                              |
| OS-EXT-STS:power_state              | 0                                                              |
| OS-EXT-AZ:availability_zone         | test-az                                                        |
| config_drive                        |                                                                |
+-------------------------------------+----------------------------------------------------------------+

All of the examples so far show how host aggregates provide an API-driven mechanism for cloud administrators to define availability zones. The other use case host aggregates serve is tagging a group of hosts with a type of capability. When creating custom flavors, you can set a requirement for a capability. When a request is made to boot an instance of that type, the scheduler will only consider hosts in host aggregates tagged with that capability in their metadata.

We can add some metadata to the original host aggregate we created that was *not* also an availability zone, test-aggregate1.

$ nova aggregate-set-metadata 1 coolhardware=true
Aggregate 1 has been successfully updated.
+----+-----------------+-------------------+-------+----------------------------+
| Id | Name            | Availability Zone | Hosts | Metadata                   |
+----+-----------------+-------------------+-------+----------------------------+
| 1  | test-aggregate1 | None              | []    | {u'coolhardware': u'true'} |
+----+-----------------+-------------------+-------+----------------------------+

A flavor can include a set of key/value pairs called extra_specs. Here’s an example of creating a flavor that will only run on hosts in an aggregate with the coolhardware=true metadata.

$ nova flavor-create --is-public true m1.coolhardware 100 2048 20 2
+-----+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
| ID  | Name            | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+-----+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
| 100 | m1.coolhardware | 2048      | 20   | 0         |      | 2     | 1.0         | True      |
+-----+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
$ nova flavor-key 100 set coolhardware=true
$ nova flavor-show 100
+----------------------------+----------------------------+
| Property                   | Value                      |
+----------------------------+----------------------------+
| name                       | m1.coolhardware            |
| ram                        | 2048                       |
| OS-FLV-DISABLED:disabled   | False                      |
| vcpus                      | 2                          |
| extra_specs                | {u'coolhardware': u'true'} |
| swap                       |                            |
| os-flavor-access:is_public | True                       |
| rxtx_factor                | 1.0                        |
| OS-FLV-EXT-DATA:ephemeral  | 0                          |
| disk                       | 20                         |
| id                         | 100                        |
+----------------------------+----------------------------+

Hopefully this provides some useful information on what host aggregates and availability zones are, and how they are used.

OpenStack Compute (Nova) Roadmap for Havana

The Havana design summit was held mid-April.  Since then we have been documenting the Havana roadmap and going full speed ahead on development of these features.  The list of features that developers have committed to completing for the Havana release is tracked using blueprints on Launchpad. At the time of writing, we have 74 blueprints listed that cover a wide range of development efforts.  Here are some highlights in no particular order:

Database Handling

Vish Ishaya made a change at the very beginning of the development cycle that will allow us to backport database migrations to the Grizzly release, in case we ever need to backport a bug fix that requires a migration.

Dan Smith and Chris Behrens are working on a unified object model. One of the things that has been in the way of rolling upgrades of a Nova deployment is that the code and the database schema are very tightly coupled. The primary goal of this effort is to decouple these things. This effort is bringing in some other improvements, as well, including better object serialization handling for rpc, as well as object versioning.

Boris Pavlovic continues to do a lot of cleanup of database support in Nova.  He’s adding tests (and more tests), adding unique constraints, improving session handling, and improving archiving.

Chris Behrens has been working on a native MySQL database driver that performs much better than the SQLAlchemy driver for use in large scale deployments.

Mike Wilson is working on supporting read-only database slaves. This will allow distributing some queries to other database servers to help scaling in large scale deployments.

Bare Metal

The Grizzly release of Nova included the bare metal provisioning driver. Interest in this functionality has been rapidly increasing. Devananda van der Veen proposed that the bare metal provisioning code be split out into a new project called Ironic. The new project was approved for incubation by the OpenStack Technical Committee last week. Once this work has been completed, there will be a driver in Nova that talks to the Ironic API. The Ironic API will expose some additional functionality that doesn’t make sense to present in the Compute API in Nova.

Prior to the focus shift to Ironic, some new features were added to the bare metal driver. USC-ISI added support for Tilera and Devananda added a feature that allows you to request a specific bare metal node when provisioning a server.

Version 3 (v3) of the Compute API

The Havana release will include a new revision of the compute REST API in Nova. This effort is being led by Christopher Yeoh, with help from others. The v3 API will include a new framework for implementing extensions, extension versioning, and a whole bunch of cleanup: (1) (2) (3) (4).

Networking

The OpenStack community has been maintaining two network stacks for some time. Nova includes the nova-network service. Meanwhile, the OpenStack Networking project has been developed from scratch to support much more than nova-network does. Nova currently supports both. OpenStack Networking is expected to reach and surpass feature parity with nova-network in the Havana cycle. As a result, it’s time to deprecate nova-network. Vish Ishaya (from the Nova side) and Gary Kotton (from the OpenStack Networking side) have agreed to take on the challenging task of figuring out how to migrate existing deployments using nova-network to an updated environment that includes OpenStack Networking.

Scheduling

The Havana roadmap includes a mixed bag of scheduler features.

Andrew Laski is going to make the changes required so that the scheduler becomes exclusively a resource that gets queried. Currently, when starting an instance, the request is handed off to the scheduler, which then hands it off to the compute node that is selected. This change will eliminate that proxying through nova-scheduler. Every operation that uses the scheduler will then interact with it the same way, as opposed to some operations querying it and others proxying through it.

Phil Day will be adding an API extension that allows you to discover which scheduler hints are supported.  Phil is also looking at adding a way to allocate an entire host to a single tenant.

Inbar Shapira is looking at allowing multiple scheduling policies to be in effect at the same time.  This will allow you to have different sets of scheduler filters activated depending on some type of criteria (perhaps the requested availability zone).

Rerngvit Yanggratoke is implementing support for weighting scheduling decisions based on the CPU utilization of existing instances on a host.
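
To make the weighting idea a bit more concrete, here is a rough, standalone sketch of the logic such a weigher could implement. It only mimics the general shape of Nova’s weigher plugins; the class name and the cpu_utilization attribute are invented for illustration.

class CPUUtilizationWeigher(object):
    """Toy weigher: prefer hosts with the lowest current CPU utilization."""

    def weigh_host(self, host_state):
        # host_state.cpu_utilization is assumed to be a float in [0.0, 1.0]
        # reported by some monitoring source; a higher weight is better.
        return 1.0 - host_state.cpu_utilization

    def weigh_hosts(self, host_states):
        # Return the candidate hosts sorted best-first by weight.
        return sorted(host_states, key=self.weigh_host, reverse=True)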

Migrations

Nova includes support for different types of migrations. We have cold migrations (migrate) and live migrations (live-migrate). We also have resize and evacuate, which are closely related functions. The code paths for all of these features have evolved separately. It turns out that we can rework all of these things to share a lot of code. While we’re at it, we are restructuring the way these operations work to be primarily driven by the nova-conductor service.  This will allow the tasks to be tracked in a single place, as opposed to the flow of control being passed around between compute nodes. Having compute nodes tell each other what to do is also a very bad thing from a security perspective. These efforts are well underway. Tiago Rodrigues de Mello is working on moving cold migrations to nova-conductor and John Garbutt is working on moving live migrations. All of this is tracked under the parent blueprint for unified migrations.

And More!

This post doesn’t include every feature on the roadmap. You can find that here. I fully expect that more will be added to this list as Havana progresses. We don’t always know what features are being worked on in advance. If you have another feature you would like to propose, let’s talk about it on the openstack-dev list!

Deployment Considerations for nova-conductor Service in OpenStack Grizzly

The Grizzly release of OpenStack Nova includes a new service, nova-conductor. Some previous posts about this service can be found here and here. This post is intended to provide some additional insight into how this service should be deployed and how the service should be scaled as load increases.

Smaller OpenStack Compute deployments typically consist of a single controller node and one or more compute (hypervisor) nodes. The nova-conductor service fits in the category of controller services. In this style of deployment you would run nova-conductor on the controller node and not on the compute nodes. Note that in the Grizzly release, most of what nova-conductor does is perform database operations on behalf of compute nodes. This means that the controller node will have more work to do than in previous releases. Load should be monitored and the controller services should be scaled out if necessary.

Here is a model of what this size of deployment might look like:

[image: nova-simple]

As Compute deployments get larger, the controller services are scaled horizontally. For example, there would be multiple instances of the nova-api service on multiple nodes sitting behind a load balancer. You may also be running multiple instances of the nova-scheduler service across multiple nodes. Load balancing is done automatically for the scheduler by the AMQP message broker, RabbitMQ or Qpid. The nova-conductor service should be scaled out in this same way. It can be run multiple times across multiple nodes and the load will be balanced automatically by the message broker.
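
The “balanced automatically by the message broker” part is simply the competing-consumers pattern: every nova-conductor worker consumes from the same queue, and the broker delivers each message to exactly one of them. Here is a tiny standalone illustration of that pattern using the kombu library; it is not Nova code, and the broker URL and queue name are placeholders.

from kombu import Connection

# Every copy of this script that you start becomes one more consumer on
# the same queue, and the broker spreads the messages across all of them.
with Connection('amqp://guest:guest@localhost//') as conn:
    queue = conn.SimpleQueue('conductor')   # placeholder queue name
    while True:
        message = queue.get(block=True)     # wait for the next message
        print('handled: %s' % message.payload)
        message.ack()                       # tell the broker we are done

Start two or three copies of this and publish a handful of messages: each message is handled by only one consumer, which is exactly how additional nova-conductor (or nova-scheduler) instances pick up a share of the load.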

Here is a second deployment model. This gives an idea of how a deployment grows beyond a single controller node.

[image: nova-complex]

There are a couple of ways to monitor the performance of nova-conductor to see if it needs to be scaled out. The first is by monitoring CPU load. The second is by monitoring message queue size. If the queues are getting backed up, it is likely time to scale out services.  Both messaging systems provide at least one way to look at the state of message queues. For Qpid, try the qpid-stat command. For RabbitMQ, see the rabbitmqctl list_queues command.

Installing Steam for Linux Beta on Fedora 17

UPDATE: Since the original post, the download of Team Fortress 2 completed and I hit a problem. The post has been amended with the solution.

It sounds like a lot more people got access to the Steam for Linux beta yesterday, including me. An announcement on steamcommunity.com says:

We’ve just expanded the limited public beta by a large amount – which means another round of email notifications – so check your inbox!

The official download is a deb package for Ubuntu. My laptop runs Fedora 17. I was pleasantly surprised to see that there was an unofficial Fedora repository already ready to use. Here is how I installed the beta on my laptop running Fedora 17:

$ wget http://spot.fedorapeople.org/steam/steam.repo
...
$ sudo mv steam.repo /etc/yum.repos.d/
$ sudo yum install steam
...
$ rpm -q steam
steam-1.0.0.14-3.fc17.i686

Once installation was completed, I ran the steam client from the same terminal:

$ steam

The first time I ran the steam client it automatically created /home/rbryant/Steam and downloaded about 100 kB of updates. Once the updates completed, the login screen came up. I closed the steam client and ran it again. I got a warning dialog that said:

Unable to copy /home/rbryant/Steam/bin_steam.sh to /usr/bin/steam, please contact your system administrator.

This was a bit odd since the app that I had been running was already /usr/bin/steam. I suspect this is just automatically installing a new version based on what was downloaded with the updates. Based on the output in my terminal, I can see that before this warning came up, steam tried to find gksudo, kdesudo, or xterm and then gave up. I went ahead and installed xterm.

$ sudo yum install xterm

When running steam yet again, it popped up an xterm window to ask me to type in my password. This only happened once. Subsequent runs of the steam client in my terminal went straight to the login window.

From there I finally decided to log in using my existing steam account. I confirmed access to my account on a new computer and was in. I kicked off a download of Team Fortress 2 Beta for Linux.

Once the game download was complete, I clicked Play. The first time I tried, it failed with the following error:

Required OpenGL extension “GL_EXT_texture_compression_s3tc” is not supported. Please install S3TC texture support.

To fix this, I had to add the rpmfusion repositories to my machine and install the libtxc_dxtn package.


$ sudo yum localinstall --nogpgcheck http://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-stable.noarch.rpm http://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-stable.noarch.rpm

$ sudo yum install libtxc_dxtn

If you’re running 64-bit Linux, you will actually need the 32-bit version of this library to fix the game. Install it by running:

$ sudo yum install libtxc_dxtn.i686

Once all of that was done, the game launched successfully and I was able to start a training session.

Enjoy!

A new Nova service: nova-conductor

The Grizzly release of OpenStack Nova will have a new service, nova-conductor. The service was discussed on the openstack-dev list and it was merged today. There is currently a configuration option that lets you avoid running it as a separate service, but it is possible that by the time Grizzly is released, this service will be required.

One of the efforts that started during Folsom development and is scheduled to be completed in Grizzly is no-db-compute. In short, this effort is to remove direct database access from the nova-compute service. There are two main reasons we are doing this. Compute nodes are the least trusted part of a nova deployment, so removing direct database access is a step toward reducing the potential impact of a compromised compute node. The other benefit of no-db-compute is for upgrades. Direct database access complicates the ability to do live rolling upgrades. We’re working toward eventually making that possible, and this is a part of that.

All of the nova services use a messaging system (usually AMQP based) to communicate with each other. Many of the database accesses in nova-compute can be (and have been) removed by just sending more data in the initial message sent to nova-compute. However, that doesn’t apply to everything. That’s where the new service, nova-conductor, comes in.

The nova-conductor service is key to completing no-db-compute. Conceptually, it implements a new layer on top of nova-compute. It should *not* be deployed on compute nodes, or else the security benefits of removing database access from nova-compute will be negated. Just like other nova services such as nova-api or nova-scheduler, it can be scaled horizontally. You can run multiple instances of nova-conductor on different machines as needed for scaling purposes.

The methods exposed by nova-conductor will initially be relatively simple methods used by nova-compute to offload its database operations. Places where nova-compute previously did database access will now be talking to nova-conductor. However, we have plans in the medium to long term to move more and more of what is currently in nova-compute up to the nova-conductor layer. The compute service will start to look like a less intelligent slave service to nova-conductor. The conductor service will implement long running complex operations, ensuring forward progress and graceful error handling. This will be especially beneficial for operations that cross multiple compute nodes, such as migrations or resizes.
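
As a rough illustration of that first step, here is what a compute-side call site changes into, conceptually. The class and method names below are simplified stand-ins, not the actual nova-conductor API.

class ComputeManagerSketch(object):
    """Illustrative only: how a database touch point on a compute node
    changes shape once nova-conductor is in the picture."""

    def __init__(self, db_api, conductor_api):
        self.db_api = db_api                  # direct database interface
        self.conductor_api = conductor_api    # rpc client to nova-conductor

    def update_task_state_old(self, context, instance_uuid, task_state):
        # Pre-conductor: the compute node updates the database itself,
        # so every compute node needs database access and credentials.
        return self.db_api.instance_update(
            context, instance_uuid, {'task_state': task_state})

    def update_task_state_new(self, context, instance_uuid, task_state):
        # With nova-conductor: the same update is requested over rpc and
        # performed by a conductor running on a controller node.
        return self.conductor_api.instance_update(
            context, instance_uuid, {'task_state': task_state})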

If you have any comments, questions, or suggestions for nova-conductor or the no-db-compute effort in general, please feel free to bring it up on the openstack-dev list.

OpenStack Design Summit and an Eye on Folsom

I just spent a week in San Francisco at the OpenStack design summit and conference. It was quite an amazing week and I’m really looking forward to the Folsom development cycle.  You can find notes from the various sessions held at the design summit on the OpenStack wiki.

Essex was the first release that I contributed to.  One thing I did was add Qpid support to both Nova and Glance as an alternative to using RabbitMQ.  Beyond that, I primarily worked on vulnerability management and other bug fixing.  For Folsom, I’m planning on working on some more improvements involving the inter-service messaging layer, also referred to as the rpc API, in Nova.

1) Moving the rpc API to openstack-common

The rpc API in Nova is used for private communication between nova services. As an example, when a tenant requests that a new virtual machine instance be created (either via the EC2 API or the OpenStack compute REST API), the nova-api service sends a message to the nova-scheduler service via the rpc API.  The nova-scheduler service decides where the instance is going to live and then sends a message to that compute node’s nova-compute service via the rpc API.

The other usage of the rpc API in Nova has been for notifications.  Notifications are asynchronous messages about events that happen within the system.  They can be used for monitoring and billing, among other things.  Strictly speaking, notifications aren’t directly tied to rpc.  A Notifier is an abstraction, of which using rpc is one of the implementations.  Glance also has notifications, including a set of Notifier implementations.  The code was the same at one point but has diverged quite a bit since.
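
To make the “a Notifier is an abstraction” point concrete, here is a toy sketch of the shape of that abstraction. The class and method names are invented for illustration and are not the actual Nova or Glance notifier code.

import json
import logging


class Notifier(object):
    """Toy abstraction: something that emits an event about the system."""

    def notify(self, event_type, payload):
        raise NotImplementedError()


class LogNotifier(Notifier):
    """Write notifications to the log; handy for development."""

    def notify(self, event_type, payload):
        logging.info('NOTIFICATION %s %s', event_type, json.dumps(payload))


class RpcNotifier(Notifier):
    """Publish notifications onto the message bus (monitoring, billing)."""

    def __init__(self, rpc_client, topic='notifications'):
        self.rpc_client = rpc_client   # stand-in for the rpc layer
        self.topic = topic

    def notify(self, event_type, payload):
        self.rpc_client.cast(self.topic,
                             {'event_type': event_type, 'payload': payload})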

We would like to move the notifiers into openstack-common.  Moving rpc into openstack-common is a prerequisite for that, so I’m going to knock that part out.  I’ve already written a few patches in that direction.  Once the rpc API is in openstack-common, other projects will be able to make use of it.  There was discussion of Quantum using rpc at the design summit, so this will be needed for that, too.  Another benefit is that the Heat project is using a copy of Nova’s rpc API right now, but will be able to migrate over to using the version from openstack-common.

2) Versioning the rpc API interfaces

The existing rpc API is pretty lightweight and seems to work quite well.  One limitation is that there is nothing provided to help with different versions of services talking to each other.  It may work … or it may not.  If it doesn’t, the failure you get could be something obvious, or it could be something really bad and bizarre where an operation fails half-way through, leaving things in a bad state.  I’d like to clean this up.

The end goal with this effort will be to make sure that as you upgrade from Essex to Folsom, any messages originating from an Essex service can and will be correctly processed by a Folsom service.  If that fails, then the failure should be immediate and obvious that a message was rejected due to a version issue.
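
Conceptually, the safety net is simple: every message carries the version of the interface it was built against, and the receiver refuses anything it cannot understand before doing any work. Here is a toy sketch of that idea; it is not the actual Nova rpc code, and the version scheme shown is just for illustration.

class VersionedDispatcher(object):
    """Toy dispatcher: accept a message only if its version is compatible."""

    def __init__(self, api_version, methods):
        self.api_version = api_version   # e.g. '2.1'
        self.methods = methods           # mapping of method name -> callable

    def dispatch(self, message):
        sent = message.get('version', '1.0')
        sent_major, sent_minor = (int(x) for x in sent.split('.'))
        my_major, my_minor = (int(x) for x in self.api_version.split('.'))
        # Compatible means: same major version, and a minor version that
        # this (possibly newer) receiver already knows about.
        if sent_major != my_major or sent_minor > my_minor:
            raise ValueError('Unsupported rpc message version: %s' % sent)
        handler = self.methods[message['method']]
        return handler(**message.get('args', {}))

With something like this in place, an incompatible message fails loudly at the dispatch step instead of half-way through an operation.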

3) Removing database access from nova-compute

This is by far the biggest effort of the 3 described here, and I won’t be tackling this one alone.  I want to help drive it, though.  This discussion came up in the design summit session about enhancements to Nova security.  By removing direct database access from the nova-compute service, we can help reduce the potential impact if a compute node were to be compromised.  There are two main parts to this effort.

The first part is to make more efficient use of messaging by sending full objects through the rpc API instead of IDs. For example, there are many cases where the nova-api service gets an instance object from the database, does its thing, and then just sends the instance ID in the rpc message.  On the other side it has to go pull that same object out of the database.  We have to go through and change all cases like this to include the full object in the message.  In addition to the security benefit, it should be more efficient.  This doesn’t sound too complicated, and it isn’t really, but it’s quite a bit of work as there is a lot of code that needs to be changed.  There will be some snags to deal with along the way, such as making sure all of the objects can be serialized properly.
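
A simplified picture of the change (the method name and field values here are made up for illustration):

# Old style: only the ID travels in the message, so the receiving service
# has to hit the database again to fetch the same record.
message_before = {
    'method': 'pause_instance',
    'args': {'instance_id': 42},
}

# New style: the full, serialized instance travels in the message, so the
# compute node can act on it without its own database lookup.
message_after = {
    'method': 'pause_instance',
    'args': {'instance': {'id': 42,
                          'uuid': 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee',
                          'host': 'compute1',
                          'task_state': 'pausing'}},
}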

Including full objects in messages is only part of the battle here.  The nova-compute service also does database updates.  We will have to come up with a new approach for this.  It will most likely end up being a new service of some sort that handles state changes coming from compute nodes and makes the necessary database updates according to those state changes. I haven’t fully thought through the solution to this part yet. For example, ensuring that this service maintains proper order of messages without turning into a bottleneck in the overall system will be important.

Onward!

I’m sure I’ll work on other things in Folsom, as well, but those are three that I have on my mind right now.  OpenStack is a great project and I’m excited to be a part of it!