The Grizzly release of OpenStack Nova includes a new service, nova-conductor. Some previous posts about this service can be found here and here. This post is intended to provide some additional insight into how this service should be deployed and how the service should be scaled as load increases.
Smaller OpenStack Compute deployments typically consist of a single controller node and one or more compute (hypervisor) nodes. The nova-conductor service fits in the category of controller services. In this style of deployment you would run nova-conductor on the controller node and not on the compute nodes. Note that most of what nova-conductor does in the Grizzly release is perform database operations on behalf of compute nodes. This means that the controller node will have more work to do than in previous releases. Load should be monitored, and the controller services should be scaled out if necessary.
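In Grizzly, whether compute nodes go through the conductor is controlled in nova.conf. Here is a minimal sketch, assuming Grizzly-era option names (these may differ in other releases):

```ini
# nova.conf on compute nodes (sketch; Grizzly-era option names)
[conductor]
# With use_local = False (the default in Grizzly), nova-compute sends
# database operations over RPC to the nova-conductor service instead
# of accessing the database directly.
use_local = False
```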
Here is a model of what this size of deployment might look like:
As Compute deployments get larger, the controller services are scaled horizontally. For example, there would be multiple instances of the nova-api service on multiple nodes sitting behind a load balancer. You may also be running multiple instances of the nova-scheduler service across multiple nodes. Load balancing for the scheduler is done automatically by the AMQP message broker (RabbitMQ or Qpid). The nova-conductor service should be scaled out in this same way. It can be run multiple times across multiple nodes and the load will be balanced automatically by the message broker.
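The broker-side balancing described above is the classic competing-consumers pattern: identical workers share one queue, and each message is delivered to exactly one of them. Here is a minimal, self-contained sketch of that pattern; the worker names and task payloads are illustrative, not Nova internals:

```python
# Competing consumers: several workers pull from one shared queue, so
# each task is handled by exactly one worker, just as the AMQP broker
# balances RPC calls across identical nova-conductor instances.
import queue
import threading

task_queue = queue.Queue()
handled = {"worker-1": 0, "worker-2": 0}
lock = threading.Lock()

def worker(name):
    while True:
        try:
            task_queue.get(timeout=0.2)
        except queue.Empty:
            return  # queue drained; this worker exits
        with lock:
            handled[name] += 1
        task_queue.task_done()

# Enqueue 50 tasks, then start two identical workers.
for _ in range(50):
    task_queue.put("db-call")

threads = [threading.Thread(target=worker, args=(n,)) for n in handled]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every task was processed exactly once, split between the two workers.
print(sum(handled.values()))  # 50
```

Adding another conductor is just adding another consumer on the same queue; no extra load-balancer configuration is needed.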
Here is a second deployment model. This gives an idea of how a deployment grows beyond a single controller node.
There are a couple of ways to monitor the performance of nova-conductor to see if it needs to be scaled out. The first is by monitoring CPU load. The second is by monitoring message queue size. If the queues are getting backed up, it is likely time to scale out services. Both messaging systems provide at least one way to look at the state of message queues. For Qpid, try the qpid-stat command. For RabbitMQ, see the rabbitmqctl list_queues command.
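As a sketch of how queue-depth monitoring might be automated, the following parses the tab-separated output of rabbitmqctl list_queues name messages and flags queues above a depth threshold. The threshold and the sample queue names are assumptions for illustration only:

```python
# Sketch: flag backed-up queues from the output of
#   rabbitmqctl list_queues name messages
# The threshold and queue names below are illustrative assumptions.

def backed_up_queues(listing: str, threshold: int = 100):
    """Return (queue, depth) pairs whose depth exceeds threshold."""
    flagged = []
    for line in listing.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip banner lines like "Listing queues ..."
        name, depth = parts
        if depth.isdigit() and int(depth) > threshold:
            flagged.append((name, int(depth)))
    return flagged

# Example output as rabbitmqctl might print it (queue names invented):
sample = "Listing queues ...\nconductor\t1542\nscheduler\t3\n...done.\n"
print(backed_up_queues(sample))  # [('conductor', 1542)]
```

A cron job or monitoring agent could run this check periodically and alert when queues stay backed up, signalling that it is time to add conductor instances.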
It would be interesting to see some reference configurations for cells and nova controller as we scale to more hypervisors/VMs.
Being a proponent of the nova-no-db-compute effort, I think this is a step in the right direction. I hope the conductor is only a refactoring step in resolving the interaction between controller and compute nodes.
Did you test the “scale out” setup with regard to actual scaling? Depending on the usage and interaction patterns, I suspect that you might actually see an increase in message queue size and API response times if you add more schedulers/conductors, because of blocking transactions in the DB: in the end these services coordinate resources (like available memory, etc.) which are synchronised via the DB.
When I tested the Folsom version (single scheduler), I already noticed an increase in API response times with higher concurrency of requests. If my reasoning is correct, this should get worse with more schedulers and conductors.
Yes, this is indeed a first step in refactoring the controller/compute interaction. There are others already working hard on design proposals for further significant refactoring to make the nova-compute service more of a slave to logic that has been moved up. Details are still being worked out though …
I haven’t personally tested out the scaling. It’s somewhat theoretical, but also based on what I learn from talking to people deploying things. I would definitely be interested in hearing things that work well, or don’t work well, based on real usage.
You’re right that there is some resource coordination, but much of it is actually fairly optimistic. A scheduler will pick a node for an instance to run on, but it doesn’t completely assume that it will work. The request will go down to the compute node and resources get reserved there. If it doesn’t work out, the request gets kicked back to the scheduler for it to retry somewhere else. So, I don’t think there should be a lot of blocking and contention in the scheduler on the db.
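That optimistic pick-then-claim-then-retry flow can be sketched as follows. All class and function names here are illustrative, not Nova's actual code:

```python
# Optimistic scheduling sketch: the scheduler picks a host without
# locking anything in the DB; the host tries to reserve resources
# locally, and on failure the request is retried on another host.
import random

class Host:
    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb

    def try_claim(self, ram_mb):
        """Reserve RAM on this host; fail if it no longer fits."""
        if self.free_ram_mb < ram_mb:
            return False
        self.free_ram_mb -= ram_mb
        return True

def schedule(hosts, ram_mb, max_retries=3):
    """Optimistically pick hosts until one accepts the claim."""
    candidates = list(hosts)
    for _ in range(max_retries):
        if not candidates:
            break
        choice = random.choice(candidates)
        if choice.try_claim(ram_mb):
            return choice.name
        candidates.remove(choice)  # kicked back: retry elsewhere
    raise RuntimeError("no valid host found")

hosts = [Host("node1", 1024), Host("node2", 8192)]
print(schedule(hosts, 4096))  # prints "node2": node1 can never fit 4096 MB
```

The point of the sketch is that the scheduler never holds a DB lock across the decision; a stale pick just costs one cheap retry.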
Hi Russell, I’ve deployed Grizzly like you show in the first image, but I can’t launch any VMs. I think it’s related to nova-conductor being misconfigured, but I haven’t found any docs about it. Where should I tell nova-compute that it has to use nova-conductor? Thanks
We are writing a survey which evaluates the performance of virtual instances on OpenStack and OpenNebula using the same hypervisor, KVM.
In initial tests, OpenNebula was much better. We believe that the performance of OpenStack (Havana version) has been affected by its set of components (Nova, RabbitMQ, Neutron, Open vSwitch, …), especially in memory bandwidth and processor usage.
These are the articles we published at Brazilian events; I would like to know your opinion of these results.