An EZ Bake OVN for OpenStack

When Ben Pfaff pushed the last of the changes needed to make OVN functional to the ovn branch, he dubbed it the “EZ Bake milestone”.  The analogy is both humorous and somewhat accurate.  We’ve reached the first functional milestone, which is quite exciting.

In previous posts I have gone through and shown components of the system as it has been built.  Now that it’s functional, I will go through a working demonstration of OpenStack using OVN.

DevStack

For this test environment we’ll stand up two hosts using DevStack.  Both hosts will be VMs running Fedora 21 that have 2 vCPUs and 4 GB of RAM.  We will refer to them as ovn-devstack-1 and ovn-devstack-2.

Each VM needs to have git installed and a user created that has sudo access.  This user will be used to run DevStack.
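
For example, on a Fedora 21 host, something like the following should work to install git and create a suitable user.  The package commands and the "stack" user name are just one way to do it, so adjust for your environment; run this on both hosts.

$ sudo yum install -y git                    # needed to clone DevStack and networking-ovn
$ sudo useradd -s /bin/bash -d /opt/stack -m stack
$ echo "stack ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/stack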

Setting up ovn-devstack-1

The first DevStack host will look like a typical single node DevStack install that runs all of OpenStack.  It will be using OVN to provide L2 network connectivity instead of the default OVS ML2 driver and the neutron OVS agent.  It will still make use of the L3 and DHCP agents from Neutron as the equivalent functionality has not yet been implemented in OVN.

Start by cloning DevStack and networking-ovn:

(ovn-devstack-1)$ git clone http://git.openstack.org/openstack-dev/devstack.git
(ovn-devstack-1)$ git clone http://git.openstack.org/stackforge/networking-ovn.git

networking-ovn comes with some sample configuration files for DevStack.  We can use the main sample for this host without any modifications.

(ovn-devstack-1)$ cd devstack
(ovn-devstack-1)$ cp ../networking-ovn/devstack/local.conf.sample local.conf

After the DevStack configuration is in place, run DevStack to set up the environment.

(ovn-devstack-1)$ ./stack.sh

This takes several minutes to complete.  Once it has completed successfully, you should see some output that looks like this:

This is your host ip: 172.16.189.6
Horizon is now available at http://172.16.189.6/
Keystone is serving at http://172.16.189.6:5000/
The default users are: admin and demo
The password: password
2015-05-13 18:59:48.169 | stack.sh completed in 989 seconds.

Setting up ovn-devstack-2

The second DevStack host runs a minimal set of services needed to add an additional compute node (or hypervisor) to the existing DevStack environment.  It needs to run the OpenStack nova-compute service for managing local VMs and ovn-controller to manage the local Open vSwitch configuration.

Setting up the second DevStack host is a very similar process.  Start by cloning DevStack and networking-ovn.

(ovn-devstack-2)$ git clone http://git.openstack.org/openstack-dev/devstack.git
(ovn-devstack-2)$ git clone http://git.openstack.org/stackforge/networking-ovn.git

networking-ovn provides an additional sample configuration file for DevStack that is intended to be used for adding additional compute nodes to an existing DevStack environment.  You must set the SERVICE_HOST configuration variable in this file to be the IP address of the main DevStack host.

(ovn-devstack-2)$ cd devstack
(ovn-devstack-2)$ cp ../networking-ovn/devstack/computenode-local.conf.sample local.conf
(ovn-devstack-2)$ vim local.conf
... edit to set SERVICE_HOST=172.16.189.6 in this example ...

Once the DevStack configuration is ready, you can run DevStack to set up the new compute node.  It should take less time to complete than the first DevStack host.

(ovn-devstack-2)$ ./stack.sh

Once it completes, you should see output that looks like this:

This is your host ip: 172.16.189.10
2015-05-13 19:02:30.663 | stack.sh completed in 98 seconds.

The Default Environment

DevStack is now running on two hosts.  Let’s take a look at the default state of this environment before we start creating VMs.  We’ll run various OpenStack command line tools to interact with the OpenStack APIs.  By default, these tools get credentials from environment variables.  DevStack comes with a file called openrc that makes it easy to switch between admin (the cloud administrator) and demo (a regular cloud user) credentials.
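
If you are ever unsure which credentials are active, you can look at the OS_* environment variables that openrc exports (variables such as OS_USERNAME, OS_TENANT_NAME, and OS_AUTH_URL; the exact list depends on the DevStack release).

(ovn-devstack-1)$ . openrc demo
(ovn-devstack-1)$ env | grep OS_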

We can start by making sure that Nova sees two hypervisors.  This API requires admin credentials.

(ovn-devstack-1)$ cd devstack
(ovn-devstack-1)$ . openrc admin
(ovn-devstack-1)$ nova hypervisor-list
+----+------------------------------------+-------+---------+
| ID | Hypervisor hostname                | State | Status  |
+----+------------------------------------+-------+---------+
| 1  | ovn-devstack-1.os1.phx2.redhat.com | up    | enabled |
| 2  | ovn-devstack-2.os1.phx2.redhat.com | up    | enabled |
+----+------------------------------------+-------+---------+

DevStack also has a default network configuration.  We can use the neutron command line tool to list the default networks.

(ovn-devstack-1)$ . openrc admin
(ovn-devstack-1)$ neutron net-list
+--------------------------------------+---------+----------------------------------------------------------+
| id                                   | name    | subnets                                                  |
+--------------------------------------+---------+----------------------------------------------------------+
| 7e78ba86-2114-47ac-8194-201936e3820a | public  | ebfe46b4-e0ab-4cda-b2ee-5bb1761b5995 172.24.4.0/24       |
|                                      |         | b435f473-bed1-41bf-9110-797424364016 2001:db8::/64       |
| cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e | private | 74056863-9d45-452a-a431-344a33cf517b fdc1:1919:4bd6::/64 |
|                                      |         | d5ad74d7-7bd9-4646-add2-e816cfee1ec3 10.0.0.0/24         |
+--------------------------------------+---------+----------------------------------------------------------+

The Horizon web interface also provides a visual representation of the network topology:

[Screenshot: default network topology in Horizon]

The default environment also creates four Neutron ports.  Three are related to the router and can be seen in the diagram above.  The fourth (not shown) is for the DHCP agent providing DHCP services to the private network.

(ovn-devstack-1)$ neutron port-list
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                                                   |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+
| 381a2d96-bc4a-4785-82bc-4f2b48e007e8 |      | fa:16:3e:76:12:96 | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.2"}                             |
|                                      |      |                   | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6:0:f816:3eff:fe76:1296"} |
| a5b967d4-296e-44dc-98b9-7336d0224e57 |      | fa:16:3e:8c:d0:a8 | {"subnet_id": "ebfe46b4-e0ab-4cda-b2ee-5bb1761b5995", "ip_address": "172.24.4.2"}                           |
|                                      |      |                   | {"subnet_id": "b435f473-bed1-41bf-9110-797424364016", "ip_address": "2001:db8::1"}                          |
| b2e0ae9e-d472-42ed-8776-6b338349d01d |      | fa:16:3e:b7:cd:77 | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6::1"}                    |
| f24756f3-d803-47d3-9fc7-1315f4071ac0 |      | fa:16:3e:b1:34:ed | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.1"}                             |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+
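
The listing above shows the addresses but not which service owns each port.  If you want to see that, ask for the device_owner column (the -c options select columns and are a standard feature of the neutron client).  The router ports should show device_owner values like network:router_interface and network:router_gateway, and the DHCP agent's port should show network:dhcp.

(ovn-devstack-1)$ neutron port-list -c id -c device_owner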

These default networks and ports can also be seen in OVN. OVN has a northbound database (OVN_Northbound) that serves as the public interface to OVN.  The Neutron driver updates this database to indicate the desired state.  OVN comes with a command line utility, ovn-nbctl, which can be used to view or update the OVN_Northbound database.  The show command gives a summary of the current configuration.

(ovn-devstack-1)$ ovn-nbctl show
    lswitch f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 (neutron-7e78ba86-2114-47ac-8194-201936e3820a)
        lport a5b967d4-296e-44dc-98b9-7336d0224e57
            macs: fa:16:3e:8c:d0:a8
    lswitch bd7dbbf9-1325-491f-b46b-80b4ecfc560b (neutron-cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e)
        lport f24756f3-d803-47d3-9fc7-1315f4071ac0
            macs: fa:16:3e:b1:34:ed
        lport 381a2d96-bc4a-4785-82bc-4f2b48e007e8
            macs: fa:16:3e:76:12:96
        lport b2e0ae9e-d472-42ed-8776-6b338349d01d
            macs: fa:16:3e:b7:cd:77

Launching VMs

Now that the environment is ready, we can start launching VMs.  We will launch two VMs so that one will end up on each of our compute nodes.  We’ll verify that the data path is working and then inspect what OVN has done to make it work.

We want our VMs to have a single vNIC attached to the private Neutron network.

(ovn-devstack-1)$ . openrc demo
(ovn-devstack-1)$ neutron net-list
+--------------------------------------+---------+----------------------------------------------------------+
| id                                   | name    | subnets                                                  |
+--------------------------------------+---------+----------------------------------------------------------+
| 7e78ba86-2114-47ac-8194-201936e3820a | public  | ebfe46b4-e0ab-4cda-b2ee-5bb1761b5995                     |
|                                      |         | b435f473-bed1-41bf-9110-797424364016                     |
| cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e | private | 74056863-9d45-452a-a431-344a33cf517b fdc1:1919:4bd6::/64 |
|                                      |         | d5ad74d7-7bd9-4646-add2-e816cfee1ec3 10.0.0.0/24         |
+--------------------------------------+---------+----------------------------------------------------------+

(ovn-devstack-1)$ PRIVATE_NET_ID=cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e

DevStack automatically imports a very small test image, CirrOS, which suits our needs.

(ovn-devstack-1)$ glance image-list
+--------------------------------------+---------------------------------+-------------+------------------+----------+--------+
| ID                                   | Name                            | Disk Format | Container Format | Size     | Status |
+--------------------------------------+---------------------------------+-------------+------------------+----------+--------+
| 4d9443ee-1497-4bb3-b917-2e35b0e59eab | cirros-0.3.4-x86_64-uec         | ami         | ami              | 25165824 | active |
| ab38e2d2-8397-4ece-8aa3-9a3058e63029 | cirros-0.3.4-x86_64-uec-kernel  | aki         | aki              | 4979632  | active |
| 38721471-6a19-45f7-8c8d-fd64b7737fd7 | cirros-0.3.4-x86_64-uec-ramdisk | ari         | ari              | 3740163  | active |
+--------------------------------------+---------------------------------+-------------+------------------+----------+--------+

(ovn-devstack-1)$ IMAGE_ID=4d9443ee-1497-4bb3-b917-2e35b0e59eab

We’ll use the m1.nano flavor, as minimal resources are sufficient for our testing with these VMs.

(ovn-devstack-1)$ nova flavor-list
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+
| ID | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+
| 1  | m1.tiny   | 512       | 1    | 0         |      | 1     | 1.0         | True      |
| 2  | m1.small  | 2048      | 20   | 0         |      | 1     | 1.0         | True      |
| 3  | m1.medium | 4096      | 40   | 0         |      | 2     | 1.0         | True      |
| 4  | m1.large  | 8192      | 80   | 0         |      | 4     | 1.0         | True      |
| 42 | m1.nano   | 64        | 0    | 0         |      | 1     | 1.0         | True      |
| 5  | m1.xlarge | 16384     | 160  | 0         |      | 8     | 1.0         | True      |
| 84 | m1.micro  | 128       | 0    | 0         |      | 1     | 1.0         | True      |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+

(ovn-devstack-1)$ FLAVOR_ID=42

We also need to create an SSH keypair for logging in to the VMs we create.

(ovn-devstack-1)$ nova keypair-add demo > id_rsa_demo
(ovn-devstack-1)$ chmod 600 id_rsa_demo

We now have everything needed to boot some VMs. We’ll create two of them, named test1 and test2.

(ovn-devstack-1)$ nova boot --nic net-id=$PRIVATE_NET_ID --image $IMAGE_ID --flavor $FLAVOR_ID --key-name demo test1
+--------------------------------------+----------------------------------------------------------------+
| Property                             | Value                                                          |
+--------------------------------------+----------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                         |
| OS-EXT-AZ:availability_zone          | nova                                                           |
| OS-EXT-STS:power_state               | 0                                                              |
| OS-EXT-STS:task_state                | scheduling                                                     |
| OS-EXT-STS:vm_state                  | building                                                       |
| OS-SRV-USG:launched_at               | -                                                              |
| OS-SRV-USG:terminated_at             | -                                                              |
| accessIPv4                           |                                                                |
| accessIPv6                           |                                                                |
| adminPass                            | 9NMJrLeCDPJv                                                   |
| config_drive                         |                                                                |
| created                              | 2015-05-14T13:33:55Z                                           |
| flavor                               | m1.nano (42)                                                   |
| hostId                               |                                                                |
| id                                   | d91cf422-fe2e-4131-bc49-2f310daa5cf0                           |
| image                                | cirros-0.3.4-x86_64-uec (4d9443ee-1497-4bb3-b917-2e35b0e59eab) |
| key_name                             | demo                                                           |
| metadata                             | {}                                                             |
| name                                 | test1                                                          |
| os-extended-volumes:volumes_attached | []                                                             |
| progress                             | 0                                                              |
| security_groups                      | default                                                        |
| status                               | BUILD                                                          |
| tenant_id                            | 92fbf8554b2246c5bb9b0db0be55529c                               |
| updated                              | 2015-05-14T13:33:56Z                                           |
| user_id                              | 207b4e55a2684f20a2a21e14c28dffed                               |
+--------------------------------------+----------------------------------------------------------------+

(ovn-devstack-1)$ nova boot --nic net-id=$PRIVATE_NET_ID --image $IMAGE_ID --flavor $FLAVOR_ID --key-name demo test2
+--------------------------------------+----------------------------------------------------------------+
| Property                             | Value                                                          |
+--------------------------------------+----------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                         |
| OS-EXT-AZ:availability_zone          | nova                                                           |
| OS-EXT-STS:power_state               | 0                                                              |
| OS-EXT-STS:task_state                | scheduling                                                     |
| OS-EXT-STS:vm_state                  | building                                                       |
| OS-SRV-USG:launched_at               | -                                                              |
| OS-SRV-USG:terminated_at             | -                                                              |
| accessIPv4                           |                                                                |
| accessIPv6                           |                                                                |
| adminPass                            | BgL89P2oKotD                                                   |
| config_drive                         |                                                                |
| created                              | 2015-05-14T13:34:47Z                                           |
| flavor                               | m1.nano (42)                                                   |
| hostId                               |                                                                |
| id                                   | 4da9dd8e-4583-4955-94b1-0f9eaf77663c                           |
| image                                | cirros-0.3.4-x86_64-uec (4d9443ee-1497-4bb3-b917-2e35b0e59eab) |
| key_name                             | demo                                                           |
| metadata                             | {}                                                             |
| name                                 | test2                                                          |
| os-extended-volumes:volumes_attached | []                                                             |
| progress                             | 0                                                              |
| security_groups                      | default                                                        |
| status                               | BUILD                                                          |
| tenant_id                            | 92fbf8554b2246c5bb9b0db0be55529c                               |
| updated                              | 2015-05-14T13:34:48Z                                           |
| user_id                              | 207b4e55a2684f20a2a21e14c28dffed                               |
+--------------------------------------+----------------------------------------------------------------+

We can use admin credentials to see which hypervisor each VM ended up on. This is just to show that we now have an environment with two VMs on the private Neutron virtual network that spans two hypervisors.

(ovn-devstack-1)$ . openrc admin
(ovn-devstack-1)$ nova show test1 | grep hypervisor_hostname
| OS-EXT-SRV-ATTR:hypervisor_hostname  | ovn-devstack-1.os1.phx2.redhat.com

(ovn-devstack-1)$ nova show test2 | grep hypervisor_hostname
| OS-EXT-SRV-ATTR:hypervisor_hostname  | ovn-devstack-2.os1.phx2.redhat.com

When we first issued the boot requests, the status of each VM was BUILD. Once a VM is running on its hypervisor, its status switches to ACTIVE.

(ovn-devstack-1)$ . openrc demo
(ovn-devstack-1)$ nova list --fields name,status,networks
+--------------------------------------+-------+--------+--------------------------------------------------------+
| ID                                   | Name  | Status | Networks                                               |
+--------------------------------------+-------+--------+--------------------------------------------------------+
| d91cf422-fe2e-4131-bc49-2f310daa5cf0 | test1 | ACTIVE | private=fdc1:1919:4bd6:0:f816:3eff:fe24:463a, 10.0.0.3 |
| 4da9dd8e-4583-4955-94b1-0f9eaf77663c | test2 | ACTIVE | private=fdc1:1919:4bd6:0:f816:3eff:fe50:191, 10.0.0.4  |
+--------------------------------------+-------+--------+--------------------------------------------------------+

Testing and Inspecting the Network

Our two new VMs have resulted in two more Neutron ports being created.  This is shown in Horizon’s visual representation of the network topology:

[Screenshot: network topology with the two new VMs]

We can also get all of the details from the Neutron API:

(ovn-devstack-1)$ . openrc admin
(ovn-devstack-1)$ neutron port-list
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                                                   |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+
| 10964198-b218-417e-a59e-6a6d7096c936 |      | fa:16:3e:50:01:91 | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.4"}                             |
|                                      |      |                   | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6:0:f816:3eff:fe50:191"}  |
| 381a2d96-bc4a-4785-82bc-4f2b48e007e8 |      | fa:16:3e:76:12:96 | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.2"}                             |
|                                      |      |                   | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6:0:f816:3eff:fe76:1296"} |
| a5b967d4-296e-44dc-98b9-7336d0224e57 |      | fa:16:3e:8c:d0:a8 | {"subnet_id": "ebfe46b4-e0ab-4cda-b2ee-5bb1761b5995", "ip_address": "172.24.4.2"}                           |
|                                      |      |                   | {"subnet_id": "b435f473-bed1-41bf-9110-797424364016", "ip_address": "2001:db8::1"}                          |
| a7a8ee94-996e-4623-941c-1ef7b7862f6e |      | fa:16:3e:24:46:3a | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.3"}                             |
|                                      |      |                   | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6:0:f816:3eff:fe24:463a"} |
| b2e0ae9e-d472-42ed-8776-6b338349d01d |      | fa:16:3e:b7:cd:77 | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6::1"}                    |
| f24756f3-d803-47d3-9fc7-1315f4071ac0 |      | fa:16:3e:b1:34:ed | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.1"}                             |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+

The Ping Test

Now let’s verify that the network works as we expect.  In this environment we can connect to the private network from ovn-devstack-1.  We can start with a quick check: ping both VMs from the host, and then ping from one VM to the other.

(ovn-devstack-1)$ ping -c 1 10.0.0.3
PING 10.0.0.3 (10.0.0.3) 56(84) bytes of data.
64 bytes from 10.0.0.3: icmp_seq=1 ttl=63 time=2.90 ms

(ovn-devstack-1)$ ping -c 1 10.0.0.4
PING 10.0.0.4 (10.0.0.4) 56(84) bytes of data.
64 bytes from 10.0.0.4: icmp_seq=1 ttl=63 time=3.87 ms

(ovn-devstack-1)$ ssh -i id_rsa_demo cirros@10.0.0.3
(test1)$ ping -c 1 10.0.0.4
PING 10.0.0.4 (10.0.0.4): 56 data bytes
64 bytes from 10.0.0.4: seq=0 ttl=64 time=2.945 ms

It works!

OVN Northbound Database

Now let’s take a closer look at what OVN has done to make this work. We looked at the OVN_Northbound database earlier. It now includes the two additional ports for the VMs in its configuration for the private virtual network.

(ovn-devstack-1)$ ovn-nbctl show
    lswitch f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 (neutron-7e78ba86-2114-47ac-8194-201936e3820a)
        lport a5b967d4-296e-44dc-98b9-7336d0224e57
            macs: fa:16:3e:8c:d0:a8
    lswitch bd7dbbf9-1325-491f-b46b-80b4ecfc560b (neutron-cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e)
        lport f24756f3-d803-47d3-9fc7-1315f4071ac0
            macs: fa:16:3e:b1:34:ed
        lport 10964198-b218-417e-a59e-6a6d7096c936
            macs: fa:16:3e:50:01:91
        lport 381a2d96-bc4a-4785-82bc-4f2b48e007e8
            macs: fa:16:3e:76:12:96
        lport b2e0ae9e-d472-42ed-8776-6b338349d01d
            macs: fa:16:3e:b7:cd:77
        lport a7a8ee94-996e-4623-941c-1ef7b7862f6e
            macs: fa:16:3e:24:46:3a

When we requested a new VM from Nova, Nova asked Neutron to create a new port on the network we specified. As the port was created, the Neutron OVN driver added this entry to the OVN_Northbound database. The northbound database is the desired state of the system. As it gets changed, the rest of OVN gets to work to implement the change.
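
If you would rather see the raw records than the ovn-nbctl summary, you can also dump the northbound database directly, just as we will do for the southbound database below.  The same lswitch and lport entries show up as rows.

(ovn-devstack-1)$ ovsdb-client dump OVN_Northbound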

OVN Chassis

OVN has a second database, OVN_Southbound, that is used internally to track the current state of the system. The Chassis table of OVN_Southbound is used to keep track of the different hypervisors running ovn-controller and how to connect to them. When ovn-controller starts, it registers itself in this table.

(ovn-devstack-1)$ ovsdb-client dump OVN_Southbound
...
Chassis table
_uuid                                encaps                                 gateway_ports name                                  
------------------------------------ -------------------------------------- ------------- --------------------------------------
4979df20-56a8-4c74-a499-d2409acb05cc [a59cbc44-b998-4a13-98b4-bc02c79d4d1e] {}            "2a33c976-54ec-4f62-878e-863eea3edcf5"
f58f3955-3dc4-4f79-8d4d-e0250a01a850 [d1e19aec-0e8a-4338-b1d7-eb83dfe197e8] {}            "b29ae352-588f-45bc-aefe-ba15bf2f889b"

Encap table
_uuid                                ip              options type  
------------------------------------ --------------- ------- ------
a59cbc44-b998-4a13-98b4-bc02c79d4d1e "172.16.189.10" {}      geneve
d1e19aec-0e8a-4338-b1d7-eb83dfe197e8 "172.16.189.6"  {}      geneve
...

OVN Bindings

As logical ports get added to OVN_Northbound, the ovn-northd service creates entries in the Binding table of OVN_Southbound. This table is used to keep track of which physical chassis a logical port resides on. At first, the chassis column is empty. Once ovn-controller sees a port plugged into the local br-int with an iface-id that matches a logical port, ovn-controller will update the chassis column of that logical port’s Binding row to reflect that the port resides on that chassis.

(ovn-devstack-1)$ ovsdb-client dump OVN_Southbound
...
Binding table
_uuid                                chassis                                logical_datapath                     logical_port                           mac                   parent_port tag tunnel_key
------------------------------------ -------------------------------------- ------------------------------------ -------------------------------------- --------------------- ----------- --- ----------
977a249e-3ec4-4d7c-a7bb-e751415ee4b1 ""                                     f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "a5b967d4-296e-44dc-98b9-7336d0224e57" ["fa:16:3e:8c:d0:a8"] []          []  3         
2e213a46-e52c-4e46-ac48-2de9bc5c56a4 "2a33c976-54ec-4f62-878e-863eea3edcf5" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "10964198-b218-417e-a59e-6a6d7096c936" ["fa:16:3e:50:01:91"] []          []  6         
4cd9430e-0735-4e63-b5b2-25a666649f4e "b29ae352-588f-45bc-aefe-ba15bf2f889b" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "381a2d96-bc4a-4785-82bc-4f2b48e007e8" ["fa:16:3e:76:12:96"] []          []  1         
23fc5778-5a00-4e3f-b1e7-4c37c999a378 "b29ae352-588f-45bc-aefe-ba15bf2f889b" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "a7a8ee94-996e-4623-941c-1ef7b7862f6e" ["fa:16:3e:24:46:3a"] []          []  5         
744a141e-8f02-4783-8cbe-af13992f74f7 "b29ae352-588f-45bc-aefe-ba15bf2f889b" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "b2e0ae9e-d472-42ed-8776-6b338349d01d" ["fa:16:3e:b7:cd:77"] []          []  4         
225cc721-7c59-4466-872b-02f5e61efe56 "b29ae352-588f-45bc-aefe-ba15bf2f889b" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "f24756f3-d803-47d3-9fc7-1315f4071ac0" ["fa:16:3e:b1:34:ed"] []          []  2         
...
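
The iface-id that ovn-controller looks for is just an external_ids entry on the OVS interface, which Nova sets to the Neutron port UUID when it plugs the VM’s vif.  You can confirm this on ovn-devstack-2, where test2’s tap device lives; the command below should print the Neutron port ID (10964198-b218-417e-a59e-6a6d7096c936).

(ovn-devstack-2)$ ovs-vsctl get Interface tap10964198-b2 external_ids:iface-id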

OVN Pipeline

Another function of the ovn-northd service is defining the contents of the Pipeline table in the OVN_Southbound database. Each row in the Pipeline table represents a logical flow. ovn-controller on each chassis is responsible for converting the logical flows into OpenFlow flows appropriate for that node. We will go through annotated Pipeline contents for the current configuration. The output has been reordered to make it easier to follow. It’s sorted by datapath (the logical switch the flows are associated with), then table_id, then priority.

The Pipeline table has a similar format to OpenFlow. For each logical datapath (logical switch), processing starts at the highest priority match in table 0. A complete description of the syntax for the Pipeline table can be found in the ovn-sb document.

(ovn-devstack-1)$ ovsdb-client dump OVN_Southbound
...
Pipeline table
_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
...

Table 0 starts by dropping anything with an invalid source MAC address. It also drops anything with a VLAN tag, because there is no concept of logical VLANs.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
9bbe8795-d093-4c9b-a712-1c7b8e953ae7 "drop;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.src[40]"                                                                           100      0       
71af3be1-b0b7-4e5a-aadb-c510907bfabd "drop;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b vlan.present                                                                            100      0       

The next 5 rows correspond to the five logical ports on this logical network. If the packet came in from one of the logical ports and its source MAC address is one that is allowed, processing will continue in table 1.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
914eab89-b327-4a7d-ad88-08d2e4be104c "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "inport == \"10964198-b218-417e-a59e-6a6d7096c936\" && eth.src == {fa:16:3e:50:01:91}"  50       0       
2f80e9b3-a2db-4160-9d94-598960160cfb "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "inport == \"381a2d96-bc4a-4785-82bc-4f2b48e007e8\" && eth.src == {fa:16:3e:76:12:96}"  50       0       
a761a821-9cc3-49d1-a00b-b2eabd242480 "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "inport == \"a7a8ee94-996e-4623-941c-1ef7b7862f6e\" && eth.src == {fa:16:3e:24:46:3a}"  50       0       
6b0675e9-9c09-48c3-8cbd-c0da0fd9f608 "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "inport == \"b2e0ae9e-d472-42ed-8776-6b338349d01d\" && eth.src == {fa:16:3e:b7:cd:77}"  50       0       
b6fb9eb2-3047-4231-9131-88f34d56ff77 "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "inport == \"f24756f3-d803-47d3-9fc7-1315f4071ac0\" && eth.src == {fa:16:3e:b1:34:ed}"  50       0       

Finally, if the packet did not match any higher priority flows, it just gets dropped.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
14e02175-d6ab-4a7c-bf30-7774ecf8074c "drop;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "1"                                                                                     0        0       

The highest priority flow in table 1 matches packets with a broadcast destination MAC address. In that case, processing continues in table 2 several times (once for each logical port on this network) with the outport variable set.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
d4adaf36-f63f-4c04-b37b-63399b5b9459 "outport = \"10964198-b218-417e-a59e-6a6d7096c936\"; next; outport = \"381a2d96-bc4a-4785-82bc-4f2b48e007e8\"; next; outport = \"f24756f3-d803-47d3-9fc7-1315f4071ac0\"; next; outport = \"b2e0ae9e-d472-42ed-8776-6b338349d01d\"; next; outport = \"a7a8ee94-996e-4623-941c-1ef7b7862f6e\"; next;" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst[40]"                                                                           100      1       

The next 5 flows match when the destination MAC address is a MAC address assigned to one of the logical ports. In that case, the outport variable gets set and processing continues in table 2.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
22e05122-a917-4fd5-804f-b8dc6e48d334 "outport = \"10964198-b218-417e-a59e-6a6d7096c936\"; next;"                                                                                                                                                                                                                                         bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst == fa:16:3e:50:01:91"                                                          50       1       
b444b9f2-4924-4d4c-bc92-c44b86fb7fc2 "outport = \"381a2d96-bc4a-4785-82bc-4f2b48e007e8\"; next;"                                                                                                                                                                                                                                         bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst == fa:16:3e:76:12:96"                                                          50       1       
a2fe2a1a-6b44-4255-bda0-0b665c2bfafc "outport = \"a7a8ee94-996e-4623-941c-1ef7b7862f6e\"; next;"                                                                                                                                                                                                                                         bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst == fa:16:3e:24:46:3a"                                                          50       1       
834f9f4b-999b-4445-b7ae-3e912c97cbe7 "outport = \"b2e0ae9e-d472-42ed-8776-6b338349d01d\"; next;"                                                                                                                                                                                                                                         bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst == fa:16:3e:b7:cd:77"                                                          50       1       
dcd4f6c2-627a-4441-927a-feedcf2295cb "outport = \"f24756f3-d803-47d3-9fc7-1315f4071ac0\"; next;"                                                                                                                                                                                                                                         bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst == fa:16:3e:b1:34:ed"                                                          50       1       

Table 2 does nothing important in this environment. It will eventually be used to implement ACLs. In the context of Neutron, security groups will get translated into OVN ACLs and those ACLs will be reflected by flow entries in this table.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
325b6008-7cfd-480f-8b15-d9d42dfff567 "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "1"                                                                                     0        2       

Table 3 is the final table. The first flow matches a broadcast destination MAC address. The action is output;, which means to output the packet to the logical port identified by the outport variable.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
70c9b568-0e8d-4592-904d-4f5b0c7ca606 "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst[40]"                                                                           100      3       

The following 5 flows are associated with the 5 logical ports on this network. They will match if the outport variable matches a logical port and the destination MAC address is in the set of allowed MAC addresses.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
1d6f7f7f-f3ac-409c-9dda-f55b0fd4c6da "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "outport == \"10964198-b218-417e-a59e-6a6d7096c936\" && eth.dst == {fa:16:3e:50:01:91}" 50       3       
730a39c1-93ea-41a1-81e2-a8f08f011981 "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "outport == \"381a2d96-bc4a-4785-82bc-4f2b48e007e8\" && eth.dst == {fa:16:3e:76:12:96}" 50       3       
9b83c397-1d24-4446-a201-c56eec8cb9ba "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "outport == \"a7a8ee94-996e-4623-941c-1ef7b7862f6e\" && eth.dst == {fa:16:3e:24:46:3a}" 50       3       
f37cc0d7-e69a-4f45-b8ef-595c25b5c62b "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "outport == \"b2e0ae9e-d472-42ed-8776-6b338349d01d\" && eth.dst == {fa:16:3e:b7:cd:77}" 50       3       
5ec90c56-fe85-4199-ad30-f0f32ee2b8da "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "outport == \"f24756f3-d803-47d3-9fc7-1315f4071ac0\" && eth.dst == {fa:16:3e:b1:34:ed}" 50       3       

All of the flows above are associated with the private network.  The flows below follow the same pattern, but are for the public network.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
e9e8854e-cb44-437d-84b1-21fec5ce2929 "drop;"                                                                                                                                                                                                                                                                                             f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "eth.src[40]"                                                                           100      0       
859cf482-2119-43f6-8c41-f91bae89759a "drop;"                                                                                                                                                                                                                                                                                             f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 vlan.present                                                                            100      0       
47cdc942-0274-40d0-985c-5f218801adc7 "next;"                                                                                                                                                                                                                                                                                             f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "inport == \"a5b967d4-296e-44dc-98b9-7336d0224e57\" && eth.src == {fa:16:3e:8c:d0:a8}"  50       0       
5b80edbe-a556-4ae4-abc2-3708706c0c2a "drop;"                                                                                                                                                                                                                                                                                             f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "1"                                                                                     0        0       
bf9659cb-2632-4c37-ac7a-b960b1f168ac "outport = \"a5b967d4-296e-44dc-98b9-7336d0224e57\"; next;"                                                                                                                                                                                                                                         f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "eth.dst[40]"                                                                           100      1       
06684b55-3623-474d-8053-48febf7716f6 "outport = \"a5b967d4-296e-44dc-98b9-7336d0224e57\"; next;"                                                                                                                                                                                                                                         f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "eth.dst == fa:16:3e:8c:d0:a8"                                                          50       1       
dc42b637-e9f3-49ed-b158-cd71026ee021 "next;"                                                                                                                                                                                                                                                                                             f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "1"                                                                                     0        2       
1ba4eb72-9b66-4250-991b-6431b3360fce "output;"                                                                                                                                                                                                                                                                                           f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "eth.dst[40]"                                                                           100      3       
37827ec2-c7c6-4e20-b2b7-77f16db4d3d3 "output;"                                                                                                                                                                                                                                                                                           f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "outport == \"a5b967d4-296e-44dc-98b9-7336d0224e57\" && eth.dst == {fa:16:3e:8c:d0:a8}" 50       3       

The Integration Bridge

Part of the configuration for ovn-controller is which bridge to use as its integration bridge.  By default, this is br-int.  Let’s start by looking at the configuration of br-int on ovn-devstack-2, as it is a bit simpler than ovn-devstack-1.

(ovn-devstack-2)$ ovs-vsctl show
a70d8333-9b36-4765-8eb2-a91a3d5833f8
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "tap10964198-b2"
            Interface "tap10964198-b2"
        Port "ovn-b29ae3-0"
            Interface "ovn-b29ae3-0"
                type: geneve
                options: {key=flow, remote_ip="172.16.189.6"}

The port tap10964198-b2 is associated with the VM running on this compute node (test2, 10.0.0.4).  The other port, ovn-b29ae3-0, is used to send packets over a Geneve tunnel to ovn-devstack-1.
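
If you want more detail on the tunnel than ovs-vsctl show displays, you can list the full interface record, which includes the encapsulation options along with traffic statistics.

(ovn-devstack-2)$ ovs-vsctl list Interface ovn-b29ae3-0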

Now we can look at the configuration of br-int on the other host, ovn-devstack-1. The setup is very similar, except it has some additional ports that are associated with the default Neutron setup done by DevStack.

(ovn-devstack-1)$ ovs-vsctl show
197d2a0d-c85d-4113-94bb-bd836ef03970
    Bridge br-int
        fail_mode: secure
        Port "qr-f24756f3-d8"
            Interface "qr-f24756f3-d8"
                type: internal
        Port "tapa7a8ee94-99"
            Interface "tapa7a8ee94-99"
        Port "ovn-2a33c9-0"
            Interface "ovn-2a33c9-0"
                type: geneve
                options: {key=flow, remote_ip="172.16.189.10"}
        Port "qr-b2e0ae9e-d4"
            Interface "qr-b2e0ae9e-d4"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port "tap381a2d96-bc"
            Interface "tap381a2d96-bc"
                type: internal
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
        Port "qg-a5b967d4-29"
            Interface "qg-a5b967d4-29"
                type: internal

OpenFlow

ovn-controller on each compute node converts the logical pipeline into OpenFlow flows. The processing maps conceptually to what we went through for the Pipeline table. Here are the flows for br-int on ovn-devstack-1.

(ovn-devstack-1)$ sudo ovs-ofctl -O OpenFlow13 dump-flows br-int
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=15264.413s, table=0, n_packets=28, n_bytes=3302, priority=100,in_port=1 actions=set_field:0x1->metadata,set_field:0x1->reg6,resubmit(,16)
 cookie=0x0, duration=15264.413s, table=0, n_packets=1797, n_bytes=294931, priority=100,in_port=2 actions=set_field:0x1->metadata,set_field:0x2->reg6,resubmit(,16)
 cookie=0x0, duration=15264.413s, table=0, n_packets=12857, n_bytes=1414286, priority=100,in_port=3 actions=set_field:0x1->metadata,set_field:0x4->reg6,resubmit(,16)
 cookie=0x0, duration=15264.413s, table=0, n_packets=1239, n_bytes=143548, priority=100,in_port=5 actions=set_field:0x1->metadata,set_field:0x5->reg6,resubmit(,16)
 cookie=0x0, duration=15264.413s, table=0, n_packets=20, n_bytes=1940, priority=50,tun_id=0x1 actions=output:1
 cookie=0x0, duration=15264.413s, table=0, n_packets=237, n_bytes=23848, priority=50,tun_id=0x2 actions=output:2
 cookie=0x0, duration=15264.413s, table=0, n_packets=14, n_bytes=1430, priority=50,tun_id=0x4 actions=output:3
 cookie=0x0, duration=15264.413s, table=0, n_packets=75, n_bytes=8516, priority=50,tun_id=0x5 actions=output:5
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x1,vlan_tci=0x1000/0x1000 actions=drop
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x2,vlan_tci=0x1000/0x1000 actions=drop
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x1,dl_src=01:00:00:00:00:00/01:00:00:00:00:00 actions=drop
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_src=01:00:00:00:00:00/01:00:00:00:00:00 actions=drop
 cookie=0x0, duration=15264.413s, table=16, n_packets=28, n_bytes=3302, priority=50,reg6=0x1,metadata=0x1,dl_src=fa:16:3e:76:12:96 actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=1797, n_bytes=294931, priority=50,reg6=0x2,metadata=0x1,dl_src=fa:16:3e:b1:34:ed actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x3,metadata=0x2,dl_src=fa:16:3e:8c:d0:a8 actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=12857, n_bytes=1414286, priority=50,reg6=0x4,metadata=0x1,dl_src=fa:16:3e:b7:cd:77 actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=1239, n_bytes=143548, priority=50,reg6=0x5,metadata=0x1,dl_src=fa:16:3e:24:46:3a actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x6,metadata=0x1,dl_src=fa:16:3e:50:01:91 actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=0,metadata=0x1 actions=drop
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=0,metadata=0x2 actions=drop
 cookie=0x0, duration=15264.413s, table=17, n_packets=12978, n_bytes=1420946, priority=100,metadata=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=set_field:0x6->reg7,resubmit(,18),set_field:0x1->reg7,resubmit(,18),set_field:0x2->reg7,resubmit(,18),set_field:0x4->reg7,resubmit(,18),set_field:0x5->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=set_field:0x3->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=7, n_bytes=552, priority=50,metadata=0x1,dl_dst=fa:16:3e:76:12:96 actions=set_field:0x1->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=1064, n_bytes=129938, priority=50,metadata=0x1,dl_dst=fa:16:3e:b1:34:ed actions=set_field:0x2->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=0, n_bytes=0, priority=50,metadata=0x2,dl_dst=fa:16:3e:8c:d0:a8 actions=set_field:0x3->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=0, n_bytes=0, priority=50,metadata=0x1,dl_dst=fa:16:3e:b7:cd:77 actions=set_field:0x4->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=1492, n_bytes=154092, priority=50,metadata=0x1,dl_dst=fa:16:3e:24:46:3a actions=set_field:0x5->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=380, n_bytes=150539, priority=50,metadata=0x1,dl_dst=fa:16:3e:50:01:91 actions=set_field:0x6->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=18, n_packets=37895, n_bytes=4255421, priority=0,metadata=0x1 actions=resubmit(,19)
 cookie=0x0, duration=15264.413s, table=18, n_packets=0, n_bytes=0, priority=0,metadata=0x2 actions=resubmit(,19)
 cookie=0x0, duration=15264.413s, table=19, n_packets=34952, n_bytes=3820300, priority=100,metadata=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=7, n_bytes=552, priority=50,reg7=0x1,metadata=0x1,dl_dst=fa:16:3e:76:12:96 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=1064, n_bytes=129938, priority=50,reg7=0x2,metadata=0x1,dl_dst=fa:16:3e:b1:34:ed actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=0, n_bytes=0, priority=50,reg7=0x3,metadata=0x2,dl_dst=fa:16:3e:8c:d0:a8 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=0, n_bytes=0, priority=50,reg7=0x4,metadata=0x1,dl_dst=fa:16:3e:b7:cd:77 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=1492, n_bytes=154092, priority=50,reg7=0x5,metadata=0x1,dl_dst=fa:16:3e:24:46:3a actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=380, n_bytes=150539, priority=50,reg7=0x6,metadata=0x1,dl_dst=fa:16:3e:50:01:91 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=64, n_packets=9, n_bytes=726, priority=100,reg6=0x1,reg7=0x1 actions=drop
 cookie=0x0, duration=15264.413s, table=64, n_packets=6, n_bytes=252, priority=100,reg6=0x2,reg7=0x2 actions=drop
 cookie=0x0, duration=15264.413s, table=64, n_packets=3238, n_bytes=356180, priority=100,reg6=0x4,reg7=0x4 actions=drop
 cookie=0x0, duration=15264.413s, table=64, n_packets=96, n_bytes=4818, priority=100,reg6=0x5,reg7=0x5 actions=drop
 cookie=0x0, duration=15264.413s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x6,reg7=0x6 actions=drop
 cookie=0x0, duration=15264.413s, table=64, n_packets=12976, n_bytes=1420772, priority=50,reg7=0x1 actions=output:1
 cookie=0x0, duration=15264.413s, table=64, n_packets=14018, n_bytes=1549120, priority=50,reg7=0x2 actions=output:2
 cookie=0x0, duration=15264.413s, table=64, n_packets=96, n_bytes=4818, priority=50,reg7=0x4 actions=output:3
 cookie=0x0, duration=15264.413s, table=64, n_packets=4722, n_bytes=509392, priority=50,reg7=0x5 actions=output:5
 cookie=0x0, duration=15264.413s, table=64, n_packets=2734, n_bytes=409343, priority=50,reg7=0x6 actions=set_field:0x6->tun_id,output:4
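
Comparing these flows with the Pipeline table shown earlier, the logical tables are offset by 16: logical table 0 shows up as OpenFlow table 16 and so on, while OpenFlow table 0 maps physical ports and tunnels into the logical pipeline and table 64 handles the final output.  If you want to focus on a single logical table, ovs-ofctl can dump just that table; for example, something like this shows only the flows for logical table 0:

(ovn-devstack-1)$ sudo ovs-ofctl -O OpenFlow13 dump-flows br-int table=16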

And here are the flows for br-int on ovn-devstack-2.

(ovn-devstack-2)$ sudo ovs-ofctl -O OpenFlow13 dump-flows br-int
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=20967.205s, table=0, n_packets=304, n_bytes=31444, priority=100,in_port=2 actions=set_field:0x1->metadata,set_field:0x6->reg6,resubmit(,16)
 cookie=0x0, duration=20967.205s, table=0, n_packets=2674, n_bytes=292967, priority=50,tun_id=0x6 actions=output:2
 cookie=0x0, duration=83073.583s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_src=01:00:00:00:00:00/01:00:00:00:00:00 actions=drop
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x1,dl_src=01:00:00:00:00:00/01:00:00:00:00:00 actions=drop
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x2,vlan_tci=0x1000/0x1000 actions=drop
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x1,vlan_tci=0x1000/0x1000 actions=drop
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x4,metadata=0x1,dl_src=fa:16:3e:b7:cd:77 actions=resubmit(,17)
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x1,metadata=0x1,dl_src=fa:16:3e:76:12:96 actions=resubmit(,17)
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x3,metadata=0x2,dl_src=fa:16:3e:8c:d0:a8 actions=resubmit(,17)
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x2,metadata=0x1,dl_src=fa:16:3e:b1:34:ed actions=resubmit(,17)
 cookie=0x0, duration=21021.391s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x5,metadata=0x1,dl_src=fa:16:3e:24:46:3a actions=resubmit(,17)
 cookie=0x0, duration=20968.863s, table=16, n_packets=304, n_bytes=31444, priority=50,reg6=0x6,metadata=0x1,dl_src=fa:16:3e:50:01:91 actions=resubmit(,17)
 cookie=0x0, duration=83073.583s, table=16, n_packets=0, n_bytes=0, priority=0,metadata=0x1 actions=drop
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=0,metadata=0x2 actions=drop
 cookie=0x0, duration=83073.583s, table=17, n_packets=14, n_bytes=1430, priority=100,metadata=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=set_field:0x1->reg7,resubmit(,18),set_field:0x2->reg7,resubmit(,18),set_field:0x4->reg7,resubmit(,18),set_field:0x5->reg7,resubmit(,18)
 cookie=0x0, duration=83073.582s, table=17, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=set_field:0x3->reg7,resubmit(,18)
 cookie=0x0, duration=83073.583s, table=17, n_packets=223, n_bytes=22418, priority=50,metadata=0x1,dl_dst=fa:16:3e:b1:34:ed actions=set_field:0x2->reg7,resubmit(,18)
 cookie=0x0, duration=83073.583s, table=17, n_packets=6, n_bytes=510, priority=50,metadata=0x1,dl_dst=fa:16:3e:76:12:96 actions=set_field:0x1->reg7,resubmit(,18)
 cookie=0x0, duration=83073.582s, table=17, n_packets=0, n_bytes=0, priority=50,metadata=0x1,dl_dst=fa:16:3e:b7:cd:77 actions=set_field:0x4->reg7,resubmit(,18)
 cookie=0x0, duration=83073.582s, table=17, n_packets=0, n_bytes=0, priority=50,metadata=0x2,dl_dst=fa:16:3e:8c:d0:a8 actions=set_field:0x3->reg7,resubmit(,18)
 cookie=0x0, duration=21021.390s, table=17, n_packets=61, n_bytes=7086, priority=50,metadata=0x1,dl_dst=fa:16:3e:24:46:3a actions=set_field:0x5->reg7,resubmit(,18)
 cookie=0x0, duration=20968.863s, table=17, n_packets=0, n_bytes=0, priority=50,metadata=0x1,dl_dst=fa:16:3e:50:01:91 actions=set_field:0x6->reg7,resubmit(,18)
 cookie=0x0, duration=83073.583s, table=18, n_packets=0, n_bytes=0, priority=0,metadata=0x2 actions=resubmit(,19)
 cookie=0x0, duration=83073.582s, table=18, n_packets=346, n_bytes=35734, priority=0,metadata=0x1 actions=resubmit(,19)
 cookie=0x0, duration=83073.583s, table=19, n_packets=56, n_bytes=5720, priority=100,metadata=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
 cookie=0x0, duration=83073.582s, table=19, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
 cookie=0x0, duration=83073.583s, table=19, n_packets=0, n_bytes=0, priority=50,reg7=0x4,metadata=0x1,dl_dst=fa:16:3e:b7:cd:77 actions=resubmit(,64)
 cookie=0x0, duration=83073.582s, table=19, n_packets=6, n_bytes=510, priority=50,reg7=0x1,metadata=0x1,dl_dst=fa:16:3e:76:12:96 actions=resubmit(,64)
 cookie=0x0, duration=83073.582s, table=19, n_packets=223, n_bytes=22418, priority=50,reg7=0x2,metadata=0x1,dl_dst=fa:16:3e:b1:34:ed actions=resubmit(,64)
 cookie=0x0, duration=83073.582s, table=19, n_packets=0, n_bytes=0, priority=50,reg7=0x3,metadata=0x2,dl_dst=fa:16:3e:8c:d0:a8 actions=resubmit(,64)
 cookie=0x0, duration=21021.390s, table=19, n_packets=61, n_bytes=7086, priority=50,reg7=0x5,metadata=0x1,dl_dst=fa:16:3e:24:46:3a actions=resubmit(,64)
 cookie=0x0, duration=20968.863s, table=19, n_packets=0, n_bytes=0, priority=50,reg7=0x6,metadata=0x1,dl_dst=fa:16:3e:50:01:91 actions=resubmit(,64)
 cookie=0x0, duration=83073.583s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x4,reg7=0x4 actions=drop
 cookie=0x0, duration=83073.582s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x1,reg7=0x1 actions=drop
 cookie=0x0, duration=83073.582s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x2,reg7=0x2 actions=drop
 cookie=0x0, duration=20993.791s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x5,reg7=0x5 actions=drop
 cookie=0x0, duration=20967.205s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x6,reg7=0x6 actions=drop
 cookie=0x0, duration=83073.583s, table=64, n_packets=237, n_bytes=23848, priority=50,reg7=0x2 actions=set_field:0x2->tun_id,output:1
 cookie=0x0, duration=83073.583s, table=64, n_packets=20, n_bytes=1940, priority=50,reg7=0x1 actions=set_field:0x1->tun_id,output:1
 cookie=0x0, duration=83073.583s, table=64, n_packets=14, n_bytes=1430, priority=50,reg7=0x4 actions=set_field:0x4->tun_id,output:1
 cookie=0x0, duration=20993.791s, table=64, n_packets=75, n_bytes=8516, priority=50,reg7=0x5 actions=set_field:0x5->tun_id,output:1
 cookie=0x0, duration=20967.205s, table=64, n_packets=0, n_bytes=0, priority=50,reg7=0x6 actions=output:2
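
If you want to see how a particular packet would move through these tables, ovs-appctl’s ofproto/trace command walks the flow tables and prints each match and action along the way.  As a rough example (the MAC addresses are taken from the dumps above, and in_port 2 on ovn-devstack-2 is the local VM’s tap port), tracing a packet from test2 toward the port bound as reg7=0x2 would look something like this:

(ovn-devstack-2)$ sudo ovs-appctl ofproto/trace br-int in_port=2,dl_src=fa:16:3e:50:01:91,dl_dst=fa:16:3e:b1:34:ed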

Future work for OVN+OpenStack

The OpenStack integration with OVN still makes use of the L3 and DHCP agents from Neutron. We expect that functionality to be implemented in OVN instead. As it becomes available in OVN, we will expand the Neutron integration to make use of it.

Support for Neutron security groups is not yet fully implemented. The plan is to use the brand new OVS conntrack functionality, which should offer much better performance than the iptables-based approach Neutron uses to implement security groups today. This functionality is targeted at the next OVS release (2.4). My understanding is that the code has been working for a while and is just making its way through the review process for both OVS and the upstream Linux kernel.

We have some initial CI testing in place. It installs OpenStack with OVN and makes sure it all starts up and accepts the default configuration done by DevStack. We just turned on a new job that runs the tempest test suite, but we haven’t yet started working through making it pass. Once the tempest job is passing successfully, I would like to look into a test configuration that uses two nodes so that we can regularly exercise the use of tunnels between hosts running ovn-controller.

OVN and OpenStack Status – 2015-04-21

It has been a couple weeks since the last OVN status update. Here is a review of what has happened since that time.

ovn-nbd is now ovn-northd

Someone pointed out that the acronym “nbd” is used for “Network Block Device” and may exist in the same deployment as OVN.  To avoid any possible confusion, we renamed ovn-nbd to ovn-northd.

ovn-controller now exists

ovn-controller is the daemon that runs on every hypervisor or gateway.  The initial version of this daemon has been merged.  The current version of ovn-controller performs two important functions.

First, ovn-controller populates the Chassis table of the OVN_Southbound database.  Each row in the Chassis table represents a hypervisor or gateway running ovn-controller.  It contains information that identifies the chassis and what encapsulation types it supports.  If you run ovs-sandbox with OVN support enabled, it will run the following commands to configure ovn-controller:

ovs-vsctl set open . external-ids:system-id=56b18105-5706-46ef-80c4-ff20979ab068
ovs-vsctl set open . external-ids:ovn-remote=unix:"$sandbox"/db.sock
ovs-vsctl set open . external-ids:ovn-encap-type=vxlan
ovs-vsctl set open . external-ids:ovn-encap-ip=127.0.0.1
ovs-vsctl add-br br-int

After setup is complete, we can check the contents of the OVN_Southbound database and see the corresponding Chassis entry:

Chassis table
_uuid                                encaps                                 gateway_ports name                                  
------------------------------------ -------------------------------------- ------------- --------------------------------------
2852bf00-db63-4732-8b44-a3bc689ed1bc [e1c1f7fc-409d-4f74-923a-fc6de8409f82] {}            "56b18105-5706-46ef-80c4-ff20979ab068"

Encap table
_uuid                                ip          options type 
------------------------------------ ----------- ------- -----
e1c1f7fc-409d-4f74-923a-fc6de8409f82 "127.0.0.1" {}      vxlan
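
The tables above can be dumped with ovsdb-client, just as is done a bit later in this post:

$ ovsdb-client dump OVN_Southbound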

The other important task performed by the current version of ovn-controller is to monitor the local switch for ports being added that match up with logical ports created in OVN.  When a port is created on the local switch with an iface-id that matches an OVN logical port’s name, ovn-controller will update the Bindings table to specify that the port exists on this chassis.  Once this is done, ovn-northd will report to the OVN_Northbound database that the port is up.
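
As a hedged illustration of what that looks like from the hypervisor side (the interface name vif1 is made up; the iface-id is the logical port name from the dump below), plugging a VIF into the integration bridge would be something like:

$ ovs-vsctl add-port br-int vif1 -- set Interface vif1 external_ids:iface-id=d03aa502-0d76-4c1e-8877-43778088c55c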

$ ovsdb-client dump OVN_Southbound
Bindings table
_uuid                                chassis                                logical_port                           mac parent_port tag
------------------------------------ -------------------------------------- -------------------------------------- --- ----------- ---
...
2dc299fa-835b-4e42-aa82-3d2da523b4d9 "81b0f716-c957-43cf-b34e-87ae193f617a" "d03aa502-0d76-4c1e-8877-43778088c55c" []  []          [] 
...

$ ovn-nbctl lport-get-up d03aa502-0d76-4c1e-8877-43778088c55c
up

The next steps for ovn-controller are to program the local switch to create tunnels and flows as appropriate based on the contents of the OVN_Southbound database.  This is currently being worked on.

The Pipeline Table

The OVN_Southbound database has a table called Pipeline.  ovn-northd is responsible for translating the logical network elements defined in OVN_Northbound into entries in the Pipeline table of OVN_Southbound.  The first version of populating the Pipeline table has been merged. One thing that is particularly interesting here is that ovn-northd defines logical flows.  It does not have to figure out the detailed switch configuration for every chassis running ovn-controller.  ovn-controller is responsible for translating the logical flows into OpenFlow flows specific to the chassis.

The OVN_Southbound documentation has a good explanation of the contents of the Pipeline table.  If you’re familiar with OpenFlow, the format will be very familiar.

As a simple example, let’s just use ovn-nbctl to manually create a single logical switch that has 2 logical ports.

ovn-nbctl lswitch-add sw0
ovn-nbctl lport-add sw0 sw0-port1 
ovn-nbctl lport-add sw0 sw0-port2 
ovn-nbctl lport-set-macs sw0-port1 00:00:00:00:00:01
ovn-nbctl lport-set-macs sw0-port2 00:00:00:00:00:02

Now we can check out the resulting contents of the Pipeline table.  The output of ovsdb-client has been reordered to group the entries by table_id and priority. I’ve also cut off the _uuid column since it’s not important for understanding here.

Pipeline table
match                          priority table_id actions                                                                 logical_datapath
------------------------------ -------- -------- ----------------------------------------------------------------------- ------------------------------------
"eth.src[40]"                  100      0        drop                                                                    843a9a4a-8afc-41e2-bea1-5fa58874e109
vlan.present                   100      0        drop                                                                    843a9a4a-8afc-41e2-bea1-5fa58874e109
"inport == \"sw0-port1\""      50       0        resubmit                                                                843a9a4a-8afc-41e2-bea1-5fa58874e109
"inport == \"sw0-port2\""      50       0        resubmit                                                                843a9a4a-8afc-41e2-bea1-5fa58874e109
"1"                            0        0        drop                                                                    843a9a4a-8afc-41e2-bea1-5fa58874e109

"eth.dst[40]"                  100      1        "outport = \"sw0-port2\"; resubmit; outport = \"sw0-port1\"; resubmit;" 843a9a4a-8afc-41e2-bea1-5fa58874e109
"eth.dst == 00:00:00:00:00:01" 50       1        "outport = \"sw0-port1\"; resubmit;"                                    843a9a4a-8afc-41e2-bea1-5fa58874e109
"eth.dst == 00:00:00:00:00:02" 50       1        "outport = \"sw0-port2\"; resubmit;"                                    843a9a4a-8afc-41e2-bea1-5fa58874e109

"1"                            0        2        resubmit                                                                843a9a4a-8afc-41e2-bea1-5fa58874e109

"outport == \"sw0-port1\""     50       3        "output(\"sw0-port1\")"                                                 843a9a4a-8afc-41e2-bea1-5fa58874e109
"outport == \"sw0-port2\""     50       3        "output(\"sw0-port2\")"                                                 843a9a4a-8afc-41e2-bea1-5fa58874e109

In table 0, we’re dropping anything with a broadcast/multicast source MAC. We’re also dropping anything with a logical VLAN tag, as that doesn’t make sense. Next, if the packet comes from one of the ports connected to the logical switch, we will continue processing in table 1. Otherwise, we drop it.

In table 1, we will output the packet to all ports if the destination MAC is broadcast/multicast. Note that the output action to the source port is implicitly handled as a drop. Otherwise, we’ll set the outport variable based on the destination MAC address and continue processing in table 2.

Table 2 does nothing but continue to table 3. In the ovn-northd code, table 2 is where entries for ACLs go. ovn-nbctl does not currently support adding ACLs. This table is where Neutron will program security groups, but that’s not ready yet, either.

Table 3 handles sending the packet to the right output port based on the contents of the outport variable set back in table 1.

The logical_datapath column ties all of these rows together as implementing a single logical datapath, which in this case is an OVN logical switch.

There is one other item supported by ovn-northd that is not reflected in this example. The OVN_Northbound database has a port_security column for logical ports. Its contents are defined as “A set of L2 (Ethernet) or L3 (IPv4 or IPv6) addresses or L2+L3 pairs from which the logical port is allowed to send packets and to which it is allowed to receive packets.” If this were set here, table 0 would also handle ingress port security and table 3 would handle egress port security.
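
For example, assuming your ovn-nbctl build includes the port-security commands (an assumption on my part; check ovn-nbctl --help), restricting sw0-port1 to its MAC address would look something like:

$ ovn-nbctl lport-set-port-security sw0-port1 00:00:00:00:00:01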

We will look at more detailed examples in future posts as both OVN and its Neutron integration progress further.

Neutron Integration

There have also been several changes to the Neutron integration for OVN in the last couple of weeks.  Since ovn-northd and ovn-controller are becoming more functional, the devstack integration runs both of these daemons, along with ovsdb-server and ovs-vswitchd.  That means that as you create networks and ports via the Neutron API, they will be created in OVN and result in Bindings and Pipeline updates.

We now also have a devstack CI job that runs against every patch proposed to the OVN Neutron integration.  It installs and runs Neutron with OVN.  Devstack also creates some default networks.  We still have a bit more work to do in OVN before we can expand this to actually test network connectivity.

Also related to testing, Terry Wilson submitted a patch to OVS that will allow us to publish the OVS Python bindings to PyPI.  The patch has been merged and Terry will soon be publishing the code to PyPI.  This will allow us to install the library for unit test jobs.

The original Neutron ML2 driver implementation used ovn-nbctl.  It has now been converted to use the Python ovsdb library, which should be much more efficient.  neutron-server will maintain an open connection to the OVN_Northbound database for all of its operations.

I’ve also been working on the necessary changes for creating a port in Neutron that is intended to be used by a container running inside a VM.  There is a python-neutronclient change and two changes needed to networking-ovn that I’m still testing.

There are some edge cases where a resource can be created in Neutron but fail before we’ve created it in OVN.  Gal Sagie is working on some code to get them back in sync.

Gal Sagie also has a patch up for the first step toward security group support.  We have to document how we will map Neutron security groups to rules in the OVN_Northbound ACL table.

One piece of information that is communicated back up to the OVN_Northbound database by OVN is the up state of a logical port.  Terry Wilson is working on having our Neutron driver consume that so that we can emit a notification when a port that was created becomes ready for use.  This notification gets turned into a callback to Nova to tell it the VIF is ready for use so the corresponding VM can be started.

OVN and OpenStack Integration Development Update

The Open vSwitch project announced the OVN effort back in January.  After OVN was announced, I got very interested in its potential.  OVN is by no means tied to OpenStack, but the primary reason I’m interested is I see it as a promising open source backend for OpenStack Neutron.  To put it into context with existing Neutron code, it would replace the OVS agent in Neutron in the short term.  It would eventually also replace the L3 and DHCP agents once OVN gains the equivalent functionality.

Implementation has been coming along well in the last month, so I wanted to share an overview of what we have so far.  We’re aiming to have a working implementation of L2 connectivity by the OpenStack Vancouver Summit next month.

Design

The initial design documentation was merged at the end of February.  Here are the rendered versions of those docs: ovn-architecture, ovn-nb schema, ovn schema.

This initial design allows hooking up VMs or containers to OVN managed virtual networks.  There was an update to the design merged that addresses the use case of running containers inside of VMs.  It seems like most existing work just creates another layer of overlay networks for containers.  What’s interesting about this proposal is that it allows you to connect those containers directly to the OVN managed virtual networks.  In the OpenStack world, that means you could have your containers hooked up directly to virtual networks managed by Neutron.  Further, the container hosting VM and all of its containers do not have to be connected to the same network and this works without having to create an extra layer of overlay networks.

OVN Implementation

For most of my OVN development and testing, I’ve been working straight from the ovs git tree. Building it is something like:

$ git clone http://github.com/openvswitch/ovs.git
$ cd ovs

Switch to the ovn branch, as that’s where OVN development is happening for now:

$ git checkout ovn

You’ll need automake, autoconf, libtool, make, patch, and gcc or clang installed, at least. For detailed instructions on building ovs, see INSTALL.md in the ovs git tree.
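
On a Fedora host like the ones used elsewhere in these posts, installing those build dependencies might look like this (package names are my assumption; adjust for your distribution):

$ sudo yum install automake autoconf libtool make patch gcc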

$ ./boot.sh
$ ./configure
$ make

OVS includes a script called ovs-sandbox that I find very helpful for development. It sets up a dummy ovs environment that you can run the tools against, but it doesn’t actually process real traffic. You can send some fake packets through to see how they would be processed if needed. I’ve been adding OVN support to ovs-sandbox along the way.

Here’s a demonstration of ovs-sandbox with what is implemented in OVN so far.  Start by running ovs-sandbox with OVN support turned on:

$ make sandbox SANDBOXFLAGS="-o"

You’ll get output like this:

----------------------------------------------------------------------
You are running in a dummy Open vSwitch environment. You can use
ovs-vsctl, ovs-ofctl, ovs-appctl, and other tools to work with the
dummy switch.

Log files, pidfiles, and the configuration database are in the
"sandbox" subdirectory.

Exit the shell to kill the running daemons.

Now everything is running:

$ ps ax | grep ov[sn]
 ...
 ... ovsdb-server --detach --no-chdir --pidfile -vconsole:off --log-file --remote=punix:/home/rbryant/src/ovs/tutorial/sandbox/db.sock ovn.db ovnnb.db conf.db
 ... ovs-vswitchd --detach --no-chdir --pidfile -vconsole:off --log-file --enable-dummy=override -vvconn -vnetdev_dummy
 ... ovn-nbd --detach --no-chdir --pidfile -vconsole:off --log-file

Note the ovn-nbd daemon. Soon there will also be an ovn-controller daemon running. Also note that ovsdb-server is serving up 3 databases (ovn.db, ovnnb.db, and conf.db).

You can run ovn-nbctl to create resources via the OVN public interface (the OVN_Northbound database). So, for example:

$ ovn-nbctl lswitch-add sw0
$ ovn-nbctl lswitch-add sw1
$ ovn-nbctl lswitch-list
4956f6b4-a1ba-49aa-86a6-134b9cfdfdf6 (sw1)
52858b33-995f-43fa-a1cf-445f16d2ab09 (sw0)
$ ovn-nbctl lport-add sw0-port0 sw0
$ ovn-nbctl lport-add sw0-port1 sw0
$ ovn-nbctl lport-list sw0
d4d78dc5-166d-4457-8bb0-1f6ed5f1ed91 (sw0-port1)
c2114eaa-2f75-443f-b23e-6dda664a979b (sw0-port0)

One of the things that ovn-nbd does is create entries in the Bindings table of the OVN database when logical ports are added to the OVN_Northbound database. The Bindings table is used to keep track of which hypervisor a port exists on after VIFs get created and plugged into the local ovs switch. After the commands above, there should be 2 entries in the Bindings table. We can dump the OVN db and see that they are there:

$ ovsdb-client dump OVN
Bindings table
_uuid chassis logical_port mac parent_port tag
------------------------------------ ------- ------------ --- ----------- ---
997e0c14-2fba-499d-b077-26ddfc87e935 "" "sw0-port0" [] [] []
f7b61ef1-01d5-42ab-b08e-176bf6f3eb4b "" "sw0-port1" [] [] []

Note that the chassis column is empty, meaning that the port hasn’t been placed on a hypervisor yet.

We can also see that the state of the port is still down in the OVN_Northbound database since it hasn’t been created on a hypervisor yet.

$ ovn-nbctl lport-get-up sw0-port0
down

One of the tasks of ovn-controller running on each hypervisor is to monitor the local switch and detect when a new port on the local switch corresponds with an OVN logical port. When that occurs, ovn-controller will update the chassis column. For now, we can simulate that with a manual ovsdb transaction:

$ ovsdb-client transact '["OVN",{"op":"update","table":"Bindings","where":[["_uuid","==",["uuid","997e0c14-2fba-499d-b077-26ddfc87e935"]]],"row":{"chassis":"hostname"}}]'
[{"count":1}]
$ ovsdb-client dump OVN
Bindings table
_uuid chassis logical_port mac parent_port tag
------------------------------------ -------- ------------ --- ----------- ---
f7b61ef1-01d5-42ab-b08e-176bf6f3eb4b "" "sw0-port1" [] [] []
997e0c14-2fba-499d-b077-26ddfc87e935 hostname "sw0-port0" [] [] []

Now that the chassis column has been populated, ovn-nbd should notice and set the port state to up in the OVN_Northbound db.

$ ovn-nbctl lport-get-up sw0-port0
up

OpenStack Integration

Like with most OpenStack projects, you can try out the Neutron support for OVN using devstack.  Instructions for using the OVN devstack plugin are in the networking-ovn git repo.

You start by cloning both devstack and networking-ovn.

$ git clone http://git.openstack.org/openstack-dev/devstack.git
$ git clone http://git.openstack.org/stackforge/networking-ovn.git

If you don’t have any devstack configuration, you can use a sample local.conf from the networking-ovn repo:

$ cd devstack
$ cp ../networking-ovn/devstack/local.conf.sample local.conf

If you’re new to using devstack, it is best to use a throwaway VM for this.  You will also need to run devstack as a user with sudo access.  Once your configuration that enables OVN support is in place, run devstack:

$ ./stack.sh

In my case, I’m running this on Fedora 21.  It has also been tested on Ubuntu. Once devstack finishes running successfully, you should get output that looks like this:

This is your host ip: 192.168.122.31
Keystone is serving at http://192.168.122.31:5000/
The default users are: admin and demo
The password: password
2015-04-08 14:31:10.242 | stack.sh completed in 165 seconds.

One bit of environment initialization that devstack does is create some initial Neutron networks.  You can see them using the neutron command, which talks to the Neutron REST API.

$ . openrc
$ neutron net-list
+--------------------------------------+---------+--------------------------------------------------+
| id | name | subnets |
+--------------------------------------+---------+--------------------------------------------------+
| a28b651e-5cb9-481b-9f9b-d5d57e55c6d0 | public | df0aee67-166c-4ad4-890c-bbf5d02ca3cf |
| 2637f01e-f41e-4d1b-865f-195253027031 | private | eac6621f-e8cc-4c94-84bf-e73dab610018 10.0.0.0/24 |
+--------------------------------------+---------+--------------------------------------------------+

Since OVN is the configured backend, we can use the ovn-nbctl utility to verify that these networks were created in OVN.

$ ovn-nbctl lswitch-list
480235d0-d1a5-43a9-821b-d32e109445fd (neutron-2637f01e-f41e-4d1b-865f-195253027031)
a60a2c16-cea7-4bdc-8082-b47745d016b3 (neutron-a28b651e-5cb9-481b-9f9b-d5d57e55c6d0)
$ ovn-nbctl lswitch-get-external-id 480235d0-d1a5-43a9-821b-d32e109445fd
neutron:network_name=private
$ ovn-nbctl lswitch-get-external-id a60a2c16-cea7-4bdc-8082-b47745d016b3
neutron:network_name=public

We can also create ports using the Neutron API and verify that they get created in OVN. To do that, we first create a port in Neutron:

$ neutron port-create private
Created a new port:
+-----------------------+---------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+---------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | |
| binding:vnic_type | normal |
| device_id | |
| device_owner | |
| fixed_ips | {"subnet_id": "eac6621f-e8cc-4c94-84bf-e73dab610018", "ip_address": "10.0.0.3"} |
| id | ff07588c-4b11-4ec8-b7c5-1be64fc0ebac |
| mac_address | fa:16:3e:23:bd:f6 |
| name | |
| network_id | 2637f01e-f41e-4d1b-865f-195253027031 |
| security_groups | ab539a1c-c3d8-49f7-9ad1-3a8b451bce91 |
| status | DOWN |
| tenant_id | 64f29642350d4c978cf03a4917a35999 |
+-----------------------+---------------------------------------------------------------------------------+

Then we can list the logical ports in OVN for the logical switch associated with the Neutron network named private.  The output is the OVN UUID for the port followed by the port name in parentheses.  Neutron sets the port name equal to the UUID of the Neutron port.

$ ovn-nbctl lswitch-get-external-id 480235d0-d1a5-43a9-821b-d32e109445fd
neutron:network_name=private
$ ovn-nbctl lport-list 480235d0-d1a5-43a9-821b-d32e109445fd
...
fe959cfa-fd20-4129-9669-67af1fa6bbf7 (ff07588c-4b11-4ec8-b7c5-1be64fc0ebac)

We can also see that the port is down since it has not yet been plugged in to the local ovs switch on a hypervisor:

$ ovn-nbctl lport-get-up fe959cfa-fd20-4129-9669-67af1fa6bbf7
down

Ongoing Work

All OVN development discussion, patch submission, and patch review happens on the ovs-dev mailing list.  Development is currently happening in the ovn branch until things are further along.  Discussion about the OpenStack integration happens on the openstack-dev mailing list, while patch submission and review happens in OpenStack’s gerrit.

As mentioned earlier, the ovn-controller daemon is not yet running in this development environment.  That will change shortly as Justin Pettit posted it for review earlier this week.

As you might have noticed, there’s a lot of infrastructure in place, but the actual flows and tunnels necessary to implement these virtual networks are not yet in place.  There’s been a lot of work in preparation for that, though.  Ben Pfaff has had a patch series up for review for expression matching needed for OVN.  It probably should have been merged by now, but the reviews have been a little slow.  (That’s my guilt talking.)  Ben has also started working on making ovn-nbd populate the Pipeline table of the OVN database.

Finally, the proposed OVN design introduces some new demands on ovsdb-server.  In particular, there will easily be hundreds of instances of ovn-controller connected to ovsdb-server.  Andy Zhou has been doing some very nice work around increasing performance in anticipation of these new demands.

Implementation of Pacemaker Managed OpenStack VM Recovery

I’ve discussed the use of Pacemaker as a method to detect compute node failures and recover the VMs that were running there.  The implementation of this is ready for testing.  Details can be found in this post to rdo-list.

The post mentions one pending enhancement to Nova that would improve things further:

Currently fence_compute loops, waiting for nova to recognise that the failed host is down, before we make a host-evacuate call which triggers nova to restart the VMs on another host. The discussed nova API extensions will speed up recovery times by allowing fence_compute to proactively push that information into nova instead.

The issue here is that the default backend for Nova’s servicegroup API relies on the nova-compute service to periodically check in to the Nova database to indicate that it is still running.  The delay in the recovery process is caused by Nova waiting on a configured timeout since the last time the service checked in.  Pacemaker is going to know about the failure much sooner, so it would be helpful if there were an API to tell Nova “trust me, this node is gone”.  This proposed spec intends to provide such an API.

The Different Facets of OpenStack HA

Last October, I wrote about a particular aspect of providing HA for workloads running on OpenStack. The HA problem space for OpenStack is much more broad than what was addressed there. There has been a lot of work around HA for the OpenStack services themselves. The problems for OpenStack services seem to be pretty well understood. There are reference architectures with detailed instructions available from vendors and distributions that have been integrated into deployment tools. The upstream OpenStack project also has an HA guide that covers this topic.

Another major area that has received less attention and effort is HA for compute workloads running on top of OpenStack. Requirements are very diverse in this area, so I’d like to provide some broader context around this topic and expand on my thoughts about useful work that can be done either in or around OpenStack to better support legacy workloads.

Approaches to Recovery

My original thinking around all of this was that we should be building infrastructure that can automatically handle the recovery in response to failures.  It turns out that there is also a lot of demand for the ability to completely custom manage the recovery process.  This was particularly made clear to me while discussing availability requirements for NFV workloads on OpenStack with participants in the OPNFV project.  With that said, we need to think about all failure types in two ways.

  1. Completely Automated Recovery – In this case, I envision a user wanting to enable an option that says “keep my VM running”.  The rest would be handled automatically by the cloud infrastructure.
  2. Custom Recovery – In this case, a set of applications has stricter requirements around failure detection (or even prediction) and availability.  They want to be notified of failures and be given the APIs necessary to implement their own recovery.

Types of Failures

We can break down the types of failures into 3 major categories that require different work for an OpenStack deployment.

  1. Failure of Infrastructure – This type of failure is when the hardware infrastructure supporting OpenStack workloads fails. A hardware failure on a given hypervisor is a prime example.
  2. Failure of the Guest Operating System – In this case, the base operating system in the VM fails for some reason.  Imagine a kernel panic in your VM that causes the VM to stop running, even though the hypervisor is still operating just fine.
  3. Failure of the Application – Something at the application layer may also fail.

Now let’s look at each failure type and consider the work that could be done to support each approach to recovery.

Infrastructure Failure

Failure of the infrastructure is what I wrote about last October.  I discussed the use of some recent enhancements to Pacemaker that would allow Pacemaker to monitor compute nodes.  This makes a lot of sense as Pacemaker is often already included in the HA architecture for the underlying OpenStack services.  There has since been work in cooperation with the Pacemaker team to build a proof-of-concept implementation of this approach.  Details for review and experimentation will be released soon.

The focus so far has been on completely automating the recovery process.  In particular that means issuing a nova evacuate API call for all of the instances that were running on the dead node.  However, this same architecture can be adapted to support the custom recovery approach.

Instead of fencing the node and issuing an evacuate, Pacemaker could optionally fence the node and emit a notification about the failure.  This notification could be put on the existing OpenStack notification message bus.  A deployment could choose to write something that consumes these notifications.  Another option could be to enhance Ceilometer to consume these notifications and turn it into an alarm that notifies an API consumer that the host that was running a VM has failed.

Guest Operating System Failure

The libvirt/KVM driver in OpenStack already has great support for handling this type of failure and providing automated recovery.  In particular, you can set the “hw:watchdog_action” property in either the extra_specs field of a flavor or as an image property.  When this property is set, a watchdog device will be configured for the VM. If the value is set to “reset” and the guest operating system crashes and triggers the watchdog, the VM will automatically be rebooted.
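
As a rough sketch of what enabling this looks like (the flavor name and image ID are placeholders), the property can be set with the standard nova and glance clients:

$ nova flavor-key m1.small set hw:watchdog_action=reset
$ glance image-update --property hw_watchdog_action=reset <image-uuid>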

To support the custom recovery case, we could add a new watchdog_action called “notify”.  In this case, Nova could simply emit a notification to the OpenStack notification message bus.  Again, a deployment could have a custom application that consumes these notifications or a service like Ceilometer could turn it into something consumable by a public REST API.

Application Failure

In my opinion, providing an answer for application failures is the least important part of HA from an OpenStack perspective.  Dealing with this is not new for legacy applications.  It was necessary before running on OpenStack and there are a lot of solutions available.  For example, it’s possible to run a virtual Pacemaker cluster inside a set of VMs just like you would on a physical deployment.  This has not always been the case, though.  When we looked at this in much earlier days of OpenStack (pre-Neutron), it was much more difficult to accomplish.  Neutron gives you enough control so that IP addresses can be predictable before creating the VMs, making it easy to get all of the addresses needed for Pacemaker configuration into the VM at the time it’s created.

There is still some discussion about coming up with a cloud native approach to application monitoring and recovery handling.  Heat is often brought up in this context.  There was a thread on the openstack-dev list this past December about this.  It seems there is potential there, but it’s unclear if it’s really high enough on anyone’s priority list to work on.

Pulling Things Together

A major point with all of this is that there are different layers failures can happen and there is not one solution for handling them all.  Breaking the problem space into pieces helps make it more manageable.  Once broken down, it seems clear to me that there is very reasonable work that can be done to enhance the ability of OpenStack to run traditional “pet” workloads.  Once these solutions are further along, it will be important to figure out how they integrate with deployment and management solutions, but let’s start with just making it work.

OpenStack Instance HA Proposal

In a perfect world, every workload that runs on OpenStack would be a cloud native application that is horizontally scalable and fault tolerant to anything that may cause a VM to go down.  However, the reality is quite different.  We continue to see a high demand for support of traditional workloads running on top of OpenStack and the HA expectations that come with them.

Traditional applications run on top of OpenStack just fine for the most part.  Some applications come with availability requirements that a typical OpenStack deployment will not provide automatically.  If a hypervisor goes down, there is nothing in place that tries to rescue the VMs that were running there.  There are some features in place that allow manual rescue, but they require intervention from a cloud operator or an external orchestration tool.

This proposal discusses what it would take to provide automated detection of a failed hypervisor and the recovery of the VMs that were running there.  There are some differences to the solution based on what hypervisor you’re using.  I’m primarily concerned with libvirt/KVM, so I assume that for the rest of this post.  Except where libvirt is specifically mentioned, I think everything applies just as well to the use of the xenserver driver.

This topic is raised on a regular basis in the OpenStack community.  There has been pushback against putting this functionality directly in OpenStack.  Regardless of what components are used, I think we need to provide an answer to the question of how this problem should be approached.  I think this is quite achievable today using existing software.

Scope

This proposal is specific to recovery from infrastructure failures.  There are other types of failures that can affect application availability.  The guest operating system or the application itself could fail.  Recovery from these types of failures is primarily left up to the application developer and/or deployer.

It’s worth noting that the libvirt/KVM driver in OpenStack does contain one feature related to guest operating system failure.  The libvirt-watchdog blueprint was implemented in the Icehouse release of Nova.  This feature allows you to set the hw_watchdog_action property on either the image or flavor.  Valid values include poweroff, reset, pause, and none.  When this is enabled, libvirt will enable the i6300esb watchdog device for the guest and will perform the requested action if the watchdog is triggered.  This may be a helpful component of your strategy for recovery from guest failures.

Architecture

A solution to this problem requires a few key components:

  1. Monitoring – A system to detect that a hypervisor has failed.
  2. Fencing – A system to fence failed compute nodes.
  3. Recovery – A system to orchestrate the rescue of VMs from the failed hypervisor.

Monitoring

There are two main requirements for the monitoring component of this solution.

  1. Detect that a host has failed.
  2. Trigger an automatic response to the failure (Fencing and Recovery).

It’s often suggested that the solution for this problem should be a part of OpenStack.  Many people have suggested that all of this functionality should be built into Nova.  The problem with putting it in Nova is that it assumes that Nova has proper visibility into the health of the infrastructure that Nova itself is running on.  There is a servicegroup API that does very basic group membership.  In particular, it keeps track of active compute nodes.  However, at best this can only tell you that the nova-compute service is not currently checking in.  There are several potential causes for this that would still leave the guest VMs running just fine.  Getting proper infrastructure visibility into Nova is really a layering violation.  Regardless, it would be a significant scope increase for Nova, and I really don’t expect the Nova team to agree to it.

It has also been proposed that this functionality be added to Heat.  The most fundamental problem with that is that a cloud user should not be required to use Heat to get their VM restarted if something fails.  There have been other proposals to use other (potentially new) OpenStack components for this.  I don’t like that for many of the same reasons I don’t think it should be in Nova.  I think it’s a job for the infrastructure supporting the OpenStack deployment, not OpenStack itself.

Instead of trying to figure out which OpenStack component to put it in, I think we should consider this a feature provided by the infrastructure supporting an OpenStack deployment.  Many OpenStack deployments already use Pacemaker to provide HA for portions of the deployment.  Historically, there have been scaling limits in the cluster stack that made Pacemaker not an option for use with compute nodes, since there are far too many of them.  This limitation is actually in Corosync and not Pacemaker itself.  More recently, Pacemaker has added a new feature called pacemaker_remote, which allows a host to be part of a Pacemaker cluster without having to be part of a Corosync cluster.  It seems like this may be a suitable solution for OpenStack compute nodes.
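
As a rough sketch (the node name and address are placeholders, not a tested configuration), adding a compute node to the cluster as a pacemaker_remote node with pcs would look something like this:

$ pcs resource create compute-1 ocf:pacemaker:remote server=compute-1.example.com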

Many OpenStack deployments may already be using a monitoring solution like Nagios for their compute nodes.  That seems reasonable, as well.

Fencing

To recap, fencing is an operation that completely isolates a failed node.  It could be IPMI based, for example, ensuring that the failed node is powered off.  Fencing is important for several reasons.  There are many ways a node can fail, and we must be sure that the node is completely gone before starting the same VM somewhere else.  We don’t want the same VM running twice.  That is certainly not what a user expects.  Worse, since an OpenStack deployment doing automatic evacuation is probably using shared storage, running the same VM twice can result in data corruption, as two VMs will be trying to use the same disks.  Another problem would be having the same IPs on the network twice.

A huge benefit of using Pacemaker for this is that it has built-in integration with fencing, since it’s a key component of any proper HA solution.  If you went with Nagios, fencing integration may be left up to you to figure out.
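
For illustration only (the device name, address, and credentials are placeholders), an IPMI-based fencing device for a compute node might be configured in Pacemaker along these lines:

$ pcs stonith create fence-compute-1 fence_ipmilan ipaddr=192.0.2.101 login=admin passwd=secret lanplus=1 pcmk_host_list=compute-1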

Recovery

Once a failure has been detected and the compute node has been fenced, the evacuation needs to be triggered.  To recap, evacuation is restarting an instance that was running on a failed host by moving it to another host.  Nova provides an API call to evacuate a single instance.  For this to work properly, instance disks should be on shared storage.  Alternatively, they could all be booted from Cinder volumes.  Interestingly, the evacuate API will still run even without either of these things.  The result is just a new VM from the same base image but without any data from the old one.  The only benefit then is that you get a VM back up and running under the same instance UUID.
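
As a hedged example (the instance ID and target host are placeholders), evacuating a single instance that uses shared storage looks like this with the nova client:

$ nova evacuate --on-shared-storage <instance-uuid> <target-host>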

A common use case with evacuation is “evacuate all instances from a given host”.  Since this is common enough, it was scripted as a feature in the novaclient library.  So, the monitoring tool can trigger this feature provided by novaclient.
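
That host-level helper is exposed by the nova client as host-evacuate; a sketch, with the failed host name as a placeholder:

$ nova host-evacuate --on-shared-storage <failed-host>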

If you want this functionality for all VMs in your OpenStack deployment, then we’re in good shape.  Many people have made the additional request that users should be able to request this behavior on a per-instance basis.  This does indeed seem reasonable, but poses an additional question.  How should we let a user indicate to the OpenStack deployment that it would like its instance automatically recovered?

The typical knobs used are image properties and flavor extra-specs.  That would certainly work, but it doesn’t seem quite flexible enough to me.  I don’t think a user should have to create a new image to mark it as “keep this running”.  Flavor extra-specs are fine if you want this for all VMs of a particular flavor or class of flavors.  In either case, the novaclient “evacuate a host” feature would have to be updated to optionally support it.

Another potential solution is to use a special tag specified by the user.  There is a proposal up for review right now to provide a simple tagging API for instances in Nova.  For this discussion, let’s say the tag would be automatic-recovery.  We could also update the novaclient feature we’re using with support for “evacuate all instances on this host that have a given tag”.  The monitoring tool would trigger this feature and ask novaclient to evacuate all VMs on the host that were tagged with automatic-recovery.

Conclusions and Next Steps

Instance HA is clearly something that many deployments would like to provide.  I believe that this could be put together for a deployment today using existing software, Pacemaker in particular.  A next step here is to provide detailed information on how to set this up and also do some testing.

I expect that some people might say, “but I’m already using system Foo (Nagios or whatever) for monitoring my compute nodes”.  You could go this route, as well.  I’m not sure about fencing integration with something like Nagios.  If you skip the use of fencing in this solution, you get to keep the pieces when it breaks.  Aside from that, your monitoring system could trigger the evacuation functionality of novaclient just like Pacemaker would.

Some really nice future development around this would be integration into an OpenStack management UI.  I’d like to have a dashboard of my deployment that shows me any failures that have occurred and what responses have been triggered.  This should be possible since pcsd offers a REST API (WIP) that could export this information.

Lastly, it’s worth thinking about this problem specifically in the context of TripleO.  If you’re deploying OpenStack with OpenStack, should the solution be different?  In that world, all of your baremetal nodes are OpenStack resources via Ironic.  Ceilometer could be used to monitor the status of those resources.  At that point, OpenStack itself does have enough information about the supporting infrastructure to perform this functionality.  Then again, instead of trying to reinvent all of this in OpenStack, we could just use the more general Pacemaker based solution there, as well.

PTLs and Project Success in OpenStack

We’re in the middle of another PTL change cycle.  Nominations have occurred over the last week.  We’ve also seen several people step down this cycle (Keystone, TripleO, Cinder, Heat, Glance).  This is becoming a regular thing in OpenStack.  The PTL position for most projects has changed hands over time.  The Heat project changes every cycle.  Nova has its 3rd PTL from a 3rd company (about to enter his 2nd cycle).  With all of this change, some people express some amount of discomfort and concern.

I think the change that we see is quite healthy.  This is our open governance model working well.  We should be thrilled that OpenStack is healthy enough that we don’t rely on any one person to move forward.

I’d like to thank everyone who steps up to fill a PTL position.  It is a huge commitment.  The recognition you get is great, but it’s really hard work.  It’s quite a bit more than just technical leadership.  It also involves project management and community management.  It’s a position subject to a lot of criticism and a lot of the work is thankless.  So, thank you very much to those that are serving as a PTL or have served as one in the past.

It’s quite important that everyone also realize that it takes a lot more than a PTL to make a project successful.  Our project governance and culture includes checks and balances.  There is a Technical Committee (TC) that is responsible for ensuring that OpenStack development overall remains healthy.  Some good examples of TC influence on projects would be the project reviews the TC has been doing over the last cycle, working with projects to apply course corrections (Neutron, Trove, Ceilometer, Horizon, Heat, Glance).  Most importantly, the PTL still must work for consensus of the project’s members (though is empowered to make the final call if necessary).

As a contributor to an OpenStack project, there is quite a bit you can do to help ensure project success beyond just writing code.  Here are some of those things:

Help the PTL

While the PTL is held accountable for the project, they do not have to be responsible for all of the work that needs to get done.  The larger projects have started officially delegating responsibilities.  There is talk about formalizing aspects of this, so take a look at that thread for examples.

If you’re involved in a project, you should work to understand these different aspects of keeping the project running.  See if there’s an area you can help out with.  This stuff is critically important.

If you aspire to be a PTL at some point in the future, I would say getting involved in these areas is the best way you can grow your skills, influence, and visibility in the project to make yourself a strong candidate in the future.

Participate in Project Discussions

The direction of a project is a result of many discussions.  You should be aware of these and participate in them as much as you can.  Watch the openstack-dev mailing list for discussions affecting your project.  It’s surprising how many core reviewers on a project rarely participate in discussions on the mailing list.

Most projects also have an IRC channel.  Quite a bit of day-to-day discussion happens there.  This can be difficult due to time zones or other commitments.  Just join when you can.  If it’s time zone compatible, you should definitely make it a priority to join weekly project IRC meetings.  This is an important time to discuss current happenings in the project.

Finally, attend and participate in the design summit.  This is the time that projects try to sync up on goals for the cycle.  If you really want to play a role in affecting project direction and ensuring success, it’s important that you attend if possible.  Of course, there are many legitimate reasons some people may not be able to travel and those should be understood and respected by fellow project members.

Also keep in mind that project discussions span more than the project’s technical issues.  There are often project process and structure issues to work out.  Try to raise your awareness of these issues, provide input, and propose new ideas if you have them.  Some good recent examples of contributors doing this would be Daniel Berrange putting forth a detailed proposal to split out the virt drivers from nova, or Joe Gordon and John Garbutt pushing forward on evolving the blueprint handling process.

Do the Dirty Work

Most contributors to OpenStack are contributing on behalf of an employer.  Those employers have customers (which may be internal or external) and those customers have requirements.  It’s understandable that some amount of development time goes toward implementing features or fixing problems that are important to those customers.

It’s also critical that everyone understands that there is a good bit of common work that must get done.  If you want to build goodwill in a project while also helping its success, help in these areas.

See You in Paris!

I hope this has helped some contributors think of new ways to help ensure the success of OpenStack projects.  I hope to see you on the mailing list, on IRC, and in Paris!

OpenStack Board Meeting – 2014-09-18

I’m not a member of the OpenStack board, but the board meetings are open with the exception of the occasional Executive Session for topics that really do need to be private.  I attended the meeting on September 18, 2014.  Jonathan Bryce has since posted a summary of the meeting, but I wanted to share some additional commentary.

Mid-Cycle Meetups

Rob Hirschfeld raised the topic of mid-cycle meetups.  This wasn’t discussed for too long in the meeting.  The specific request was that the foundation staff evaluate what kind of assistance the foundation could and should be providing these events.  So far they are self-organized by teams within the OpenStack project.

These have been increasing in popularity.  The OpenStack wiki lists 12 such events during the Juno development cycle.  Some developers involved in cross project efforts attended several events.  This increase in travel has resulted in some tension in the community, as well.  Some people have legitimate personal reasons for not wanting the additional travel burden.  Others mention the increased impact to company travel budgets.  Overall, the majority opinion is that the events are incredibly productive and useful.

One of the most insightful outcomes of the discussions about meetups has been that the normal design summit is no longer serving its original purpose well enough.  The design summit is supposed to be the time that we sync on project goals.  We need to improve the design summit so that it better achieves that goal, which would let the mid-cycle meetups focus primarily on working sessions that execute on the plan set at the summit.

There was a thread on the openstack-dev list discussing improvements we can make to the design summit based on mid-cycle meetup feedback.

Foundation Platinum Membership

Jonathan raised a topic about how the foundation and board should fill a platinum sponsor spot that is opening up soon.  The discussion was primarily about the process the board should use for handling this.

This may actually be quite interesting to watch.  Nebula is stepping down from its platinum sponsor spot.  There is a limit of 8 platinum sponsors.  This is the first time since the foundation launched that a platinum sponsor spot has been opened.  OpenStack has grown an enormous amount since the foundation launched, so it’s reasonable to expect that there will be contention around this spot.

Which companies will apply for the spot?  How will the foundation and board handle contention?  And ultimately, which company will be the new platinum sponsor of the foundation?

DefCore

This topic took up the majority of the board meeting and was quite eventful.  Early in the discussion there was concern about the uncertainty caused by the length of the process so far.  There seemed to be general agreement that this is a real concern.  Naturally, the conversation proceeded to how to move forward.

The goal of DefCore has been to define a single minimal definition that supports a single OpenStack Powered trademark program.  The most important outcome of this discussion was the apparent board consensus that this is causing far too much contention, and that the DefCore committee should work with the foundation to propose a small set of trademark programs instead of trying to have a single one that covers all cases.

There were a couple of different angles discussed on creating additional trademark programs.  The first is about functionality.  So far, everything has been included in one program.  There was discussion of separate compute, storage, and networking trademark programs, for example.

Another angle that seemed quite likely based on the discussion was having OpenStack powered vs. OpenStack compatible trademark programs.  An OpenStack compatible program would only focus on compatibility with capabilities.  The OpenStack powered trademarks would include both capabilities and required code (designated sections).  The nice thing about this angle is that it would allow the board to press forward on the compatibility marks in support of the goal of interoperability, without getting held up by the debate around designated sections.

For more information on this, follow the ongoing discussion on the defcore-committee mailing list.

Next Board Meeting

The next OpenStack board meeting will be Nov. 2, 2014 in Paris, just before the OpenStack Summit.  There may also be an additional board meeting to continue the DefCore discussion.  Watch the foundation list for any news on additional meetings.

How to use the telephone (Phone Etiquette, or How I Learned to Love the Mute Button)

Many of us spend a lot of time on conference calls. Dan Smith and I recently put together an internal wiki page with some helpful information for our fellow conference call attendees. Dan wrote the instructions and I provided some additional visual aids.  Here is the content of that page. Feel free to share this information with anyone that you feel may benefit.

How to use the telephone

  1. Dial the number you wish to call
  2. Press the mute button
  3. If you wish to talk, place your finger on your mute button and press it, but keep your finger poised over the button
  4. Speak
  5. When finished speaking press the mute button before you return your finger to your keyboard
  6. If the call is continuing, go to step 3
  7. If the call is finished, hang up the telephone

Additional Tips

  1. Try to avoid using soft phones. They’re horrible and they often introduce echo and/or noise for everyone else on the call
  2. Try to avoid using cell phones. They’re horrible and they often introduce echo and/or noise for everyone else on the call
  3. Use a headset. Speakerphones aren’t as good as you think they are. If you have to use a speakerphone, be even more vigilant about your mute button.
  4. Stick to landline or proper voip phones whenever possible. Always use your mute button when you’re not actually speaking.
  5. If you’re using something that resembles but poorly emulates a real telephone, use the *6 mute feature as it mutes all the audio coming from your line, something that your emulated phone may not do well.
  6. Keep track of your mute status. It’s not that hard. It’s either on or off. Don’t announce to everyone that you forgot whether you were muted or not. If you need help determining whether you’re muted, try the procedure below.
  7. Hold is not mute. Don’t use your phone’s hold function in any way when you’re connected to a conference. Your phone or phone system may play music or beep periodically into the conference, making it unusable until you return.

Steps to determine if you’re muted

  1. If you’re not talking right now, you’re muted (see step 3 in the How to use the telephone instructions)
  2. There is no step two. If you messed up step one and think you need a step two, review step one.

Helpful Images

A phone

Note: This is a Polycom IP 335. Your phone may differ.

[Image: polycom_soundpoint_ip_335]

Location of Mute Button

It’s the single big red button.  Note that your phone should remain black and NOT turn grey while locating the mute button.

[Image: polycom_soundpoint_ip_335_mute_location]

Light On While Muted

Note that the phone should remain black while muted.  It will not turn grey.

[Image: polycom_soundpoint_ip_335_mute_light]

Juno Preview for OpenStack Compute (Nova)

We’re now well into the Juno release cycle. Here’s my take on a preview of some of what you can expect in Juno for Nova.

NFV

One area receiving a lot of focus this cycle is NFV. We’ve started an upstream NFV sub-team for OpenStack that is tracking and helping to drive requirements and development efforts in support of NFV use cases. If you’re not familiar with NFV, here’s a quick overview that was put together by the NFV sub-team:

NFV stands for Network Functions Virtualization. It refers to the
replacement of typically standalone appliances used for high and low
level network functions, such as firewalls, network address translation,
intrusion detection, caching, gateways, accelerators, etc., with a
virtual instance or set of virtual instances, which are called Virtual
Network Functions (VNFs). In other words, it can be seen as replacing
some of the hardware network appliances with high-performance software
taking advantage of high performance para-virtual devices, other
acceleration mechanisms, and smart placement of instances. The origin
of NFV is a working group of the European Telecommunications Standards
Institute (ETSI) whose work is the basis of most current
implementations. The main consumers of NFV are service providers
(telecommunication providers and the like) who are looking to accelerate
the deployment of new network services and, to do that, need to
eliminate the constraint of the slow renewal cycle of hardware
appliances, which do not autoscale and limit innovation.

NFV support for OpenStack aims to provide the best possible
infrastructure for such workloads to be deployed in, while respecting
the design principles of an IaaS cloud. In order for VNFs to perform
correctly in a cloud world, the underlying infrastructure needs to
provide a certain number of functionalities which range from scheduling
to networking and from orchestration to monitoring capacities. This
means that to correctly support NFV use cases in OpenStack,
implementations may be required across most, if not all, main OpenStack
projects, starting with Neutron and Nova.

The opportunities for OpenStack in the NFV space are huge. The work being tracked by the NFV sub-team spans more than just Nova, and several NFV related efforts are underway in Nova this cycle.

Upgrades

The road to live upgrades has been a long one. Progress has been made over the last several releases. The Icehouse release was the first release that supported live upgrades in some form. From Havana to Icehouse, you can do a control plane upgrade with some API downtime without having to upgrade your compute nodes at the same time. You can roll through upgrading the compute nodes with the control plane already upgraded to Icehouse.

For Juno we are continuing to improve on this in several areas. Since Nova is a highly distributed system, one of the biggest requirements for doing this is versioning everything about the interactions between components. First we went through and versioned all of the interfaces between components. Next we have been versioning all of the data passed between components. This versioning of the data is part of what Nova Objects provide. Nova Objects are an internal implementation detail, but are critical to upgrade support. The entire code base had not been converted as of the Icehouse release. For Juno we continue to do conversions over to this new object model.
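
To make the versioning idea a bit more concrete, here is a small standalone sketch of the pattern (illustrative only, not actual Nova Objects code): each payload carries a version, and the sender can downgrade the data it sends to a component that has not been upgraded yet.

# Illustrative sketch of versioned data payloads.  This is not Nova code;
# the real implementation is provided by Nova Objects.

class VersionedPayload(object):
    VERSION = '1.1'                       # bumped whenever the fields change
    fields = ('host', 'state', 'reason')  # 'reason' was added in 1.1

    def __init__(self, **kwargs):
        for name in self.fields:
            setattr(self, name, kwargs.get(name))

    def to_primitive(self, target_version=None):
        """Serialize, optionally downgrading for an older receiver."""
        data = dict((name, getattr(self, name)) for name in self.fields)
        if target_version == '1.0':
            data.pop('reason', None)      # 1.0 receivers don't know this field
        return {'version': target_version or self.VERSION, 'data': data}


payload = VersionedPayload(host='compute-1', state='up', reason='heartbeat')
# When talking to a not-yet-upgraded node, send the older format.
print(payload.to_primitive(target_version='1.0'))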

The other major improvement being looked at this release cycle is how we can reduce the downtime needed on the control plane by reducing how long it takes for database schema migrations to run. This is largely about developing new best practices about how migrations need to work going forward.
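
As a hypothetical illustration of the sort of best practice being discussed, a schema migration that only adds a new nullable column, and leaves any data backfill to be done online afterwards, keeps the offline window much shorter than one that also rewrites existing rows. The table and column names below are made up:

# Hypothetical Alembic migration kept intentionally small: it adds a
# nullable column and does not touch existing rows.  Table and column
# names are made up for illustration.
from alembic import op
import sqlalchemy as sa

revision = '3f2b1c4d5e6a'
down_revision = '1a2b3c4d5e6f'


def upgrade():
    op.add_column('instances',
                  sa.Column('maintenance_note', sa.String(255), nullable=True))
    # Avoid mixing data migrations in here; backfilling rows can be done
    # online after the schema change so the control plane outage stays short.


def downgrade():
    op.drop_column('instances', 'maintenance_note')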

Finally, for the Icehouse release we added some basic testing of the live upgrade scenario to the OpenStack CI system. This testing runs OpenStack using the previous release and then upgrades everything except the nova-compute service. At that point, everything should continue to work. One goal for the Juno cycle is to improve this testing to verify that we can also run an older instance of the nova-network service with an upgraded control plane. This is critical for deployments that use nova-network in multi-host mode. In that case, you have nova-network running on each compute node, so we need to support a mixed version environment for nova-network, as well as nova-compute.

Scheduler

There’s always a lot of interest in improving the way host scheduling works in Nova. In the Icehouse cycle we identified that we wanted to split the scheduler out into a new project (codenamed Gantt). Doing so requires decoupling Nova’s scheduler as much as possible from the rest of Nova. This decoupling effort is the primary goal for the Juno cycle. Once the scheduler is independent of Nova, we can investigate ways to integrate other projects so that scheduling can use information that currently only lives in other projects such as Neutron or Cinder.
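
For context, Nova's filter scheduler asks a series of filters whether each candidate host can run the requested instance. The snippet below is a simplified standalone imitation of that pattern, not actual Nova or Gantt code; once the scheduler is split out, filters like this could also draw on data owned by Neutron or Cinder.

# Simplified, standalone imitation of the filter-scheduler pattern.
# This is illustrative only and does not use the real Nova classes.

class HostState(object):
    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb


class RamFilter(object):
    """Pass only hosts with enough free RAM for the requested flavor."""

    def host_passes(self, host_state, request):
        return host_state.free_ram_mb >= request['ram_mb']


def filter_hosts(hosts, request, filters):
    return [host for host in hosts
            if all(f.host_passes(host, request) for f in filters)]


hosts = [HostState('compute-1', 2048), HostState('compute-2', 8192)]
chosen = filter_hosts(hosts, {'ram_mb': 4096}, [RamFilter()])
print([host.name for host in chosen])   # only compute-2 has enough free RAM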

Docker

The Docker driver for Nova was moved to Stackforge during the Icehouse development cycle. The primary reason was the lack of CI running for the driver. However, there were also a number of feature gaps that prevented CI with tempest from working as it needed to. Moving to Stackforge gave the team working on this driver an opportunity to iterate more quickly and fill those gaps.

There has been a lot of progress on the Docker driver in the Juno cycle. Some of the feature gap work has resulted in improvements to Docker itself, which is really great to see. For example, Docker now supports pause and unpause, which is a feature of the Nova API that the Docker driver is now able to support. Another area that has seen some focus is Cinder support. To make this work, we have to be able to support exposing block devices to Docker containers at creation time, as well as later, after they are already running. There has been work on Docker itself in this area, which will eventually lead to support in the Nova Docker driver.

Finally, there has been ongoing work to get CI with tempest running. It’s now running in OpenStack’s CI infrastructure. The progress is great to see, but it also seems most likely that the driver will return to Nova in the K release cycle instead of Juno.

Ironic

Nova introduced the baremetal driver in the Grizzly release.  This driver allows you to use Nova’s API to do provisioning of bare metal instead of virtual machines.  There was immediately a lot of interest in this functionality for OpenStack.  Soon after this driver was introduced, it was decided that we should start a new project dedicated to bare metal management.  That project is Ironic.

Ironic has come a long way since then.  The project is currently incubated and could potentially graduate for the K release.  One of the major tasks in moving towards graduation is getting the Ironic driver for Nova merged.  The spec has been approved and the code seems to be in good shape.  I’m very hopeful that we will have this step completed in the Juno release.

Database Integration

OpenStack has been a long time user of the SQLAlchemy library for its integration with relational databases.  More recently, some OpenStack projects have begun using Alembic for managing database schema migrations.  Michael Bayer, author of SQLAlchemy and Alembic, recently joined Red Hat to help with OpenStack, as well as continue to maintain SQLAlchemy and Alembic.  He has been surveying OpenStack’s current usage of SQLAlchemy and identifying areas where we can improve.  He has written up a fascinating wiki page with his findings.  I expect this to result in some very nice improvements to many OpenStack projects, including Nova.

Other

There are many other features being worked on right now for Nova. The best place to get an idea of what’s going on is to look at either the list of approved design specs or the list of specs under review.