Deployment Guide

Switch Hardware Selection

We have verified, and therefore recommend, the switch model listed in Aether-verified Switch Hardware. Other Stratum-enabled switches listed in White Box Switch Hardware should also work in principle, but may require more integration work.

To use the P4 UPF, you must use fabric switches based on the Intel (formerly Barefoot) Tofino chipset. There are two variants of this switching chipset, with different resources and capabilities. The Dual Pipe Tofino ASIC is less expensive, while the Quad Pipe Tofino ASIC has more chip resources and a faster embedded system with more memory and storage.

The P4 UPF and SD-Fabric features run within the constraints of the Dual Pipe system for production deployments, but for development of features in P4, the larger capacity of the Quad Pipe is desirable.

These switches feature 32 QSFP28 ports capable of running in 100GbE, 40GbE, or 4x 10GbE mode (using a split DAC or fiber cable) and have a 1GbE management network interface.

See also Rackmount of Equipment for how the fabric switches should be rack-mounted to ensure proper airflow within a rack.

Deployment Overview

SD-Fabric is released as a Helm chart and container images. We recommend using Kubernetes and Helm to deploy SD-Fabric. The high-level steps required to deploy SD-Fabric are:

1. Provision switches

   We first install an operating system with Docker and Kubernetes on the bare-metal switches.

2. Prepare switches as special Kubernetes nodes

   Kubernetes labels and taints are used to configure switches as special Kubernetes worker nodes. This ensures that Stratum, and only Stratum, is deployed on the switches.

3. Prepare ONOS network configuration

   The network configuration defines properties such as switch pipeconf, subnet and VLAN.

4. Prepare Stratum chassis configuration for each switch

   The chassis config defines switch properties such as port speed and breakout.

5. Install SD-Fabric using Helm

   Finally, we install SD-Fabric with the information prepared in Steps 1 to 4.

Step 1: Provision Switches

We use the Open Network Install Environment (ONIE) to install an Open Network Linux (ONL) image on the switch. To work with the SD-Fabric environment, we have customized the ONL image to include the required packages and dependencies.

The image source can be found in the ONF repository opennetworkinglab/OpenNetworkLinux. You can also download pre-compiled artifacts from the GitHub Release page.

Note

If you’re not familiar with the ONIE/ONL environment, please check Getting Started to see how to install the ONL image on an ONIE-supported switch.

Below is an example of how to install the ONL image.

1. Prepare a server which is accessible by the switch and then download the pre-compiled installer from the release page.

wget https://github.com/opennetworkinglab/OpenNetworkLinux/releases/download/v1.4.3/ONL-onf-ONLPv2_ONL-OS_2021-07-16.2159-5195444_AMD64_INSTALLED_INSTALLER -O onie-installer
sudo python3 -m http.server 80
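Because `http.server` serves files from the current directory under their own names, the file name on the server has to match the path in the URL that the switch requests later. Before rebooting the switch, it can help to confirm that the installer is reachable at that URL. The following is a minimal sketch, assuming `curl` is available on a machine in the management network. The helper names `installer_url` and `check_installer` are hypothetical, and 10.0.0.129 is the example management server address used later in this guide.

```shell
#!/bin/sh
# Hypothetical helpers; they are not part of ONIE or ONL.

# Build the URL that onie-nos-install will be given.
# $1: management server address, $2: served file name (default: onie-installer)
installer_url() {
    printf 'http://%s/%s\n' "$1" "${2:-onie-installer}"
}

# Send a HEAD request and fail if the server does not answer with success.
check_installer() {
    curl --fail --silent --head "$(installer_url "$@")" >/dev/null
}

# Print the URL; uncomment the second line to probe a running server.
installer_url 10.0.0.129
# check_installer 10.0.0.129 && echo "installer reachable"
```

If the probe fails, check that the HTTP server was started in the directory that holds the downloaded file.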

2. Reboot the switch to enter ONIE installation mode

In order to reinstall an ONL image, you must change the ONIE bootloader to “Rescue Mode”.

Once the switch is powered on, it should obtain an IP address on the OpenBMC interface via DHCP. Here we use 10.0.0.131 as an example. OpenBMC uses these default credentials:

username: root
password: 0penBmc

Log in to OpenBMC with SSH:

$ ssh root@10.0.0.131
The authenticity of host '10.0.0.131 (10.0.0.131)' can't be established.
ECDSA key fingerprint is SHA256:...
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.0.131' (ECDSA) to the list of known hosts.
root@10.0.0.131's password:
root@bmc:~#

Using the Serial-over-LAN console, enter ONL:

root@bmc:~# /usr/local/bin/sol.sh
You are in SOL session.
Use ctrl-x to quit.
-----------------------

root@onl:~#

Note

If sol.sh is unresponsive, try restarting the mainboard from the OpenBMC shell with

root@bmc:~# wedge_power.sh reset

Change the boot mode to rescue mode and reboot:

root@onl:~# onl-onie-boot-mode rescue
[1053033.768512] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
[1053033.936893] EXT4-fs (sda3): re-mounted. Opts: (null)
[1053033.996727] EXT4-fs (sda3): re-mounted. Opts: (null)
The system will boot into ONIE rescue mode at the next restart.

root@onl:~# reboot

At this point, ONL will go through its shutdown sequence and ONIE will start. If it does not start right away, press the Enter/Return key a few times; it may show you a boot selection screen. Pick ONIE and Rescue if given a choice.

3. Install the ONL installer

Now that the switch is in Rescue mode, run the onie-nos-install command with the URL of the installer on the management server (here we use 10.0.0.129 as an example) on the management network segment:

ONIE:/ # onie-nos-install http://10.0.0.129/onie-installer
discover: Rescue mode detected. No discover stopped.
ONIE: Unable to find 'Serial Number' TLV in EEPROM data.
Info: Fetching http://10.0.0.129/onie-installer ...
Connecting to 10.0.0.129 (10.0.0.129:80)
installer            100% |*******************************|   322M  0:00:00 ETA
ONIE: Executing installer: http://10.0.0.129/onie-installer
installer: computing checksum of original archive
installer: checksum is OK
...

The installation will now start, and then ONL will boot, culminating in:

Open Network Linux OS ONL-wedge100bf-32qs, 2020-11-04.19:44-64100e9

localhost login:

The default ONL login is:

username: root
password: onl

If you log in, you can verify that the switch is getting its IP address via DHCP:

root@localhost:~# ip addr
...
3: ma1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether 00:90:fb:5c:e1:97 brd ff:ff:ff:ff:ff:ff
      inet 10.0.0.130/25 brd 10.0.0.255 scope global ma1
...

4. (Optional) Set up the switch IP address and hostname after the installation if DHCP is not available
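This guide does not spell out the static configuration. As a rough sketch, assuming the ONL image follows the usual Debian convention of configuring the ma1 management interface through /etc/network/interfaces (verify this on your image before relying on it), a static stanza can be generated as shown here. The helper name `render_ma1_static` is hypothetical, the address and /25 netmask are taken from the `ip addr` example above, and the gateway value is a placeholder to replace with your own.

```shell
#!/bin/sh
# Print a Debian-style static configuration for the ma1 management interface.
# $1: address, $2: netmask, $3: gateway
render_ma1_static() {
    cat <<EOF
auto ma1
iface ma1 inet static
    address $1
    netmask $2
    gateway $3
EOF
}

# 10.0.0.129 as gateway is an assumption made for this example only.
render_ma1_static 10.0.0.130 255.255.255.128 10.0.0.129
```

The output is only printed, so it can be reviewed before it is written to the interface configuration on the switch. The hostname can then be set with the standard tools of the installed system.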

Warning

If you came from the Aether docs, stop here, return to Post-ONL configuration, and continue the remaining steps there. Otherwise, continue with the rest of this page.

Step 2: Configure switches as special Kubernetes nodes

Our ONL version includes all packages required to run Kubernetes on top of it. Once Kubernetes is ready, the Stratum application is deployed to the switch to manage it.

Unlike servers, switches have limited CPU and memory resources, so we should avoid deploying unnecessary workloads on them. In addition, the Stratum application should be deployed to all switches, and only to switches.

To achieve these goals, apply the following settings to your Kubernetes cluster:

Set a label on every switch node, e.g. node-role.kubernetes.io=switch
Set a taint with the NoSchedule effect on every switch node, e.g. node-role.kubernetes.io=switch:NoSchedule
Configure the NodeSelector and Toleration accordingly when deploying Stratum via a DaemonSet
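The label and taint from the list above can be applied with `kubectl label` and `kubectl taint`. The sketch below assumes the node names leaf1 and leaf2 from the example cluster in this section. The `run` and `mark_switch_node` helpers are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to run them against the current kubectl context.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Label and taint one switch node so that only pods that tolerate
# the taint can be scheduled on it.
mark_switch_node() {
    node="$1"
    run kubectl label nodes "$node" node-role.kubernetes.io=switch --overwrite
    run kubectl taint nodes "$node" node-role.kubernetes.io=switch:NoSchedule --overwrite
}

for n in leaf1 leaf2; do
    mark_switch_node "$n"
done
```

The `--overwrite` flag makes the script safe to re-run on nodes that already carry the label or taint.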

Example of a five-node Kubernetes cluster with two switches and three servers:

$ kubectl get node -o custom-columns=NAME:.metadata.name,TAINT:.spec.taints
NAME       TAINT
compute1   <none>
compute2   <none>
compute3   <none>
leaf1      [map[effect:NoSchedule key:node-role.kubernetes.io value:switch]]
leaf2      [map[effect:NoSchedule key:node-role.kubernetes.io value:switch]]
$ kubectl get nodes -lnode-role.kubernetes.io=switch
NAME    STATUS   ROLES    AGE   VERSION
leaf1   Ready    worker   27d   v1.18.8
leaf2   Ready    worker   27d   v1.18.8
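For the DaemonSet side, the pod template needs a node selector that matches the switch label and a toleration that matches the switch taint. The fragment printed below uses the standard Kubernetes pod spec fields `nodeSelector` and `tolerations`. Where these fields go in the Stratum chart values is not shown here and should be taken from the chart itself. The function name `switch_scheduling_fragment` is hypothetical.

```shell
#!/bin/sh
# Print a pod spec fragment that pins pods to switch nodes and
# lets them tolerate the switch taint.
switch_scheduling_fragment() {
    cat <<'EOF'
nodeSelector:
  node-role.kubernetes.io: switch
tolerations:
  - key: node-role.kubernetes.io
    operator: Equal
    value: switch
    effect: NoSchedule
EOF
}

switch_scheduling_fragment
```

The node selector keeps Stratum off the servers, while the toleration is what allows it onto the tainted switches; both are needed to get "Stratum, and only Stratum" on the switch nodes.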

Step 3: Prepare ONOS network configuration

See Network Configuration for instructions.

Step 4: Prepare Stratum chassis configuration

See Stratum Chassis Configuration for instructions.

Step 5: Install SD-Fabric with Helm

To install SD-Fabric into your Kubernetes cluster, follow the instructions in the SD-Fabric Helm Chart README.
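The README remains the authoritative source for the chart reference and values. Purely as an illustration of the general shape of a Helm 3 installation, the sketch below assembles a `helm install` command from four operator-chosen arguments. The function name `helm_install_cmd`, the release name, the namespace, the values file name and the `CHART_REFERENCE` placeholder are all hypothetical.

```shell
#!/bin/sh
# Print a helm install command.
# $1: release name, $2: chart reference, $3: namespace, $4: values file
helm_install_cmd() {
    printf 'helm install %s %s --namespace %s --create-namespace --values %s\n' \
        "$1" "$2" "$3" "$4"
}

# Hypothetical names; take the real chart reference from the README.
helm_install_cmd sdfabric CHART_REFERENCE sdfabric sdfabric-values.yaml
```

The values file is where the ONOS network configuration and Stratum chassis configuration prepared in Steps 3 and 4 would be supplied, in whatever structure the chart expects.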