Deployment Guide

Switch Hardware Selection

We have verified, and therefore recommend, the switch models listed in Aether-verified Switch Hardware. Other Stratum-enabled switches listed in White Box Switch Hardware should also work in theory, but more integration work may be required.

To use the P4 UPF, you must use fabric switches based on the Intel (formerly Barefoot) Tofino chipset. There are two variants of this switching chipset, with different resources and capabilities: the Dual Pipe Tofino ASIC is less expensive, while the Quad Pipe Tofino ASIC has more chip resources and a faster embedded system with more memory and storage.

The P4 UPF and SD-Fabric features run within the constraints of the Dual Pipe system for production deployments, but the larger capacity of the Quad Pipe is desirable for developing new features in P4.

These switches feature 32 QSFP+ ports, each capable of running in 100GbE, 40GbE, or 4x 10GbE mode (using a split DAC or fiber cable), and have a 1GbE management network interface.

See also Rackmount of Equipment for how the fabric switches should be rack-mounted to ensure proper airflow within a rack.

Deployment Overview

SD-Fabric is released as a Helm chart and container images. We recommend using Kubernetes and Helm to deploy SD-Fabric. Here’s a list of the high-level steps required to deploy SD-Fabric:

  1. Provision switch

We first need to install an operating system with Docker and Kubernetes on the bare-metal switches.

  2. Prepare switches as special Kubernetes nodes

Kubernetes labels and taints are used to configure switches as special Kubernetes worker nodes. This ensures that we deploy Stratum (and only Stratum) on the switches.

  3. Prepare ONOS network configuration

The network configuration defines properties such as the switch pipeconf, subnets, and VLANs.

  4. Prepare Stratum chassis configuration for each switch

The chassis config defines switch properties such as port speed and breakout mode.

  5. Install SD-Fabric using Helm

Finally, we install SD-Fabric with the information prepared in Steps 1 to 4.

Step 1: Provision Switches

We follow the Open Network Install Environment (ONIE) process to install an Open Network Linux (ONL) image on each switch. To work with the SD-Fabric environment, we have customized the ONL image to include the required packages and dependencies.

The image source can be found in the ONF repository opennetworkinglab/OpenNetworkLinux. You can also download pre-compiled artifacts from the GitHub Releases page.

Note

If you’re not familiar with the ONIE/ONL environment, please check Getting Started to see how to install the ONL image on an ONIE-supported switch.

Below is an example of how to install the ONL image.

1. Prepare a server which is accessible by the switch and then download the pre-compiled installer from the release page.

wget https://github.com/opennetworkinglab/OpenNetworkLinux/releases/download/v1.4.3/ONL-onf-ONLPv2_ONL-OS_2021-07-16.2159-5195444_AMD64_INSTALLED_INSTALLER -O onie-installer
sudo python3 -m http.server 80
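Before rebooting the switch, it is worth confirming that the server actually serves the installer file. A minimal sketch, assuming the installer was downloaded to /tmp/onl-serve; the directory, port, and placeholder file here are illustrative (the real setup serves on port 80 as shown above):

```shell
# Sanity check (illustrative): serve the download directory over HTTP and
# confirm the installer file is reachable before rebooting the switch.
mkdir -p /tmp/onl-serve && cd /tmp/onl-serve
touch onie-installer                       # placeholder if testing without the real image
python3 -m http.server 8913 & SERVER_PID=$!
sleep 1
STATUS=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8913/onie-installer)
kill "$SERVER_PID"
echo "HTTP status: $STATUS"
```

A 200 status means the switch should be able to fetch the installer from this server.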
  2. Reboot the switch to enter ONIE installation mode

    In order to reinstall an ONL image, you must change the ONIE bootloader to “Rescue Mode”.

    Once the switch is powered on, it should retrieve an IP address on the OpenBMC interface via DHCP. Here we use 10.0.0.131 as an example. OpenBMC uses these default credentials:

    username: root
    password: 0penBmc
    

    Login to OpenBMC with SSH:

    $ ssh root@10.0.0.131
    The authenticity of host '10.0.0.131 (10.0.0.131)' can't be established.
    ECDSA key fingerprint is SHA256:...
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '10.0.0.131' (ECDSA) to the list of known hosts.
    root@10.0.0.131's password:
    root@bmc:~#
    

    Using the Serial-over-LAN console, enter ONL:

    root@bmc:~# /usr/local/bin/sol.sh
    You are in SOL session.
    Use ctrl-x to quit.
    -----------------------
    
    root@onl:~#
    

    Note

    If sol.sh is unresponsive, please try to restart the mainboard with

    root@onl:~# wedge_power.sh reset
    

    Change the boot mode to rescue mode and reboot

    root@onl:~# onl-onie-boot-mode rescue
    [1053033.768512] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
    [1053033.936893] EXT4-fs (sda3): re-mounted. Opts: (null)
    [1053033.996727] EXT4-fs (sda3): re-mounted. Opts: (null)
    The system will boot into ONIE rescue mode at the next restart.
    
    root@onl:~# reboot
    

    At this point, ONL will go through its shutdown sequence and ONIE will start. If it does not start right away, press the Enter/Return key a few times; it may show you a boot selection screen. Pick ONIE and Rescue if given a choice.

  3. Install the ONL installer

    Now that the switch is in rescue mode, run the onie-nos-install command with the URL of the installer on the management server (here we use 10.0.0.129 as an example):

    ONIE:/ # onie-nos-install http://10.0.0.129/onie-installer
    discover: Rescue mode detected. No discover stopped.
    ONIE: Unable to find 'Serial Number' TLV in EEPROM data.
    Info: Fetching http://10.0.0.129/onie-installer ...
    Connecting to 10.0.0.129 (10.0.0.129:80)
    installer            100% |*******************************|   322M  0:00:00 ETA
    ONIE: Executing installer: http://10.0.0.129/onie-installer
    installer: computing checksum of original archive
    installer: checksum is OK
    ...
    

    The installation will now start, and then ONL will boot, culminating in:

    Open Network Linux OS ONL-wedge100bf-32qs, 2020-11-04.19:44-64100e9
    
    localhost login:
    
    The default ONL login is:
    
    username: root
    password: onl
    

    If you log in, you can verify that the switch is getting its IP address via DHCP:

    root@localhost:~# ip addr
    ...
    3: ma1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 00:90:fb:5c:e1:97 brd ff:ff:ff:ff:ff:ff
          inet 10.0.0.130/25 brd 10.0.0.255 scope global ma1
    ...
    
  4. (Optional) Set up the switch IP and hostname after the installation if DHCP is not available
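For example, a static address for the ma1 management interface could be configured as follows. This is a hypothetical sketch assuming ONL's Debian-style networking; the addresses are examples, and the stanza is staged in a temporary file here, whereas on the switch it would be appended to /etc/network/interfaces:

```shell
# Hypothetical static config for the ma1 management interface (ONL uses
# Debian-style networking). Addresses are examples; adjust for your network.
# We write to a staging file here; on the switch this would be
# /etc/network/interfaces, followed by "ifdown ma1 && ifup ma1".
CFG=/tmp/interfaces-ma1
cat > "$CFG" <<'EOF'
auto ma1
iface ma1 inet static
    address 10.0.0.130
    netmask 255.255.255.128
    gateway 10.0.0.129
EOF
cat "$CFG"
```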

Warning

If you came here from the Aether docs, stop and return to Post-ONL configuration and continue the remaining steps there. Otherwise, please continue with the rest of this page.

Step 2: Configure switches as special Kubernetes nodes

Our ONL image includes all the packages required to run Kubernetes on top of it. Once Kubernetes is ready, the Stratum application will be deployed to the switch to manage it.

Unlike servers, switches have limited CPU and memory resources, so we should avoid deploying unnecessary workloads on them. Moreover, Stratum should be deployed to all switches, and only to switches.

To achieve these goals, apply the following settings to your Kubernetes cluster:

  1. Add a label to every switch node, e.g. node-role.kubernetes.io=switch

  2. Add a NoSchedule taint to every switch node, e.g. node-role.kubernetes.io=switch:NoSchedule

  3. Configure the appropriate NodeSelector and Toleration when deploying Stratum via a DaemonSet
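Steps 1 and 2 can be sketched with kubectl; the node names leaf1 and leaf2 match the example cluster shown below, and the DaemonSet fragment in the comments is illustrative, not the actual SD-Fabric chart content:

```shell
# Sketch, assuming the switch nodes are named leaf1 and leaf2:
kubectl label node leaf1 leaf2 node-role.kubernetes.io=switch
kubectl taint node leaf1 leaf2 node-role.kubernetes.io=switch:NoSchedule

# The Stratum DaemonSet pod spec then needs a matching selector and
# toleration (illustrative fragment):
#   nodeSelector:
#     node-role.kubernetes.io: switch
#   tolerations:
#   - key: node-role.kubernetes.io
#     operator: Equal
#     value: switch
#     effect: NoSchedule
```

With this in place, Stratum pods land only on the tainted switch nodes, and ordinary workloads are kept off them.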

Example of a five-node Kubernetes cluster with two switches and three servers:

╰─$ kubectl get node -o custom-columns=NAME:.metadata.name,TAINT:.spec.taints
NAME       TAINT
compute1   <none>
compute2   <none>
compute3   <none>
leaf1      [map[effect:NoSchedule key:node-role.kubernetes.io value:switch]]
leaf2      [map[effect:NoSchedule key:node-role.kubernetes.io value:switch]]
╰─$ kubectl get nodes -lnode-role.kubernetes.io=switch
NAME    STATUS   ROLES    AGE   VERSION
leaf1   Ready    worker   27d   v1.18.8
leaf2   Ready    worker   27d   v1.18.8

Step 3: Prepare ONOS network configuration

See Network Configuration for instructions.

Step 4: Prepare Stratum chassis configuration

See Stratum Chassis Configuration for instructions.

Step 5: Install SD-Fabric with Helm

To install SD-Fabric into your Kubernetes cluster, follow the instructions in the SD-Fabric Helm Chart README.
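The install typically looks like the following sketch; the repository URL, chart name, release name, and values file are all placeholders, so substitute the real ones from the README:

```shell
# Hypothetical Helm invocation; repo URL, chart name, release name, and
# values file are placeholders -- take the real ones from the SD-Fabric
# Helm Chart README.
helm repo add sdfabric https://charts.example.org/sdfabric
helm repo update
helm install sdfabric sdfabric/sdfabric \
  --namespace sdfabric --create-namespace \
  --values my-values.yaml   # references your ONOS netcfg and Stratum chassis configs
```

The values file is where the ONOS network configuration (Step 3) and Stratum chassis configurations (Step 4) come together.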