Chapter 2. Control Plane Virtualization

The key factors driving the Juniper QFX5100 are the advent of virtualization and cloud computing; however, virtualization has many facets. One is decoupling a service from the physical hardware. When this is combined with orchestration and automation, the service is said to be agile: it can be provisioned quickly, even within seconds. Another facet is scale in the number of instances of the service. Because it becomes so easy to provision a service, the total number of instances quickly increases.

Compute virtualization is such a simple concept, yet it yields massive benefit to both the end user and operator. The next logical step is to apply the benefits of compute virtualization to the control plane of the network. After all, the control board is nothing but an x86 processor, memory, and storage.

The immediate benefit of virtualizing the control board might not be obvious. Operators generally like to create a virtual machine (VM) running Linux so that they can execute operational scripts and troubleshoot. However, there is a much more exciting use case for virtualizing the control board.

Traditionally, only chassis-based networking equipment could support two routing engines. Two routing engines increase the high availability of the chassis and allow the operator to upgrade the control plane software in real time without traffic loss, a feature commonly referred to as In-Service Software Upgrade (ISSU). One of the key requirements of ISSU is two routing engines that are synchronized by using Nonstop Routing (NSR), Nonstop Bridging (NSB), and Graceful Routing Engine Switchover (GRES). Fixed networking equipment such as top-of-rack (ToR) switches generally has only a single routing engine and therefore cannot support ISSU.

Virtualization allows a ToR switch to have two virtualized routing engines, which makes features such as ISSU possible. The Juniper QFX5100 family takes virtualization to heart: it uses the Linux kernel-based virtual machine (KVM) on the host operating system and places Junos, the network operating system, inside a VM. When an operator performs a real-time software upgrade, the Juniper QFX5100 switch provisions a second routing engine, synchronizes the data, and performs the ISSU without dropping traffic.

Another great benefit of compute virtualization inside of a switch is that you can create user-defined VMs and run your own applications and programs on the switch. Use cases include Network Functions Virtualization (NFV), network management, and statistical reporting.

Architecture

Recall that the Juniper QFX5100 series is split into two major components (see Figure 2-1): the control board and switch board. The control board is the foundation for the control plane, whereas the switch board is the foundation for the data plane.

Figure 2-1. QFX5100 architecture

Focusing on the control board components, the blocks shaded in gray represent all of the roles in that architecture that are responsible for virtualizing the control plane. The control board runs commodity hardware that's easily compatible with common hypervisors: a 1.5 GHz dual-core Intel Sandy Bridge CPU, 8 GB of memory, and a 32 GB solid-state disk (SSD). The Juniper QFX5100 boots directly into CentOS Linux instead of Junos, which provides the platform with several advantages. The first advantage is the ability to virtualize the underlying hardware by using Linux KVM and QEMU; the second is the ability to host operational daemons and Application Programming Interfaces (APIs) directly on the host operating system.

To make management of the hypervisor easier, the virtualization library libvirt is used to provision and manage the VMs. libvirt provides a normalized management framework across a set of hypervisors, and the ability to use a common framework to control a hypervisor provides more flexibility should any of the underlying components change in the future.

Host Operating System

As mentioned in the previous section, the Juniper QFX5100 boots directly into Linux, specifically CentOS. This provides the operating system and virtualization foundation for Junos and all other network-related functionality.

Let’s log in to the host operating system and do some exploring:

dhanks@qfx5100> request app-engine host-shell
Last login: Sun Nov 17 14:30:47 from 192.168.1.2
--- Host 13.2I20131114_1603_vsdk_build_30 built 2013-11-14 16:03:50 UTC

Now, let’s take a peek at the PCI bus and see what’s installed on the host operating system:

-sh-4.1# lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller
(rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI
Express Root Port (rev 09)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI
Express Root Port (rev 09)
00:01.2 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI
Express Root Port (rev 09)
00:06.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI
Express Root Port (rev 09)
00:1c.0 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #1 (rev 08)
00:1c.1 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #2 (rev 08)
00:1c.2 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #3 (rev 08)
00:1c.3 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #4 (rev 08)
00:1d.0 USB controller: Intel Corporation DH89xxCC USB2 Enhanced Host Controller #1 (rev 08)
00:1f.0 ISA bridge: Intel Corporation DH89xxCC LPC Controller (rev 08)
00:1f.2 SATA controller: Intel Corporation DH89xxCC 4 Port SATA AHCI Controller (rev 08)
00:1f.3 SMBus: Intel Corporation DH89xxCC SMBus Controller (rev 08)
00:1f.7 System peripheral: Intel Corporation DH89xxCC Watchdog Timer (rev 08)
01:00.0 Co-processor: Intel Corporation Device 0434 (rev 21)
01:00.1 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
01:00.2 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
01:00.3 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
01:00.4 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
07:00.0 Unassigned class [ff00]: Juniper Networks Device 0062 (rev 01)
08:00.0 Unassigned class [ff00]: Juniper Networks Device 0063 (rev 01)
09:00.0 Ethernet controller: Broadcom Corporation Device b854 (rev 02)

Pretty vanilla so far: a host bridge, several PCI Express root ports, a USB controller, a SATA controller, and some network interface controllers (NICs). But the two Juniper Networks devices are interesting; what are they? They are the FPGA controllers responsible for the chassis fans, sensors, and other environmental functions.

The final device is the Broadcom BCM56850 chipset. The network operating system controls the Packet Forwarding Engine (PFE) over this PCI interface by using a Software Development Kit (SDK).

Let’s take a closer look at the CPU:

-sh-4.1# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Pentium(R) CPU  @ 1.50GHz
stepping        : 7
cpu MHz         : 1500.069
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni
pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 
sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow 
vnmi flexpriority ept vpid
bogomips        : 3000.13
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Pentium(R) CPU  @ 1.50GHz
stepping        : 7
cpu MHz         : 1500.069
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni
pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 
sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow 
vnmi flexpriority ept vpid
bogomips        : 3000.13
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Pentium(R) CPU  @ 1.50GHz
stepping        : 7
cpu MHz         : 1500.069
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni
pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 
sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow 
vnmi flexpriority ept vpid
bogomips        : 3000.13
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Pentium(R) CPU  @ 1.50GHz
stepping        : 7
cpu MHz         : 1500.069
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni
pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 
sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow 
vnmi flexpriority ept vpid
bogomips        : 3000.13
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

The CPU is a server-class Intel Sandy Bridge processor: a single socket with two physical cores and Hyper-Threading, which is why /proc/cpuinfo reports four logical processors. There's plenty of power to operate multiple VMs and the network operating system.

Now, let’s move on to the memory:

-sh-4.1# free
             total       used       free     shared    buffers     cached
Mem:       7529184    3135536    4393648          0     158820     746800
-/+ buffers/cache:    2229916    5299268
Swap:

After some of the memory has been reserved by the hardware and the kernel, you can see that the host has about 7.2 GiB in total.
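The free command reports kibibytes, so the conversion is a quick check. A minimal sketch, using the total from the output above:

```python
total_kib = 7529184              # "total" column from free, in KiB
total_gib = total_kib / 1024**2  # KiB -> GiB
print(f"{total_gib:.1f} GiB")    # ~7.2 GiB
```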

Next, let’s see how many disks there are and how they’re partitioned:

-sh-4.1# fdisk -l

Disk /dev/sdb: 16.0 GB, 16013852672 bytes
255 heads, 63 sectors/track, 1946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000dea11

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         125     1000000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sdb2             125        1857    13914062+  83  Linux

Disk /dev/sda: 16.0 GB, 16013852672 bytes
255 heads, 63 sectors/track, 1946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000d8b25

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         125     1000000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             125        1857    13914062+  83  Linux

Disk /dev/mapper/vg0_vjunos-lv_junos_recovery: 4294 MB, 4294967296 bytes
255 heads, 63 sectors/track, 522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/vg0_vjunos-lv_var: 11.3 GB, 11307843584 bytes
255 heads, 63 sectors/track, 1374 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/vg0_vjunos-lv_junos: 12.9 GB, 12884901888 bytes
255 heads, 63 sectors/track, 1566 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

The host system has two SSD storage devices, each with 16 GB of capacity. From the partition layout illustrated in Figure 2-2, you can see that we're running the Linux Logical Volume Manager (LVM).

Figure 2-2. Linux LVM and storage design

There are two 16 GB SSDs, which are part of the Linux LVM. The primary volume group is vg0_vjunos. This volume group has three volumes that are used by Junos:

  • lv_junos_recovery

  • lv_var

  • lv_junos
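As a quick sanity check, the byte counts that fdisk reports for the three logical volumes should account for the entire volume group (the vgs output later in this chapter reports a VSize of 26.53g with no free space). A minimal sketch:

```python
# Logical volume sizes reported by fdisk, in bytes
lv_sizes = {
    "lv_junos_recovery": 4294967296,   # 4 GiB
    "lv_var": 11307843584,             # ~10.53 GiB
    "lv_junos": 12884901888,           # 12 GiB
}

total_gib = sum(lv_sizes.values()) / 1024**3
print(f"Total allocated: {total_gib:.2f} GiB")  # ~26.53 GiB, matching vgs
```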

Linux KVM

When the Juniper QFX5100 boots up, the host operating system is Linux. All of the control plane operations happen within the network operating system, Junos. The Juniper QFX5100 takes advantage of compute virtualization in the host operating system by using Linux KVM. A VM is created specifically for Junos. Given that KVM can create multiple VMs, the Juniper QFX5100 series has the ability to perform ISSU and support third-party VMs that can host additional services such as network management and monitoring.

virsh

The Juniper QFX5100 uses the libvirt library as well as its virsh management user interface to interact with Linux KVM. If you're familiar with libvirt, walking around the virtualization capabilities of the Juniper QFX5100 will come as second nature. If you aren't familiar with libvirt, let's use virsh to explore and see what's happening under the hood.

The first thing we need to do is drop into the host shell from the Junos CLI:

dhanks@qfx5100> request app-engine host-shell
Last login: Sun Nov 17 14:30:47 from 192.168.1.2
--- Host 13.2I20131114_1603_vsdk_build_30 built 2013-11-14 16:03:50 UTC

Now, let’s take a look at the VMs installed in the Linux KVM:

-sh-4.1# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     vjunos0                        running

By default, there's a single VM running the Junos network operating system. The VM's name is vjunos0 with an ID of 1, and we can see that its state is running.

Hmm. Are you curious as to what version of the libvir library and QEMU the Juniper QFX5100 is using? Let’s find out:

-sh-4.1# virsh version
Compiled against library: libvir 0.9.10
Using library: libvir 0.9.10
Using API: QEMU 0.9.10
Running hypervisor: QEMU 0.12.1

At this point, let’s take a look at the overall host memory and CPU statistics:

-sh-4.1# virsh nodememstats
total  :              7269088 kB
free   :              4147596 kB
buffers:               264772 kB
cached :               761476 kB

-sh-4.1#
-sh-4.1# virsh nodecpustats

user:                305995340000000
system:              145678380000000
idle:              11460475070000000
iowait:                1075190000000

Now that we’re familiar with what the host system is capable of, software versions, and of course how many VMs are configured, let’s examine the Junos VM:

-sh-4.1# virsh dominfo vjunos0
Id:             1
Name:           vjunos0
UUID:           100e7ead-ae00-0140-0000-564a554e4f53
OS Type:        hvm
State:          running
CPU(s):         1
CPU time:       445895.2s
Max memory:     2000896 kB
Used memory:    2000896 kB
Persistent:     no
Autostart:      disable
Managed save:   no

Each VM has a unique identifier that can be used to refer to the VM. One of the more interesting attributes is the OS Type, which is set to hvm; this stands for Hardware Virtual Machine and indicates a fully virtualized guest. Because Junos is based on FreeBSD but heavily modified to support network control plane functions, it's difficult to call it pure FreeBSD. The alternative is the vendor-neutral OS Type of hvm, which simply means an x86-based operating system running unmodified under full virtualization.

Let’s focus on the memory and network settings for vjunos0:

-sh-4.1# virsh dommemstat vjunos0
rss 1895128

-sh-4.1# virsh domiflist vjunos0
Interface  Type       Source     Model       MAC
-------------------------------------------------------
vnet0      bridge     virbr0     e1000       52:54:00:bf:d1:6c
vnet1      bridge     ctrlbr0    e1000       52:54:00:e7:b6:cd

In the 13.2X53D20 version of Junos, there are two bridges installed for the VMs within KVM. The vnet0/virbr0 interface is used across all of the VMs to communicate with the outside world through their management interfaces. The other interface, vnet1/ctrlbr0, is used exclusively for ISSU. During an ISSU, there are two copies of Junos running; all control plane communication between the VMs is performed over this special bridge so that other control plane functions such as Secure Shell (SSH), Open Shortest Path First (OSPF), and Border Gateway Protocol (BGP) aren't impacted while the kernel state is synchronized between the master and backup Junos VMs.
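The domiflist output is easy to consume from a script. A minimal parsing sketch (the helper name is mine; the input is the output captured above):

```python
domiflist = """\
Interface  Type       Source     Model       MAC
-------------------------------------------------------
vnet0      bridge     virbr0     e1000       52:54:00:bf:d1:6c
vnet1      bridge     ctrlbr0    e1000       52:54:00:e7:b6:cd
"""

def parse_domiflist(text):
    """Map each guest interface to its (source bridge, MAC) pair."""
    interfaces = {}
    for line in text.splitlines()[2:]:   # skip the header and separator rows
        name, _type, source, _model, mac = line.split()
        interfaces[name] = (source, mac)
    return interfaces

ifmap = parse_domiflist(domiflist)
print(ifmap["vnet1"][0])  # ctrlbr0, the ISSU-only bridge
```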

Another interesting place to look for more information is in the /proc filesystem. We can take a look at the process ID (PID) of vjunos0 and examine the task status:

-sh-4.1# cat /var/run/libvirt/qemu/vjunos0.pid
2972
-sh-4.1# cat /proc/2972/task/*/status
Name:   qemu-kvm
State:  S (sleeping)
Tgid:   2972
Pid:    2972
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
Utrace: 0
FDSize: 256
Groups:
VmPeak:  2475100 kB
VmSize:  2276920 kB
VmLck:         0 kB
VmHWM:   1895132 kB
VmRSS:   1895128 kB
VmData:  2139812 kB
VmStk:        88 kB
VmExe:      2532 kB
VmLib:     16144 kB
VmPTE:      4284 kB
VmSwap:        0 kB
Threads:        2
SigQ:   1/55666
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000010002840
SigIgn: 0000000000001000
SigCgt: 0000002180006043
CapInh: 0000000000000000
CapPrm: fffffffc00000000
CapEff: fffffffc00000000
CapBnd: fffffffc00000000
Cpus_allowed:   04
Cpus_allowed_list:      2
Mems_allowed:
00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        5825006750
nonvoluntary_ctxt_switches:     46300
Name:   qemu-kvm
State:  S (sleeping)
Tgid:   2972
Pid:    2975
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
Utrace: 0
FDSize: 256
Groups:
VmPeak:  2475100 kB
VmSize:  2276920 kB
VmLck:         0 kB
VmHWM:   1895132 kB
VmRSS:   1895128 kB
VmData:  2139812 kB
VmStk:        88 kB
VmExe:      2532 kB
VmLib:     16144 kB
VmPTE:      4284 kB
VmSwap:        0 kB
Threads:        2
SigQ:   1/55666
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: ffffffde7ffbfebf
SigIgn: 0000000000001000
SigCgt: 0000002180006043
CapInh: 0000000000000000
CapPrm: fffffffc00000000
CapEff: fffffffc00000000
CapBnd: fffffffc00000000
Cpus_allowed:   04
Cpus_allowed_list:      2
Mems_allowed:
00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        5526311517
nonvoluntary_ctxt_switches:     586609665

One of the more interesting things to notice is the Cpus_allowed_list, which is set to a value of 2. By default, Juniper assigns the third CPU exclusively to the vjunos0 VM; this guarantees that tasks outside the scope of the control plane don't negatively impact Junos. The value is 2 because CPU numbering starts at 0. We can verify this with another virsh command:

-sh-4.1# virsh vcpuinfo vjunos0
VCPU:           0
CPU:            2
State:          running
CPU time:       311544.1s
CPU Affinity:   --y-

We can see that the CPU affinity is set to y on the third CPU, which verifies what we see in the /proc file system.
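The Cpus_allowed hexadecimal mask from /proc and the virsh affinity string encode the same pinning information. A minimal sketch of decoding both (the helper names are mine):

```python
def cpus_from_mask(mask_hex):
    """Decode a /proc Cpus_allowed hex mask into a list of CPU numbers."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if mask >> cpu & 1]

def cpus_from_affinity(affinity):
    """Decode a virsh 'CPU Affinity' string such as '--y-'."""
    return [i for i, flag in enumerate(affinity) if flag == "y"]

print(cpus_from_mask("04"))       # [2]
print(cpus_from_affinity("--y-")) # [2]
```

Bit 2 of the mask 0x04 and the y in the third position of --y- both point at the third CPU, confirming the pinning described above.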

App Engine

If you’re interested in learning more about the VMs but don’t feel like dropping to the host shell and using virsh commands, there is an alternative called the Junos App Engine, which is accessible within the Junos CLI.

To view the App Engine settings, use the show app-engine command. There are several different views that are available, as listed in Table 2-1.

Table 2-1. Junos App Engine views
View             Description
ARP              View all of the ARP entries of the VMs connected to all of the bridge domains
Bridge           View all of the configured Linux bridge tables
Information      Get information about the compute cluster, such as model, kernel version, and management IP addresses
Netstat          A simple wrapper around the Linux netstat -rn command
Resource usage   Show the CPU, memory, disk, and storage usage statistics in an easy-to-read format

Let’s explore some of the most common Junos App Engine commands and examine the output:

dhanks@QFX5100> show app-engine arp
Compute cluster: default-cluster

  Compute node: default-node

   Arp
   ===
   Address                  HWtype  HWaddress        Flags Mask            Iface
   192.168.1.2              ether   10:0e:7e:ad:af:30   C                  virbr0

This is just a simple summary show command that aggregates the management IP, MAC, and the bridge table to which it’s bound.

Let’s take a look at the bridge tables:

dhanks@QFX5100> show app-engine bridge
Compute cluster: default-cluster

  Compute node: default-node

   Bridge Table
   ============
   bridge name  bridge id               STP enabled     interfaces
   ctrlbr0              8000.fe5400e7b6cd       no      vnet1
   virbr0               8000.100e7eadae03       yes     virbr0-nic
                                                        vnet0

Just another nice wrapper for the Linux brctl command. Recall that vnet0 is for the regular control plane side of Junos, whereas vnet1 is reserved for inter-routing engine traffic during an ISSU:

dhanks@QFX5100> show app-engine resource-usage
Compute cluster: default-cluster

  Compute node: default-node
   CPU Usage
   =========
   15:48:46     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
   15:48:46     all    0.30    0.00    1.22    0.01    0.00    0.00    0.00    2.27   96.20
   15:48:46       0    0.08    0.00    0.08    0.03    0.00    0.00    0.00    0.00   99.81
   15:48:46       1    0.08    0.00    0.11    0.00    0.00    0.00    0.00    0.00   99.81
   15:48:46       2    1.03    0.00    4.75    0.01    0.00    0.00    0.00    9.18   85.03
   15:48:46       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

   Memory Usage
   ============
                total       used       free     shared    buffers     cached
   Mem:          7098       3047       4051          0        258        743
   Swap:            0          0          0

   Disk Usage
   ==========
   Filesystem            Size  Used Avail Use% Mounted on
   tmpfs                 3.5G  4.0K  3.5G   1% /dev/shm
   /dev/mapper/vg0_vjunos-lv_var
                          11G  198M  9.7G   2% /var
   /dev/mapper/vg0_vjun
   os-lv_junos
                          12G  2.2G  9.1G  20% /junos
   /dev/mapper/vg0_vjunos-lv_junos_recovery
                         4.0G  976M  2.8G  26% /recovery
   /dev/sda1             962M  312M  602M  35% /boot

   Storage Information
   ===================
     VG         #PV #LV #SN Attr   VSize  VFree
     vg0_vjunos   2   3   0 wz--n- 26.53g    0

show app-engine resource-usage is a nice aggregated command showing the utilization of the CPU, memory, disk, and storage information; it’s a very easy way to get a bird’s-eye view of the health of the App Engine.

ISSU

Ever since the original M Series routers, one of the great Junos features has been its support for ISSU. With ISSU, the network operating system can be upgraded without shutting down the router and impacting production traffic. One of the key requirements for ISSU is two routing engines; during an ISSU, the two routing engines synchronize kernel and control plane state with each other. The idea is that one routing engine is upgraded while the other handles the control plane.

Although Juniper QFX5100 switches don’t physically have two routing engines, they are able to carry out the same functional requirements thanks to the power of virtualization. The Juniper QFX5100 series is able to create a second VM running Junos during an ISSU to meet all of the synchronization requirements, as is illustrated in Figure 2-3.

Each Junos VM has three management interfaces. Two of those interfaces, em0 and em1, are used for management and map to the external interfaces C0 and C1, respectively. The third management interface, em2, is used exclusively for communication between the two Junos VMs. For example, control plane protocols such as NSR, NSB, and GRES are required in order for a successful ISSU to complete; these protocols would communicate across the isolated em2 interface as well as an isolated ctrlbr0 bridge table in the Linux host.

Figure 2-3. The QFX5100 Linux KVM and management architecture

The backup Junos VM is only created and running during an ISSU. At a high level, Junos goes through the following steps during an ISSU:

  • The backup Junos VM is created and started.

  • The backup Junos VM is upgraded to the software version specified in the ISSU command.

  • The PFE goes into an ISSU-prepared state in which data is copied from the PFE to RAM.

  • The PFE connects to the recently upgraded backup Junos VM, which now becomes the master routing engine.

  • The PFE performs a warm reboot.

  • The new master Junos VM installs the PFE state from RAM back into the PFE.

  • The other Junos VM is shut down.

  • Junos has been upgraded and the PFE has performed a warm reboot.
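Illustrative only, the sequence above can be sketched as a walk through named steps; the step names and VM names here are mine, not Junos internals:

```python
ISSU_STEPS = [
    "spawn backup VM",
    "upgrade backup VM",
    "snapshot PFE state to RAM",
    "switch PFE to upgraded VM",   # the upgraded backup becomes master
    "warm-reboot PFE",
    "restore PFE state from RAM",
    "shut down old VM",
]

def run_issu(steps):
    """Walk the ISSU sequence, tracking which Junos VM is master."""
    master = "vjunos0"
    for step in steps:
        if step.startswith("switch"):
            master = "vjunos1"     # mastership moves exactly once
    return master

print(run_issu(ISSU_STEPS))  # vjunos1
```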

Let’s see an ISSU in action:

dhanks@QFX5100> request system software in-service-upgrade flex-13.2X51-D20.2-
domestic-signed.tgz
warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!
ISSU: Validating Image
error: 'Non Stop Routing' not configured
error: aborting ISSU
error: ISSU Aborted!
ISSU: IDLE

Ah, bummer! What happened here? There are some requirements for the control plane that must be enabled before a successful ISSU can be achieved:

  • NSR

  • NSB

  • GRES

  • Commit Synchronization

Let’s configure these quickly and try an ISSU once again.

{master:0}[edit]
dhanks@QFX5100# set routing-options nonstop-routing
{master:0}[edit]
dhanks@QFX5100# set chassis redundancy graceful-switchover
{master:0}[edit]
dhanks@QFX5100# set protocols layer2-control nonstop-bridging
{master:0}[edit]
dhanks@QFX5100# set system commit synchronize
{master:0}[edit]
dhanks@QFX5100# commit and-quit
configuration check succeeds
commit complete
Exiting configuration mode

OK, now that all of the software features required for ISSU are configured and committed, let’s try the ISSU one more time:

dhanks@QFX5100> request system software in-service-upgrade flex-13.2X51-D20.2-
domestic-signed.tgz
warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!
ISSU: Validating Image
ISSU: Preparing Backup RE
Prepare for ISSU
ISSU: Backup RE Prepare Done
Extracting jinstall-qfx-5-flex-13.2X51-D20.2-domestic ...
Install jinstall-qfx-5-flex-13.2X51-D20.2-domestic completed
Spawning the backup RE
Spawn backup RE, index 1 successful
GRES in progress
GRES done in 0 seconds
Waiting for backup RE switchover ready
GRES operational
Copying home directories
Copying home directories successful
Initiating Chassis In-Service-Upgrade
Chassis ISSU Started
ISSU: Preparing Daemons
ISSU: Daemons Ready for ISSU
ISSU: Starting Upgrade for FRUs
ISSU: Preparing for Switchover
ISSU: Ready for Switchover
Checking In-Service-Upgrade status
  Item           Status                  Reason
  FPC 0          Online
Send ISSU done to chassisd on backup RE
Chassis ISSU Completed
ISSU: IDLE
Initiate em0 device handoff
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
em0: bus=0, device=3, func=0, Ethernet address 10:0e:7e:b2:2d:78
hub 1-1:1.0: over-current change on port 1
hub 1-1:1.0: over-current change on port 3
hub 1-1:1.0: over-current change on port 5

QFX5100 (ttyd0)

login:

Excellent! The ISSU has completed successfully and no traffic was impacted during the software upgrade of Junos.

One of the advantages of the Broadcom warm reboot feature is that no new firmware needs to be installed in the PFE. This effectively reduces ISSU to a control plane-only problem, which is much easier to solve. When both the PFE firmware and the control plane software must be upgraded and kept synchronized, there are more moving parts and the problem is more difficult. Juniper MX Series by Douglas Richard Hanks, Jr. and Harry Reynolds (O'Reilly) thoroughly explains the benefits and drawbacks of ISSU on a platform that upgrades the PFE firmware in addition to the control plane. The end result is that a control plane-only ISSU is more stable and finishes much faster than on a platform such as the Juniper MX. The obvious drawback is that a control plane-only ISSU can't deliver new PFE features, which is where the Juniper MX wins.

Summary

This chapter walked you through the design of the control plane and how the Juniper QFX5100 is really just a server that thinks it's a switch. The Juniper QFX5100 has a powerful Intel CPU, standard memory, and SSD storage. What may be surprising is that the switch boots directly into Linux and uses KVM to virtualize Junos, the network operating system. Because Junos runs in a VM, the Juniper QFX5100 can support carrier-class features such as ISSU, NSR, and NSB.
