The key factors driving the Juniper QFX5100 are virtualization and cloud computing; virtualization, however, has many facets. One is decoupling the service from the physical hardware. When this is combined with orchestration and automation, the service is said to be agile: it can be provisioned quickly, even within seconds. Another facet is scale in the number of instances of a service: because it becomes so easy to provision a service, the total number of instances quickly increases.
Compute virtualization is such a simple concept, yet it yields massive benefit to both the end user and operator. The next logical step is to apply the benefits of compute virtualization to the control plane of the network. After all, the control board is nothing but an x86 processor, memory, and storage.
The immediate benefit of virtualizing the control board might not be so obvious. Generally, operators like to toy around and create a virtual machine (VM) running Linux so that they’re able to execute operational scripts and troubleshoot. However, there is a much more exciting use case to virtualization of the control board. Traditionally, only networking equipment that was chassis-based was able to support two routing engines. The benefit of two routing engines is that it increases the high availability of the chassis and allows the operator to upgrade the control plane software in real time without traffic loss. This feature is commonly referred to as In-Service Software Upgrade (ISSU). One of the key requirements of ISSU is to have two routing engines that are synchronized using the Nonstop Routing (NSR), Nonstop Bridging (NSB), and Graceful Routing Engine Switchover (GRES) protocols. Fixed networking equipment such as top-of-rack (ToR) switches generally have only a single routing engine and do not support ISSU due to the lack of a second routing engine. Taking advantage of virtualization allows a ToR switch to have two virtualized routing engines that make possible features such as ISSU. The Juniper QFX5100 family takes virtualization to heart and uses the Linux kernel-based virtual machine (KVM) as the host operating system and places Junos, the network operating system, inside of a VM. When an operator wants to perform a real-time software upgrade, the Juniper QFX5100 switch will provision a second routing engine, synchronize the data, and perform the ISSU without dropping traffic.
Another great benefit of compute virtualization inside of a switch is that you can create user-defined VMs and run your own applications and programs on the switch. Use cases include Network Functions Virtualization (NFV), network management, and statistical reporting.
Recall that the Juniper QFX5100 series is split into two major components (see Figure 2-1): the control board and switch board. The control board is the foundation for the control plane, whereas the switch board is the foundation for the data plane.
Focusing on the control board components, the blocks shaded in gray represent all of the roles in that architecture that are responsible for virtualizing the control plane. The control board runs commodity hardware that's readily compatible with common hypervisors. The processor is a 1.5 GHz dual-core Intel Sandy Bridge CPU, and there is 8 GB of memory and a 32 GB solid-state disk (SSD). The Juniper QFX5100 boots directly into CentOS Linux instead of Junos; this provides the platform with several advantages. The first advantage is the ability to virtualize the underlying hardware by using Linux KVM and QEMU; the second advantage is the ability to host operational daemons and Application Programming Interfaces (APIs) directly on the host operating system.
To make the management of the hypervisor easier, the virtualization library (libvir) is used to provision and manage the VMs. The libvir provides a normalized management framework across a set of hypervisors. The ability to use a common framework to control a hypervisor provides more flexibility in the future if any of the underlying components happen to change.
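libvirt describes each VM with a domain XML definition that is portable across hypervisors. As a rough illustration only (this is a generic, hand-written sketch using details that appear later in this chapter, not the actual definition Juniper ships), a Junos guest similar to vjunos0 might be declared like this:

```xml
<domain type='kvm'>
  <name>vjunos0</name>
  <memory unit='KiB'>2000896</memory>
  <!-- pin the single vCPU to host CPU 2, as the QFX5100 does -->
  <vcpu cpuset='2'>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <!-- management interface, bridged to the outside world -->
    <interface type='bridge'>
      <source bridge='virbr0'/>
      <model type='e1000'/>
    </interface>
    <!-- internal bridge reserved for inter-RE traffic during ISSU -->
    <interface type='bridge'>
      <source bridge='ctrlbr0'/>
      <model type='e1000'/>
    </interface>
  </devices>
</domain>
```

A definition in this shape could be loaded with `virsh define` and started with `virsh start`, regardless of whether the underlying hypervisor is KVM or something else, which is precisely the flexibility the libvir framework provides.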
As mentioned in the previous section, the Juniper QFX5100 boots directly into Linux, specifically CentOS. This provides the operating system and virtualization foundation for Junos and all other network-related functionality.
Let’s log in to the host operating system and do some exploring:
dhanks@qfx5100> request app-engine host-shell
Last login: Sun Nov 17 14:30:47 from 192.168.1.2
--- Host 13.2I20131114_1603_vsdk_build_30 built 2013-11-14 16:03:50 UTC
Now, let’s take a peek at the PCI bus and see what’s installed on the host operating system:
-sh-4.1# lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:01.2 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:06.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:1c.0 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #1 (rev 08)
00:1c.1 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #2 (rev 08)
00:1c.2 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #3 (rev 08)
00:1c.3 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #4 (rev 08)
00:1d.0 USB controller: Intel Corporation DH89xxCC USB2 Enhanced Host Controller #1 (rev 08)
00:1f.0 ISA bridge: Intel Corporation DH89xxCC LPC Controller (rev 08)
00:1f.2 SATA controller: Intel Corporation DH89xxCC 4 Port SATA AHCI Controller (rev 08)
00:1f.3 SMBus: Intel Corporation DH89xxCC SMBus Controller (rev 08)
00:1f.7 System peripheral: Intel Corporation DH89xxCC Watchdog Timer (rev 08)
01:00.0 Co-processor: Intel Corporation Device 0434 (rev 21)
01:00.1 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
01:00.2 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
01:00.3 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
01:00.4 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
07:00.0 Unassigned class [ff00]: Juniper Networks Device 0062 (rev 01)
08:00.0 Unassigned class [ff00]: Juniper Networks Device 0063 (rev 01)
09:00.0 Ethernet controller: Broadcom Corporation Device b854 (rev 02)
Pretty vanilla so far: PCI bridges, a USB controller, a SATA controller, and some network interface controllers (NICs). But the two Juniper Networks devices are interesting; what are they? These are the FPGA controllers that are responsible for the chassis fans, sensors, and other environmental functions.
The final device is the Broadcom 56850 chipset. The way a network operating system controls the Packet Forwarding Engine (PFE) is simply through a PCI interface by using a Software Development Kit (SDK).
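To make the PCI inventory concrete, here is a toy sketch of how you might programmatically pick out a vendor's devices from `lspci` text output. The sample is abridged from the output above; in practice you would capture `lspci` stdout via `subprocess` and parse it the same way:

```python
# Abridged lspci output from the QFX5100 host shell.
LSPCI_OUTPUT = """\
00:1f.2 SATA controller: Intel Corporation DH89xxCC 4 Port SATA AHCI Controller (rev 08)
07:00.0 Unassigned class [ff00]: Juniper Networks Device 0062 (rev 01)
08:00.0 Unassigned class [ff00]: Juniper Networks Device 0063 (rev 01)
09:00.0 Ethernet controller: Broadcom Corporation Device b854 (rev 02)
"""

def find_devices(lspci_text, vendor):
    """Return (bus_address, description) pairs whose description names the vendor."""
    matches = []
    for line in lspci_text.splitlines():
        addr, _, desc = line.partition(" ")  # bus address is the first field
        if vendor in desc:
            matches.append((addr, desc))
    return matches

# The two FPGA controllers show up at 07:00.0 and 08:00.0.
print(find_devices(LSPCI_OUTPUT, "Juniper Networks"))
```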
Let’s take a closer look at the CPU:
-sh-4.1# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Pentium(R) CPU @ 1.50GHz
stepping        : 7
cpu MHz         : 1500.069
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3000.13
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

...(nearly identical blocks for processors 1 through 3 omitted; they differ only in the processor, core id, apicid, and initial apicid fields)...
The CPU is a server-class Intel Sandy Bridge processor: a single socket with two physical cores that, thanks to Hyper-Threading, present four logical processors. There's plenty of power to operate multiple VMs and the network operating system.
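The distinction between logical processors and physical cores is easy to check programmatically: count `processor` entries for logical processors, and distinct (`physical id`, `core id`) pairs for physical cores. A small sketch over an abridged /proc/cpuinfo dump:

```python
# Abridged /proc/cpuinfo: only the fields relevant to topology are kept.
SAMPLE_CPUINFO = """\
processor   : 0
physical id : 0
core id     : 0
processor   : 1
physical id : 0
core id     : 0
processor   : 2
physical id : 0
core id     : 1
processor   : 3
physical id : 0
core id     : 1
"""

def cpu_topology(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo text."""
    logical = 0
    cores = set()
    phys = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            phys = value
        elif key == "core id":
            cores.add((phys, value))   # one entry per unique core
    return logical, len(cores)

print(cpu_topology(SAMPLE_CPUINFO))  # (4, 2): four logical processors on two physical cores
```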
Now, let’s move on to the memory:
-sh-4.1# free
             total       used       free     shared    buffers     cached
Mem:       7529184    3135536    4393648          0     158820     746800
-/+ buffers/cache:    2229916    5299268
Swap:            0          0          0
After some of the memory has been reserved by other hardware and the kernel, you can see that we have roughly 7.2 GiB in total.
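The arithmetic is worth a quick check, since `free` reports in kB (really KiB):

```python
# "total" column from the `free` output above, in KiB.
total_kib = 7529184

# 1 GiB = 2**20 KiB
gib = total_kib / 2**20
print(round(gib, 2))  # 7.18 -- roughly 7.2 GiB usable out of the 8 GB installed
```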
Next, let’s see how many disks there are and how they’re partitioned:
-sh-4.1# fdisk -l

Disk /dev/sdb: 16.0 GB, 16013852672 bytes
255 heads, 63 sectors/track, 1946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000dea11

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         125     1000000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sdb2             125        1857    13914062+  83  Linux

Disk /dev/sda: 16.0 GB, 16013852672 bytes
255 heads, 63 sectors/track, 1946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000d8b25

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         125     1000000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             125        1857    13914062+  83  Linux

Disk /dev/mapper/vg0_vjunos-lv_junos_recovery: 4294 MB, 4294967296 bytes
255 heads, 63 sectors/track, 522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/vg0_vjunos-lv_var: 11.3 GB, 11307843584 bytes
255 heads, 63 sectors/track, 1374 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/vg0_vjunos-lv_junos: 12.9 GB, 12884901888 bytes
255 heads, 63 sectors/track, 1566 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
The host system has two SSD storage devices, each with 16 GB of capacity. From the partition layout illustrated in Figure 2-2, you can see that we're running the Linux Logical Volume Manager (LVM).
There are two 16 GB SSDs, which are part of the Linux LVM configuration. The primary volume group is vg0_vjunos. This volume group has three logical volumes that are used by Junos: lv_junos (mounted at /junos), lv_junos_recovery (mounted at /recovery), and lv_var (mounted at /var).
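The three logical volume sizes from the fdisk output can be cross-checked against the volume group size reported later by `show app-engine resource-usage` (VSize 26.53g, VFree 0):

```python
# Byte sizes of the three logical volumes, taken from the fdisk output above.
lv_bytes = {
    "lv_junos_recovery": 4294967296,
    "lv_var": 11307843584,
    "lv_junos": 12884901888,
}

total_gib = sum(lv_bytes.values()) / 2**30
print(round(total_gib, 2))  # 26.53 -- matches the vg0_vjunos VSize, so the VG has no free space
```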
When the Juniper QFX5100 boots up, the host operating system is Linux. All of the control plane operations happen within the network operating system, Junos. The Juniper QFX5100 takes advantage of compute virtualization in the host operating system by using Linux KVM. A VM is created specifically for Junos. Given that KVM can create multiple VMs, the Juniper QFX5100 series has the ability to perform ISSU and support third-party VMs that can host additional services such as network management and monitoring.
The Juniper QFX5100 uses the libvir library as well as the virsh management user interface to interact with Linux KVM. If you're familiar with libvir, walking around the virtualization capabilities of the Juniper QFX5100 will come as second nature. If you aren't familiar with libvir, let's use virsh to explore and see what's happening under the hood.
The first thing we need to do is drop into the host shell from the Junos CLI:
dhanks@qfx5100> request app-engine host-shell
Last login: Sun Nov 17 14:30:47 from 192.168.1.2
--- Host 13.2I20131114_1603_vsdk_build_30 built 2013-11-14 16:03:50 UTC
Now, let’s take a look at the VMs installed in the Linux KVM:
-sh-4.1# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     vjunos0                        running
By default there's a single VM running the Junos network operating system. The VM's name is vjunos0, its ID is 1, and its state is running.
Hmm. Are you curious as to what version of the libvir library and QEMU the Juniper QFX5100 is using? Let’s find out:
-sh-4.1# virsh version
Compiled against library: libvir 0.9.10
Using library: libvir 0.9.10
Using API: QEMU 0.9.10
Running hypervisor: QEMU 0.12.1
At this point, let’s take a look at the overall host memory and CPU statistics:
-sh-4.1# virsh nodememstats
total  :              7269088 kB
free   :              4147596 kB
buffers:               264772 kB
cached :               761476 kB

-sh-4.1# virsh nodecpustats
user:            305995340000000
system:          145678380000000
idle:          11460475070000000
iowait:            1075190000000
Now that we’re familiar with what the host system is capable of, software versions, and of course how many VMs are configured, let’s examine the Junos VM:
-sh-4.1# virsh dominfo vjunos0
Id:             1
Name:           vjunos0
UUID:           100e7ead-ae00-0140-0000-564a554e4f53
OS Type:        hvm
State:          running
CPU(s):         1
CPU time:       445895.2s
Max memory:     2000896 kB
Used memory:    2000896 kB
Persistent:     no
Autostart:      disable
Managed save:   no
Each VM has a unique identifier (UUID) that can be used to refer to it. One of the more interesting attributes is the OS Type, which is set to hvm; this stands for Hardware Virtual Machine. Junos is based on FreeBSD but has been heavily modified to support network control plane functions, so it's difficult to call it pure FreeBSD. Instead, the vendor-neutral OS type hvm is used, which basically means an x86-based operating system running fully virtualized, without paravirtualization.
Let's focus on the memory and network settings for vjunos0:
-sh-4.1# virsh dommemstat vjunos0
rss 1895128

-sh-4.1# virsh domiflist vjunos0
Interface  Type       Source     Model       MAC
-------------------------------------------------------
vnet0      bridge     virbr0     e1000       52:54:00:bf:d1:6c
vnet1      bridge     ctrlbr0    e1000       52:54:00:e7:b6:cd
In the 13.2X53D20 version of Junos, there are two bridges installed for the VMs within KVM. The vnet0/virbr0 interface is used by all of the VMs to communicate with the outside world through their management interfaces. The other interface, vnet1/ctrlbr0, is used exclusively for ISSU. During an ISSU, there are two copies of Junos running; all control plane communication between the VMs is performed over this special bridge so that other control plane functions such as Secure Shell (SSH), Open Shortest Path First (OSPF), and Border Gateway Protocol (BGP) aren't impacted while the kernel state is synchronized between the master and backup Junos VMs.
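The `virsh domiflist` table is easy to consume programmatically when you want to map each virtual NIC to its bridge. A small parsing sketch over the output shown above:

```python
# Output of `virsh domiflist vjunos0`, as captured above.
DOMIFLIST = """\
Interface  Type       Source     Model       MAC
-------------------------------------------------------
vnet0      bridge     virbr0     e1000       52:54:00:bf:d1:6c
vnet1      bridge     ctrlbr0    e1000       52:54:00:e7:b6:cd
"""

def iface_to_bridge(text):
    """Map each interface name to its source bridge."""
    mapping = {}
    for line in text.splitlines()[2:]:  # skip the header and separator rows
        fields = line.split()
        if len(fields) == 5:
            iface, _type, source, _model, _mac = fields
            mapping[iface] = source
    return mapping

print(iface_to_bridge(DOMIFLIST))  # {'vnet0': 'virbr0', 'vnet1': 'ctrlbr0'}
```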
Another interesting place to look for more information is the /proc filesystem. We can take the process ID (PID) of vjunos0 and examine the task status:
-sh-4.1# cat /var/run/libvirt/qemu/vjunos0.pid
2972
-sh-4.1# cat /proc/2972/task/*/status
Name:   qemu-kvm
State:  S (sleeping)
Tgid:   2972
Pid:    2972
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
Utrace: 0
FDSize: 256
Groups:
VmPeak:  2475100 kB
VmSize:  2276920 kB
VmLck:         0 kB
VmHWM:   1895132 kB
VmRSS:   1895128 kB
VmData:  2139812 kB
VmStk:        88 kB
VmExe:      2532 kB
VmLib:     16144 kB
VmPTE:      4284 kB
VmSwap:        0 kB
Threads:        2
SigQ:   1/55666
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000010002840
SigIgn: 0000000000001000
SigCgt: 0000002180006043
CapInh: 0000000000000000
CapPrm: fffffffc00000000
CapEff: fffffffc00000000
CapBnd: fffffffc00000000
Cpus_allowed:   04
Cpus_allowed_list:      2
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        5825006750
nonvoluntary_ctxt_switches:     46300

...(a second, nearly identical status block follows for the other qemu-kvm thread, Pid 2975)...
One of the more interesting fields is Cpus_allowed_list, which is set to a value of 2. By default, Juniper assigns the third CPU directly to the vjunos0 VM; this guarantees that tasks outside the scope of the control plane don't negatively impact Junos. The value is 2 because CPU numbering starts at 0. We can verify this again with another virsh command:
-sh-4.1# virsh vcpuinfo vjunos0
VCPU:           0
CPU:            2
State:          running
CPU time:       311544.1s
CPU Affinity:   --y-
We can see that the CPU affinity is set to y on the third CPU, which verifies what we saw in the /proc filesystem.
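The Cpus_allowed field is a hexadecimal bitmask in which bit N corresponds to CPU N, and Cpus_allowed_list is its human-readable expansion. Decoding it yourself is a one-liner:

```python
def decode_cpu_mask(mask_hex):
    """Expand a Cpus_allowed hex mask (possibly comma-grouped) into a CPU list."""
    mask = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if mask >> cpu & 1]

# The vjunos0 mask from /proc: 0x04 means only bit 2 is set,
# i.e. the VM is pinned to the third CPU (numbering starts at 0).
print(decode_cpu_mask("04"))  # [2]
```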
If you're interested in learning more about the VMs but don't feel like dropping to the host shell and using virsh commands, there is an alternative called the Junos App Engine, which is accessible within the Junos CLI. To view the App Engine settings, use the show app-engine command. There are several different views available, as listed in Table 2-1.
| View | Description |
|---|---|
| ARP | View all of the ARP entries of the VMs connected to all of the bridge domains |
| Bridge | View all of the configured Linux bridge tables |
| Information | Get information about the compute cluster, such as model, kernel version, and management IP addresses |
| Netstat | A simple wrapper around the Linux netstat -rn command |
| Resource usage | Show the CPU, memory, disk, and storage usage statistics in an easy-to-read format |
Let’s explore some of the most common Junos App Engine commands and examine the output:
dhanks@QFX5100> show app-engine arp
Compute cluster: default-cluster
  Compute node: default-node
  Arp
  ===
Address        HWtype  HWaddress          Flags Mask  Iface
192.168.1.2    ether   10:0e:7e:ad:af:30  C           virbr0
This is just a simple summary show command that aggregates the management IP address, the MAC address, and the bridge table to which it's bound.
Let’s take a look at the bridge tables:
dhanks@QFX5100> show app-engine bridge
Compute cluster: default-cluster
  Compute node: default-node
  Bridge Table
  ============
bridge name  bridge id          STP enabled  interfaces
ctrlbr0      8000.fe5400e7b6cd  no           vnet1
virbr0       8000.100e7eadae03  yes          virbr0-nic
                                             vnet0
Just another nice wrapper for the Linux brctl command. Recall that vnet0 is for the regular control plane side of Junos, whereas vnet1 is reserved for inter-routing engine traffic during an ISSU:
dhanks@QFX5100> show app-engine resource-usage
Compute cluster: default-cluster
  Compute node: default-node

  CPU Usage
  =========
  15:48:46  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest   %idle
  15:48:46  all   0.30   0.00  1.22     0.01  0.00   0.00    0.00    2.27   96.20
  15:48:46    0   0.08   0.00  0.08     0.03  0.00   0.00    0.00    0.00   99.81
  15:48:46    1   0.08   0.00  0.11     0.00  0.00   0.00    0.00    0.00   99.81
  15:48:46    2   1.03   0.00  4.75     0.01  0.00   0.00    0.00    9.18   85.03
  15:48:46    3   0.00   0.00  0.00     0.00  0.00   0.00    0.00    0.00  100.00

  Memory Usage
  ============
               total       used       free     shared    buffers     cached
  Mem:          7098       3047       4051          0        258        743
  Swap:            0          0          0

  Disk Usage
  ==========
  Filesystem                                Size  Used  Avail  Use%  Mounted on
  tmpfs                                     3.5G  4.0K   3.5G    1%  /dev/shm
  /dev/mapper/vg0_vjunos-lv_var              11G  198M   9.7G    2%  /var
  /dev/mapper/vg0_vjunos-lv_junos            12G  2.2G   9.1G   20%  /junos
  /dev/mapper/vg0_vjunos-lv_junos_recovery  4.0G  976M   2.8G   26%  /recovery
  /dev/sda1                                 962M  312M   602M   35%  /boot

  Storage Information
  ===================
  VG          #PV  #LV  #SN  Attr    VSize   VFree
  vg0_vjunos    2    3    0  wz--n-  26.53g  0
show app-engine resource-usage is a nice aggregated command showing the utilization of the CPU, memory, disk, and storage; it's a very easy way to get a bird's-eye view of the health of the App Engine.
Since the original M Series routers, one of the great Junos features has been its support for ISSU. With ISSU, the network operating system can be upgraded without having to shut down the router and impact production traffic. One of the key requirements for ISSU is that there are two routing engines. During an ISSU, the two routing engines need to synchronize kernel and control plane state with each other. The idea is that one routing engine is upgraded while the other routing engine is handling the control plane.
Although Juniper QFX5100 switches don’t physically have two routing engines, they are able to carry out the same functional requirements thanks to the power of virtualization. The Juniper QFX5100 series is able to create a second VM running Junos during an ISSU to meet all of the synchronization requirements, as is illustrated in Figure 2-3.
Each Junos VM has three management interfaces. Two of those interfaces, em0 and em1, are used for management and map to the external interfaces C0 and C1, respectively. The third management interface, em2, is used exclusively for communication between the two Junos VMs. For example, control plane protocols such as NSR, NSB, and GRES are required in order for a successful ISSU to complete; these protocols communicate across the isolated em2 interface as well as an isolated ctrlbr0 bridge table in the Linux host.
The backup Junos VM is only created and running during an ISSU. At a high level, Junos goes through the following steps during an ISSU:
1. The backup Junos VM is created and started.
2. The backup Junos VM is upgraded to the software version specified in the ISSU command.
3. The PFE goes into an ISSU-prepared state in which data is copied from the PFE to RAM.
4. The PFE connects to the recently upgraded backup Junos VM, which now becomes the master routing engine.
5. The PFE performs a warm reboot.
6. The new master Junos VM installs the PFE state from RAM back into the PFE.
7. The other Junos VM is shut down.
8. Junos has been upgraded and the PFE has performed a warm reboot.
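The steps above form a strict sequence: each stage must complete before the next can begin. Purely as an illustration (the step names below are our own shorthand, not actual Junos state names), the progression can be sketched as:

```python
# Illustrative shorthand for the ISSU sequence described above.
ISSU_STEPS = [
    "spawn_backup_vm",        # backup Junos VM created and started
    "upgrade_backup_vm",      # backup VM upgraded to the target release
    "pfe_issu_prepare",       # PFE state copied from the PFE to RAM
    "switchover_to_backup",   # upgraded VM becomes the master routing engine
    "pfe_warm_reboot",
    "restore_pfe_state",      # state pushed from RAM back into the PFE
    "shutdown_old_vm",
    "upgrade_complete",
]

def next_step(current):
    """Return the step that follows `current`, or None once the upgrade is done."""
    i = ISSU_STEPS.index(current)
    return ISSU_STEPS[i + 1] if i + 1 < len(ISSU_STEPS) else None

print(next_step("pfe_warm_reboot"))  # restore_pfe_state
```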
Let’s see an ISSU in action:
dhanks@QFX5100> request system software in-service-upgrade flex-13.2X51-D20.2-domestic-signed.tgz
warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!
ISSU: Validating Image
error: 'Non Stop Routing' not configured
error: aborting ISSU
error: ISSU Aborted!
ISSU: IDLE
Ah, bummer! What happened here? There are some control plane features that must be enabled before a successful ISSU can be achieved:
NSR
NSB
GRES
Commit Synchronization
Let’s configure these quickly and try an ISSU once again.
{master:0}[edit]
dhanks@QFX5100# set chassis redundancy graceful-switchover

{master:0}[edit]
dhanks@QFX5100# set routing-options nonstop-routing

{master:0}[edit]
dhanks@QFX5100# set protocols layer2-control nonstop-bridging

{master:0}[edit]
dhanks@QFX5100# set system commit synchronize

{master:0}[edit]
dhanks@QFX5100# commit and-quit
configuration check succeeds
commit complete
Exiting configuration mode
OK, now that all of the software features required for ISSU are configured and committed, let’s try the ISSU one more time:
dhanks@QFX5100> request system software in-service-upgrade flex-13.2X51-D20.2-domestic-signed.tgz
warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!
ISSU: Validating Image
ISSU: Preparing Backup RE
Prepare for ISSU
ISSU: Backup RE Prepare Done
Extracting jinstall-qfx-5-flex-13.2X51-D20.2-domestic ...
Install jinstall-qfx-5-flex-13.2X51-D20.2-domestic completed
Spawning the backup RE
Spawn backup RE, index 1 successful
GRES in progress
GRES done in 0 seconds
Waiting for backup RE switchover ready
GRES operational
Copying home directories
Copying home directories successful
Initiating Chassis In-Service-Upgrade
Chassis ISSU Started
ISSU: Preparing Daemons
ISSU: Daemons Ready for ISSU
ISSU: Starting Upgrade for FRUs
ISSU: Preparing for Switchover
ISSU: Ready for Switchover
Checking In-Service-Upgrade status
  Item           Status                  Reason
  FPC 0          Online
Send ISSU done to chassisd on backup RE
Chassis ISSU Completed
ISSU: IDLE
Initiate em0 device handoff
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
em0: bus=0, device=3, func=0, Ethernet address 10:0e:7e:b2:2d:78
hub 1-1:1.0: over-current change on port 1
hub 1-1:1.0: over-current change on port 3
hub 1-1:1.0: over-current change on port 5

QFX5100 (ttyd0)

login:
Excellent! The ISSU has completed successfully and no traffic was impacted during the software upgrade of Junos.
One of the advantages of the Broadcom warm reboot feature is that no new firmware needs to be installed in the PFE. This effectively makes ISSU a control plane-only problem, which is much easier to solve. When you need to synchronize both the PFE firmware and the control plane firmware, there are more moving parts, and the problem is more difficult. Juniper MX Series by Douglas Richard Hanks, Jr. and Harry Reynolds (O'Reilly) thoroughly explains the benefits and drawbacks of ISSU on a platform that upgrades the PFE firmware in addition to the control plane firmware. The end result is that a control plane-only ISSU is more stable and finishes much faster than on a platform such as the Juniper MX. However, the obvious drawback is that no new PFE features can be delivered as part of a control plane-only ISSU, which is where the Juniper MX wins.
This chapter walked you through the design of the control plane and showed how the Juniper QFX5100 is really just a server that thinks it's a switch. The Juniper QFX5100 has a powerful Intel CPU, standard memory, and SSD storage. Perhaps surprisingly, the switch boots directly into Linux and uses KVM to virtualize Junos, the network operating system. Because Junos runs in a VM, the Juniper QFX5100 can support carrier-class features such as ISSU, NSR, and NSB.