APPENDIX A

Virtualization

Among others, two goals of capacity planning are to employ the resources that your organization has at hand in the most efficient manner, and to predict future needs based on the patterns of current use. For well-defined workloads, you potentially can get pretty close to utilizing most of the hardware resources for each class of server you have, such as databases, web servers, and storage devices. Unfortunately, web application workloads are rarely (if ever) perfectly aligned with the available hardware resources.


In such circumstances, you end up with inefficiencies in usage of available capacity. For example, if you know that a database’s specific ceiling (limit) is determined by its memory or disk usage, but meanwhile it uses very little CPU, there’s no reason to buy servers with two quad-core CPUs. That resource (and investment) will simply be wasted unless you direct the server to work on other CPU-intensive tasks. Even buying a single CPU can be overkill. But often, that’s all that’s available, so you end up with idle resources. It’s the continual need to balance correct resources to workload demand that makes capacity planning so important, and in recent years some technologies and approaches have emerged that render this balance easier to manage, with ever-finer granularity.

Overview

There are many definitions of virtualization. In general, virtualization is the abstraction of computing resources at various levels of a computer. This abstraction can take place at the hardware, application, or operating system (OS) level, but in the context of growing web operations, virtualization generally refers to OS abstraction, otherwise known as server virtualization. Examples include the Xen1 virtual machine (VM) monitor, VMware’s ESXi2 server, Hyper-V, and KVM, in which a hypervisor manages guest VMs, running either directly on the hardware or on top of a host OS. Because a VM is not dependent on the state of the physical hardware, you can install multiple VMs on a single set of hardware. Entire books have been written on the topic of virtualization. As it relates to capacity planning, virtualization allows for more granular control of how resources are used at the bare-metal level. Here are some of the advantages of virtualization:

Efficient use of resources

There’s no reason to waste an entire server on small tasks such as corporate email. If a server has spare CPU, memory, or disk resources, you can run additional services on it to make better use of it. Because of this, organizations use virtualization to consolidate many servers to run on a single piece of hardware.

Portability and fault tolerance

When a physical host is reaching known (or perhaps unknown) limits, or suffers a hardware failure, a guest OS (and its associated load) can be safely migrated to another host.

Development sandboxes

Because entire operating systems can be created and destroyed without harming the underlying host environment, virtualization is ideal for building multiple development environments that demand different operating systems, kernels, or system configurations. If there’s a major bug that causes the entire test-bed to explode, no problem—you can easily re-create it.

Less management overhead

Virtualization makes it possible for you to consolidate several individual servers with idle resources into fewer servers with higher resource utilization. This can translate into reduced power consumption as well as a smaller datacenter footprint. Another benefit of less hardware is that there are fewer parts subject to failure, such as disk drives, CPUs, and power supplies. Of course, the counterpoint is that consolidation can increase exposure to a single point of failure (SPOF), given that many services depend on the same physical hardware. Virtualization packages mitigate this potential problem by allowing VMs to migrate easily from server to server for disaster recovery and for rebalancing workloads.

Virtualization essentially allows you to do more work with less hardware. These efficiencies have a trade-off in that they can complicate measurements. Distinguishing virtual resource usage from physical usage can be confusing, because the abstraction layer introduces another level of metric collection and measurement. One additional advantage of virtualization is that you can separate application ceilings on a role-by-role basis, even when you are running on only a single physical server. For example, suppose that you are consolidating email, backup, and logging services onto a single server. You might allocate more memory to the logging services for buffering the log writes to disk, and you might allocate more disk space to the backup application so it has room to grow. As long as you can keep track of the virtual and the physical, the capacity planning process is roughly the same. Consider the physical servers as generic containers in which you can run a limited number of virtual servers.
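As a minimal sketch of keeping track of the virtual against the physical, the following Python snippet uses the libvirt bindings to compare each guest’s allocated memory and vCPUs against the host’s totals. The connection URI (qemu:///system) and the presence of a local QEMU/KVM hypervisor are assumptions; adapt both to your environment.

    # A minimal sketch: compare per-VM allocations against host capacity.
    # Assumes the libvirt Python bindings and a local QEMU/KVM hypervisor
    # reachable at qemu:///system; adjust the URI for your environment.
    import libvirt

    conn = libvirt.open("qemu:///system")

    # Host totals: getInfo() returns [model, memory_MB, cpus, mhz, ...].
    host = conn.getInfo()
    host_mem_mb, host_cpus = host[1], host[2]
    print(f"Host: {host_mem_mb} MB RAM, {host_cpus} CPUs")

    alloc_mem_kb = 0
    alloc_vcpus = 0
    for dom in conn.listAllDomains():
        # info() returns [state, maxMem_KB, memory_KB, nrVirtCpu, cpuTime_ns].
        state, max_mem_kb, mem_kb, vcpus, cpu_time_ns = dom.info()
        alloc_mem_kb += max_mem_kb
        alloc_vcpus += vcpus
        print(f"  {dom.name()}: {max_mem_kb // 1024} MB allocated, {vcpus} vCPUs")

    print(f"Allocated across guests: {alloc_mem_kb // 1024} MB, {alloc_vcpus} vCPUs")
    print(f"Memory overcommit ratio: {alloc_mem_kb / 1024 / host_mem_mb:.2f}")

    conn.close()

An overcommit ratio above 1.0 simply means more memory has been promised to guests than the host physically has; whether that is acceptable depends on how the guests actually behave under load.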

In recent years, containers have emerged as an alternative to VMs. A container encapsulates an application along with its dependencies. Containers share resources with the host OS and hence are very efficient, and they can be started and stopped very quickly. The portability of containers can help eliminate bugs induced by differences between environments. Thus, the use of containers enables developers to build software locally, knowing that it will run identically regardless of the host environment.
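As a minimal illustration of kernel sharing and fast container start-up (assuming a local Docker daemon and the Docker SDK for Python are available; the alpine image is an arbitrary choice), the following sketch runs a short-lived container and shows that it reports the host’s kernel release:

    # A minimal sketch: containers share the host kernel and start quickly.
    # Assumes a local Docker daemon and the Docker SDK for Python
    # (pip install docker); the alpine image is an arbitrary choice.
    import platform
    import time

    import docker

    client = docker.from_env()

    start = time.time()
    # uname -r inside the container reports the *host* kernel release,
    # because the container runs on the shared host kernel.
    container_kernel = client.containers.run(
        "alpine", "uname -r", remove=True
    ).decode().strip()
    elapsed = time.time() - start

    print(f"Host kernel:      {platform.release()}")
    print(f"Container kernel: {container_kernel}")
    print(f"Container ran and exited in {elapsed:.2f} seconds")

The two kernel strings match because, unlike a VM, the container does not boot its own OS; it is an isolated set of processes on the host.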

NOTE

For a comparative performance evaluation of VMs and Linux containers, read the 2014 paper titled “An Updated Performance Comparison of Virtual Machines and Linux Containers,” by W. Felter et al.

Figure A-1(a) illustrates three applications running in three separate VMs on a host. The hypervisor creates and runs VMs, controls access to the underlying OS and hardware, and interprets system calls when necessary. Each VM requires a full copy of the OS, the application being run, and any supporting libraries. In contrast, Figure A-1(b) illustrates three applications running in a containerized system. Unlike VMs, the host kernel is shared among the different containers. Akin to the hypervisor for VMs, the container engine is responsible for starting and stopping containers. Processes running in containers are equivalent to native processes on the host and do not incur the overhead associated with hypervisor execution.

Figure A-1. (a) Three VMs running on a single host; (b) three containers running on a single host (adapted from Using Docker by A. Mouat, O’Reilly)

As with any technology, there are trade-offs. Alongside the benefits outlined earlier, there are challenges associated with the use of VMs and containers on public clouds; for instance, security and performance isolation. The latter is addressed in part by the use of dedicated instances; however, dedicated instances limit the benefits of the pay-per-usage model of the cloud. The performance of instances on the cloud can vary to a great extent. We can ascribe this, in part, to the following:

  • Datacenters grow to contain multiple generations of hardware (e.g., network switches, disks, and CPU architectures) as old components are replaced or new capacity is added. In a 2012 paper titled “Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2,” Z. Ou et al. reported that Amazon EC2 uses diversified hardware to host the same type of instance. This hardware diversity results in performance variation: in general, the gap between fast and slow instances can reach 40 percent, and for some applications it can approach 60 percent. (See the sketch after this list for one way to identify the hardware backing an instance.)

  • Network topology can vary, with some routes having lower latency or supporting higher bandwidth than others.

  • Multiplexing systems across customers with different workloads also can lead to uneven resource contention across the cloud.
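As a minimal sketch of probing for such variation (the workload and iteration count below are arbitrary assumptions, not a representative benchmark), the following Python snippet reports the CPU model backing a Linux instance and times a small CPU-bound loop; running it across several instances of the same type exposes instance-to-instance differences:

    # A minimal sketch: identify the CPU model backing a Linux instance and
    # time a small CPU-bound loop. The workload and iteration count are
    # arbitrary assumptions; a real benchmark would use a representative workload.
    import time

    def cpu_model() -> str:
        """Read the CPU model name from /proc/cpuinfo (Linux only)."""
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()
        return "unknown"

    def time_cpu_loop(iterations: int = 5_000_000) -> float:
        """Time a simple integer-summing loop as a crude CPU benchmark."""
        start = time.perf_counter()
        total = 0
        for i in range(iterations):
            total += i * i
        return time.perf_counter() - start

    if __name__ == "__main__":
        print(f"CPU model: {cpu_model()}")
        # Repeat the loop a few times and report the spread; repeating this
        # across instances of the same type exposes hardware heterogeneity.
        samples = [time_cpu_loop() for _ in range(5)]
        print(f"min={min(samples):.3f}s max={max(samples):.3f}s "
              f"spread={(max(samples) - min(samples)) / min(samples):.1%}")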

For further reading about performance variability in the cloud, refer to the following:

  1. B. Guenter et al. (2011). Managing Cost, Performance, and Reliability Tradeoffs for Energy-Aware Server Provisioning.

  2. V. Jalaparti et al. (2012). Bridging the Tenant-Provider Gap in Cloud Services.

  3. B. Farley et al. (2012). More for Your Money: Exploiting Performance Heterogeneity in Public Clouds.

  4. Z. Ou et al. (2013). Is the Same Instance Type Created Equal? Exploiting Heterogeneity of Public Clouds.

  5. P. Leitner and J. Cito, (2014). Patterns in the Chaos - a Study of Performance Variation and Predictability in Public IaaS Clouds.

  6. J. Mogul and R. R. Kompella, (2015). Inferring the Network Latency Requirements of Cloud Tenants.

  7. A. Anwar et al. (2016). Towards Managing Variability in the Cloud.

Looking Back and Moving Forward

Virtualization technologies have spawned an entire industry of computing “utility” providers who take advantage of the efficiencies inherent in virtualization to build public clouds. Cloud service providers then make those resources available on a cost-per-usage basis via an API or other means. Because cloud computing and storage take some of the infrastructure deployment and management out of the hands of the developer, using a cloud infrastructure can be an attractive alternative to running your own servers. Cloud infrastructure providers offer a “menu” of compute instances, ranging from low-powered CPU and memory platforms to large-scale, multicore CPU systems (nowadays, even graphics processing units [GPUs] and field-programmable gate arrays [FPGAs] are supported) with massive amounts of memory. These choices aren’t as customizable as servers you own, so you need to determine how to fit your workload’s requirements onto the available “menu.” Since the writing of the first edition of this book, public clouds such as Amazon Web Services (AWS) and Microsoft Azure have grown into businesses of more than $10 billion.
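As a minimal sketch of that fitting exercise (the instance names, sizes, and prices below are hypothetical placeholders, not real provider data), the following Python snippet picks the cheapest entry from a “menu” that satisfies a workload’s CPU and memory requirements:

    # A minimal sketch: pick the cheapest instance type from a provider "menu"
    # that satisfies a workload's CPU and memory requirements. The menu entries
    # and prices are hypothetical placeholders, not real provider data.
    from dataclasses import dataclass

    @dataclass
    class InstanceType:
        name: str
        vcpus: int
        memory_gb: int
        hourly_cost: float

    MENU = [
        InstanceType("small", 2, 8, 0.10),
        InstanceType("medium", 4, 16, 0.20),
        InstanceType("large", 8, 32, 0.40),
        InstanceType("xlarge", 16, 64, 0.80),
    ]

    def cheapest_fit(required_vcpus: int, required_memory_gb: int) -> InstanceType:
        """Return the cheapest instance type meeting the requirements."""
        candidates = [i for i in MENU
                      if i.vcpus >= required_vcpus and i.memory_gb >= required_memory_gb]
        if not candidates:
            raise ValueError("No instance type in the menu fits this workload")
        return min(candidates, key=lambda i: i.hourly_cost)

    # Example: a workload needing 6 vCPUs and 20 GB of memory maps to "large".
    print(cheapest_fit(6, 20))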


Note that virtualization has been researched for more than four decades. Early research in virtualization can be traced back to the following:

  1. R. F. Rosin. (1969). Contemporary Concepts of Microprogramming and Emulation.

  2. R. P. Goldberg. (1973). Architecture of Virtual Machines.

  3. R. P. Goldberg. (1974). Survey of Virtual Machine Research.

  4. G. J. Popek and R. P. Goldberg. (1974). Formal Requirements for Virtualizable Third-Generation Architectures.

  5. L. Seawright and R. MacKinnon. (1979). VM/370—A Study of Multiplicity and Usefulness.

  6. P. H. Gum. (1983). System/370 Extended Architecture: Facilities for Virtual Machines.


At one time, computers were seen as equipment managed only by large financial, educational, or research institutions. Because computers were extremely expensive, IBM and other manufacturers built large-scale minicomputers and mainframes to handle processing for multiple users at once, utilizing many of the virtualization concepts still in use today. Users would be granted slices of computation time from mainframe machines, accessing them from thin, or dumb, terminals. Users submitted jobs whose computation contended for resources. The centralized system was managed via queues, virtual operating systems, and system accounting that governed resource allocation. All of the heavy lifting of computation was handled by the mainframe and its operators, and was largely invisible to the end users. The design of these systems was largely driven by security and reliability, so considerable effort was applied to containing user environments and data redundancy. Virtualization gained traction in the datacenter and then became the bedrock of innovation in the realm of cloud computing. For a deeper dive into datacenter virtualization, refer to the following:

  1. C. Guo et al. (2010) SecondNet: A Data Center Network Virtualization Architecture with Bandwidth Guarantees.

  2. G. A. A. Santana. (2013). Data Center Virtualization Fundamentals: Understanding Techniques and Designs for Highly Efficient Data Centers with Cisco Nexus, UCS, MDS, and Beyond.

Today, virtualization is not limited to compute; it is also applied to networks, storage, and I/O. To mitigate the impact of virtualization on performance, there has been increasing support for virtualization in hardware. Last but not least, security in the context of virtualization has also been an active area of research. To learn more about the various facets of virtualization, refer to the following research papers and surveys:

  1. T. Clark. (2005). Storage Virtualization: Technologies for Simplifying Data Storage and Management.

  2. K. Adams and O. Agesen. (2006). A comparison of software and hardware techniques for x86 virtualization.

  3. S. Crosby and D. Brown. (2006). The Virtualization Reality: Are hypervisors the new foundation for system software?

  4. K. Roussos. (2007). Storage Virtualization Gets Smart.

  5. S. Rixner. (2008). Network Virtualization: Breaking the Performance Barrier.

  6. M. Carbone et al. (2008). Taming Virtualization.

  7. X. Chen et al. (2008). Overshadow: a virtualization-based approach to retrofitting protection in commodity operating systems.

  8. J. Carapinha and J. Jiménez. (2009). Network virtualization: a view from the bottom.

  9. A. van Cleeff et al. (2009). Security Implications of Virtualization: A Literature Study.

  10. M. Rosenblum and C. Waldspurger. (2012). I/O Virtualization.

  11. M. Pearce et al. (2013). Virtualization: Issues, security threats, and solutions.

  12. G. Pék et al. (2013). A survey of security issues in hardware virtualization.

  13. T. Koponen et al. (2014). Network virtualization in multi-tenant datacenters.

  14. J. Shuja et al. (2016). A Survey of Mobile Device Virtualization: Taxonomy and State of the Art.

  15. Open vSwitch, [online]. Available at http://openvswitch.org/.

1 P. Barham et al. (2003). Xen and the Art of Virtualization.

2 E. Haletky. (2011). VMware ESX and ESXi in the Enterprise: Planning Deployment of Virtualization Servers.
