Cloud-based solutions should scale on demand: if an application’s user demand reaches a specific threshold, one or more servers should be added dynamically to support the application. Likewise, when demand decreases, the application should scale down its resource use. When an application uses multiple servers, one server, as shown in FIGURE 19-1, must perform the task of load balancing.
The load-balancing server receives client requests and distributes each request to one of the available servers. To determine which server gets the request, the load balancer may use a round-robin technique, a random algorithm, or a more complex technique based on each server’s capacity and current workload. For an application to fully exploit load balancing, the application developers must design the application for scaling.
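The three selection techniques named above can be illustrated with a minimal Python sketch. It is not a production load balancer: the `Server` and `LoadBalancer` classes, the `capacity` and `active` fields, and the load ratio used for the capacity-based choice are all assumptions made for illustration.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity: int      # assumed: maximum concurrent requests the server can handle
    active: int = 0    # assumed: requests currently in flight on the server


class LoadBalancer:
    """Hypothetical balancer that selects a server for each incoming request."""

    def __init__(self, servers, seed=None):
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)  # endless a, b, c, a, ...
        self._rng = random.Random(seed)

    def round_robin(self):
        # Hand requests to the servers in a fixed, repeating order.
        return next(self._cycle)

    def pick_random(self):
        # Choose any server with equal probability.
        return self._rng.choice(self.servers)

    def least_loaded(self):
        # Choose the server with the lowest fraction of its capacity in use.
        return min(self.servers, key=lambda s: s.active / s.capacity)


servers = [
    Server("a", capacity=10),
    Server("b", capacity=20),
    Server("c", capacity=10),
]
lb = LoadBalancer(servers, seed=0)

# Round-robin wraps back to the first server after the last one.
order = [lb.round_robin().name for _ in range(4)]

# Simulate a current workload, then make a capacity-aware choice.
servers[0].active = 5   # 50% of capacity
servers[1].active = 4   # 20% of capacity
servers[2].active = 6   # 60% of capacity
chosen = lb.least_loaded().name

print(order)   # ['a', 'b', 'c', 'a']
print(chosen)  # b
```

Round-robin and random selection need no knowledge of the servers’ state, which makes them simple to implement. The capacity-based choice depends on the balancer having an up-to-date view of each server’s workload, which this sketch simply assumes.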