Server – the web traffic controller

To run our prediction service, we need to communicate with external systems to receive requests to train a model, score new data, evaluate existing performance, or provide model parameter information. The web server performs this function, accepting incoming HTTP requests and forwarding them to our web application, either directly or through whatever middleware may be in use.

Though we could have made many different choices of server in illustrating this example, we have chosen the CherryPy library because, unlike other popular servers such as Apache Tomcat or Nginx, it is written in Python (allowing us to demonstrate its functionality inside a notebook) and is scalable, processing many requests in only a few milliseconds (http://www.aminus.org/blogs/index.php/2006/12/23/cherrypy_3_has_fastest_wsgi_server_yet). The server listens on a particular port, and the combination of host and port forms the endpoint (usually given in the format url:port) to which we direct requests that are then forwarded to the web application. The use of ports means that we could, in theory, run multiple servers on a given host, each listening for requests on a different port.
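
As a rough sketch of how this looks in practice (the host, port, and placeholder application below are arbitrary choices for illustration, not part of our service), CherryPy can bind a WSGI application to a port in just a few lines:

    import cherrypy

    # A trivial WSGI callable standing in for our web application.
    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'prediction service placeholder\n']

    # Graft the application onto the root URL and bind the server to a port.
    cherrypy.tree.graft(application, '/')
    cherrypy.config.update({'server.socket_host': '0.0.0.0',
                            'server.socket_port': 5000})
    cherrypy.engine.start()
    cherrypy.engine.block()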

As we discussed previously, the server uses the WSGI specification to communicate with the application itself. In concrete terms, the application exposes a function known as a callable (for example, any object with a __call__ method) that the server executes every time it receives a request, and whose result is returned to the client as the response. In our example in this chapter, WSGI is already implemented by CherryPy, and we will simply illustrate how it does so. Complete documentation of the interface is available at https://www.python.org/dev/peps/pep-0333/. In a way, WSGI solves the same problem for communication between servers and applications that HTTP solves for communication between clients and servers: it provides a common way for the two systems to exchange information, allowing us to swap components, or even place intervening components between them, without altering the fundamental way in which information is transferred.
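
To make the specification concrete, here is a minimal WSGI application written as a class with a __call__ method; the class name and response body are illustrative only, and the standard library's wsgiref server is used here just for local testing:

    from wsgiref.simple_server import make_server

    class PredictionApp(object):
        """Minimal WSGI application: any object with a __call__ method
        taking (environ, start_response) satisfies the specification."""

        def __call__(self, environ, start_response):
            # environ is a dict describing the request (path, method, etc.)
            status = '200 OK'
            headers = [('Content-Type', 'text/plain')]
            start_response(status, headers)
            # The return value is an iterable of bytes.
            return [b'Hello from a WSGI callable\n']

    # Serve the callable locally for testing.
    make_server('localhost', 8000, PredictionApp()).serve_forever()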

In cases where we might wish to scale the application to a larger load, we could imagine placing middleware, such as a load balancer, between the server and the application. The middleware would receive requests from the server and pass them along to the web application. In the case of a load balancer, it could redistribute requests among many separate instances of the same predictive service, allowing us to scale the service horizontally (see Aside). Each of these instances would then return its response to the server before it is sent back to the client.
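
The following is a deliberately toy sketch of such middleware; the round-robin class below is our own invention for illustration, and a real deployment would more likely place a dedicated load balancer such as Nginx or HAProxy in front of the application instances:

    import itertools

    class RoundRobin(object):
        """Toy WSGI middleware that alternates incoming requests among
        several application instances -- a stand-in for a load balancer."""

        def __init__(self, apps):
            self._apps = itertools.cycle(apps)

        def __call__(self, environ, start_response):
            # Delegate the request to the next instance in the rotation.
            app = next(self._apps)
            return app(environ, start_response)

    # Usage: wrap several instances so the server sees a single callable.
    # balanced = RoundRobin([app_instance_1, app_instance_2])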

Aside: horizontal and vertical scaling

As the volume of data or the computational complexity of our prediction service increases, we have two primary ways to increase its performance. The first, known as horizontal scaling, involves adding more instances of our application; similarly, we might add more machines to our underlying computing layer, such as a Spark cluster. In contrast, vertical scaling involves improving the existing resources by adding more RAM, CPU, or disk space. While horizontal scaling is more easily implemented using software alone, the right solution for such resource constraints will depend on the problem domain and organizational budget.

Application – the engine of the predictive services

Once a request has made its way from the client to the application, we need to provide the logic that will execute these commands and return a response to the server and, subsequently, to the client upstream. To do so, we must attach functions to the particular endpoints and request types we anticipate receiving.

In this chapter, we will be using the Flask framework to develop our web application (http://flask.pocoo.org/). While Flask can also support template generation of HTML pages, in this chapter we will use it purely to implement the various requests to the underlying predictive algorithm code through URL endpoints corresponding to the HTTP requests discussed previously. Implementing these endpoints provides a consistent interface through which many other software systems can interact with our application; they just need to point to the appropriate web address and process the response returned from our service. In case you are concerned that we will not generate any actual 'webpages' in our application, do not worry: we will use the same Flask framework in Chapter 9, Reporting and Testing – Iterating on Analytic Systems, to develop a dashboard system based on the data we will generate through the predictive modeling service in this chapter.
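
As a preview (the route names, payloads, and placeholder responses here are illustrative assumptions, not the actual endpoints we will build later in the chapter), a Flask application attaches functions to URL endpoints like this:

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route('/train', methods=['POST'])
    def train():
        params = request.get_json()        # e.g. model hyperparameters
        # ... hand off to the underlying algorithm here ...
        return jsonify({'status': 'training started', 'params': params})

    @app.route('/predict', methods=['GET'])
    def predict():
        record = request.args.to_dict()    # features passed as query params
        # ... call the model's scoring function here ...
        return jsonify({'score': 0.5})     # placeholder response

    if __name__ == '__main__':
        app.run(port=5000)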

In writing the logic for our predictive modeling application, it is important to keep in mind that the functions called in response to client requests can themselves form an interface specifying a generic, modular service. While we could directly implement a particular machine learning algorithm in the code for the web application itself, we have chosen to abstract this design: the web application makes generic calls to construct a model with some parameters, train it, and score records, regardless of the data or particular algorithm used. This allows us to reuse the web application code with many different algorithms while also affording the flexibility to implement these algorithms in different ways over time. It also forces us to determine a consistent set of operations for our algorithms, since the web application will only interact with them through this abstraction layer.
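
One way to express such an abstraction layer is sketched below; the ModelService name and its method signatures are hypothetical choices for illustration, not a fixed design:

    from abc import ABC, abstractmethod

    class ModelService(ABC):
        """Hypothetical interface the web application codes against;
        any algorithm implementing these operations can be plugged in."""

        @abstractmethod
        def train(self, data, **params):
            """Fit the underlying model on a dataset."""

        @abstractmethod
        def score(self, records):
            """Return predictions for new records."""

        @abstractmethod
        def get_params(self):
            """Expose the fitted model's parameters."""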

Finally, we have the algorithm itself, which is called by the web application code. This program needs to implement the functions specified by the web application, such as training a model and scoring records using a set of data. The details can change substantially over time without needing to modify the web application, allowing us to flexibly develop new models or experiment with different libraries.
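
As one possible sketch of such an algorithm (scikit-learn and the class name below are our own illustrative choices, not necessarily the library used in this chapter), any object providing the same methods can be swapped in behind the web application:

    from sklearn.linear_model import LogisticRegression

    class LogisticRegressionService:
        """One possible implementation of the interface sketched above,
        backed by scikit-learn; swapping in another library would not
        require any change to the web application code."""

        def __init__(self, **params):
            self._model = LogisticRegression(**params)

        def train(self, data, **params):
            features, labels = data
            self._model.set_params(**params)
            self._model.fit(features, labels)

        def score(self, records):
            # Probability of the positive class for each record.
            return self._model.predict_proba(records)[:, 1]

        def get_params(self):
            return self._model.get_params()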
