SciPy

SciPy (pronounced "sigh pi") adds a layer to NumPy that wraps common scientific and statistical applications on top of the more purely mathematical constructs of NumPy. SciPy provides higher-level functions for manipulating and visualizing data, and it is especially useful when using Python interactively. SciPy is organized into sub-packages covering different scientific computing applications. The packages most relevant to ML, and their functions, are as follows:

cluster: This contains two sub-packages: cluster.vq for K-means clustering and vector quantization, and cluster.hierarchy for hierarchical and agglomerative clustering, which is useful for distance matrices, calculating statistics on clusters, and visualizing clusters with dendrograms.

constants: These are physical and mathematical constants such as pi and e.

integrate: These are numerical integration routines and differential equation solvers.

interpolate: These are interpolation functions for creating new data points within a range of known points.

io: This refers to input and output functions for creating string, binary, or raw data streams, and reading and writing to and from files.

optimize: This refers to optimization and root finding.

linalg: This refers to linear algebra routines such as basic matrix calculations, solving linear systems, finding determinants and norms, and decomposition.

ndimage: This is N-dimensional image processing.

odr: This is orthogonal distance regression.

stats: This refers to statistical distributions and functions.
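As a small taste of one of these sub-packages, here is a minimal sketch using scipy.stats; the particular distribution and sample values are illustrative choices, not anything specific to this chapter:

```python
import numpy as np
from scipy import stats

# A "frozen" standard normal distribution object
norm = stats.norm(loc=0, scale=1)

print(norm.cdf(0))      # probability below the mean: 0.5
print(norm.ppf(0.975))  # ~1.96, the familiar 95% two-tailed critical value
print(norm.pdf(0))      # density at the mean: 1/sqrt(2*pi), about 0.3989

# Fitting a distribution to data: estimate mean and std from samples
rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=2.0, size=10_000)
mu, sigma = stats.norm.fit(samples)
print(mu, sigma)        # close to the true parameters 5.0 and 2.0
```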

Many of the NumPy modules have the same name and similar functionality as those in the SciPy package. For the most part, SciPy imports its NumPy equivalent and extends its functionality. However, be aware that some identically named functions in SciPy modules may have slightly different functionality compared to those in NumPy. It also should be mentioned that many of the SciPy classes have convenience wrappers in the scikit-learn package, and it is sometimes easier to use those instead.

Each of these packages requires an explicit import; here is an example:

import scipy.cluster

You can get documentation from the SciPy website (scipy.org) or from the console, for example, help(scipy.cluster).

As we have seen, optimization is a common task in many different ML settings. We looked at the mathematics of the simplex algorithm in the last chapter; here is an implementation using SciPy. Recall that the simplex method optimizes a linear objective subject to a set of linear constraints. The problem we looked at was as follows:

Maximize x1 + x2 subject to the constraints 2x1 + x2 ≤ 4 and x1 + 2x2 ≤ 3 (with x1, x2 ≥ 0)

The linprog function is probably the simplest way to solve this problem. It is a minimization routine, so we reverse the sign of our objective coefficients.

We import linprog from scipy.optimize:

from scipy.optimize import linprog

objective = [-1, -1]        # coefficients of -(x1 + x2)
con1 = [[2, 1], [1, 2]]     # left-hand sides of the inequality constraints
con2 = [4, 3]               # right-hand sides of the inequality constraints
res = linprog(objective, A_ub=con1, b_ub=con2)
print(res)

You will observe output reporting the solver status, the optimal (negated) objective value fun, and the solution vector x.
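Rather than relying on the printed summary, the fields of the result object can be read directly. A minimal sketch, noting that linprog applies the default bounds x ≥ 0, which matches this problem:

```python
from scipy.optimize import linprog

# Same problem: maximize x1 + x2 subject to
# 2*x1 + x2 <= 4 and x1 + 2*x2 <= 3, with the default bounds x >= 0
res = linprog(c=[-1, -1], A_ub=[[2, 1], [1, 2]], b_ub=[4, 3])

print(res.status)   # 0 means the solver terminated successfully
print(res.x)        # optimal point: [5/3, 2/3]
print(-res.fun)     # maximized objective: 7/3
```

Flipping the sign of res.fun recovers the maximized value of the original objective.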

There is also a scipy.optimize.minimize function that is suitable for slightly more complicated problems. This function takes a solver as a parameter. There are currently about a dozen solvers available, and if you need a more specialized solver, you can write your own. The most commonly used, and suitable for many problems, is the Nelder-Mead solver. This particular solver uses a downhill simplex algorithm, which is basically a heuristic search that replaces the worst test point with a new point reflected through the centroid of the remaining points. It iterates through this process until it converges on a minimum.

In this example, we use the Rosenbrock function as our test problem. This is a non-convex function that is often used to benchmark optimization algorithms. Its global minimum lies in a long, narrow, parabolic valley; finding the valley is easy, but converging to the minimum within its relatively flat floor is challenging. We will see more of this function later:

import numpy as np
from scipy.optimize import minimize

def rosen(x):
    """The Rosenbrock function."""
    return sum(100.0*(x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)

def nMin(funct, x0):
    return minimize(funct, x0, method='nelder-mead',
                    options={'xatol': 1e-8, 'disp': True})

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
nMin(rosen, x0)

The preceding code prints a convergence report: a success message together with the final function value and the number of iterations and function evaluations used.

The minimize function takes two mandatory parameters: the objective function and the initial value, x0. It also takes an optional parameter for the solver method; in this example, we use the nelder-mead method. The options parameter is a solver-specific set of key-value pairs, represented as a dictionary. Here, xatol is the absolute error in the solution acceptable for convergence (older SciPy releases called this option xtol), and disp tells the solver to print a convergence message. Another package that is extremely useful for machine learning applications is scipy.linalg. This package adds the ability to perform tasks such as inverting matrices, calculating eigenvalues, and matrix decomposition.
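A minimal sketch of those scipy.linalg capabilities; the matrix here is an arbitrary example chosen for illustration:

```python
import numpy as np
from scipy import linalg

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

A_inv = linalg.inv(A)        # matrix inverse
print(A_inv @ A)             # approximately the identity matrix

print(linalg.det(A))         # determinant: 2*3 - 1*1 = 5

eigvals = linalg.eigvals(A)  # eigenvalues (returned with complex dtype)
print(eigvals.real)          # their sum equals the trace, 5

P, L, U = linalg.lu(A)       # LU decomposition with partial pivoting
print(P @ L @ U)             # reconstructs A
```

Many of these routines mirror numpy.linalg, but the SciPy versions are backed by the full LAPACK interface and include decompositions, such as lu, that NumPy does not expose.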
