Introduction

In the previous chapters, we became familiar with instantiating NumPy arrays and pandas DataFrames/Series and with performing some basic data manipulation tasks using them. However, a typical data analysis task involves more detailed exploration, which in turn requires us to perform more scientific computation.

This chapter discusses solving matrix-oriented problems with SciPy; such problems form the foundation of much of the work done in scientific computing and data analysis.

To understand why matrix analysis and linear algebra are worth studying further, let us look at a few examples:

  • Image analysis: Essentially, an image can be considered a matrix with m rows and n columns of pixel values. In any type of image analysis, such as image classification or transformation, we can work on the image by first converting it into matrix form and then performing the analysis on that matrix.
  • Optimization: In some data analysis tasks, we optimize certain parameters/variables, that is, we maximize or minimize them (for example, in a regression exercise, we minimize the cost function). Linear algebra forms the basis of such optimization tasks.
  • Recommender systems: One of the popular techniques used in building recommender systems is singular value decomposition, which combines matrix analysis and linear algebra.
  • Text mining/market basket analysis: In multiple use cases, the analysis results in a matrix in which the majority of the elements are zero, with only a few nonzero elements. In market basket analysis, for example, if each row represents a transaction and each column represents a single product in the store, the majority of the entries in each row would be zero, as a customer purchases only a few of the thousands of products the store carries. This is a classic example of a sparse matrix (a matrix in which most elements are zero). Learning to work with sparse matrices helps us write more efficient code.
  • Dimensionality reduction: Extracting the eigenvalues and eigenvectors of a matrix is a key step in reducing the number of dimensions (number of columns in a dataset) without losing a lot of information from the dataset.
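As a brief sketch of the sparse-matrix idea described above, the following uses `scipy.sparse` to store a small, made-up market-basket matrix in compressed form (the transaction data here is purely illustrative):

```python
import numpy as np
from scipy import sparse

# Hypothetical market-basket data: each row is a transaction,
# each column a product; 1 means the product was purchased.
dense = np.array([
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
])

# The CSR (compressed sparse row) format stores only the
# nonzero entries and their positions.
basket = sparse.csr_matrix(dense)

print(basket.shape)  # (3, 5): 3 transactions, 5 products
print(basket.nnz)    # 4: only four nonzero entries are stored
```

For a real store with thousands of products, storing only the nonzero entries rather than the full dense matrix saves a great deal of memory and computation.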