NumPy ndarrays

Arrays are vital objects in data analysis. They allow structured handling of elements arranged in rows and columns, subject to the rule that all elements must be of the same data type. For example, the medical records of five patients can be presented as an array as follows:

                Blood glucose level   Heart rate   Cholesterol level
Peter Parker    100                   65           160
Bruce Wayne     150                   82           200
Tony Stark      90                    55           80
Barry Allen     130                   73           220
Steve Rogers    190                   80           150

All 15 elements are integers (data type int). Arrays can also be composed of strings, floats, or complex numbers. They can be constructed from lists, a widely used and versatile data structure in Python:

array_list = [[100, 65, 160],
              [150, 82, 200],
              [90, 55, 80],
              [130, 73, 220],
              [190, 80, 150]]

An element in the ith row and jth column of an array or matrix stored this way can be accessed with two successive indices, array_list[i][j]. Note that indexing in Python starts from 0, so array_list[1][2] refers to the second row and third column (Bruce Wayne's cholesterol level):

In [2]: array_list[1][2]
Out[2]: 200

In [3]: array_list[3][0]
Out[3]: 130
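The double subscript is evaluated in two steps: the first index picks out a row, which is itself a list, and the second index picks an element from that row. A minimal sketch, reusing the array_list literal from above:

```python
# Patient records as a list of lists (rows = patients, columns = parameters)
array_list = [[100, 65, 160],
              [150, 82, 200],
              [90, 55, 80],
              [130, 73, 220],
              [190, 80, 150]]

# Step 1: the first index selects a whole row (Bruce Wayne's record)
row = array_list[1]
print(row)                 # [150, 82, 200]

# Step 2: the second index selects an element within that row (cholesterol)
print(row[2])              # 200

# Negative indices count from the end: the last patient's heart rate
print(array_list[-1][1])   # 80
```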

Python has a built-in array module for creating arrays. However, this module is more like a glorified list in which all elements are required to have the same data type. An array is created with the array module by providing two arguments: the type code of the data type, and the elements as a list, string, or any other iterable object. Let's create an array of floats. Here, d is the type code for a double-precision floating-point value:

import array as arr
arr_x = arr.array("d", [98.6, 22.35, 72.1])
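The "same data type" rule is enforced when elements are added. The sketch below inspects the array's type code and item size, then shows that an integer is converted to the array's float type while a string is rejected with a TypeError:

```python
import array as arr

# "d" = C double (double-precision floating point)
arr_x = arr.array("d", [98.6, 22.35, 72.1])

print(arr_x.typecode)   # d
print(arr_x.itemsize)   # bytes per element (8 on common platforms)

# An int is converted to the array's element type...
arr_x.append(5)
print(arr_x[-1])        # 5.0

# ...but a value that cannot be converted is rejected
try:
    arr_x.append("text")
except TypeError as err:
    print("TypeError:", err)
```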

The array module cannot create a two-dimensional entity with rows and columns; one can only be emulated with a nested list of such arrays. Operations associated with matrices, such as matrix multiplication, determinants, and eigenvalues, are not defined in this module either.
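Both limitations show up in a short sketch: a "two-dimensional" structure is just a list of separate one-dimensional arrays, and the * operator follows sequence semantics (repetition), not matrix semantics. The three rows are taken from the patient table; "i" is the type code for a signed int.

```python
import array as arr

# A "2-D" structure emulated as a plain list of 1-D integer arrays
records = [arr.array("i", [100, 65, 160]),
           arr.array("i", [150, 82, 200]),
           arr.array("i", [90, 55, 80])]

print(records[1][2])   # 200, via nested indexing as with lists

# "*" with an int repeats the elements instead of scaling them
doubled = records[0] * 2
print(doubled)         # array('i', [100, 65, 160, 100, 65, 160])

# Multiplying two arrays element-wise or as matrices is not defined
try:
    records[0] * records[1]
except TypeError as err:
    print("TypeError:", err)
```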

NumPy is the preferred package for creating and working with array-type objects. It supports multidimensional arrays, which provide a systematic and efficient framework for storing data. NumPy's built-in vectorized operations perform complex computations on these arrays quickly, without explicit Python loops. Consider the earlier example, in which a two-dimensional array stored the medical records of five patients; the two dimensions were the patients and the clinical indicators. If the same clinical parameters were recorded for these patients over three years, from 2016 to 2018, all of this information could conveniently be represented in a three-dimensional array, with the year of the record as the third dimension. The resulting array has dimensions 3 x 5 x 3 (years x patients x parameters) and is composed entirely of integers:

    2016           2017           2018
 100  65 160    95  68 140    110  72 160
 150  82 200    145  80 222    160  95 185
  90  55  80     90  62 100    100  80 110
 130  73 220    150  92 200    140  92 120
 190  80 150    140  60  90    100  55 100
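To make the point about vectorized operations concrete, here is a sketch comparing an explicit double loop with a single NumPy call that computes the average of each clinical parameter over the five patients, using the 2016 readings. The axis argument tells NumPy which dimension to reduce over (axis 0 = patients).

```python
import numpy as np

# 2016 readings: rows = patients, columns = glucose, heart rate, cholesterol
records_2016 = np.array([[100, 65, 160],
                         [150, 82, 200],
                         [90, 55, 80],
                         [130, 73, 220],
                         [190, 80, 150]])

# Loop version: accumulate each column element by element
n_patients, n_params = records_2016.shape
loop_means = []
for j in range(n_params):
    total = 0
    for i in range(n_patients):
        total += records_2016[i, j]
    loop_means.append(total / n_patients)

# Vectorized version: one call, reducing over the patient axis
vec_means = records_2016.mean(axis=0)

print(vec_means)                          # [132.  71. 162.]
print(np.allclose(loop_means, vec_means)) # True
```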

 

In NumPy, these multidimensional arrays are referred to as ndarrays (n-dimensional arrays). All NumPy array objects are of the type numpy.ndarray.

Let's view the preceding data as an ndarray:

In [4]: ndarray_1
Out[4]:
array([[[100,  65, 160],
        [150,  82, 200],
        [ 90,  55,  80],
        [130,  73, 220],
        [190,  80, 150]],

       [[ 95,  68, 140],
        [145,  80, 222],
        [ 90,  62, 100],
        [150,  92, 200],
        [140,  60,  90]],

       [[110,  72, 160],
        [160,  95, 185],
        [100,  80, 110],
        [140,  92, 120],
        [100,  55, 100]]])
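The excerpt displays ndarray_1 without showing how it was built. One plausible way, sketched here as an assumption, is to pass a nested list with one 5 x 3 block per year to np.array. The dtype is pinned to np.int32 so that the attributes discussed next match the output in the text; without it, NumPy picks a platform-dependent default integer type (often int64 on 64-bit systems).

```python
import numpy as np

# One 5 x 3 block (patients x clinical parameters) per year: 2016, 2017, 2018
ndarray_1 = np.array([
    [[100, 65, 160], [150, 82, 200], [90, 55, 80], [130, 73, 220], [190, 80, 150]],
    [[95, 68, 140], [145, 80, 222], [90, 62, 100], [150, 92, 200], [140, 60, 90]],
    [[110, 72, 160], [160, 95, 185], [100, 80, 110], [140, 92, 120], [100, 55, 100]],
], dtype=np.int32)  # explicit dtype: the default integer type varies by platform

print(type(ndarray_1))     # <class 'numpy.ndarray'>
print(ndarray_1[1, 3])     # 2017 record of the fourth patient: [150  92 200]
print(ndarray_1[2, 4, 2])  # 2018 cholesterol of the fifth patient: 100
```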

Properties of an ndarray, such as its data type, shape, number of dimensions, and size, are exposed as attributes of the array. Some of these attributes are explored for ndarray_1 in the following code:

# Data type of the array
In [5]: ndarray_1.dtype
Out[5]: dtype('int32')

# Shape of the array
In [6]: ndarray_1.shape
Out[6]: (3, 5, 3)

# Number of dimensions in the array
In [7]: ndarray_1.ndim
Out[7]: 3

# Size of the array (number of elements in the array)
In [8]: ndarray_1.size
Out[8]: 45
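These attributes are not independent: ndim is the length of the shape tuple, and size is the product of its entries. The sketch below checks this on an array of zeros with the same shape and the int32 dtype seen above (np.zeros is used only so the example does not repeat the data):

```python
import math
import numpy as np

a = np.zeros((3, 5, 3), dtype=np.int32)

print(a.ndim == len(a.shape))        # True
print(a.size == math.prod(a.shape))  # True: 3 * 5 * 3 = 45
print(a.dtype)                       # int32
```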

NumPy's ndarray uses a strided indexing scheme for its internal memory layout. A memory segment is itself one-dimensional, so a scheme such as strided indexing is needed to map multidimensional indices onto it and to make indexing and slicing of ndarrays easy. The stride along a dimension is the number of bytes to skip in memory to reach the next element along that dimension. Strides therefore depend on the size of each element, which is set by the data type, and on the lengths of the dimensions that follow. Let's understand strides through the array explored earlier. The number of bytes occupied by each element, and by the whole array, can be determined as shown in the following code:

In [9]: ndarray_1.itemsize
Out[9]: 4
In [10]: ndarray_1.nbytes
Out[10]: 180

Each element occupies 4 bytes, and the entire array occupies 180 bytes (45 elements x 4 bytes). The strides of the array are as follows:

In [11]: ndarray_1.strides
Out[11]: (60, 12, 4)

The shape of the array is given by the tuple (3, 5, 3). The values in the tuple represent the number of years for which there is data, the number of patients, and the number of clinical parameters, respectively. Each year, or step along the first dimension, holds 15 records, so moving from one year to the next requires jumping 15 x 4 = 60 bytes. Similarly, each patient has 3 records for a given year, so 3 x 4 = 12 bytes must be skipped to reach the next patient. Finally, moving from one clinical parameter to the next within a patient's record means skipping a single element, that is, 4 bytes.
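For a C-ordered (row-major) array like this one, which is what np.array produces by default, the strides follow directly from the shape and item size: the last stride equals the item size, and each earlier stride is the next stride multiplied by the length of the next dimension. The sketch below computes them with a hypothetical helper, c_strides, and uses them to locate the byte offset of an element. It assumes an int32 array, matching the output above; np.arange is used so each element's value equals its position in memory.

```python
import numpy as np

def c_strides(shape, itemsize):
    """Hypothetical helper: byte strides of a C-contiguous (row-major) array."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

# Element values 0..44 equal their flat position in memory
b = np.arange(45, dtype=np.int32).reshape(3, 5, 3)

print(c_strides(b.shape, b.itemsize))   # (60, 12, 4)
print(b.strides)                        # (60, 12, 4)
print(b.nbytes == b.size * b.itemsize)  # True: 45 * 4 = 180

# Byte offset of element [i, j, k] = i*60 + j*12 + k*4
i, j, k = 2, 4, 1
offset = sum(idx * s for idx, s in zip((i, j, k), b.strides))
print(offset)                           # 172
print(b[i, j, k] == offset // b.itemsize)  # True: element 43 sits at byte 172
```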
