NumPy ndarrays

Arrays are central objects in data analysis. They allow structured handling of elements arranged in rows and columns. All elements of an array must be of the same data type. For example, the medical records of five patients can be presented as an array as follows:

Patient	Blood glucose level	Heart rate	Cholesterol level
Peter Parker	100	65	160
Bruce Wayne	150	82	200
Tony Stark	90	55	80
Barry Allen	130	73	220
Steve Rogers	190	80	150

All 15 elements are of data type int. Arrays can also be composed of strings, floats, or complex numbers. Arrays can be constructed from lists, a widely used and versatile data structure in Python:

array_list = [[100, 65, 160],
              [150, 82, 200],
              [90, 55, 80],
              [130, 73, 220],
              [190, 80, 150]]

An element in the i-th row and j-th column of an array or matrix can be accessed as shown in the following code. Note that indexing in Python starts from 0, so array_list[1][2] in the first example refers to the second row and third column:

In [2]: array_list[1][2]
Out[2]: 200

In [3]: array_list[3][0]
Out[3]: 130
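As a minimal sketch of what nested-list indexing can and cannot do conveniently, the following rebuilds the same list and pulls out one row and one column. The names second_row and heart_rates are introduced here for illustration only:

```python
# Same patient records as array_list above
array_list = [[100, 65, 160],
              [150, 82, 200],
              [90, 55, 80],
              [130, 73, 220],
              [190, 80, 150]]

# A row is simply one of the inner lists
second_row = array_list[1]

# A column has no direct syntax; it is assembled element by element
heart_rates = [row[1] for row in array_list]

print(second_row)   # [150, 82, 200]
print(heart_rates)  # [65, 82, 55, 73, 80]
```

Selecting a row is a single index, whereas selecting a column requires visiting every inner list.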

Python has a built-in array module to create arrays. However, this array module is more like a glorified list where all elements are required to have the same data type. An array can be created using the array module by providing two arguments: the type code of the data type, and the elements in a list, string, or any iterable object. Let's create an array of floats. Here, d is the type code for a double-precision floating-point value:

import array as arr
arr_x = arr.array("d", [98.6, 22.35, 72.1])

The array module cannot directly create a two-dimensional entity with rows and columns. Such a structure can only be emulated through a list of such arrays. Operations typically associated with matrices or arrays, such as matrix multiplication, determinants, and eigenvalues, are not defined in this module.
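The behavior described above can be illustrated with a short sketch. The variable names rejected and rows, and the use of the "i" (signed integer) type code, are choices made here for illustration:

```python
import array as arr

arr_x = arr.array("d", [98.6, 22.35, 72.1])

# Every element must match the type code, so a string is rejected
try:
    arr_x.append("high")
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True

# An integer is accepted and stored as a float
arr_x.append(37)
print(arr_x[-1])  # 37.0

# A plain list of typed arrays can stand in for rows of a table,
# but it offers no matrix operations of its own
rows = [arr.array("i", [100, 65, 160]),
        arr.array("i", [150, 82, 200])]
print(rows[1][2])  # 200
```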

NumPy is the preferred package to create and work on array-type objects. NumPy allows multidimensional arrays to be created. Multidimensional arrays provide a systematic and efficient framework for storing data. Complex computations are available as built-in vectorized operations in NumPy and run quickly on these multidimensional arrays without the need for explicit loops. Consider the earlier example where we created a two-dimensional array to store the medical records of five patients. The patients' names and the clinical indicators were the two dimensions in this case. Now, if the clinical parameters of the same patients were recorded for three years, from 2016 to 2018, then all this information could be conveniently represented in a three-dimensional array. The year in which the records were taken becomes the third dimension. The resultant array will be of dimension 3 x 5 x 3, and entirely composed of integers:

2016			2017			2018
100	65	160	95	68	140	110	72	160
150	82	200	145	80	222	160	95	185
90	55	80	90	62	100	100	80	110
130	73	220	150	92	200	140	92	120
190	80	150	140	60	90	100	55	100
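To make the idea of loop-free, vectorized computation concrete, here is a minimal sketch using only the 2016 records. The variable names and the glucose threshold of 140 are hypothetical and chosen purely for illustration:

```python
import numpy as np

# 2016 records: rows are patients, columns are clinical indicators
records_2016 = np.array([[100, 65, 160],
                         [150, 82, 200],
                         [90, 55, 80],
                         [130, 73, 220],
                         [190, 80, 150]])

# Mean of each indicator across all patients, without a Python loop
column_means = records_2016.mean(axis=0)
print(column_means.tolist())  # [132.0, 71.0, 162.0]

# Element-wise comparison on the glucose column (threshold is hypothetical)
high_glucose = records_2016[:, 0] > 140
print(high_glucose.tolist())  # [False, True, False, False, True]
```

Both results are computed over whole rows or columns in a single call rather than element by element.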

In NumPy, these multidimensional arrays are referred to as ndarrays (n-dimensional arrays). All NumPy array objects are of the type numpy.ndarray.

Let's view the preceding data as an ndarray, here stored in a variable named ndarray_1:

In [4]: ndarray_1

Out[4]:
array([[[100,  65, 160],
        [150,  82, 200],
        [ 90,  55,  80],
        [130,  73, 220],
        [190,  80, 150]],

       [[ 95,  68, 140],
        [145,  80, 222],
        [ 90,  62, 100],
        [150,  92, 200],
        [140,  60,  90]],

       [[110,  72, 160],
        [160,  95, 185],
        [100,  80, 110],
        [140,  92, 120],
        [100,  55, 100]]])
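The text does not show how ndarray_1 was built. One plausible way, sketched here, is to pass a nested list to np.array. The per-year list names are hypothetical, and dtype=np.int32 is set explicitly as an assumption so that the result matches the attributes reported below regardless of platform:

```python
import numpy as np

# One 5 x 3 block of patient records per year (names are illustrative)
year_2016 = [[100, 65, 160], [150, 82, 200], [90, 55, 80],
             [130, 73, 220], [190, 80, 150]]
year_2017 = [[95, 68, 140], [145, 80, 222], [90, 62, 100],
             [150, 92, 200], [140, 60, 90]]
year_2018 = [[110, 72, 160], [160, 95, 185], [100, 80, 110],
             [140, 92, 120], [100, 55, 100]]

# Stacking the three blocks gives a 3 x 5 x 3 array (year, patient, indicator)
ndarray_1 = np.array([year_2016, year_2017, year_2018], dtype=np.int32)

print(type(ndarray_1))     # <class 'numpy.ndarray'>
print(ndarray_1[1, 2, 0])  # 90 (2017, third patient, blood glucose)
```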

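Properties of an ndarray, such as its data type, shape, number of dimensions, and size, are exposed as attributes of the array. Note that NumPy's default integer type is platform-dependent (int64 on many systems), so the data type reported on another machine may differ from the int32 shown here. Some attributes of ndarray_1 are explored in the following code: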

# Data type of the array
In [5]: ndarray_1.dtype
Out[5]: dtype('int32')

# Shape of the array
In [6]: ndarray_1.shape
Out[6]: (3, 5, 3)

# Number of dimensions in the array
In [7]: ndarray_1.ndim
Out[7]: 3

# Size of the array (number of elements in the array)
In [8]: ndarray_1.size
Out[8]: 45
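These attributes are related to one another, as the following sketch suggests. It uses a hypothetical stand-in array, a, with the same shape as ndarray_1 rather than the patient data itself:

```python
import math
import numpy as np

# Stand-in array with the same shape and data type as ndarray_1
a = np.zeros((3, 5, 3), dtype=np.int32)

# ndim is the length of the shape tuple
n_dims = len(a.shape)
print(a.ndim == n_dims)      # True

# size is the product of the entries in the shape tuple
n_elements = math.prod(a.shape)
print(a.size == n_elements)  # True

# The default integer dtype depends on the platform and NumPy version
default_int = np.array([1, 2, 3]).dtype
print(default_int)
```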

NumPy's ndarray makes use of a strided indexing scheme for its internal memory layout. Computer memory is a one-dimensional sequence of bytes, so a scheme such as strided indexing is needed to map multidimensional indices onto it and to facilitate easy indexing and slicing of ndarrays. A stride is the number of bytes to jump in memory to reach the next element along a given dimension. The strides are determined by the size of each element, and hence the data type of the array, together with its shape. Let's understand strides through the array explored earlier. The number of bytes occupied by each element can be determined as shown in the following code:

In [9]: ndarray_1.itemsize
Out[9]: 4
In [10]: ndarray_1.nbytes
Out[10]: 180

Each element occupies 4 bytes, and the entire array occupies 180 bytes (45 elements x 4 bytes). The strides for the array can be obtained as follows:

In [11]: ndarray_1.strides
Out[11]: (60, 12, 4)

The shape of the array is given by the tuple (3, 5, 3). The values in the tuple represent the number of years for which there is data, the number of patients, and the number of clinical parameters, respectively. For each year or first dimension, there are 15 records of 4 bytes each, and hence to move from one year to another in the array, 60 bytes should be jumped across. On a similar note, each distinct patient has 3 records for a given year, and 12 bytes of memory should be moved past to get to the next patient. Finally, consecutive clinical parameters of the same patient are 4 bytes apart, which is the size of a single element.
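The stride arithmetic above can be checked with a short sketch. It assumes a contiguous, row-major (C-order) array of 4-byte integers, and uses a stand-in array a filled with 0 to 44 instead of the patient data; all variable names are illustrative:

```python
import numpy as np

# Stand-in array: same shape and element size as ndarray_1, row-major layout
a = np.arange(45, dtype=np.int32).reshape(3, 5, 3)

# Derive row-major strides from the shape and the element size:
# the last dimension moves by one element, and each earlier dimension
# moves by the full extent of everything after it
strides = []
step = a.itemsize
for extent in reversed(a.shape):
    strides.append(step)
    step *= extent
expected_strides = tuple(reversed(strides))

print(expected_strides)  # (60, 12, 4)
print(a.strides)         # (60, 12, 4)

# Byte offset of element [2, 3, 1] from the start of the data
i, j, k = 2, 3, 1
offset = i * a.strides[0] + j * a.strides[1] + k * a.strides[2]
print(offset)  # 160

# Dividing by the element size gives the position in the flattened array
flat_index = offset // a.itemsize
print(a.ravel()[flat_index] == a[i, j, k])  # True
```

Arrays that are transposed, sliced with a step, or stored in column-major order have different strides, so the derivation here applies only to the contiguous row-major case.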
