Loading an array from a text file

The loadtxt() function reads an array stored in text format, in particular an array written by the savetxt() function. The following line of code reads the array stored in the array_x.txt file, which was generated in a previous example:

x = np.loadtxt('array_x.txt')

This code will not produce any output, since the array is silently assigned to the x variable. To see a snapshot of the array, we can execute the following at the command line or in a Jupyter cell:

x[0:3, 0:3]

This displays the first three rows and the first three columns of the array.
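The round trip between savetxt() and loadtxt() can be sketched as follows. This is a minimal illustration, not the code from the earlier example: the contents of the array and the use of a temporary directory are assumptions made here so that the snippet runs on its own without overwriting an existing array_x.txt file.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the array saved in the earlier example.
original = np.arange(20, dtype=float).reshape(4, 5)

# Write to a temporary directory so that no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'array_x.txt')
np.savetxt(path, original)

# Read the array back; loadtxt() returns floats by default.
x = np.loadtxt(path)

# Show the first three rows and the first three columns.
print(x[0:3, 0:3])
```

With the default settings, savetxt() writes one array row per line with values separated by spaces, which is the layout loadtxt() expects when no delimiter is given.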

The loadtxt() command offers some flexibility in terms of the format of the text file being read, and has several options to facilitate loading data generated by other software. As an example, let's suppose that we have a text file with data from the following table:

Name                ID   Position   Salary    Years of service
Rob Reliable        101  Associate  42000.00  5
Sam Social          203  Associate  31000.00  3
Hellen Hardworking  105  Manager    67000.00  8

The data is in the employees.txt file and each row in the table occupies one line in the file, including the headings. In each row, fields are separated by a comma. Notice that some of the columns contain string data. Let's assume that we are only interested in the numerical fields. Under these assumptions, the relevant columns can be read into a NumPy array with the following statement:

x = np.loadtxt('employees.txt', delimiter=',',
               skiprows=1, usecols=(1, 3, 4))

The options in the loadtxt() function have the following meanings:

  • delimiter=',' specifies a comma as the field separator.
  • skiprows=1 states that the first row, containing the column headers, should be skipped.
  • usecols=(1,3,4) specifies which columns of the table should be read into the array. Since indexing is zero-based, this would result in the second, fourth, and fifth columns being loaded.

Notice that, since NumPy arrays are restricted to holding elements of the same data type, we have to use an array of floats in this example, even though the ID and Years of service columns would be better represented by integers. We could use a NumPy array with dtype=object to store all columns, but a better approach would be to use a pandas DataFrame, which is a data structure designed with generic data in mind. pandas is looked at in detail in Chapter 4, Data Wrangling with pandas.
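To see the effect of these options, the following sketch first creates a file with the table data and then loads the numerical columns. The exact layout of the file is an assumption based on the description above (a header line followed by one comma-separated line per row), and the file is written to a temporary directory only to keep the snippet self-contained.

```python
import os
import tempfile

import numpy as np

# Assumed contents of employees.txt: a header line, then one
# comma-separated line per employee.
lines = [
    'Name,ID,Position,Salary,Years of service',
    'Rob Reliable,101,Associate,42000.00,5',
    'Sam Social,203,Associate,31000.00,3',
    'Hellen Hardworking,105,Manager,67000.00,8',
]

path = os.path.join(tempfile.mkdtemp(), 'employees.txt')
with open(path, 'w') as f:
    f.write('\n'.join(lines) + '\n')

# Skip the header and read only the ID, Salary, and Years of service columns.
x = np.loadtxt(path, delimiter=',', skiprows=1, usecols=(1, 3, 4))
print(x.dtype)   # float64, the default dtype of loadtxt()
print(x.shape)   # (3, 3): three employees, three numerical columns

# The integer-valued columns can be converted after loading if needed.
ids = x[:, 0].astype(int)
years = x[:, 2].astype(int)
print(ids, years)
```

Converting individual columns after loading keeps the loadtxt() call simple, at the cost of holding the data in separate arrays with different types.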