Where does pandas fit in the pipeline?

As discussed in the previous section, pandas can be used to perform Step 4 to Step 6 of the pipeline. These steps are the backbone of any data science process, application, or product:

Where does pandas fit in the data analytics pipeline?

Every step from Step 1 to Step 6 can be performed in pandas in some way. Step 4 to Step 6 are its primary tasks, while Step 1 to Step 3 can also be done in pandas in one way or another.

pandas is an indispensable library if you're working with data, and it is hard to find data modeling code in Python that doesn't import pandas into the working environment. Its easy-to-use syntax and a spreadsheet-like data structure called the DataFrame make it approachable even for users who are comfortable with Excel and reluctant to move away from it. At the same time, scientists and researchers value it for handling less common file formats such as Parquet and Feather, among many others. It can also read data in chunks rather than loading an entire file into memory at once. No wonder the news outlet Quartz called it the most important tool in data science.
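
As a minimal sketch of reading data in batches, the example below uses the `chunksize` parameter of `pd.read_csv`. The CSV content, the column names (`id`, `value`), and the chunk size are made up for illustration; in practice, the first argument would be the path to a large file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a large file on disk.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

rows_seen = 0
total = 0

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading every row into memory at once.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    rows_seen += len(chunk)
    total += int(chunk["value"].sum())

print(rows_seen, total)
```

Each chunk is an ordinary DataFrame, so any aggregation that can be accumulated piece by piece (counts, sums, minima, maxima) works in this pattern.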

pandas is well suited to the following types of datasets:

  • Tabular with heterogeneous type columns
  • Ordered and unordered time series
  • Matrix/array data with labeled or unlabeled rows and columns
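
The three kinds of dataset listed above can be sketched roughly as follows. All names and values here are illustrative assumptions, not data from this book.

```python
import numpy as np
import pandas as pd

# Tabular data whose columns hold different types (text, integer, float).
table = pd.DataFrame({
    "name": ["a", "b", "c"],
    "count": [1, 2, 3],
    "price": [0.5, 1.5, 2.5],
})

# A time series whose dates arrive out of order; sort_index orders it.
dates = pd.to_datetime(["2021-01-03", "2021-01-01", "2021-01-02"])
series = pd.Series([30.0, 10.0, 20.0], index=dates)
ordered = series.sort_index()

# Matrix data with labeled rows and columns.
matrix = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["r1", "r2"],
    columns=["x", "y", "z"],
)

print(table.dtypes)
print(ordered)
print(matrix)
```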

pandas can perform the following operations on data with finesse:

  • Easy handling of missing and NaN data
  • Addition and deletion of columns
  • Automatic and explicit data alignment with labels
  • GroupBy for aggregating and transforming data using split-apply-combine
  • Converting differently indexed Python or NumPy data to DataFrame objects
  • Slicing, indexing, hierarchical indexing, and subsetting of data
  • Merging, joining, and concatenating data
  • I/O methods for flat files, HDF5, feather, and parquet formats
  • Time series functionality
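
A few of the operations above can be sketched in a handful of lines. The store names, column names, and numbers below are hypothetical and chosen only to keep the example small.

```python
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    "store": ["north", "north", "south", "south"],
    "units": [10.0, np.nan, 7.0, 5.0],
})

# Missing data: replace NaN with 0 (one of several possible strategies).
sales["units"] = sales["units"].fillna(0)

# Adding and deleting columns.
sales["revenue"] = sales["units"] * 2.5
without_revenue = sales.drop(columns=["revenue"])

# GroupBy: split by store, sum the units, combine the results.
units_per_store = sales.groupby("store")["units"].sum()

# Merging: attach a manager to each row by matching on the store column.
managers = pd.DataFrame({
    "store": ["north", "south"],
    "manager": ["Ana", "Raj"],
})
merged = sales.merge(managers, on="store", how="left")

# Automatic alignment: values are matched by label, not by position,
# and labels present on only one side produce NaN.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "z"])
aligned = a + b

print(units_per_store)
print(merged)
print(aligned)
```

Filling missing values with 0 is only one choice; dropping the rows with `dropna` or interpolating may suit other datasets better.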