How Python and pandas fit into the data analytics pipeline

The Python programming language is one of the fastest-growing languages in the emerging field of data science and analytics. Python was created by Guido van Rossum and first released in 1991. Its key features include the following:

Interpreted rather than compiled
Dynamic type system
Pass by value, where the value passed is an object reference (often called pass by object reference)
Modular capability
Comprehensive libraries
Extensibility through modules written in other languages, such as C and C++
Object orientation
Most of the major programming paradigms: procedural, object-oriented, and, to a lesser extent, functional
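Two items in the preceding list, the dynamic type system and the way arguments are passed, are easiest to see in a few lines of code. The following is a minimal sketch; the function names `append_item` and `rebind` are hypothetical, chosen only for illustration. It shows that a name can be rebound to objects of different types, and that a function receives a reference to the caller's object: mutating that object is visible to the caller, while rebinding the parameter name is not.

```python
# Dynamic typing: the same name can be rebound to objects of different types.
x = 42
print(type(x).__name__)   # int
x = "forty-two"
print(type(x).__name__)   # str

def append_item(items):
    # 'items' refers to the same list object the caller passed in,
    # so mutating it in place is visible to the caller.
    items.append(4)

def rebind(items):
    # Rebinding the local name points it at a new list;
    # the caller's list is left untouched.
    items = [0]
    return items

data = [1, 2, 3]
append_item(data)
print(data)   # [1, 2, 3, 4]
rebind(data)
print(data)   # [1, 2, 3, 4]
```

This is why the behavior is described as passing a value that is itself an object reference: the reference is copied, but the object it points to is shared.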

For more information, see https://www.python.org/about/.

Several characteristics make Python popular for data science: its user-friendly, human-readable syntax; the fact that it is interpreted rather than compiled, which shortens development time; its comprehensive libraries for parsing and analyzing data; and its capacity for numerical and statistical computation. Together, these libraries provide a complete toolkit for data science and analysis. The major ones are as follows:

NumPy: The general-purpose array functionality with an emphasis on numeric computation
SciPy: Numerical computing
Matplotlib: Graphics
pandas: Series and data frames (1D and 2D array-like types)
Scikit-learn: Machine learning
NLTK: Natural language processing
statsmodels: Statistical analysis

For this book, we will be focusing on the fourth library in the preceding list, pandas.
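To make the description of pandas above concrete (Series as the 1D type, data frames as the 2D type), here is a minimal sketch of each. The column names and values are made-up sample data used only for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
s = pd.Series([3.5, 1.2, 4.8], index=["a", "b", "c"])
print(s["b"])   # 1.2

# A DataFrame is a two-dimensional labeled table (hypothetical sample data).
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune"],
    "temp_c": [4.0, 18.5, 29.1],
})
print(df.shape)                           # (3, 2)

# Selecting a single column of a DataFrame returns a Series.
print(isinstance(df["temp_c"], pd.Series))  # True

# Column-wise numeric operations are built in.
print(df["temp_c"].mean())
```

Each DataFrame column behaves as a Series sharing the frame's row index, which is why operations such as `mean()` can be applied to a column directly.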
