How Python and pandas fit into the data analytics pipeline

The Python programming language is one of the fastest-growing languages in the emerging field of data science and analytics. Python was created by Guido van Rossum and first released in 1991. Its key features include the following:

  • Interpreted rather than compiled
  • Dynamic type system
  • Pass by value with object references
  • Modular capability
  • Comprehensive libraries
  • Extensibility with respect to other languages
  • Object orientation
  • Support for most of the major programming paradigms: procedural, object-oriented, and, to a lesser extent, functional
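Two of the features above, the dynamic type system and passing object references by value, are easier to see than to describe. The following is a minimal sketch using only built-in Python. The function names `append_item` and `rebind` are hypothetical, chosen for illustration. A function receives a reference to the caller's object, so in-place mutation is visible to the caller, but rebinding the parameter name inside the function is not.

```python
def append_item(items, value):
    # `items` refers to the same list object the caller passed in,
    # so mutating it in place is visible to the caller.
    items.append(value)


def rebind(items):
    # Rebinding the local name only changes what this function's `items`
    # refers to; the caller's name still points at the original list.
    items = ["replaced"]
    return items


# Dynamic typing: a name can be bound to objects of different types.
x = 42
print(type(x).__name__)  # int
x = "forty-two"
print(type(x).__name__)  # str

data = [1, 2]
append_item(data, 3)
print(data)  # [1, 2, 3]

rebind(data)
print(data)  # [1, 2, 3], unchanged by the rebinding inside rebind()
```

The distinction matters in pandas as well: passing a DataFrame to a function and modifying it in place changes the caller's object.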

For more information, refer to the About Python page at https://www.python.org/about/.

Several characteristics make Python popular for data science: its very user-friendly (human-readable) syntax; the fact that it is interpreted rather than compiled, which shortens development time; its very comprehensive libraries for parsing and analyzing data; and its capacity for numerical and statistical computations. Python's libraries provide a complete toolkit for data science and analysis. The major ones are as follows:

  • NumPy: General-purpose array functionality, with an emphasis on numeric computation
  • SciPy: Numerical computing
  • Matplotlib: Graphics
  • pandas: Series and data frames (1D and 2D array-like types)
  • scikit-learn: Machine learning
  • NLTK: Natural language processing
  • statsmodels: Statistical analysis
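To make the "1D and 2D array-like types" description concrete, here is a short sketch that assumes NumPy and pandas are installed. The city names and temperatures are made-up sample values for illustration only. It shows a NumPy array, a pandas Series (a 1D array with an index of labels), and a pandas DataFrame (a 2D table whose columns are Series).

```python
import numpy as np
import pandas as pd

# NumPy: a homogeneous n-dimensional array built for numeric computation.
arr = np.array([1.0, 2.0, 3.0])
print(arr.mean())  # 2.0

# pandas Series: a 1D array of values paired with an index of labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])  # 20

# pandas DataFrame: a 2D labeled table; each column is itself a Series.
# (Hypothetical sample data, for illustration only.)
df = pd.DataFrame(
    {"city": ["Paris", "Oslo", "Lima"], "temp_c": [18.5, 9.0, 22.1]}
)
print(df.shape)  # (3, 2)
print(type(df["temp_c"]).__name__)  # Series

# Column operations reuse NumPy-style vectorized computation.
print(df["temp_c"].mean())
```

Because pandas builds on NumPy, a DataFrame column supports the same kind of vectorized numeric operations as a NumPy array, while adding labels for rows and columns.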

For this book, we will be focusing on the fourth library in the preceding list, pandas.
