What is a pandas DataFrame?

A pandas DataFrame can be thought of as a two-dimensional, matrix-like data structure that consists of rows and columns. It is analogous to a data frame in R or a table in SQL. Its advantages over traditional matrices and other Python data structures include the ability to hold columns of different types in the same DataFrame, a wide array of predefined functions for easy data manipulation, and one-line interfaces for quick conversion to other formats, including database tables, flat files, and NumPy arrays (for integration with scikit-learn's machine learning functionality). For these reasons, pandas often serves as the glue that holds together machine learning pipelines, from data importation to algorithm application.
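The points above can be illustrated with a minimal sketch. The column names and values below are hypothetical sample data, not taken from any dataset in this book; the example only assumes that pandas is installed.

```python
import pandas as pd

# Hypothetical sample data: three columns, each of a different type.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy"],       # strings
        "age": [34, 28, 45],                # integers
        "height_m": [1.62, 1.80, 1.75],     # floats
    }
)

# Each column keeps its own dtype within the same DataFrame.
print(df.dtypes)

# One-line conversion to a flat-file format (CSV text returned as a string,
# because no file path is given).
csv_text = df.to_csv(index=False)
print(csv_text)

# One-line conversion of the numeric columns to a NumPy array,
# the form that scikit-learn estimators typically accept.
features = df[["age", "height_m"]].to_numpy()
print(features.shape)
```

Selecting only the numeric columns before calling to_numpy() keeps the resulting array numeric; converting the whole DataFrame, including the string column, would produce an array of generic Python objects instead.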

The limitations of pandas include slower performance on large datasets and the lack of built-in parallel processing. If you are working with millions or billions of data points, Apache Spark (https://spark.apache.org/) may therefore be a better option, since parallel processing is built into the framework.
