Chapter 12

How can we understand some general properties of a dataset with pandas?

You can compute specific statistics, such as the mean, median, or standard deviation, on specific columns. Alternatively, you can use the describe method: it computes a set of descriptive statistics (the count, mean, and standard deviation, plus the minimum/maximum and quartiles) for the columns in a dataframe. By default, describe covers only the numeric columns; passing include="all" extends it to the remaining ones.
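
As a minimal sketch of both approaches, the snippet below computes individual statistics on one column and then calls describe on the whole dataframe. The dataframe and its column names (price, quantity) are made up for illustration.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the calls.
df = pd.DataFrame({
    "price": [10.0, 12.5, 9.0, 15.0, 11.0],
    "quantity": [3, 1, 4, 2, 5],
})

# Individual statistics on a single column.
mean_price = df["price"].mean()
median_price = df["price"].median()
std_price = df["price"].std()

# Summary statistics for every numeric column at once.
summary = df.describe()
print(summary)
```

The result of describe is itself a dataframe, with one row per statistic and one column per numeric column of the input.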

What does the resample function do in pandas? How is it different from aggregation?

This method is meant to be used on a dataframe of time-based records, typically one with a datetime index. resample works similarly to aggregation, except that it groups by a time period and also returns rows for periods that have no records (with empty values, such as NaN for a mean, or zero for a sum or count).
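
The difference is easiest to see side by side. The sketch below uses a small made-up series of events with a gap on January 2 and compares a daily resample with a plain groupby on the date; the data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical events: two on January 1, none on January 2, one on January 3.
events = pd.DataFrame(
    {"value": [1.0, 2.0, 4.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-03"]),
)

# resample builds a regular daily grid, so the empty day is kept as NaN.
daily = events.resample("D")["value"].mean()

# A plain groupby only returns the groups that actually occur in the data.
grouped = events.groupby(events.index.date)["value"].mean()

print(daily)
print(grouped)
```

Here daily has three rows (one per calendar day in the range), while grouped has only two.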

How does visualization work in pandas?

Pandas offers a simple yet extensive interface for visualization, but it doesn't render charts on its own; by default, all the actual drawing is done by matplotlib. Starting with version 0.25, pandas allows other visualization engines to be used instead.
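
A short sketch of this delegation, assuming matplotlib is installed: the plot method on a dataframe returns a matplotlib object, and the active engine is exposed through the plotting.backend option. The data is made up, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive rendering, so no window is needed

import pandas as pd

# Hypothetical data, used only to illustrate the call.
df = pd.DataFrame({"x": [1, 2, 3], "y": [2, 4, 8]})

# pandas hands the drawing over to matplotlib and returns its Axes object.
ax = df.plot(x="x", y="y", kind="line")

# The engine in use; "matplotlib" unless it has been changed.
backend = pd.get_option("plotting.backend")
print(type(ax), backend)
```

Switching engines is done with pd.set_option("plotting.backend", ...), provided the named package is installed and implements the pandas plotting backend interface.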

What are the benefits of declarative data visualization (for example, with Altair)?

There are multiple benefits to this approach. First, the declaration (also known as a specification) is decoupled from the engine, so, in theory, it can be used with different engines. Second, the specification is decoupled from the data, so it can be applied to different datasets with no change. Third, it is decoupled from the aesthetic parts, so colors, fonts, and margins can be defined externally and easily adjusted outside of the specification. As a result, the declarative approach makes for a very flexible and effective workflow that is easy to change, iterate on, and reuse.

In which cases can big data visualization be useful?

Big data visualization can be extremely useful if you wish to understand the overall distribution of the dataset. This is especially true if you're working with spatial data, networks, or embeddings.
