How it works...

First, we select the two features whose interaction we want to explore; in our case, these are the displacement and cylinders features.

Our example is small, so we can plot all of the data. In the real world, however, you should sample your data before plotting it: pulling billions of data points to your local machine and drawing each one is impractical.
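The sampling advice can be made concrete with a small standard-library sketch. It assumes Bernoulli sampling, where each row is kept independently with probability `fraction`. This is the idea behind sampling without replacement in Spark; in PySpark, the corresponding call would be something like `df.sample(withReplacement=False, fraction=0.001, seed=42)`, where `df` is your Spark DataFrame. The function name `bernoulli_sample` and the synthetic rows are hypothetical and serve only as an illustration.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    Hypothetical helper mirroring the idea behind Spark's
    DataFrame.sample(fraction=...) without replacement: the result
    size is only approximately fraction * len(rows), not exact.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded generator gives reproducible samples
    # random() returns a value in [0, 1), so fraction=1.0 keeps every row
    return [row for row in rows if rng.random() < fraction]


# Synthetic stand-in for a large table: one million row IDs
rows = list(range(1_000_000))
sample = bernoulli_sample(rows, fraction=0.001, seed=42)
print(len(sample))  # roughly 1,000; the exact count depends on the seed
```

Because each row is kept independently, the sample size is only approximately `fraction` times the number of rows. If you need an exact number of points, a fixed-size sample such as the standard library's `random.sample(rows, k)` is the alternative.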

After registering the temporary table, we use the %%sql magic to select all the data from the scatter table and expose it locally as scatter_source, a pandas DataFrame. Now we can start plotting:

%%local
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

# Create a 12x9-inch figure with a single subplot
fig = plt.figure(figsize=(12, 9))
ax = fig.add_subplot(1, 1, 1)

# Cylinders on the x axis, displacement on the y axis;
# large, semi-transparent markers make overlapping points visible
ax.scatter(
    scatter_source['Cylinders'],
    scatter_source['Displacement'],
    s=200,
    alpha=0.5
)

ax.set_xlabel('Cylinders')
ax.set_ylabel('Displacement')
ax.set_title('Relationship between cylinders and displacement')

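First, the %%local magic makes the cell run in the local Python kernel, where scatter_source lives, rather than on the Spark cluster. We then import Matplotlib's pyplot module, tell Jupyter to render plots inline with %matplotlib inline, and switch to the ggplot style.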

See the Drawing histograms recipe for a more detailed explanation of what these Matplotlib commands do.

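Next, we create a figure and add a single subplot to it. Then, we draw a scatter plot of our data: the x axis represents the number of cylinders and the y axis represents the displacement. The s=200 argument sets the marker size (in points squared), and alpha=0.5 makes the markers semi-transparent, so overlapping points show up darker. Finally, we set the axis labels and the chart title.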
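Because the number of cylinders takes only a few distinct integer values, many cars share the same x position, and those with similar displacements overlap. The effect of alpha=0.5 on such stacks can be worked out directly. Assuming the usual "over" compositing of same-colored markers, each marker lets 1 - alpha of whatever lies beneath it show through, so k stacked markers have a combined opacity of 1 - (1 - alpha)^k. The helper below, `stacked_opacity`, is a hypothetical name used for illustration and is not part of Matplotlib.

```python
def stacked_opacity(alpha, k):
    """Combined opacity of k identical markers drawn on top of each other.

    Assumes standard "over" compositing: each marker transmits (1 - alpha)
    of what is behind it, so k layers transmit (1 - alpha) ** k in total.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - alpha) ** k


# Opacity for 1 to 4 overlapping markers with alpha=0.5
for k in range(1, 5):
    print(k, stacked_opacity(0.5, k))
# 1 0.5
# 2 0.75
# 3 0.875
# 4 0.9375
```

A single car therefore stays half-transparent, while a spot where four cars overlap is almost fully opaque; darker regions in the chart mark combinations of cylinders and displacement that occur more often.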

Here's what the final result looks like:
