There's more...

SQL queries are not limited to extracting data. We can also run aggregations:

spark.sql('''
    SELECT b.FormFactor
        , COUNT(*) AS ComputerCnt
    FROM sample_data_view AS a
    LEFT JOIN models AS b
        ON a.Model = b.Model
    GROUP BY b.FormFactor
''').show()

In this simple example, we count how many computers we have for each FormFactor. COUNT(*) is an aggregate function that counts the rows in each group, and the GROUP BY clause specifies the columns that define those groups. Because this is a LEFT JOIN, any computer whose Model has no match in models is still counted, under a NULL FormFactor.
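To try the same aggregation pattern without a Spark session, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for spark.sql. It is not the recipe's code: the table contents, the Id column, and the model names are made up for illustration, and only the table and column names used in the query are taken from the example above.

```python
import sqlite3

# Hypothetical stand-ins for the recipe's sample_data_view and models tables.
# The rows below are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_data_view (Id INTEGER, Model TEXT)")
conn.execute("CREATE TABLE models (Model TEXT, FormFactor TEXT)")
conn.executemany(
    "INSERT INTO sample_data_view VALUES (?, ?)",
    [(1, "MacBook Pro"), (2, "MacBook Pro"), (3, "iMac"), (4, "Unknown")],
)
conn.executemany(
    "INSERT INTO models VALUES (?, ?)",
    [("MacBook Pro", "Laptop"), ("iMac", "Desktop")],
)

# Same shape as the query above: join, then count the rows in each group.
rows = conn.execute("""
    SELECT b.FormFactor, COUNT(*) AS ComputerCnt
    FROM sample_data_view AS a
    LEFT JOIN models AS b
        ON a.Model = b.Model
    GROUP BY b.FormFactor
""").fetchall()

# Map each FormFactor to its count; the unmatched model appears under None.
counts = dict(rows)
for form_factor, computer_cnt in rows:
    print(form_factor, computer_cnt)

conn.close()
```

With this made-up data, the computer whose Model is missing from models ends up in a group whose FormFactor is NULL (None in Python), so no rows are lost from the count. The recipe's query over sample_data_view and models follows the same pattern.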

Here's what we get from this query:
