Data Lake Analytics

In the previous section, we explored the capabilities of Stream Analytics: we calculated the efficiency of a compressor station and saved the results in the data lake. In this simple exercise, we will reuse that processed data to produce a report of the average efficiency of our machines. We will do this using Data Lake Analytics, a big data service for processing large volumes of data. It parallelizes work efficiently and supports a map-reduce pattern. Indeed, Data Lake Analytics compiles the U-SQL code that we write to maximize efficiency and speed up data processing. To enable it, perform the following steps:

  1. To enable Data Lake Analytics from the Azure portal, search for Data Lake Analytics.
  2. Click on Create New Data Analytics.
  3. We need to provide a name, such as iiotbookdla, and select our Data Lake Storage, which is iiotstore. These steps are shown in the following screenshot:

Building our Data Lake Analytics instance
  4. Finally, click on the Create button.

If everything went well, we can create our first job. To do this, perform the following steps:

  1. Add a new job from the Data Lake Analytics instance that we just created.
  2. Provide a name, such as my-dla-efficiency-job, as shown in the following screenshot:

Building our first Data Lake Analytics job
  3. In the text panel, copy and paste the following U-SQL code:
DECLARE @now DateTime = DateTime.Now;
DECLARE @outputfile = "/out/reports/" + @now.ToString("yyyy/MM/dd") + "-efficiency.csv";

// Step 1: extract data and skip the first row
@d =
    EXTRACT device string,
            ts string,
            temperature float,
            flow float,
            efficiency float,
            date DateTime,
            filename string
    FROM "/out/logs/{date:yyyy}/{date:MM}/{date:dd}/{filename}.csv"
    USING Extractors.Tsv(skipFirstNRows:1);

// Step 2: build result
@result =
    SELECT AVG(efficiency) AS efficiency,
           device,
           date
    FROM @d
    GROUP BY device, date;

// Step 3: write the OUTPUT
OUTPUT @result
TO @outputfile
USING Outputters.Text();
  4. Finally, click on the Submit button.
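
The FROM clause in the script uses a file set pattern: the {date:yyyy}, {date:MM}, and {date:dd} parts of the path fill the date column, and {filename} fills the filename column. The following Python sketch only approximates that mapping with a regular expression, to show which values a given path would produce. The function name virtual_columns and the sample path are hypothetical, and the exact matching rules of U-SQL (for example, which characters {filename} may contain) are an assumption here.

```python
import re
from datetime import datetime

# Rough regular-expression equivalent (an assumption, not the U-SQL matcher) of
# "/out/logs/{date:yyyy}/{date:MM}/{date:dd}/{filename}.csv"
PATTERN = re.compile(
    r"^/out/logs/(?P<yyyy>\d{4})/(?P<MM>\d{2})/(?P<dd>\d{2})/(?P<filename>[^/]+)\.csv$"
)


def virtual_columns(path):
    """Return the date and filename values implied by a path, or None if it does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    try:
        date = datetime(int(match["yyyy"]), int(match["MM"]), int(match["dd"]))
    except ValueError:
        # Digits matched, but they do not form a real calendar date.
        return None
    return {"date": date, "filename": match["filename"]}


# Hypothetical file name, used only for illustration.
print(virtual_columns("/out/logs/2019/01/15/part-0.csv"))
print(virtual_columns("/out/other/2019/01/15/part-0.csv"))
```

Because the date is taken from the folder structure rather than from the file content, every row read from a file shares the same date value, which is what makes the later grouping by device and date a daily report.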

This simple U-SQL script computes the daily average efficiency of each device and saves the report in Data Lake Storage, where the operator or other analysts can work on it later. For people familiar with T-SQL, U-SQL will be quite easy to read. The first block parses the log files, skipping the header row of each one; the {date:...} and {filename} placeholders in the path populate the date and filename columns. The second block calculates the average, grouping by device and date, and the last block saves the output. After a few seconds, the job will complete and we can open our Data Lake Storage to see the report.
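
To make the three blocks concrete, here is a minimal local sketch in Python of the same logic: extract rows while skipping the header, average the efficiency by device and date, and write the result as text. It is not how Data Lake Analytics executes the job. The sample rows, the SAMPLE_FILES dictionary, and the function names are made up for illustration, and the sketch assumes tab-separated input and comma-separated output.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: tab-separated log files keyed by the date taken from
# their folder path (/out/logs/yyyy/MM/dd/). All values are made up.
HEADER = "device\tts\ttemperature\tflow\tefficiency\n"
SAMPLE_FILES = {
    "2019-01-01": HEADER
    + "c1\t10:00\t70.0\t1.5\t0.5\n"
    + "c1\t11:00\t71.0\t1.5\t1.0\n"
    + "c2\t10:00\t65.0\t2.0\t0.25\n",
    "2019-01-02": HEADER + "c1\t10:00\t70.0\t1.5\t0.75\n",
}


def extract(date, text):
    """Step 1: parse one file and skip its first row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # header row, the counterpart of skipFirstNRows:1
    for device, _ts, _temperature, _flow, efficiency in reader:
        yield device, date, float(efficiency)


def average_efficiency(files):
    """Step 2: average the efficiency grouped by (device, date)."""
    totals = defaultdict(lambda: [0.0, 0])  # (device, date) -> [sum, count]
    for date, text in files.items():
        for device, day, efficiency in extract(date, text):
            entry = totals[(device, day)]
            entry[0] += efficiency
            entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def to_csv(result):
    """Step 3: render the report, one line per device and date."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for (device, date), efficiency in sorted(result.items()):
        writer.writerow([efficiency, device, date])
    return out.getvalue()


report = to_csv(average_efficiency(SAMPLE_FILES))
print(report)
```

Keeping a running sum and count per group, rather than a running average, means partial results from different files can be added together before the final division, which is the kind of aggregation that fits a map-reduce pattern.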
