Run U-SQL from a job in Data Lake Analytics

In this section, we will learn how to create a Data Lake Analytics job that will debug and run a U-SQL script. This job will summarize data from the file created by Task 1 in the preceding data factory pipeline (the task that imports SQL Server data into a blob file). The summary data will be copied to a new file on the blob storage.

With U-SQL, we can join different blob files and manipulate or summarize the data. We can also import data from different data sources. However, in this section, we will only walk through a very basic U-SQL script as an example.
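To give an idea of what joining two blob files could look like, here is a minimal sketch rather than part of the walkthrough. It assumes a second, hypothetical file named SupplierCountry.csv with Supplier and Country columns in the same container; that file, its columns, and the output name SumPurchaseByCountry.csv are invented for illustration. The first input path and the extractor are the ones used by the script later in this section.

```usql
// Sketch only: SupplierCountry.csv, its columns, and the output file name are hypothetical.
@purchases =
    EXTRACT Date DateTime,
            Supplier string,
            Category string,
            [Stock Item] string,
            [Ordered Quantity] int
    FROM "wasb://purchase-data@adfv2book/ADFV2BookPurchaseData.csv"
    USING Extractors.Tsv(skipFirstNRows:1); // Skip header row

// Assumed to use the same delimiter and a header row, like the first file.
@suppliers =
    EXTRACT Supplier string,
            Country string
    FROM "wasb://purchase-data@adfv2book/SupplierCountry.csv"
    USING Extractors.Tsv(skipFirstNRows:1);

// U-SQL join conditions are C# expressions, so equality is written with ==.
@byCountry =
    SELECT s.Country,
           SUM(p.[Ordered Quantity]) AS TotalQuantity
    FROM @purchases AS p
         INNER JOIN @suppliers AS s
         ON p.Supplier == s.Supplier
    GROUP BY s.Country;

OUTPUT @byCountry
TO "wasb://purchase-data@adfv2book/SumPurchaseByCountry.csv"
USING Outputters.Csv();
```

Each rowset is extracted from its own file, and the join then works on the rowsets just as it would on tables, so adding another input is mostly a matter of adding another EXTRACT.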

Let's get started...

First, we open the Data Lake Analytics resource from the dashboard. Before creating the job, we need to add the Blob Storage account as a data source. Open Data sources:

Click on Add data source:

Fill in the details:

You should see the added blob storage in the list:

You can explore the containers in the blob storage and files from the Data Lake Analytics | Data explorer:

Click on Data explorer:

In order to get the path, you can click on the container and then copy the WASB PATH, either for the container or the file itself. By clicking on the file, you can find the path in the Properties.

Copy the path only from Data Lake Analytics. Do not copy a URL from the blob storage resource, as it will not give you the WASB PATH.

Next, we will add the SQL Server as a data source as well.

Now we can click on New job on the Data Lake Analytics page to add a job that will execute a U-SQL script:

Data Lake Analytics blade (creating new job)

Create the job with the U-SQL code. If you Save it, the U-SQL code will be downloaded as a file to the Downloads folder of your browser.

You need to Submit it in order to run the job:

Here is the U-SQL code example (in the script, use the container or file WASB PATH you copied earlier as the input file):

// Read the purchase data written by the data factory pipeline.
// Extractors.Tsv expects tab-delimited input; use Extractors.Csv if the file is comma-delimited.
@results =
    EXTRACT Date DateTime,
            Supplier string,
            Category string,
            [Stock Item] string,
            [Ordered Quantity] int
    FROM "wasb://purchase-data@adfv2book/ADFV2BookPurchaseData.csv"
    USING Extractors.Tsv(skipFirstNRows:1); // Skip header row

// Total ordered quantity per supplier and category.
@sumresults =
    SELECT Supplier,
           Category,
           SUM([Ordered Quantity]) AS TotalQuantity
    FROM @results
    GROUP BY Supplier, Category;

// Write the summary to a new file in the same container.
OUTPUT @sumresults
TO "wasb://purchase-data@adfv2book/SumPurchase.csv"
USING Outputters.Csv();

At the end of the run, this is what you should see:
