Calling Databricks notebook execution in ADF

We have now laid down everything we need to trigger the notebook execution from ADF. Going back to the factory, we're going to add a linked service. So far, all the linked services we created in this book have connected to a data store: SQL Server, blob storage, and so on. This time, we're going to use a compute linked service: Azure Databricks.

As shown in the following screenshot, add a linked service. Click on the Compute tab, select Azure Databricks, and click on Continue:

In the next step, we enter the details of the cluster. The Type property is already set to Azure Databricks. The following screenshot shows the properties to set:

The properties are explained as follows:

  • Connect via integration runtime: We use the Default one. It has access to all Azure resources.
  • Account selection method: From Azure subscription.
  • Select cluster: We're going to create a cluster on the fly that runs only for the duration of our job, so we select the New job cluster option; the cluster is terminated once the job completes. Selecting the Existing cluster option would instead mean using an interactive cluster, one that is already running outside ADF.
  • Domain/Region: Choose the domain/region you have used since the beginning of the book.
  • Access token: This one is a bit trickier. You can get it in your Databricks workspace by clicking on the user icon and selecting User Settings from the submenu:

You are taken to the Access Tokens tab. Click on Generate New Token, as shown in the following screenshot:

A token appears in a new window. Copy it and store it someplace from where you can retrieve it later:

Going back to the factory, paste the token in the Access token textbox (if you later need to rotate tokens without the UI, see the Token API sketch after this list):

  • Cluster node type: Choose the smallest node, since we do not have a lot of data to process.
  • Cluster version: Accept the default.
  • Workers: This tells us how many machines will execute the work. Since we do not have a lot of data to process, we'll use 1.
  • Enable autoscaling: We do not need to enable this in our case.
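As a side note, once you have a first token, you can create or rotate tokens without the UI through the Databricks Token REST API. The following is a minimal Python sketch, not part of the book's walkthrough; the domain and the existing token are placeholders:

```python
import requests

# Placeholders: your workspace domain and an already-existing token
# (the very first token must be created in the UI, as described above).
DOMAIN = "https://<region>.azuredatabricks.net"
EXISTING_TOKEN = "<existing-access-token>"

# Ask the Token API for a new token that expires after 90 days.
response = requests.post(
    f"{DOMAIN}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {EXISTING_TOKEN}"},
    json={"comment": "ADF linked service", "lifetime_seconds": 90 * 24 * 3600},
)
response.raise_for_status()

# The token value is returned only once; store it somewhere safe.
print(response.json()["token_value"])
```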

Click on Finish to complete the linked service creation.
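For reference, the same linked service can also be created programmatically with the azure-mgmt-datafactory Python SDK. This is only a sketch mirroring the options chosen above; the subscription, resource group, factory name, node type, and cluster version are placeholders or assumptions, not values from the book:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureDatabricksLinkedService,
    LinkedServiceResource,
    SecureString,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# New job cluster (not an existing interactive one): a single small worker,
# terminated automatically when the notebook run completes.
databricks_ls = LinkedServiceResource(
    properties=AzureDatabricksLinkedService(
        domain="https://<region>.azuredatabricks.net",
        access_token=SecureString(value="<access-token>"),
        new_cluster_version="<cluster-version>",  # accept the UI default
        new_cluster_node_type="Standard_DS3_v2",  # assumed smallest node type
        new_cluster_num_of_worker="1",
    )
)

client.linked_services.create_or_update(
    "<resource-group>", "<factory-name>", "AzureDatabricks", databricks_ls
)
```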

We'll now add a new pipeline to our factory and drag and drop a Databricks Notebook activity onto it, as shown in the following screenshot. Rename it AzureDatabricks:

Now, click on the Settings tab and adjust the properties as shown in this screenshot:

The properties are explained here:

  • Linked service: Select AzureDatabricks from the list.
  • Notebook path: /ADFCalls/ADFV2Notebook.
  • Parameters: We click on + New for each parameter we're going to add. The names must match the widget names defined in the notebook (a sketch of the receiving notebook code follows this list):
    • storage_account_name: ADFV2Book
    • storage_account_key: Your storage account access key
    • file_location: In our case, wasbs://<container>@<storage account>.blob.core.windows.net/
    • file_type: csv
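On the notebook side, each parameter arrives as a widget value. The following is a minimal sketch of how ADFV2Notebook could read them and load the file; the header option and the exact cell layout are assumptions, not a verbatim copy of the notebook built earlier:

```python
# Declare the widgets; the names must match the ADF parameter names exactly.
dbutils.widgets.text("storage_account_name", "")
dbutils.widgets.text("storage_account_key", "")
dbutils.widgets.text("file_location", "")
dbutils.widgets.text("file_type", "")

storage_account_name = dbutils.widgets.get("storage_account_name")
storage_account_key = dbutils.widgets.get("storage_account_key")
file_location = dbutils.widgets.get("file_location")
file_type = dbutils.widgets.get("file_type")

# Let Spark authenticate against the storage account, then read the file.
spark.conf.set(
    f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net",
    storage_account_key,
)
df = spark.read.format(file_type).option("header", "true").load(file_location)
display(df)
```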

We can now click on Debug to test the pipeline and verify that everything works. Be aware that this triggers a real execution in Azure Databricks. In the Azure Databricks workspace, click on the Clusters tab and you should see something like this:

Once the execution completes, we can see it in the list of Job Clusters with status Terminated, as shown in the following screenshot:

If we click on job-11-run-1, we can see the detailed run, as shown in the next screenshot:

Now, go back to the factory. Click on the refresh icon to refresh the status monitoring, as shown in the following screenshot:
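Besides the two UIs, the run status can also be polled programmatically through the Databricks Jobs REST API, reusing the domain and token from the linked service. A minimal sketch:

```python
import requests

DOMAIN = "https://<region>.azuredatabricks.net"
TOKEN = "<access-token>"

# List the most recent runs in the workspace.
response = requests.get(
    f"{DOMAIN}/api/2.0/jobs/runs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
response.raise_for_status()

for run in response.json().get("runs", []):
    state = run["state"]
    print(
        run["run_id"],
        run.get("run_name"),
        state.get("life_cycle_state"),  # e.g. RUNNING, TERMINATED
        state.get("result_state"),      # e.g. SUCCESS, FAILED
    )
```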

We can now attach the AzureDatabricks notebook activity to the main pipeline. The final pipeline for this chapter should look like this:
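The same wiring can be expressed in code with the azure-mgmt-datafactory SDK. Here is a sketch of just the notebook activity and its dependency; the upstream activity name CopyToBlob is illustrative, not the book's actual activity name:

```python
from azure.mgmt.datafactory.models import (
    ActivityDependency,
    DatabricksNotebookActivity,
    LinkedServiceReference,
)

# The notebook activity, set to run only after the upstream copy succeeds.
notebook_activity = DatabricksNotebookActivity(
    name="AzureDatabricks",
    notebook_path="/ADFCalls/ADFV2Notebook",
    base_parameters={
        "storage_account_name": "ADFV2Book",
        "storage_account_key": "<storage-account-key>",
        "file_location": "wasbs://<container>@<storage account>.blob.core.windows.net/",
        "file_type": "csv",
    },
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureDatabricks"
    ),
    depends_on=[
        ActivityDependency(activity="CopyToBlob", dependency_conditions=["Succeeded"])
    ],
)

# Append this activity to the main pipeline's activity list and redeploy it
# with pipelines.create_or_update(), as was done for the linked service.
```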

We can now delete the unnecessary pipelines used for development and publish all the objects in the factory. Our factory now executes an SSIS package that refreshes an on-premises data warehouse, then copies some data to the cloud, where it is further transformed with Spark on Azure Databricks.

ADF allows us to leverage hybrid ETL scenarios, both on-premises and in the cloud.
