The great benefit of Unity Catalog is that the data remains ours: it is stored in an open format on cloud storage, in our own container. To set up Unity Catalog, we need to create storage and give Databricks access to it, so that a metastore can be created through the admin console.
We will use Azure and Azure Data Lake Storage in this guide.
Storage account
We need to search for “Storage accounts” in the Azure portal.
On the Storage accounts page, we hit the “Create” button.
On the next page, the most important thing is to choose the same region as our Databricks workspace and, on the Advanced tab, to enable Data Lake Storage Gen2 (hierarchical namespace).
Once the storage account is created, we open it and create a container in which the metastore will be stored.
We need to remember the storage account and container names, as we will later use them in the metastore settings as <container_name>@<storage_account_name>.dfs.core.windows.net/ (container name first, then the storage account name). Copy and save this value, as we will use it later.
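Because the order of the two names is easy to mix up, the path can be assembled with a small helper. This is only a sketch: the function name and the example values "metastore" and "unitycatalogstorage" are made up for illustration, and the checks cover only the basic Azure naming rules (storage accounts: 3 to 24 lowercase letters and digits; containers: 3 to 63 lowercase letters, digits, and single hyphens).

```python
import re

# Basic Azure naming rules, used here only as a sanity check.
STORAGE_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
CONTAINER_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])+")


def adls_metastore_path(container_name: str, storage_account_name: str) -> str:
    """Build the ADLS Gen2 path for the metastore settings (hypothetical helper)."""
    if not STORAGE_ACCOUNT_RE.fullmatch(storage_account_name):
        raise ValueError(f"invalid storage account name: {storage_account_name!r}")
    if not (3 <= len(container_name) <= 63 and CONTAINER_RE.fullmatch(container_name)):
        raise ValueError(f"invalid container name: {container_name!r}")
    # Container comes first, then the storage account; the trailing slash
    # points at the root directory of the container.
    return f"{container_name}@{storage_account_name}.dfs.core.windows.net/"


# Example values are placeholders, not names from this guide.
print(adls_metastore_path("metastore", "unitycatalogstorage"))
# prints: metastore@unitycatalogstorage.dfs.core.windows.net/
```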
Access Connector for Azure Databricks
Now we need to give Databricks access to our storage. To do that, we search for “Access Connector for Azure Databricks” in the Azure portal.
Hit “Create” and remember again to use the same region.
After the creation is complete, we go to the newly created resource and copy the resource ID of the Access Connector. It is fairly long and has the format /subscriptions/<YOUR_SUBSCRIPTION_ID>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.Databricks/accessConnectors/<ACCESS_CONNECTOR_NAME>. Copy and save it, as we will use it later.
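Since the ID is long and easy to truncate when copying, it can be worth checking the saved value against the expected format. The sketch below is one possible way to do that; the function names and the example subscription, resource group, and connector names are hypothetical, and the case-insensitive matching is a convenience assumption.

```python
import re

# Pattern for the Access Connector resource ID format shown above.
ACCESS_CONNECTOR_ID_RE = re.compile(
    r"/subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Databricks/accessConnectors/(?P<name>[^/]+)",
    re.IGNORECASE,
)


def build_access_connector_id(subscription_id: str, resource_group: str, name: str) -> str:
    """Assemble the resource ID from its three variable parts (hypothetical helper)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Databricks/accessConnectors/{name}"
    )


def parse_access_connector_id(resource_id: str) -> dict:
    """Split a copied resource ID into its parts, or raise ValueError if it is malformed."""
    # strip() removes whitespace that often sneaks in when copying from the portal.
    match = ACCESS_CONNECTOR_ID_RE.fullmatch(resource_id.strip())
    if match is None:
        raise ValueError(f"not an Access Connector resource ID: {resource_id!r}")
    return match.groupdict()


# Placeholder values for illustration only.
connector_id = build_access_connector_id(
    "00000000-0000-0000-0000-000000000000", "rg-databricks", "unity-connector"
)
print(parse_access_connector_id(connector_id + " ")["name"])
# prints: unity-connector
```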
Grant access to the storage account
Now we need to go back to the storage account we created for Unity Catalog. In the left menu of the storage account, click “Access Control (IAM)” and then “+ Add”.
We select the role “Storage Blob Data Contributor”.
Next, we select the previously created Access Connector as the member. It is registered as a managed identity, so we choose it from the managed identities and hit “Select” and then “Review + Assign”.
Creating metastore
Now we can go back to Databricks. In the menu in the top-right corner, select “Manage Account”.
In the left menu, we select “Data” and choose “Create metastore”. Next, we specify the name and the region we are using. In the ADLS Gen 2 path field, we enter <container_name>@<storage_account_name>.dfs.core.windows.net/, using the container and storage account we created earlier. The trailing forward slash is essential, as it defines the root directory of the container as the metastore location.
The Access Connector ID is the value we copied earlier, in the format /subscriptions/<YOUR_SUBSCRIPTION_ID>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.Databricks/accessConnectors/<ACCESS_CONNECTOR_NAME>
In the next step, we select the Databricks workspace to assign to the metastore, and that’s all.
Tests
In Databricks, we can go to Data Explorer, where information about the created metastore is displayed. Inside the metastore, an example catalog “main” with the schema “default” has been created. To test the metastore, we can create a table using CREATE TABLE main.default.test (ID int);
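A slightly fuller check also writes to the table and reads it back, which exercises the storage permissions granted earlier. The following is a minimal sketch, assuming it runs in a Databricks notebook attached to a Unity Catalog-enabled cluster, where the `spark` session is predefined; the function name `run_smoke_test` is hypothetical.

```python
def run_smoke_test(spark, table="main.default.test"):
    """Create, fill, read, and drop a test table; return the values read back.

    `spark` is expected to be the SparkSession that Databricks notebooks provide.
    Any exception raised here suggests the metastore setup is incomplete.
    """
    spark.sql(f"CREATE TABLE IF NOT EXISTS {table} (ID int)")
    spark.sql(f"INSERT INTO {table} VALUES (1)")
    # collect() forces the query to run and brings the rows to the driver.
    rows = spark.sql(f"SELECT ID FROM {table}").collect()
    # Clean up so the test table does not linger in the catalog.
    spark.sql(f"DROP TABLE {table}")
    return [row[0] for row in rows]
```

In a notebook, calling `run_smoke_test(spark)` should return a list containing the inserted value if the metastore and storage access are configured correctly. Note that the function drops the table at the end, so it should not be pointed at a table that holds real data.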