Use Databricks profiles to emulate service principals

TL;DR: Edit your ~/.databrickscfg file to create a profile for each service principal, then select one with the --profile flag.

Problem: The feedback loop when developing a CI/CD pipeline is too slow

I have a CI/CD pipeline that interacts with a Databricks workspace through the Databricks CLI. I usually develop the pipeline locally, testing it against a sandbox Databricks workspace, authenticated as myself.

But when I deploy the pipeline to the CI/CD environment, it runs as a service principal, first against a dev workspace, then against a prod workspace.

Some issues only appear when running as a service principal, such as permission errors or differences in workspace configuration. And the feedback loop is too slow: I have to commit, push, wait for the pipeline to run, check the logs, and repeat.

I want to test the pipeline locally, authenticated as a service principal, to catch these issues earlier.

Solution: Use Databricks profiles to emulate service principals

Reading about the one million ways to authenticate to an Azure Databricks workspace is enough to give me a headache (seriously, there are too many options). I have previously authenticated as a service principal with environment variables, keeping the various secrets in an .env file and commenting and uncommenting lines as needed. It is a mess, and I'm guaranteed to forget to switch back to my user account at some point.

Instead, I can use Databricks profiles to store the different authentication configurations. In ~/.databrickscfg, I can create a profile for each service principal and switch between them with the --profile flag.

Here is an example of a ~/.databrickscfg file with two service principal profiles:

[DEFAULT]
host  = <SOME_HOST>
token = <SOME_TOKEN>

[project-prod-sp]
host                = <WORKSPACE_URL>
azure_client_id     = <CLIENT_ID>
azure_client_secret = <CLIENT_SECRET>
azure_tenant_id     = <TENANT_ID>

[project-dev-sp]
<same setup as above>

Of course, you should replace the placeholders with the actual values.
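
Since it is easy to leave one of those values blank, here is a minimal Python sketch that reports which service principal profiles still have empty or missing keys. The function name `missing_keys` and the `SP_KEYS` tuple are my own, hypothetical names; the key list is simply the four keys from the example above. The sketch treats [DEFAULT] as an ordinary profile, which is a choice made for this check and not a statement about how the Databricks CLI reads the file.

```python
import configparser
from typing import Dict, List

# Keys used by the service principal profiles in the example config above.
SP_KEYS = ("host", "azure_client_id", "azure_client_secret", "azure_tenant_id")


def missing_keys(config_text: str) -> Dict[str, List[str]]:
    """Map each service principal profile to the keys it leaves empty or unset."""
    # Use a dummy default section so [DEFAULT] is parsed as a normal profile
    # and its values are not inherited by the other sections.
    parser = configparser.ConfigParser(
        default_section="__unused__", interpolation=None
    )
    parser.read_string(config_text)

    report: Dict[str, List[str]] = {}
    for name in parser.sections():
        section = parser[name]
        # Skip profiles without any azure_* key, e.g. a host + token profile.
        if not any(key.startswith("azure_") for key in section):
            continue
        missing = [key for key in SP_KEYS if not section.get(key, "").strip()]
        if missing:
            report[name] = missing
    return report
```

To check the real file, pass in its contents, for example `missing_keys(pathlib.Path.home().joinpath(".databrickscfg").read_text())`. Note that a literal line like `<same setup as above>` is not valid INI, so the file has to be fully written out before this will parse.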

To check which workspace and user a profile resolves to, you can run the following command:

databricks auth describe --profile project-prod-sp

This will also show you where the authentication is coming from (because, as I mentioned above, there are too many ways to authenticate).
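
With more than one profile, the same check can be scripted. Below is a rough Python sketch that runs `databricks auth describe` for each profile and records whether the command succeeded. The helper names `describe_command` and `check_profiles` are hypothetical, and I am assuming the CLI exits with a non-zero status when it cannot authenticate; I have not verified that, so treat the boolean as a hint and read the command output when in doubt.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def describe_command(profile: str) -> List[str]:
    """Build the argv for `databricks auth describe --profile <profile>`."""
    return ["databricks", "auth", "describe", "--profile", profile]


def check_profiles(
    profiles: Iterable[str], run: Callable = subprocess.run
) -> Dict[str, bool]:
    """Run `auth describe` per profile; True means the command exited with 0.

    Assumption: a failed authentication produces a non-zero exit code.
    The `run` parameter exists so the CLI call can be swapped out in tests.
    """
    results: Dict[str, bool] = {}
    for profile in profiles:
        completed = run(describe_command(profile), capture_output=True, text=True)
        results[profile] = completed.returncode == 0
    return results
```

Calling `check_profiles(["project-dev-sp", "project-prod-sp"])` before a local run gives a quick overview of which profiles are usable.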

Finally, you can run your pipeline locally, using the --profile flag to specify that you want to use the service principal profile:

databricks bundle deploy --profile project-dev-sp

Alternative to using the --profile flag

If you still want to use environment variables, you can set the DATABRICKS_CONFIG_PROFILE variable to the profile name you want to use, e.g.:

DATABRICKS_CONFIG_PROFILE=DEFAULT
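
If the pipeline steps are driven from a script, one way to avoid forgetting to switch back is to set the variable only for the child process. Here is a minimal Python sketch of that idea; `env_for_profile` and `run_cli` are hypothetical helper names, and the only Databricks-specific detail is the DATABRICKS_CONFIG_PROFILE variable mentioned above.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def env_for_profile(
    profile: str, base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with DATABRICKS_CONFIG_PROFILE set."""
    env = dict(os.environ if base is None else base)
    env["DATABRICKS_CONFIG_PROFILE"] = profile
    return env


def run_cli(profile: str, args: List[str]) -> subprocess.CompletedProcess:
    """Run `databricks <args>` with the profile set for this process only."""
    return subprocess.run(
        ["databricks", *args], env=env_for_profile(profile), check=True
    )
```

For example, `run_cli("project-dev-sp", ["bundle", "deploy"])` deploys with the dev service principal profile, while the shell I'm working in keeps whatever profile it had before.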