---
title: "Connect to a Databricks Workspace"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Connect to a Databricks Workspace}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Defining Credentials

The `{brickster}` package connects to a Databricks workspace in two ways:

1.  [OAuth user-to-machine (U2M) authentication](https://docs.databricks.com/en/dev-tools/auth/oauth-u2m.html#oauth-user-to-machine-u2m-authentication)
2.  [Personal Access Tokens (PAT)](https://docs.databricks.com/en/dev-tools/auth/pat.html)

It's recommended to use option (1) when using `{brickster}` interactively. If you need to run code via an automated process, the only option currently is (2).

`{brickster}` will automatically detect when a session has [Posit Workbench managed Databricks OAuth credentials](https://docs.posit.co/ide/server-pro/integration/databricks.html) enabled. For more information about this authentication flow see the section [Posit Workbench Managed Databricks OAuth Credentials](#posit-workbench-managed-databricks-oauth-credentials).

Personal Access Tokens can be generated in a few steps. For a step-by-step breakdown, [refer to the documentation](https://docs.databricks.com/dev-tools/api/latest/authentication.html).

Once you have a token, you can store it alongside the workspace URL in an `.Renviron` file. The `.Renviron` file stores environment variables, including sensitive ones such as credentials, and decouples them from your code ([additional reading](https://CRAN.R-project.org/package=startup/vignettes/startup-intro.html)).

To get started add the following to your `.Renviron`:

-   `DATABRICKS_HOST`: The workspace URL

-   `DATABRICKS_TOKEN`: Personal access token (*not required if using OAuth U2M*)

-   `DATABRICKS_WSID`: The workspace ID ([docs](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids))

`DATABRICKS_WSID` is only required for the RStudio IDE integration with the connection pane.

Example of entries in `.Renviron`:

```         
DATABRICKS_HOST=xxxxxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapi123456789012345678a9bc01234defg5
DATABRICKS_WSID=123123123123123
```

**Note**: It's recommended to create an `.Renviron` for each project. You can also create an `.Renviron` in your user home directory if required.

Restarting your R session will allow those variables to be picked up by the `{brickster}` package.
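
A quick way to confirm the variables are available after the restart is to query them with base R. This is a minimal sketch: `nzchar()` is used so that only the presence of each value is reported and the token itself is never printed to the console.

```{r}
# names of the variables described above
vars <- c("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_WSID")

# TRUE where a variable is set to a non-empty value; values are not printed
is_set <- nzchar(Sys.getenv(vars))
names(is_set) <- vars
is_set
```

When using OAuth U2M, `DATABRICKS_TOKEN` is expected to be reported as `FALSE`, since no token is defined.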

## Using Credentials with `{brickster}`

Authentication should now be possible without specifying the credentials in your R code. You can load `{brickster}` and list the clusters within the workspace using `db_cluster_list()`. To access the host and token, use `db_host()` and `db_token()` respectively.

```{r setup}
library(brickster)

# using db_host() and db_token() to get credentials
clusters <- db_cluster_list(host = db_host(), token = db_token())
```

All `{brickster}` functions have host/token parameters that default to calling `db_host()`/`db_token()`, so the explicit calls can be omitted.

```{r}
# all host/token parameters default to db_host()/db_token()
clusters <- db_cluster_list()
```

When using OAuth U2M authentication, no token is defined in `.Renviron`, so `db_token()` will return `NULL`.

## Managing Multiple Credentials

There are two methods that `{brickster}` supports to simplify switching of credentials within an R project/session:

1.  Adding multiple credentials to `.Renviron`, where each additional set of credentials is differentiated by a suffix (e.g. `DATABRICKS_TOKEN_DEV`)
2.  Using a `.databrickscfg` file (primary method in [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html#set-up-authentication))

The `use_databrickscfg` option determines whether (1) or (2) is used. The following example shows how to switch the session to use `.databrickscfg`.

```{r}
# will use the `DEFAULT` profile in `.databrickscfg`
options(use_databrickscfg = TRUE)

# values returned should be those in profile of `.databrickscfg`
db_host()
db_token()
```

The default behaviour is to read credentials from `.Renviron`. If you wish to change this it's recommended to set the option within `.Rprofile` so that it's set during initialization of the R session.
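
As a sketch, a `.Rprofile` that opts in to `.databrickscfg` could contain a single line. The option name is the one shown above; whether this lives in a project-level or user-level `.Rprofile` is an assumption about your setup.

```{r}
# .Rprofile
# read credentials from `.databrickscfg` instead of `.Renviron`
options(use_databrickscfg = TRUE)
```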

### Switching Between Credentials

The `db_profile` option controls which profile's credentials are returned by `db_host()`/`db_token()`/`db_wsid()`.

Profiles enable you to switch contexts between:

-   Different workspaces (e.g. development or production)

-   Different permissions (e.g. admin or restricted user)

This behaviour works when using credentials specified in either `.Renviron` or `.databrickscfg`:

```{r}
# using .Renviron
db_host() # returns `DATABRICKS_HOST` (.Renviron)

# switch profile to 'prod'
options(db_profile = "prod")
db_host() # returns `DATABRICKS_HOST_PROD` (.Renviron)

# set back to default (NULL)
options(db_profile = NULL)
# use .databrickscfg
options(use_databrickscfg = TRUE)
db_host() # returns host from `DEFAULT` profile (.databrickscfg)

options(db_profile = "prod")
db_host() # returns host from `prod` profile (.databrickscfg)
```
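
When a profile is only needed for a few calls, it can help to restore the previous value afterwards. The helper below is a hypothetical convenience and not part of `{brickster}`. It relies only on base R, where `options()` returns the previous values so they can be reinstated with `on.exit()`.

```{r}
# hypothetical helper (not a {brickster} function): evaluate `code` with
# `db_profile` set to `profile`, then restore the previous value
with_db_profile <- function(profile, code) {
  old <- options(db_profile = profile)
  on.exit(options(old), add = TRUE)
  force(code)
}

# `db_profile` is "prod" only while the expression is evaluated
with_db_profile("prod", getOption("db_profile"))
```

With `{brickster}` loaded, `with_db_profile("prod", db_host())` would be expected to return the `prod` host, assuming `db_host()` reads the option at call time as described above.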

Profiles in `.Renviron` are expected to follow the same naming convention as the default variables, with the profile name appended as a suffix.

Here is an example of an `.Renviron` file that has three profiles (default, dev, prod):

```         
# default
DATABRICKS_HOST=xxxxxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID=123123123123123
# dev
DATABRICKS_HOST_DEV=xxxxxxx-dev.cloud.databricks.com
DATABRICKS_TOKEN_DEV=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID_DEV=123123123123124
# prod
DATABRICKS_HOST_PROD=xxxxxxx-prod.cloud.databricks.com
DATABRICKS_TOKEN_PROD=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID_PROD=123123123123125
```
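
The naming convention can be illustrated with a small base R sketch. `profile_env_name()` is a hypothetical helper written for this example and is not necessarily how `{brickster}` resolves variables internally. It only shows how a profile name maps to a variable name under the convention above, assuming the profile is upper-cased when used as a suffix.

```{r}
# hypothetical helper: build the `.Renviron` variable name for a profile
profile_env_name <- function(key, profile = getOption("db_profile")) {
  base <- paste0("DATABRICKS_", toupper(key))
  if (is.null(profile)) {
    return(base)
  }
  paste(base, toupper(profile), sep = "_")
}

profile_env_name("host", profile = NULL)   # "DATABRICKS_HOST"
profile_env_name("token", profile = "dev") # "DATABRICKS_TOKEN_DEV"
```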

### Configuring `.databrickscfg`

For details on configuring `.databrickscfg`, please refer to the [documentation from Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html#connection-profiles).

There is only one `{brickster}`-specific feature: the inclusion of `wsid` alongside `host`/`token`.

`wsid` is used by the connections pane integration in RStudio, as the underlying APIs require it.
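
To make the layout concrete, the sketch below writes a two-profile file to a temporary location and reads it back. The profile names and placeholder values mirror the `.Renviron` example and are assumptions for illustration only. A real file normally lives at `~/.databrickscfg`.

```{r}
# illustrative `.databrickscfg` contents with the extra `wsid` entry
cfg_lines <- c(
  "[DEFAULT]",
  "host = xxxxxxx.cloud.databricks.com",
  "token = dapixxxxxxxxxxxxxxxxxxxxxxxxx",
  "wsid = 123123123123123",
  "",
  "[prod]",
  "host = xxxxxxx-prod.cloud.databricks.com",
  "token = dapixxxxxxxxxxxxxxxxxxxxxxxxx",
  "wsid = 123123123123125"
)

# write to a scratch file rather than overwriting a real configuration
cfg_path <- tempfile(fileext = ".databrickscfg")
writeLines(cfg_lines, cfg_path)
cat(readLines(cfg_path), sep = "\n")
```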

### Posit Workbench Managed Databricks OAuth Credentials

Posit Workbench has a [managed Databricks OAuth credentials](https://docs.posit.co/ide/server-pro/integration/databricks.html) feature, which allows users to sign into a Databricks workspace from the home page of Workbench when launching a session and then access Databricks resources as their own identity. When in an RStudio Pro session running on Posit Workbench with managed Databricks OAuth credentials selected, `{brickster}` functions that rely on `db_host()`/`db_token()` should work without needing to specify any credentials in your R code. See the code below for an example.

```{r}
library(brickster)
db_cluster_list()
```

`{brickster}` will automatically detect when a session has Workbench managed OAuth credentials and then use the `workbench` profile defined in the `.databrickscfg` file at the location specified by `DATABRICKS_CONFIG_FILE`. Workbench generates this `.databrickscfg` file in a temporary directory, and it should not be modified directly.

To use an alternative `.databrickscfg` file, a different `profile`, an alternative `DATABRICKS_HOST` environment variable, or to set a `DATABRICKS_TOKEN` environment variable, launch an RStudio Pro session without the Databricks managed credentials box selected.
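
To see which configuration file a session points at, the `DATABRICKS_CONFIG_FILE` environment variable can be inspected with base R. This is only a diagnostic sketch; an empty string means the variable is not set in the current session.

```{r}
# path of the `.databrickscfg` the session points at, or "" when not set
cfg_file <- Sys.getenv("DATABRICKS_CONFIG_FILE")
if (nzchar(cfg_file)) {
  message("Databricks config file: ", cfg_file)
} else {
  message("DATABRICKS_CONFIG_FILE is not set for this session")
}
```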