---
title: "AzureCosmosR: Interface to Azure Cosmos DB"
author: Hong Ooi
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{AzureCosmosR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{utf8}
---

# AzureCosmosR

An interface to [Azure Cosmos DB](https://azure.microsoft.com/en-us/services/cosmos-db/), a NoSQL database service from Microsoft.

> Azure Cosmos DB is a fully managed NoSQL database for modern app development. Single-digit millisecond response times, and automatic and instant scalability, guarantee speed at any scale. Business continuity is assured with SLA-backed availability and enterprise-grade security. App development is faster and more productive thanks to turnkey multi region data distribution anywhere in the world, open source APIs and SDKs for popular languages. As a fully managed service, Azure Cosmos DB takes database administration off your hands with automatic management, updates and patching. It also handles capacity management with cost-effective serverless and automatic scaling options that respond to application needs to match capacity with demand.

On the Resource Manager side, AzureCosmosR extends the [AzureRMR](https://cran.r-project.org/package=AzureRMR) class framework to allow creating and managing Cosmos DB accounts. On the client side, it provides a comprehensive interface to the Cosmos DB SQL/core API as well as bridges to the MongoDB and table storage APIs.

## SQL interface

AzureCosmosR provides a suite of methods to work with databases, containers (tables) and documents (rows) using the SQL API.

```r
library(dplyr)
library(AzureCosmosR)

endp <- cosmos_endpoint("https://myaccount.documents.azure.com:443/", key="mykey")

list_cosmos_databases(endp)

db <- get_cosmos_database(endp, "mydatabase")

# create a new container and upload the Star Wars dataset from dplyr
cont <- create_cosmos_container(db, "mycontainer", partition_key="sex")
bulk_import(cont, starwars)

query_documents(cont, "select * from mycontainer")

# remove document metadata cruft
query_documents(cont, "select * from mycontainer", metadata=FALSE)

# an array select: all characters who appear in ANH
query_documents(cont,
    "select c.name
        from mycontainer c
        where array_contains(c.films, 'A New Hope')")
```

You can easily create and execute stored procedures and user-defined functions:

```r
proc <- create_stored_procedure(
    cont,
    "helloworld",
    'function () {
        var context = getContext();
        var response = context.getResponse();
        response.setBody("Hello, World");
    }'
)

exec_stored_procedure(proc)

create_udf(cont, "times2", "function(x) { return 2*x; }")

query_documents(cont, "select udf.times2(c.height) from cont c")
```

Aggregates take some extra work, as the Cosmos DB REST API only has limited support for cross-partition queries. Set `by_pkrange=TRUE` in the `query_documents` call, which will run the query on each partition key range (pkrange) and return a list of data frames. You can then process the list to obtain an overall result.

```r
# average height by sex, by pkrange
df_lst <- query_documents(cont,
    "select c.gender, count(1) n, avg(c.height) height
        from mycontainer c
        group by c.gender",
    by_pkrange=TRUE
)

# combine pkrange results
df_lst %>%
    bind_rows(.id="pkrange") %>%
    group_by(gender) %>%
    summarise(height=weighted.mean(height, n))
```

Full support for cross-partition queries, including aggregates, may come in a future version of AzureCosmosR.

## Other client interfaces

### MongoDB

You can query data in a MongoDB-enabled Cosmos DB instance using the mongolite package. AzureCosmosR provides a simple bridge to facilitate this.

```r
endp <- cosmos_mongo_endpoint("https://myaccount.mongo.cosmos.azure.com:443/", key="mykey")

# a mongolite::mongo object
conn <- cosmos_mongo_connection(endp, "mycollection", "mydatabase")
conn$find("{}")
```

For more information on working with MongoDB, see the [mongolite](https://jeroen.github.io/mongolite/) documentation.

### Table storage

You can work with data in a table storage-enabled Cosmos DB instance using the AzureTableStor package.

```r
endp <- AzureTableStor::table_endpoint("https://myaccount.table.cosmos.azure.com:443/", key="mykey")

tab <- AzureTableStor::storage_table(endp, "mytable")
AzureTableStor::list_table_entities(tab, filter="firstname eq 'Satya'")
```

### ODBC (SQL interface)

As an alternative to AzureCosmosR, you can also use the ODBC protocol to interface with the SQL API. By installing a suitable ODBC driver, you can then talk to Cosmos DB in a manner similar to other SQL databases. An advantage of the ODBC interface is that it fully supports cross-partition queries, unlike the REST API. A disadvantage is that it does not support nested document fields; functions like `array_contains()` cannot be used, and attempts to reference arrays and objects may return incorrect results.

```r
conn <- DBI::dbConnect(
    odbc::odbc(),
    driver="Microsoft Azure DocumentDB ODBC Driver",
    host="https://myaccount.documents.azure.com:443/",
    authenticationkey="mykey",
    RESTAPIversion="2018-12-31"  # for large partition key support
)

DBI::dbListTables(conn)

DBI::dbGetQuery(conn, "select * from mycontainer where gender = 'masculine'")
```

## Azure Resource Manager interface

On the ARM side, AzureCosmosR extends the AzureRMR class framework with a new `az_cosmosdb` class representing a Cosmos DB account resource, and methods for the `az_resource_group` resource group class.

```r
rg <- AzureRMR::get_azure_login()$
    get_subscription("sub_id")$
    get_resource_group("rgname")

rg$create_cosmosdb_account("mycosmosdb", interface="sql", free_tier=TRUE)
rg$list_cosmosdb_accounts()
cosmos <- rg$get_cosmosdb_account("mycosmosdb")

# access keys (passwords) for this account
cosmos$list_keys()

# get an endpoint object -- detects which API this account uses
endp <- cosmos$get_endpoint()

# API-specific endpoints
cosmos$get_sql_endpoint()
cosmos$get_mongo_endpoint()
cosmos$get_table_endpoint()
```

## Further information

- [An introduction to Azure Cosmos DB](https://www.sqlservercentral.com/articles/an-introduction-to-azure-cosmos-db)
- [Azure Cosmos DB documentation](https://docs.microsoft.com/en-us/azure/cosmos-db/)
- [REST API reference](https://docs.microsoft.com/en-us/rest/api/cosmos-db/)