---
title: "The cdm reference"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{a01_cdm_reference}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

A cdm reference is a single R object that represents OMOP CDM data. The tables in the cdm reference may be in a database, but a cdm reference may also contain OMOP CDM tables that are in dataframes or tibbles, or in arrow. In the latter cases the cdm reference would typically be a subset of an original cdm reference that has been derived as part of a particular analysis.

omopgenerics provides a general class definition a cdm reference and a dataframe/ tibble implementation. For creating a cdm reference using a database, see the CDMConnector package (<https://darwin-eu.github.io/CDMConnector/>).

A cdm reference is a list of tables. These tables come in three types: standard OMOP CDM tables, cohort tables, and other auxiliary tables.

### 1) Standard OMOP CDM tables

There are multiple versions of the OMOP CDM. The list of tables included in version 5.3 are as follows.

```{r}
library(omopgenerics)
omopTables()
```

The standard OMOP tables have required fields. We can check the required column of the person table, for example, like so 
```{r}
omopColumns(table = "person", version = "5.3")
```

```{r}
omopColumns(table = "observation_period", version = "5.3")
```


### 2) Cohort tables

Studies using the OMOP CDM often create study-specific cohort tables. We also consider these as part of the cdm reference once created. Each cohort table is associated with a specific class of its own, a `generatedCohortSet`, which is described more in a subsequent vignette. As with the standard OMOP CDM tables, cohort tables are expected to contain a specific set of fields (with no restriction placed on whether they include additional fields or not).

```{r}
cohortColumns(table = "cohort", version = "5.3")
cohortColumns(table = "cohort_set", version = "5.3")
cohortColumns(table = "cohort_attrition", version = "5.3")
```

### 3) Achilles result tables

The Achilles R package provides descriptive statistics on an OMOP CDM database. The results from Achilles are stored in tables in the database. The following tables are created with the given columns.

```{r}
achillesTables()
achillesColumns("achilles_analysis")
achillesColumns("achilles_results")
achillesColumns("achilles_results_dist")
```

### 4) Other tables

Beyond the standard OMOP CDM tables and cohort tables, additional tables can be added to the cdm reference. These tables could, for example, be OMOP extension/ expansion tables or extra tables containing data required to perform a study but not normally included as part of the OMOP CDM. These tables could contain any set of fields.

## General rules for a cdm reference

Any table to be part of a cdm object has to fulfill the following conditions:

-   All tables must share a common source (that is, a mix of tables in the database and in-memory is not permitted).

-   The name of the tables must be lower snake_case.

-   The name of the column names of each table must be lower snake_case.

-   The `person` and `observation_period` tables must be present.

-   The cdm reference must have an attribute "cdmName" that gives the name associated with the data contained there within.

## Export metadata about the cdm reference

When the export method is applied to a cdm reference, metadata about that cdm will be written to a csv. The csv contains the following columns

| Variable                               | Description                                                                                             | Datatype  | Required |
|:----------------|:-----------------|:----------------|:--------------------|
| result_type                            | Always "Snapshot". Identifies this result as a summary of a cdm reference.                              | Character | Yes      |
| cdm_name                               | The name of the data source.                                                                            | Character | Yes      |
| cdm_source_name                        | Value of cdm source name taken from the cdm source table (if present in the cdm reference).             | Character | No       |
| cdm_description                        | Value of cdm description taken from the cdm source table (if present in the cdm reference).             | Character | No       |
| cdm_documentation_reference            | Value of cdm documentation reference taken from the cdm source table (if present in the cdm reference). | Character | No       |
| cdm_version                            | The cdm version associated with the cdm reference.                                                      | Character | Yes      |
| cdm_holder                             | Value of cdm holder reference taken from the cdm source table (if present in the cdm reference).        | Character | No       |
| cdm_release_date                       | Value of cdm release date taken from the cdm source table (if present in the cdm reference).            | Date      | No       |
| vocabulary_version                     | Version of the vocabulary being used taken from the concept table (if present in the cdm reference).    | Character | No       |
| person_count                           | Number of records in the person table.                                                                  | Integer   | Yes      |
| observation_period_count               | Number of records in the observation period table.                                                      | Integer   | Yes      |
| earliest_observation_period_start_date | Earliest date in the observation period start date field from the observation period table.             | Date      | Yes      |
| latest_observation_period_end_date     | Latest date in the observation period start date field from the observation period table.               | Date      | Yes      |
| snapshot_date                          | Date at which this snapshot was created.                                                                | Date      | Yes      |