In this vignette, we explore how the OmopSketch functions
databaseCharacteristics() and
shinyCharacteristics() can serve as valuable tools for
characterising databases containing electronic health records mapped to
the OMOP Common Data Model.
We begin by loading the necessary packages and creating a mock CDM
using the mockOmopSketch() function:
library(dplyr)
library(OmopSketch)
cdm <- mockOmopSketch()
cdm
#> 
#> ── # OMOP CDM reference (duckdb) of mockOmopSketch ─────────────────────────────
#> • omop tables: person, observation_period, cdm_source, concept, vocabulary,
#> concept_relationship, concept_synonym, concept_ancestor, drug_strength,
#> condition_occurrence, death, drug_exposure, measurement, observation,
#> procedure_occurrence, visit_occurrence, device_exposure
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
The databaseCharacteristics() function provides a
comprehensive summary of the CDM, returning a summarised
result that includes:
A general database snapshot, using summariseOmopSnapshot()
A characterisation of the population in observation, built using the CohortConstructor and CohortCharacteristics packages
A summary of the observation period table using summariseObservationPeriod() and summariseInObservation()
A data quality assessment of the clinical tables using summariseMissingData()
A characterisation of the clinical tables with summariseClinicalRecords() and summariseRecordCount()
result <- databaseCharacteristics(cdm)
#> The characterisation will focus on the following OMOP tables: person,
#> observation_period, visit_occurrence, condition_occurrence, drug_exposure,
#> procedure_occurrence, device_exposure, measurement, observation, and death
#> → Getting cdm snapshot
#> Warning: Vocabulary version in cdm_source (NA) doesn't match the one in the vocabulary
#> table (v5.0 18-JAN-19)
#> → Getting population characteristics
#> ℹ Building new trimmed cohort
#> Creating initial cohort
#> ✔ Cohort trimmed
#> ℹ adding demographics columns
#> 
#> ℹ summarising data
#> 
#> ℹ summarising cohort general_population
#> 
#> ✔ summariseCharacteristics finished!
#> 
#> → Summarising missing data
#> Warning: These columns contain missing values, which are not permitted:
#> "race_concept_id" and "ethnicity_concept_id"
#> Warning: These columns contain missing values, which are not permitted:
#> "period_type_concept_id"
#> Warning: device_exposureomop table is empty.
#> ! 56 duplicated rows eliminated.
#> → Summarising table quality
#> Warning: device_exposureomop table is empty.
#> → Summarising clinical records
#> ℹ Adding variables of interest to observation_period.
#> ℹ Summarising records per person in observation_period.
#> ℹ Summarising observation_period: `in_observation` and `type_concept`.
#> ℹ Adding variables of interest to visit_occurrence.
#> ℹ Summarising records per person in visit_occurrence.
#> ℹ Summarising visit_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.
#> ℹ Summarising drug_exposure: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to procedure_occurrence.
#> ℹ Summarising records per person in procedure_occurrence.
#> ℹ Summarising procedure_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> Warning: device_exposure is empty.
#> ℹ Adding variables of interest to measurement.
#> ℹ Summarising records per person in measurement.
#> ℹ Summarising measurement: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to observation.
#> ℹ Summarising records per person in observation.
#> ℹ Summarising observation: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to death.
#> ℹ Summarising records per person in death.
#> ℹ Summarising death: `in_observation`, `standard_concept`, `source_vocabulary`,
#>   `domain_id`, and `type_concept`.
#> → Summarising record counts
#> Warning: device_exposure omop table is empty after application of date range.
#> → Summarising in observation records, subjects, person-days, age and sex
#> ℹ The following estimates will be computed:
#> • age: median
#> → Start summary of data, at 2025-06-18 20:18:17.374429
#> 
#> ✔ Summary finished, at 2025-06-18 20:18:17.425096
#> → Summarising observation period
#> ☺ Database characterisation finished. Code ran in 0 min and 16 sec
#> ℹ 1 table created: "og_075_1750274284".
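The components bundled by databaseCharacteristics() can also be computed one at a time. The sketch below assumes the mock CDM created above and the default arguments of each function. It computes the snapshot and the missing-data summary separately and combines them with omopgenerics::bind(). The object names snapshot, missingData and combined are illustrative only.

```r
library(OmopSketch)

# Mock CDM, as created at the start of this vignette
cdm <- mockOmopSketch()

# General database snapshot (one component of databaseCharacteristics())
snapshot <- summariseOmopSnapshot(cdm)

# Missing-data summary restricted to a single clinical table
missingData <- summariseMissingData(cdm, omopTableName = "condition_occurrence")

# Combine both pieces into a single summarised result
combined <- omopgenerics::bind(snapshot, missingData)

# Inspect which components the combined result contains
omopgenerics::settings(combined) |>
  dplyr::select("result_id", "result_type", "package_name")
```

Running the components separately may be useful when only one part of the characterisation is needed, since the full databaseCharacteristics() call takes longer.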
omopgenerics::settings(result) |> dplyr::select("result_id", "result_type", "package_name")
#> # A tibble: 8 × 3
#>   result_id result_type                  package_name         
#>       <int> <chr>                        <chr>                
#> 1         1 summarise_omop_snapshot      OmopSketch           
#> 2         2 summarise_characteristics    CohortCharacteristics
#> 3         3 summarise_missing_data       OmopSketch           
#> 4         4 summarise_table_quality      OmopSketch           
#> 5         5 summarise_clinical_records   OmopSketch           
#> 6         6 summarise_record_count       OmopSketch           
#> 7         7 summarise_in_observation     OmopSketch           
#> 8         8 summarise_observation_period OmopSketch
By default, the following OMOP tables are included in the characterisation: person, observation_period, visit_occurrence, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, death.
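Each row of the settings table above identifies one component of the result. As a minimal sketch, assuming omopgenerics::filterSettings() is available in the installed omopgenerics version and that the gt package is installed for rendering, a single component can be extracted by its result_type and displayed on its own. The object name snapshot is illustrative only.

```r
library(OmopSketch)

# Mock CDM and full characterisation, as in the steps above
cdm <- mockOmopSketch()
result <- databaseCharacteristics(cdm)

# Keep only the rows that belong to the database snapshot component
snapshot <- omopgenerics::filterSettings(
  result,
  result_type == "summarise_omop_snapshot"
)

# Render the snapshot as a formatted table
tableOmopSnapshot(snapshot)
```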
You can customise which tables to include in the analysis by
specifying them with the omopTableName argument.
result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"))
To stratify the characterisation results by sex, set the
sex argument to TRUE:
result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"),
                                  sex = TRUE)
To stratify the characterisation by age group, pass a list defining the age groups to the ageGroup argument:
result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"),
                                  ageGroup = list(c(0, 50), c(51, 100)))
Use the dateRange argument to limit the analysis to a
specific period. Combine it with the interval argument to
stratify results by time. Valid values for interval include “overall”
(default), “years”, “quarters”, and “months”:
result <- databaseCharacteristics(cdm,
                                  interval = "years",
                                  dateRange = as.Date(c("2010-01-01", "2018-12-31")))
To include concept counts in the characterisation, set
conceptIdCounts = TRUE:
result <- databaseCharacteristics(cdm,
                                  conceptIdCounts = TRUE)
To explore the characterisation results interactively, you can use
the shinyCharacteristics() function. This function
generates a Shiny application in the specified directory,
allowing you to browse, filter, and visualise the results through an
intuitive user interface.
shinyCharacteristics(result = result, directory = "path/to/your/shiny")
You can customise the title, logo, and theme of the Shiny app by setting the appropriate arguments:
title: The title displayed at the top of the app
logo: Path to a custom logo (must be in SVG format)
theme: A custom Bootstrap theme (e.g., using bslib::bs_theme())
shinyCharacteristics(result = result, directory = "path/to/my/shiny",
                     title = "Characterisation of my data",
                     logo = "path/to/my/logo.svg",
                     theme = "bslib::bs_theme(bootswatch = 'flatly')")
An example of the Shiny application generated by
shinyCharacteristics() can be explored here,
where the characterisation of several synthetic datasets is
available.
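The arguments described in this vignette can be combined in a single call. The following sketch assumes the mock CDM and writes the Shiny app to a temporary directory. The variable names tables and appDir are illustrative, and the strata, interval and date range are arbitrary choices rather than recommended settings.

```r
library(OmopSketch)

# Mock CDM, as created at the start of this vignette
cdm <- mockOmopSketch()

# Characterise two tables, stratified by sex, age group and calendar year
tables <- c("drug_exposure", "condition_occurrence")
result <- databaseCharacteristics(
  cdm,
  omopTableName = tables,
  sex = TRUE,
  ageGroup = list(c(0, 50), c(51, 100)),
  interval = "years",
  dateRange = as.Date(c("2010-01-01", "2018-12-31"))
)

# Generate the Shiny app in a temporary directory (assumed location)
appDir <- tempdir()
shinyCharacteristics(result = result, directory = appDir)
```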