2 Introduction

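In this vignette, we explore the OmopSketch functions that provide an overview of the clinical tables within a CDM object (observation_period, visit_occurrence, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, and death). The functions covered here are summariseClinicalRecords() and tableClinicalRecords(), which summarise a clinical table and display the result, and summariseRecordCount(), plotRecordCount() and tableRecordCount(), which summarise, plot and tabulate record counts over time.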

2.1 Create a mock cdm

Let’s see an example of their functionality. To start, we load the essential packages and create a mock cdm using the mockOmopSketch() function.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(OmopSketch)

# Connect to mock database
cdm <- mockOmopSketch()

3 Summarise clinical tables

Let’s now use summariseClinicalRecords() from the OmopSketch package to get an overview of one of the clinical tables of the cdm (here, condition_occurrence).

summarisedResult <- summariseClinicalRecords(cdm, "condition_occurrence")
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |> print()
#> # A tibble: 20 × 13
#>    result_id cdm_name       group_name group_level      strata_name strata_level
#>        <int> <chr>          <chr>      <chr>            <chr>       <chr>       
#>  1         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  2         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  3         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  4         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  5         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  6         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  7         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  8         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  9         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 10         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 11         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 12         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 13         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 14         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 15         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 16         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 17         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 18         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 19         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 20         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>

Notice that the output is in the summarised result format.
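
The summarised result format is a long table: each row holds one estimate, and estimate_value is stored as character (see the `<chr>` column type in the output above). The following is a minimal sketch in base R of how such values can be converted back to numbers and looked up. The rows are invented for illustration, and the estimate_type values shown ("integer", "numeric") are an assumption about what the format typically contains.

```r
# Hypothetical rows mimicking four columns of a summarised result.
# The values are invented for illustration only.
toyResult <- data.frame(
  variable_name = c("Number records", "records_per_person", "records_per_person"),
  estimate_name = c("count", "mean", "sd"),
  estimate_type = c("integer", "numeric", "numeric"),
  estimate_value = c("12", "3.5", "1.2"),
  stringsAsFactors = FALSE
)

# estimate_value is character, so convert it before doing arithmetic
toyResult$estimate_numeric <- as.numeric(toyResult$estimate_value)

# Pull out a single estimate by variable name and estimate name
meanRecords <- toyResult$estimate_numeric[
  toyResult$variable_name == "records_per_person" &
    toyResult$estimate_name == "mean"
]
meanRecords
```

The same pattern applies to a real result: filter on variable_name and estimate_name first, then convert estimate_value.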

We can use the function's arguments to specify which statistics to compute. For example, use the recordsPerPerson argument to indicate which estimates of the number of records per person you are interested in.

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95")
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  filter(variable_name == "records_per_person") |>
  select(variable_name, estimate_name, estimate_value)
#> # A tibble: 4 × 3
#>   variable_name      estimate_name estimate_value
#>   <chr>              <chr>         <chr>         
#> 1 records_per_person mean          6             
#> 2 records_per_person q05           3             
#> 3 records_per_person q95           9             
#> 4 records_per_person sd            2.2428
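
To make these four estimates concrete, the sketch below computes the same kind of statistics in base R from a small invented table of records. It is only an illustration of what "records per person" means. It is not OmopSketch's implementation, and the quantile method used by the package or the database backend may differ from R's default.

```r
# Hypothetical record table: one row per record, person_id identifies the owner
records <- data.frame(person_id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4))

# Number of records for each person
counts <- as.integer(table(records$person_id))

# The estimates requested through recordsPerPerson in the example above
estimates <- c(
  mean = mean(counts),
  sd   = sd(counts),
  q05  = unname(quantile(counts, 0.05)),
  q95  = unname(quantile(counts, 0.95))
)
estimates
```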

You can further specify whether to include the number of records in observation (inObservation = TRUE), the number of concepts mapped (standardConcept = TRUE), the source vocabularies that the table contains (sourceVocabulary = TRUE), the domains of the concepts (domainId = TRUE), and the concept types (typeConcept = TRUE).

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  select(variable_name, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 17
#> Columns: 3
#> $ variable_name  <chr> "Number subjects", "Number subjects", "Number records",…
#> $ estimate_name  <chr> "count", "percentage", "count", "mean", "q05", "q95", "…
#> $ estimate_value <chr> "100", "100", "600", "6", "3", "9", "2.2428", "600", "1…

You can also stratify the previous results by sex and age group:

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE,
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  select(variable_name, strata_level, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 153
#> Columns: 4
#> $ variable_name  <chr> "Number subjects", "Number subjects", "Number records",…
#> $ strata_level   <chr> "overall", "overall", "overall", "overall", "overall", …
#> $ estimate_name  <chr> "count", "percentage", "count", "mean", "q05", "q95", "…
#> $ estimate_value <chr> "100", "100", "600", "6", "3", "9.0500", "2.2428", "600…

Notice that, by default, the “overall” group is also included, as well as the crossed strata (for example, sex == “Female” and ageGroup == “>=35”).
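
The sketch below illustrates, in base R, how an ageGroup list like the one above maps ages to labels and how many strata result when sex and age group are combined. The helper assignAgeGroup() and the toy people table are hypothetical and only for illustration. Treating both bounds as inclusive is an assumption.

```r
ageGroup <- list("<35" = c(0, 34), ">=35" = c(35, Inf))

# Hypothetical helper: label each age with the group whose bounds contain it
# (both bounds treated as inclusive, which is an assumption)
assignAgeGroup <- function(age, groups) {
  labels <- rep(NA_character_, length(age))
  for (nm in names(groups)) {
    inside <- age >= groups[[nm]][1] & age <= groups[[nm]][2]
    labels[inside] <- nm
  }
  labels
}

# Invented people, for illustration only
people <- data.frame(
  sex = c("Female", "Male", "Female", "Male"),
  age = c(20, 34, 35, 70)
)
people$age_group <- assignAgeGroup(people$age, ageGroup)

# Strata: overall, each sex, each age group, and each sex by age group combination
sexLevels <- c("Female", "Male")
crossed <- expand.grid(
  sex = sexLevels, age_group = names(ageGroup), stringsAsFactors = FALSE
)
nStrata <- 1 + length(sexLevels) + length(ageGroup) + nrow(crossed)
nStrata
```

With two sexes and two age groups this gives nine strata, which is consistent with the stratified result above having 153 rows, nine times the 17 rows of the unstratified one.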

The analysis can also be run on multiple OMOP tables at the same time:

summarisedResult <- summariseClinicalRecords(cdm,
  c("observation_period", "drug_exposure"),
  recordsPerPerson = c("mean", "sd"),
  inObservation = FALSE,
  standardConcept = FALSE,
  sourceVocabulary = FALSE,
  domainId = FALSE,
  typeConcept = FALSE
)
#> ℹ Adding variables of interest to observation_period.
#> ℹ Summarising records per person in observation_period.
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.

summarisedResult |>
  select(group_level, variable_name, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 10
#> Columns: 4
#> $ group_level    <chr> "observation_period", "observation_period", "observatio…
#> $ variable_name  <chr> "Number subjects", "Number subjects", "Number records",…
#> $ estimate_name  <chr> "count", "percentage", "count", "mean", "sd", "count", …
#> $ estimate_value <chr> "100", "100", "100", "1", "0", "100", "100", "3100", "3…

We can also filter the clinical table to a specific time window by setting the dateRange argument.

summarisedResult <- summariseClinicalRecords(cdm, "drug_exposure",
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.
#> ℹ Summarising drug_exposure: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  omopgenerics::settings() |>
  glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id          <int> 1
#> $ result_type        <chr> "summarise_clinical_records"
#> $ package_name       <chr> "OmopSketch"
#> $ package_version    <chr> "0.4.0"
#> $ group              <chr> "omop_table"
#> $ strata             <chr> ""
#> $ additional         <chr> ""
#> $ min_cell_count     <chr> "0"
#> $ study_period_end   <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"

3.1 Tidy the summarised object

tableClinicalRecords() will help you to tidy the previous results and create a gt table.

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE,
  sex = TRUE
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  tableClinicalRecords()
Variable name       Variable level           Estimate name  Database name: mockOmopSketch
condition_occurrence; overall
Domain              Condition                N (%)          600 (100.00%)
In observation      Yes                      N (%)          600 (100.00%)
Number records      -                        N              600.00
Number subjects     -                        N (%)          100 (100.00%)
Records per person  -                        Mean (SD)      6.00 (2.24)
                                             q05            3.00
                                             q95            9.05
Source vocabulary   No matching concept      N (%)          600 (100.00%)
Standard concept    S                        N (%)          600 (100.00%)
Type concept id     Unknown type concept: 1  N (%)          600 (100.00%)
condition_occurrence; Female
Domain              Condition                N (%)          293 (100.00%)
In observation      Yes                      N (%)          293 (100.00%)
Number records      -                        N              293.00
Number subjects     -                        N (%)          50 (100.00%)
Records per person  -                        Mean (SD)      5.86 (2.30)
                                             q05            3.00
                                             q95            9.00
Source vocabulary   No matching concept      N (%)          293 (100.00%)
Standard concept    S                        N (%)          293 (100.00%)
Type concept id     Unknown type concept: 1  N (%)          293 (100.00%)
condition_occurrence; Male
Domain              Condition                N (%)          307 (100.00%)
In observation      Yes                      N (%)          307 (100.00%)
Number records      -                        N              307.00
Number subjects     -                        N (%)          50 (100.00%)
Records per person  -                        Mean (SD)      6.14 (2.19)
                                             q05            3.45
                                             q95            10.55
Source vocabulary   No matching concept      N (%)          307 (100.00%)
Standard concept    S                        N (%)          307 (100.00%)
Type concept id     Unknown type concept: 1  N (%)          307 (100.00%)

4 Summarise record counts

OmopSketch can also help you summarise the trend in the records of an OMOP table. In the examples below, we use summariseRecordCount() to count the number of records within each year, and plotRecordCount() to create a ggplot of the trend. We can also use tableRecordCount() to display the results in a table of type gt, reactable or datatable. By default it creates a gt table.

summarisedResult <- summariseRecordCount(cdm, "drug_exposure", interval = "years")

summarisedResult |> tableRecordCount(type = "gt")
               Time interval             mockOmopSketch
Number records
drug_exposure  1962-01-01 to 1962-12-31  1
               1963-01-01 to 1963-12-31  2
               1964-01-01 to 1964-12-31  2
               1965-01-01 to 1965-12-31  2
               1966-01-01 to 1966-12-31  1
               1967-01-01 to 1967-12-31  18
               1968-01-01 to 1968-12-31  24
               1969-01-01 to 1969-12-31  18
               1970-01-01 to 1970-12-31  26
               1971-01-01 to 1971-12-31  12
               1972-01-01 to 1972-12-31  20
               1973-01-01 to 1973-12-31  18
               1974-01-01 to 1974-12-31  18
               1975-01-01 to 1975-12-31  14
               1976-01-01 to 1976-12-31  9
               1977-01-01 to 1977-12-31  8
               1978-01-01 to 1978-12-31  24
               1979-01-01 to 1979-12-31  33
               1980-01-01 to 1980-12-31  20
               1981-01-01 to 1981-12-31  28
               1982-01-01 to 1982-12-31  32
               1983-01-01 to 1983-12-31  24
               1984-01-01 to 1984-12-31  33
               1985-01-01 to 1985-12-31  58
               1986-01-01 to 1986-12-31  40
               1987-01-01 to 1987-12-31  52
               1988-01-01 to 1988-12-31  50
               1989-01-01 to 1989-12-31  50
               1990-01-01 to 1990-12-31  46
               1991-01-01 to 1991-12-31  77
               1992-01-01 to 1992-12-31  65
               1993-01-01 to 1993-12-31  113
               1994-01-01 to 1994-12-31  51
               1995-01-01 to 1995-12-31  93
               1996-01-01 to 1996-12-31  120
               1997-01-01 to 1997-12-31  130
               1998-01-01 to 1998-12-31  96
               1999-01-01 to 1999-12-31  116
               2000-01-01 to 2000-12-31  105
               2001-01-01 to 2001-12-31  69
               2002-01-01 to 2002-12-31  132
               2003-01-01 to 2003-12-31  117
               2004-01-01 to 2004-12-31  103
               2005-01-01 to 2005-12-31  59
               2006-01-01 to 2006-12-31  60
               2007-01-01 to 2007-12-31  96
               2008-01-01 to 2008-12-31  88
               2009-01-01 to 2009-12-31  69
               2010-01-01 to 2010-12-31  63
               2011-01-01 to 2011-12-31  98
               2012-01-01 to 2012-12-31  107
               2013-01-01 to 2013-12-31  43
               2014-01-01 to 2014-12-31  39
               2015-01-01 to 2015-12-31  47
               2016-01-01 to 2016-12-31  57
               2017-01-01 to 2017-12-31  80
               2018-01-01 to 2018-12-31  66
               2019-01-01 to 2019-12-31  58
               overall                   3100

Note that you can adjust the time interval using the interval argument, which can be set to “years”, “months” or “quarters”. The example below shows the number of records per quarter:

summariseRecordCount(cdm, "drug_exposure", interval = "quarters") |>
  plotRecordCount()
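
To see what the interval argument changes, the sketch below bins a few invented dates by year, quarter and month in base R. It only illustrates the binning idea and is not OmopSketch's implementation.

```r
# Invented record dates, for illustration only
dates <- as.Date(c("1990-01-15", "1990-02-20", "1990-05-03", "1991-11-30"))

# Count records per year, per quarter and per month
byYear    <- table(format(dates, "%Y"))
byQuarter <- table(paste(format(dates, "%Y"), quarters(dates)))
byMonth   <- table(format(dates, "%Y-%m"))

byYear
byQuarter
byMonth
```

Finer intervals produce more, smaller bins: here the same four records fall into two years, three quarters and four months.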

We can further stratify our counts by sex (setting the argument sex = TRUE) or by age (providing an age group). Notice that in both cases, the function automatically creates a group called overall that pools all the sex groups and all the age groups.

summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "<30" = c(0, 29),
    ">=30" = c(30, Inf)
  )
) |>
  plotRecordCount()

By default, plotRecordCount() does not facet or colour by any variable. This can be confusing when stratifying by several variables, as seen in the previous plot. We can use the visOmopResults package to find out which columns we can colour or facet by:

summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "0-29" = c(0, 29),
    "30-Inf" = c(30, Inf)
  )
) |>
  visOmopResults::tidyColumns()
#> [1] "cdm_name"       "omop_table"     "age_group"      "sex"           
#> [5] "variable_name"  "variable_level" "count"          "time_interval" 
#> [9] "interval"

Then, we can simply pass these columns to the facet and colour arguments of plotRecordCount():

summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "0-29" = c(0, 29),
    "30-Inf" = c(30, Inf)
  )
) |>
  plotRecordCount(facet = omop_table ~ age_group, colour = "sex")

We can also filter the clinical table to a specific time window by setting the dateRange argument.

summariseRecordCount(cdm, "drug_exposure",
  interval = "years",
  sex = TRUE, 
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))) |>
  tableRecordCount(type = "gt")
               Time interval             Sex      mockOmopSketch
Number records
drug_exposure  1990-01-01 to 1990-12-31  overall  46
                                         Female   26
                                         Male     20
               1991-01-01 to 1991-12-31  overall  77
                                         Male     29
                                         Female   48
               1992-01-01 to 1992-12-31  overall  65
                                         Female   28
                                         Male     37
               1993-01-01 to 1993-12-31  overall  113
                                         Female   45
                                         Male     68
               1994-01-01 to 1994-12-31  overall  51
                                         Female   28
                                         Male     23
               1995-01-01 to 1995-12-31  overall  93
                                         Male     35
                                         Female   58
               1996-01-01 to 1996-12-31  overall  120
                                         Male     44
                                         Female   76
               1997-01-01 to 1997-12-31  overall  130
                                         Female   80
                                         Male     50
               1998-01-01 to 1998-12-31  overall  96
                                         Male     66
                                         Female   30
               1999-01-01 to 1999-12-31  overall  116
                                         Male     68
                                         Female   48
               2000-01-01 to 2000-12-31  overall  105
                                         Female   44
                                         Male     61
               2001-01-01 to 2001-12-31  overall  69
                                         Male     43
                                         Female   26
               2002-01-01 to 2002-12-31  overall  132
                                         Male     41
                                         Female   91
               2003-01-01 to 2003-12-31  overall  117
                                         Female   62
                                         Male     55
               2004-01-01 to 2004-12-31  overall  103
                                         Male     58
                                         Female   45
               2005-01-01 to 2005-12-31  overall  59
                                         Female   16
                                         Male     43
               2006-01-01 to 2006-12-31  overall  60
                                         Male     58
                                         Female   2
               2007-01-01 to 2007-12-31  overall  96
                                         Male     26
                                         Female   70
               2008-01-01 to 2008-12-31  overall  88
                                         Male     50
                                         Female   38
               2009-01-01 to 2009-12-31  overall  69
                                         Female   39
                                         Male     30
               overall                   overall  1805
                                         Male     905
                                         Female   900
Finally, disconnect from the cdm:

PatientProfiles::mockDisconnect(cdm = cdm)