--- title: "Summarise concept id counts" output: html_document: pandoc_args: [ "--number-offset=1,0" ] number_sections: yes toc: yes vignette: > %\VignetteIndexEntry{summarise_concept_id_counts} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction In this vignette, we will explore the *OmopSketch* functions designed to provide information about the number of counts of concepts in tables. Specifically, there are two key functions that facilitate this, `summariseConceptIdCounts()` and `tableConceptIdCounts()`. The former one creates a summary statistics results with the number of counts per each concept in the clinical table, and the latter one displays the result in a table. ## Create a mock cdm Let's see an example of the previous functions. To start with, we will load essential packages and create a mock cdm using `mockOmopSketch()`. ```{r, warning=FALSE} library(duckdb) library(OmopSketch) library(dplyr) cdm <- mockOmopSketch() cdm ``` # Summarise concept id counts We now use the `summariseConceptIdCounts()` function from the OmopSketch package to retrieve counts for each concept id and name, as well as for each source concept id and name, across the clinical tables. ```{r, warning=FALSE} summariseConceptIdCounts(cdm, omopTableName = "drug_exposure") |> select(group_level, variable_name, variable_level, estimate_name, estimate_value, additional_name, additional_level) |> glimpse() ``` By default, the function returns the number of records (`estimate_name == "count_records"`) for each concept_id. To include counts by person, you can set the `countBy` argument to `"person"` or to c`("record", "person")` to obtain both record and person counts. ```{r, warning=FALSE} summariseConceptIdCounts(cdm, omopTableName = "drug_exposure", countBy = c("record", "person") ) |> select( variable_name, estimate_name, estimate_value) ``` Further stratification can be applied using the `interval`, `sex`, and `ageGroup` arguments. The interval argument supports "overall" (no time stratification), "years", "quarters", or "months". ``` {r, warning=FALSE} summariseConceptIdCounts(cdm, omopTableName = "condition_occurrence", countBy = "person", interval = "years", sex = TRUE, ageGroup = list("<=50" = c(0, 50), ">50" = c(51, Inf)) ) |> select(group_level, strata_level, variable_name, estimate_name, additional_level) |> glimpse() ``` We can also filter the clinical table to a specific time window by setting the dateRange argument. ```{r} summarisedResult <- summariseConceptIdCounts(cdm, omopTableName = "condition_occurrence", dateRange = as.Date(c("1990-01-01", "2010-01-01"))) summarisedResult |> omopgenerics::settings()|> glimpse() ``` Finally, you can summarise concept counts on a subset of records by specifying the `sample` argument. ```{r} summariseConceptIdCounts(cdm, omopTableName = "condition_occurrence", sample = 50) |> select(group_level, variable_name, estimate_name) |> glimpse() ``` ## Display the results Finally, concept counts can be visualised using `tableConceptIdCounts()`. By default, it generates an interactive [reactable](https://glin.github.io/reactable/) table, but [DT](https://rstudio.github.io/DT/) datatables are also supported. ```{r, warning=FALSE} result <- summariseConceptIdCounts(cdm, omopTableName = "measurement", countBy = "record" ) tableConceptIdCounts(result, type = "reactable") ``` ```{r} tableConceptIdCounts(result, type = "datatable") ``` The display argument in tableConceptIdCounts() controls which concept counts are shown. Available options include `display = "overall"`. It is the default option and it shows both standard and source concept counts. ```{r} tableConceptIdCounts(result, display = "overall") ``` If `display = "standard"` the table shows only **standard** concept_id and concept_name counts. ```{r} tableConceptIdCounts(result, display = "standard") ``` If `display = "source"` the table shows only **source** concept_id and concept_name counts. ```{r} tableConceptIdCounts(result, display = "source") ``` If `display = "missing source"` the table shows only counts for concept ids that are missing a corresponding source concept id. ```{r} tableConceptIdCounts(result, display = "missing source") ``` If `display = "missing standard"` the table shows only counts for source concept ids that are missing a mapped standard concept id. ```{r} tableConceptIdCounts(result, display = "missing standard") ```