---
title: "Summarise patient characteristics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{summarise_characteristics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

  
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", message = FALSE, warning = FALSE,
  fig.width = 7
)

CDMConnector::requireEunomia()
```

## Introduction
In this example we're going to summarise the characteristics of individuals with an ankle sprain, ankle fracture, forearm fracture, or a hip fracture using the Eunomia synthetic database. 

We'll begin by creating our condition study cohorts with the `generateConceptCohortSet` function from `CDMConnector`. 

```{r}
library(duckdb)
library(CDMConnector)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
library(CodelistGenerator)
library(PatientProfiles)
library(CohortCharacteristics)

con <- dbConnect(duckdb(), dbdir = eunomiaDir())
cdm <- cdmFromCon(
  con = con, cdmSchem = "main", writeSchema = "main", cdmName = "Eunomia"
)

cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "injuries",
  conceptSet = list(
    "ankle_sprain" = 81151,
    "ankle_fracture" = 4059173,
    "forearm_fracture" = 4278672,
    "hip_fracture" = 4230399
  ),
  end = "event_end_date",
  limit = "all"
)
settings(cdm$injuries)
cohortCount(cdm$injuries)
```

## Summarising study cohorts

Now we've created our cohorts, we can obtain a summary of the characteristics in the patients included in these cohorts. We'll create two different age group in below example: under 50 and 50+. 

```{r}
chars <- cdm$injuries |>
  summariseCharacteristics(ageGroup = list(c(0, 49), c(50, Inf)))
chars |>
  glimpse()
```

Now we have generated the results, we can create a nice table in gt format to display the results using `tableCharacteristics` function.
```{r}
tableCharacteristics(chars)
```

We can also use the `plotCharacteristics` function to display the results in a plot. The `plotCharacteristics` function can only take in one variable. So you will need to filter the results to the variable you want to create a plot for beforehand.

```{r}
chars |>
  filter(variable_name == "Age") |>
  plotCharacteristics(
    plotType = "boxplot",
    colour = "cohort_name",
    facet = c("cdm_name")
  )
```

## Stratified summaries

We can also generate summaries that are stratified by some variable of interest. In this example we added an age group variable to our cohort table and then created the stratification for age group in our results.

```{r}
chars <- cdm$injuries |>
  addAge(ageGroup = list(
    c(0, 49),
    c(50, Inf)
  )) |>
  summariseCharacteristics(strata = list("age_group"))
```

Again we used the `tableCharacteristics` function to display the results in gt table format.

```{r}
tableCharacteristics(chars,
  groupColumn = "age_group"
)
```

Then plotted age stratified prior observation time.

```{r}
chars |>
  filter(variable_name == "Prior observation") |>
  plotCharacteristics(
    plotType = "boxplot",
    colour = "cohort_name",
    facet = c("age_group")
  ) +
  coord_flip()
```

## Summaries including presence in other cohorts

We explored whether patients had any exposure to a list of selected medications (acetaminophen, morphine, warfarin)

```{r}
medsCs <- getDrugIngredientCodes(
  cdm = cdm,
  name = c("acetaminophen", "morphine", "warfarin")
)
cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "meds",
  conceptSet = medsCs,
  end = "event_end_date",
  limit = "all",
  overwrite = TRUE
)
```

We can use the `intersects` arguement inside the function to get this information.

```{r}
chars <- cdm$injuries |>
  summariseCharacteristics(cohortIntersectFlag = list(
    "Medications prior to index date" = list(
      targetCohortTable = "meds",
      window = c(-Inf, -1)
    ),
    "Medications on index date" = list(
      targetCohortTable = "meds",
      window = c(0, 0)
    )
  ))
```

To view the summary table

```{r}
tableCharacteristics(chars)
```

To visualise the exposure of these drugs in a bar plot.

```{r}
plot_data <- chars |>
  filter(
    variable_name == "Medications prior to index date",
    estimate_name == "percentage"
  )

plot_data |>
  plotCharacteristics(
    plotType = "barplot",
    colour = "variable_level",
    facet = c("cdm_name", "cohort_name")
  ) +
  scale_x_discrete(limits = rev(sort(unique(plot_data$variable_level)))) +
  coord_flip() +
  ggtitle("Medication use prior to index date")
```

## Summaries Using Concept Sets Directly
Instead of creating cohorts, we could have directly used our concept sets for medications when characterising our study cohorts.

```{r}
chars <- cdm$injuries |>
  summariseCharacteristics(conceptIntersectFlag = list(
    "Medications prior to index date" = list(
      conceptSet = medsCs,
      window = c(-Inf, -1)
    ),
    "Medications on index date" = list(
      conceptSet = medsCs,
      window = c(0, 0)
    )
  ))
```

Although, like here, concept sets can lead to the same result as using cohorts it is important to note this will not always be the case. This is because the creation of cohorts will have involved the collapsing of overlapping records as well as imposing certain requirements, such as only including records that were observed during an an ongoing observation period. Meanwhile, when working with concept sets we will instead be working directly with record-level data.

```{r}
tableCharacteristics(chars)
```


## Summaries using clinical tables 

More generally, we can also include summaries of the patients' presence in other clinical tables of the OMOP CDM. For example, here we add a count of visit occurrences

```{r}
chars <- cdm$injuries |>
  summariseCharacteristics(
    tableIntersectCount = list(
      "Visits in the year prior" = list(
        tableName = "visit_occurrence",
        window = c(-365, -1)
      )
    ),
    tableIntersectFlag = list(
      "Any drug exposure in the year prior" = list(
        tableName = "drug_exposure",
        window = c(-365, -1)
      ),
      "Any procedure in the year prior" = list(
        tableName = "procedure_occurrence",
        window = c(-365, -1)
      )
    )
  )
```

```{r}
tableCharacteristics(chars)
```