---
title: "Computing Pathways"
always_allow_html: yes
output:
  html_document:
    toc: yes
    toc_depth: '3'
    df_print: paged
  html_vignette:
    toc: yes
    toc_depth: 3
    vignette: >
      %\VignetteIndexEntry{Computing Pathways}
      %\VignetteEngine{knitr::rmarkdown}
      %\VignetteEncoding{UTF-8}
  pdf_document:
    toc: yes
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

withr::local_envvar(
  R_USER_CACHE_DIR = tempfile(),
  EUNOMIA_DATA_FOLDER = Sys.getenv("EUNOMIA_DATA_FOLDER", unset = tempfile())
)
```

## Target and Event cohorts

TreatmentPatterns builds pathways of **event** cohorts that occur during a **target** cohort. In published studies the target cohort is usually a disease, but other use cases are also valid, such as an exposure window to a drug or treatment. Events are things that happen during the target cohort window. In published studies these are usually exposures to drugs or treatments during a disease.

To re-frame that into a research question: **"What is the order and pathway of treatments during an infection?"** Our target cohort would be the infection itself, and the event cohorts would be the treatments that occur during the infection.

TreatmentPatterns needs to know which cohort is a target cohort and which is an event cohort. This can be derived from the cohort set table, usually generated by either `CDMConnector` or `CohortGenerator`. To tell TreatmentPatterns which cohort is of which kind, we supply a table that contains 1) an ID, 2) a name, and 3) a cohort type. As an example:

| cohortId | cohortName  | type   |
| -------- | ----------- | ------ |
| 1        | Infection   | target |
| 2        | Treatment A | event  |
| 3        | Treatment B | event  |
| 4        | Treatment C | event  |

As mentioned earlier, this table can easily be derived from a cohort set generated by `CohortGenerator` or `CDMConnector`. Here is an example that uses `CDMConnector`.
The target cohort is *Viral Sinusitis*; the events are several treatments:

```{r, message=FALSE, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
library(dplyr)
library(CDMConnector)

cohortSet <- readCohortSet(
  path = system.file(package = "TreatmentPatterns", "exampleCohorts")
)

cohorts <- cohortSet %>%
  # Remove the 'cohort', 'json', and 'cohort_name_snakecase' columns
  select(-"cohort", -"json", -"cohort_name_snakecase") %>%
  mutate(type = c("event", "event", "event", "event", "exit", "event", "event", "target")) %>%
  rename(
    cohortId = "cohort_definition_id",
    cohortName = "cohort_name"
  )

cohorts
```

## Database interface

TreatmentPatterns can connect to a database with either `CDMConnector` using the *CDM-Reference* or with `DatabaseConnector` using the *ConnectionDetails*. The main difference between the two is that the connection to the database is managed by TreatmentPatterns when using `DatabaseConnector`. When using `CDMConnector`, the connection is managed outside of TreatmentPatterns. Either case may be useful depending on the environment TreatmentPatterns is run in.

It is worth noting that it does not matter how you create your cohorts. You can generate a cohort table by any means you like. You can use OHDSI tools like **ATLAS** or **Capr** to specify cohort definitions. You can generate them by executing the raw SQL from **Circe**, or generate them with `CohortGenerator` or `CDMConnector`.
The only thing that matters is that the resulting cohort table has the following structure:

| cohort_definition_id | subject_id | cohort_start_date | cohort_end_date |
| --- | --- | --- | --- |
| 1 | 1 | 2020-01-01 | 2020-12-31 |
| 2 | 1 | 2020-02-04 | 2020-06-27 |
| 3 | 1 | 2020-07-03 | 2020-08-12 |
| 4 | 1 | 2020-08-30 | 2020-11-29 |
| 1 | 2 | 2020-03-01 | 2020-10-30 |
| 2 | 2 | 2020-05-04 | 2020-06-27 |
| 4 | 2 | 2020-10-03 | 2020-10-12 |

### CDMConnector

We can use `CDMConnector` to generate cohorts from JSON definitions into the *cohort_table* table in our database.

```{r, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
library(DBI)
library(duckdb)

# Connect to the Eunomia example database
con <- dbConnect(
  drv = duckdb(),
  dbdir = eunomiaDir()
)

# Create the CDM-reference
cdm <- cdmFromCon(
  con = con,
  cdmSchema = "main",
  writeSchema = "main"
)

# Generate the cohorts into the 'cohort_table' table
cdm <- generateCohortSet(
  cdm = cdm,
  cohortSet = cohortSet,
  name = "cohort_table",
  overwrite = TRUE
)
```

Once our cohorts are generated and our CDM-reference is set up, we can simply pass the CDM-reference to `computePathways()`.

```{r, eval=FALSE}
outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm
)
```

### DatabaseConnector

Similarly, we can create `ConnectionDetails` for our database:

```{r, eval=FALSE}
library(DatabaseConnector)

connectionDetails <- createConnectionDetails(
  dbms = "postgresql",
  user = "user",
  password = "password",
  server = "some-server.database.net",
  port = 1337,
  pathToDriver = "./path/to/jdbc/"
)

outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  connectionDetails = connectionDetails,
  cdmSchema = "main",
  resultSchema = "main",
  tempEmulationSchema = NULL
)
```

We still have to specify the `cdmSchema`, `resultSchema`, and `tempEmulationSchema` (when applicable). For the CDM-Reference from `CDMConnector`, this is handled outside of TreatmentPatterns.

## Analysis

We can specify some analysis identification parameters to keep track of multiple analyses.
This is particularly useful when uploading multiple results to one database. We can use `analysisId` to keep them separate. We can also add a `description` to give some context.

```{r, eval=FALSE}
outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  analysisId = 1,
  description = "My First Treatment Patterns Analysis"
)
```

## Events

TreatmentPatterns builds pathways from the specified events. We can control how long these events should last, at minimum, by setting `minEraDuration`. Any event shorter than the specified `minEraDuration` is not considered a valid event.

We can also specify which events to consider for our pathway by specifying a method in `filterTreatments`. `"First"` only takes the first occurrence of each event. `"All"` considers all events, and `"Changes"` only considers events for which the next event is a different event.

![](./figures/a011_filterTreatments_first.png){#id .class width=30%}
![](./figures/a011_filterTreatments_changes.png){#id .class width=30%}
![](./figures/a011_filterTreatments_all.png){#id .class width=30%}

When we set `filterTreatments = "All"` we can additionally collapse multiple records of the same event into one record. The `eraCollapseSize` parameter specifies the gap, in days, between two records of the same event within which they are collapsed.

![](./figures/a011_filterTreatments_all_merge.png){#id .class width=50%}

```{r, eval=FALSE}
computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  minEraDuration = 30,
  eraCollapseSize = 30,
  filterTreatments = "First"
)
```

## Combinations

TreatmentPatterns can classify combination events within the supplied events. It looks for overlap between all occurring events. Not all overlap is classified as a combination; this is dictated by the `combinationWindow` parameter, which specifies the minimum duration of overlap between two events for it to be classified as a combination event.

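
To make the role of `combinationWindow` more concrete, here is a minimal sketch in base R. It is not the implementation used by TreatmentPatterns: the helpers `overlapDays()` and `isCombination()` are hypothetical, and we assume that overlap is counted as the difference in days between the later start date and the earlier end date, and that an overlap equal to `combinationWindow` still counts as a combination.

```{r, eval=FALSE}
# Hypothetical helper (not part of TreatmentPatterns):
# number of days that two events overlap, 0 if they do not overlap.
overlapDays <- function(startA, endA, startB, endB) {
  overlap <- as.integer(min(endA, endB) - max(startA, startB))
  max(overlap, 0L)
}

# Hypothetical helper: assumes an overlap of at least `combinationWindow`
# days is treated as a combination.
isCombination <- function(days, combinationWindow) {
  days >= combinationWindow
}

# Event A runs from January through March
startA <- as.Date("2020-01-01")
endA <- as.Date("2020-03-31")

# Event B starts halfway through February: 45 days of overlap
longOverlap <- overlapDays(startA, endA, as.Date("2020-02-15"), as.Date("2020-06-30"))

# Event B starts at the end of March: 6 days of overlap
shortOverlap <- overlapDays(startA, endA, as.Date("2020-03-25"), as.Date("2020-06-30"))

isCombination(longOverlap, combinationWindow = 30)  # TRUE
isCombination(shortOverlap, combinationWindow = 30) # FALSE
```

Under these assumptions, only the first pair of events would be classified as a combination; the second pair falls below the 30-day window.
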

![](./figures/a05_combination_1.png){#id .class width=100%}

Effectively, we split the two event records into three records: 1) Event A, 2) Event B, and 3) the combination of Event A and Event B. The `minPostCombinationDuration` parameter dictates what to do with the newly created event records, because the remaining durations of Event A and Event B could be as short as 1 day. `minPostCombinationDuration` sets the minimum duration of these newly created records, removing events that last shorter than the specified time in days. It is therefore usually unwise to specify the `minEraDuration` smaller than the `combinationWindow`.

```{r, eval=FALSE}
computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  combinationWindow = 30,
  minPostCombinationDuration = 30
)
```

### Non-Significant Overlap

When overlap is deemed not significant, the event transition is classified as a *switch* of events. The first treatment is then truncated to the index date of the overlapping event (in this case **Event B**). If you're only interested in the final pathways that TreatmentPatterns generates, this does not influence your resulting pathways (barring some edge cases). However, when you'd like to investigate the patient-level records generated by `computePathways()`, this might lead to undesired end dates.

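
The effect of this truncation on patient-level records can be illustrated with a small base R sketch. This is only an illustration under our own assumptions, not the code that `computePathways()` runs: the `events` data frame, its column names, and the helper `truncateToNextStart()` are all hypothetical.

```{r, eval=FALSE}
# Hypothetical patient-level records: Event A overlaps Event B by 6 days
events <- data.frame(
  event = c("Event A", "Event B"),
  startDate = as.Date(c("2020-01-01", "2020-03-25")),
  endDate = as.Date(c("2020-03-31", "2020-06-30"))
)

# Hypothetical helper (not part of TreatmentPatterns): cut the end date of
# an event back to the start date of the next event when the two overlap.
truncateToNextStart <- function(events) {
  events <- events[order(events$startDate), ]
  # Start date of the following event; NA for the last event
  nextStart <- c(events$startDate[-1], as.Date(NA))
  overlaps <- !is.na(nextStart) & events$endDate > nextStart
  events$endDate[overlaps] <- nextStart[overlaps]
  events
}

truncated <- truncateToNextStart(events)
truncated
```

In this sketch the end date of Event A moves from 2020-03-31 to 2020-03-25, the start date of Event B, while Event B is left untouched. This is the kind of changed end date to be aware of when inspecting patient-level records.
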

![](./figures/a05_combination_2.png){#id .class width=100%}

## Overlap Method

The `overlapMethod` parameter allows you to specify which method to use to deal with this non-significant overlap: `"truncate"` to truncate the first event, as described before:

```{r, eval=FALSE}
outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  overlapMethod = "truncate"
)
```

Or `"keep"`, to keep the start and end dates of both records intact:

```{r, eval=FALSE}
outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  overlapMethod = "keep"
)
```

## Acute and Therapy Splits

We can split specific events into *acute* and *therapy* events. We do this by selecting our events of interest by `cohort_definition_id` (or `cohortId`) in the `splitEventCohorts` parameter. We then set a cutoff, in **days**, with `splitTime`. The first *n* days will be classified as *acute* and the remaining duration will be classified as *therapy*.

![](./figures/a04_splitTime_1.png){#id .class width=100%}

Let's say we want to assume that the first 30 days of our treatment are acute, and anything beyond that is therapy.

```{r, eval=FALSE}
outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  splitEventCohorts = c(1, 2),
  splitTime = 30
)
```

![](./figures/a04_splitTime_2.png){#id .class width=100%}

## Event Windows

### Anchoring

The `computePathways()` function has two 'anchor' arguments, `startAnchor` and `endAnchor`. These arguments dictate which point to use as a reference. The two values that you can set for both of these parameters are `"startDate"` and `"endDate"`, referencing the `cohort_start_date` and `cohort_end_date` columns in the cohort table.
By default they are set to `startAnchor = "startDate"` and `endAnchor = "endDate"`.

### windowStart and windowEnd

The `windowStart` and `windowEnd` parameters dictate an offset from their corresponding *anchor*. If we assume the following parameters (defaults):

```
startAnchor = "startDate"
windowStart = 0
endAnchor = "endDate"
windowEnd = 0
```

We will just use the `cohort_start_date` and `cohort_end_date` as our window of interest.

```{r, eval=FALSE}
outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  startAnchor = "startDate",
  windowStart = 0,
  endAnchor = "endDate",
  windowEnd = 0
)
```

![](figures/a03_window_1.png)

We can extend our window by 30 days on either side by altering the `windowStart` and `windowEnd` parameters:

```
startAnchor = "startDate"
windowStart = -30
endAnchor = "endDate"
windowEnd = 30
```

Note that `windowStart = -30` means we subtract 30 days from the `startAnchor`, and `windowEnd = 30` means we add 30 days to the `endAnchor`.

```{r, eval=FALSE}
outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  startAnchor = "startDate",
  windowStart = -30,
  endAnchor = "endDate",
  windowEnd = 30
)
```

![](figures/a03_window_2.png)

### Changing the anchoring

We can change the anchoring of `startAnchor` and `endAnchor` to set our window to a period prior to the index date:

```
startAnchor = "startDate"
windowStart = -30
endAnchor = "startDate"
windowEnd = 0
```

Note that **both** `startAnchor` and `endAnchor` are set to `"startDate"`. So we start 30 days before the index date, and end on the index date.

```{r, eval=FALSE}
outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  startAnchor = "startDate",
  windowStart = -30,
  endAnchor = "startDate",
  windowEnd = 0
)
```

![](figures/a03_window_3.png)

## Pathways

Finally, we can also set some parameters of the pathway itself.
We can specify the maximum length of the pathways with `maxPathLength`. This parameter will truncate the pathways that exceed the set limit.

We can also set `concatTargets` to either `TRUE` or `FALSE`. When set to `TRUE` it will append multiple cases, which might be useful for time-invariant target cohorts like chronic conditions. If we would like to evaluate each occurrence of a target cohort separately, we can set it to `FALSE`.

## Running the analysis

After careful consideration of the settings, we can run our TreatmentPatterns analysis:

```{r setup_analysis, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE), warning=FALSE, error=FALSE}
library(TreatmentPatterns)

# Computing pathways
outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,

  analysisId = 1,
  description = "My Treatment Pathway analysis",

  # Window
  startAnchor = "startDate",
  windowStart = 0,
  endAnchor = "endDate",
  windowEnd = 0,

  # Acute / Therapy
  splitEventCohorts = NULL,
  splitTime = NULL,

  # Events
  minEraDuration = 7,
  filterTreatments = "All",
  eraCollapseSize = 3,

  # Combinations
  combinationWindow = 7,
  minPostCombinationDuration = 7,
  overlapMethod = "truncate",

  # Pathways
  maxPathLength = 10,
  concatTargets = FALSE
)
```

The result is an `Andromeda` object that contains patient-level data. We will go into exporting the results to shareable files in the **Exporting** vignette.