There are some rules of thumb that I follow when using the `TreatmentPatterns` package. These rules tend to work well in most situations, across databases and datasets.
1. `minPostCombinationDuration <= minEraDuration`.
2. `combinationWindow >= minEraDuration`.

When creating cohorts, it is important to keep in mind that the subjects will be divided across pathways. Let’s assume we have 10000 subjects in a fictitious cohort. Let’s also assume we have 5 event cohorts.
The total number of potential pathways, assuming only monotherapies, equals $pathways_n = n^n$. If we do not allow any recurring treatments, it still equals $pathways_n = n!$.
With our 5 event cohorts, this equals:
5^5
## [1] 3125
factorial(5)
## [1] 120
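To make the point about subjects being divided across pathways concrete, the sketch below divides the 10000 fictitious subjects evenly over the potential pathways. The even split and the variable names are assumptions made only for illustration; real pathway distributions are usually heavily skewed.

```r
subjects <- 10000
n_events <- 5

# Potential monotherapy pathways, with and without recurring treatments
pathways_with_repeats <- n_events^n_events
pathways_without_repeats <- factorial(n_events)

# Average number of subjects per potential pathway under an even split
per_pathway_with_repeats <- subjects / pathways_with_repeats
per_pathway_without_repeats <- subjects / pathways_without_repeats

per_pathway_with_repeats     # 3.2
per_pathway_without_repeats  # roughly 83.3
```

Even in this small example, allowing recurring treatments leaves only a handful of subjects per potential pathway.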
Combinations add additional pathway possibilities. Each event can be combined with every other event, and each combination can in turn be combined with another single event or with another combination. However, each event in a combination must be unique, and the order does not matter: $AB = BA$. For example, it is irrelevant whether a person receives penicillin and ibuprofen or ibuprofen and penicillin.
We can draw out all possible combinations in a graph for events A, B, C, and D.
The subscript of each node is the layer in which the combination exists. For example, combination AB is in layer 2, and combination ABCD is in layer 4. The layer coincides with the number of events in the combination.
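We can count the number of nodes per layer, for each graph:
graph | l1 | l2 | l3 | l4 | sum
---|---|---|---|---|---
A | 1 | 3 | 3 | 1 | 8
B | 1 | 2 | 1 | 0 | 4
C | 1 | 1 | 0 | 0 | 2
D | 1 | 0 | 0 | 0 | 1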
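The node counts can be reproduced with a short enumeration. This is a sketch under one assumption that is consistent with the counts above: every unordered combination is assigned to the graph of its alphabetically first event, so AB belongs to graph A and BCD to graph B. The names `counts` and `layer_table` are illustrative.

```r
events <- c("A", "B", "C", "D")
n <- length(events)

# For each layer k (combination size), enumerate all unordered combinations
# and count them per first event, which acts as the root of the graph.
counts <- vapply(seq_len(n), function(k) {
  roots <- combn(events, k)[1, ]
  tabulate(match(roots, events), nbins = n)
}, integer(n))

dimnames(counts) <- list(events, paste0("l", seq_len(n)))
layer_table <- cbind(counts, sum = rowSums(counts))
layer_table
```

The `sum` column gives 8, 4, 2 and 1 for graphs A to D, which adds up to 15 nodes in total.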
Our sums look suspiciously similar to $2^n$.
2^1
## [1] 2
2^2
## [1] 4
2^3
## [1] 8
2^4
## [1] 16
We seem to overshoot by one step in $n$, so we can try $2^{n-1}$.
2^0
## [1] 1
2^1
## [1] 2
2^2
## [1] 4
2^3
## [1] 8
So our total number of events equals: $\sum_{i=0}^{n-1} 2^i$
Which we can define as a function $f_1$.
sum(c(2^0, 2^1, 2^2, 2^3))
## [1] 15
# Or:
n <- 4
sum(2^(0:(n - 1)))
## [1] 15
f_1 <- function(n) {
  sum(2^(0:(n - 1)))
}
We can evaluate our $f_1$ function for 1 to 25 events (the first 10 rows are shown).
n <- 1:25
f_1_events <- unlist(lapply(n, f_1))
data.frame(
  n = n,
  f_1 = f_1_events
)
n <int> | f_1 <dbl>
---|---
1 | 1
2 | 3
3 | 7
4 | 15
5 | 31
6 | 63
7 | 127
8 | 255
9 | 511
10 | 1023
Notice how the number of events increases with $2^n - 1$.
We define this as $f_2$. We can compare $f_1$ to $f_2$.
f_2 <- function(n) {
  2^n - 1
}
n <- 1:25
f_1_events <- unlist(lapply(n, f_1))
f_2_events <- unlist(lapply(n, f_2))
data.frame(
  n = n,
  f_1 = f_1_events,
  f_2 = f_2_events
)
n <int> | f_1 <dbl> | f_2 <dbl>
---|---|---
1 | 1 | 1
2 | 3 | 3
3 | 7 | 7
4 | 15 | 15
5 | 31 | 31
6 | 63 | 63
7 | 127 | 127
8 | 255 | 255
9 | 511 | 511
10 | 1023 | 1023
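The table only shows the first 10 rows. The sketch below checks that the summation and the closed form agree for every value from 1 to 25, which is what the geometric series identity $\sum_{i=0}^{n-1} 2^i = 2^n - 1$ predicts. The functions are redefined here so the snippet runs on its own.

```r
f_1 <- function(n) sum(2^(0:(n - 1)))
f_2 <- function(n) 2^n - 1

n <- 1:25
f_1_events <- vapply(n, f_1, numeric(1))
f_2_events <- f_2(n)

# TRUE when both definitions give the same count for every n
all(f_1_events == f_2_events)
```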
Now we can assert the following:
$monoEvents = n$
$totalEvents = 2^n - 1$
$combinationEvents = totalEvents - n$
n <- 5
totalEvents <- 2^n - 1
combinationEvents <- totalEvents - n
sprintf("monoEvents: %s", n)
## [1] "monoEvents: 5"
sprintf("totalEvents: %s", totalEvents)
## [1] "totalEvents: 31"
sprintf("combinationEvents: %s", combinationEvents)
## [1] "combinationEvents: 26"
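As a cross-check, the number of combination events can also be counted directly with binomial coefficients: a combination of size k is a choice of k distinct events out of n, for k from 2 to n. The variable name `combinations_per_size` is illustrative.

```r
n <- 5

# Number of unordered combinations of size k = 2, ..., n
combinations_per_size <- choose(n, 2:n)
combinations_per_size       # 10 10 5 1

# Matches combinationEvents = 2^n - 1 - n
sum(combinations_per_size)  # 26
```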
The `minEraDuration`, `combinationWindow`, and `minPostCombinationDuration` settings have significant effects on how the treatment pathways are built. Consider the following example:
library(dplyr)
cohort_table <- tribble(
  ~cohort_definition_id, ~subject_id, ~cohort_start_date, ~cohort_end_date,
  1, 1, as.Date("2020-01-01"), as.Date("2021-01-01"),
  2, 1, as.Date("2020-01-01"), as.Date("2020-01-20"),
  3, 1, as.Date("2020-01-22"), as.Date("2020-02-28"),
  4, 1, as.Date("2020-02-20"), as.Date("2020-03-03")
)
cohort_table
cohort_definition_id <dbl> | subject_id <dbl> | cohort_start_date <date> | cohort_end_date <date>
---|---|---|---
1 | 1 | 2020-01-01 | 2021-01-01
2 | 1 | 2020-01-01 | 2020-01-20
3 | 1 | 2020-01-22 | 2020-02-28
4 | 1 | 2020-02-20 | 2020-03-03
Assume that the target cohort is cohort_definition_id: 1; the rest are event cohorts.
cohort_table <- cohort_table %>%
  mutate(duration = as.numeric(cohort_end_date - cohort_start_date))
cohort_table
cohort_definition_id <dbl> | subject_id <dbl> | cohort_start_date <date> | cohort_end_date <date> | duration <dbl> |
---|---|---|---|---|
1 | 1 | 2020-01-01 | 2021-01-01 | 366 |
2 | 1 | 2020-01-01 | 2020-01-20 | 19 |
3 | 1 | 2020-01-22 | 2020-02-28 | 37 |
4 | 1 | 2020-02-20 | 2020-03-03 | 12 |
As you can see, the durations of the treatments are 19, 37, and 12 days. Also, treatment 3 overlaps with treatment 4 for 8 days.
We can compute the overlap as follows:
cohort_table <- cohort_table %>%
  # Filter out target cohort
  filter(cohort_definition_id != 1) %>%
  mutate(overlap = case_when(
    # If there is no next cohort (lead() returns NA), set the overlap to 0
    is.na(lead(cohort_end_date)) ~ 0,
    # Otherwise compute cohort_end_date - next cohort_start_date,
    # e.g. 2020-02-28 - 2020-02-20 = 8
    .default = as.numeric(cohort_end_date - lead(cohort_start_date))
  ))
cohort_table
cohort_definition_id <dbl> | subject_id <dbl> | cohort_start_date <date> | cohort_end_date <date> | duration <dbl> | overlap <dbl> |
---|---|---|---|---|---|
2 | 1 | 2020-01-01 | 2020-01-20 | 19 | -2 |
3 | 1 | 2020-01-22 | 2020-02-28 | 37 | 8 |
4 | 1 | 2020-02-20 | 2020-03-03 | 12 | 0 |
We see that the overlap between treatment 2 and 3 is -2, so rather than an overlap there is a 2-day gap between these treatments. Between treatment 3 and 4 there is an 8-day overlap. There is no next treatment after treatment 4, so the overlap is 0. Let’s assume our `minEraDuration = 5`.
We can draw it out like so:
2:   -------------------
3:                        -------------------------------------
4:                                                     ------------
If we set our `combinationWindow = 5`, a combination would be computed for cohorts 3 and 4, because their 8-day overlap is at least 5 days. This would leave us with the following treatments:
2:   -------------------
3:                        -----------------------------
3+4:                                                   --------
4:                                                             ----
Treatment 3 now lasts 29 days, treatment 4 lasts 4 days, and combination treatment 3+4 lasts 8 days. If our `minPostCombinationDuration` is not set properly, we can filter out either too many or too few treatments.
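The split into a pre-combination part, a combination part and a post-combination part can be sketched by hand from the dates in the example table. This is a simplified illustration of the idea, not the package's internal algorithm; `combination_window` stands in for the `combinationWindow` setting, and the other names are made up for this sketch.

```r
combination_window <- 5

start_3 <- as.Date("2020-01-22")
end_3 <- as.Date("2020-02-28")
start_4 <- as.Date("2020-02-20")
end_4 <- as.Date("2020-03-03")

# Overlap in days between treatment 3 and treatment 4
overlap <- as.numeric(end_3 - start_4)
is_combination <- overlap >= combination_window

# Segments that result when the overlap is treated as a combination
segments <- data.frame(
  treatment = c("3", "3+4", "4"),
  start = c(start_3, start_4, end_3),
  end = c(start_4, end_3, end_4)
)
segments$duration <- as.numeric(segments$end - segments$start)
segments
```

The resulting durations are 29, 8 and 4 days for treatment 3, combination 3+4 and treatment 4.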
Assuming we set `minPostCombinationDuration = 10`, we would lose treatment 4 and combination treatment 3+4. This would leave us with the following paths:
2:   -------------------
3:                        -----------------------------
Pathway: 2-3
As a rule of thumb, setting `minPostCombinationDuration <= minEraDuration` seems to yield reasonable results. With `minPostCombinationDuration = 5`, this would leave us with the following paths:
2:   -------------------
3:                        -----------------------------
3+4:                                                   --------
Pathway: 2-3-3+4
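The effect of the two thresholds can be compared side by side with a small sketch. It assumes, as a simplification, that every era shorter than the threshold is dropped and that the remaining eras are joined in order; `build_pathway()` and `min_post_combination_duration` are hypothetical names, not part of the package.

```r
# Eras after the combination step, with their durations in days
eras <- data.frame(
  treatment = c("2", "3", "3+4", "4"),
  duration = c(19, 29, 8, 4)
)

# Keep eras that last at least the threshold and join them into a pathway
build_pathway <- function(eras, min_post_combination_duration) {
  keep <- eras$duration >= min_post_combination_duration
  paste(eras$treatment[keep], collapse = "-")
}

pathway_10 <- build_pathway(eras, 10)
pathway_5 <- build_pathway(eras, 5)

pathway_10  # "2-3"
pathway_5   # "2-3-3+4"
```

With a threshold of 10 the combination is lost entirely, while a threshold of 5 keeps it and only drops the 4-day remainder of treatment 4.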