The staggR package fits linear difference-in-differences
models in scenarios where intervention roll-outs are staggered over
time. The package implements a version of an approach proposed by Sun
and Abraham1 to estimate
cohort- and time-since-treatment specific difference-in-differences
parameters. This package provides an alternative approach that proceeds
in two steps:
Estimate a difference-in-differences regression with cohort- and (calendar) time-specific parameters;
Translate difference-in-differences estimates into average estimates for arbitrary time periods relative to the intervention (e.g., post-intervention period).
By separating these two steps, we provide an easy-to-implement, transparent, intuitive difference-in-differences framework.
Consider a scenario with \(i=1,...,N\) units and \(t=1,...,T\) periods. Units are exposed to the intervention at different times. We group units having the intervention during the same time period into cohorts, which we denote by \(c=1,...,C\). To simplify the notation, we assume that units in cohort \(c=1\) are exposed to treatment at time \(t=1\), etc. Units exposed to the intervention at time \(t>T\) form the comparison group because they are not exposed to the intervention during the study period.
Abraham and Sun’s difference-in-differences specification interacts cohort indicators with time indicators relative to intervention exposure start, and might be written as follows:
\[ y_{it} = \alpha_c D_c + \beta_t D_t + \left(\sum_c \sum_{l\neq -1} \gamma_{cl} D_c \cdot D^l_{it}\right) + \psi X_{it} + \varepsilon_{st}, \]
where \(D_c\) are cohort indicators equal to one if unit \(i\) is in cohort \(c\) and zero otherwise, \(D_t\) are time indicators, and \(D^l_{it}\) are indicators equal to one if unit \(i\) is \(l\) periods away from initial intervention exposure, \(X_{it}\) are covariates, and \(\varepsilon_{st}\) is the error term. The last pre-intervention period \(l=-1\) is omitted to ensure identification. The parameters \(\gamma_{cl}\) measure difference-in-differences estimates for a cohort \(c\) and time since intervention \(l\).
Our approach recognizes the following relationship between time since intervention \(l\) and time \(t\), which was also noted by Sun and Abraham:
\[ l = t-c \]
For instance, time \(t=2\) for cohort \(c=3\) is \(l=-1\), i.e., the last period prior to intervention exposure, because intervention exposure for cohort \(c\) is \(t=3\).
Using this insight, the difference-in-differences regression can alternatively be stated as follows:
\[ y_{it} = \alpha_c D_c + \beta_t D_t + \left(\sum_c \sum_{t\neq (c-1)} \gamma_{ct} D_c \cdot D_t\right) + \psi X_{it} + \varepsilon_{st}. \]
Using this specification has two advantages:
The above equation preserves this structure, while showing how it is a more general version of the difference-in-differences design. Specifically, parameters \(\beta_t\) measure changes over time for the comparison group, which now can include multiple periods; and parameters \(\gamma_{ct}\) measure differential changes, but for multiple treatment cohorts instead of just one treatment group, and for multiple periods instead of just one.
This packages provides functions to estimate such a
difference-in-differences specification. It also provides functions to
translate difference-in-differences estimates into quantities that are
useful for research purposes. For instance, users can calculate average
difference-in-differences estimates for each period relative to
intervention exposure, or they can calculate the average
difference-in-differences estimates for the full post-intervention
period. The ave_coeff() function accomplishes this by
calculating averages of user-specified sets of regression coefficients
and combining the corresponding standard errors using the estimated
variance–covariance matrix, whose calculation method users can also
specify. This aggregation yields a single estimate that properly
accounts for sampling variability and the correlation among underlying
event-study coefficients.
Consider as a didactic example a policy intervention designed to
reduce inpatient hospitalizations in 15 counties. This longitudinal data
set contains one row per individual-year. Each individual in the data
set is identified by a globally unique identifier (guid),
and we have measures of the individuals’ ages, sexes, and comorbidities,
and an indicator showing whether the individual was hospitalized during
the current year. Each individual resides in a county, which itself is
grouped into cohorts based on intervention year. Finally, the
yr column indicates the year of the current
observation.
data("hosp", package = "staggR")
head(hosp, 20)
#> guid age sex comorb hospitalized county intervention_yr
#> 1 000P7XB6O5P8 26 M 1 FALSE Otter Pop County 2017
#> 2 0022LHNIMI01 27 F 1 FALSE Briar Glen County 2016
#> 3 0022LHNIMI01 28 F 0 FALSE Briar Glen County 2016
#> 4 0022LHNIMI01 29 F 1 TRUE Briar Glen County 2016
#> 5 0022LHNIMI01 30 F 1 FALSE Briar Glen County 2016
#> 6 0032MSL22PRS 41 M 1 FALSE Northhaven County <NA>
#> 7 0032MSL22PRS 42 M 1 TRUE Northhaven County <NA>
#> 8 0032MSL22PRS 43 M 0 FALSE Northhaven County <NA>
#> 9 0032MSL22PRS 44 M 0 FALSE Northhaven County <NA>
#> 10 0032MSL22PRS 45 M 0 FALSE Northhaven County <NA>
#> 11 004NK9W8RUT0 35 F 0 FALSE Driftwood County 2017
#> 12 004NK9W8RUT0 36 F 1 FALSE Driftwood County 2017
#> 13 004NK9W8RUT0 37 F 1 TRUE Driftwood County 2017
#> 14 008Z7PKWNY5K 32 F 1 FALSE Clearfork County 2017
#> 15 008Z7PKWNY5K 33 F 1 TRUE Clearfork County 2017
#> 16 008Z7PKWNY5K 34 F 1 FALSE Clearfork County 2017
#> 17 008Z7PKWNY5K 35 F 1 FALSE Clearfork County 2017
#> 18 008Z7PKWNY5K 36 F 0 FALSE Clearfork County 2017
#> 19 008Z7PKWNY5K 37 F 1 FALSE Clearfork County 2017
#> 20 008Z7PKWNY5K 38 F 0 TRUE Clearfork County 2017
#> cohort yr
#> 1 7 2018
#> 2 6 2015
#> 3 6 2016
#> 4 6 2017
#> 5 6 2018
#> 6 0 2013
#> 7 0 2014
#> 8 0 2015
#> 9 0 2016
#> 10 0 2017
#> 11 7 2011
#> 12 7 2012
#> 13 7 2013
#> 14 7 2010
#> 15 7 2011
#> 16 7 2012
#> 17 7 2013
#> 18 7 2014
#> 19 7 2015
#> 20 7 2016The county is never exposed to the intervetnion if
intervention_yr is NA. Among the 15 counties
in the data set, 3 counties implemented the intervention in 2015; 2
counties implemented in 2016; 5 counties implemented in 2017; 1 county
implemented in 2018; and 4 counties did not implement the intervention
at all during the study period.
with(hosp,
table(county, intervention_yr, useNA = "ifany"))
#> intervention_yr
#> county 2015 2016 2017 2018 <NA>
#> Ashbrook County 2177 0 0 0 0
#> Banana Peel County 0 0 2017 0 0
#> Briar Glen County 0 2101 0 0 0
#> Cinder Bluff County 0 0 1879 0 0
#> Clearfork County 0 0 1863 0 0
#> Driftwood County 0 0 1749 0 0
#> Mapleford County 0 0 0 0 2146
#> Meadowridge County 0 1958 0 0 0
#> Moonwhistle County 0 0 0 0 2309
#> Northhaven County 0 0 0 0 2209
#> Otter Pop County 0 0 1890 0 0
#> Pickle Springs County 0 0 0 1727 0
#> Pine Hollow County 2297 0 0 0 0
#> Silver Run County 0 0 0 0 2327
#> Stonefield County 2391 0 0 0 0Counties are organized into cohorts by intervention year, i.e., counties with the same intervention year are assigned to the same cohort.
county_cohorts <- unique(hosp[, c("county", "cohort", "intervention_yr")])
county_cohorts[order(county_cohorts$cohort), ]
#> county cohort intervention_yr
#> 6 Northhaven County 0 <NA>
#> 31 Moonwhistle County 0 <NA>
#> 95 Mapleford County 0 <NA>
#> 100 Silver Run County 0 <NA>
#> 28 Pine Hollow County 5 2015
#> 43 Stonefield County 5 2015
#> 96 Ashbrook County 5 2015
#> 2 Briar Glen County 6 2016
#> 36 Meadowridge County 6 2016
#> 1 Otter Pop County 7 2017
#> 11 Driftwood County 7 2017
#> 14 Clearfork County 7 2017
#> 56 Banana Peel County 7 2017
#> 73 Cinder Bluff County 7 2017
#> 22 Pickle Springs County 8 2018The study period includes 11 years, from 2010 through 2020. Hospitalizations occurs in every county-year.
stats::xtabs(hospitalized ~ county + yr,
data = hosp)
#> yr
#> county 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
#> Ashbrook County 28 46 50 77 82 127 126 167 174 139 137
#> Banana Peel County 25 28 39 49 59 67 108 154 140 149 196
#> Briar Glen County 25 27 32 47 41 40 63 66 45 44 62
#> Cinder Bluff County 14 18 21 37 71 76 98 139 138 142 218
#> Clearfork County 16 27 37 35 54 81 100 128 140 127 191
#> Driftwood County 16 18 28 39 45 74 76 126 137 123 210
#> Mapleford County 7 25 44 52 93 105 100 108 114 95 62
#> Meadowridge County 17 20 29 27 37 46 60 45 41 43 47
#> Moonwhistle County 15 27 35 72 85 89 108 125 134 117 67
#> Northhaven County 10 30 36 47 70 99 119 122 110 114 74
#> Otter Pop County 16 24 32 56 60 71 91 139 142 134 221
#> Pickle Springs County 8 14 21 24 35 27 35 36 41 52 77
#> Pine Hollow County 33 49 67 79 110 111 132 142 158 145 157
#> Silver Run County 15 32 36 48 77 96 109 130 111 109 78
#> Stonefield County 28 41 75 91 109 129 169 172 163 125 144To visualize trends in the proportion of individuals who were
hospitalized over time, we can call the function ts_plot(),
which takes, at minimum, 3 arguments: a formula, in the format
outcome ~ group + time_var, the source data set, and the
name of the column that contains the time period during which each group
implemented the intervention.
staggR::ts_plot(hospitalized ~ county + yr,
df = hosp,
intervention_var = "intervention_yr")Equivalently, we can specify the outcome, group, and time variable
using the parameters y, group, and
time_var.
staggR::ts_plot(y = "hospitalized",
group = "county",
time_var = "yr",
df = hosp,
intervention_var = "intervention_yr")While this visualization is helpful, it excludes counties that did
not implement the intervention. This is because the function
ts_plot() omits any counties where any of y,
group, time_var, (or our formula equivalent,
hospitalized ~ county + yr) or the
intervention_var are NA.
table(hosp$hospitalized, useNA = "always")
#>
#> FALSE TRUE <NA>
#> 18033 13007 0
table(hosp$county, useNA = "always")
#>
#> Ashbrook County Banana Peel County Briar Glen County
#> 2177 2017 2101
#> Cinder Bluff County Clearfork County Driftwood County
#> 1879 1863 1749
#> Mapleford County Meadowridge County Moonwhistle County
#> 2146 1958 2309
#> Northhaven County Otter Pop County Pickle Springs County
#> 2209 1890 1727
#> Pine Hollow County Silver Run County Stonefield County
#> 2297 2327 2391
#> <NA>
#> 0
table(hosp$yr, useNA = "always")
#>
#> 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 <NA>
#> 1104 1592 2059 2569 2996 3385 3632 3677 3539 3006 3481 0
table(hosp$intervention_yr, useNA = "always")
#>
#> 2015 2016 2017 2018 <NA>
#> 6865 4059 9398 1727 8991We can add these counties to the plot if we change NAs
in intervention_yr to some other string value.
hosp2 <- hosp
hosp2$intervention_yr <- ifelse(is.na(hosp2$intervention_yr),
yes = "Comparison group",
no = hosp2$intervention_yr)
county_plot <- staggR::ts_plot(hospitalized ~ county + yr,
df = hosp2,
intervention_var = "intervention_yr")
county_plotIt might be easier to compare the panels in this plot if the X axes
were all aligned at the point of intervention. We can accomplish this by
passing a time-since-intervention (tsi) object to
ts_plot(). We create a tsi object by passing
the hosp2 data set to the id_tsi() function,
and we identify the names of the columns that contain intervention
cohorts, time periods, and the time period during which each cohort
implemented the intervention. The returned tsi object
contains a data frame, which includes columns cohort,
time, intervention_time, and tsi.
Each row corresponds to a unique county-time period. The
tsi column contains an integer that tells us how many time
periods the corresponding time period is away from the intervention. For
example, Ashbrook County belongs to cohort 5, which means the
intervention took place in 2015. If we examine the first row of the
tsi object, we’ll see that the time period 2010 is 5 time
periods prior to the intervention, so tsi has a value of
-5. Values of tsi are negative before the intervention, 0
during the year of intervention, and positive for the years following
the intervention.
hosp_tsi <- staggR::id_tsi(df = hosp2,
cohort_var = "county",
time_var = "yr",
intervention_var = "intervention_yr")
head(hosp_tsi$data, 11)
#> cohort time intervention_time tsi
#> 634 Ashbrook County 2010 2015 -5
#> 96 Ashbrook County 2011 2015 -4
#> 97 Ashbrook County 2012 2015 -3
#> 98 Ashbrook County 2013 2015 -2
#> 99 Ashbrook County 2014 2015 -1
#> 218 Ashbrook County 2015 2015 0
#> 240 Ashbrook County 2016 2015 1
#> 129 Ashbrook County 2017 2015 2
#> 242 Ashbrook County 2018 2015 3
#> 243 Ashbrook County 2019 2015 4
#> 291 Ashbrook County 2020 2015 5Now we can pass hosp_tsi to ts_plot() to
generate a set of plots with time since intervention on the X axis
instead of years.
county_plot <- staggR::ts_plot(hospitalized ~ county + yr,
df = hosp2,
intervention_var = "intervention_yr",
tsi = hosp_tsi)
county_plotWe might like to customize this plot a little bit and use better
labels for the X and Y axes. The function ts_plot() returns
a ggplot object, so we can append any ggplot2
functions to customize the plot.
library(ggplot2)
county_plot +
scale_x_continuous(name = "Years",
breaks = seq(-8, 5, by = 2)) +
scale_y_continuous(name = "Percent hospitalized",
breaks = seq(0, 1, by = 0.2),
labels = function(x) paste0(round(x * 100), "%")) +
theme_classic() +
theme(panel.grid.major.x = element_line(linewidth = 0.5))Now that we have some familiarity with the data, we can fit a
staggered difference-in-differences model. The function for estimating
such a model is called sdid().
The first argument of the function is the regression formula. The
order of terms in the regression formula is important:
sdid() will assume that the first term on the right-hand
side of the formula is the variable that identifies cohorts, and the
second term is the variable that identifies the time period for each
observation. All remaining terms on the right-hand side of the formula
are covariates that we would like to use to adjust our model. The
dependent variable is on the left-hand side of the formula.
Other important parameters include intervention_var,
which specifies the name of the column that contains the time period
during which each cohort implemented the intervention (in our case, this
is intervention_yr), and df, which specifies
the data set.
The call for estimating the staggered difference-in-differences regression, adjusting for age, sex, and comorbitiy, is as follows:
sdid_hosp <- staggR::sdid(hospitalized ~ cohort + yr + age + sex + comorb,
df = hosp,
intervention_var = "intervention_yr")
summary(sdid_hosp)
#>
#> Supplied formula:
#> hospitalized ~ cohort + yr + age + sex + comorb
#>
#> Fitted formula:
#> hospitalized ~ cohort_5 + cohort_6 + cohort_7 + cohort_8 + yr_2011 +
#> yr_2012 + yr_2013 + yr_2014 + yr_2015 + yr_2016 + yr_2017 +
#> yr_2018 + yr_2019 + yr_2020 + cohort_5:yr_2010 + cohort_5:yr_2011 +
#> cohort_5:yr_2012 + cohort_5:yr_2013 + cohort_5:yr_2015 +
#> cohort_5:yr_2016 + cohort_5:yr_2017 + cohort_5:yr_2018 +
#> cohort_5:yr_2019 + cohort_5:yr_2020 + cohort_6:yr_2010 +
#> cohort_6:yr_2011 + cohort_6:yr_2012 + cohort_6:yr_2013 +
#> cohort_6:yr_2014 + cohort_6:yr_2016 + cohort_6:yr_2017 +
#> cohort_6:yr_2018 + cohort_6:yr_2019 + cohort_6:yr_2020 +
#> cohort_7:yr_2010 + cohort_7:yr_2011 + cohort_7:yr_2012 +
#> cohort_7:yr_2013 + cohort_7:yr_2014 + cohort_7:yr_2015 +
#> cohort_7:yr_2017 + cohort_7:yr_2018 + cohort_7:yr_2019 +
#> cohort_7:yr_2020 + cohort_8:yr_2010 + cohort_8:yr_2011 +
#> cohort_8:yr_2012 + cohort_8:yr_2013 + cohort_8:yr_2014 +
#> cohort_8:yr_2015 + cohort_8:yr_2016 + cohort_8:yr_2018 +
#> cohort_8:yr_2019 + cohort_8:yr_2020 + age + sex + comorb
#>
#> Residuals:
#> Min Q1 Median Q3 Max
#> -0.8656 -0.3703 -0.2186 0.473 0.8912
#>
#> Coefficients:
#> term estimate std_error t_value p_value
#> (Intercept) 0.041538404 0.0367618169 1.1299334 0.25851
#> cohort_5 0.085465821 0.0229393129 3.7257359 0.00020 ***
#> cohort_6 -0.157204977 0.0263264217 -5.9713766 2.4e-09 ***
#> cohort_7 0.077020852 0.0197878817 3.8923242 0.00010 ***
#> cohort_8 -0.232052150 0.0358428703 -6.4741509 9.7e-11 ***
#> yr_2011 0.053809624 0.0396503707 1.3571027 0.17476
#> yr_2012 0.022191831 0.0374003952 0.5933582 0.55295
#> yr_2013 0.041736045 0.0361860691 1.1533733 0.24877
#> yr_2014 0.102216988 0.0355358831 2.8764443 0.00402 **
#> yr_2015 0.119986328 0.0351817131 3.4104743 0.00065 ***
#> yr_2016 0.139478844 0.0350227354 3.9825229 0.00007 ***
#> yr_2017 0.181007359 0.0350332547 5.1667297 2.4e-07 ***
#> yr_2018 0.186156454 0.0351558360 5.2951793 1.2e-07 ***
#> yr_2019 0.227572896 0.0356460646 6.3842362 1.7e-10 ***
#> yr_2020 0.218923330 0.0372348645 5.8795254 4.2e-09 ***
#> age 0.005021507 0.0004916025 10.2145669 0.0e+00 ***
#> sexM 0.014436532 0.0052571103 2.7460965 0.00603 **
#> comorb 0.007434585 0.0054038538 1.3757931 0.16890
#> cohort_5:yr_2010 -0.073298286 0.0463075431 -1.5828584 0.11346
#> cohort_5:yr_2011 -0.072496499 0.0390939557 -1.8544171 0.06369 .
#> cohort_5:yr_2012 0.007407740 0.0357130944 0.2074236 0.83568
#> cohort_5:yr_2013 0.018351888 0.0335827224 0.5464681 0.58475
#> cohort_5:yr_2015 0.032452652 0.0316589301 1.0250710 0.30534
#> cohort_5:yr_2016 0.074734676 0.0313560797 2.3834190 0.01716 *
#> cohort_5:yr_2017 0.112519746 0.0314518730 3.5775213 0.00035 ***
#> cohort_5:yr_2018 0.182252536 0.0319802129 5.6989156 1.2e-08 ***
#> cohort_5:yr_2019 0.186210308 0.0338471964 5.5014987 3.8e-08 ***
#> cohort_5:yr_2020 0.261743731 0.0355565320 7.3613403 1.9e-13 ***
#> cohort_6:yr_2010 0.192407710 0.0553397293 3.4768459 0.00051 ***
#> cohort_6:yr_2011 0.081741042 0.0466185244 1.7534026 0.07954 .
#> cohort_6:yr_2012 0.126187949 0.0428912399 2.9420448 0.00326 **
#> cohort_6:yr_2013 0.102161820 0.0399766017 2.5555404 0.01061 *
#> cohort_6:yr_2014 0.016850413 0.0382018594 0.4410888 0.65915
#> cohort_6:yr_2016 0.037963212 0.0364675056 1.0410148 0.29788
#> cohort_6:yr_2017 -0.020106968 0.0367710526 -0.5468151 0.58451
#> cohort_6:yr_2018 -0.067739307 0.0372930932 -1.8164036 0.06932 .
#> cohort_6:yr_2019 -0.074178123 0.0387938325 -1.9121112 0.05587 .
#> cohort_6:yr_2020 -0.083133234 0.0381499590 -2.1791173 0.02933 *
#> cohort_7:yr_2010 -0.027712569 0.0458969192 -0.6038002 0.54598
#> cohort_7:yr_2011 -0.098466031 0.0375474688 -2.6224412 0.00873 **
#> cohort_7:yr_2012 -0.049054517 0.0336942562 -1.4558718 0.14544
#> cohort_7:yr_2013 -0.039127782 0.0311998428 -1.2541019 0.20981
#> cohort_7:yr_2014 -0.055261507 0.0296767494 -1.8621145 0.06260 .
#> cohort_7:yr_2015 -0.042251185 0.0285647712 -1.4791361 0.13911
#> cohort_7:yr_2017 0.124607104 0.0277761427 4.4861198 7.3e-06 ***
#> cohort_7:yr_2018 0.131896098 0.0279532596 4.7184514 2.4e-06 ***
#> cohort_7:yr_2019 0.157899398 0.0290538400 5.4347170 5.5e-08 ***
#> cohort_7:yr_2020 0.192719914 0.0297778430 6.4719232 9.8e-11 ***
#> cohort_8:yr_2010 0.169800222 0.0812808931 2.0890546 0.03671 *
#> cohort_8:yr_2011 0.162146753 0.0708023809 2.2901314 0.02202 *
#> cohort_8:yr_2012 0.229016705 0.0645212698 3.5494761 0.00039 ***
#> cohort_8:yr_2013 0.181452224 0.0596286196 3.0430391 0.00234 **
#> cohort_8:yr_2014 0.161011028 0.0560541472 2.8724195 0.00408 **
#> cohort_8:yr_2015 0.046657219 0.0533463636 0.8746092 0.38179
#> cohort_8:yr_2016 0.054386841 0.0519756746 1.0463903 0.29539
#> cohort_8:yr_2018 0.008099328 0.0501890641 0.1613763 0.87180
#> cohort_8:yr_2019 0.035236092 0.0511142067 0.6893600 0.49060
#> cohort_8:yr_2020 -0.003491331 0.0471774065 -0.0740043 0.94101
#>
#> Significance codes: < 0.001: '***'; < 0.01: '**'; < 0.05: '*'; < 0.1: '.'
#> Residual standard error: 0.4623 on 30982 degrees of freedom
#> R^2: 0.122084901176252; Adjusted R^2: 0.120469732348128Although we supplied a formula to fit this model, it is important to
recognize that this formula is specified according to a convenience
syntax, which assumes that we want to fit a Sun and Abraham-style
staggered difference-in-differences model, rather than the very simple
model implied by our formula. If we would like to see the formula passed
to lm() to fit the staggered difference-in-differences
model, we can examine the components of the sdid_hosp
object.
names(sdid_hosp)
#> [1] "mdl" "formula" "vcov" "tsi"
#> [5] "obs_cnt" "cohort" "time" "intervention_var"
#> [9] "covariates"
sdid_hosp$formula
#> $supplied
#> hospitalized ~ cohort + yr + age + sex + comorb
#>
#> $fitted
#> hospitalized ~ cohort_5 + cohort_6 + cohort_7 + cohort_8 + yr_2011 +
#> yr_2012 + yr_2013 + yr_2014 + yr_2015 + yr_2016 + yr_2017 +
#> yr_2018 + yr_2019 + yr_2020 + cohort_5:yr_2010 + cohort_5:yr_2011 +
#> cohort_5:yr_2012 + cohort_5:yr_2013 + cohort_5:yr_2015 +
#> cohort_5:yr_2016 + cohort_5:yr_2017 + cohort_5:yr_2018 +
#> cohort_5:yr_2019 + cohort_5:yr_2020 + cohort_6:yr_2010 +
#> cohort_6:yr_2011 + cohort_6:yr_2012 + cohort_6:yr_2013 +
#> cohort_6:yr_2014 + cohort_6:yr_2016 + cohort_6:yr_2017 +
#> cohort_6:yr_2018 + cohort_6:yr_2019 + cohort_6:yr_2020 +
#> cohort_7:yr_2010 + cohort_7:yr_2011 + cohort_7:yr_2012 +
#> cohort_7:yr_2013 + cohort_7:yr_2014 + cohort_7:yr_2015 +
#> cohort_7:yr_2017 + cohort_7:yr_2018 + cohort_7:yr_2019 +
#> cohort_7:yr_2020 + cohort_8:yr_2010 + cohort_8:yr_2011 +
#> cohort_8:yr_2012 + cohort_8:yr_2013 + cohort_8:yr_2014 +
#> cohort_8:yr_2015 + cohort_8:yr_2016 + cohort_8:yr_2018 +
#> cohort_8:yr_2019 + cohort_8:yr_2020 + age + sex + comorb
#> <environment: 0x11f44b408>Here we see both the supplied and fitted
formulas. The fitted formula includes main effects for each
cohort, time periods, and interaction terms for each cohort-time period
combination. You might notice that these terms do not exist in our
supplied data.frame; this is because sdid() calls
prep_data() to create all necessary dummy variables. These
are needed because we have to be very explicit about which levels of
each cohort and time period variable are used as referents in the model,
and these referent levels differ for main effects and for each set of
interaction terms.
By default, sdid() assumes that the first observed
levels of both the cohort column and the time-period column should
function as the referents for the main effects. This is the same as with
lm() and most other modeling functions in R. For the long
list of cohort-time period interactions, however, sdid()
assumes that the time period immediately preceding the intervention time
period (defined by intervention_var) should be the
referent. In our example, that means that the interaction terms for
cohort_5 should omit 2014 as the referent time period,
because that cohort’s intervention took place in 2015. Similarly, the
interaction terms for cohort_6 should omit 2015 as the
referent time period, because that cohort’s intervention took place in
2016. Indeed, we observe that the formula has interaction terms
cohort_5:yr_2010 through cohort_5:yr_2020,
with cohort_5_2014 omitted for identification.
We can use the cohort_time_refs parameter to specify a
different time period to function as the referent for cohort-time
interactions:
sdid_hosp_earlier_timeref <- staggR::sdid(hospitalized ~ cohort + yr + age + sex + comorb,
df = hosp,
cohort_time_refs = list(`5` = "2013",
`6` = "2014",
`7` = "2015",
`8` = "2016"),
intervention_var = "intervention_yr")We can now proceed with the second, post-estimation step: translating difference-in-differences estimates into average estimates.
Before we can calculate coefficient averages, we first need to
identify the coefficients we want to combine. We do this by supplying
the sdid object and either “pre” or “post” to the
select_period() function to retrieve the coefficients for
interaction terms corresponding to the pre-intervention or
post-intervention periods, respectively.
# Identify the coefficients that compose the effect of the intervention during the
# post-intervention period
(post_coefs <- staggR::select_period(sdid = sdid_hosp,
period = "post"))
#> [1] "cohort_5:yr_2015" "cohort_5:yr_2016" "cohort_5:yr_2017" "cohort_5:yr_2018"
#> [5] "cohort_5:yr_2019" "cohort_5:yr_2020" "cohort_6:yr_2016" "cohort_6:yr_2017"
#> [9] "cohort_6:yr_2018" "cohort_6:yr_2019" "cohort_6:yr_2020" "cohort_7:yr_2017"
#> [13] "cohort_7:yr_2018" "cohort_7:yr_2019" "cohort_7:yr_2020" "cohort_8:yr_2018"
#> [17] "cohort_8:yr_2019" "cohort_8:yr_2020"Now that we’ve identified the coefficients of interest, we will pass
the sdid object and the selected coefficients to the
ave_coeff() function, which calculates averages of any
specified set of regression coefficients, along with the corresponding
standard errors.
To demonstrate how this function works, we first calculate the average effect of the intervention during the post-intervention period.
staggR::ave_coeff(sdid = sdid_hosp,
coefs = post_coefs)
#> est se pval sign lb ub n
#> 1 0.1000987 0.01398148 8.118538e-13 *** 0.07269504 0.1275024 11745The result shows that the intervention is associated with an increase in the probability of hospitalization.
Next, we can use the ave_coeff() function to examine
whether the risk of hospitalization differs by treatment group during
the pre-intervention period. For that, we pass “pre” to the
period argument, as follows:
# Identify the coefficients that represent the difference between exposure groups
# during the pre-intervention period
(pre_coefs <- staggR::select_period(sdid = sdid_hosp,
period = "pre"))
#> [1] "cohort_5:yr_2010" "cohort_5:yr_2011" "cohort_5:yr_2012" "cohort_5:yr_2013"
#> [5] "cohort_6:yr_2010" "cohort_6:yr_2011" "cohort_6:yr_2012" "cohort_6:yr_2013"
#> [9] "cohort_6:yr_2014" "cohort_7:yr_2010" "cohort_7:yr_2011" "cohort_7:yr_2012"
#> [13] "cohort_7:yr_2013" "cohort_7:yr_2014" "cohort_7:yr_2015" "cohort_8:yr_2010"
#> [17] "cohort_8:yr_2011" "cohort_8:yr_2012" "cohort_8:yr_2013" "cohort_8:yr_2014"
#> [21] "cohort_8:yr_2015" "cohort_8:yr_2016"
staggR::ave_coeff(sdid = sdid_hosp,
coefs = pre_coefs)
#> est se pval sign lb ub n
#> 1 -0.001761013 0.01451994 0.8853825 -0.03022009 0.02669807 7929Here we see risk of hospitalization during the pre-intervention period does not significantly differ by intervention status.
We can also examine whether there exists heterogeneity in
difference-in-differences estimates between various cohorts, using the
cohorts parameter in the select_period()
function. For instance, we could calculate the average of
difference-in-differences estimates for the post-intervention period for
cohorts 5 and 6.
# Identify the coefficients for the effect of the intervention on cohorts 5 and 6
# during the post-intervention period
(post_coefs_cohort5 <- staggR::select_period(sdid = sdid_hosp,
period = "post",
cohorts = c("5", "6")))
#> [1] "cohort_5:yr_2015" "cohort_5:yr_2016" "cohort_5:yr_2017" "cohort_5:yr_2018"
#> [5] "cohort_5:yr_2019" "cohort_5:yr_2020" "cohort_6:yr_2016" "cohort_6:yr_2017"
#> [9] "cohort_6:yr_2018" "cohort_6:yr_2019" "cohort_6:yr_2020"
staggR::ave_coeff(sdid = sdid_hosp,
coefs = post_coefs_cohort5)
#> est se pval sign lb ub n
#> 1 0.07151805 0.01901505 0.0001660672 *** 0.03424855 0.1087875 6376We can similarly calculate the average of difference-in-differences estimates for the post-intervention period for cohorts 7 and 8.
# Identify the coefficients for the effect of the intervention on cohorts 7 and 8
# during the post-intervention period
(post_coefs_cohort8 <- staggR::select_period(sdid = sdid_hosp,
period = "post",
cohorts = c("7", "8")))
#> [1] "cohort_7:yr_2017" "cohort_7:yr_2018" "cohort_7:yr_2019" "cohort_7:yr_2020"
#> [5] "cohort_8:yr_2018" "cohort_8:yr_2019" "cohort_8:yr_2020"
staggR::ave_coeff(sdid = sdid_hosp,
coefs = post_coefs_cohort8)
#> est se pval sign lb ub n
#> 1 0.13404 0.02030546 4.061829e-11 *** 0.09424126 0.1738386 5369Comparing these averages shows that the association between the intervention and probability of hospitalization for the last two cohorts is almost twice as large compared to the first two cohorts.
If we want to aggregate a set of coefficients other than those for
the pre- or post-intervention periods, we select our coefficients using
select_terms() instead of select_period(). We
supply a named list containing values for cohorts and
either times or tsi. The cohorts
element should contain a character vector of cohorts; if we omit the
cohorts element, the function will select all available
cohorts. The times element contains a character vector of
time periods as they are recorded in the time variable supplied to
sdid(). If we would prefer to select time periods relative
to the intervention period, we would instead use the tsi
element, which contains a vector of integers representing the number of
units of time relative to each cohort’s intervention.
For example, we could calculate the added risk of hospitalization associated with the intervention across all cohorts for the year 2018.
# Identify the coefficients corresponding to all cohorts in 2018
(terms_2018 <- staggR::select_terms(sdid = sdid_hosp,
selection = list(times = "2018")))
#> [1] "cohort_5:yr_2018" "cohort_6:yr_2018" "cohort_7:yr_2018" "cohort_8:yr_2018"Then we can pass the selected terms to the ave_coeff()
function to aggregate the 2018 effect across all cohorts.
staggR::ave_coeff(sdid = sdid_hosp,
coefs = terms_2018)
#> est se pval sign lb ub n
#> 1 0.1012555 0.02090127 1.25012e-06 *** 0.06028899 0.142222 2441If we are interested in the added risk of hospitalization associated
with the intervention in the year 2018, but only for the first two
cohorts (5 and 6), we could include a cohorts element in
the selection list.
(terms_2018_cohorts56 <- staggR::select_terms(sdid = sdid_hosp,
selection = list(cohorts = c("5", "6"),
times = "2018")))
#> [1] "cohort_5:yr_2018" "cohort_6:yr_2018"
staggR::ave_coeff(sdid = sdid_hosp,
coefs = terms_2018_cohorts56)
#> est se pval sign lb ub n
#> 1 0.08850559 0.02620059 0.0007164136 *** 0.03715245 0.1398587 1136Alternatively, if we already know exactly which coefficients we want
to aggregate, we can simply list them in the coefs
argument.
(terms_custom <- staggR::select_terms(sdid = sdid_hosp,
coefs = c("cohort_5:yr_2018", "cohort_6:yr_2018")))
#> [1] "cohort_5:yr_2018" "cohort_6:yr_2018"
staggR::ave_coeff(sdid = sdid_hosp,
coefs = terms_custom)
#> est se pval sign lb ub n
#> 1 0.08850559 0.02620059 0.0007164136 *** 0.03715245 0.1398587 1136Because outcomes within the same unit (in our case, county) are often correlated over time, assuming independent errors could understate uncertainty. Clustering standard errors allows for correlation within clusters while maintaining independence across them. Following Sun and Abraham1, we can supply a custom variance–covariance function to compute standard errors clustered at the county level.
# Fit SDID model with standard errors clustered at the county level
sdid_hosp <- staggR::sdid(hospitalized ~ cohort + yr + age + sex + comorb,
df = hosp,
intervention_var = "intervention_yr",
.vcov = sandwich::vcovCL,
cluster = hosp$county)
summary(sdid_hosp)
#>
#> Supplied formula:
#> hospitalized ~ cohort + yr + age + sex + comorb
#>
#> Fitted formula:
#> hospitalized ~ cohort_5 + cohort_6 + cohort_7 + cohort_8 + yr_2011 +
#> yr_2012 + yr_2013 + yr_2014 + yr_2015 + yr_2016 + yr_2017 +
#> yr_2018 + yr_2019 + yr_2020 + cohort_5:yr_2010 + cohort_5:yr_2011 +
#> cohort_5:yr_2012 + cohort_5:yr_2013 + cohort_5:yr_2015 +
#> cohort_5:yr_2016 + cohort_5:yr_2017 + cohort_5:yr_2018 +
#> cohort_5:yr_2019 + cohort_5:yr_2020 + cohort_6:yr_2010 +
#> cohort_6:yr_2011 + cohort_6:yr_2012 + cohort_6:yr_2013 +
#> cohort_6:yr_2014 + cohort_6:yr_2016 + cohort_6:yr_2017 +
#> cohort_6:yr_2018 + cohort_6:yr_2019 + cohort_6:yr_2020 +
#> cohort_7:yr_2010 + cohort_7:yr_2011 + cohort_7:yr_2012 +
#> cohort_7:yr_2013 + cohort_7:yr_2014 + cohort_7:yr_2015 +
#> cohort_7:yr_2017 + cohort_7:yr_2018 + cohort_7:yr_2019 +
#> cohort_7:yr_2020 + cohort_8:yr_2010 + cohort_8:yr_2011 +
#> cohort_8:yr_2012 + cohort_8:yr_2013 + cohort_8:yr_2014 +
#> cohort_8:yr_2015 + cohort_8:yr_2016 + cohort_8:yr_2018 +
#> cohort_8:yr_2019 + cohort_8:yr_2020 + age + sex + comorb
#>
#> Residuals:
#> Min Q1 Median Q3 Max
#> -0.8656 -0.3703 -0.2186 0.473 0.8912
#>
#> Coefficients:
#> term estimate std_error t_value p_value
#> (Intercept) 0.041538404 0.045267775 0.9176153 0.35883
#> cohort_5 0.085465821 0.027716214 3.0836037 0.00205 **
#> cohort_6 -0.157204977 0.017866138 -8.7990466 0.0e+00 ***
#> cohort_7 0.077020852 0.018400409 4.1858228 0.00003 ***
#> cohort_8 -0.232052150 0.004451773 -52.1257805 0.0e+00 ***
#> yr_2011 0.053809624 0.026875736 2.0021637 0.04528 *
#> yr_2012 0.022191831 0.047318078 0.4689927 0.63908
#> yr_2013 0.041736045 0.034298156 1.2168597 0.22367
#> yr_2014 0.102216988 0.034984090 2.9218136 0.00348 **
#> yr_2015 0.119986328 0.037058526 3.2377523 0.00121 **
#> yr_2016 0.139478844 0.035277808 3.9537277 0.00008 ***
#> yr_2017 0.181007359 0.028238688 6.4099068 1.5e-10 ***
#> yr_2018 0.186156454 0.040668647 4.5773948 4.7e-06 ***
#> yr_2019 0.227572896 0.031695092 7.1800674 7.1e-13 ***
#> yr_2020 0.218923330 0.030912111 7.0821216 1.4e-12 ***
#> age 0.005021507 0.000624192 8.0448109 8.9e-16 ***
#> sexM 0.014436532 0.004693948 3.0755627 0.00210 **
#> comorb 0.007434585 0.004245490 1.7511723 0.07993 .
#> cohort_5:yr_2010 -0.073298286 0.046712037 -1.5691520 0.11662
#> cohort_5:yr_2011 -0.072496499 0.043114601 -1.6814837 0.09268 .
#> cohort_5:yr_2012 0.007407740 0.029246140 0.2532895 0.80005
#> cohort_5:yr_2013 0.018351888 0.035385632 0.5186254 0.60403
#> cohort_5:yr_2015 0.032452652 0.046848221 0.6927190 0.48849
#> cohort_5:yr_2016 0.074734676 0.039220635 1.9054938 0.05673 .
#> cohort_5:yr_2017 0.112519746 0.039615253 2.8403137 0.00451 **
#> cohort_5:yr_2018 0.182252536 0.033255362 5.4803954 4.3e-08 ***
#> cohort_5:yr_2019 0.186210308 0.019756605 9.4252178 0.0e+00 ***
#> cohort_5:yr_2020 0.261743731 0.043226983 6.0551007 1.4e-09 ***
#> cohort_6:yr_2010 0.192407710 0.039805182 4.8337353 1.3e-06 ***
#> cohort_6:yr_2011 0.081741042 0.021310552 3.8357074 0.00013 ***
#> cohort_6:yr_2012 0.126187949 0.016936012 7.4508657 9.5e-14 ***
#> cohort_6:yr_2013 0.102161820 0.055195574 1.8509060 0.06419 .
#> cohort_6:yr_2014 0.016850413 0.018629983 0.9044782 0.36575
#> cohort_6:yr_2016 0.037963212 0.015173891 2.5018772 0.01236 *
#> cohort_6:yr_2017 -0.020106968 0.046378986 -0.4335362 0.66463
#> cohort_6:yr_2018 -0.067739307 0.024751017 -2.7368292 0.00621 **
#> cohort_6:yr_2019 -0.074178123 0.027898187 -2.6588869 0.00784 **
#> cohort_6:yr_2020 -0.083133234 0.044718804 -1.8590219 0.06303 .
#> cohort_7:yr_2010 -0.027712569 0.039950251 -0.6936770 0.48789
#> cohort_7:yr_2011 -0.098466031 0.025944113 -3.7953131 0.00015 ***
#> cohort_7:yr_2012 -0.049054517 0.030866908 -1.5892268 0.11202
#> cohort_7:yr_2013 -0.039127782 0.037834688 -1.0341775 0.30106
#> cohort_7:yr_2014 -0.055261507 0.028025632 -1.9718202 0.04864 *
#> cohort_7:yr_2015 -0.042251185 0.029956121 -1.4104358 0.15842
#> cohort_7:yr_2017 0.124607104 0.013629312 9.1425824 0.0e+00 ***
#> cohort_7:yr_2018 0.131896098 0.031705189 4.1600793 0.00003 ***
#> cohort_7:yr_2019 0.157899398 0.015434476 10.2303053 0.0e+00 ***
#> cohort_7:yr_2020 0.192719914 0.037622842 5.1224178 3.0e-07 ***
#> cohort_8:yr_2010 0.169800222 0.029106010 5.8338542 5.5e-09 ***
#> cohort_8:yr_2011 0.162146753 0.014472928 11.2034516 0.0e+00 ***
#> cohort_8:yr_2012 0.229016705 0.024229351 9.4520362 0.0e+00 ***
#> cohort_8:yr_2013 0.181452224 0.018408752 9.8568452 0.0e+00 ***
#> cohort_8:yr_2014 0.161011028 0.012653160 12.7249656 0.0e+00 ***
#> cohort_8:yr_2015 0.046657219 0.010565894 4.4158325 0.00001 ***
#> cohort_8:yr_2016 0.054386841 0.007691923 7.0706431 1.6e-12 ***
#> cohort_8:yr_2018 0.008099328 0.014228755 0.5692225 0.56921
#> cohort_8:yr_2019 0.035236092 0.004283812 8.2254050 2.2e-16 ***
#> cohort_8:yr_2020 -0.003491331 0.016389585 -0.2130213 0.83131
#>
#> Significance codes: < 0.001: '***'; < 0.01: '**'; < 0.05: '*'; < 0.1: '.'
#> Residual standard error: 0.4623 on 30982 degrees of freedom
#> R^2: 0.122084901176252; Adjusted R^2: 0.120469732348128
# Examine the association between the intervention and risk of hospitalization in
# the post-intervention period
staggR::ave_coeff(sdid = sdid_hosp,
coefs = staggR::select_period(sdid = sdid_hosp,
period = "post"))
#> est se pval sign lb ub n
#> 1 0.1000987 0.01588058 2.895417e-10 *** 0.06897278 0.1312247 11745Data sets containing staggered interventions can sometimes be very large, so it might be preferable to fit the model using aggregated data. To this end, we created an alternate version of the data set that has been aggregated to the county-year level.
data("hosp_agg", package = "staggR")
head(hosp_agg)
#> # A data frame: 6 × 9
#> yr county cohort intervention_yr pct_hospitalized n_enr mean_age pct_fem
#> * <chr> <chr> <chr> <chr> <dbl> <int> <dbl> <dbl>
#> 1 2010 Ashbrook… 5 2015 0.259 108 35.1 0.5
#> 2 2011 Ashbrook… 5 2015 0.348 132 35.5 0.5
#> 3 2012 Ashbrook… 5 2015 0.314 159 35.8 0.465
#> 4 2013 Ashbrook… 5 2015 0.391 197 36.2 0.462
#> 5 2014 Ashbrook… 5 2015 0.373 220 36.3 0.473
#> 6 2015 Ashbrook… 5 2015 0.518 245 36.7 0.449
#> # ℹ 1 more variable: pct_cmb <dbl>
# This data set contains one row per county-year
nrow(hosp_agg); nrow(unique(hosp_agg[,c("yr", "county")]))
#> [1] 165
#> [1] 165As before, we can use the function sdid() to fit the
difference-in-differences regression model. The regression formula now
uses the variable representing the percent of hospitalized individuals
in each county-year (pct_hospitalized) as the dependent
variable, and the covariates have been aggregated (i.e.,
mean_age, pct_fem, and pct_cmb).
The main difference is that we need to specify the weights
argument for the number of observations contained within each row of the
data set, which is represented by the n_enr column.
sdid_hosp_agg <- staggR::sdid(pct_hospitalized ~ cohort + yr + mean_age + pct_fem + pct_cmb,
df = hosp_agg,
weights = "n_enr",
intervention_var = "intervention_yr",
# Cluster standard errors at the county level
.vcov = sandwich::vcovCL,
cluster = hosp_agg$county)Aggregating means that we decoupled values of these covariates from
outcomes within county-years, so coefficients are slightly different but
substantially similar compared to the regression using individual-level
data. We can examine differences between models by summarizing them, as
we would with a lm or glm model.
# Summarize the original model
summary(sdid_hosp)
#>
#> Supplied formula:
#> hospitalized ~ cohort + yr + age + sex + comorb
#>
#> Fitted formula:
#> hospitalized ~ cohort_5 + cohort_6 + cohort_7 + cohort_8 + yr_2011 +
#> yr_2012 + yr_2013 + yr_2014 + yr_2015 + yr_2016 + yr_2017 +
#> yr_2018 + yr_2019 + yr_2020 + cohort_5:yr_2010 + cohort_5:yr_2011 +
#> cohort_5:yr_2012 + cohort_5:yr_2013 + cohort_5:yr_2015 +
#> cohort_5:yr_2016 + cohort_5:yr_2017 + cohort_5:yr_2018 +
#> cohort_5:yr_2019 + cohort_5:yr_2020 + cohort_6:yr_2010 +
#> cohort_6:yr_2011 + cohort_6:yr_2012 + cohort_6:yr_2013 +
#> cohort_6:yr_2014 + cohort_6:yr_2016 + cohort_6:yr_2017 +
#> cohort_6:yr_2018 + cohort_6:yr_2019 + cohort_6:yr_2020 +
#> cohort_7:yr_2010 + cohort_7:yr_2011 + cohort_7:yr_2012 +
#> cohort_7:yr_2013 + cohort_7:yr_2014 + cohort_7:yr_2015 +
#> cohort_7:yr_2017 + cohort_7:yr_2018 + cohort_7:yr_2019 +
#> cohort_7:yr_2020 + cohort_8:yr_2010 + cohort_8:yr_2011 +
#> cohort_8:yr_2012 + cohort_8:yr_2013 + cohort_8:yr_2014 +
#> cohort_8:yr_2015 + cohort_8:yr_2016 + cohort_8:yr_2018 +
#> cohort_8:yr_2019 + cohort_8:yr_2020 + age + sex + comorb
#>
#> Residuals:
#> Min Q1 Median Q3 Max
#> -0.8656 -0.3703 -0.2186 0.473 0.8912
#>
#> Coefficients:
#> term estimate std_error t_value p_value
#> (Intercept) 0.041538404 0.045267775 0.9176153 0.35883
#> cohort_5 0.085465821 0.027716214 3.0836037 0.00205 **
#> cohort_6 -0.157204977 0.017866138 -8.7990466 0.0e+00 ***
#> cohort_7 0.077020852 0.018400409 4.1858228 0.00003 ***
#> cohort_8 -0.232052150 0.004451773 -52.1257805 0.0e+00 ***
#> yr_2011 0.053809624 0.026875736 2.0021637 0.04528 *
#> yr_2012 0.022191831 0.047318078 0.4689927 0.63908
#> yr_2013 0.041736045 0.034298156 1.2168597 0.22367
#> yr_2014 0.102216988 0.034984090 2.9218136 0.00348 **
#> yr_2015 0.119986328 0.037058526 3.2377523 0.00121 **
#> yr_2016 0.139478844 0.035277808 3.9537277 0.00008 ***
#> yr_2017 0.181007359 0.028238688 6.4099068 1.5e-10 ***
#> yr_2018 0.186156454 0.040668647 4.5773948 4.7e-06 ***
#> yr_2019 0.227572896 0.031695092 7.1800674 7.1e-13 ***
#> yr_2020 0.218923330 0.030912111 7.0821216 1.4e-12 ***
#> age 0.005021507 0.000624192 8.0448109 8.9e-16 ***
#> sexM 0.014436532 0.004693948 3.0755627 0.00210 **
#> comorb 0.007434585 0.004245490 1.7511723 0.07993 .
#> cohort_5:yr_2010 -0.073298286 0.046712037 -1.5691520 0.11662
#> cohort_5:yr_2011 -0.072496499 0.043114601 -1.6814837 0.09268 .
#> cohort_5:yr_2012 0.007407740 0.029246140 0.2532895 0.80005
#> cohort_5:yr_2013 0.018351888 0.035385632 0.5186254 0.60403
#> cohort_5:yr_2015 0.032452652 0.046848221 0.6927190 0.48849
#> cohort_5:yr_2016 0.074734676 0.039220635 1.9054938 0.05673 .
#> cohort_5:yr_2017 0.112519746 0.039615253 2.8403137 0.00451 **
#> cohort_5:yr_2018 0.182252536 0.033255362 5.4803954 4.3e-08 ***
#> cohort_5:yr_2019 0.186210308 0.019756605 9.4252178 0.0e+00 ***
#> cohort_5:yr_2020 0.261743731 0.043226983 6.0551007 1.4e-09 ***
#> cohort_6:yr_2010 0.192407710 0.039805182 4.8337353 1.3e-06 ***
#> cohort_6:yr_2011 0.081741042 0.021310552 3.8357074 0.00013 ***
#> cohort_6:yr_2012 0.126187949 0.016936012 7.4508657 9.5e-14 ***
#> cohort_6:yr_2013 0.102161820 0.055195574 1.8509060 0.06419 .
#> cohort_6:yr_2014 0.016850413 0.018629983 0.9044782 0.36575
#> cohort_6:yr_2016 0.037963212 0.015173891 2.5018772 0.01236 *
#> cohort_6:yr_2017 -0.020106968 0.046378986 -0.4335362 0.66463
#> cohort_6:yr_2018 -0.067739307 0.024751017 -2.7368292 0.00621 **
#> cohort_6:yr_2019 -0.074178123 0.027898187 -2.6588869 0.00784 **
#> cohort_6:yr_2020 -0.083133234 0.044718804 -1.8590219 0.06303 .
#> cohort_7:yr_2010 -0.027712569 0.039950251 -0.6936770 0.48789
#> cohort_7:yr_2011 -0.098466031 0.025944113 -3.7953131 0.00015 ***
#> cohort_7:yr_2012 -0.049054517 0.030866908 -1.5892268 0.11202
#> cohort_7:yr_2013 -0.039127782 0.037834688 -1.0341775 0.30106
#> cohort_7:yr_2014 -0.055261507 0.028025632 -1.9718202 0.04864 *
#> cohort_7:yr_2015 -0.042251185 0.029956121 -1.4104358 0.15842
#> cohort_7:yr_2017 0.124607104 0.013629312 9.1425824 0.0e+00 ***
#> cohort_7:yr_2018 0.131896098 0.031705189 4.1600793 0.00003 ***
#> cohort_7:yr_2019 0.157899398 0.015434476 10.2303053 0.0e+00 ***
#> cohort_7:yr_2020 0.192719914 0.037622842 5.1224178 3.0e-07 ***
#> cohort_8:yr_2010 0.169800222 0.029106010 5.8338542 5.5e-09 ***
#> cohort_8:yr_2011 0.162146753 0.014472928 11.2034516 0.0e+00 ***
#> cohort_8:yr_2012 0.229016705 0.024229351 9.4520362 0.0e+00 ***
#> cohort_8:yr_2013 0.181452224 0.018408752 9.8568452 0.0e+00 ***
#> cohort_8:yr_2014 0.161011028 0.012653160 12.7249656 0.0e+00 ***
#> cohort_8:yr_2015 0.046657219 0.010565894 4.4158325 0.00001 ***
#> cohort_8:yr_2016 0.054386841 0.007691923 7.0706431 1.6e-12 ***
#> cohort_8:yr_2018 0.008099328 0.014228755 0.5692225 0.56921
#> cohort_8:yr_2019 0.035236092 0.004283812 8.2254050 2.2e-16 ***
#> cohort_8:yr_2020 -0.003491331 0.016389585 -0.2130213 0.83131
#>
#> Significance codes: < 0.001: '***'; < 0.01: '**'; < 0.05: '*'; < 0.1: '.'
#> Residual standard error: 0.4623 on 30982 degrees of freedom
#> R^2: 0.122084901176252; Adjusted R^2: 0.120469732348128
# Summarize the model based on aggregated data
summary(sdid_hosp_agg)
#>
#> Supplied formula:
#> pct_hospitalized ~ cohort + yr + mean_age + pct_fem + pct_cmb
#>
#> Fitted formula:
#> pct_hospitalized ~ cohort_5 + cohort_6 + cohort_7 + cohort_8 +
#> yr_2011 + yr_2012 + yr_2013 + yr_2014 + yr_2015 + yr_2016 +
#> yr_2017 + yr_2018 + yr_2019 + yr_2020 + cohort_5:yr_2010 +
#> cohort_5:yr_2011 + cohort_5:yr_2012 + cohort_5:yr_2013 +
#> cohort_5:yr_2015 + cohort_5:yr_2016 + cohort_5:yr_2017 +
#> cohort_5:yr_2018 + cohort_5:yr_2019 + cohort_5:yr_2020 +
#> cohort_6:yr_2010 + cohort_6:yr_2011 + cohort_6:yr_2012 +
#> cohort_6:yr_2013 + cohort_6:yr_2014 + cohort_6:yr_2016 +
#> cohort_6:yr_2017 + cohort_6:yr_2018 + cohort_6:yr_2019 +
#> cohort_6:yr_2020 + cohort_7:yr_2010 + cohort_7:yr_2011 +
#> cohort_7:yr_2012 + cohort_7:yr_2013 + cohort_7:yr_2014 +
#> cohort_7:yr_2015 + cohort_7:yr_2017 + cohort_7:yr_2018 +
#> cohort_7:yr_2019 + cohort_7:yr_2020 + cohort_8:yr_2010 +
#> cohort_8:yr_2011 + cohort_8:yr_2012 + cohort_8:yr_2013 +
#> cohort_8:yr_2014 + cohort_8:yr_2015 + cohort_8:yr_2016 +
#> cohort_8:yr_2018 + cohort_8:yr_2019 + cohort_8:yr_2020 +
#> mean_age + pct_fem + pct_cmb
#>
#> Residuals:
#> Min Q1 Median Q3 Max
#> -1.1314 -0.2452 0 0.2286 0.9615
#>
#> Coefficients:
#> term estimate std_error t_value p_value
#> (Intercept) 0.228693901 0.322881661 0.70829015 0.48031
#> cohort_5 0.090752122 0.035935625 2.52540820 0.01302 *
#> cohort_6 -0.161546669 0.020864811 -7.74254183 5.8e-12 ***
#> cohort_7 0.074689254 0.023650709 3.15801334 0.00206 **
#> cohort_8 -0.238704323 0.010795223 -22.11203182 0.0e+00 ***
#> yr_2011 0.053217642 0.032687929 1.62805182 0.10645
#> yr_2012 0.025658025 0.060724164 0.42253403 0.67348
#> yr_2013 0.048676642 0.047491057 1.02496438 0.30769
#> yr_2014 0.111307673 0.048899012 2.27627652 0.02482 *
#> yr_2015 0.131778714 0.053450280 2.46544479 0.01527 *
#> yr_2016 0.152113979 0.051690073 2.94280833 0.00399 **
#> yr_2017 0.196827024 0.047779418 4.11949397 0.00008 ***
#> yr_2018 0.200525981 0.060793374 3.29848416 0.00132 **
#> yr_2019 0.241053642 0.048216896 4.99936048 2.3e-06 ***
#> yr_2020 0.227642925 0.041266317 5.51643430 2.4e-07 ***
#> mean_age -0.001117751 0.008708643 -0.12834957 0.89811
#> pct_fem -0.001137132 0.089179306 -0.01275107 0.98985
#> pct_cmb 0.064343082 0.080361840 0.80066711 0.42510
#> cohort_5:yr_2010 -0.078048485 0.057633107 -1.35423004 0.17852
#> cohort_5:yr_2011 -0.073868461 0.056009708 -1.31885103 0.19003
#> cohort_5:yr_2012 0.006914643 0.036732333 0.18824404 0.85104
#> cohort_5:yr_2013 0.016749498 0.043164816 0.38803589 0.69876
#> cohort_5:yr_2015 0.029976283 0.056849689 0.52729019 0.59908
#> cohort_5:yr_2016 0.072492394 0.048277349 1.50158192 0.13615
#> cohort_5:yr_2017 0.107452271 0.048854246 2.19944588 0.03000 *
#> cohort_5:yr_2018 0.179814496 0.041723476 4.30967197 0.00004 ***
#> cohort_5:yr_2019 0.183575249 0.024702718 7.43137870 2.8e-11 ***
#> cohort_5:yr_2020 0.254698618 0.054186426 4.70041369 7.8e-06 ***
#> cohort_6:yr_2010 0.195231407 0.050606871 3.85780432 0.00020 ***
#> cohort_6:yr_2011 0.088438535 0.032468976 2.72378574 0.00754 **
#> cohort_6:yr_2012 0.131341225 0.021888462 6.00047767 2.7e-08 ***
#> cohort_6:yr_2013 0.103776798 0.069542351 1.49228199 0.13857
#> cohort_6:yr_2014 0.020857116 0.021794643 0.95698357 0.34073
#> cohort_6:yr_2016 0.041451067 0.017865726 2.32014450 0.02223 *
#> cohort_6:yr_2017 -0.018284118 0.054735070 -0.33404759 0.73900
#> cohort_6:yr_2018 -0.062396321 0.030822420 -2.02438098 0.04542 *
#> cohort_6:yr_2019 -0.069263623 0.034670807 -1.99775055 0.04828 *
#> cohort_6:yr_2020 -0.079013447 0.053561485 -1.47519151 0.14310
#> cohort_7:yr_2010 -0.032002540 0.046205257 -0.69261687 0.49005
#> cohort_7:yr_2011 -0.095926160 0.037377329 -2.56642633 0.01166 *
#> cohort_7:yr_2012 -0.044481466 0.038813288 -1.14603703 0.25434
#> cohort_7:yr_2013 -0.035256339 0.048640480 -0.72483535 0.47014
#> cohort_7:yr_2014 -0.053916075 0.034308437 -1.57151065 0.11902
#> cohort_7:yr_2015 -0.041956676 0.035908135 -1.16844487 0.24522
#> cohort_7:yr_2017 0.122128992 0.015191221 8.03944538 1.3e-12 ***
#> cohort_7:yr_2018 0.133041240 0.040262463 3.30434928 0.00130 **
#> cohort_7:yr_2019 0.158755768 0.018854300 8.42013584 1.9e-13 ***
#> cohort_7:yr_2020 0.188190948 0.045681855 4.11959951 0.00008 ***
#> cohort_8:yr_2010 0.168017984 0.037608434 4.46756126 0.00002 ***
#> cohort_8:yr_2011 0.163233956 0.021546976 7.57572466 1.3e-11 ***
#> cohort_8:yr_2012 0.226848004 0.029934054 7.57825861 1.3e-11 ***
#> cohort_8:yr_2013 0.182161256 0.023642829 7.70471486 7.1e-12 ***
#> cohort_8:yr_2014 0.167771896 0.015444263 10.86305609 0.0e+00 ***
#> cohort_8:yr_2015 0.053514914 0.010942679 4.89047637 3.6e-06 ***
#> cohort_8:yr_2016 0.055586510 0.012678831 4.38419826 0.00003 ***
#> cohort_8:yr_2018 0.007757137 0.016926798 0.45827550 0.64768
#> cohort_8:yr_2019 0.041353989 0.007016441 5.89387009 4.4e-08 ***
#> cohort_8:yr_2020 -0.004063714 0.019899948 -0.20420726 0.83858
#>
#> Significance codes: < 0.001: '***'; < 0.01: '**'; < 0.05: '*'; < 0.1: '.'
#> Residual standard error: 0.3943 on 107 degrees of freedom
#> R^2: 0.972406151984569; Adjusted R^2: 0.957706625471676If we would like to compare the two models’ estimates of risk of
hospitalization associated with the intervention during the
post-intervention period, we can use ave_coeff() again.
# Summarize the post-intervention period from the original model
summary_sdid_hosp <-
staggR::ave_coeff(sdid = sdid_hosp,
coefs = staggR::select_period(sdid = sdid_hosp,
period = "post"))
# # Summarize the post-intervention period from the model fitted using aggregated data
summary_sdid_hosp_agg <-
staggR::ave_coeff(sdid = sdid_hosp_agg,
coefs = staggR::select_period(sdid = sdid_hosp_agg,
period = "post"))
# Combine the two summaries
summary_sdid_hosp$model <- "Individual-level data"
summary_sdid_hosp_agg$model <- "Aggregated data"
rbind(summary_sdid_hosp, summary_sdid_hosp_agg)
#> est se pval sign lb ub n
#> 1 0.10009873 0.01588058 2.895417e-10 *** 0.06897278 0.1312247 11745
#> 2 0.09905923 0.01936651 1.359248e-06 *** 0.06110088 0.1370176 11745
#> model
#> 1 Individual-level data
#> 2 Aggregated data