Example using simulated data

Explore data

Consider as a didactic example a policy intervention designed to reduce inpatient hospitalizations in 15 counties. This longitudinal data set contains one row per individual-year. Each individual in the data set is identified by a globally unique identifier (guid), and we have measures of the individuals’ ages, sexes, and comorbidities, and an indicator showing whether the individual was hospitalized during the current year. Each individual resides in a county, which itself is grouped into cohorts based on intervention year. Finally, the yr column indicates the year of the current observation.

data("hosp", package = "staggR")
head(hosp, 20)
#>            guid age sex comorb hospitalized            county intervention_yr
#> 1  000P7XB6O5P8  26   M      1        FALSE  Otter Pop County            2017
#> 2  0022LHNIMI01  27   F      1        FALSE Briar Glen County            2016
#> 3  0022LHNIMI01  28   F      0        FALSE Briar Glen County            2016
#> 4  0022LHNIMI01  29   F      1         TRUE Briar Glen County            2016
#> 5  0022LHNIMI01  30   F      1        FALSE Briar Glen County            2016
#> 6  0032MSL22PRS  41   M      1        FALSE Northhaven County            <NA>
#> 7  0032MSL22PRS  42   M      1         TRUE Northhaven County            <NA>
#> 8  0032MSL22PRS  43   M      0        FALSE Northhaven County            <NA>
#> 9  0032MSL22PRS  44   M      0        FALSE Northhaven County            <NA>
#> 10 0032MSL22PRS  45   M      0        FALSE Northhaven County            <NA>
#> 11 004NK9W8RUT0  35   F      0        FALSE  Driftwood County            2017
#> 12 004NK9W8RUT0  36   F      1        FALSE  Driftwood County            2017
#> 13 004NK9W8RUT0  37   F      1         TRUE  Driftwood County            2017
#> 14 008Z7PKWNY5K  32   F      1        FALSE  Clearfork County            2017
#> 15 008Z7PKWNY5K  33   F      1         TRUE  Clearfork County            2017
#> 16 008Z7PKWNY5K  34   F      1        FALSE  Clearfork County            2017
#> 17 008Z7PKWNY5K  35   F      1        FALSE  Clearfork County            2017
#> 18 008Z7PKWNY5K  36   F      0        FALSE  Clearfork County            2017
#> 19 008Z7PKWNY5K  37   F      1        FALSE  Clearfork County            2017
#> 20 008Z7PKWNY5K  38   F      0         TRUE  Clearfork County            2017
#>    cohort   yr
#> 1       7 2018
#> 2       6 2015
#> 3       6 2016
#> 4       6 2017
#> 5       6 2018
#> 6       0 2013
#> 7       0 2014
#> 8       0 2015
#> 9       0 2016
#> 10      0 2017
#> 11      7 2011
#> 12      7 2012
#> 13      7 2013
#> 14      7 2010
#> 15      7 2011
#> 16      7 2012
#> 17      7 2013
#> 18      7 2014
#> 19      7 2015
#> 20      7 2016

The county is never exposed to the intervetnion if intervention_yr is NA. Among the 15 counties in the data set, 3 counties implemented the intervention in 2015; 2 counties implemented in 2016; 5 counties implemented in 2017; 1 county implemented in 2018; and 4 counties did not implement the intervention at all during the study period.

with(hosp,
     table(county, intervention_yr, useNA = "ifany"))
#>                        intervention_yr
#> county                  2015 2016 2017 2018 <NA>
#>   Ashbrook County       2177    0    0    0    0
#>   Banana Peel County       0    0 2017    0    0
#>   Briar Glen County        0 2101    0    0    0
#>   Cinder Bluff County      0    0 1879    0    0
#>   Clearfork County         0    0 1863    0    0
#>   Driftwood County         0    0 1749    0    0
#>   Mapleford County         0    0    0    0 2146
#>   Meadowridge County       0 1958    0    0    0
#>   Moonwhistle County       0    0    0    0 2309
#>   Northhaven County        0    0    0    0 2209
#>   Otter Pop County         0    0 1890    0    0
#>   Pickle Springs County    0    0    0 1727    0
#>   Pine Hollow County    2297    0    0    0    0
#>   Silver Run County        0    0    0    0 2327
#>   Stonefield County     2391    0    0    0    0

Counties are organized into cohorts by intervention year, i.e., counties with the same intervention year are assigned to the same cohort.

county_cohorts <- unique(hosp[, c("county", "cohort", "intervention_yr")])
county_cohorts[order(county_cohorts$cohort), ]
#>                    county cohort intervention_yr
#> 6       Northhaven County      0            <NA>
#> 31     Moonwhistle County      0            <NA>
#> 95       Mapleford County      0            <NA>
#> 100     Silver Run County      0            <NA>
#> 28     Pine Hollow County      5            2015
#> 43      Stonefield County      5            2015
#> 96        Ashbrook County      5            2015
#> 2       Briar Glen County      6            2016
#> 36     Meadowridge County      6            2016
#> 1        Otter Pop County      7            2017
#> 11       Driftwood County      7            2017
#> 14       Clearfork County      7            2017
#> 56     Banana Peel County      7            2017
#> 73    Cinder Bluff County      7            2017
#> 22  Pickle Springs County      8            2018

The study period includes 11 years, from 2010 through 2020. Hospitalizations occurs in every county-year.

stats::xtabs(hospitalized ~ county + yr, 
             data = hosp)
#>                        yr
#> county                  2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
#>   Ashbrook County         28   46   50   77   82  127  126  167  174  139  137
#>   Banana Peel County      25   28   39   49   59   67  108  154  140  149  196
#>   Briar Glen County       25   27   32   47   41   40   63   66   45   44   62
#>   Cinder Bluff County     14   18   21   37   71   76   98  139  138  142  218
#>   Clearfork County        16   27   37   35   54   81  100  128  140  127  191
#>   Driftwood County        16   18   28   39   45   74   76  126  137  123  210
#>   Mapleford County         7   25   44   52   93  105  100  108  114   95   62
#>   Meadowridge County      17   20   29   27   37   46   60   45   41   43   47
#>   Moonwhistle County      15   27   35   72   85   89  108  125  134  117   67
#>   Northhaven County       10   30   36   47   70   99  119  122  110  114   74
#>   Otter Pop County        16   24   32   56   60   71   91  139  142  134  221
#>   Pickle Springs County    8   14   21   24   35   27   35   36   41   52   77
#>   Pine Hollow County      33   49   67   79  110  111  132  142  158  145  157
#>   Silver Run County       15   32   36   48   77   96  109  130  111  109   78
#>   Stonefield County       28   41   75   91  109  129  169  172  163  125  144

Visualize hospitalization trends

To visualize trends in the proportion of individuals who were hospitalized over time, we can call the function ts_plot(), which takes, at minimum, 3 arguments: a formula, in the format outcome ~ group + time_var, the source data set, and the name of the column that contains the time period during which each group implemented the intervention.

staggR::ts_plot(hospitalized ~ county + yr,
                df = hosp,
                intervention_var = "intervention_yr")

Equivalently, we can specify the outcome, group, and time variable using the parameters y, group, and time_var.

staggR::ts_plot(y = "hospitalized",
                group = "county",
                time_var = "yr",
                df = hosp,
                intervention_var = "intervention_yr")

While this visualization is helpful, it excludes counties that did not implement the intervention. This is because the function ts_plot() omits any counties where any of y, group, time_var, (or our formula equivalent, hospitalized ~ county + yr) or the intervention_var are NA.

table(hosp$hospitalized, useNA = "always")
#> 
#> FALSE  TRUE  <NA> 
#> 18033 13007     0
table(hosp$county, useNA = "always")
#> 
#>       Ashbrook County    Banana Peel County     Briar Glen County 
#>                  2177                  2017                  2101 
#>   Cinder Bluff County      Clearfork County      Driftwood County 
#>                  1879                  1863                  1749 
#>      Mapleford County    Meadowridge County    Moonwhistle County 
#>                  2146                  1958                  2309 
#>     Northhaven County      Otter Pop County Pickle Springs County 
#>                  2209                  1890                  1727 
#>    Pine Hollow County     Silver Run County     Stonefield County 
#>                  2297                  2327                  2391 
#>                  <NA> 
#>                     0
table(hosp$yr, useNA = "always")
#> 
#> 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 <NA> 
#> 1104 1592 2059 2569 2996 3385 3632 3677 3539 3006 3481    0
table(hosp$intervention_yr, useNA = "always")
#> 
#> 2015 2016 2017 2018 <NA> 
#> 6865 4059 9398 1727 8991

We can add these counties to the plot if we change NAs in intervention_yr to some other string value.

hosp2 <- hosp
hosp2$intervention_yr <- ifelse(is.na(hosp2$intervention_yr),
                                yes = "Comparison group",
                                no = hosp2$intervention_yr)
county_plot <- staggR::ts_plot(hospitalized ~ county + yr,
                               df = hosp2,
                               intervention_var = "intervention_yr")
county_plot

It might be easier to compare the panels in this plot if the X axes were all aligned at the point of intervention. We can accomplish this by passing a time-since-intervention (tsi) object to ts_plot(). We create a tsi object by passing the hosp2 data set to the id_tsi() function, and we identify the names of the columns that contain intervention cohorts, time periods, and the time period during which each cohort implemented the intervention. The returned tsi object contains a data frame, which includes columns cohort, time, intervention_time, and tsi. Each row corresponds to a unique county-time period. The tsi column contains an integer that tells us how many time periods the corresponding time period is away from the intervention. For example, Ashbrook County belongs to cohort 5, which means the intervention took place in 2015. If we examine the first row of the tsi object, we’ll see that the time period 2010 is 5 time periods prior to the intervention, so tsi has a value of -5. Values of tsi are negative before the intervention, 0 during the year of intervention, and positive for the years following the intervention.

hosp_tsi <- staggR::id_tsi(df = hosp2,
                           cohort_var = "county",
                           time_var = "yr",
                           intervention_var = "intervention_yr")
head(hosp_tsi$data, 11)
#>              cohort time intervention_time tsi
#> 634 Ashbrook County 2010              2015  -5
#> 96  Ashbrook County 2011              2015  -4
#> 97  Ashbrook County 2012              2015  -3
#> 98  Ashbrook County 2013              2015  -2
#> 99  Ashbrook County 2014              2015  -1
#> 218 Ashbrook County 2015              2015   0
#> 240 Ashbrook County 2016              2015   1
#> 129 Ashbrook County 2017              2015   2
#> 242 Ashbrook County 2018              2015   3
#> 243 Ashbrook County 2019              2015   4
#> 291 Ashbrook County 2020              2015   5

Now we can pass hosp_tsi to ts_plot() to generate a set of plots with time since intervention on the X axis instead of years.

county_plot <- staggR::ts_plot(hospitalized ~ county + yr,
                               df = hosp2,
                               intervention_var = "intervention_yr", 
                               tsi = hosp_tsi)
county_plot

We might like to customize this plot a little bit and use better labels for the X and Y axes. The function ts_plot() returns a ggplot object, so we can append any ggplot2 functions to customize the plot.

library(ggplot2)
county_plot +
  scale_x_continuous(name = "Years", 
                     breaks = seq(-8, 5, by = 2)) +
  scale_y_continuous(name = "Percent hospitalized",
                     breaks = seq(0, 1, by = 0.2),
                     labels = function(x) paste0(round(x * 100), "%")) +
  theme_classic() +
  theme(panel.grid.major.x = element_line(linewidth = 0.5))

Fit a difference-in-differences regression model

Now that we have some familiarity with the data, we can fit a staggered difference-in-differences model. The function for estimating such a model is called sdid().

The first argument of the function is the regression formula. The order of terms in the regression formula is important: sdid() will assume that the first term on the right-hand side of the formula is the variable that identifies cohorts, and the second term is the variable that identifies the time period for each observation. All remaining terms on the right-hand side of the formula are covariates that we would like to use to adjust our model. The dependent variable is on the left-hand side of the formula.

Other important parameters include intervention_var, which specifies the name of the column that contains the time period during which each cohort implemented the intervention (in our case, this is intervention_yr), and df, which specifies the data set.

The call for estimating the staggered difference-in-differences regression, adjusting for age, sex, and comorbitiy, is as follows:

sdid_hosp <- staggR::sdid(hospitalized ~ cohort + yr + age + sex + comorb,
                          df = hosp,
                          intervention_var  = "intervention_yr")

summary(sdid_hosp)
#> 
#> Supplied formula:
#> hospitalized ~ cohort + yr + age + sex + comorb
#> 
#> Fitted formula:
#> hospitalized ~ cohort_5 + cohort_6 + cohort_7 + cohort_8 + yr_2011 + 
#>     yr_2012 + yr_2013 + yr_2014 + yr_2015 + yr_2016 + yr_2017 + 
#>     yr_2018 + yr_2019 + yr_2020 + cohort_5:yr_2010 + cohort_5:yr_2011 + 
#>     cohort_5:yr_2012 + cohort_5:yr_2013 + cohort_5:yr_2015 + 
#>     cohort_5:yr_2016 + cohort_5:yr_2017 + cohort_5:yr_2018 + 
#>     cohort_5:yr_2019 + cohort_5:yr_2020 + cohort_6:yr_2010 + 
#>     cohort_6:yr_2011 + cohort_6:yr_2012 + cohort_6:yr_2013 + 
#>     cohort_6:yr_2014 + cohort_6:yr_2016 + cohort_6:yr_2017 + 
#>     cohort_6:yr_2018 + cohort_6:yr_2019 + cohort_6:yr_2020 + 
#>     cohort_7:yr_2010 + cohort_7:yr_2011 + cohort_7:yr_2012 + 
#>     cohort_7:yr_2013 + cohort_7:yr_2014 + cohort_7:yr_2015 + 
#>     cohort_7:yr_2017 + cohort_7:yr_2018 + cohort_7:yr_2019 + 
#>     cohort_7:yr_2020 + cohort_8:yr_2010 + cohort_8:yr_2011 + 
#>     cohort_8:yr_2012 + cohort_8:yr_2013 + cohort_8:yr_2014 + 
#>     cohort_8:yr_2015 + cohort_8:yr_2016 + cohort_8:yr_2018 + 
#>     cohort_8:yr_2019 + cohort_8:yr_2020 + age + sex + comorb
#> 
#> Residuals:
#>      Min      Q1  Median    Q3    Max
#>  -0.8656 -0.3703 -0.2186 0.473 0.8912
#> 
#> Coefficients:
#>              term     estimate    std_error    t_value p_value    
#>       (Intercept)  0.041538404 0.0367618169  1.1299334 0.25851    
#>          cohort_5  0.085465821 0.0229393129  3.7257359 0.00020 ***
#>          cohort_6 -0.157204977 0.0263264217 -5.9713766 2.4e-09 ***
#>          cohort_7  0.077020852 0.0197878817  3.8923242 0.00010 ***
#>          cohort_8 -0.232052150 0.0358428703 -6.4741509 9.7e-11 ***
#>           yr_2011  0.053809624 0.0396503707  1.3571027 0.17476    
#>           yr_2012  0.022191831 0.0374003952  0.5933582 0.55295    
#>           yr_2013  0.041736045 0.0361860691  1.1533733 0.24877    
#>           yr_2014  0.102216988 0.0355358831  2.8764443 0.00402 ** 
#>           yr_2015  0.119986328 0.0351817131  3.4104743 0.00065 ***
#>           yr_2016  0.139478844 0.0350227354  3.9825229 0.00007 ***
#>           yr_2017  0.181007359 0.0350332547  5.1667297 2.4e-07 ***
#>           yr_2018  0.186156454 0.0351558360  5.2951793 1.2e-07 ***
#>           yr_2019  0.227572896 0.0356460646  6.3842362 1.7e-10 ***
#>           yr_2020  0.218923330 0.0372348645  5.8795254 4.2e-09 ***
#>               age  0.005021507 0.0004916025 10.2145669 0.0e+00 ***
#>              sexM  0.014436532 0.0052571103  2.7460965 0.00603 ** 
#>            comorb  0.007434585 0.0054038538  1.3757931 0.16890    
#>  cohort_5:yr_2010 -0.073298286 0.0463075431 -1.5828584 0.11346    
#>  cohort_5:yr_2011 -0.072496499 0.0390939557 -1.8544171 0.06369 .  
#>  cohort_5:yr_2012  0.007407740 0.0357130944  0.2074236 0.83568    
#>  cohort_5:yr_2013  0.018351888 0.0335827224  0.5464681 0.58475    
#>  cohort_5:yr_2015  0.032452652 0.0316589301  1.0250710 0.30534    
#>  cohort_5:yr_2016  0.074734676 0.0313560797  2.3834190 0.01716 *  
#>  cohort_5:yr_2017  0.112519746 0.0314518730  3.5775213 0.00035 ***
#>  cohort_5:yr_2018  0.182252536 0.0319802129  5.6989156 1.2e-08 ***
#>  cohort_5:yr_2019  0.186210308 0.0338471964  5.5014987 3.8e-08 ***
#>  cohort_5:yr_2020  0.261743731 0.0355565320  7.3613403 1.9e-13 ***
#>  cohort_6:yr_2010  0.192407710 0.0553397293  3.4768459 0.00051 ***
#>  cohort_6:yr_2011  0.081741042 0.0466185244  1.7534026 0.07954 .  
#>  cohort_6:yr_2012  0.126187949 0.0428912399  2.9420448 0.00326 ** 
#>  cohort_6:yr_2013  0.102161820 0.0399766017  2.5555404 0.01061 *  
#>  cohort_6:yr_2014  0.016850413 0.0382018594  0.4410888 0.65915    
#>  cohort_6:yr_2016  0.037963212 0.0364675056  1.0410148 0.29788    
#>  cohort_6:yr_2017 -0.020106968 0.0367710526 -0.5468151 0.58451    
#>  cohort_6:yr_2018 -0.067739307 0.0372930932 -1.8164036 0.06932 .  
#>  cohort_6:yr_2019 -0.074178123 0.0387938325 -1.9121112 0.05587 .  
#>  cohort_6:yr_2020 -0.083133234 0.0381499590 -2.1791173 0.02933 *  
#>  cohort_7:yr_2010 -0.027712569 0.0458969192 -0.6038002 0.54598    
#>  cohort_7:yr_2011 -0.098466031 0.0375474688 -2.6224412 0.00873 ** 
#>  cohort_7:yr_2012 -0.049054517 0.0336942562 -1.4558718 0.14544    
#>  cohort_7:yr_2013 -0.039127782 0.0311998428 -1.2541019 0.20981    
#>  cohort_7:yr_2014 -0.055261507 0.0296767494 -1.8621145 0.06260 .  
#>  cohort_7:yr_2015 -0.042251185 0.0285647712 -1.4791361 0.13911    
#>  cohort_7:yr_2017  0.124607104 0.0277761427  4.4861198 7.3e-06 ***
#>  cohort_7:yr_2018  0.131896098 0.0279532596  4.7184514 2.4e-06 ***
#>  cohort_7:yr_2019  0.157899398 0.0290538400  5.4347170 5.5e-08 ***
#>  cohort_7:yr_2020  0.192719914 0.0297778430  6.4719232 9.8e-11 ***
#>  cohort_8:yr_2010  0.169800222 0.0812808931  2.0890546 0.03671 *  
#>  cohort_8:yr_2011  0.162146753 0.0708023809  2.2901314 0.02202 *  
#>  cohort_8:yr_2012  0.229016705 0.0645212698  3.5494761 0.00039 ***
#>  cohort_8:yr_2013  0.181452224 0.0596286196  3.0430391 0.00234 ** 
#>  cohort_8:yr_2014  0.161011028 0.0560541472  2.8724195 0.00408 ** 
#>  cohort_8:yr_2015  0.046657219 0.0533463636  0.8746092 0.38179    
#>  cohort_8:yr_2016  0.054386841 0.0519756746  1.0463903 0.29539    
#>  cohort_8:yr_2018  0.008099328 0.0501890641  0.1613763 0.87180    
#>  cohort_8:yr_2019  0.035236092 0.0511142067  0.6893600 0.49060    
#>  cohort_8:yr_2020 -0.003491331 0.0471774065 -0.0740043 0.94101    
#> 
#> Significance codes: < 0.001: '***'; < 0.01: '**'; < 0.05: '*'; < 0.1: '.'
#> Residual standard error: 0.4623 on 30982 degrees of freedom
#> R^2: 0.122084901176252; Adjusted R^2: 0.120469732348128

Although we supplied a formula to fit this model, it is important to recognize that this formula is specified according to a convenience syntax, which assumes that we want to fit a Sun and Abraham-style staggered difference-in-differences model, rather than the very simple model implied by our formula. If we would like to see the formula passed to lm() to fit the staggered difference-in-differences model, we can examine the components of the sdid_hosp object.

names(sdid_hosp)
#> [1] "mdl"              "formula"          "vcov"             "tsi"             
#> [5] "obs_cnt"          "cohort"           "time"             "intervention_var"
#> [9] "covariates"
sdid_hosp$formula
#> $supplied
#> hospitalized ~ cohort + yr + age + sex + comorb
#> 
#> $fitted
#> hospitalized ~ cohort_5 + cohort_6 + cohort_7 + cohort_8 + yr_2011 + 
#>     yr_2012 + yr_2013 + yr_2014 + yr_2015 + yr_2016 + yr_2017 + 
#>     yr_2018 + yr_2019 + yr_2020 + cohort_5:yr_2010 + cohort_5:yr_2011 + 
#>     cohort_5:yr_2012 + cohort_5:yr_2013 + cohort_5:yr_2015 + 
#>     cohort_5:yr_2016 + cohort_5:yr_2017 + cohort_5:yr_2018 + 
#>     cohort_5:yr_2019 + cohort_5:yr_2020 + cohort_6:yr_2010 + 
#>     cohort_6:yr_2011 + cohort_6:yr_2012 + cohort_6:yr_2013 + 
#>     cohort_6:yr_2014 + cohort_6:yr_2016 + cohort_6:yr_2017 + 
#>     cohort_6:yr_2018 + cohort_6:yr_2019 + cohort_6:yr_2020 + 
#>     cohort_7:yr_2010 + cohort_7:yr_2011 + cohort_7:yr_2012 + 
#>     cohort_7:yr_2013 + cohort_7:yr_2014 + cohort_7:yr_2015 + 
#>     cohort_7:yr_2017 + cohort_7:yr_2018 + cohort_7:yr_2019 + 
#>     cohort_7:yr_2020 + cohort_8:yr_2010 + cohort_8:yr_2011 + 
#>     cohort_8:yr_2012 + cohort_8:yr_2013 + cohort_8:yr_2014 + 
#>     cohort_8:yr_2015 + cohort_8:yr_2016 + cohort_8:yr_2018 + 
#>     cohort_8:yr_2019 + cohort_8:yr_2020 + age + sex + comorb
#> <environment: 0x11f44b408>

Here we see both the supplied and fitted formulas. The fitted formula includes main effects for each cohort, time periods, and interaction terms for each cohort-time period combination. You might notice that these terms do not exist in our supplied data.frame; this is because sdid() calls prep_data() to create all necessary dummy variables. These are needed because we have to be very explicit about which levels of each cohort and time period variable are used as referents in the model, and these referent levels differ for main effects and for each set of interaction terms.

By default, sdid() assumes that the first observed levels of both the cohort column and the time-period column should function as the referents for the main effects. This is the same as with lm() and most other modeling functions in R. For the long list of cohort-time period interactions, however, sdid() assumes that the time period immediately preceding the intervention time period (defined by intervention_var) should be the referent. In our example, that means that the interaction terms for cohort_5 should omit 2014 as the referent time period, because that cohort’s intervention took place in 2015. Similarly, the interaction terms for cohort_6 should omit 2015 as the referent time period, because that cohort’s intervention took place in 2016. Indeed, we observe that the formula has interaction terms cohort_5:yr_2010 through cohort_5:yr_2020, with cohort_5_2014 omitted for identification.

We can use the cohort_time_refs parameter to specify a different time period to function as the referent for cohort-time interactions:

sdid_hosp_earlier_timeref <- staggR::sdid(hospitalized ~ cohort + yr + age + sex + comorb,
                                          df = hosp,
                                          cohort_time_refs = list(`5` = "2013",
                                                                  `6` = "2014",
                                                                  `7` = "2015",
                                                                  `8` = "2016"),
                                          intervention_var  = "intervention_yr")

We can now proceed with the second, post-estimation step: translating difference-in-differences estimates into average estimates.

Calculating coefficient averages

Before we can calculate coefficient averages, we first need to identify the coefficients we want to combine. We do this by supplying the sdid object and either “pre” or “post” to the select_period() function to retrieve the coefficients for interaction terms corresponding to the pre-intervention or post-intervention periods, respectively.

# Identify the coefficients that compose the effect of the intervention during the 
# post-intervention period
(post_coefs <- staggR::select_period(sdid = sdid_hosp, 
                                     period = "post"))
#>  [1] "cohort_5:yr_2015" "cohort_5:yr_2016" "cohort_5:yr_2017" "cohort_5:yr_2018"
#>  [5] "cohort_5:yr_2019" "cohort_5:yr_2020" "cohort_6:yr_2016" "cohort_6:yr_2017"
#>  [9] "cohort_6:yr_2018" "cohort_6:yr_2019" "cohort_6:yr_2020" "cohort_7:yr_2017"
#> [13] "cohort_7:yr_2018" "cohort_7:yr_2019" "cohort_7:yr_2020" "cohort_8:yr_2018"
#> [17] "cohort_8:yr_2019" "cohort_8:yr_2020"

Now that we’ve identified the coefficients of interest, we will pass the sdid object and the selected coefficients to the ave_coeff() function, which calculates averages of any specified set of regression coefficients, along with the corresponding standard errors.

To demonstrate how this function works, we first calculate the average effect of the intervention during the post-intervention period.

staggR::ave_coeff(sdid = sdid_hosp,
                  coefs = post_coefs)
#>         est         se         pval sign         lb        ub     n
#> 1 0.1000987 0.01398148 8.118538e-13  *** 0.07269504 0.1275024 11745

The result shows that the intervention is associated with an increase in the probability of hospitalization.

Next, we can use the ave_coeff() function to examine whether the risk of hospitalization differs by treatment group during the pre-intervention period. For that, we pass “pre” to the period argument, as follows:

# Identify the coefficients that represent the difference between exposure groups 
# during the pre-intervention period
(pre_coefs <- staggR::select_period(sdid = sdid_hosp, 
                                    period = "pre"))
#>  [1] "cohort_5:yr_2010" "cohort_5:yr_2011" "cohort_5:yr_2012" "cohort_5:yr_2013"
#>  [5] "cohort_6:yr_2010" "cohort_6:yr_2011" "cohort_6:yr_2012" "cohort_6:yr_2013"
#>  [9] "cohort_6:yr_2014" "cohort_7:yr_2010" "cohort_7:yr_2011" "cohort_7:yr_2012"
#> [13] "cohort_7:yr_2013" "cohort_7:yr_2014" "cohort_7:yr_2015" "cohort_8:yr_2010"
#> [17] "cohort_8:yr_2011" "cohort_8:yr_2012" "cohort_8:yr_2013" "cohort_8:yr_2014"
#> [21] "cohort_8:yr_2015" "cohort_8:yr_2016"

staggR::ave_coeff(sdid = sdid_hosp,
                  coefs = pre_coefs)
#>            est         se      pval sign          lb         ub    n
#> 1 -0.001761013 0.01451994 0.8853825      -0.03022009 0.02669807 7929

Here we see risk of hospitalization during the pre-intervention period does not significantly differ by intervention status.

We can also examine whether there exists heterogeneity in difference-in-differences estimates between various cohorts, using the cohorts parameter in the select_period() function. For instance, we could calculate the average of difference-in-differences estimates for the post-intervention period for cohorts 5 and 6.

# Identify the coefficients for the effect of the intervention on cohorts 5 and 6 
# during the post-intervention period
(post_coefs_cohort5 <- staggR::select_period(sdid = sdid_hosp, 
                                             period = "post",
                                             cohorts = c("5", "6")))
#>  [1] "cohort_5:yr_2015" "cohort_5:yr_2016" "cohort_5:yr_2017" "cohort_5:yr_2018"
#>  [5] "cohort_5:yr_2019" "cohort_5:yr_2020" "cohort_6:yr_2016" "cohort_6:yr_2017"
#>  [9] "cohort_6:yr_2018" "cohort_6:yr_2019" "cohort_6:yr_2020"

staggR::ave_coeff(sdid = sdid_hosp,
                  coefs = post_coefs_cohort5)
#>          est         se         pval sign         lb        ub    n
#> 1 0.07151805 0.01901505 0.0001660672  *** 0.03424855 0.1087875 6376

We can similarly calculate the average of difference-in-differences estimates for the post-intervention period for cohorts 7 and 8.

# Identify the coefficients for the effect of the intervention on cohorts 7 and 8 
# during the post-intervention period
(post_coefs_cohort8 <- staggR::select_period(sdid = sdid_hosp, 
                                             period = "post",
                                             cohorts = c("7", "8")))
#> [1] "cohort_7:yr_2017" "cohort_7:yr_2018" "cohort_7:yr_2019" "cohort_7:yr_2020"
#> [5] "cohort_8:yr_2018" "cohort_8:yr_2019" "cohort_8:yr_2020"

staggR::ave_coeff(sdid = sdid_hosp,
                  coefs = post_coefs_cohort8)
#>       est         se         pval sign         lb        ub    n
#> 1 0.13404 0.02030546 4.061829e-11  *** 0.09424126 0.1738386 5369

Comparing these averages shows that the association between the intervention and probability of hospitalization for the last two cohorts is almost twice as large compared to the first two cohorts.

If we want to aggregate a set of coefficients other than those for the pre- or post-intervention periods, we select our coefficients using select_terms() instead of select_period(). We supply a named list containing values for cohorts and either times or tsi. The cohorts element should contain a character vector of cohorts; if we omit the cohorts element, the function will select all available cohorts. The times element contains a character vector of time periods as they are recorded in the time variable supplied to sdid(). If we would prefer to select time periods relative to the intervention period, we would instead use the tsi element, which contains a vector of integers representing the number of units of time relative to each cohort’s intervention.

For example, we could calculate the added risk of hospitalization associated with the intervention across all cohorts for the year 2018.

# Identify the coefficients corresponding to all cohorts in 2018
(terms_2018 <- staggR::select_terms(sdid = sdid_hosp,
                                    selection = list(times = "2018")))
#> [1] "cohort_5:yr_2018" "cohort_6:yr_2018" "cohort_7:yr_2018" "cohort_8:yr_2018"

Then we can pass the selected terms to the ave_coeff() function to aggregate the 2018 effect across all cohorts.

staggR::ave_coeff(sdid = sdid_hosp,
                  coefs = terms_2018)
#>         est         se        pval sign         lb       ub    n
#> 1 0.1012555 0.02090127 1.25012e-06  *** 0.06028899 0.142222 2441

If we are interested in the added risk of hospitalization associated with the intervention in the year 2018, but only for the first two cohorts (5 and 6), we could include a cohorts element in the selection list.

(terms_2018_cohorts56 <- staggR::select_terms(sdid = sdid_hosp,
                                              selection = list(cohorts = c("5", "6"),
                                                               times = "2018")))
#> [1] "cohort_5:yr_2018" "cohort_6:yr_2018"

staggR::ave_coeff(sdid = sdid_hosp,
                  coefs = terms_2018_cohorts56)
#>          est         se         pval sign         lb        ub    n
#> 1 0.08850559 0.02620059 0.0007164136  *** 0.03715245 0.1398587 1136

Alternatively, if we already know exactly which coefficients we want to aggregate, we can simply list them in the coefs argument.

(terms_custom <- staggR::select_terms(sdid = sdid_hosp,
                                      coefs = c("cohort_5:yr_2018", "cohort_6:yr_2018")))
#> [1] "cohort_5:yr_2018" "cohort_6:yr_2018"

staggR::ave_coeff(sdid = sdid_hosp,
                  coefs = terms_custom)
#>          est         se         pval sign         lb        ub    n
#> 1 0.08850559 0.02620059 0.0007164136  *** 0.03715245 0.1398587 1136

Calculating custom standard errors

Because outcomes within the same unit (in our case, county) are often correlated over time, assuming independent errors could understate uncertainty. Clustering standard errors allows for correlation within clusters while maintaining independence across them. Following Sun and Abraham¹, we can supply a custom variance–covariance function to compute standard errors clustered at the county level.

# Fit SDID model with standard errors clustered at the county level
sdid_hosp <- staggR::sdid(hospitalized ~ cohort + yr + age + sex + comorb,
                               df = hosp,
                               intervention_var  = "intervention_yr",
                               .vcov = sandwich::vcovCL,
                               cluster = hosp$county)
summary(sdid_hosp)
#> 
#> Supplied formula:
#> hospitalized ~ cohort + yr + age + sex + comorb
#> 
#> Fitted formula:
#> hospitalized ~ cohort_5 + cohort_6 + cohort_7 + cohort_8 + yr_2011 + 
#>     yr_2012 + yr_2013 + yr_2014 + yr_2015 + yr_2016 + yr_2017 + 
#>     yr_2018 + yr_2019 + yr_2020 + cohort_5:yr_2010 + cohort_5:yr_2011 + 
#>     cohort_5:yr_2012 + cohort_5:yr_2013 + cohort_5:yr_2015 + 
#>     cohort_5:yr_2016 + cohort_5:yr_2017 + cohort_5:yr_2018 + 
#>     cohort_5:yr_2019 + cohort_5:yr_2020 + cohort_6:yr_2010 + 
#>     cohort_6:yr_2011 + cohort_6:yr_2012 + cohort_6:yr_2013 + 
#>     cohort_6:yr_2014 + cohort_6:yr_2016 + cohort_6:yr_2017 + 
#>     cohort_6:yr_2018 + cohort_6:yr_2019 + cohort_6:yr_2020 + 
#>     cohort_7:yr_2010 + cohort_7:yr_2011 + cohort_7:yr_2012 + 
#>     cohort_7:yr_2013 + cohort_7:yr_2014 + cohort_7:yr_2015 + 
#>     cohort_7:yr_2017 + cohort_7:yr_2018 + cohort_7:yr_2019 + 
#>     cohort_7:yr_2020 + cohort_8:yr_2010 + cohort_8:yr_2011 + 
#>     cohort_8:yr_2012 + cohort_8:yr_2013 + cohort_8:yr_2014 + 
#>     cohort_8:yr_2015 + cohort_8:yr_2016 + cohort_8:yr_2018 + 
#>     cohort_8:yr_2019 + cohort_8:yr_2020 + age + sex + comorb
#> 
#> Residuals:
#>      Min      Q1  Median    Q3    Max
#>  -0.8656 -0.3703 -0.2186 0.473 0.8912
#> 
#> Coefficients:
#>              term     estimate   std_error     t_value p_value    
#>       (Intercept)  0.041538404 0.045267775   0.9176153 0.35883    
#>          cohort_5  0.085465821 0.027716214   3.0836037 0.00205 ** 
#>          cohort_6 -0.157204977 0.017866138  -8.7990466 0.0e+00 ***
#>          cohort_7  0.077020852 0.018400409   4.1858228 0.00003 ***
#>          cohort_8 -0.232052150 0.004451773 -52.1257805 0.0e+00 ***
#>           yr_2011  0.053809624 0.026875736   2.0021637 0.04528 *  
#>           yr_2012  0.022191831 0.047318078   0.4689927 0.63908    
#>           yr_2013  0.041736045 0.034298156   1.2168597 0.22367    
#>           yr_2014  0.102216988 0.034984090   2.9218136 0.00348 ** 
#>           yr_2015  0.119986328 0.037058526   3.2377523 0.00121 ** 
#>           yr_2016  0.139478844 0.035277808   3.9537277 0.00008 ***
#>           yr_2017  0.181007359 0.028238688   6.4099068 1.5e-10 ***
#>           yr_2018  0.186156454 0.040668647   4.5773948 4.7e-06 ***
#>           yr_2019  0.227572896 0.031695092   7.1800674 7.1e-13 ***
#>           yr_2020  0.218923330 0.030912111   7.0821216 1.4e-12 ***
#>               age  0.005021507 0.000624192   8.0448109 8.9e-16 ***
#>              sexM  0.014436532 0.004693948   3.0755627 0.00210 ** 
#>            comorb  0.007434585 0.004245490   1.7511723 0.07993 .  
#>  cohort_5:yr_2010 -0.073298286 0.046712037  -1.5691520 0.11662    
#>  cohort_5:yr_2011 -0.072496499 0.043114601  -1.6814837 0.09268 .  
#>  cohort_5:yr_2012  0.007407740 0.029246140   0.2532895 0.80005    
#>  cohort_5:yr_2013  0.018351888 0.035385632   0.5186254 0.60403    
#>  cohort_5:yr_2015  0.032452652 0.046848221   0.6927190 0.48849    
#>  cohort_5:yr_2016  0.074734676 0.039220635   1.9054938 0.05673 .  
#>  cohort_5:yr_2017  0.112519746 0.039615253   2.8403137 0.00451 ** 
#>  cohort_5:yr_2018  0.182252536 0.033255362   5.4803954 4.3e-08 ***
#>  cohort_5:yr_2019  0.186210308 0.019756605   9.4252178 0.0e+00 ***
#>  cohort_5:yr_2020  0.261743731 0.043226983   6.0551007 1.4e-09 ***
#>  cohort_6:yr_2010  0.192407710 0.039805182   4.8337353 1.3e-06 ***
#>  cohort_6:yr_2011  0.081741042 0.021310552   3.8357074 0.00013 ***
#>  cohort_6:yr_2012  0.126187949 0.016936012   7.4508657 9.5e-14 ***
#>  cohort_6:yr_2013  0.102161820 0.055195574   1.8509060 0.06419 .  
#>  cohort_6:yr_2014  0.016850413 0.018629983   0.9044782 0.36575    
#>  cohort_6:yr_2016  0.037963212 0.015173891   2.5018772 0.01236 *  
#>  cohort_6:yr_2017 -0.020106968 0.046378986  -0.4335362 0.66463    
#>  cohort_6:yr_2018 -0.067739307 0.024751017  -2.7368292 0.00621 ** 
#>  cohort_6:yr_2019 -0.074178123 0.027898187  -2.6588869 0.00784 ** 
#>  cohort_6:yr_2020 -0.083133234 0.044718804  -1.8590219 0.06303 .  
#>  cohort_7:yr_2010 -0.027712569 0.039950251  -0.6936770 0.48789    
#>  cohort_7:yr_2011 -0.098466031 0.025944113  -3.7953131 0.00015 ***
#>  cohort_7:yr_2012 -0.049054517 0.030866908  -1.5892268 0.11202    
#>  cohort_7:yr_2013 -0.039127782 0.037834688  -1.0341775 0.30106    
#>  cohort_7:yr_2014 -0.055261507 0.028025632  -1.9718202 0.04864 *  
#>  cohort_7:yr_2015 -0.042251185 0.029956121  -1.4104358 0.15842    
#>  cohort_7:yr_2017  0.124607104 0.013629312   9.1425824 0.0e+00 ***
#>  cohort_7:yr_2018  0.131896098 0.031705189   4.1600793 0.00003 ***
#>  cohort_7:yr_2019  0.157899398 0.015434476  10.2303053 0.0e+00 ***
#>  cohort_7:yr_2020  0.192719914 0.037622842   5.1224178 3.0e-07 ***
#>  cohort_8:yr_2010  0.169800222 0.029106010   5.8338542 5.5e-09 ***
#>  cohort_8:yr_2011  0.162146753 0.014472928  11.2034516 0.0e+00 ***
#>  cohort_8:yr_2012  0.229016705 0.024229351   9.4520362 0.0e+00 ***
#>  cohort_8:yr_2013  0.181452224 0.018408752   9.8568452 0.0e+00 ***
#>  cohort_8:yr_2014  0.161011028 0.012653160  12.7249656 0.0e+00 ***
#>  cohort_8:yr_2015  0.046657219 0.010565894   4.4158325 0.00001 ***
#>  cohort_8:yr_2016  0.054386841 0.007691923   7.0706431 1.6e-12 ***
#>  cohort_8:yr_2018  0.008099328 0.014228755   0.5692225 0.56921    
#>  cohort_8:yr_2019  0.035236092 0.004283812   8.2254050 2.2e-16 ***
#>  cohort_8:yr_2020 -0.003491331 0.016389585  -0.2130213 0.83131    
#> 
#> Significance codes: < 0.001: '***'; < 0.01: '**'; < 0.05: '*'; < 0.1: '.'
#> Residual standard error: 0.4623 on 30982 degrees of freedom
#> R^2: 0.122084901176252; Adjusted R^2: 0.120469732348128

# Examine the association between the intervention and risk of hospitalization in 
# the post-intervention period
staggR::ave_coeff(sdid = sdid_hosp, 
                  coefs = staggR::select_period(sdid = sdid_hosp,
                                                period = "post"))
#>         est         se         pval sign         lb        ub     n
#> 1 0.1000987 0.01588058 2.895417e-10  *** 0.06897278 0.1312247 11745

Using aggregated data

Data sets containing staggered interventions can sometimes be very large, so it might be preferable to fit the model using aggregated data. To this end, we created an alternate version of the data set that has been aggregated to the county-year level.

data("hosp_agg", package = "staggR")
head(hosp_agg)
#> # A data frame: 6 × 9
#>   yr    county    cohort intervention_yr pct_hospitalized n_enr mean_age pct_fem
#> * <chr> <chr>     <chr>  <chr>                      <dbl> <int>    <dbl>   <dbl>
#> 1 2010  Ashbrook… 5      2015                       0.259   108     35.1   0.5  
#> 2 2011  Ashbrook… 5      2015                       0.348   132     35.5   0.5  
#> 3 2012  Ashbrook… 5      2015                       0.314   159     35.8   0.465
#> 4 2013  Ashbrook… 5      2015                       0.391   197     36.2   0.462
#> 5 2014  Ashbrook… 5      2015                       0.373   220     36.3   0.473
#> 6 2015  Ashbrook… 5      2015                       0.518   245     36.7   0.449
#> # ℹ 1 more variable: pct_cmb <dbl>

# This data set contains one row per county-year
nrow(hosp_agg); nrow(unique(hosp_agg[,c("yr", "county")]))
#> [1] 165
#> [1] 165

As before, we can use the function sdid() to fit the difference-in-differences regression model. The regression formula now uses the variable representing the percent of hospitalized individuals in each county-year (pct_hospitalized) as the dependent variable, and the covariates have been aggregated (i.e., mean_age, pct_fem, and pct_cmb). The main difference is that we need to specify the weights argument for the number of observations contained within each row of the data set, which is represented by the n_enr column.

sdid_hosp_agg <- staggR::sdid(pct_hospitalized ~ cohort + yr + mean_age + pct_fem + pct_cmb,
                              df = hosp_agg,
                              weights = "n_enr",
                              intervention_var  = "intervention_yr",
                              # Cluster standard errors at the county level
                              .vcov = sandwich::vcovCL,
                              cluster = hosp_agg$county)

Aggregating means that we decoupled values of these covariates from outcomes within county-years, so coefficients are slightly different but substantially similar compared to the regression using individual-level data. We can examine differences between models by summarizing them, as we would with a lm or glm model.

# Summarize the original model
summary(sdid_hosp)
#> 
#> Supplied formula:
#> hospitalized ~ cohort + yr + age + sex + comorb
#> 
#> Fitted formula:
#> hospitalized ~ cohort_5 + cohort_6 + cohort_7 + cohort_8 + yr_2011 + 
#>     yr_2012 + yr_2013 + yr_2014 + yr_2015 + yr_2016 + yr_2017 + 
#>     yr_2018 + yr_2019 + yr_2020 + cohort_5:yr_2010 + cohort_5:yr_2011 + 
#>     cohort_5:yr_2012 + cohort_5:yr_2013 + cohort_5:yr_2015 + 
#>     cohort_5:yr_2016 + cohort_5:yr_2017 + cohort_5:yr_2018 + 
#>     cohort_5:yr_2019 + cohort_5:yr_2020 + cohort_6:yr_2010 + 
#>     cohort_6:yr_2011 + cohort_6:yr_2012 + cohort_6:yr_2013 + 
#>     cohort_6:yr_2014 + cohort_6:yr_2016 + cohort_6:yr_2017 + 
#>     cohort_6:yr_2018 + cohort_6:yr_2019 + cohort_6:yr_2020 + 
#>     cohort_7:yr_2010 + cohort_7:yr_2011 + cohort_7:yr_2012 + 
#>     cohort_7:yr_2013 + cohort_7:yr_2014 + cohort_7:yr_2015 + 
#>     cohort_7:yr_2017 + cohort_7:yr_2018 + cohort_7:yr_2019 + 
#>     cohort_7:yr_2020 + cohort_8:yr_2010 + cohort_8:yr_2011 + 
#>     cohort_8:yr_2012 + cohort_8:yr_2013 + cohort_8:yr_2014 + 
#>     cohort_8:yr_2015 + cohort_8:yr_2016 + cohort_8:yr_2018 + 
#>     cohort_8:yr_2019 + cohort_8:yr_2020 + age + sex + comorb
#> 
#> Residuals:
#>      Min      Q1  Median    Q3    Max
#>  -0.8656 -0.3703 -0.2186 0.473 0.8912
#> 
#> Coefficients:
#>              term     estimate   std_error     t_value p_value    
#>       (Intercept)  0.041538404 0.045267775   0.9176153 0.35883    
#>          cohort_5  0.085465821 0.027716214   3.0836037 0.00205 ** 
#>          cohort_6 -0.157204977 0.017866138  -8.7990466 0.0e+00 ***
#>          cohort_7  0.077020852 0.018400409   4.1858228 0.00003 ***
#>          cohort_8 -0.232052150 0.004451773 -52.1257805 0.0e+00 ***
#>           yr_2011  0.053809624 0.026875736   2.0021637 0.04528 *  
#>           yr_2012  0.022191831 0.047318078   0.4689927 0.63908    
#>           yr_2013  0.041736045 0.034298156   1.2168597 0.22367    
#>           yr_2014  0.102216988 0.034984090   2.9218136 0.00348 ** 
#>           yr_2015  0.119986328 0.037058526   3.2377523 0.00121 ** 
#>           yr_2016  0.139478844 0.035277808   3.9537277 0.00008 ***
#>           yr_2017  0.181007359 0.028238688   6.4099068 1.5e-10 ***
#>           yr_2018  0.186156454 0.040668647   4.5773948 4.7e-06 ***
#>           yr_2019  0.227572896 0.031695092   7.1800674 7.1e-13 ***
#>           yr_2020  0.218923330 0.030912111   7.0821216 1.4e-12 ***
#>               age  0.005021507 0.000624192   8.0448109 8.9e-16 ***
#>              sexM  0.014436532 0.004693948   3.0755627 0.00210 ** 
#>            comorb  0.007434585 0.004245490   1.7511723 0.07993 .  
#>  cohort_5:yr_2010 -0.073298286 0.046712037  -1.5691520 0.11662    
#>  cohort_5:yr_2011 -0.072496499 0.043114601  -1.6814837 0.09268 .  
#>  cohort_5:yr_2012  0.007407740 0.029246140   0.2532895 0.80005    
#>  cohort_5:yr_2013  0.018351888 0.035385632   0.5186254 0.60403    
#>  cohort_5:yr_2015  0.032452652 0.046848221   0.6927190 0.48849    
#>  cohort_5:yr_2016  0.074734676 0.039220635   1.9054938 0.05673 .  
#>  cohort_5:yr_2017  0.112519746 0.039615253   2.8403137 0.00451 ** 
#>  cohort_5:yr_2018  0.182252536 0.033255362   5.4803954 4.3e-08 ***
#>  cohort_5:yr_2019  0.186210308 0.019756605   9.4252178 0.0e+00 ***
#>  cohort_5:yr_2020  0.261743731 0.043226983   6.0551007 1.4e-09 ***
#>  cohort_6:yr_2010  0.192407710 0.039805182   4.8337353 1.3e-06 ***
#>  cohort_6:yr_2011  0.081741042 0.021310552   3.8357074 0.00013 ***
#>  cohort_6:yr_2012  0.126187949 0.016936012   7.4508657 9.5e-14 ***
#>  cohort_6:yr_2013  0.102161820 0.055195574   1.8509060 0.06419 .  
#>  cohort_6:yr_2014  0.016850413 0.018629983   0.9044782 0.36575    
#>  cohort_6:yr_2016  0.037963212 0.015173891   2.5018772 0.01236 *  
#>  cohort_6:yr_2017 -0.020106968 0.046378986  -0.4335362 0.66463    
#>  cohort_6:yr_2018 -0.067739307 0.024751017  -2.7368292 0.00621 ** 
#>  cohort_6:yr_2019 -0.074178123 0.027898187  -2.6588869 0.00784 ** 
#>  cohort_6:yr_2020 -0.083133234 0.044718804  -1.8590219 0.06303 .  
#>  cohort_7:yr_2010 -0.027712569 0.039950251  -0.6936770 0.48789    
#>  cohort_7:yr_2011 -0.098466031 0.025944113  -3.7953131 0.00015 ***
#>  cohort_7:yr_2012 -0.049054517 0.030866908  -1.5892268 0.11202    
#>  cohort_7:yr_2013 -0.039127782 0.037834688  -1.0341775 0.30106    
#>  cohort_7:yr_2014 -0.055261507 0.028025632  -1.9718202 0.04864 *  
#>  cohort_7:yr_2015 -0.042251185 0.029956121  -1.4104358 0.15842    
#>  cohort_7:yr_2017  0.124607104 0.013629312   9.1425824 0.0e+00 ***
#>  cohort_7:yr_2018  0.131896098 0.031705189   4.1600793 0.00003 ***
#>  cohort_7:yr_2019  0.157899398 0.015434476  10.2303053 0.0e+00 ***
#>  cohort_7:yr_2020  0.192719914 0.037622842   5.1224178 3.0e-07 ***
#>  cohort_8:yr_2010  0.169800222 0.029106010   5.8338542 5.5e-09 ***
#>  cohort_8:yr_2011  0.162146753 0.014472928  11.2034516 0.0e+00 ***
#>  cohort_8:yr_2012  0.229016705 0.024229351   9.4520362 0.0e+00 ***
#>  cohort_8:yr_2013  0.181452224 0.018408752   9.8568452 0.0e+00 ***
#>  cohort_8:yr_2014  0.161011028 0.012653160  12.7249656 0.0e+00 ***
#>  cohort_8:yr_2015  0.046657219 0.010565894   4.4158325 0.00001 ***
#>  cohort_8:yr_2016  0.054386841 0.007691923   7.0706431 1.6e-12 ***
#>  cohort_8:yr_2018  0.008099328 0.014228755   0.5692225 0.56921    
#>  cohort_8:yr_2019  0.035236092 0.004283812   8.2254050 2.2e-16 ***
#>  cohort_8:yr_2020 -0.003491331 0.016389585  -0.2130213 0.83131    
#> 
#> Significance codes: < 0.001: '***'; < 0.01: '**'; < 0.05: '*'; < 0.1: '.'
#> Residual standard error: 0.4623 on 30982 degrees of freedom
#> R^2: 0.122084901176252; Adjusted R^2: 0.120469732348128

# Summarize the model based on aggregated data
summary(sdid_hosp_agg)
#> 
#> Supplied formula:
#> pct_hospitalized ~ cohort + yr + mean_age + pct_fem + pct_cmb
#> 
#> Fitted formula:
#> pct_hospitalized ~ cohort_5 + cohort_6 + cohort_7 + cohort_8 + 
#>     yr_2011 + yr_2012 + yr_2013 + yr_2014 + yr_2015 + yr_2016 + 
#>     yr_2017 + yr_2018 + yr_2019 + yr_2020 + cohort_5:yr_2010 + 
#>     cohort_5:yr_2011 + cohort_5:yr_2012 + cohort_5:yr_2013 + 
#>     cohort_5:yr_2015 + cohort_5:yr_2016 + cohort_5:yr_2017 + 
#>     cohort_5:yr_2018 + cohort_5:yr_2019 + cohort_5:yr_2020 + 
#>     cohort_6:yr_2010 + cohort_6:yr_2011 + cohort_6:yr_2012 + 
#>     cohort_6:yr_2013 + cohort_6:yr_2014 + cohort_6:yr_2016 + 
#>     cohort_6:yr_2017 + cohort_6:yr_2018 + cohort_6:yr_2019 + 
#>     cohort_6:yr_2020 + cohort_7:yr_2010 + cohort_7:yr_2011 + 
#>     cohort_7:yr_2012 + cohort_7:yr_2013 + cohort_7:yr_2014 + 
#>     cohort_7:yr_2015 + cohort_7:yr_2017 + cohort_7:yr_2018 + 
#>     cohort_7:yr_2019 + cohort_7:yr_2020 + cohort_8:yr_2010 + 
#>     cohort_8:yr_2011 + cohort_8:yr_2012 + cohort_8:yr_2013 + 
#>     cohort_8:yr_2014 + cohort_8:yr_2015 + cohort_8:yr_2016 + 
#>     cohort_8:yr_2018 + cohort_8:yr_2019 + cohort_8:yr_2020 + 
#>     mean_age + pct_fem + pct_cmb
#> 
#> Residuals:
#>      Min      Q1 Median     Q3    Max
#>  -1.1314 -0.2452      0 0.2286 0.9615
#> 
#> Coefficients:
#>              term     estimate   std_error      t_value p_value    
#>       (Intercept)  0.228693901 0.322881661   0.70829015 0.48031    
#>          cohort_5  0.090752122 0.035935625   2.52540820 0.01302 *  
#>          cohort_6 -0.161546669 0.020864811  -7.74254183 5.8e-12 ***
#>          cohort_7  0.074689254 0.023650709   3.15801334 0.00206 ** 
#>          cohort_8 -0.238704323 0.010795223 -22.11203182 0.0e+00 ***
#>           yr_2011  0.053217642 0.032687929   1.62805182 0.10645    
#>           yr_2012  0.025658025 0.060724164   0.42253403 0.67348    
#>           yr_2013  0.048676642 0.047491057   1.02496438 0.30769    
#>           yr_2014  0.111307673 0.048899012   2.27627652 0.02482 *  
#>           yr_2015  0.131778714 0.053450280   2.46544479 0.01527 *  
#>           yr_2016  0.152113979 0.051690073   2.94280833 0.00399 ** 
#>           yr_2017  0.196827024 0.047779418   4.11949397 0.00008 ***
#>           yr_2018  0.200525981 0.060793374   3.29848416 0.00132 ** 
#>           yr_2019  0.241053642 0.048216896   4.99936048 2.3e-06 ***
#>           yr_2020  0.227642925 0.041266317   5.51643430 2.4e-07 ***
#>          mean_age -0.001117751 0.008708643  -0.12834957 0.89811    
#>           pct_fem -0.001137132 0.089179306  -0.01275107 0.98985    
#>           pct_cmb  0.064343082 0.080361840   0.80066711 0.42510    
#>  cohort_5:yr_2010 -0.078048485 0.057633107  -1.35423004 0.17852    
#>  cohort_5:yr_2011 -0.073868461 0.056009708  -1.31885103 0.19003    
#>  cohort_5:yr_2012  0.006914643 0.036732333   0.18824404 0.85104    
#>  cohort_5:yr_2013  0.016749498 0.043164816   0.38803589 0.69876    
#>  cohort_5:yr_2015  0.029976283 0.056849689   0.52729019 0.59908    
#>  cohort_5:yr_2016  0.072492394 0.048277349   1.50158192 0.13615    
#>  cohort_5:yr_2017  0.107452271 0.048854246   2.19944588 0.03000 *  
#>  cohort_5:yr_2018  0.179814496 0.041723476   4.30967197 0.00004 ***
#>  cohort_5:yr_2019  0.183575249 0.024702718   7.43137870 2.8e-11 ***
#>  cohort_5:yr_2020  0.254698618 0.054186426   4.70041369 7.8e-06 ***
#>  cohort_6:yr_2010  0.195231407 0.050606871   3.85780432 0.00020 ***
#>  cohort_6:yr_2011  0.088438535 0.032468976   2.72378574 0.00754 ** 
#>  cohort_6:yr_2012  0.131341225 0.021888462   6.00047767 2.7e-08 ***
#>  cohort_6:yr_2013  0.103776798 0.069542351   1.49228199 0.13857    
#>  cohort_6:yr_2014  0.020857116 0.021794643   0.95698357 0.34073    
#>  cohort_6:yr_2016  0.041451067 0.017865726   2.32014450 0.02223 *  
#>  cohort_6:yr_2017 -0.018284118 0.054735070  -0.33404759 0.73900    
#>  cohort_6:yr_2018 -0.062396321 0.030822420  -2.02438098 0.04542 *  
#>  cohort_6:yr_2019 -0.069263623 0.034670807  -1.99775055 0.04828 *  
#>  cohort_6:yr_2020 -0.079013447 0.053561485  -1.47519151 0.14310    
#>  cohort_7:yr_2010 -0.032002540 0.046205257  -0.69261687 0.49005    
#>  cohort_7:yr_2011 -0.095926160 0.037377329  -2.56642633 0.01166 *  
#>  cohort_7:yr_2012 -0.044481466 0.038813288  -1.14603703 0.25434    
#>  cohort_7:yr_2013 -0.035256339 0.048640480  -0.72483535 0.47014    
#>  cohort_7:yr_2014 -0.053916075 0.034308437  -1.57151065 0.11902    
#>  cohort_7:yr_2015 -0.041956676 0.035908135  -1.16844487 0.24522    
#>  cohort_7:yr_2017  0.122128992 0.015191221   8.03944538 1.3e-12 ***
#>  cohort_7:yr_2018  0.133041240 0.040262463   3.30434928 0.00130 ** 
#>  cohort_7:yr_2019  0.158755768 0.018854300   8.42013584 1.9e-13 ***
#>  cohort_7:yr_2020  0.188190948 0.045681855   4.11959951 0.00008 ***
#>  cohort_8:yr_2010  0.168017984 0.037608434   4.46756126 0.00002 ***
#>  cohort_8:yr_2011  0.163233956 0.021546976   7.57572466 1.3e-11 ***
#>  cohort_8:yr_2012  0.226848004 0.029934054   7.57825861 1.3e-11 ***
#>  cohort_8:yr_2013  0.182161256 0.023642829   7.70471486 7.1e-12 ***
#>  cohort_8:yr_2014  0.167771896 0.015444263  10.86305609 0.0e+00 ***
#>  cohort_8:yr_2015  0.053514914 0.010942679   4.89047637 3.6e-06 ***
#>  cohort_8:yr_2016  0.055586510 0.012678831   4.38419826 0.00003 ***
#>  cohort_8:yr_2018  0.007757137 0.016926798   0.45827550 0.64768    
#>  cohort_8:yr_2019  0.041353989 0.007016441   5.89387009 4.4e-08 ***
#>  cohort_8:yr_2020 -0.004063714 0.019899948  -0.20420726 0.83858    
#> 
#> Significance codes: < 0.001: '***'; < 0.01: '**'; < 0.05: '*'; < 0.1: '.'
#> Residual standard error: 0.3943 on 107 degrees of freedom
#> R^2: 0.972406151984569; Adjusted R^2: 0.957706625471676

If we would like to compare the two models’ estimates of risk of hospitalization associated with the intervention during the post-intervention period, we can use ave_coeff() again.

# Summarize the post-intervention period from the original model
summary_sdid_hosp <- 
  staggR::ave_coeff(sdid = sdid_hosp, 
                    coefs = staggR::select_period(sdid = sdid_hosp,
                                                  period = "post"))


# # Summarize the post-intervention period from the model fitted using aggregated data
summary_sdid_hosp_agg <- 
  staggR::ave_coeff(sdid = sdid_hosp_agg, 
                    coefs = staggR::select_period(sdid = sdid_hosp_agg,
                                                  period = "post"))

# Combine the two summaries
summary_sdid_hosp$model <- "Individual-level data"
summary_sdid_hosp_agg$model <- "Aggregated data"
rbind(summary_sdid_hosp, summary_sdid_hosp_agg)
#>          est         se         pval sign         lb        ub     n
#> 1 0.10009873 0.01588058 2.895417e-10  *** 0.06897278 0.1312247 11745
#> 2 0.09905923 0.01936651 1.359248e-06  *** 0.06110088 0.1370176 11745
#>                   model
#> 1 Individual-level data
#> 2       Aggregated data

staggR

staggR: Fit difference-in-differences models with staggered interventions

Background