Very often we are asked to summarize model results from multiple fits into a nice table. The endpoint might be of different types (e.g., survival, case/control, continuous) and there may be several independent variables that we want to examine univariately or adjusted for certain variables such as age and sex. Locally, the SAS macros %modelsum, %glmuniv, and %logisuni were written to create such summary tables. With the increasing interest in R, we have developed the function modelsum to create similar tables within the R environment.
In developing the modelsum function, the goal was to bring the best features of these macros into an R function. However, the task was not simply to duplicate all the functionality, but rather to make use of R’s strengths (modeling, method dispatch, flexibility in function definition and output format) and make a tool that fits the needs of R users. Additionally, the results needed to fit within the general reproducible research framework so the tables could be displayed within an R markdown report.
This report provides step-by-step directions for using the functions associated with modelsum. All functions presented here are available within the arsenal package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial resource is available at rmarkdown.rstudio.com.
The first step when using the modelsum function is to load the arsenal package. All the examples in this report use a dataset called mockstudy made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples.
> require(arsenal)
> data(mockstudy) # load data
> dim(mockstudy)  # look at how many subjects and variables are in the dataset 
[1] 1499   14
> # help(mockstudy) # learn more about the dataset and variables
> str(mockstudy) # quick look at the data
'data.frame':   1499 obs. of  14 variables:
 $ case       : int  110754 99706 105271 105001 112263 86205 99508 90158 88989 90515 ...
 $ age        : atomic  67 74 50 71 69 56 50 57 51 63 ...
  ..- attr(*, "label")= chr "Age in Years"
 $ arm        : atomic  F: FOLFOX A: IFL A: IFL G: IROX ...
  ..- attr(*, "label")= chr "Treatment Arm"
 $ sex        : Factor w/ 2 levels "Male","Female": 1 2 2 2 2 1 1 1 2 1 ...
 $ race       : atomic  Caucasian Caucasian Caucasian Caucasian ...
  ..- attr(*, "label")= chr "Race"
 $ fu.time    : int  922 270 175 128 233 120 369 421 387 363 ...
 $ fu.stat    : int  2 2 2 2 2 2 2 2 2 2 ...
 $ ps         : int  0 1 1 1 0 0 0 0 1 1 ...
 $ hgb        : num  11.5 10.7 11.1 12.6 13 10.2 13.3 12.1 13.8 12.1 ...
 $ bmi        : atomic  25.1 19.5 NA 29.4 26.4 ...
  ..- attr(*, "label")= chr "Body Mass Index (kg/m^2)"
 $ alk.phos   : int  160 290 700 771 350 569 162 152 231 492 ...
 $ ast        : int  35 52 100 68 35 27 16 12 25 18 ...
 $ mdquality.s: int  NA 1 1 1 NA 1 1 1 1 1 ...
 $ age.ord    : Ord.factor w/ 8 levels "10-19"<"20-29"<..: 6 7 4 7 6 5 4 5 5 6 ...
To create a simple linear regression table (the default), use a formula statement to specify the variables that you want summarized. The example below models BMI as a function of sex and of age, fitting one model per variable.
> tab1 <- modelsum(bmi ~ sex + age, data=mockstudy)
If you want to take a quick look at the table, you can use summary on your modelsum object and the table will print out as text in your R console window. If you use summary without any options you will see a number of &nbsp; statements, which translate to “space” in HTML.
If you want a nicer version in your console window, add the text=TRUE option.
> summary(tab1, text=TRUE)
----------------------------------------------------------------------------------
                    estimate        std.error       p.value         adj.r.squared 
------------------ --------------- --------------- --------------- ---------------
(Intercept)        27.5            0.181           <0.001          0.004          
sex Female         -0.731          0.29            0.012           .              
(Intercept)        26.4            0.752           <0.001          0              
Age in Years       0.013           0.012           0.290           .              
----------------------------------------------------------------------------------
In order for the report to look nice within an R markdown (knitr) report, you just need to specify results="asis" when creating the R chunk. This changes the layout slightly (compresses it) and bolds the variable names.
> summary(tab1)
| | estimate | std.error | p.value | adj.r.squared |
|---|---|---|---|---|
| (Intercept) | 27.5 | 0.181 | <0.001 | 0.004 | 
| sex Female | -0.731 | 0.29 | 0.012 | . | 
| (Intercept) | 26.4 | 0.752 | <0.001 | 0 | 
| Age in Years | 0.013 | 0.012 | 0.290 | . | 
If you want a data.frame version, simply use as.data.frame.
> as.data.frame(tab1)
          term model endpoint estimate std.error p.value adj.r.squared
1  (Intercept)     1      bmi   27.500     0.181      NA         0.004
2   sex Female     1      bmi   -0.731     0.290   0.012         0.004
3  (Intercept)     2      bmi   26.400     0.752      NA         0.000
4 Age in Years     2      bmi    0.013     0.012   0.290         0.000
The adjust argument lets the user specify terms that every model in the table should be adjusted for.
> tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~age + sex, data=mockstudy)
> summary(tab2)
| | estimate | std.error | p.value | adj.r.squared |
|---|---|---|---|---|
| (Intercept) | 176 | 20.6 | <0.001 | -0.001 | 
| Treatment Arm F: FOLFOX | -14 | 8.73 | 0.117 | . | 
| Treatment Arm G: IROX | -2.2 | 9.86 | 0.820 | . | 
| sex Female | 3.02 | 7.52 | 0.688 | . | 
| Age in Years | -0.017 | 0.319 | 0.956 | . | 
| (Intercept) | 148 | 19.6 | <0.001 | 0.045 | 
| ps | 46.7 | 5.99 | <0.001 | . | 
| sex Female | 1.17 | 7.34 | 0.874 | . | 
| Age in Years | -0.084 | 0.311 | 0.787 | . | 
| (Intercept) | 337 | 32.2 | <0.001 | 0.031 | 
| hgb | -14 | 2.14 | <0.001 | . | 
| sex Female | -6 | 7.52 | 0.426 | . | 
| Age in Years | 0.095 | 0.314 | 0.763 | . | 
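Conceptually, the table above corresponds to fitting one model per variable on the right-hand side of the formula and adding the adjust terms to each of them. The base-R sketch below mimics that idea on the built-in mtcars data. It is an illustration under that reading of the output, not arsenal's actual implementation, and the variables chosen are arbitrary stand-ins.

```r
# One model per predictor, each adjusted for the same covariates.
# mtcars and the chosen variables are stand-ins for illustration only.
predictors <- c("wt", "hp", "disp")
adjusters  <- c("cyl", "am")

fits <- lapply(predictors, function(v) {
  # reformulate() builds a formula such as mpg ~ wt + cyl + am
  lm(reformulate(c(v, adjusters), response = "mpg"), data = mtcars)
})
names(fits) <- predictors

# Adjusted estimate for each predictor of interest
adj_est <- sapply(predictors, function(v) coef(fits[[v]])[[v]])
print(round(adj_est, 3))
```

Each element of fits is an ordinary lm object, so the usual diagnostics (summary, plot) can be run on any single model before it is summarized in a table.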
To make sure the correct model is run, you need to specify the family argument. The options currently available are gaussian, binomial, survival, and poisson. If there is enough interest, additional models can be added.
Look at whether there is any evidence that AlkPhos values vary by study arm after adjusting for sex and age (assuming a linear age relationship).
> fit <- lm(alk.phos ~ arm + age + sex, data=mockstudy)
> summary(fit)
Call:
lm(formula = alk.phos ~ arm + age + sex, data = mockstudy)
Residuals:
    Min      1Q  Median      3Q     Max 
-168.80  -81.45  -47.17   37.39  853.56 
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  175.54808   20.58665   8.527   <2e-16 ***
armF: FOLFOX -13.70062    8.72963  -1.569    0.117    
armG: IROX    -2.24498    9.86004  -0.228    0.820    
age           -0.01741    0.31878  -0.055    0.956    
sexFemale      3.01598    7.52097   0.401    0.688    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 128.5 on 1228 degrees of freedom
  (266 observations deleted due to missingness)
Multiple R-squared:  0.002552,  Adjusted R-squared:  -0.0006969 
F-statistic: 0.7855 on 4 and 1228 DF,  p-value: 0.5346
> plot(fit)
The results suggest that the endpoint may need to be transformed. Calculating the Box-Cox transformation suggests a log transformation.
> require(MASS)
> boxcox(fit)
> fit2 <- lm(log(alk.phos) ~ arm + age + sex, data=mockstudy)
> summary(fit2)
Call:
lm(formula = log(alk.phos) ~ arm + age + sex, data = mockstudy)
Residuals:
    Min      1Q  Median      3Q     Max 
-3.0098 -0.4470 -0.1065  0.4205  2.0620 
Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   4.9692474  0.1025239  48.469   <2e-16 ***
armF: FOLFOX -0.0766798  0.0434746  -1.764    0.078 .  
armG: IROX   -0.0192828  0.0491041  -0.393    0.695    
age          -0.0004058  0.0015876  -0.256    0.798    
sexFemale     0.0179253  0.0374553   0.479    0.632    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6401 on 1228 degrees of freedom
  (266 observations deleted due to missingness)
Multiple R-squared:  0.003121,  Adjusted R-squared:  -0.0001258 
F-statistic: 0.9613 on 4 and 1228 DF,  p-value: 0.4278
> plot(fit2)
Finally, look to see whether there is a non-linear relationship with age.
> require(gam)
> fit3 <- lm(log(alk.phos) ~ arm + ns(age, df=2) + sex, data=mockstudy)
> 
> # test whether there is a difference between models 
> anova(fit2,fit3)
Analysis of Variance Table
Model 1: log(alk.phos) ~ arm + age + sex
Model 2: log(alk.phos) ~ arm + ns(age, df = 2) + sex
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
1   1228 503.19                              
2   1227 502.07  1    1.1137 2.7218 0.09924 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> # look at functional form of age
> termplot(fit3, term=2, se=T, rug=T)
In this instance it looks like there isn’t enough evidence to say that the relationship is non-linear.
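The same nested-model comparison can be tried on any dataset. The sketch below uses the built-in mtcars data (an arbitrary stand-in) and the splines package that ships with R: the linear fit is nested within the natural-spline fit, so anova gives an F test for the extra spline term.

```r
library(splines)  # ships with R; provides ns()

# Linear versus natural-spline (2 df) fit for the same covariate
fit_lin <- lm(mpg ~ hp, data = mtcars)
fit_ns  <- lm(mpg ~ ns(hp, df = 2), data = mtcars)

# F test for the additional spline term
cmp <- anova(fit_lin, fit_ns)
print(cmp)

# A small p-value here is evidence against a purely linear relationship
p_nonlinear <- cmp[["Pr(>F)"]][2]
```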
The broom package makes it easy to extract information from the fit.
> require(broom)
> tmp <- tidy(fit3) # coefficients, p-values
> class(tmp)
[1] "data.frame"
> tmp
              term    estimate  std.error statistic       p.value
1      (Intercept)  4.76454026 0.14102237 33.785704 1.928465e-177
2     armF: FOLFOX -0.07668790 0.04344412 -1.765208  7.777754e-02
3       armG: IROX -0.01945575 0.04906984 -0.396491  6.918118e-01
4 ns(age, df = 2)1  0.33031939 0.26002425  1.270341  2.042041e-01
5 ns(age, df = 2)2 -0.10069469 0.09349337 -1.077025  2.816809e-01
6        sexFemale  0.01829092 0.03742970  0.488674  6.251598e-01
> 
> glance(fit3)
  r.squared adj.r.squared     sigma statistic   p.value df    logLik
1 0.0053278   0.001274531 0.6396787  1.314445 0.2552466  6 -1195.653
       AIC      BIC deviance df.residual
1 2405.305 2441.126 502.0747        1227
> ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age + sex, 
+                     family=gaussian,  
+                     gaussian.stats=c("estimate","CI.lower.estimate","CI.upper.estimate","p.value"))
> summary(ms.logy)
| | estimate | CI.lower.estimate | CI.upper.estimate | p.value |
|---|---|---|---|---|
| (Intercept) | 4.97 | 4.77 | 5.17 | <0.001 | 
| Treatment Arm F: FOLFOX | -0.077 | -0.162 | 0.009 | 0.078 | 
| Treatment Arm G: IROX | -0.019 | -0.116 | 0.077 | 0.695 | 
| sex Female | 0.018 | -0.056 | 0.091 | 0.632 | 
| Age in Years | 0 | -0.004 | 0.003 | 0.798 | 
| (Intercept) | 4.83 | 4.64 | 5.02 | <0.001 | 
| ps | 0.226 | 0.167 | 0.284 | <0.001 | 
| sex Female | 0.009 | -0.063 | 0.081 | 0.814 | 
| Age in Years | -0.001 | -0.004 | 0.002 | 0.636 | 
| (Intercept) | 5.76 | 5.45 | 6.08 | <0.001 | 
| hgb | -0.069 | -0.09 | -0.048 | <0.001 | 
| sex Female | -0.027 | -0.101 | 0.046 | 0.468 | 
| Age in Years | 0 | -0.003 | 0.003 | 0.925 | 
> boxplot(age ~ mdquality.s, data=mockstudy, ylab=attr(mockstudy$age,'label'), xlab='mdquality.s')
> 
> fit <- glm(mdquality.s ~ age + sex, data=mockstudy, family=binomial)
> summary(fit)
Call:
glm(formula = mdquality.s ~ age + sex, family = binomial, data = mockstudy)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1832   0.4500   0.4569   0.4626   0.4756  
Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.329442   0.514684   4.526 6.01e-06 ***
age         -0.002353   0.008256  -0.285    0.776    
sexFemale    0.039227   0.195330   0.201    0.841    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 807.68  on 1246  degrees of freedom
Residual deviance: 807.55  on 1244  degrees of freedom
  (252 observations deleted due to missingness)
AIC: 813.55
Number of Fisher Scoring iterations: 4
> 
> # create odds ratios w/ confidence intervals
> tmp <- data.frame(summary(fit)$coef)
> tmp
                Estimate  Std..Error    z.value     Pr...z..
(Intercept)  2.329441734 0.514683688  4.5259677 6.011977e-06
age         -0.002353404 0.008255814 -0.2850602 7.755980e-01
sexFemale    0.039227292 0.195330166  0.2008256 8.408350e-01
> 
> tmp$OR <- round(exp(tmp[,1]),2)
> tmp$lower.CI <- round(exp(tmp[,1] - 1.96* tmp[,2]),2)
> tmp$upper.CI <- round(exp(tmp[,1] + 1.96* tmp[,2]),2)
> names(tmp)[4] <- 'P-value'
> 
> kable(tmp[,c('OR','lower.CI','upper.CI','P-value')])
| | OR | lower.CI | upper.CI | P-value |
|---|---|---|---|---|
| (Intercept) | 10.27 | 3.75 | 28.17 | 0.000006 | 
| age | 1.00 | 0.98 | 1.01 | 0.775598 | 
| sexFemale | 1.04 | 0.71 | 1.53 | 0.840835 | 
> 
> # Assess the predictive ability of the model
> 
> # code using the pROC package
> require(pROC)
> pred <- predict(fit, type='response')
> tmp <- pROC::roc(mockstudy$mdquality.s[!is.na(mockstudy$mdquality.s)]~ pred, plot=TRUE, percent=TRUE)
> tmp$auc
Area under the curve: 50.69%
The broom package makes it easy to extract information from the fit.
> tidy(fit, exp=T, conf.int=T) # coefficients, p-values, conf.intervals
         term   estimate   std.error  statistic      p.value  conf.low
1 (Intercept) 10.2722053 0.514683688  4.5259677 6.011977e-06 3.8305925
2         age  0.9976494 0.008255814 -0.2850602 7.755980e-01 0.9814436
3   sexFemale  1.0400068 0.195330166  0.2008256 8.408350e-01 0.7119068
  conf.high
1 28.876261
2  1.013764
3  1.533763
> 
> glance(fit) # model summary statistics
  null.deviance df.null    logLik      AIC      BIC deviance df.residual
1      807.6764    1246 -403.7734 813.5468 828.9323 807.5468        1244
> summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial))
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
|---|---|---|---|---|---|---|
| (Intercept) | NA | NA | NA | <0.001 | 0.507 | 0 | 
| Age in Years | 0.998 | 0.981 | 1.01 | 0.776 | . | . | 
| sexFemale | 1.04 | 0.712 | 1.53 | 0.841 | . | . | 
| (Intercept) | NA | NA | NA | 0.003 | 0.55 | 21 | 
| Body Mass Index (kg/m^2) | 1.02 | 0.987 | 1.06 | 0.220 | . | . | 
| sexFemale | 1.05 | 0.717 | 1.56 | 0.794 | . | . | 
> 
> fitall <- modelsum(mdquality.s ~ age, data=mockstudy, family=binomial,
+                    binomial.stats=c("Nmiss2","OR","p.value"))
> summary(fitall)
| | OR | p.value | Nmiss2 |
|---|---|---|---|
| (Intercept) | NA | <0.001 | 0 | 
| Age in Years | 0.998 | 0.766 | . | 
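The hand-computed limits earlier in this section (estimate plus or minus 1.96 standard errors on the log-odds scale) are Wald intervals, whereas confint() on a glm fit is based on the profile likelihood. To my understanding broom's tidy() relies on the latter, which would explain why the two sets of limits shown above differ slightly. The sketch below puts both side by side on the built-in infert data (an arbitrary stand-in, not the mockstudy data).

```r
# Logistic regression on a built-in dataset (stand-in for illustration)
fit <- glm(case ~ spontaneous, data = infert, family = binomial)
est <- coef(summary(fit))

# Wald interval computed by hand on the log-odds scale, then exponentiated
z <- qnorm(0.975)
wald <- cbind(OR    = exp(est[, "Estimate"]),
              lower = exp(est[, "Estimate"] - z * est[, "Std. Error"]),
              upper = exp(est[, "Estimate"] + z * est[, "Std. Error"]))

# The same Wald limits from confint.default()
wald_builtin <- exp(confint.default(fit))

# Profile-likelihood limits from the glm method of confint()
profile_ci <- exp(suppressMessages(confint(fit)))

print(round(cbind(wald, profile_ci), 3))
```

With large samples the two intervals are usually close; they can diverge more when the sample is small or the estimate is near a boundary.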
> require(survival)
Loading required package: survival
> 
> # multivariable model with all 3 terms
> fit  <- coxph(Surv(fu.time, fu.stat) ~ age + sex + arm, data=mockstudy)
> summary(fit)
Call:
coxph(formula = Surv(fu.time, fu.stat) ~ age + sex + arm, data = mockstudy)
  n= 1499, number of events= 1356 
                  coef exp(coef)  se(coef)      z Pr(>|z|)    
age           0.004600  1.004611  0.002501  1.839   0.0659 .  
sexFemale     0.039893  1.040699  0.056039  0.712   0.4765    
armF: FOLFOX -0.454650  0.634670  0.064878 -7.008 2.42e-12 ***
armG: IROX   -0.140785  0.868676  0.072760 -1.935   0.0530 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
             exp(coef) exp(-coef) lower .95 upper .95
age             1.0046     0.9954    0.9997    1.0095
sexFemale       1.0407     0.9609    0.9324    1.1615
armF: FOLFOX    0.6347     1.5756    0.5589    0.7207
armG: IROX      0.8687     1.1512    0.7532    1.0018
Concordance= 0.563  (se = 0.009 )
Rsquare= 0.037   (max possible= 1 )
Likelihood ratio test= 56.21  on 4 df,   p=1.811e-11
Wald test            = 56.26  on 4 df,   p=1.77e-11
Score (logrank) test = 56.96  on 4 df,   p=1.259e-11
> 
> # check proportional hazards assumption
> fit.z <- cox.zph(fit)
> fit.z
                 rho chisq     p
age          -0.0311  1.46 0.226
sexFemale    -0.0325  1.44 0.230
armF: FOLFOX  0.0343  1.61 0.205
armG: IROX    0.0337  1.54 0.214
GLOBAL            NA  4.59 0.332
> plot(fit.z[1], resid=FALSE) # makes for a cleaner picture in this case
> abline(h=coef(fit)[1], col='red')
> 
> # check functional form for age using pspline (penalized spline)
> # results are returned for the linear and non-linear components
> fit2 <- coxph(Surv(fu.time, fu.stat) ~ pspline(age) + sex + arm, data=mockstudy)
> fit2
Call:
coxph(formula = Surv(fu.time, fu.stat) ~ pspline(age) + sex + 
    arm, data = mockstudy)
                         coef se(coef)      se2    Chisq   DF       p
pspline(age), linear  0.00443  0.00237  0.00237  3.48989 1.00  0.0617
pspline(age), nonlin                            13.11270 3.08  0.0047
sexFemale             0.03993  0.05610  0.05607  0.50663 1.00  0.4766
armF: FOLFOX         -0.46240  0.06494  0.06493 50.69608 1.00 1.1e-12
armG: IROX           -0.15243  0.07301  0.07299  4.35876 1.00  0.0368
Iterations: 6 outer, 16 Newton-Raphson
     Theta= 0.954 
Degrees of freedom for terms= 4.1 1.0 2.0 
Likelihood ratio test=70.1  on 7.08 df, p=1.59e-12  n= 1499 
> 
> # plot smoothed age to visualize why significant
> termplot(fit2, se=T, terms=1)
> abline(h=0)
> 
> # The c-statistic comes out in the summary of the fit
> summary(fit2)$concordance
          C       se(C) 
0.568432549 0.008779125 
> 
> # It can also be calculated using the survConcordance function
> survConcordance(Surv(fu.time, fu.stat) ~ predict(fit2), data=mockstudy)
Call:
survConcordance(formula = Surv(fu.time, fu.stat) ~ predict(fit2), 
    data = mockstudy)
  n= 1499 
Concordance= 0.5684325 se= 0.008779125
concordant discordant  tied.risk  tied.time   std(c-d) 
 620221.00  470282.00    5021.00     766.00   19235.49 
The broom package makes it easy to extract information from the fit.
> tidy(fit) # coefficients, p-values
          term     estimate   std.error  statistic      p.value
1          age  0.004600011 0.002501114  1.8391844 6.588807e-02
2    sexFemale  0.039892735 0.056038632  0.7118792 4.765396e-01
3 armF: FOLFOX -0.454650445 0.064878289 -7.0077441 2.421952e-12
4   armG: IROX -0.140784996 0.072759529 -1.9349355 5.299821e-02
       conf.low    conf.high
1 -0.0003020836  0.009502105
2 -0.0699409642  0.149726435
3 -0.5818095536 -0.327491336
4 -0.2833910528  0.001821061
> 
> glance(fit) # model summary statistics
     n nevent statistic.log  p.value.log statistic.sc   p.value.sc
1 1499   1356      56.21071 1.811218e-11      56.9642 1.258749e-11
  statistic.wald p.value.wald  r.squared r.squared.max concordance
1          56.26 1.770173e-11 0.03680443     0.9999923    0.562838
  std.error.concordance    logLik      AIC      BIC
1           0.008779125 -8797.588 17603.18 17624.03
> ##Note: You must use quotes when specifying family="survival" 
> ##      family=survival will not work
> summary(modelsum(Surv(fu.time, fu.stat) ~ arm, 
+                  adjust=~age + sex, data=mockstudy, family="survival"))
| | HR | CI.lower.HR | CI.upper.HR | p.value | concordance |
|---|---|---|---|---|---|
| Treatment Arm F: FOLFOX | 0.635 | 0.559 | 0.721 | <0.001 | 0.563 | 
| Treatment Arm G: IROX | 0.869 | 0.753 | 1 | 0.053 | . | 
| sexFemale | 1.04 | 0.932 | 1.16 | 0.477 | . | 
| age | 1 | 1 | 1.01 | 0.066 | . | 
> 
> ##Note: the pspline term is not working yet
> #summary(modelsum(Surv(fu.time, fu.stat) ~ arm, 
> #                adjust=~pspline(age) + sex, data=mockstudy, family='survival'))
Poisson regression is useful when predicting an outcome variable representing counts. It can also be useful when looking at survival data. Cox models and Poisson models are very closely related and survival data can be summarized using Poisson regression. If you have overdispersion (check whether the residual deviance is much larger than the residual degrees of freedom), you may want to use quasipoisson() instead of poisson(). Some of these diagnostics need to be done outside of modelsum.
For the first example, use the solder dataset available in the rpart package. The endpoint skips has a definite Poisson look.
> require(rpart) ##just to get access to solder dataset
> data(solder)
> hist(solder$skips)
> 
> fit <- glm(skips ~ Opening + Solder + Mask , data=solder, family=poisson)
> anova(fit, test='Chi')
Analysis of Deviance Table
Model: poisson, link: log
Response: skips
Terms added sequentially (first to last)
        Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                      719     6855.7              
Opening  2  2524.56       717     4331.1 < 2.2e-16 ***
Solder   1   936.95       716     3394.2 < 2.2e-16 ***
Mask     3  1653.09       713     1741.1 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> summary(fit)
Call:
glm(formula = skips ~ Opening + Solder + Mask, family = poisson, 
    data = solder)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-4.7252  -1.3409  -0.6276   0.6930   5.2342  
Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.30871    0.08068 -16.222  < 2e-16 ***
OpeningM     0.25851    0.06656   3.884 0.000103 ***
OpeningS     1.89349    0.05363  35.306  < 2e-16 ***
SolderThin   1.09973    0.03864  28.465  < 2e-16 ***
MaskA3       0.42819    0.07547   5.674  1.4e-08 ***
MaskB3       1.20225    0.06697  17.953  < 2e-16 ***
MaskB6       1.86648    0.06310  29.580  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
    Null deviance: 6855.7  on 719  degrees of freedom
Residual deviance: 1741.1  on 713  degrees of freedom
AIC: 3337.2
Number of Fisher Scoring iterations: 5
Overdispersion is present when the residual deviance is much larger than the residual degrees of freedom. This can be tested approximately using the following code. The goal is to have a p-value that is \(>0.05\).
> 1-pchisq(fit$deviance, fit$df.residual)
[1] 0
One possible solution is to use the quasipoisson family instead of the poisson family. This adjusts for the overdispersion.
> fit2 <- glm(skips ~ Opening + Solder + Mask, data=solder, family=quasipoisson)
> summary(fit2)
Call:
glm(formula = skips ~ Opening + Solder + Mask, family = quasipoisson, 
    data = solder)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-4.7252  -1.3409  -0.6276   0.6930   5.2342  
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.30871    0.12496 -10.473  < 2e-16 ***
OpeningM     0.25851    0.10310   2.507 0.012382 *  
OpeningS     1.89349    0.08307  22.794  < 2e-16 ***
SolderThin   1.09973    0.05984  18.377  < 2e-16 ***
MaskA3       0.42819    0.11689   3.663 0.000268 ***
MaskB3       1.20225    0.10372  11.591  < 2e-16 ***
MaskB6       1.86648    0.09774  19.097  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 2.399074)
    Null deviance: 6855.7  on 719  degrees of freedom
Residual deviance: 1741.1  on 713  degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
The broom package makes it easy to extract information from the fit.
> tidy(fit) # coefficients, p-values
         term   estimate  std.error  statistic       p.value
1 (Intercept) -1.3087062 0.08067587 -16.221780  3.537930e-59
2    OpeningM  0.2585107 0.06656163   3.883780  1.028452e-04
3    OpeningS  1.8934884 0.05363137  35.305612 4.816124e-273
4  SolderThin  1.0997315 0.03863508  28.464582 3.216362e-178
5      MaskA3  0.4281934 0.07546810   5.673833  1.396375e-08
6      MaskB3  1.2022472 0.06696662  17.952933  4.552147e-72
7      MaskB6  1.8664830 0.06309987  29.579826 2.716304e-192
> 
> glance(fit) # model summary statistics
  null.deviance df.null    logLik      AIC      BIC deviance df.residual
1       6855.69     719 -1661.623 3337.247 3369.302  1741.08         713
> summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="quasipoisson"))
| | RR | CI.lower.RR | CI.upper.RR | p.value |
|---|---|---|---|---|
| (Intercept) | NA | NA | NA | <0.001 | 
| Opening M | 1.29 | 0.915 | 1.84 | 0.147 | 
| Opening S | 6.64 | 5.06 | 8.89 | <0.001 | 
| (Intercept) | NA | NA | NA | <0.001 | 
| Solder Thin | 3 | 2.34 | 3.89 | <0.001 | 
| (Intercept) | NA | NA | NA | 0.007 | 
| Mask A3 | 1.53 | 0.99 | 2.41 | 0.059 | 
| Mask B3 | 3.33 | 2.27 | 5.01 | <0.001 | 
| Mask B6 | 6.47 | 4.53 | 9.53 | <0.001 | 
> summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="poisson"))
| | RR | CI.lower.RR | CI.upper.RR | p.value |
|---|---|---|---|---|
| (Intercept) | NA | NA | NA | <0.001 | 
| Opening M | 1.29 | 1.14 | 1.48 | <0.001 | 
| Opening S | 6.64 | 5.99 | 7.39 | <0.001 | 
| (Intercept) | NA | NA | NA | <0.001 | 
| Solder Thin | 3 | 2.79 | 3.24 | <0.001 | 
| (Intercept) | NA | NA | NA | <0.001 | 
| Mask A3 | 1.53 | 1.32 | 1.78 | <0.001 | 
| Mask B3 | 3.33 | 2.92 | 3.8 | <0.001 | 
| Mask B6 | 6.47 | 5.72 | 7.33 | <0.001 | 
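The difference between the poisson and quasipoisson tables comes from the estimated dispersion: the point estimates are identical, while the standard errors are inflated by the square root of the dispersion. The sketch below checks this on the built-in warpbreaks data (an arbitrary stand-in for the solder example), estimating the dispersion as the Pearson chi-square divided by the residual degrees of freedom.

```r
# Poisson fit on a built-in count dataset (stand-in for illustration)
fit_p <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Pearson chi-square / residual df; values well above 1 suggest overdispersion
dispersion <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)

# Approximate goodness-of-fit p-value based on the residual deviance
p_gof <- pchisq(deviance(fit_p), df.residual(fit_p), lower.tail = FALSE)

# Quasi-Poisson refit: same coefficients, wider standard errors
fit_q <- glm(breaks ~ wool + tension, data = warpbreaks, family = quasipoisson)
se_ratio <- coef(summary(fit_q))[, "Std. Error"] /
            coef(summary(fit_p))[, "Std. Error"]

print(c(dispersion = dispersion, sqrt_dispersion = sqrt(dispersion)))
print(se_ratio)
```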
This second example uses the survival endpoint available in the mockstudy dataset. There is a close relationship between survival and Poisson models, and often it is easier to fit the model using Poisson regression, especially if you want to present absolute risk.
> # add .01 to the follow-up time (.01*1 day) in order to keep everyone in the analysis
> fit <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, data=mockstudy, family=poisson)
> summary(fit)
Call:
glm(formula = fu.stat ~ offset(log(fu.time + 0.01)) + age + sex + 
    arm, family = poisson, data = mockstudy)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.1188  -0.4041   0.3242   0.9727   4.3588  
Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.875627   0.108984 -53.913  < 2e-16 ***
age           0.003724   0.001705   2.184   0.0290 *  
sexFemale     0.027321   0.038575   0.708   0.4788    
armF: FOLFOX -0.335141   0.044600  -7.514 5.72e-14 ***
armG: IROX   -0.107776   0.050643  -2.128   0.0333 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
    Null deviance: 2113.5  on 1498  degrees of freedom
Residual deviance: 2048.0  on 1494  degrees of freedom
AIC: 5888.2
Number of Fisher Scoring iterations: 5
> 1-pchisq(fit$deviance, fit$df.residual)
[1] 0
> 
> coef(coxph(Surv(fu.time,fu.stat) ~ age + sex + arm, data=mockstudy))
         age    sexFemale armF: FOLFOX   armG: IROX 
 0.004600011  0.039892735 -0.454650445 -0.140784996 
> coef(fit)[-1]
         age    sexFemale armF: FOLFOX   armG: IROX 
 0.003723763  0.027320917 -0.335141090 -0.107775577 
> 
> # results from the Poisson model can then be described as risk ratios (similar to the hazard ratio)
> exp(coef(fit)[-1])
         age    sexFemale armF: FOLFOX   armG: IROX 
   1.0037307    1.0276976    0.7152372    0.8978291 
> 
> # As before, we can model the dispersion which alters the standard error
> fit2 <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, 
+             data=mockstudy, family=quasipoisson)
> summary(fit2)
Call:
glm(formula = fu.stat ~ offset(log(fu.time + 0.01)) + age + sex + 
    arm, family = quasipoisson, data = mockstudy)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.1188  -0.4041   0.3242   0.9727   4.3588  
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -5.875627   0.566666 -10.369   <2e-16 ***
age           0.003724   0.008867   0.420    0.675    
sexFemale     0.027321   0.200572   0.136    0.892    
armF: FOLFOX -0.335141   0.231899  -1.445    0.149    
armG: IROX   -0.107776   0.263318  -0.409    0.682    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 27.03493)
    Null deviance: 2113.5  on 1498  degrees of freedom
Residual deviance: 2048.0  on 1494  degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
The broom package makes it easy to extract information from the fit.
> tidy(fit) ##coefficients, p-values
          term     estimate   std.error   statistic      p.value
1  (Intercept) -5.875626610 0.108984423 -53.9125359 0.000000e+00
2          age  0.003723763 0.001705363   2.1835606 2.899455e-02
3    sexFemale  0.027320917 0.038575062   0.7082533 4.787879e-01
4 armF: FOLFOX -0.335141090 0.044600079  -7.5143610 5.718959e-14
5   armG: IROX -0.107775577 0.050642805  -2.1281518 3.332450e-02
> 
> glance(fit) ##model summary statistics
  null.deviance df.null    logLik      AIC      BIC deviance df.residual
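1      2113.504    1498 -2939.082 5888.164 5914.727 2047.979        1494
Remember that modelsum fits one model per variable on the right-hand side of the formula, so its results differ from the fit above: with age, sex, and arm all on the right-hand side, the summary would show the results for age + offset(log(fu.time+.01)), then sex + offset(log(fu.time+.01)), and so on, instead of age + sex + arm + offset(log(fu.time+.01)). The call below places sex and arm in adjust alongside the offset so that the age model matches the multivariable fit.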
> summary(modelsum(fu.stat ~ age, adjust=~offset(log(fu.time+.01))+ sex + arm, 
+                  data=mockstudy, family=poisson))
| | RR | CI.lower.RR | CI.upper.RR | p.value |
|---|---|---|---|---|
| (Intercept) | NA | NA | NA | <0.001 | 
| Age in Years | 1 | 1 | 1.01 | 0.029 | 
| armF: FOLFOX | 0.715 | 0.656 | 0.781 | <0.001 | 
| armG: IROX | 0.898 | 0.813 | 0.991 | 0.033 | 
| sexFemale | 1.03 | 0.953 | 1.11 | 0.479 | 
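One way to see why the Poisson model with a log follow-up-time offset describes survival data is that, with a single binary covariate, its rate ratio is exactly the ratio of events per unit of person-time in the two groups. The sketch below checks this on simulated data; the sample size, rates, and censoring mechanism are assumptions made purely for illustration.

```r
# Simulated survival data with exponential event and censoring times
set.seed(42)
n     <- 400
group <- rep(c(0, 1), each = n / 2)
event_time  <- rexp(n, rate = 0.010 * exp(-0.4 * group))
censor_time <- rexp(n, rate = 0.005)
time   <- pmin(event_time, censor_time)          # observed follow-up
status <- as.integer(event_time <= censor_time)  # 1 = event, 0 = censored

# Poisson regression on the event indicator with log follow-up as offset
fit <- glm(status ~ group + offset(log(time)), family = poisson,
           control = glm.control(epsilon = 1e-10))

# Events per unit of person-time in each group, computed by hand
rate <- tapply(status, group, sum) / tapply(time, group, sum)

rr_glm  <- unname(exp(coef(fit)["group"]))
rr_hand <- unname(rate["1"] / rate["0"])
print(c(rr_glm = rr_glm, rr_hand = rr_hand))
```

The intercept plays the same role for absolute risk: exp(coef(fit)[1]) is the event rate per unit of time in the reference group.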
Here are multiple examples showing how to use some of the different options.
There are standard settings for each type of model regarding what information is summarized in the table. This behavior can be modified using the modelsum.control function. You can also save your preferred settings and reuse them for future tables.
> mycontrols  <- modelsum.control(gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"),
+                                 show.adjust=FALSE, show.intercept=FALSE)                            
> tab2 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, control=mycontrols)
> summary(tab2)
| | estimate | std.error | adj.r.squared |
|---|---|---|---|
| Age in Years | 0.012 | 0.012 | 0.004 | 
You can also change these settings directly in the modelsum call.
> tab3 <- modelsum(bmi ~  age, adjust=~sex, data=mockstudy,
+                  gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"), 
+                  show.intercept=FALSE, show.adjust=FALSE)
> summary(tab3)
| | estimate | std.error | adj.r.squared |
|---|---|---|---|
| Age in Years | 0.012 | 0.012 | 0.004 | 
In the above example, age is shown with a label (Age in Years), but sex is listed “as is”. This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R.
> ## Look at one variable's label
> attr(mockstudy$age,'label')
[1] "Age in Years"
> 
> ## See all the variables with a label
> unlist(lapply(mockstudy,'attr','label'))
                       age                        arm 
            "Age in Years"            "Treatment Arm" 
                      race                        bmi 
                    "Race" "Body Mass Index (kg/m^2)" 
> 
> ## or
> cbind(sapply(mockstudy,attr,'label'))
            [,1]                      
case        NULL                      
age         "Age in Years"            
arm         "Treatment Arm"           
sex         NULL                      
race        "Race"                    
fu.time     NULL                      
fu.stat     NULL                      
ps          NULL                      
hgb         NULL                      
bmi         "Body Mass Index (kg/m^2)"
alk.phos    NULL                      
ast         NULL                      
mdquality.s NULL                      
age.ord     NULL                      
If you want to add labels to other variables, there are a couple of options. First, you could add labels to the variables in your dataset.
> attr(mockstudy$age,'label')  <- 'Age, yrs'
> 
> tab1 <- modelsum(bmi ~  age, adjust=~sex, data=mockstudy)
> summary(tab1)
| | estimate | std.error | p.value | adj.r.squared |
|---|---|---|---|---|
| (Intercept) | 26.8 | 0.766 | <0.001 | 0.004 | 
| Age, yrs | 0.012 | 0.012 | 0.348 | . | 
| sex Female | -0.718 | 0.291 | 0.014 | . | 
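When several variables need labels, the same attribute assignment can be looped over a named vector. The following is a minimal base R sketch on a made-up data frame; the names `toy` and `new_labels` are illustrative only and are not part of arsenal or mockstudy.

```r
# Toy data standing in for a study dataset (hypothetical values)
toy <- data.frame(age = c(54, 61, 47), bmi = c(24.1, 30.2, 27.5))

# Named vector: names are column names, values are the labels to attach
new_labels <- c(age = "Age, yrs", bmi = "Body Mass Index (kg/m^2)")

# Attach each label as a 'label' attribute on the matching column
for (v in names(new_labels)) {
  attr(toy[[v]], "label") <- new_labels[[v]]
}

# Collect the labels back into a named character vector
toy_labels <- sapply(toy, attr, "label")
print(toy_labels)
```

Keeping the labels in one named vector makes it easier to review and reuse them across reports.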
Another option is to add labels after you have created the table.
> mylabels <- list(sexFemale = "Female", age ="Age, yrs")
> summary(tab1, labelTranslations = mylabels)
| | estimate | std.error | p.value | adj.r.squared |
|---|---|---|---|---|
| (Intercept) | 26.8 | 0.766 | <0.001 | 0.004 | 
| Age, yrs | 0.012 | 0.012 | 0.348 | . | 
| sex Female | -0.718 | 0.291 | 0.014 | . | 
Alternatively, you can check the variable labels and manipulate them with a function called labels, which works on the modelsum object.
> labels(tab1)
                       bmi                        age 
"Body Mass Index (kg/m^2)"                 "Age, yrs" 
                 sexFemale 
              "sex Female" 
> labels(tab1) <- c(sexFemale="Female", age="Baseline Age (yrs)")
> labels(tab1)
                       bmi                        age 
"Body Mass Index (kg/m^2)"       "Baseline Age (yrs)" 
                 sexFemale 
                  "Female" 
> summary(tab1)
| | estimate | std.error | p.value | adj.r.squared |
|---|---|---|---|---|
| (Intercept) | 26.8 | 0.766 | <0.001 | 0.004 | 
| Baseline Age (yrs) | 0.012 | 0.012 | 0.348 | . | 
| Female | -0.718 | 0.291 | 0.014 | . | 
> summary(modelsum(age~mdquality.s+sex, data=mockstudy), show.intercept=FALSE)
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
|---|---|---|---|---|---|
| mdquality.s | -0.326 | 1.09 | 0.766 | -0.001 | 252 | 
| sex Female | -1.2 | 0.61 | 0.048 | 0.002 | 0 | 
> summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial),
+         show.adjust=FALSE)
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
|---|---|---|---|---|---|---|
| (Intercept) | NA | NA | NA | <0.001 | 0.507 | 0 | 
| Age, yrs | 0.998 | 0.981 | 1.01 | 0.776 | . | . | 
| (Intercept) | NA | NA | NA | 0.003 | 0.55 | 21 | 
| Body Mass Index (kg/m^2) | 1.02 | 0.987 | 1.06 | 0.220 | . | . | 
Often one wants to summarize a number of variables. Instead of typing each variable by hand, an alternative approach is to create a formula using the paste command with the collapse="+" option.
> # create a vector specifying the variable names
> myvars <- names(mockstudy)
> 
> # select the 8th through the 12th
> # paste them together, separated by the + sign
> RHS <- paste(myvars[8:12], collapse="+")
> RHS
[1] "ps+hgb+bmi+alk.phos+ast"
> 
> # create a formula using the as.formula function
> as.formula(paste('mdquality.s ~ ', RHS))
mdquality.s ~ ps + hgb + bmi + alk.phos + ast
> 
> # use the formula in the modelsum function
> summary(modelsum(as.formula(paste('mdquality.s ~', RHS)), family=binomial, data=mockstudy))
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
|---|---|---|---|---|---|---|
| (Intercept) | NA | NA | NA | <0.001 | 0.62 | 208 | 
| ps | 0.461 | 0.332 | 0.639 | <0.001 | . | . | 
| (Intercept) | NA | NA | NA | 0.783 | 0.573 | 208 | 
| hgb | 1.18 | 1.04 | 1.33 | 0.011 | . | . | 
| (Intercept) | NA | NA | NA | 0.002 | 0.549 | 21 | 
| Body Mass Index (kg/m^2) | 1.02 | 0.987 | 1.06 | 0.225 | . | . | 
| (Intercept) | NA | NA | NA | <0.001 | 0.552 | 208 | 
| alk.phos | 0.999 | 0.998 | 1 | 0.159 | . | . | 
| (Intercept) | NA | NA | NA | <0.001 | 0.545 | 208 | 
| ast | 0.995 | 0.988 | 1 | 0.099 | . | . | 
These steps can also be done using the formulize function.
> ## The formulize function does the paste and as.formula steps
> tmp <- formulize('mdquality.s',myvars[8:10])
> tmp
mdquality.s ~ ps + hgb + bmi
<environment: 0x7453118>
> 
> ## More complex formulas could also be written using formulize
> tmp2 <- formulize('mdquality.s',c('ps','hgb','sqrt(bmi)'))
> 
> ## use the formula in the modelsum function
> summary(modelsum(tmp, data=mockstudy, family=binomial))
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
|---|---|---|---|---|---|---|
| (Intercept) | NA | NA | NA | <0.001 | 0.62 | 208 | 
| ps | 0.461 | 0.332 | 0.639 | <0.001 | . | . | 
| (Intercept) | NA | NA | NA | 0.783 | 0.573 | 208 | 
| hgb | 1.18 | 1.04 | 1.33 | 0.011 | . | . | 
| (Intercept) | NA | NA | NA | 0.002 | 0.549 | 21 | 
| Body Mass Index (kg/m^2) | 1.02 | 0.987 | 1.06 | 0.225 | . | . | 
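Base R also offers reformulate(), which builds a formula from a character vector of terms plus a response name, much like the paste and as.formula steps shown earlier. A short sketch follows; the variable names are simply copied from the example, and no data are needed to build the formula.

```r
# Right-hand-side terms, as a character vector
rhs_terms <- c("ps", "hgb", "bmi")

# reformulate() joins the terms with '+' and puts the response on the left
f <- reformulate(rhs_terms, response = "mdquality.s")
print(f)

# all.vars() lists the variables the formula refers to
print(all.vars(f))
```

The resulting object is an ordinary formula, so it can be supplied wherever a formula is expected.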
Here are two ways to get the same result (limit the analysis to subjects age>50 and in the F: FOLFOX treatment group).
The first approach applies the subset function to the dataset mockstudy. This example also selects a subset of variables. The modelsum function is then applied to this subsetted data.
> newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(age,sex, bmi:alk.phos))
> dim(mockstudy)
[1] 1499   14
> table(mockstudy$arm)
   A: IFL F: FOLFOX   G: IROX 
      428       691       380 
> dim(newdata)
[1] 557   4
> names(newdata)
[1] "age"      "sex"      "bmi"      "alk.phos"
> summary(modelsum(alk.phos ~ ., data=newdata))
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
|---|---|---|---|---|---|
| (Intercept) | 123 | 46.9 | 0.009 | -0.001 | 0 | 
| age | 0.619 | 0.719 | 0.390 | . | . | 
| (Intercept) | 165 | 7.67 | <0.001 | -0.002 | 0 | 
| sex Female | -5.5 | 12.1 | 0.650 | . | . | 
| (Intercept) | 239 | 33.7 | <0.001 | 0.01 | 11 | 
| bmi | -2.8 | 1.21 | 0.022 | . | . | 
The second approach uses the subset argument of modelsum to subset the data.
> summary(modelsum(log(alk.phos) ~ sex + ps + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy))
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
|---|---|---|---|---|---|
| (Intercept) | 4.87 | 0.039 | <0.001 | -0.002 | 0 | 
| sex Female | -0.005 | 0.062 | 0.931 | . | . | 
| (Intercept) | 4.77 | 0.04 | <0.001 | 0.027 | 0 | 
| ps | 0.183 | 0.05 | <0.001 | . | . | 
| (Intercept) | 5.21 | 0.172 | <0.001 | 0.007 | 11 | 
| bmi | -0.012 | 0.006 | 0.044 | . | . | 
> summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset = age>50 & bmi<24, data=mockstudy))
| | estimate | std.error | p.value | adj.r.squared |
|---|---|---|---|---|
| (Intercept) | 179 | 14.6 | <0.001 | 0.007 | 
| ps | 20.8 | 13.4 | 0.122 | . | 
| sex Female | -18 | 16.7 | 0.293 | . | 
| (Intercept) | 373 | 104 | <0.001 | 0.009 | 
| bmi | -8.2 | 4.73 | 0.083 | . | 
| sex Female | -24 | 16.9 | 0.155 | . | 
> summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset=1:30, data=mockstudy))
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
|---|---|---|---|---|---|
| (Intercept) | 169 | 57 | 0.006 | 0.294 | 0 | 
| ps | 255 | 68.1 | <0.001 | . | . | 
| sex Female | 49.6 | 67.6 | 0.470 | . | . | 
| (Intercept) | 453 | 201 | 0.033 | -0.049 | 1 | 
| bmi | -6 | 7.41 | 0.426 | . | . | 
| sex Female | -22 | 79.8 | 0.782 | . | . | 
> ## create a variable combining the levels of mdquality.s and sex
> with(mockstudy, table(interaction(mdquality.s,sex)))
  0.Male   1.Male 0.Female 1.Female 
      77      686       47      437 
> summary(modelsum(age ~ interaction(mdquality.s,sex), data=mockstudy))
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
|---|---|---|---|---|---|
| (Intercept) | 59.7 | 1.31 | <0.001 | 0.003 | 252 | 
| interaction(mdquality.s, sex)1.Male | 0.73 | 1.39 | 0.598 | . | . | 
| interaction(mdquality.s, sex)0.Female | 0.988 | 2.13 | 0.643 | . | . | 
| interaction(mdquality.s, sex)1.Female | -1 | 1.42 | 0.474 | . | . | 
Certain transformations need to be surrounded by I() so that R treats them as arithmetic on the variable and not as special formula syntax. If the transformation includes any of the symbols / - + ^ * then surround the new variable by I().
> summary(modelsum(arm=="F: FOLFOX" ~ I(age/10) + log(bmi) + mdquality.s,
+                  data=mockstudy, family=binomial))
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
|---|---|---|---|---|---|---|
| (Intercept) | NA | NA | NA | 0.126 | 0.514 | 0 | 
| Age, yrs | 1.05 | 0.957 | 1.14 | 0.326 | . | . | 
| (Intercept) | NA | NA | NA | 0.611 | 0.508 | 33 | 
| Body Mass Index (kg/m^2) | 1.09 | 0.638 | 1.87 | 0.748 | . | . | 
| (Intercept) | NA | NA | NA | 0.074 | 0.502 | 252 | 
| mdquality.s | 1.04 | 0.719 | 1.53 | 0.819 | . | . | 
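To see why I() matters, it can help to inspect how R parses a formula with and without it. The sketch below uses only base R and the built-in mtcars data as a stand-in for mockstudy; it is an illustration of the general formula rules and not part of the arsenal examples.

```r
# Without I(), '^' is formula syntax (crossing), so wt^2 collapses to wt
print(attr(terms(mpg ~ wt^2), "term.labels"))

# With I(), the arithmetic is kept as a single transformed term
print(attr(terms(mpg ~ I(wt^2)), "term.labels"))

# Rescaling a predictor inside I() rescales its coefficient accordingly
fit_raw    <- lm(mpg ~ wt, data = mtcars)
fit_scaled <- lm(mpg ~ I(wt / 2), data = mtcars)
print(unname(coef(fit_scaled)[2] / coef(fit_raw)[2]))
```

Halving the predictor doubles its coefficient, which is the same reasoning behind reporting age per 10 years with I(age/10).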
> mytab <- modelsum(bmi ~ sex + alk.phos + age, data=mockstudy)
> mytab2 <- mytab[c('age','sex','alk.phos')]
> summary(mytab2)
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
|---|---|---|---|---|---|
| (Intercept) | 26.4 | 0.752 | <0.001 | 0 | 0 | 
| Age, yrs | 0.013 | 0.012 | 0.290 | . | . | 
| (Intercept) | 27.5 | 0.181 | <0.001 | 0.004 | 0 | 
| sex Female | -0.731 | 0.29 | 0.012 | . | . | 
| (Intercept) | 27.9 | 0.253 | <0.001 | 0.011 | 261 | 
| alk.phos | -0.005 | 0.001 | <0.001 | . | . | 
> summary(mytab[c('age','sex')])
| | estimate | std.error | p.value | adj.r.squared |
|---|---|---|---|---|
| (Intercept) | 26.4 | 0.752 | <0.001 | 0 | 
| Age, yrs | 0.013 | 0.012 | 0.290 | . | 
| (Intercept) | 27.5 | 0.181 | <0.001 | 0.004 | 
| sex Female | -0.731 | 0.29 | 0.012 | . | 
> summary(mytab[c(3,1)])
| | estimate | std.error | p.value | adj.r.squared |
|---|---|---|---|---|
| (Intercept) | 26.4 | 0.752 | <0.001 | 0 | 
| Age, yrs | 0.013 | 0.012 | 0.290 | . | 
| (Intercept) | 27.5 | 0.181 | <0.001 | 0.004 | 
| sex Female | -0.731 | 0.29 | 0.012 | . | 
Merge two modelsum objects together
It is possible to combine two modelsum objects so that they print out together; however, you need to pay attention to the columns that are being displayed. It is easier to combine two models of the same family (such as two sets of linear models). If you want to combine linear and logistic model results then you would want to display the beta coefficients for the logistic model.
> ## demographics
> tab1 <- modelsum(bmi ~ sex + age, data=mockstudy)
> ## lab data
> tab2 <- modelsum(mdquality.s ~ hgb + alk.phos, data=mockstudy, family=binomial)
>                 
> tab12 <- merge(tab1,tab2)
> class(tab12)
[1] "modelsumList"
> 
> ##ERROR: The merge works, but not the summary
> #summary(tab12)
When creating a PDF, the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table.
> t1 <- modelsum(bmi ~ sex + age, data=mockstudy)
> summary(t1, title='Demographics')
| | estimate | std.error | p.value | adj.r.squared |
|---|---|---|---|---|
| (Intercept) | 27.5 | 0.181 | <0.001 | 0.004 | 
| sex Female | -0.731 | 0.29 | 0.012 | . | 
| (Intercept) | 26.4 | 0.752 | <0.001 | 0 | 
| Age, yrs | 0.013 | 0.012 | 0.290 | . | 
Depending on the report you are writing you have the following options:
Use all values available for each variable
Use only those subjects who have measurements available for all the variables
> ## look at how many missing values there are for each variable
> apply(is.na(mockstudy),2,sum)
       case         age         arm         sex        race     fu.time 
          0           0           0           0           7           0 
    fu.stat          ps         hgb         bmi    alk.phos         ast 
          0         266         266          33         266         266 
mdquality.s     age.ord 
        252           0 
> ## Show how many subjects have each variable (non-missing)
> summary(modelsum(bmi ~ ast + age, data=mockstudy,
+                 control=modelsum.control(gaussian.stats=c("N","estimate"))))
| | estimate | N |
|---|---|---|
| (Intercept) | 27.3 | 1205 | 
| ast | -0.005 | . | 
| (Intercept) | 26.4 | 1466 | 
| Age, yrs | 0.013 | . | 
> 
> ## Always list the number of missing values
> summary(modelsum(bmi ~ ast + age, data=mockstudy,
+                 control=modelsum.control(gaussian.stats=c("Nmiss2","estimate"))))
| | estimate | Nmiss2 |
|---|---|---|
| (Intercept) | 27.3 | 261 | 
| ast | -0.005 | . | 
| (Intercept) | 26.4 | 0 | 
| Age, yrs | 0.013 | . | 
> 
> ## Only show the missing values if there are some (default)
> summary(modelsum(bmi ~ ast + age, data=mockstudy, 
+                 control=modelsum.control(gaussian.stats=c("Nmiss","estimate"))))
| | estimate | Nmiss |
|---|---|---|
| (Intercept) | 27.3 | 261 | 
| ast | -0.005 | . | 
| (Intercept) | 26.4 | 0 | 
| Age, yrs | 0.013 | . | 
> 
> ## Don't show N at all
> summary(modelsum(bmi ~ ast + age, data=mockstudy, 
+                 control=modelsum.control(gaussian.stats=c("estimate"))))
| | estimate |
|---|---|
| (Intercept) | 27.3 | 
| ast | -0.005 | 
| (Intercept) | 26.4 | 
| Age, yrs | 0.013 | 
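The second option listed at the start of this section, restricting the analysis to subjects with measurements on every variable, is not demonstrated in the examples. One way to approach it in base R is to filter with complete.cases() before fitting. This is a minimal sketch on a made-up data frame; `toy` and its values are hypothetical.

```r
# Toy data with a few missing values (hypothetical)
toy <- data.frame(
  bmi = c(24.1, NA, 27.5, 31.0, 22.8),
  ast = c(30, 41, NA, 28, 35),
  age = c(54, 61, 47, 70, 66)
)

# Missing values per column (same idea as apply(is.na(x), 2, sum))
n_missing <- colSums(is.na(toy))
print(n_missing)

# Keep only rows with no missing value in any of the listed columns
toy_complete <- toy[complete.cases(toy[, c("bmi", "ast", "age")]), ]
print(nrow(toy_complete))
```

Supplying the filtered data frame as the data argument would then give every model the same set of subjects.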
Within the modelsum.control function there are four options for controlling the number of digits shown.
digits: controls the number of significant digits (counting both before and after the decimal point) for continuous variables
nsmall: controls the number of digits after the decimal point for the beta and standard error
nsmall.ratio: controls the number of digits for the ratio statistics (OR, HR, RR), default=2
digits.test: controls the number of digits after the decimal point for p-values (default=3)
> summary(modelsum(bmi ~ sex + age + fu.time, data=mockstudy), digits=4, digits.test=2)
| | estimate | std.error | p.value | adj.r.squared |
|---|---|---|---|---|
| (Intercept) | 27.49 | 0.1813 | <0.01 | 0.0036 | 
| sex Female | -0.7311 | 0.2903 | 0.01 | . | 
| (Intercept) | 26.42 | 0.7521 | <0.01 | 1e-04 | 
| Age, yrs | 0.013 | 0.0123 | 0.29 | . | 
| (Intercept) | 26.49 | 0.2447 | <0.01 | 0.0079 | 
| fu.time | 0.0011 | 3e-04 | <0.01 | . | 
It is important to understand how R treats the digits argument. Here are some summaries for the value pi. Note that with digits=4, the number of digits shown after the decimal point changes when pi is multiplied by 10 or 100, because digits counts significant digits. The nsmall option, by contrast, sets the minimum number of digits after the decimal point. The two can be used together (see the help file for format for more details on how that works).
> format(pi, digits=1)
[1] "3"
> format(pi, digits=3)
[1] "3.14"
> format(pi, digits=4)
[1] "3.142"
> format(pi*10, digits=4)
[1] "31.42"
> format(pi*100, digits=4)
[1] "314.2"
> format(pi*100, nsmall=4)
[1] "314.1593"
> format(pi*100, nsmall=2, digits=4)
[1] "314.16"
Occasionally it is of interest to fit models using case weights. The modelsum function allows you to pass on the weights to the models and it will do the appropriate fit.
> mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE)
> 
> ## create weights based on agegp and sex distribution
> tab1 <- with(mockstudy,table(agegp, sex))
> tab1
         sex
agegp     Male Female
  [18,50)  152    110
  [50,60)  258    178
  [60,70)  295    173
  [70,90)  211    122
> tab2 <- with(mockstudy, table(agegp, sex, arm))
> gpwts <- rep(tab1, length(unique(mockstudy$arm)))/tab2
> 
> ## apply weights to subjects
> index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(arm)))) 
> mockstudy$wts <- gpwts[index]
> 
> ## show weights by treatment arm group
> tapply(mockstudy$wts,mockstudy$arm, summary)
$`A: IFL`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.923   3.225   3.548   3.502   3.844   4.045 
$`F: FOLFOX`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.033   2.070   2.201   2.169   2.263   2.303 
$`G: IROX`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.667   3.734   4.023   3.945   4.031   4.471 
> mockstudy$newvarA <- as.numeric(mockstudy$arm=='A: IFL')
> tab1 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), 
+                  family=binomial)
> summary(tab1, title='No Case Weights used')
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
|---|---|---|---|---|---|---|
| (Intercept) | NA | NA | NA | <0.001 | 0.55 | 210 | 
| ast | 1 | 0.998 | 1.01 | 0.258 | . | . | 
| (Intercept) | NA | NA | NA | 0.091 | 0.5 | 29 | 
| bmi | 1 | 0.98 | 1.03 | 0.808 | . | . | 
| (Intercept) | NA | NA | NA | 0.990 | 0.514 | 210 | 
| hgb | 0.965 | 0.894 | 1.04 | 0.372 | . | . | 
> 
> suppressWarnings({
+ tab2 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), 
+                  weights=wts, family=binomial)
+ summary(tab2, title='Case Weights used')
+ })
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
|---|---|---|---|---|---|---|
| (Intercept) | NA | NA | NA | 0.504 | 0.55 | 210 | 
| ast | 1 | 1 | 1.01 | 0.068 | . | . | 
| (Intercept) | NA | NA | NA | 0.820 | 0.5 | 29 | 
| bmi | 1 | 0.988 | 1.02 | 0.780 | . | . | 
| (Intercept) | NA | NA | NA | 0.039 | 0.514 | 210 | 
| hgb | 0.956 | 0.913 | 1 | 0.058 | . | . | 
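The step that assigns weights to subjects (gpwts[index]) relies on indexing an array with a matrix: each row of the index matrix names one cell, so the result has one value per row. A toy base R illustration follows; the array and index values are made up for the purpose.

```r
# A 2 x 2 x 2 array filled column-major with 1..8
arr <- array(1:8, dim = c(2, 2, 2))

# Each row of idx is one (row, column, slice) coordinate
idx <- rbind(c(1, 2, 2),
             c(2, 1, 1))

# Matrix indexing returns one element per row of idx
picked <- arr[idx]
print(picked)
```

In the weights example the three coordinates are the subject's age group, sex, and treatment arm, so each subject receives the weight of the cell they fall in.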
Use modelsum within an Sweave document
For those users who wish to create tables within an Sweave document, the following code seems to work.
\documentclass{article}
\usepackage{longtable}
\usepackage{pdfpages}
\begin{document}
\section{Read in Data}
<<echo=TRUE>>=
require(arsenal)
require(knitr)
require(rmarkdown)
data(mockstudy)
tab1 <- modelsum(bmi~sex+age, data=mockstudy)
@
\section{Convert Summary.modelsum to LaTeX}
<<echo=TRUE, results='hide', message=FALSE>>=
capture.output(summary(tab1), file="Test.md")
## Convert R Markdown Table to LaTeX
render("Test.md", pdf_document(keep_tex=TRUE))
@ 
\includepdf{Test.pdf}
\end{document}
Export modelsum results to a .CSV file
When looking at multiple variables it is sometimes useful to export the results to a CSV file. The as.data.frame function creates a data frame object that can be exported or further manipulated within R.
> summary(tab2, text=T)
-----------------------------------------------------------------------------------------------------------
                   OR             CI.lower.OR    CI.upper.OR    p.value        concordance    Nmiss        
----------------- -------------- -------------- -------------- -------------- -------------- --------------
(Intercept)       NA             NA             NA             0.504          0.55           210           
ast               1              1              1.01           0.068          .              .             
(Intercept)       NA             NA             NA             0.820          0.5            29            
bmi               1              0.988          1.02           0.780          .              .             
(Intercept)       NA             NA             NA             0.039          0.514          210           
hgb               0.956          0.913          1              0.058          .              .             
-----------------------------------------------------------------------------------------------------------
> tmp <- as.data.frame(tab2)
> tmp
         term model endpoint    OR CI.lower.OR CI.upper.OR p.value
1 (Intercept)     1  newvarA    NA          NA          NA   0.504
2         ast     1  newvarA 1.000       1.000        1.01   0.068
3 (Intercept)     2  newvarA    NA          NA          NA   0.820
4         bmi     2  newvarA 1.000       0.988        1.02   0.780
5 (Intercept)     3  newvarA    NA          NA          NA   0.039
6         hgb     3  newvarA 0.956       0.913        1.00   0.058
  concordance Nmiss
1       0.550   210
2       0.550   210
3       0.500    29
4       0.500    29
5       0.514   210
6       0.514   210
> # write.csv(tmp, '/my/path/here/mymodel.csv')
Write modelsum object to a separate Word or HTML file
> ## write to an HTML document
> # write2html(tab2, "~/ibm/trash.html")
> 
> ## write to a Word document
> # write2word(tab2, "~/ibm/trash.doc", title="My table in Word")
The available summary statistics, by variable type, are:
binomial, quasibinomial: Logistic regression models
Default: OR, CI.lower.OR, CI.upper.OR, p.value, concordance, Nmiss
Optional: estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, endpoint, std.error, statistic, logLik, AIC, BIC, null.deviance, deviance, df.residual, df.null
gaussian: Linear regression models
Default: estimate, std.error, p.value, adj.r.squared, Nmiss
Optional: CI.lower.estimate, CI.upper.estimate, N, Nmiss2, statistic, standard.estimate, endpoint, r.squared, AIC, BIC, logLik, statistic.F, p.value.F
poisson, quasipoisson: Poisson regression models
Default: RR, CI.lower.RR, CI.upper.RR, p.value, concordance, Nmiss
Optional: CI.lower.estimate, CI.upper.estimate, CI.RR, Nmiss2, se, estimate, z.stat, endpoint, AIC, BIC, logLik, dispersion, null.deviance, deviance, df.residual, df.null
survival: Cox models
Default: HR, CI.lower.HR, CI.upper.HR, p.value, concordance, Nmiss
Optional: CI.lower.estimate, CI.upper.estimate, N, Nmiss2, estimate, se, endpoint, Nevents, z.stat, r.squared, logLik, AIC, BIC, statistic.sc, p.value.sc, p.value.log, p.value.wald, std.error.concordance
The statistics that can be shown for models are described below:
N: a count of the number of observations used in the analysis
Nmiss: only show the count of the number of missing values if there are some missing values
Nmiss2: always show a count of the number of missing values for a model
endpoint: dependent variable used in the model
std.error: print the standard error
statistic: test statistic
p.value: print the p-value
r.squared: print the model R-squared
adj.r.squared: print the model adjusted R-squared
concordance: print the model C statistic (which is the AUC for logistic models)
logLik: print the log-likelihood value
p.value.log: print the p-value for the overall model likelihood test
p.value.wald: print the p-value for the overall model Wald test
p.value.sc: print the p-value for the overall model score test
AIC: print the Akaike information criterion
BIC: print the Bayesian information criterion
null.deviance: null deviance
deviance: model deviance
df.residual: degrees of freedom for the residual
df.null: degrees of freedom for the null model
dispersion: used in Poisson models; defined as deviance/df.residual
statistic.sc: overall model score statistic
std.error.concordance: standard error for the C statistic
HR: print the hazard ratio (for survival models), i.e. exp(beta)
CI.lower.HR, CI.upper.HR: print the confidence interval for the HR
OR: print the odds ratio (for logistic models), i.e. exp(beta)
CI.lower.OR, CI.upper.OR: print the confidence interval for the OR
RR: print the risk ratio (for Poisson models), i.e. exp(beta)
CI.lower.RR, CI.upper.RR: print the confidence interval for the RR
estimate: print the beta coefficient
standardized.estimate: print the standardized beta coefficient
CI.lower.estimate, CI.upper.estimate: print the confidence interval for the beta coefficient
modelsum.control settings
A quick way to see which arguments a function accepts is to use the args() command. Settings involving the number of digits can be set in modelsum.control or in summary.modelsum.
> args(modelsum.control)
function (digits = 3, nsmall = NULL, nsmall.ratio = 2, digits.test = 3, 
    show.adjust = TRUE, show.intercept = TRUE, conf.level = 0.95, 
    binomial.stats = c("OR", "CI.lower.OR", "CI.upper.OR", "p.value", 
        "concordance", "Nmiss"), gaussian.stats = c("estimate", 
        "std.error", "p.value", "adj.r.squared", "Nmiss"), poisson.stats = c("RR", 
        "CI.lower.RR", "CI.upper.RR", "p.value", "concordance", 
        "Nmiss"), survival.stats = c("HR", "CI.lower.HR", "CI.upper.HR", 
        "p.value", "concordance", "Nmiss"), ...) 
NULL
Settings:
summary.modelsum settings
The summary.modelsum function has options that modify how the table appears (such as adding a title or modifying labels).
> args(arsenal:::summary.modelsum)
function (object, title = NULL, labelTranslations = NULL, digits = NA, 
    nsmall = NA, nsmall.ratio = NA, digits.test = NA, show.intercept = NA, 
    show.adjust = NA, text = FALSE, removeBlanks = text, labelSize = 1.2, 
    pfootnote = TRUE, ...) 
NULL
Settings: