This package implements the concept of a grammar of tables. It takes a simple formula expression and a data frame and creates a rich summary table in a variety of formats. It is designed for extensibility at each step of the process, so that one is not limited by the author's choice of table statistics or output format. The grammar, however, is an integral part of the package and as such is not modifiable.
Here’s an example similar to summaryM from Hmisc to get us started:
tangram("drug ~ bili + albumin + stage::Categorical + protime + sex + age + spiders", pbc)
=====================================================================================================================
                                    N   D-penicillamine       placebo        not randomized       Test Statistic     
                                              154               158               106                                
---------------------------------------------------------------------------------------------------------------------
Serum Bilirubin (mg/dl)            418  0.70 *1.30* 3.60  0.80 *1.40* 3.22  0.70 *1.40* 3.12  F_{2,415}=0.03, P=0.972
Albumin (gm/dl)                    418  3.34 *3.54* 3.78  3.21 *3.56* 3.83  3.12 *3.47* 3.73  F_{2,415}=2.13, P=0.120
Histologic Stage, Ludwig Criteria  412                                                          X^2_6=5.33, P=0.502  
   1                                     0.026    4/154    0.076   12/158    0.050    5/100                          
   2                                     0.208   32/154    0.222   35/158    0.250   25/100                          
   3                                     0.416   64/154    0.354   56/158    0.350   35/100                          
   4                                     0.351   54/154    0.348   55/158    0.350   35/100                          
Prothrombin Time (sec.)            416  10.0 *10.6* 11.4  10.0 *10.6* 11.0  10.1 *10.6* 11.0  F_{2,413}=0.23, P=0.795
sex : female                       418   0.903  139/154    0.867  137/158    0.925   98/106     X^2_2=2.38, P=0.304  
Age                                418  41.4 *48.1* 55.8  42.9 *51.9* 59.0  46.0 *53.0* 61.1  F_{2,415}=6.10, P=0.002
spiders : present                  312   0.292   45/154    0.285   45/158                       X^2_1=0.02, P=0.885  
=====================================================================================================================

Or the same directly into an Rmarkdown pipe_table:
#rmd(tangram("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders", pbc))

Notice that stage is not stored as a factor in the data frame; the type specifier in the formula (stage::Categorical) is what makes it be treated as Categorical. No preconversion is applied to the data frame, nor is the type guessed from the number of unique values. The formula specification provides full, direct control of typing.
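
To make the term syntax used in these formulas concrete, here is a minimal base R sketch that splits a single term such as `stage::Categorical[1]` into a variable name, an optional declared type, and an optional format. The helper `parse_term` is hypothetical and is not part of tangram; the package's real grammar is richer than this one regular expression.

```r
# Hypothetical helper, for illustration only (not tangram's parser):
# split "name::Type[format]" into its three parts.
parse_term <- function(term) {
  pattern <- "^([A-Za-z.][A-Za-z0-9._]*)(::([A-Za-z]+))?(\\[([^]]*)\\])?$"
  m <- regmatches(term, regexec(pattern, term))[[1]]
  if (length(m) == 0) stop("unrecognised term: ", term)
  list(
    name   = m[2],
    # Optional groups that did not match come back as empty strings
    type   = if (nzchar(m[4])) m[4] else NA_character_,
    format = if (nzchar(m[6])) m[6] else NA_character_
  )
}

str(parse_term("stage::Categorical[1]"))
str(parse_term("bili[2]"))
str(parse_term("albumin"))
```

A term without `::Type` leaves the type undeclared (`NA` in this sketch), which is the case where a default typing rule would have to decide.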
It also supports HTML5 output, including styled fragments:
html5(tangram("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders", pbc, msd=TRUE, quant=seq(0, 1, 0.25)),
      fragment=TRUE, inline="hmisc.css", caption = "HTML5 Table Hmisc Style", id="tbl2")

|  | N | D-penicillamine | placebo | not randomized | Test Statistic |
|---|---|---|---|---|---|
|  |  | 154 | 158 | 106 |  |
| Serum Bilirubin (mg/dl) | 418 | 0.30 0.70 1.30 3.60 28.00 3.65±5.28 | 0.30 0.80 1.40 3.22 20.00 2.87±3.63 | 0.40 0.70 1.40 3.12 18.00 3.12±4.04 | F_{2,415} = 0.03, P = 0.972¹ |
| Albumin (gm/dl) | 418 | 1.96 3.34 3.54 3.78 4.38 3.52±0.40 | 2.10 3.21 3.56 3.83 4.64 3.52±0.44 | 2.31 3.12 3.47 3.73 4.52 3.43±0.43 | F_{2,415} = 2.13, P = 0.120¹ |
| Histologic Stage, Ludwig Criteria | 412 |  |  |  | χ²_6 = 5.33, P = 0.502² |
| 1 |  | 0.026 2.597 4/154 | 0.076 7.595 12/158 | 0.050 5.000 5/100 |  |
| 2 |  | 0.208 20.779 32/154 | 0.222 22.152 35/158 | 0.250 25.000 25/100 |  |
| 3 |  | 0.416 41.558 64/154 | 0.354 35.443 56/158 | 0.350 35.000 35/100 |  |
| 4 |  | 0.351 35.065 54/154 | 0.348 34.810 55/158 | 0.350 35.000 35/100 |  |
| Prothrombin Time (sec.) | 416 | 9.2 10.0 10.6 11.4 17.1 10.8±1.1 | 9.0 10.0 10.6 11.0 14.1 10.7±0.9 | 9.0 10.1 10.6 11.0 18.0 10.8±1.1 | F_{2,413} = 0.23, P = 0.795¹ |
| sex : female | 418 | 0.903 90.260 139/154 | 0.867 86.709 137/158 | 0.925 92.453 98/106 | χ²_2 = 2.38, P = 0.304² |
| Age | 418 | 30.6 41.4 48.1 55.8 74.5 48.6±10.0 | 26.3 42.9 51.9 59.0 78.4 51.4±11.0 | 33.0 46.0 53.0 61.1 75.0 52.9±9.8 | F_{2,415} = 6.10, P = 0.002¹ |
| spiders : present | 312 | 0.292 29.221 45/154 | 0.285 28.481 45/158 |  | χ²_1 = 0.02, P = 0.885² |

Fragments can have localized style sheets, scoped by the given id.
html5(tangram("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders", pbc),
      fragment=TRUE, inline="nejm.css", caption = "HTML5 Table NEJM Style", id="tbl3")

|  | N | D-penicillamine | placebo | not randomized | Test Statistic |
|---|---|---|---|---|---|
|  |  | 154 | 158 | 106 |  |
| Serum Bilirubin (mg/dl) | 418 | 0.70 1.30 3.60 | 0.80 1.40 3.22 | 0.70 1.40 3.12 | F_{2,415} = 0.03, P = 0.972¹ |
| Albumin (gm/dl) | 418 | 3.34 3.54 3.78 | 3.21 3.56 3.83 | 3.12 3.47 3.73 | F_{2,415} = 2.13, P = 0.120¹ |
| Histologic Stage, Ludwig Criteria | 412 |  |  |  | χ²_6 = 5.33, P = 0.502² |
| 1 |  | 0.026 2.597 4/154 | 0.076 7.595 12/158 | 0.050 5.000 5/100 |  |
| 2 |  | 0.208 20.779 32/154 | 0.222 22.152 35/158 | 0.250 25.000 25/100 |  |
| 3 |  | 0.416 41.558 64/154 | 0.354 35.443 56/158 | 0.350 35.000 35/100 |  |
| 4 |  | 0.351 35.065 54/154 | 0.348 34.810 55/158 | 0.350 35.000 35/100 |  |
| Prothrombin Time (sec.) | 416 | 10.0 10.6 11.4 | 10.0 10.6 11.0 | 10.1 10.6 11.0 | F_{2,413} = 0.23, P = 0.795¹ |
| sex : female | 418 | 0.903 90.260 139/154 | 0.867 86.709 137/158 | 0.925 92.453 98/106 | χ²_2 = 2.38, P = 0.304² |
| Age | 418 | 41.4 48.1 55.8 | 42.9 51.9 59.0 | 46.0 53.0 61.1 | F_{2,415} = 6.10, P = 0.002¹ |
| spiders : present | 312 | 0.292 29.221 45/154 | 0.285 28.481 45/158 |  | χ²_1 = 0.02, P = 0.885² |

The same approach works with the Lancet style sheet. Here the formula also sets per-variable formats, and pformat = 5 changes how the p-values are printed.
tbl <- tangram("drug ~ bili[2] + albumin + stage::Categorical[1] + protime + sex[1] + age + spiders[1]", 
              data=pbc,
              pformat = 5)
html5(tbl,
      fragment=TRUE,
      inline="lancet.css",
      caption = "HTML5 Table Lancet Style", id="tbl4"
      )

|  | N | D-penicillamine | placebo | not randomized | Test Statistic |
|---|---|---|---|---|---|
|  |  | 154 | 158 | 106 |  |
| Serum Bilirubin (mg/dl) | 418 | 0.70 1.30 3.60 | 0.80 1.40 3.22 | 0.70 1.40 3.12 | F_{2,415} = 0.03, P = 0.97248¹ |
| Albumin (gm/dl) | 418 | 3.34 3.54 3.78 | 3.21 3.56 3.83 | 3.12 3.47 3.73 | F_{2,415} = 2.13, P = 0.11996¹ |
| Histologic Stage, Ludwig Criteria | 412 |  |  |  | χ²_6 = 5.33, P = 0.50235² |
| 1 |  | 0.0 2.6 4/154 | 0.1 7.6 12/158 | 0.1 5.0 5/100 |  |
| 2 |  | 0.2 20.8 32/154 | 0.2 22.2 35/158 | 0.2 25.0 25/100 |  |
| 3 |  | 0.4 41.6 64/154 | 0.4 35.4 56/158 | 0.3 35.0 35/100 |  |
| 4 |  | 0.4 35.1 54/154 | 0.3 34.8 55/158 | 0.3 35.0 35/100 |  |
| Prothrombin Time (sec.) | 416 | 10.0 10.6 11.4 | 10.0 10.6 11.0 | 10.1 10.6 11.0 | F_{2,413} = 0.23, P = 0.79472¹ |
| sex : female | 418 | 0.9 90.3 139/154 | 0.9 86.7 137/158 | 0.9 92.5 98/106 | χ²_2 = 2.38, P = 0.30387² |
| Age | 418 | 41.4 48.1 55.8 | 42.9 51.9 59.0 | 46.0 53.0 61.1 | F_{2,415} = 6.10, P = 0.00245¹ |
| spiders : present | 312 | 0.3 29.2 45/154 | 0.3 28.5 45/158 |  | χ²_1 = 0.02, P = 0.88534² |

It can also produce an index of a table's contents for traceability.
index(tangram("drug ~ bili + albumin + stage::Categorical + protime + sex + age + spiders", pbc))[1:20,]
      key    src                                               value  
 [1,] "NTM3" "tangram:bili:drug[D-penicillamine]:N"            "154"  
 [2,] "OTRl" "tangram:bili:drug[placebo]:N"                    "158"  
 [3,] "ZjNi" "tangram:bili:drug[not randomized]:N"             "106"  
 [4,] "MGNk" "tangram:bili:drug:cell_n1"                       "418"  
 [5,] "MzAx" "tangram:bili:drug[D-penicillamine]:cell_iqr1"    "0.70" 
 [6,] "NzM5" "tangram:bili:drug[D-penicillamine]:cell_iqr2"    "1.30" 
 [7,] "YWE4" "tangram:bili:drug[D-penicillamine]:cell_iqr3"    "3.60" 
 [8,] "M2Yw" "tangram:bili:drug[placebo]:cell_iqr1"            "0.80" 
 [9,] "OGQ4" "tangram:bili:drug[placebo]:cell_iqr2"            "1.40" 
[10,] "Mjg1" "tangram:bili:drug[placebo]:cell_iqr3"            "3.22" 
[11,] "MTAw" "tangram:bili:drug[not randomized]:cell_iqr1"     "0.70" 
[12,] "NTdl" "tangram:bili:drug[not randomized]:cell_iqr2"     "1.40" 
[13,] "OGZi" "tangram:bili:drug[not randomized]:cell_iqr3"     "3.12" 
[14,] "OTU5" "tangram:bili:drug:F"                             "0.03" 
[15,] "NzFm" "tangram:bili:drug:df1"                           "2"    
[16,] "ZjRl" "tangram:bili:drug:df2"                           "415"  
[17,] "MjIz" "tangram:bili:drug:P"                             "0.972"
[18,] "MTY2" "tangram:albumin:drug:cell_n1"                    "418"  
[19,] "Yzlm" "tangram:albumin:drug[D-penicillamine]:cell_iqr1" "3.34" 
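[20,] "OGFj" "tangram:albumin:drug[D-penicillamine]:cell_iqr2" "3.54" 

x <- round(rnorm(375, 79, 10))
y <- round(rnorm(375, 80,  9))
# rbinom() returns 0/1 values; used as an index, zeros are dropped,
# so at most y[1] becomes NA (hence N = 374 for y in the table)
y[rbinom(375, 1, prob=0.05)] <- NA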
attr(x, "label") <- "Global score, 3m"
attr(y, "label") <- "Global score, 12m"
html5(tangram(1 ~ x+y,
                    data.frame(x=x, y=y),
                    after=hmisc_intercept_cleanup),
      fragment=TRUE, inline="lancet.css", caption="", id="tbl5")

|  | N | All |
|---|---|---|
| Global score, 3m | 375 | 73 80 87 |
| Global score, 12m | 374 | 73 79 85 |

The Hmisc default style recognizes 3 types: Categorical, Binomial, and Numerical. For each pairing of a row type with a column type, a function is provided to generate the corresponding rows and columns. As mentioned before, the user can declare any type in a formula and is not limited to the Hmisc defaults. This is completely customizable, as covered later.
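
One way to picture the type pairing is as a lookup table with one entry per (row type, column type) combination. The sketch below is purely illustrative and is not how tangram stores its defaults: `dispatch` and `lookup_summary` are hypothetical names, and only the Categorical × Categorical entry, `summarize_chisq`, is taken from this document. The other eight entries are placeholders.

```r
# Hypothetical illustration: 3 row types x 3 column types = 9 pairings.
types <- c("Categorical", "Binomial", "Numerical")

dispatch <- matrix("placeholder_summary", nrow = 3, ncol = 3,
                   dimnames = list(row = types, column = types))

# The only pairing named in the text
dispatch["Categorical", "Categorical"] <- "summarize_chisq"

# Look up the summary function name for a given pairing
lookup_summary <- function(row_type, col_type) {
  if (!(row_type %in% types) || !(col_type %in% types))
    stop("no default for ", row_type, " x ", col_type)
  dispatch[row_type, col_type]
}

lookup_summary("Categorical", "Categorical")
```

Declaring a new type in a formula amounts to adding a row or column to such a table, which is why a user-declared type needs a matching summary function.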
Let’s cover the phases of table generation.
For example, drug ~ stage::Categorical is a Categorical\(\times\)Categorical pairing, which references summarize_chisq for compiling. One can easily specify different compilers for a formula and get very different results. Note: the multiplication operator * cannot be applied in the previous phase, because doing so requires knowing what multiplication means semantically. In one context it might be an interaction, in another simple multiplication. Handling multiplicative terms can be tricky. Once compiling is finished, the result is a table object composed of cells (a list of lists), each of which is one of a variety of S3 types.

A simple example of using an intercept in a formula, with some post-processing to remove undesired columns:
d1 <- iris
d1$A <- d1$Sepal.Length > 5.1
attr(d1$A,"label") <- "Sepal Length > 5.1"
tbl1 <- tangram(
 Species + 1 ~ A + Sepal.Width,
 data = d1,
 after = list(drop_statistics, function(tbl) del_col(tbl, 6))
 )
html5(tbl1,
     fragment=TRUE, inline="nejm.css", caption = "Example All Summary", id="tbl1")

|  | N | setosa | versicolor | virginica | All |
|---|---|---|---|---|---|
|  |  | 50 | 50 | 50 | 150 |
| Sepal Length > 5.1 : TRUE | 150 | 0.280 28.000 14/50 | 0.920 92.000 46/50 | 0.980 98.000 49/50 | 0.727 72.667 109/150 |
| Sepal.Width | 150 | 3.19 3.40 3.70 | 2.50 2.80 3.00 | 2.80 3.00 3.20 | 2.80 3.00 3.31 |

The library is designed to be extensible, in the hope that more useful summary functions can be written and their results rendered in a wide variety of formats. This is done by translator functions, which, given a row and column from a formula, process the data into a table.
This example shows how to create a function that, given a row and column, constructs summary entries for a table.
### Make up some data, which has events nested within an id
n  <- 1000
df <- data.frame(id = sample(1:250, n*3, replace=TRUE), event = as.factor(rep(c("A", "B","C"), n)))
attr(df$id, "label") <- "ID"
### Now create custom function for counting events with a category
summarize_count <- function(table, row, column)
{
  ### Getting Data for row column ast nodes, assuming no factors
  datar <- row$data
  datac <- column$data
  ### Grabbing categories
  col_categories <- levels(datac)
  n_labels <- lapply(col_categories, FUN=function(cat_name){
    x <- datar[datac == cat_name]
    cell_n(length(unique(x)), subcol=cat_name)
  })
  # Test a poisson model
  test <- aov(glm(x ~ treatment,
                  aggregate(datar, by=list(id=datar, treatment=datac), FUN=length),
                  family=poisson))
  # Build the table
  table                                              %>%
  # Create Headers
  row_header(derive_label(row))                      %>%
  col_header("N", col_categories, "Test Statistic")  %>%
  col_header("",  n_labels,       ""              )  %>%
  # Add the First column of summary data as an N value
  add_col(cell_n(length(unique(datar))))             %>%
  # Now add quantiles for the counts
  table_builder_apply(col_categories, FUN=
    function(tbl, cat_name) {
      # Compute each data set
      x  <- datar[datac == cat_name]
      xx <- aggregate(x, by=list(x), FUN=length)$x
      # Add a column that is a quantile
      add_col(tbl, cell_iqr(xx, row$format, na.rm=TRUE))
  })                                                 %>%
  # Now add a statistical test for the final column
  add_col(test)
}
tangram(event ~ id["%1.0f"], df, summarize_count)
=============================================================
      N       A        B        C         Test Statistic     
             244      245      246                           
-------------------------------------------------------------
ID  N=250  3 *4* 5  3 *4* 5  3 *4* 5  F_{2,732}=0.02, P=0.982
=============================================================
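
The quantity summarized in each category column is the number of events per id. As a rough check, that step can be reproduced in base R alone, without tangram. The data here are freshly simulated for one category under an arbitrary seed (an assumption for this sketch), so the numbers will not match the table above exactly.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# ids observed for a single event category (1000 events over 250 ids)
ids <- sample(1:250, 1000, replace = TRUE)

# Same idiom as in summarize_count: one row per distinct id,
# with x holding how many times that id occurred
counts <- aggregate(ids, by = list(ids), FUN = length)$x

# Number of distinct ids seen, and the quartiles of events per id
length(counts)
quantile(counts, c(0.25, 0.5, 0.75))
```

The length of `counts` corresponds to the per-category N in the header row, and the quartiles correspond to what `cell_iqr` reports for that column.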