---
title: "Pre-processing of clinical data for clinical data review report"
author: "Laure Cougnaud, Michela Pasetto"
date: "`r format(Sys.Date(), '%B %d, %Y')`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 5
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Pre-processing of clinical data for clinical data review report}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r options, echo = FALSE, message = FALSE}

library(knitr)
opts_chunk$set(
    echo = TRUE, 
    results = 'asis', 
    warning = FALSE, 
    error = FALSE, message = FALSE, cache = FALSE,
    fig.width = 8, fig.height = 7,
    fig.path = "./figures_vignette/",
    fig.align = 'center')

heightLineIn  <- 0.2
```

This vignette shows functionalities used for annotating and filtering the data
within the `clinDataReview` package.

Utility functions to automate standard pre-processing steps of the data are
available in the package.

Note that these functions are mainly useful in combination with the
specification of the parameters in 'config' file in the clinical data
reports (see the
dedicated reporting
[vignette](../doc/clinDataReview-reporting.html)).

For this vignette, we will use example data available in the `clinUtils`
package.

```{r loadPackages, message = FALSE}

library(clinDataReview)

```

# Data format

The input dataset for the clinical data review should be a `data.frame`
with clinical data. Such data is typically imported
from _SAS_ data file or _xpt_ data file.  
Such dataset can be imported for multiple files at once via the 
`clinUtils::loadDataADaMSDTM` function.

The label of the variables stored in the
`SAS` datasets is also used for title/captions. 

A few `ADaM` datasets are included in the `clinUtils` package for the
demonstration, via the dataset `dataADaMCDISCP01` and corresponding variable
labels.

```{r loadData}	

library(clinUtils)

data(dataADaMCDISCP01)
labelVars <- attr(dataADaMCDISCP01, "labelVars")

dataLB <- dataADaMCDISCP01$ADLBC
dataDM <- dataADaMCDISCP01$ADSL
dataAE <- dataADaMCDISCP01$ADAE

```

# Annotate data

The `annotateData` enables to add metadata for a specific domain/dataset.

```{r annotateData, message = TRUE, warning = TRUE}

dataLBAnnot <- annotateData(
    data = dataLB, 
    annotations = list(data = dataDM, vars = c("ETHNIC", "ARM")), 
    verbose = TRUE
)
knitr::kable(
    head(dataLBAnnot), 
    caption = paste("Laboratory parameters annotated with",
        "demographics information with the `annotatedData` function"
    )
)
```

# Filter data

The `filterData` enables to filter a dataset.

```{r filterData, message = TRUE, warning = TRUE}

dataLBAnnotTreatment <- filterData(
    data = dataLBAnnot, 
    filters = list(var = "ARM", value = "Placebo", rev = TRUE), 
    verbose = TRUE
)
knitr::kable(
    unique(dataLBAnnotTreatment[, c("USUBJID", "ARM")]), 
    caption = paste("Subset of laboratory parameters filtered",
        "with placebo patients"
    )
)
```

# Transform data

The `transformData` enables to convert data to a different format.

For example, the laboratory data is converted from a long format, containing one
record per endpoint * visit * subject to a wide format containing one record per
visit * subject. The endpoints are included in different columns.

```{r transformData, message = TRUE, warning = TRUE}

eDishData <- transformData(
    data = subset(dataLB, PARAMCD %in% c("ALT", "BILI")),
    transformations = list(
        type = "pivot_wider",
        varsID = c("USUBJID", "VISIT"), 
        varsValue = c("LBSTRESN", "LBNRIND"),
        varPivot = "PARAMCD"
    ),
    verbose = TRUE,
    labelVars = labelVars
)
knitr::kable(head(eDishData))

```

# Process data

The `processData` function executes all the pre-processing steps described in
the previous section at once.

```{r processData}

dataLBAnnotTreatment2 <- processData(
    data = dataLB,
    processing = list(
        list(annotate = list(data = dataDM, vars = c("ETHNIC", "ARM"))),
        list(filter = list(var = "ARM", value = "Placebo", rev = TRUE))
    ),
    verbose = TRUE
)

identical(dataLBAnnotTreatment, dataLBAnnotTreatment2)

```

# Appendix

## Session info

```{r sessionInfo, echo = FALSE}

print(sessionInfo())

```