---
title: "Accessing Monoclonal Antibody Data"
author:
- Ju Yeong Kim
- Jason Taylor
date: "2022-06-15"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Accessing Monoclonal Antibody Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

<style>
table {
    max-width: 100%;
    max-height: 600px;
    overflow: scroll;
}
thead{
    position: sticky;
    top: 0px;
    background-color: #fff;
}
</style>



## Workflow overview

Using the DataSpace [app](https://dataspace.cavd.org/cds/CAVD/app.view#mabgrid), the workflow of using the mAb grid is the following:

1. Navigate to the mAb Grid and browse the available mAb mixtures
2. Select the mAb mixtures that you'd like to investigate
3. Or filter rows by using columns:
    - mAb/Mixture
    - donor species
    - isotype
    - HXB2 location
    - tiers
    - clades
    - viruses
4. Click "Neutralization Curves" or "IC50 Titer Heatmap" to visualize the mAb data
5. Click "Export CSV" or "Export Excel" to download the mAb data

`DataSpaceR` offers a similar interface:

1. Browse the mAb Grid by `con$mabGridSummary`
2. Select the mAb mixtures by filtering the mAb grid using any columns found in `con$mabGrid` using `con$filterMabGrid()`
3. Use `con$getMab()` to retrieve the mAb data


## Browse the mAb Grid

You can browse the mAb Grid by calling the `mabGridSummary` field in the connection object:


```r
library(DataSpaceR)
con <- connectDS()

knitr::kable(head(con$mabGridSummary))
```



|mab_mixture    |donor_species |isotype |hxb2_location | n_viruses| n_clades| n_tiers| geometric_mean_curve_ic50| n_studies|
|:--------------|:-------------|:-------|:-------------|---------:|--------:|-------:|-------------------------:|---------:|
|10-1074        |human         |IgG     |Env           |         7|        3|       2|                 0.0213723|         1|
|10E8           |human         |IgG3    |gp160         |       227|       11|       7|                 0.4843333|         2|
|10E8 V2.0      |human         |        |              |        28|        3|       1|                 0.0031350|         1|
|10E8 V2.0/iMab |human         |        |gp160         |        13|        8|       2|                 0.0462897|         1|
|10E8 V4.0      |human         |        |              |        28|        3|       1|                 0.0024094|         1|
|10E8 V4.0/iMab |human         |        |              |       119|       12|       6|                 0.0015396|         1|

This table is designed to mimic the mAb grid found in the app.

One can also access the unsummarized data from the mAb grid by calling `con$mabGrid`.

## Filter the mAb grid

You can filter rows in the grid by specifying the values to keep in the columns found in the field `con$mabGrid`: `mab_mixture`, `donor_species`, `isotype`, `hxb2_location`, `tiers`, `clades`, `viruses`, and `studies`. `filterMabGrid` takes the column and the values and filters the underlying tables (private fields), and when you call the `mabGridSummary` or (which is actually an [active binding](https://r6.r-lib.org/articles/Introduction.html#active-bindings)), it returns the filtered grid with updated `n_` columns and `geometric_mean_curve_ic50`.


```r
# filter the grid by viruses
con$filterMabGrid(using = "virus", value = c("242-14", "Q23.17", "6535.3", "BaL.26", "DJ263.8"))

# filter the grid by donor species (llama)
con$filterMabGrid(using = "donor_species", value = "llama")

# check the updated grid
knitr::kable(con$mabGridSummary)
```



|mab_mixture |donor_species |isotype |hxb2_location | n_viruses| n_clades| n_tiers| geometric_mean_curve_ic50| n_studies|
|:-----------|:-------------|:-------|:-------------|---------:|--------:|-------:|-------------------------:|---------:|
|11F1B       |llama         |        |              |         4|        2|       1|                        NA|         1|
|11F1F       |llama         |        |              |         4|        2|       1|                26.2961178|         1|
|1H9         |llama         |        |Env           |         4|        2|       1|                 5.0898322|         1|
|2B4F        |llama         |        |              |         4|        2|       1|                 1.5242288|         1|
|2H10        |llama         |        |              |         2|        2|       1|                        NA|         1|
|2H10/W100A  |llama         |        |              |         2|        2|       1|                        NA|         1|
|3E3         |llama         |        |gp160         |         3|        3|       1|                 0.9944945|         1|
|4H73        |llama         |        |              |         4|        2|       1|                        NA|         1|
|5B10D       |llama         |        |              |         4|        2|       1|                        NA|         1|
|9B6B        |llama         |        |              |         4|        2|       1|                24.3643637|         1|
|A14         |llama         |        |gp160         |         3|        3|       1|                 1.8444582|         1|
|B21         |llama         |        |gp160         |         3|        3|       1|                 0.0936399|         1|
|B9          |llama         |        |gp160         |         3|        3|       1|                 0.0386986|         1|
|LAB5        |llama         |        |              |         4|        2|       1|                        NA|         1|

Or we can use method chaining to call multiple filter methods and browse the grid. Method chaining is unique to R6 objects and related to the pipe. See Hadley Wickham's [Advanced R](https://adv-r.hadley.nz/r6.html) for more info.


```r
con$resetMabGrid()
con$
  filterMabGrid(using = "virus", value = c("242-14", "Q23.17", "6535.3", "BaL.26", "DJ263.8"))$
  filterMabGrid(using = "donor_species", value = "llama")$
  mabGridSummary
```

## Retrieve column values from the mAb grid

You can retrieve values from the grid by `mab_mixture`, `donor_species`, `isotype`, `hxb2_location`, `tier`, `clade`, `virus`, and `studies`, or any variables found in the `mabGrid` field in the connection object via `data.table` operations.


```r
# retrieve available viruses in the filtered grid
con$mabGrid[, unique(virus)]
#> [1] "6535.3"  "Q23.17"  "DJ263.8" "BaL.26"  "242-14"

# retrieve available clades for 1H9 mAb mixture in the filtered grid
con$mabGrid[mab_mixture == "1H9", unique(clade)]
#> [1] "CRF02_AG" "B"
```


## Create a DataSpaceMab object

After filtering the grid, you can create a DataSpaceMab object that contains the filtered mAb data.


```r
mab <- con$getMab()
mab
#> <DataSpaceMab>
#>   URL: https://dataspace.cavd.org
#>   User: jmtaylor@scharp.org
#>   Summary:
#>     - 3 studies
#>     - 14 mAb mixtures
#>     - 1 neutralization tiers
#>     - 3 clades
#>   Filters:
#>     - virus: 242-14, Q23.17, 6535.3, BaL.26, DJ263.8
#>     - mab_donor_species: llama
```

There are 6 public fields available in the `DataSpaceMab` object: `studyAndMabs`, `mabs`, `nabMab`, `studies`, `assays`, and `variableDefinitions`, and they are equivalent to the sheets in the excel file or the csv files you would download from the app via "Export Excel"/"Export CSV".


```r
knitr::kable(con$mabGridSummary)
```



|mab_mixture |donor_species |isotype |hxb2_location | n_viruses| n_clades| n_tiers| geometric_mean_curve_ic50| n_studies|
|:-----------|:-------------|:-------|:-------------|---------:|--------:|-------:|-------------------------:|---------:|
|11F1B       |llama         |        |              |         4|        2|       1|                        NA|         1|
|11F1F       |llama         |        |              |         4|        2|       1|                26.2961178|         1|
|1H9         |llama         |        |Env           |         4|        2|       1|                 5.0898322|         1|
|2B4F        |llama         |        |              |         4|        2|       1|                 1.5242288|         1|
|2H10        |llama         |        |              |         2|        2|       1|                        NA|         1|
|2H10/W100A  |llama         |        |              |         2|        2|       1|                        NA|         1|
|3E3         |llama         |        |gp160         |         3|        3|       1|                 0.9944945|         1|
|4H73        |llama         |        |              |         4|        2|       1|                        NA|         1|
|5B10D       |llama         |        |              |         4|        2|       1|                        NA|         1|
|9B6B        |llama         |        |              |         4|        2|       1|                24.3643637|         1|
|A14         |llama         |        |gp160         |         3|        3|       1|                 1.8444582|         1|
|B21         |llama         |        |gp160         |         3|        3|       1|                 0.0936399|         1|
|B9          |llama         |        |gp160         |         3|        3|       1|                 0.0386986|         1|
|LAB5        |llama         |        |              |         4|        2|       1|                        NA|         1|

## View metadata concerning the mAb object

There are several metadata fields that can be exported in the mAb object.


```r
names(mab)
#>  [1] ".__enclos_env__"     "variableDefinitions" "assays"             
#>  [4] "studies"             "nabMab"              "mabs"               
#>  [7] "studyAndMabs"        "config"              "clone"              
#> [10] "getLanlMetadata"     "refresh"             "print"              
#> [13] "initialize"
```

DataSpaceR can also fetch and add metadata associated with downloaded mAbs via the `getLanlMetadata` method that is associated with the `DataSpaceMab` object.


```r
mab$getLanlMetadata()
```

The LANL metadata can now be found at the `mabs$lanl_metadata` variable. This is a list column and its structure can very depending on what data LANL has collected.
    

```r
mab$mabs[mab_name_std == "B9"]$lanl_metadata
#> [[1]]
#> [[1]]$epitopes
#>    accession alt_names      binding_type               cite   country dis
#> 1:        NA        NA <data.table[1x2]> <data.table[1x25]> <list[0]>  NA
#>    disrange donor epitope epitope_name hxb2_contig hxb2loc2end hxb2loc2start
#> 1:       NA    NA      NA           NA          NA          NA            NA
#>    hxb2locend hxb2locstart hxb2protein hxb2protein_id   id         immunogen
#> 1:         NA           NA       gp160             18 3219 <data.table[1x2]>
#>    in_catnap in_feature_db is_adcc   isotype           keyword mab_name
#> 1:      TRUE         FALSE      NA <list[0]> <data.table[4x2]>       B9
#>             modifydate neutralizing              note origlocend origlocstart
#> 1: 2018-06-01 13:59:55            L <data.table[2x4]>         NA           NA
#>    origprotein origprotein_id   patient           species strain subprotein
#> 1:          NA              1 <list[0]> <data.table[1x2]>     NA         NA
#>    subprotein_id           subtype table total_cite_count total_note_count
#> 1:             1 <data.table[2x2]>    ab                1                2
#>    vaccine_adjuvant vaccine_component    vaccine_strain      vaccine_type
#> 1:        <list[0]> <data.table[1x2]> <data.table[2x2]> <data.table[1x2]>
#> 
#> [[1]]$params
#> [[1]]$params$id
#> [1] "3219"
#> 
#> [[1]]$params$table
#> [1] "ab"
#> 
#> 
#> [[1]]$timestamp
#> [1] "2022-04-01 17:01:42z"
#> 
#> [[1]]$source
#> [1] "https://www.hiv.lanl.gov/mojo/immunology/api/v1/epitope/ab?id=3219"
```

## Session information


```r
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
#>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
#>  [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
#>  [7] LC_PAPER=en_US.utf8       LC_NAME=C                
#>  [9] LC_ADDRESS=C              LC_TELEPHONE=C           
#> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.2 DataSpaceR_0.7.5  knitr_1.37       
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8       digest_0.6.29    assertthat_0.2.1 R6_2.5.1        
#>  [5] jsonlite_1.8.0   magrittr_2.0.2   evaluate_0.15    highr_0.9       
#>  [9] httr_1.4.2       stringi_1.7.6    curl_4.3.2       tools_4.1.2     
#> [13] stringr_1.4.0    Rlabkey_2.8.3    xfun_0.29        compiler_4.1.2
```