---
title: "Phylogenetic trees from morphological data"
author:
- name: Iris Bardel-Kahr, Klaus Schliep
  affiliation: University of Graz, Graz University of Technology
  email: klaus.schliep@gmail.com
date: "`r Sys.Date()`"
bibliography: phangorn.bib
output: rmarkdown::html_vignette
vignette: |
  %\VignetteIndexEntry{Phylogenetic trees from morphological data}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, echo=FALSE}
# set global chunk options: images will be bigger
knitr::opts_chunk$set(fig.width=6, fig.height=4)
options(digits = 2)
```

In this vignette, we will show how to work with morphological data in _phangorn_ [@Schliep2011]. In most cases the different morphological characters or character states are encoded with the numbers 0:9 (or less, if there are less differences).
Morphological data can come in different formats. The most common ones are **.csv** and **.nexus**.

# Load packages

We start by loading the _phangorn_ package and setting a random seed:
```{r load packages}
library(phangorn)
set.seed(9)
```

# Load data

The dataset we're using contains morphological data for 12 mite species, with 79 encoded characters [@schaffer2010phylogenetic]. 
When reading in the _.csv_ file, `row.names = 1` uses the first column (species) as row names. To get a `phyDat` object, we have to convert the dataframe into a matrix with `as.matrix`. 
``` {r load data}
fdir <- system.file("extdata", package = "phangorn")
mm <- read.csv(file.path(fdir, "mites.csv"), row.names = 1)
mm_pd <- phyDat(as.matrix(mm), type = "USER", levels = 0:7)
```
The data can then be written into a _nexus_ file:
```{r write_nexus, eval=FALSE}
write.phyDat(mm_pd, file.path(fdir, "mites.nex"), format = "nexus")
```
Reading in a _nexus_ file is even easier than reading in a _csv_ file:
```{r, read_nexus}
mm_pd <- read.phyDat(file.path(fdir, "mites.nex"), format = "nexus", type = "STANDARD")
```
After reading in the _nexus_ file, we have the states 0:9, but the data only has the states 0:7. Here is one possibility to change the contrast matrix:
``` {r contrast_matrix}
contrast <- matrix(data = c(1,0,0,0,0,0,0,0,0,
    0,1,0,0,0,0,0,0,0,
    0,0,1,0,0,0,0,0,0,
    0,0,0,1,0,0,0,0,0,
    0,0,0,0,1,0,0,0,0,
    0,0,0,0,0,1,0,0,0,
    0,0,0,0,0,0,1,0,0,
    0,0,0,0,0,0,0,1,0,
    0,0,0,0,0,0,0,0,1,
    1,1,1,1,1,1,1,1,1),
    ncol = 9, byrow = TRUE)
dimnames(contrast) <- list(c(0:7,"-","?"),
    c(0:7, "-"))
contrast
mm_pd <- phyDat(mm_pd, type="USER", contrast=contrast)
```
Now that we have our data, we can start the analyses.

# Parsimony

For morphological data, one of the most frequently used approaches to conduct phylogenetic trees is maximum parsimony (MP). `pratchet` (as already described in _Estimating phylogenetic trees with phangorn_) implements the parsimony ratchet [@Nixon1999]. To create a starting tree, we can use the function `random.addition`:
```{r random addition}
mm_start <- random.addition(mm_pd)
```
This tree can then be given to `pratchet`:
```{r pratchet, cache=TRUE}
mm_tree <- pratchet(mm_pd, start = mm_start, minit = 1000, maxit = 10000,
                    all = TRUE, trace = 0)
mm_tree
```
With `all=TRUE` we get all (in this case 19) trees with lowest parsimony score in a `multiPhylo` object. Since we we did a minimum of 1000 iterations, we already have some edge support. Now we can assign the edge lengths.
```{r edge lengths}
mm_tree <- acctran(mm_tree, mm_pd)
```

## Branch and bound

In the case of our mites-dataset with 12 sequences, it's also possible to use the branch and bound algorithm [@Hendy1982] to find all most parsimonious trees. With bigger datasets it is definitely recommended to use `pratchet`.

``` {r bab}
mm_bab <- bab(mm_pd, trace = 0)
mm_bab
```

## Root trees

If we want our unrooted trees to be rooted, we have the possibility to use `midpoint` to perform midpoint rooting. Rooting the trees with a specific species (we chose _C. cymba_ here) can be done with the function `root` from the _ape_ package [@Paradis2018]. To save the correct node labels (edge support), it's important to set `edgelabel=TRUE`.

``` {r root trees, message=FALSE}
mm_tree_rooted <- root(mm_tree, outgroup = "C._cymba", resolve.root = TRUE,
                       edgelabel = TRUE)
```

## Plot trees

<!-- With `plotBS`, we can either plot all of the trees with their respective edge support, or we can subset to only get a certain tree.

# plot all trees
plotBS(mm_tree_rooted, digits = 2)
-->
With `plotBS` we plot a tree with their respective edge support. It is also possible to save the plots as _.pdf_ (or various other formats, e.g. svg, png, tiff) file. `digits` is an argument to determine the number of digits shown for the bootstrap values.
```{r plot_trees, eval=FALSE}
# subsetting for tree nr. 9
plotBS(mm_tree_rooted[[9]], digits = 2)

# save plot as pdf
pdf(file = "mm_rooted.pdf")
plotBS(mm_tree_rooted[[9]], digits = 2)
dev.off()
```

## Consensus tree

To look at the consensus tree of our 19 trees from `pratchet`, or of our 37 most parsimonious trees from `bab`, we can use the `consensus` function from _ape_.
```{r consensus tree}
# unrooted pratchet tree
mm_cons <- consensus(mm_tree)

# rooted pratchet tree
mm_cons_root <- consensus(mm_tree_rooted, rooted = TRUE)

# branch and bound, we root the consensus tree in the same step
mm_bab_cons <- root(consensus(mm_bab), outgroup = "C._cymba",
                    resolve.root = TRUE, edgelabel = TRUE)
```
```{r plot_cons_tree, fig.cap="Unrooted and rooted consensus trees of the mites dataset with MP.", fig.show="hold", out.width="33%"}
plot(mm_cons, main="Unrooted pratchet consensus tree")
plot(mm_cons_root, main="Rooted pratchet consensus tree")
plot(mm_bab_cons, main="Rooted bab consensus tree")
```
We can clearly see that, as expected, the two rooted trees have the same topology.

# Session info

```{r sessionInfo, echo=FALSE}
sessionInfo()
```

# References