yaml --- title: "letsRept: An Interface to the Reptile Database" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{letsRept} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This package was developed to facilitate the processes of reptile nomenclature update based on a search for species synonyms according to [The Reptile Database](https://reptile-database.reptarium.cz) website (Uetz et al., 2025). Currently, the package accesses many species information from the Reptile Database using R interface. It is useful to people trying to match databases from different sources (IUCN, species traits database, etc), or trying to get summaries from a given higher taxa or region (e.g.: Snakes from Brazil). But it can also just print single species information directly in R, optionally storing this information in a list. Any feedback, suggestion or request are welcome! ### **Download** The package is available on CRAN, check if the installed version is 1.0.0 or above. If not, the development version can be installed from GitHub. To install the stable version of this package, users must run: ```{.r} # Install CRAN version (check package version, must be 1.0.0 or above) install.packages("letsRept") #OR # Install development version from GitHub # install.packages("devtools") devtools::install_github("joao-svalencar/letsRept") library(letsRept) ``` ### **List of functions and examples** **NOTE:** Some functions use parallel processing by default. Parallel processing may affect system performance. Adjust the number of cores according to your needs. A snippet showing how to access the number of available cores in your computer is provided along with the functions examples. #### **Function `reptSearch`:** Retrieves species information from The Reptile Database using a binomial name. ```{.r} # single species: reptSearch(binomial = "Apostolepis adhara") # If user wants to check the list of references related to species, as listed in RDB: reptSearch(binomial = "Bothrops pauloensis", getRef=TRUE) ``` `reptSearch()` supports synonym-based queries. If the provided binomial does not match any currently valid species name, the function automatically passes the query to `reptAdvancedSearch(synonym = binomial)`. In such cases, if the synonym can be unambiguously resolved to a valid species, the function will return the corresponding species information. Otherwise, it provides a link (which can be accessed using `reptSpecies(url = link)`) to a list of all species that include the queried synonym in their synonymy. #### **Function `reptRefs`:** Retrieves the reference list for a given species from the Reptile Database. By default, it returns both the citation and a link to the reference source (if available). If the user sets `getLink = FALSE`, the function returns only the plain-text references. This function is useful to extract citation metadata for taxonomic or historical analyses. ```{.r} # Basic usage reptRefs("Boa constrictor") # If you want only reference texts, without links: reptRefs("Boa constrictor", getLink = FALSE) ``` #### **Function `reptAdvancedSearch`:** Creates a link for a page as derived from an Advanced Search in RD (multiple species in a page): ```{.r} # create multiple species link: link_boa <- reptAdvancedSearch(genus = "Boa", exact = TRUE) #returns a link to a list of all Boa species. link_apo <- reptAdvancedSearch(genus = "Apostolepis") #returns a link to a list of all Apostolepis species link <- reptAdvancedSearch(higher = "snakes", location = "Brazil") #returns a link to a list of all snake species in Brazil link <- reptAdvancedSearch(year = "2010 OR 2011 OR 2012") #returns a link to a list of all species described from 2010 to 2012 ``` `reptAdvancedSearch(synonym = "example")` will return the species information directly (via `reptSearch("example")`) if the synonym uniquely matches a single valid species. The `exact = TRUE` argument in the *Boa* example parses the information strictly as "Boa", avoiding to return genus like *Pseudoboa*, or *Boaedon*. ⚠️ **Note:** The argument `exact` does not work properly for searches using logical arguments (e.g. `AND/OR`). If you want to force an exact match (e.g., "*Boa*" as a phrase) with multiple terms (e.g., "*Boa* OR *Apostolepis*"), you must manually include quotes in the input string, e.g., "\"*Boa*\" OR *Apostolepis*". #### **Function `reptSpecies`:** Retrieve species data from the species link created by `reptAdvancedSearch`. User can also copy and paste the url from a RD webpage containing the list of species derived from an advanced search. It includes higher taxa information, authors, year of description, and the species url. By default this function uses parallel processing, with default set to use half of the available cores. If user wants to use just one core, there are options for backup saving based on the `checkpoint` argument. ```{.r} #check number of available cores: install.packages("parallel") parallel::detectCores() # sample multiple species data: #Returns higher taxa information and species url: boa <- reptSpecies(link_boa, taxonomicInfo = TRUE, getLink = TRUE) #link from reptAdvancedSearch(genus = "Boa") #example 2 apo <- reptSpecies(link_apo, taxonomicInfo = TRUE, getLink = TRUE) #link from reptAdvancedSearch(genus = "Apostolepis") #Returns only species url - Faster and recommended for large datasets: boa <- reptSpecies(link_boa, taxonomicInfo = FALSE, getLink = TRUE) #link from reptAdvancedSearch(genus = "Boa") #example 2 apo <- reptSpecies(link_apo, taxonomicInfo = FALSE, getLink = TRUE) #link from reptAdvancedSearch(genus = "Apostolepis") #With checkpoint and backups (only without parallel sampling: a lot slower but safer): path <- "path/to/save/backup_file.rds" #not to run just example apo <- reptSpecies(link_apo, taxonomicInfo = FALSE, getLink = TRUE, checkpoint=6, backup_file=path) #link from reptAdvancedSearch(genus = "Apostolepis") ``` ⚠️ **Note:** All console messages, warnings, and progress updates can be silenced. However, `reptSpecies()` issues a helpful warning if any species information fails to be retrieved, along with instructions on how to access those entries from the function output. #### **Function `reptStats``:** This function summarizes the taxonomic content of a species list, typically an object created with `reptSpecies()` with higher taxa information. If no object is provided it summarizes the internal dataset `allReptiles`. ```{.r} # summary of internal database allReptiles reptStats() # filter by family and return summary table (internal database) reptStats(family = "Elapidae") reptStats(family = "Elapidae", verbose = TRUE) # creating a subset from RDB link <- reptAdvancedSearch(higher="snakes", location = "Brazil") snakes_br <- reptSpecies(link, taxonomicInfo = TRUE) # summarizing subset reptStats(snakes_br) reptStats(snakes_br, family = "Viperidae", verbose = TRUE) ``` #### **Function `reptSynonyms`:** Samples species synonyms either using binomial or a data frame with species names and the species link (e.g.: the result of `reptSpecies(link, getLink=TRUE)`). By default this function uses parallel processing, with default set to use half of the available cores. ```{.r} #check number of available cores: install.packages("parallel") parallel::detectCores() # sample species synonyms boa_syn <- reptSynonyms(boa) # using data frame created, with reptSpecies(boa_link, getLink = TRUE) Bconstrictor_syn <- reptSynonyms(x = "Boa constrictor") # using species binomial #example 2 apo_syn <- reptSynonyms(apo) ``` ⚠️ **Note:** All console messages, warnings, and progress updates can be silenced. However, `reptSynonyms()` issues a helpful warning if any species information fails to be retrieved, along with instructions on how to access those entries from the original data frame. The complex `regex` pattern used to sample synonyms from The Reptile Database is quite efficient, but still samples about 0.2% in an incorrect format. Most cases represent unusual nomenclature so users might not face any problems trying to match current valid names. In any case, I fixed (potentially) all unusual synonym formats in the internal dataset `allSynonyms` (last update: 23rd May, 2025) #### **Function `reptCompare`:** Compares the nomenclature of a user provided vector of species (`x`) with the RDB nomenclature. User may provide vector of current names sampled from RDB with reptSpecies(), otherwise the names in `x` will be compared with those in the internal dataset `allReptiles` (the dataset documentation to see the RDB version (current is May, 2025). The function returns species that are either unmatched ("review") or matched with the RDB list, or both. ```{.r} my_species <- data.frame(species = c("Boa constrictor", "Pantherophis guttatus", "Fake species")) reptCompare(my_species) reptCompare(my_species, filter = "review") reptCompare(my_species, filter = "matched") ``` Species that requires nomenclature review are usually queried to `reptSync()`, while matched species is ideally queried to `reptSplitCheck` with a reference date parsed to the argument `pubDate`. #### **Function `reptSync`:** Initially inspired in function aswSync from package [AmphiNom](https://github.com/hcliedtke/AmphiNom) (Liedtke, 2018). This is the most recursive function of the package, using all the previous functions in order to provide the most likely updated nomenclature for the queried species. By default this function uses parallel processing, with default set to use half of the available cores. The function is divided in two main steps. Here is how it works: **Step 1** The function queries a vector of species (e.g.: IUCN, or a regional list), check their validity through `reptSearch` and returns a data frame with current valid species names. When `reptSearch` finds a species page it assumes that is the valid name for the queried species and returns the status "up_to_date". When `reptSearch` doesn't find a species it parses the binomial to `reptAdvancedSearch` using the synonym filter. If `reptAvancedSearch` returns a link for a species page that species name is considered valid for the synonym queried and the function returns the status `"updated"`. Otherwise, `reptAvancedSearch` will return a link for a page with a list of species, then the function assumes that the queried synonym could be assigned to any of those valid names and returns the status: `"ambiguous"`. If the queried species does not return a species page nor a page for multiple species the function returns to columns `"RDB"` and `"status"` the sentence `"not_found"`. **Step 2** Step 2 is activated only if `solveAmbiguity = TRUE`. When `reptAvancedSearch` returns a link for a page with a list of species, that link is parsed to `reptSpecies` which collects species names and `urls` and automatically parses the resulting data frame to `reptSynonyms`. Finally, with the result of `reptSynonyms` the function compares the queried species with all listed synonyms. If the queried species is actually listed as a synonym of only one of the searched species (e.g. the queried name is not a synonym, but is mentioned in the comments section), the function will return that valid name and status will be `"updated"`. If the queried species is actually a synonym of more than one valid species, then the function will return both species names and the status will still be `"ambiguous"`. ```{.r} #check number of available cores: install.packages("parallel") parallel::detectCores() # comparing synonyms: query <- c("Vieira-Alencar authoristicus", "Boa atlantica", "Boa diviniloqua", "Boa imperator", "Boa constrictor longicauda") reptSync(query) #example 2: query <- c("Vieira-Alencar authorisensis", "Apostolepis ambiniger", "Apostolepis cerradoensis", "Elapomorphus assimilis", "Apostolepis tertulianobeui", "Apostolepis goiasensis") reptSync(query) ``` The column `"RDB"` shows current valid name according to The Reptile Database. Pay special attention to the `"status"` column: **status:** - `"up_to_date"` - Species name provided is the current valid name found in The Reptile Database - `"updated"` - Species name provided is a synonym, and the current valid name is reported unambiguously. - `"ambiguous"` - Species name provided is considered a synonym of more than one current valid species, likely from a split in taxonomy. - `"not_found"` - Species name provided in query is not a current valid name nor synonym according to The Reptile Database. This status could be derived from a typo within species name. A very rare situation where this status can pop up is when the current valid name is updated in the query but is not found in the list of synonyms. - `"synonymization"` - Multiple names in the query are now considered synonyms of a single valid species, likely from a synonymization. ⚠️ **ATTENTION!**⚠️ `letsRept` does not make authoritative taxonomic decisions. It matches input names against currently accepted names in the Reptile Database (RDB). A name marked as `"up_to_date"` may still refer to a taxon that has been split, and thus may not reflect the most recent population-level taxonomy, see function `reptSplitCheck` below. #### **Function `reptSplitCheck`:** Species names in recent databases are most likely marked with the status `"up_to_date"`. However, the function `reptSync` only indicates whether the queried binomial is currently valid. In some cases, a species may have undergone a taxonomic split, where part of its original populations retains the name while others have been described as new species. `reptSync` does not account for such cases, so it is recommended to review all `"up_to_date"` species for potential taxonomic splits. To assist with this, the function `reptSplitCheck` queries binomial names as synonyms using `reptAdvancedSearch`, and checks whether any associated species were described after a user-defined date (e.g., the publication date of the dataset being used). By default this function uses parallel processing, with default set to use half of the available cores. ```{.r} query <- c("Atractus dapsilis", "Atractus trefauti", "Atractus snethlageae", "Tantilla melanocephala", "Oxybelis aeneus", "Oxybelis rutherfordi") reptSplitCheck(query, pubDate = 2019) # pubDate of Nogueira et al., Atlas of Brazilian Snakes reptSplitCheck(query, pubDate = 2019, includeAll = TRUE) ``` Pay special attention to the `"status"` column: **status:** - `"up_to_date"` – The species name provided is not a synonym of any species described after `pubDate`. - `"check_split"` – The species name provided is a synonym of at least one valid species described in or after `pubDate`, suggesting a possible taxonomic split. - `"not_found"` - This is a temporary status to detect species that are not listed as synonyms of themselves, so it does not appear in the advanced search (e.g.: to correct in RDB). **OBS:** With argument `includeAll` set to default (`FALSE`), if the queried species is a synonym of only species described in the same year `pubDate` that are already included in the queried species list, the queried species will receive the status `"up_to_date"`. To check every possible taxonomic split regardless of whether the new species is already present in the queried species list, change the argument value to `TRUE`. **Attention:** The column `"RDB"` shows only the current valid names of species described in, or after, `pubDate` according to The Reptile Database. It will NOT show the queried species, but be aware that `reptSplitCheck` is for valid names only (e.g.: "matched", from `reptCompare`). Therefore, the `"check_split"` status is only to highlight that users must check if the data related to the queried species could be used (e.g.: species traits) or divided (e.g.: distribution) to the `"RDB"` column species. In other words, in the example above, it does not mean that *Tantilla melanocephala* current valid name is *T. selmae*. It means that users must ensure that the use of the queried nomenclature will not overlook the possibility that the data, or part of it, should be assigned to *T. selmae* instead. #### **Function `reptTidySyn`:** This function was developed exclusively to improve the visualization of `reptSync` and `reptSplitCheck` outcomes. Queried species with many current valid names would break the data frame visualization in the R console. `reptTidySyn` stacks current valid names and improves data visualization. Moreover, the argument `filter`, allows users to filter the printed data frame by "status" so users can focus only in the status that they want to evaluate. ```{.r} query <- c("Vieira-Alencar authorisensis", "Apostolepis ambiniger", "Apostolepis cerradoensis", "Elapomorphus assimilis", "Apostolepis tertulianobeui", "Apostolepis goiasensis") df <- reptSync(query) reptTidySyn(df) ``` ### **Internal datasets** - The package counts with a full list of current valid species (`allReptiles` - 12,440 species) with their respective higher taxa information (updated to May 23rd, 2025); - A dataset with all unique synonyms for each current valid species (`allSynonyms` - 53,159 entries - updated to May 23rd, 2025); - Another synonyms dataset with all entries considering their respective references (`allSynonymsRef` - 110,413 entries - updated to May 23rd, 2025). ### **How to Cite** To get the full reference to cite this package in publications, run to get the most up to date version reference: ```{.r} citation("letsRept") ``` ⚠️ **Important note** `letsRept` retrieves valuable taxonomic and synonymy data directly from [The Reptile Database](http://www.reptile-database.org). When citing this package, please also cite the original database as a data source. ### **References** Liedtke, H. C. (2018). AmphiNom: an amphibian systematic tool. Systematics and Biodiversity, 17(1) 1-6. https://doi.org/10.1080/14772000.2018.1518935 Nogueira, C. C., Argôlo, A. J. S., Arzamendia, V., Azevedo, J. A. R., Barbo, F. E., Bérnils, R. S., … & Martins, M. (2019). Atlas of Brazilian Snakes: Verified Point-Locality Maps to Mitigate the Wallacean Shortfall in a Megadiverse Snake Fauna. South American Journal of Herpetology, 14(sp1), 1–274. http://dx.doi.org/10.2994/sajh-d-19-00120.1 Uetz, P., Freed, P, Aguilar, R., Reyes, F., Kudera, J. & Hošek, J. (eds.) (2025). [The Reptile Database](http://www.reptile-database.org) ### **Author:** João Paulo dos Santos Vieira de Alencar Email: joaopaulo.valencar@gmail.com [Orcid](https://orcid.org/0000-0001-6894-6773) | [ResearchGate](https://www.researchgate.net/profile/Joao-Paulo-Alencar)