---
title: "Quick start guide"
author: "Martin Westgate & Dax Kellie"
date: '2024-11-19'
output:
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Quick start guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
galah is an R interface to biodiversity data hosted by the Global Biodiversity 
Information Facility ([GBIF](https://www.gbif.org)) and its subsidiary node
organisations. GBIF and its partner nodes collate and store observations of 
individual life forms using the ['Darwin Core'](https://dwc.tdwg.org) data 
standard.

# Installation

To install from CRAN:

``` r
install.packages("galah")
```

Or install the development version from GitHub:

``` r
install.packages("remotes")
remotes::install_github("AtlasOfLivingAustralia/galah")
```

Load the package

``` r
library(galah)
```

# Configuration
By default, galah downloads information from the Atlas of Living Australia 
(ALA). To show the full list of organisations currently supported by galah, 
use  `show_all(atlases)`.


``` r
show_all(atlases)
```

```
## # A tibble: 10 × 4
##    region         institution                                                             acronym url                    
##    <chr>          <chr>                                                                   <chr>   <chr>                  
##  1 Australia      Atlas of Living Australia                                               ALA     https://www.ala.org.au 
##  2 Austria        Biodiversitäts-Atlas Österreich                                         BAO     https://biodiversityat…
##  3 Brazil         Sistemas de Informações sobre a Biodiversidade Brasileira               SiBBr   https://sibbr.gov.br   
##  4 France         Portail français d'accès aux données d'observation sur les espèces      OpenObs https://openobs.mnhn.fr
##  5 Global         Global Biodiversity Information Facility                                GBIF    https://gbif.org       
##  6 Guatemala      Sistema Nacional de Información sobre Diversidad Biológica de Guatemala SNIBgt  https://snib.conap.gob…
##  7 Portugal       GBIF Portugal                                                           GBIF.pt https://www.gbif.pt    
##  8 Spain          GBIF Spain                                                              GBIF.es https://gbif.es        
##  9 Sweden         Swedish Biodiversity Data Infrastructure                                SBDI    https://biodiversityda…
## 10 United Kingdom National Biodiversity Network                                           NBN     https://nbn.org.uk
```

Use `galah_config()` to set the node organisation using its region, name, or 
acronym. Once set, `galah` will automatically populate the server configuration for your 
selected GBIF node. To download occurrence records from your chosen 
GBIF node, you will need to register an account with them (using their website), 
then provide your registration email to galah. 
To download from GBIF, you will need to provide the email, username, and 
password.


``` r
galah_config(atlas = "GBIF",
             username = "user1",
             email = "email@email.com",
             password = "my_password")
```
You can find a full list of configuration options by running `?galah_config`.

# Basic syntax
The standard method to construct queries in `{galah}` is via piped functions. 
Pipes in `galah` start with the `galah_call()` function, and typically end with 
`collect()`, though `collapse()` and `compute()` are also supported. The 
development team use the base pipe by default (`|>`), but the `{magrittr}` pipe 
(`%>%`) should work too.
  

``` r
galah_config(atlas = "ALA",
             verbose = FALSE)
galah_call() |>
  count() |>
  collect()
```

```
## # A tibble: 1 × 1
##       count
##       <int>
## 1 146185520
```

To pass more complex queries, you can use additional `{dplyr}` functions such as
`filter()`, `select()`, and `group_by()`.


``` r
galah_call() |> 
  filter(year >= 2020) |> 
  count() |>
  collect()
```

```
## # A tibble: 1 × 1
##      count
##      <int>
## 1 40200358
```

Each GBIF node allows you to query using their own set of in-built fields. You 
can investigate which fields are available using `show_all()` and `search_all()`:


``` r
search_all(fields, "australian states")
```

```
## # A tibble: 2 × 3
##   id     description                            type  
##   <chr>  <chr>                                  <chr> 
## 1 cl2013 ASGS Australian States and Territories fields
## 2 cl22   Australian States and Territories      fields
```

# Taxonomic searches
To narrow your search to a particular taxonomic group, use `identify()`. Note 
that this function only accepts scientific names and is not case sensitive. 
It's good practice to first use `search_taxa()` to check that the taxa you 
provide returns the correct taxonomic results.


``` r
search_taxa("reptilia") # Check whether taxonomic info is correct
```

```
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                               rank  match_type kingdom phylum class issues
##   <chr>       <chr>           <chr>                                          <chr> <chr>      <chr>   <chr>  <chr> <chr> 
## 1 reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228… class exactMatch Animal… Chord… Rept… noIss…
```

``` r
galah_call() |>
  identify("reptilia") |> 
  filter(year >= 2020) |> 
  count() |>
  collect()
```

```
## # A tibble: 1 × 1
##    count
##    <int>
## 1 338434
```

If you want to query something other than the number of records, modify the 
`type` argument in `galah_call()`. Here we'll query the number of species:


``` r
galah_call(type = "species") |>
  identify("reptilia") |> 
  filter(year >= 2020) |> 
  count() |>
  collect()
```

```
## # A tibble: 1 × 1
##   count
##   <int>
## 1   883
```

# Download
To download records---rather than find how many records are available---simply
remove the `count()` function from your pipe.


``` r
result <- galah_call() |>
  identify("Litoria") |>
  filter(year >= 2020, cl22 == "Tasmania") |>
  select(basisOfRecord, group = "basic") |>
  collect()
```

```
## Retrying in 1 seconds.
```

``` r
result |> head()
```

```
## # A tibble: 6 × 9
##   recordID            scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus
##   <chr>               <chr>          <chr>                    <dbl>            <dbl> <dttm>              <chr>           
## 1 00052544-d943-42e9… Litoria ewing… https://biodi…           -42.9             147. 2022-09-19 00:00:00 PRESENT         
## 2 00168ca6-84d0-4af1… Litoria ranif… https://biodi…           -41.2             146. 2023-12-21 10:20:19 PRESENT         
## 3 001a43fe-8586-4064… Litoria ewing… https://biodi…           -43.0             147. 2021-08-07 00:00:00 PRESENT         
## 4 00250163-ec50-4eda… Litoria ranif… https://biodi…           -41.2             147. 2023-08-23 11:49:28 PRESENT         
## 5 003e0f63-9f95-4af9… Litoria ewing… https://biodi…           -42.9             148. 2022-12-24 06:27:00 PRESENT         
## 6 0070521f-bb45-46fb… Litoria ewing… https://biodi…           -43.1             147. 2023-12-20 14:29:23 PRESENT         
## # ℹ 2 more variables: dataResourceName <chr>, basisOfRecord <chr>
```

Check out our other vignettes for more detail on how to use these functions.