---
title: "Introduction to nisrarr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to nisrarr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

can_plot <- requireNamespace("ggplot2", quietly = TRUE) && 
  requireNamespace("scales", quietly = TRUE)

has_prettyunits <- requireNamespace("prettyunits", quietly = TRUE)
```

# Overview of the NISRA Data Portal API

This package interacts with the [NISRA Data Portal](https://data.nisra.gov.uk/),
which allows users to access data produced by the Northern Ireland Statistics 
and Research Agency (NISRA) in a variety of machine-readable formats, and to 
interactively query, plot, or map the data. The NISRA Data Portal is built on
[PxStat](https://github.com/CSOIreland/PxStat), a dissemination system developed
by the Central Statistics Office (CSO) of Ireland. Guidance on the NISRA Data 
Portal is available [here](https://data.nisra.gov.uk/guide.html).

## Searching for data

We can search for data using `nisra_search()`, which gives information on 
available datasets such as when it was last updated or what variables are in the
data.

```{r search}
library(nisrarr)

x <- nisra_search()
head(x)
```

If we don't know the exact name of the dataset we're interested in, we can 
search using a keyword that appears in the label or a set of variables that we
need:

```{r search-keyword}
nisra_search(keyword = "employ")

nisra_search(variables = "Free School Meal Entitlement")
```


## Fetching data

We can use `nisra_read_dataset()` with the dataset code we found above to 
request the dataset from the API and convert it to a tibble. Every dataset will 
have a `Statistic` column and a `value` column, a column for the time period,
and any other variables included in the breakdown:

```{r read-data, eval=can_plot}
mye <- nisra_read_dataset("MYE01T04")
head(mye)

library(dplyr)
library(ggplot2)

mye <- mye |> 
  filter(
    `Broad age band (4 cat)` == "Age 65+",
    Sex %in% c("Females", "Males")
  ) |> 
  mutate(Year = as.numeric(Year))


ggplot(mye, aes(Year, value, colour = Sex)) +
  geom_line() +
  scale_y_continuous(labels = scales::label_comma()) +
  facet_wrap(
    vars(`Local Government District`), 
    scales = "free_y",
    labeller = label_wrap_gen(width = 18)
  ) +
  labs(
    title = "Population aged 65+ by sex and local government district, 2001 to 2022",
    x = NULL, 
    y = NULL, 
    colour = NULL
  ) +
  theme(legend.position = "top")
```


## Metadata

nisrarr has some functionality for working with metadata. We can use the 
`get_metadata()` function on any dataset we download from the API to fetch some
of the common or useful fields, such as whether these are official statistics, 
the subject of the statistics, and contact information:

```{r meta}
get_metadata(mye)
```

If we need to work with any of these fields programmatically, we can fetch
specific fields using `get_metadata_field()`:

```{r meta-field, eval=has_prettyunits}
updated <- get_metadata_field(mye, "updated")

updated |> 
  lubridate::ymd_hms() |> 
  prettyunits::time_ago()
```


## Caching

By default, nisrarr caches data fetch from the data portal API to speed up 
repeatedly fetching the same data. Results are cached for 1 hour then removed,
or the cached values can be ignored by setting `flush_cache = TRUE` in 
`nisra_search()` or `nisra_read_dataset()`. Caching is useful when working 
interactively, but it is better to fetch data directly if it's part of a larger
script or pipeline.