---
title: 'gtfs2gps: Converting GTFS data to GPS-like format'
author: "Rafael H. M. Pereira, Pedro R. Andrade, Joao Bazzo"
output: rmarkdown::html_vignette
abstract: Package `gtfs2gps` has a set of functions to convert public transport GTFS data to GPS-like format using `data.table`. It also has some functions to convert both representations to simple feature format.
urlcolor: blue
vignette: |
  %\VignetteIndexEntry{gtfs2gps}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
# Introduction 

Package `gtfs2gps` allows users to convert public transport GTFS data into a single `data.table` format with GPS-like records, which can then be used in various applications such as running transport simulations or scenario analyses. Before using the package, just install it from GitHub.

```{r, eval = FALSE}
install.packages("gtfs2gps")
```

# Loading data

After loading the package, GTFS data can be read into R by using `read_gtfs()`.
This function gets a zipped GTFS file and returns a list of `data.table` objects. The returning list contains the data of each GTFS file indexed according to their file names without extension.

```{r}
library("gtfs2gps")
poa <- read_gtfs(system.file("extdata/poa.zip", package ="gtfs2gps"))
names(poa)
head(poa$trips)
```

Note that not all GTFS files are loaded into R. This function only loads the necessary data to spatially and temporally handle trips and stops, which are:
"shapes.txt", "stop_times.txt", "stops.txt", "trips.txt",
"agency.txt", "calendar.txt", "routes.txt", and "frequencies.txt", with
this last four being optional.
If a given GTFS zipped file does not contain all of these required files then `read_gtfs()` will stop with an error.

In the code below we filter only the shape ids `c("T2-1", "A141-1") to allow faster execution the next scripts.

```{r}
object.size(poa) |> format(units = "Kb")
poa_small <- gtfstools::filter_by_shape_id(poa, c("T2-1", "A141-1"))
object.size(poa_small) |> format(units = "Kb")
```

We can then easily convert the data to simple feature format and plot them.

```{r poa_small_shapes_sf, message = FALSE}
poa_small_shapes_sf <- gtfs2gps::gtfs_shapes_as_sf(poa_small)
poa_small_stops_sf <- gtfs2gps::gtfs_stops_as_sf(poa_small)
plot(sf::st_geometry(poa_small_shapes_sf))
plot(sf::st_geometry(poa_small_stops_sf), pch = 20, col = "red", add = TRUE)
box()
```

After subsetting the data, it is also possible to save it as a new GTFS file using `write_gtfs()`, as shown below.

```{r, message = FALSE}
temp_gtfs <- tempfile(pattern = 'poa_small', fileext = '.zip')

gtfs2gps::write_gtfs(poa_small, temp_gtfs)
```

# Converting to GPS-like format

To convert GTFS to GPS-like format, use `gtfs2gps()`. This is the core function of the package. It takes a GTFS zipped file as an input and returns a `data.table` where each row represents a 'GPS-like' data point for every trip in the GTFS file. In summary, this function interpolates the space-time position of each vehicle in each trip considering the network distance and average speed between stops. The function samples the timestamp of each vehicle every $15m$ by default, but the user can set a different value in the `spatial_resolution` argument. See the example below.

```{r, message = FALSE}
poa_gps <- gtfs2gps(temp_gtfs, spatial_resolution = 100)
head(poa_gps)
```
 The following figure maps the first 100 data points of the sample data we processed. They can be converted to `simple feature` points or linestring.

```{r, message = FALSE}
poa_gps60 <- poa_gps[1:100, ]

# points
poa_gps60_sfpoints <- gps_as_sfpoints(poa_gps60)

# linestring
poa_gps60_sflinestring <- gps_as_sflinestring(poa_gps60)

# plot
plot(sf::st_geometry(poa_gps60_sfpoints), pch = 20)
plot(sf::st_geometry(poa_gps60_sflinestring), col = "blue", add = TRUE)
box()
```

The function `gtfs2gps()` automatically recognizes whether the GTFS data brings detailed `stop_times.txt` information or whether it is a `frequency.txt` GTFS file. A sample data of a GTFS with detailed  `stop_times.txt` cab be found below:

```{r, message = FALSE}
poa <- system.file("extdata/poa.zip", package ="gtfs2gps")

poa_gps <- gtfs2gps(poa, spatial_resolution = 50)

poa_gps_sflinestrig <- gps_as_sfpoints(poa_gps)

plot(sf::st_geometry(poa_gps_sflinestrig[1:200,]))

box()
```

# Methodological note

For a given trip, the function `gtfs2gps` calculates the average speed between each pair of consecutive stops given by the ratio between cumulative network distance `S` and departure time `t` for a consecutive pair of valid stop_ids (`i`), 

```{r equation, echo = FALSE, message = FALSE}
knitr::include_graphics("https://github.com/ipeaGIT/gtfs2gps/blob/master/man/figures/equation1.png?raw=true")
```

Since the beginning of each trip usually starts before the first stop_id, the mean speed cannot be calculated as shown in the previous equation because information on `i` period does not exist. In this case, the function consider the mean speed for the whole trip. It also happens after the last valid stop_id (`N`) of the trips, where info on `i + 1` also does not exist. 

```{r speed, echo = FALSE, message = FALSE}
knitr::include_graphics("https://github.com/ipeaGIT/gtfs2gps/blob/master/man/figures/speed.PNG?raw=true")
```

# Final remarks

If you have any suggestions or want to report an error, please visit the GitHub page of the package [here](https://github.com/ipeaGIT/gtfs2gps).