---
title: "`import_spss`: Importing data from 'SPSS'"
author: "Benjamin Becker"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{`import_spss`: Importing data from 'SPSS'}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

`import_spss()` imports `SPSS` data (`.sav` and `.zsav` files) into `R`, building on the `R` package `haven`.

This vignette illustrates a typical workflow of importing an `SPSS` file using `import_spss()` and `extractData2()`. For illustrative purposes, we use a small example data set from the campus files of the German PISA Plus assessment. The complete [campus files](https://www.iqb.hu-berlin.de/fdz/Datenzugang/CF-Antrag/AntragsformularCF) and the [original data set](https://www.iqb.hu-berlin.de/fdz/Datenzugang/SUF-Antrag) can be accessed via the linked application pages.

## Importing

```{r setup}
library(eatGADS)
```

We can import a `.sav` or `.zsav` data set via the `import_spss()` function. Variable names are checked automatically for data base compatibility, and any changes to the variable names are reported to the console. These checks can be switched off by setting `checkVarNames = FALSE`.

```{r import spss}
sav_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")
gads_obj <- import_spss(sav_path)
```
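
As a minimal sketch of the `checkVarNames` argument mentioned above, the same file can be imported with the name checks switched off. The object name `gads_unchecked` is chosen for illustration only; whether any names actually differ from the checked import depends on the file.

```{r import_no_name_check}
library(eatGADS)

# Path to the example file shipped with eatGADS (same file as above)
sav_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")

# Import without the automatic variable name checks;
# no data base compatibility renaming is applied
gads_unchecked <- import_spss(sav_path, checkVarNames = FALSE)

# Inspect the first few variable names of the unchecked import
head(namesGADS(gads_unchecked))
```

Switching the checks off is probably only sensible if the data are never written to a data base.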

## `GADSdat` objects

The resulting object is of the class `GADSdat`. It is basically a named list containing the actual data (`dat`) and the meta data (`labels`). 

```{r class}
class(gads_obj)
names(gads_obj)
```

The names of the variables in a `GADSdat` object can be accessed via the `namesGADS()` function. The meta data of variables can be accessed via the `extractMeta()` function.

```{r names_and_meta}
namesGADS(gads_obj)
extractMeta(gads_obj, vars = c("schtype", "idschool"))
```

Commonly, the most informative columns are `varLabel` (the variable label), `value` (the labeled value), `valLabel` (the corresponding value label) and `missings` (the missing tag, which indicates whether a labeled value is treated as missing (`"miss"`) or not (`"valid"`)).
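
To focus on these columns, the meta data can be subset like any other `data.frame`. The following is a small sketch under the assumption that `extractMeta()` returns a `data.frame` with a `varName` column in addition to the columns named above; the object names `gads_meta_demo` and `meta_schtype` are illustrative.

```{r meta_columns}
library(eatGADS)

# Re-import the example file so that this chunk runs on its own
gads_meta_demo <- import_spss(system.file("extdata", "pisa.zsav", package = "eatGADS"))

# Meta data for a single variable
meta_schtype <- extractMeta(gads_meta_demo, vars = "schtype")

# Keep only the columns that are usually most informative
meta_schtype[, c("varName", "varLabel", "value", "valLabel", "missings")]
```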


## Extracting data from `GADSdat`

If we want to use the data for analyses in `R` we have to extract it from the `GADSdat` object via the function `extractData2()`. 
In doing so, we have to make two important decisions: (a) how should values marked as missing values be treated (`convertMiss`)? 
And (b) how should labeled values in general be treated (`labels2character`, `labels2factor`, `labels2ordered`, `dropPartialLabels`)? 

If a variable name is not provided under any of `labels2character`, `labels2factor`, `labels2ordered`, all value labels of the corresponding variable are simply dropped.
If a variable name is provided under `labels2character`, the value labels of the corresponding variable are applied and the resulting variable is a character variable. `labels2factor` converts variables to factors and `labels2ordered` converts them to ordered factors.

See `?extractData2` for more details.

```{r extractdata}
## convert all labeled variables to character
dat1 <- extractData2(gads_obj, labels2character = namesGADS(gads_obj))
dat1[1:5, 1:10]

## leave labeled variables as numeric
dat2 <- extractData2(gads_obj)
dat2[1:5, 1:10]

## leave labeled variables as numeric but convert some variables to character and some to factor
dat3 <- extractData2(gads_obj, labels2character = c("gender", "language"),
                     labels2factor = c("schtype", "sameteach"))
dat3[1:5, 1:10]
```

In general, we recommend leaving labeled variables as numeric and converting values with missing codes to `NA`. 
Both are the default behavior for `extractData2()`.
If required, value labels can always be looked up by calling `extractMeta()` on the `GADSdat` object or the data base.
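
To see what the `convertMiss` decision changes, the default extraction can be compared with one that keeps values tagged as missing. This is a rough sketch; the object names are illustrative, and how large the difference is depends on how many values in the example data carry a missing tag.

```{r extract_keep_miss}
library(eatGADS)

# Re-import the example file so that this chunk runs on its own
gads_demo <- import_spss(system.file("extdata", "pisa.zsav", package = "eatGADS"))

# Default: values tagged as missing are converted to NA
dat_default <- extractData2(gads_demo)

# Alternative: keep the original missing codes in the data
dat_keep_miss <- extractData2(gads_demo, convertMiss = FALSE)

# Compare the number of NA entries in both versions
sum(is.na(dat_default))
sum(is.na(dat_keep_miss))
```

Keeping the missing codes can be useful for checking the tags, but most analyses will want the default.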