---
title: "Frequently Asked Questions"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{FAQ}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

options(rmarkdown.html_vignette.check_title = FALSE)

```

Below are some frequently asked questions about the **libr** package. Click
on the links below to navigate to the full question and answer content.

## Index{#top}

* [How do I create a libname in R?](#libname)
* [Which data formats does the libname function support?](#formats)
* [Is there a way to filter the datasets in my libname?](#filter)
* [How do I view the variables in my datasets?](#dictionary)
* [How do I export data to another file format?](#export)
* [How do I copy a library?](#copy)
* [Can I really do a datastep in R?](#datastep)
* [Why is the datastep so slow?](#performance)
* [Can I do "set" and "merge" operations with the datastep?](#merge)


## Content

### How do I create a libname in R? {#libname}

**Q:** I have a directory full of datasets.  I need to use several of them
in my analysis.  In SAS®, I would create a libname so I could access all of them.
Is there a way to do something similar in R?

**A:** With the **libr** package, you can create a libname in R very 
much the same way you create a libname in SAS®.  
```{r eval=FALSE, echo=TRUE}
libname(mylib, "c:/mypath/mydata", "csv")

```
The above statement will create a libname "mylib" from the directory
specified on the second parameter.  The libname will use the CSV engine.
If there are any CSV files in the directory, they will be all loaded 
into the library.  To work directly with the datasets, you can then do:
```{r eval=FALSE, echo=TRUE}
mylib$mydataset
```
To access your datasets. 

[top](#top) 

******

### Which data formats does the libname function support? {#formats}

**Q:** I can see from the examples that the **libr** package supports
CSV and SAS dataset file formats.  What other data formats does the package
support?


**A:** The package supports the following data formats: csv, sas7bdat, 
rds, Rdata, rda, xls, xlsx, xpt, and dbf.  The `libname()` help page has a full list,
and a short discussion of some details on each format.  Note that the sas7bdat
file format is read-only at this time.

[top](#top) 

******

### Is there a way to filter the datasets in my libname? {#filter}

**Q:** I have a directory with over 100 datasets.  I want to use the 
`libname()` function, but worry about loading all those datasets into 
memory.  Is there a way I can filter the libname, to get only some of
the datasets?

**A:** Yes.  The `filter` parameter on the `libname()` function allows
you to pass a wildcard filter string.  For example, the following call
will load only those datasets that start with 'a':
```{r eval=FALSE, echo=TRUE}
libname(mylib, "c:/mypath/mydata", "csv", filter = "a*")

```
If you have a more complicated filter criteria, you can also pass a vector
of filter strings.  The below example will load only those datasets that
start with 'a' or 'b'.
```{r eval=FALSE, echo=TRUE}
libname(mylib, "c:/mypath/mydata", "csv", filter = c("a*", "b*"))

```

[top](#top) 

******


### How do I view the variables in my datasets? {#dictionary}

**Q:** I'm doing some analysis with my data, and can't remember all the variable
names.  Is there an easy way to view or print out the variables in my 
datasets?

**A:** Yes.  The `dictionary()` function from the **libr** pacakge will 
return a dataset with all the 
variables in your dataset, and some interesting attributes for each variable.
The `dictionary()` function works on a single data frame, or an entire 
library.  You can save this dictionary as metadata, print it, or even create 
a report from it.  Here is an example:
```{r eval=FALSE, echo=TRUE}
# Create libname
libname(mylib, "c:/mypath/mydata", "csv")

# Get dictionary
d <- dictionary(mylib)

# View dictionary
# View(d)
```

[top](#top)

******

### How do I export data to another file format? {#export}

**Q:** Let's say I have some data in one format (sas7bdat), and want to export
this data to another format (csv or Excel).  How can I do that with the **libr**
package?

**A:** The `lib_export()` function was designed for this purpose.  You can
take an existing library and export the entire thing to another library 
with a different file format.  Like this:
```{r eval=FALSE, echo=TRUE}
libname(libA, "c:/mypath/mydata1", "sas7bdat")

lib_export(libA, libB, "c:/mypath/mydata2", "csv")

```
The above statements will take the SAS® datasets in the library "libA", 
export them to CSV, place the new CSV files in the directory 
"c:/mypath/mydata2", and assign a new libname "libB" to that directory.
You now have two libnames, and can continue working with each as desired.

[top](#top) 

******


### How do I copy a library? {#copy}

**Q:** I have a directory full of datasets.  I want to back up the entire
thing to another directory.  How can I do that?

**A:** You can use the `lib_copy()` function, like this:
```{r eval=FALSE, echo=TRUE}
# Create libname
libname(lib1, "c:/mypath/mydata1", "csv")

# Copy to a new location
lib_copy(lib1, lib2, "c:/mypath/mydata2")
```
You will now have a reference to the new libname `lib2` at the new location, and
can use this libname like any other.

[top](#top)

******

### Can I really do a datastep in R? {#datastep}

**Q:** When I first started learning R I searched all over for a way 
to do a datastep.  I was shocked to learn there was nothing similar.  Does
the **libr** package really allow me to do a datastep in R?

**A:** Yes.  The **libr** `datastep()` function does not have all the 
capabilities of a SAS® datastep.  But it has the most commonly-used 
functionality.  You can loop through the data row by row, examine, 
and compare variable values for each row.  It has basic data shaping, 
grouping, retain, assigning
of attributes, and a datastep array. Here is a simple example showing
categorization of an age variable into age groups:
```{r eval=FALSE, echo=TRUE}
library(dplyr)
library(libr)

# Define data library
libname(dat, "./data", "csv") 

# Prepare data
dm_mod <- dat$DM %>% 
  select(USUBJID, SEX, AGE, ARM) %>% 
  filter(ARM != "SCREEN FAILURE") %>% 
  datastep({
    
    if (AGE >= 18 & AGE <= 24)
      AGECAT = "18 to 24"
    else if (AGE >= 25 & AGE <= 44)
      AGECAT = "25 to 44"
    else if (AGE >= 45 & AGE <= 64)
      AGECAT <- "45 to 64"
    else if (AGE >= 65)
      AGECAT <- ">= 65"
    
  }) 

```
The datastep example above is part of a **dplyr** pipeline,
but it can also function independently.
Notice that, just like a SAS® datastep, you don't have to declare new variables.
You can just assign the new variable a value, and the datastep function will 
create it automatically.  

You can check out the `datastep()` help page, or the 
[datastep vignette](https://libr.r-sassy.org/articles/libr-datastep.html) 
for additional examples and complete documentation.

[top](#top) 

******

### Why is the datastep so slow? {#performance}

**Q:** I like the `datastep()` function very much.  But it seems quite slow.
Is there anything I can do to speed it up? 

**A:** Yes. Performance of the `datastep()` is directly related to the size
of the input data.  The best thing you can do to increase performance
is to reduce the input data to only those rows and columns that you need.
The Base R `subset()` function and **Tidyverse** `select()` and `filter()`
functions are useful for this purpose.  Or you can use the Base R subset brackets ([])
if you are familiar with that syntax.  If the datastep performance is still
not satisfactory, it is recommended that you explore other R functions to 
perform your intended operation.

[top](#top) 

******

### Can I do "set" and "merge" operations with the datastep? {#merge}

**Q:** In SAS®, I used the datastep frequently to combine two or more datasets.
Does the **libr** datastep support "set" and "merge"? 

**A:** Yes. The `datastep()` function supports both "set" and "merge" operations.
The "set" parameter accepts a list of one or more datasets to stack together,
and the "merge" parameters are used in almost the same way as SAS®.
Here is an example:
```
# Subset iris dataset
dat1 <- subset(mtcars, cyl == 4, c('mpg', 'cyl', 'disp'))[1:5, ]
dat2 <- subset(mtcars, cyl == 6, c('mpg', 'cyl', 'disp'))[1:5, ]
dat3 <- mtcars[1:10, c('hp', 'drat', 'wt')]

# Stack datasets using set operation
res1 <- datastep(dat1, set = dat2, {})
#     mpg cyl  disp
# 1  22.8   4 108.0
# 2  24.4   4 146.7
# 3  22.8   4 140.8
# 4  32.4   4  78.7
# 5  30.4   4  75.7
# 6  21.0   6 160.0
# 7  21.0   6 160.0
# 8  21.4   6 258.0
# 9  18.1   6 225.0
# 10 19.2   6 167.6

# Merge row by row
res2 <- datastep(res1, merge = dat3, {})
#     mpg cyl  disp  hp drat    wt
# 1  22.8   4 108.0 110 3.90 2.620
# 2  24.4   4 146.7 110 3.90 2.875
# 3  22.8   4 140.8  93 3.85 2.320
# 4  32.4   4  78.7 110 3.08 3.215
# 5  30.4   4  75.7 175 3.15 3.440
# 6  21.0   6 160.0 105 2.76 3.460
# 7  21.0   6 160.0 245 3.21 3.570
# 8  21.4   6 258.0  62 3.69 3.190
# 9  18.1   6 225.0  95 3.92 3.150
# 10 19.2   6 167.6 123 3.92 3.440
```
The above merge shows how you can append columns even without a key column.
If you want to merge by a key, use the "merge_by" and "merge_in" parameters.
See the `datastep()` documentation for more information and examples.

[top](#top) 

******

<!-- ### Question 2? {#q2} -->

<!-- **Q:** Question here. -->

<!-- **A:** Answer here. -->

<!-- [top](#top) -->

<!-- ****** -->

<!-- ### Question 1? {#q1} -->

<!-- **Q:** Question here. -->


<!-- **A:** Answer here. -->

<!-- [top](#top)  -->

<!-- ****** -->