---
title: "How to use Fast Step Graph"
author: 
  - Juan G. Colonna^[Institute of Computing. Federal University of Amazonas. Brasil. juancolonna@icomp.ufam.edu.br]
  - Marcelo Ruiz^[Mathematics Department. National University of Río Cuarto. Argentina. mruiz@exa.unrc.edu.ar]

# output: rmarkdown::html_vignette
# output: pdf_document
vignette: >
  %\VignetteIndexEntry{How to use Fast Step Graph}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

To install the last version of this package directly from GitHub uncomment and run: 

```{r setup, message = FALSE}
# library(devtools)
# use "quiet = FALSE" if you want to see the outputs of this command
# devtools::install_github("juancolonna/FastStepGraph", quiet = TRUE, force = TRUE)

# Then, load it:
library(FastStepGraph)
```

Simulate Gaussian Data with an Autoregressive (AR) Model:

```{r}
set.seed(1234567)
phi <- 0.4 
p <- 50  # number of variables (dimension)
n <- 30 # number of samples

# Generate Data from a Gaussian distribution 
data <- FastStepGraph::SigmaAR(n, p, phi)
X <- scale(data$X) # standardizing variables
```

To fit the Omega matrix with `FastStepGraph()` function you have to know the optimal values of $\mathbf{\alpha_f}$ and $\mathbf{\alpha_b}$. If you don't know these values, try to find them using cross-validation as follows: 

```{r}
t0 <- Sys.time() # INITIAL TIME
res <- FastStepGraph::cv.FastStepGraph(X, data_shuffle = TRUE)
difftime(Sys.time(), t0, units = "secs")
# print(res$alpha_f_opt)
# print(res$alpha_b_opt)
```
If your input variables are non-standardized (with zero mean and unit variance), we recommend that you set `data_scale=TURE`. Subsequently, calculate the Omega matrix by calling the `FastStepGraph()` function passing the optimal parameters $\mathbf{\alpha_f}$ and $\mathbf{\alpha_b}$ found by cross-validation to fit the final model:

```{r}
t0 <- Sys.time() # INITIAL TIME
G <- FastStepGraph::FastStepGraph(X, alpha_f = res$alpha_f_opt, alpha_b = res$alpha_b_opt)
difftime(Sys.time(), t0, units = "secs")
# print(G$Omega)
```
You can also perform these two steps, the cross-validation to obtain the ideal parameters and return the fitted model, in a single step by setting the `return_model=TRUE` option as follows:

```{r}
t0 <- Sys.time() # INITIAL TIME
res <- FastStepGraph::cv.FastStepGraph(X, return_model=TRUE, data_shuffle = TRUE)
difftime(Sys.time(), t0, units = "secs")
# print(res$alpha_f_opt)
# print(res$alpha_b_opt)
# print(res$Omega)
```

The arguments `n_folds = 5`, `alpha_f_min = 0.1`, `alpha_f_max = 0.9`, `n_alpha = 32` (size of the grid search) and `nei.max = 5`, have defaults values and can be omitted. Note that, `cv.FastStepGraph(X)` is not an exhaustive grid search over $\mathbf{\alpha_f}$ and $\mathbf{\alpha_b}$. This is a heuristic that tests only a few $\mathbf{\alpha_b}$ values starting with the rule $\mathbf{\alpha_b}=\frac{\mathbf{\alpha_f}}{2}$. It is recommended to shuffle the rows of `X` before running cross-validation. The default value is `data_shuffle = TRUE`, but if you want to disable row shuffle, set it to `data_shuffle = FALSE`.

To increase time performance, you can run `cv.FastStepGraph(X, parallel = TRUE)` in parallel. Before, you'll need to install and register a parallel backend. To run on a Linux system the **doParallel** dependency must be installed `install.packages("doParallel")`. These parallel packages will also require the following dependencies: **foreach**, **iterators** and **parallel**. Make sure you satisfy them. Then, call the method setting the parameter **parallel = TRUE**, as follows: 


```{r, eval=FALSE}
t0 <- Sys.time() # INITIAL TIME
# use 'n_cores = NULL' to set the maximum number of cores minus one on your machine 
res <- FastStepGraph::cv.FastStepGraph(X, return_model=TRUE, parallel = TRUE, n_cores = 2)
difftime(Sys.time(), t0, units = "secs")
# print(res$alpha_f_opt)
# print(res$alpha_b_opt)
# print(res$Omega)
```

Remember, you can set the `n_cores` parameter to a value equal to the number of cores you have, but be careful as this may overload your system. Setting it to `1` disables parallel processing, and setting it to a number greater than the number of available cores does not improve efficiency.