---
title: "AdapDiscom: A multimodal high dimensional method accounting for missing data and measurement error heterogeneity"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{AdapDiscom: A multimodal high dimensional method accounting for missing data and measurement error heterogeneity}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

```{r setup, eval=FALSE}
library(AdapDiscom)
library(MASS)
```

# AdapDiscom Package

The `AdapDiscom` package provides implementations of **AdapDISCOM**, a novel adaptive direct sparse regression method for high-dimensional multimodal data with block-wise missingness and measurement errors. Building on the DISCOM framework, AdapDISCOM introduces modality-specific weighting schemes to account for heterogeneity in data structures and error magnitudes across modalities.

Unlike existing methods that are limited to a fixed number of blocks, **AdapDISCOM supports any number of blocks K**, making it highly flexible for diverse multimodal data applications. The package includes robust variants (AdapDISCOM-Huber) and computationally efficient versions (Fast-AdapDISCOM) to handle heavy-tailed distributions and large-scale datasets.

AdapDISCOM has been shown to consistently outperform existing methods such as DISCOM, SCOM, and imputation-based approaches, particularly under heterogeneous contamination scenarios, providing a scalable framework for realistic multimodal data analysis with missing data and measurement errors.

## Main Functions

The package provides four main estimation functions:

1. **`adapdiscom()`** - Main adaptive DISCOM method
2. **`discom()`** - Original DISCOM method
3. **`fast_adapdiscom()`** - Fast version of adaptive DISCOM
4. **`fast_discom()`** - Fast version of original DISCOM

## Different Covariance Structures

The package handles different covariance structures:

```{r, eval=FALSE}
# Number of variables
p <- 10

# AR(1) structure
Sigma_ar1 <- generate.cov(p, example = 1)
print(Sigma_ar1[1:3, 1:3])

# Block diagonal structure
Sigma_block <- generate.cov(p, example = 2)
print(Sigma_block[1:3, 1:3])

# Kronecker product structure
Sigma_kron <- generate.cov(p, example = 3)
print(dim(Sigma_kron))
```

## Basic Usage Example

### Data Simulation

This simulation is a simple replication of scenario 2 from the AdapDISCOM paper. It is a case study with 300 variables distributed across 3 equal blocks (100 variables each), where each block follows a different covariance structure: the first block uses an AR(1) structure, the second a block-diagonal structure, and the third a Kronecker product structure.

The training data (n = 440) exhibits a structured missing data pattern: the first quarter of the sample is fully observed, while each of the remaining three quarters is completely missing one of the three variable blocks. This creates heterogeneity in data availability that reflects real-world situations where different subgroups of subjects may have access to different sets of measurements. The true coefficient vector $\beta$ follows a sparse pattern with only $5\%$ of variables having non-zero effects ($\beta = 0.5$), allowing evaluation of the method's ability to correctly identify relevant variables in a high-dimensional context with heterogeneous missing data.
```{r, eval=FALSE}
set.seed(123)

# Set parameters
p <- 300
n <- 440
n.tuning <- 200
n.test <- 400
p1 <- p %/% 3
p2 <- p %/% 3
p3 <- p - p1 - p2
pp <- c(p1, p2, p3)  # Block sizes
sigma <- 1
```

```{r, eval=FALSE}
# Generate different covariance matrices for each block
cov.mat1 <- generate.cov(p1, 1)  # AR(1) structure for block 1
cov.mat2 <- generate.cov(p2, 2)  # Block diagonal for block 2
cov.mat3 <- generate.cov(p3, 3)  # Kronecker product for block 3
```

```{r, eval=FALSE}
# Generate true beta coefficients
beta1 <- rep(c(rep(0.5, 5), rep(0, 95)), p / 100)
beta.true <- beta1

# Generate complete data for all samples
pre.x1 <- mvrnorm(n + n.tuning + n.test, rep(0, p1), cov.mat1)
pre.x2 <- mvrnorm(n + n.tuning + n.test, rep(0, p2), cov.mat2)
pre.x3 <- mvrnorm(n + n.tuning + n.test, rep(0, p3), cov.mat3)
pre.x <- cbind(pre.x1, pre.x2, pre.x3)
pre.ep <- rnorm(n + n.tuning + n.test, 0, sigma)

# Training data
n.com <- n / 4
n1 <- n2 <- n3 <- n4 <- n.com
X.train <- pre.x[1:n, ]
ep <- pre.ep[1:n]
y.train <- X.train %*% beta1 + ep
colnames(X.train) <- paste0("X", 1:p)

# Introduce missing data pattern (quarter 1 is fully observed)
X.train[(n1 + 1):(n1 + n2), (p1 + p2 + 1):(p1 + p2 + p3)] <- NA
X.train[(n1 + n2 + 1):(n1 + n2 + n3), (p1 + 1):(p1 + p2)] <- NA
X.train[(n1 + n2 + n3 + 1):(n1 + n2 + n3 + n4), 1:p1] <- NA

# Tuning data
X.tuning <- pre.x[(n + 1):(n + n.tuning), ]
ep.tuning <- pre.ep[(n + 1):(n + n.tuning)]
y.tuning <- X.tuning %*% beta1 + ep.tuning
colnames(X.tuning) <- paste0("X", 1:p)

# Test data
X.test <- pre.x[(n + n.tuning + 1):(n + n.tuning + n.test), ]
ep.test <- pre.ep[(n + n.tuning + 1):(n + n.tuning + n.test)]
y.test <- X.test %*% beta1 + ep.test
colnames(X.test) <- paste0("X", 1:p)
```

### Running AdapDiscom

```{r, eval=FALSE}
# Run AdapDiscom with default parameters
result <- adapdiscom(
  beta = beta.true,     # True coefficients (optional, for evaluation)
  x = X.train,          # Training data
  y = y.train,          # Training response
  x.tuning = X.tuning,  # Tuning data
  y.tuning = y.tuning,  # Tuning response
  x.test = X.test,      # Test data
  y.test = y.test,      # Test response
  nlambda = 20,         # Number of lambda values
  nalpha = 10,          # Number of alpha values
  pp = pp,              # Block sizes
  robust = 0,           # Classical estimation
  standardize = TRUE,   # Standardize data
  itcp = TRUE           # Include intercept
)

# View results
print(paste("R-squared:", round(result$R2, 4)))
```

### Examining Results

The functions return a list with the following components:

```{r, eval=FALSE}
# Optimal parameters
cat("Optimal lambda:", result$lambda, "\n")
cat("Optimal alpha:", paste(round(result$alpha, 4), collapse = ", "), "\n")

# Performance metrics
cat("Training error:", round(result$train.error, 4), "\n")
cat("Test error:", round(result$test.error, 4), "\n")
cat("R-squared:", round(result$R2, 4), "\n")

# Model selection
cat("Number of selected variables:", result$select, "\n")

# If the true beta was provided, evaluation metrics are available
if (!is.null(beta.true)) {
  cat("Estimation error:", round(result$est.error, 4), "\n")
  cat("False Positive Rate:", round(result$fpr, 4), "\n")
  cat("False Negative Rate:", round(result$fnr, 4), "\n")
}

# Estimated coefficients
cat("First 6 estimated coefficients:\n")
print(round(head(result$a1), 4))
```

### Comparing Methods

```{r, eval=FALSE}
# Compare the four estimation methods
results <- list()

# AdapDiscom (already computed above)
results$adapdiscom <- result

# DISCOM
results$discom <- discom(
  beta = beta.true,
  x = X.train, y = y.train,
  x.tuning = X.tuning, y.tuning = y.tuning,
  x.test = X.test, y.test = y.test,
  nlambda = 20, nalpha = 10, pp = pp
)

# Fast AdapDiscom
results$fast_adapdiscom <- fast_adapdiscom(
  beta = beta.true,
  x = X.train, y = y.train,
  x.tuning = X.tuning, y.tuning = y.tuning,
  x.test = X.test, y.test = y.test,
  nlambda = 20, nalpha = 10, pp = pp
)

# Fast DISCOM
results$fast_discom <- fast_discom(
  beta = beta.true,
  x = X.train, y = y.train,
  x.tuning = X.tuning, y.tuning = y.tuning,
  x.test = X.test, y.test = y.test,
  nlambda = 20, pp = pp
)

# Compare performance
comparison <- data.frame(
  Method = names(results),
  Test_Error = round(sapply(results, function(x) x$test.error), 4),
  R_Squared = round(sapply(results, function(x) x$R2), 4),
  Selected_Vars = sapply(results, function(x) x$select),
  Est_Error = round(sapply(results, function(x) x$est.error), 4),
  FPR = round(sapply(results, function(x) x$fpr), 4),
  FNR = round(sapply(results, function(x) x$fnr), 4),
  Time = round(sapply(results, function(x) x$time), 2)
)
print(comparison)
```

> **Note:** When the missing blocks are of equal size within each modality and are disjoint across modalities (i.e., no observations have missing data in multiple modalities simultaneously), Fast-AdapDISCOM reduces to Fast-DISCOM, since the adaptive weighting scheme becomes uniform across all modalities.

## Robust Estimation

The package supports robust estimation using Huber's M-estimator:

```{r, eval=FALSE}
# Run with robust estimation
result_robust <- adapdiscom(
  beta = beta.true,
  x = X.train, y = y.train,
  x.tuning = X.tuning, y.tuning = y.tuning,
  x.test = X.test, y.test = y.test,
  nlambda = 20, nalpha = 10, pp = pp,
  robust = 1,    # Enable robust estimation
  k.value = 1.5  # Huber tuning parameter
)
```

## Advanced Parameters

```{r, eval=FALSE}
# Advanced usage with custom parameters
result_advanced <- adapdiscom(
  beta = beta.true,
  x = X.train, y = y.train,
  x.tuning = X.tuning, y.tuning = y.tuning,
  x.test = X.test, y.test = y.test,
  nlambda = 30,  # More lambda values
  nalpha = 15,   # More alpha values
  pp = pp,
  robust = 0,
  standardize = TRUE,
  itcp = TRUE,
  lambda.min.ratio = 0.001,  # Custom lambda range
  k.value = 1.5
)
```

## Block Structure Requirements

- **AdapDiscom**: Supports any number of blocks (specified by the `pp` vector)
- **DISCOM**: Supports 2, 3, or 4 blocks only
- **Fast variants**: Optimized versions with reduced computational complexity

## Tips for Usage

1. **Block sizes**: Ensure `sum(pp) == p` (the total number of variables)
2. **Sample sizes**: Use adequate tuning and test sample sizes for reliable parameter selection
3. **Standardization**: Generally recommended (`standardize = TRUE`)
4. **Robust estimation**: Use when the data contain outliers (`robust = 1`)
5. **Lambda range**: Adjust `lambda.min.ratio` if needed for a better regularization path

## References

For more details on the methodology, please refer to the original paper, _A multimodal high dimensional method accounting for missing data and measurement error heterogeneity_.
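As a closing sanity check, the block-size requirement from the Tips section (`sum(pp) == p`) can be verified in code before any model fitting. This is a minimal base-R sketch using the block sizes from the example above; the `K`-block construction shown is one convenient way to partition the predictors, not part of the package API:

```r
# Blocks must partition the predictors: sum(pp) == p
p  <- 300
pp <- c(100, 100, 100)  # block sizes from the example above
stopifnot(sum(pp) == p)

# One way to build pp for K roughly equal blocks
K   <- 3
pp2 <- diff(round(seq(0, p, length.out = K + 1)))
print(pp2)  # 100 100 100
stopifnot(sum(pp2) == p)
```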