---
title: "Working with Measures in boilerplate"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Working with Measures in boilerplate}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

```{r setup}
library(boilerplate)
```

# Overview

This vignette shows how to work with measures in the boilerplate package. Measures are a special type of content that describes the variables, instruments, and scales used in research. The package provides tools for managing and standardising measures, and for generating formatted text about them.

# Quick Start: Adding and Using Measures

## Basic Workflow

```{r}
# Initialise and import the database
# Using a temporary directory for this example
temp_measures <- file.path(tempdir(), "measures_workflow_example")
boilerplate_init(
  data_path = temp_measures,
  create_dirs = TRUE,
  create_empty = FALSE,
  confirm = FALSE,
  quiet = TRUE
)
unified_db <- boilerplate_import(data_path = temp_measures, quiet = TRUE)

# Add a measure directly to the unified database
# IMPORTANT: Measures must be at the top level, not nested in categories
unified_db$measures$anxiety_gad7 <- list(
  name = "generalised anxiety disorder scale (GAD-7)",
  description = "anxiety was measured using the GAD-7 scale.",
  type = "ordinal",
  reference = "spitzer2006",
  waves = "1-3",
  keywords = c("anxiety", "mental health", "gad"),
  items = list(
    "feeling nervous, anxious, or on edge",
    "not being able to stop or control worrying",
    "worrying too much about different things",
    "trouble relaxing",
    "being so restless that it is hard to sit still",
    "becoming easily annoyed or irritable",
    "feeling afraid, as if something awful might happen"
  )
)

# Save the database
boilerplate_save(unified_db, data_path = temp_measures, confirm = FALSE, quiet = TRUE)

# Generate formatted text about the measure
measures_text <- boilerplate_generate_measures(
  variable_heading = "Anxiety Measure",
  variables = "anxiety_gad7",
  db = unified_db,
  heading_level = 3,
  print_waves = TRUE
)

cat(measures_text)
```

# Understanding Measure Structure

## Required Fields

Every measure must have these fields:

- **name**: A descriptive name for the measure
- **description**: A brief description of what the measure assesses
- **type**: The measurement type (continuous, categorical, ordinal, or binary)
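
Before saving, it can help to confirm that an entry carries all three required fields. The following is a minimal base R sketch; `missing_required_fields()` is a hypothetical helper written for this vignette, not a function exported by boilerplate.

```{r}
# Hypothetical helper (not part of boilerplate): return the names of the
# required fields that are absent or empty in a single measure entry.
missing_required_fields <- function(measure,
                                    required = c("name", "description", "type")) {
  is_missing <- vapply(
    required,
    function(field) {
      value <- measure[[field]]
      is.null(value) || !any(nzchar(as.character(value)))
    },
    logical(1)
  )
  required[is_missing]
}

# An entry that lacks the `type` field
example_measure <- list(
  name = "generalised anxiety disorder scale (GAD-7)",
  description = "anxiety was measured using the GAD-7 scale."
)

missing_required_fields(example_measure)
```

An empty character vector means every required field is present.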

## Optional Fields

Additional fields provide more detail:

- **reference**: Citation for the measure
- **waves**: Data collection waves where the measure was used
- **keywords**: Terms for searching and categorisation
- **items**: List of individual items/questions
- **values**: Possible response values (for categorical/ordinal)
- **value_labels**: Labels for the response values
- **range**: Min and max values (for continuous measures)
- **unit**: Unit of measurement
- **cutoffs**: Clinical or meaningful cutoff values
- **scoring**: Information about how to score the measure
- **subscales**: Details of any subscales

## Common Mistakes to Avoid

### ❌ Incorrect: Nesting measures under categories

```{r}
# DON'T DO THIS - measures should not be nested under categories
unified_db$measures$psychological$anxiety <- list(...)  # WRONG
```

### ✅ Correct: Top-level measure entries

```{r}
# DO THIS - add measures directly at the top level
unified_db$measures$anxiety_gad7 <- list(...)  # CORRECT
unified_db$measures$depression_phq9 <- list(...)  # CORRECT
```
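
If you inherit a database and are unsure whether it contains nested entries, a rough check along the following lines may help. This is a sketch under one assumption: a genuine measure carries a `name` or `description` field, so a top-level list without either is probably a category container. `find_nested_entries()` is a hypothetical helper, not a boilerplate function.

```{r}
# Hypothetical helper (not part of boilerplate): flag top-level entries
# that look like category containers rather than measures.
find_nested_entries <- function(measures) {
  looks_nested <- vapply(
    measures,
    function(entry) {
      is.list(entry) && !any(c("name", "description") %in% names(entry))
    },
    logical(1)
  )
  names(measures)[looks_nested]
}

# Toy data: one correct top-level entry and one nested under a category
toy_measures <- list(
  anxiety_gad7 = list(name = "GAD-7", description = "anxiety symptoms"),
  psychological = list(
    anxiety = list(name = "GAD-7", description = "anxiety symptoms")
  )
)

find_nested_entries(toy_measures)
```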

# Managing Multiple Measures

## Adding Multiple Measures at Once

```{r}
# Add several psychological measures
unified_db$measures$depression_phq9 <- list(
  name = "patient health questionnaire-9 (PHQ-9)",
  description = "depression symptoms were assessed using the PHQ-9.",
  type = "ordinal",
  reference = "kroenke2001",
  waves = "1-3",
  items = list(
    "little interest or pleasure in doing things",
    "feeling down, depressed, or hopeless",
    "trouble falling or staying asleep, or sleeping too much",
    "feeling tired or having little energy",
    "poor appetite or overeating",
    "feeling bad about yourself — or that you are a failure",
    "trouble concentrating on things",
    "moving or speaking slowly, or being fidgety or restless",
    "thoughts that you would be better off dead"
  ),
  values = c(0, 1, 2, 3),
  value_labels = c("not at all", "several days", 
                   "more than half the days", "nearly every day")
)

unified_db$measures$self_esteem <- list(
  name = "rosenberg self-esteem scale",
  description = "self-esteem was measured using a 3-item version of the Rosenberg scale.",
  type = "continuous",
  reference = "rosenberg1965",
  waves = "5-current",
  range = c(1, 7),
  items = list(
    "On the whole, I am satisfied with myself.",
    "I take a positive attitude toward myself.",
    "I feel that I am a person of worth, at least on an equal plane with others."
  )
)

# Save all changes
boilerplate_save(unified_db, data_path = temp_measures, confirm = FALSE, quiet = TRUE)
```

## Interactive Management

Inspect measures from the console, then add or update entries as needed:

```{r}
# View all measures
names(unified_db$measures)

# Access a specific measure (if it exists)
if ("anxiety" %in% names(unified_db$measures)) {
  unified_db$measures$anxiety
} else if ("anxiety_gad7" %in% names(unified_db$measures)) {
  unified_db$measures$anxiety_gad7
}

# Add or update measures using boilerplate_add_entry() or boilerplate_update_entry()
```

# Standardising Measures

The standardisation process cleans and enhances your measure entries for consistency and completeness.

## What Standardisation Does

1. **Extracts scale information** from descriptions
2. **Identifies reversed items** marked with (r)
3. **Cleans formatting** issues
4. **Ensures complete structure** with all standard fields
5. **Standardises references** for consistency
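
To make step 2 concrete, here is a base R sketch of how items ending in `(r)` could be detected and cleaned. It illustrates the idea only; the regular expression is an assumption, and the package's own detection logic may differ.

```{r}
# Illustrative only: boilerplate's own detection logic may differ.
items <- c(
  "Doing my best never seems to be enough.",
  "My performance rarely measures up to my standards.",
  "I am hardly ever satisfied with my performance. (r)"
)

# Match a trailing "(r)" marker, ignoring case and surrounding whitespace
reversed_pattern <- "\\s*\\(r\\)\\s*$"

# Positions of the reversed items, and the item text without the marker
reversed_items <- which(grepl(reversed_pattern, items, ignore.case = TRUE))
clean_items <- sub(reversed_pattern, "", items, ignore.case = TRUE)

reversed_items
clean_items[reversed_items]
```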

## Running Standardisation

```{r}
# Check quality before standardisation
boilerplate_measures_report(unified_db$measures)

# Standardise all measures
unified_db$measures <- boilerplate_standardise_measures(
  unified_db$measures,
  extract_scale = TRUE,      # Extract scale info from descriptions
  identify_reversed = TRUE,   # Identify reversed items
  clean_descriptions = TRUE,  # Clean up description text
  verbose = TRUE             # Show progress
)

# Check quality after standardisation
boilerplate_measures_report(unified_db$measures)

# Save the standardised database
boilerplate_save(unified_db, data_path = temp_measures, confirm = FALSE, quiet = TRUE)
```

## Example: Before and After Standardisation

### Before:

```{r}
# Messy measure entry
unified_db$measures$perfectionism <- list(
  name = "perfectionism scale",
  description = "Perfectionism (1 = Strongly Disagree, 7 = Strongly Agree). Higher scores indicate greater perfectionism.",
  items = list(
    "Doing my best never seems to be enough.",
    "My performance rarely measures up to my standards.",
    "I am hardly ever satisfied with my performance. (r)"
  )
)
```

### After standardisation:

```{r}
# Clean, standardised entry
# The standardisation process will:
# - Extract scale: "1 = Strongly Disagree, 7 = Strongly Agree"
# - Clean description: "Perfectionism. Higher scores indicate greater perfectionism."
# - Identify reversed items: item 3 marked as reversed
# - Add missing fields: type, scale_info, scale_anchors, reversed_items
```
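
The scale extraction described above can be approximated with a regular expression. The sketch below assumes the scale appears as a parenthesised block beginning with a number and an equals sign; that pattern is an assumption made for illustration, not the rule the package applies.

```{r}
description <- paste(
  "Perfectionism (1 = Strongly Disagree, 7 = Strongly Agree).",
  "Higher scores indicate greater perfectionism."
)

# Match the first parenthesised block that starts with "<number> ="
scale_pattern <- "\\(([0-9]+ *=[^)]*)\\)"

# Element 1 is the full match; element 2 is the text inside the parentheses
scale_match <- regmatches(description, regexec(scale_pattern, description))[[1]]
scale_info <- if (length(scale_match) > 1) scale_match[2] else NA_character_

# Drop the parenthesised block (and any spaces before it) from the description
clean_description <- sub(paste0(" *", scale_pattern), "", description)

scale_info
clean_description
```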

# Generating Quality Reports

## Basic Quality Assessment

```{r}
# Get a quality overview
boilerplate_measures_report(unified_db$measures)

# Output shows:
# - Total measures
# - Completeness percentages
# - Missing information
# - Standardisation status
```

## Detailed Quality Analysis

```{r}
# Get detailed report as data frame
quality_report <- boilerplate_measures_report(
  unified_db$measures, 
  return_report = TRUE
)

# Find measures missing critical information
missing_refs <- quality_report[!quality_report$has_reference, ]
missing_items <- quality_report[!quality_report$has_items, ]

# View specific issues
cat("Measures without references:", missing_refs$measure_name, sep = "\n")
cat("\nMeasures without items:", missing_items$measure_name, sep = "\n")
```
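
If you want to see what such a report boils down to, the following base R sketch builds a comparable table from a plain list. `build_quality_table()` is a hypothetical stand-in; its column names mirror those used above, but the report returned by the package may contain more columns and use different rules.

```{r}
# Hypothetical stand-in for a per-measure report (not a boilerplate function)
build_quality_table <- function(measures) {
  has_field <- function(field) {
    vapply(measures, function(m) length(m[[field]]) > 0, logical(1))
  }
  data.frame(
    measure_name = names(measures),
    has_reference = has_field("reference"),
    has_items = has_field("items"),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy data: the second entry has no reference
toy_measures <- list(
  anxiety_gad7 = list(
    name = "GAD-7",
    reference = "spitzer2006",
    items = list("feeling nervous, anxious, or on edge")
  ),
  perfectionism = list(
    name = "perfectionism scale",
    items = list("Doing my best never seems to be enough.")
  )
)

quality <- build_quality_table(toy_measures)

# Measures without a reference
quality$measure_name[!quality$has_reference]

# Percentage of measures with each field present
completeness <- round(
  100 * vapply(quality[c("has_reference", "has_items")], mean, numeric(1))
)
completeness
```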

# Batch Operations on Measures

## Finding Entries to Clean

```{r}
# Find measures with specific characters in references
problematic_refs <- boilerplate_find_chars(
  db = unified_db,
  field = "reference",
  chars = c("@", "[", "]", " "),
  category = "measures"
)

print(problematic_refs)
```

## Batch Cleaning

```{r}
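# Preview the cleaning first. The preview is stored in its own object
# so that unified_db is not overwritten before you have checked it.
clean_preview <- boilerplate_batch_clean(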
  db = unified_db,
  field = "reference",
  remove_chars = c("@", "[", "]"),
  replace_pairs = list(" " = "_"),
  trim_whitespace = TRUE,
  category = "measures",
  preview = TRUE  # Preview first
)

# If preview looks good, run without preview
unified_db <- boilerplate_batch_clean(
  db = unified_db,
  field = "reference",
  remove_chars = c("@", "[", "]"),
  replace_pairs = list(" " = "_"),
  trim_whitespace = TRUE,
  category = "measures"
)
```
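
The effect of these arguments on a single reference string can be sketched in base R. `clean_reference()` is a hypothetical helper, and the order of operations (trim, then remove, then replace) is an assumption made for this sketch; the package may apply the steps in a different order.

```{r}
# Hypothetical helper (not part of boilerplate). The order of operations
# (trim, then remove, then replace) is an assumption made for this sketch.
clean_reference <- function(x,
                            remove_chars = c("@", "[", "]"),
                            replace_pairs = list(" " = "_")) {
  x <- trimws(x)
  # fixed = TRUE treats "[" and "]" as literal characters, not regex syntax
  for (ch in remove_chars) {
    x <- gsub(ch, "", x, fixed = TRUE)
  }
  for (from in names(replace_pairs)) {
    x <- gsub(from, replace_pairs[[from]], x, fixed = TRUE)
  }
  x
}

cleaned <- clean_reference(c("@spitzer2006", "[kroenke 2001] "))
cleaned
```

Trimming first matters here: if the trailing space were still present when the replacement ran, it would become a trailing underscore.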

## Batch Editing

```{r}
# Preview a reference update for multiple measures; the preview is kept
# in its own object so that unified_db is not overwritten
edit_preview <- boilerplate_batch_edit(
  db = unified_db,
  field = "reference",
  new_value = "sibley2024",
  target_entries = c("political_orientation", "social_dominance"),
  category = "measures",
  preview = TRUE
)

# Update wave information using wildcards
unified_db <- boilerplate_batch_edit(
  db = unified_db,
  field = "waves",
  new_value = "1-16",
  target_entries = "political_*",  # All political measures
  category = "measures"
)

# Update based on current values
unified_db <- boilerplate_batch_edit(
  db = unified_db,
  field = "waves",
  new_value = "1-current",
  match_values = c("1-15", "1-16"),  # Update these specific values
  category = "measures"
)
```
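
To check which entries a wildcard such as `"political_*"` would select before running a batch edit, you can reproduce shell-style matching in base R with `utils::glob2rx()`. This is a sketch of the general technique; whether the package resolves wildcards in exactly this way is an assumption, and `political_trust` is a made-up measure name.

```{r}
# `political_trust` is a made-up name used only for this illustration
measure_names <- c(
  "political_orientation", "political_trust",
  "social_dominance", "anxiety_gad7"
)

# glob2rx() converts a shell-style wildcard into a regular expression
pattern <- utils::glob2rx("political_*")
matched <- measure_names[grepl(pattern, measure_names)]
matched
```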

# Generating Formatted Output

## Basic Measure Text

```{r}
# Generate text for a single measure
exposure_text <- boilerplate_generate_measures(
  variable_heading = "Exposure Variable",
  variables = "perfectionism",
  db = unified_db,
  heading_level = 3,
  subheading_level = 4,
  print_waves = TRUE
)

cat(exposure_text)
```
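
To show roughly what options such as `heading_level` and `print_waves` control, here is a toy formatter in base R. `format_measure_sketch()` is hypothetical: its layout, its argument defaults, and the `waves` value in the example are invented for illustration, and the text produced by `boilerplate_generate_measures()` will differ.

```{r}
# Hypothetical formatter (not boilerplate's implementation): render one
# measure as markdown with a heading, description, items, and waves.
format_measure_sketch <- function(measure, heading_level = 4,
                                  sample_items = NULL, print_waves = FALSE) {
  items <- unlist(measure$items)
  if (!is.null(sample_items)) {
    items <- head(items, sample_items)  # keep only the first n items
  }
  lines <- c(
    paste(strrep("#", heading_level), measure$name),
    "",
    measure$description,
    "",
    if (length(items) > 0) paste0("- ", items)
  )
  if (print_waves && !is.null(measure$waves)) {
    lines <- c(lines, "", paste("Waves:", measure$waves))
  }
  paste(lines, collapse = "\n")
}

# Toy entry; the waves value is invented for this example
toy_measure <- list(
  name = "perfectionism scale",
  description = "Higher scores indicate greater perfectionism.",
  waves = "1-3",
  items = list(
    "Doing my best never seems to be enough.",
    "My performance rarely measures up to my standards.",
    "I am hardly ever satisfied with my performance."
  )
)

sketch_text <- format_measure_sketch(
  toy_measure,
  heading_level = 4,
  sample_items = 2,
  print_waves = TRUE
)
cat(sketch_text)
```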

## Multiple Measures with Categories

```{r}
# Generate text for multiple measures grouped by type
psychological_measures <- boilerplate_generate_measures(
  variable_heading = "Psychological Measures",
  variables = c("anxiety_gad7", "depression_phq9", "self_esteem"),
  db = unified_db,
  heading_level = 3,
  subheading_level = 4,
  print_waves = TRUE,
  sample_items = 3  # Show only first 3 items
)

demographic_measures <- boilerplate_generate_measures(
  variable_heading = "Demographic Variables",
  variables = c("age", "gender", "education"),
  db = unified_db,
  heading_level = 3,
  subheading_level = 4,
  print_waves = FALSE  # Don't show waves for demographics
)

# Combine into methods section
methods_measures <- paste(
  "## Measures\n\n",
  psychological_measures, "\n\n",
  demographic_measures,
  sep = ""
)
```

## Advanced Formatting Options

```{r}
# Table format for enhanced presentation
measures_table <- boilerplate_generate_measures(
  variable_heading = "Study Measures",
  variables = c("anxiety_gad7", "perfectionism"),
  db = unified_db,
  table_format = TRUE,        # Use table format
  sample_items = 3,           # Show only 3 items
  check_completeness = TRUE,  # Note missing information
  quiet = TRUE               # Suppress progress messages
)

cat(measures_table)
```

# Complete Workflow Example

Here's a complete workflow from adding measures to generating a methods section:

```{r}
# 1. Initialise and import
# Create a new temp directory for this complete example
temp_complete <- file.path(tempdir(), "complete_measures_example")
boilerplate_init(
  data_path = temp_complete,
  create_dirs = TRUE,
  create_empty = FALSE,
  confirm = FALSE,
  quiet = TRUE
)
unified_db <- boilerplate_import(data_path = temp_complete, quiet = TRUE)

# 2. Add your measures
unified_db$measures$political_orientation <- list(
  name = "political orientation",
  description = "political orientation on a liberal-conservative spectrum",
  type = "continuous",
  reference = "jost2009",
  waves = "all",
  range = c(1, 7),
  items = list("Please rate your political orientation")
)

unified_db$measures$social_wellbeing <- list(
  name = "social wellbeing scale",
  description = "social wellbeing measured using the Keyes Social Well-Being Scale",
  type = "continuous",
  reference = "keyes1998",
  waves = "5-current",
  items = list(
    "I feel like I belong to a community",
    "I feel that people are basically good",
    "I have something important to contribute to society",
    "Society is becoming a better place for everyone"
  )
)

# 3. Standardise the measures
unified_db$measures <- boilerplate_standardise_measures(
  unified_db$measures,
  verbose = FALSE
)

# 4. Check quality
boilerplate_measures_report(unified_db$measures)

# 5. Save the database
boilerplate_save(unified_db, data_path = temp_complete, confirm = FALSE, quiet = TRUE)

# 6. Generate formatted output
exposure_text <- boilerplate_generate_measures(
  variable_heading = "Exposure Variable",
  variables = "political_orientation",
  db = unified_db,
  heading_level = 3
)

outcome_text <- boilerplate_generate_measures(
  variable_heading = "Outcome Variable",
  variables = "social_wellbeing",
  db = unified_db,
  heading_level = 3
)

# 7. Combine with other methods text
sample_text <- boilerplate_generate_text(
  category = "methods",
  sections = "sample.default",  # Use a valid section path
  global_vars = list(
    population = "New Zealand adults",
    timeframe = "2020-2024"
  ),
  db = unified_db
)

# 8. Create complete methods section
methods_section <- paste(
  "# Methods\n\n",
  "## Participants\n\n",
  sample_text, "\n\n",
  "## Measures\n\n",
  exposure_text, "\n\n",
  outcome_text,
  sep = ""
)

cat(methods_section)
```

# Best Practices

## 1. Measure Organisation

- Keep measure names descriptive but concise
- Use consistent naming conventions (e.g., `scale_abbreviation`)
- Group related measures using consistent prefixes

## 2. Quality Control

- Run standardisation after importing data
- Review the quality report regularly
- Keep references consistent and complete
- Document any special scoring requirements

## 3. Workflow Tips

- Export your database before major changes
- Use preview mode for batch operations
- Test on a few measures before applying to all
- Keep the original items text exact for reproducibility

## 4. Integration with Text Generation

- Define measures before referencing them in text
- Use the exact measure key (the entry's name in `unified_db$measures`, e.g. `anxiety_gad7`) in `boilerplate_generate_measures()`, not the descriptive `name` field
- Consider your audience when choosing format options
- Combine measure descriptions with method text for complete sections

# Troubleshooting

## Common Issues

### Measure not found

```{r}
# Error: Measure 'anxiety' not found
# Solution: Check exact name
names(unified_db$measures)  # List all measure names
```
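
When the list of names is long, a small lookup helper can suggest likely candidates. This base R sketch tries a literal partial match first and falls back to approximate matching with `agrep()`; `suggest_measure_names()` is a hypothetical helper, not part of the package.

```{r}
# Hypothetical helper (not part of boilerplate): suggest existing measure
# names that resemble the one that was requested.
suggest_measure_names <- function(requested, measure_names) {
  # Literal partial matches first
  suggestions <- grep(requested, measure_names, value = TRUE, fixed = TRUE)
  if (length(suggestions) == 0) {
    # Fall back to approximate (fuzzy) matching
    suggestions <- agrep(requested, measure_names, value = TRUE)
  }
  suggestions
}

measure_names <- c("anxiety_gad7", "depression_phq9", "self_esteem")
suggestions <- suggest_measure_names("anxiety", measure_names)
suggestions
```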

### Standardisation warnings

```{r}
# Warning: Some measures already standardised
# Solution: This is normal - already standardised measures are skipped
```

### Missing required fields

```{r}
# Error: Measure missing required field 'type'
# Solution: Add the missing field
unified_db$measures$my_measure$type <- "continuous"
```

## Getting Help

If you encounter issues:

1. Check the measure structure matches the examples
2. Run `boilerplate_measures_report()` to identify problems
3. Use `verbose = TRUE` in functions for detailed output
4. Consult the package documentation: `?boilerplate_generate_measures`

# Summary

The boilerplate package provides a complete workflow for managing research measures:

1. **Add** measures to the unified database with proper structure
2. **Standardise** entries for consistency and completeness  
3. **Assess** quality using the reporting tools
4. **Edit** multiple measures efficiently with batch operations
5. **Generate** professional formatted output for publications

Following this workflow keeps your measures database consistent and ready to use in your research documentation.