---
title: "JSON Schema Documentation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{JSON Schema Documentation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Overview

The boilerplate package supports JSON for all database operations. This document describes the JSON schema structure for each database type.

# Unified Database Schema

The unified database combines all categories into a single JSON file:

```json
{
  "methods": { ... },
  "measures": { ... },
  "results": { ... },
  "discussion": { ... },
  "appendix": { ... },
  "template": { ... }
}
```

# Methods Database Schema

Methods entries contain standardised text with template variables.

## Basic Structure

```json
{
  "category": {
    "subcategory": {
      "entry_name": {
        "text": "Method description with {{variables}}",
        "reference": "@citation2023",
        "keywords": ["keyword1", "keyword2"]
      }
    }
  }
}
```

## Entry Variants

Methods can have multiple text variants:

```json
{
  "statistical": {
    "regression": {
      "linear": {
        "default": "We used linear regression to analyse {{outcome}}.",
        "large": "We employed ordinary least squares linear regression to examine the relationship between {{predictors}} and {{outcome}}. Model assumptions were checked including...",
        "brief": "Linear regression was used."
      }
    }
  }
}
```

## Fields

- **text** or **default**: Main content (required)
- **large**: Extended version (optional)
- **brief**: Condensed version (optional)
- **reference**: Citation in @key format (optional)
- **keywords**: Array of searchable terms (optional)
- **_meta**: Metadata object (optional)

# Measures Database Schema

Measures entries describe variables and instruments used in research.

## Basic Structure

```json
{
  "category": {
    "measure_name": {
      "name": "unique_identifier",
      "description": "Detailed description of the measure",
      "type": "continuous|categorical|ordinal|binary",
      "additional_fields": "..."
    }
  }
}
```

## Complete Example

```json
{
  "psychological": {
    "anxiety": {
      "gad7": {
        "name": "gad7",
        "description": "Generalised Anxiety Disorder 7-item scale",
        "type": "ordinal",
        "items": 7,
        "range": [0, 21],
        "values": [0, 1, 2, 3],
        "value_labels": ["Not at all", "Several days", "More than half the days", "Nearly every day"],
        "cutoffs": {
          "mild": 5,
          "moderate": 10,
          "severe": 15
        },
        "reference": "@spitzer2006brief",
        "keywords": ["anxiety", "screening", "GAD-7"],
        "scoring": {
          "type": "sum",
          "interpretation": {
            "0-4": "Minimal anxiety",
            "5-9": "Mild anxiety",
            "10-14": "Moderate anxiety",
            "15-21": "Severe anxiety"
          }
        }
      }
    }
  }
}
```

## Required Fields

- **name**: Unique identifier (string, alphanumeric + underscore)
- **description**: Full description (string, min 10 characters)
- **type**: One of: "continuous", "categorical", "ordinal", "binary"

## Optional Fields

### For All Types

- **reference**: Citation (string)
- **keywords**: Search terms (array of strings)
- **waves**: Data collection waves (array of integers)
- **unit**: Unit of measurement (string)

### For Categorical/Ordinal

- **values**: Possible values (array)
- **value_labels**: Labels for values (array of strings, one per value)

### For Continuous

- **range**: [min, max] values (array of 2 numbers)

### For Scales

- **items**: Number of items (integer)
- **scoring**: Scoring method object
- **subscales**: Subscale definitions object
- **cutoffs**: Clinical cutoffs object

# Results Database Schema

Results entries follow the same pattern as methods:

```json
{
  "descriptive": {
    "demographics": {
      "age": {
        "text": "The mean age was {{mean_age}} years (SD = {{sd_age}}).",
        "reference": "@reporting2023"
      }
    }
  }
}
```

# Template Database Schema

The template database holds the variables used for substitution:

```json
{
  "global": {
    "n": 100,
    "study_name": "Example Study",
    "year": 2024
  },
  "methods": {
    "software": "R version 4.3.0",
    "alpha": 0.05
  },
  "measures": {
    "wave1_date": "January 2024",
    "wave2_date": "June 2024"
  }
}
```

## Variable Scoping

- **global**: Available to all sections
- **[section]**: Overrides global values for that specific section

# Schema Validation

## JSON Schema Files

Located in `inst/examples/json-poc/schema/`:

- `measures_schema.json`: Formal schema for measures
- `methods_schema.json`: Formal schema for methods

## Validation in R

```r
# Validate a JSON database
boilerplate::validate_json_database(
  json_file = "my_database.json",
  schema_file = "measures_schema.json"
)
```

## Common Validation Errors

1. **Missing required fields**

    ```json
    {
      "measure1": {
        "description": "Missing 'name' and 'type' fields"
      }
    }
    ```

2. **Invalid type values** (`"numeric"` is not an allowed type; it should be `"continuous"`)

    ```json
    {
      "measure1": {
        "name": "m1",
        "description": "Invalid type",
        "type": "numeric"
      }
    }
    ```

3. **Mismatched arrays** (three values require three labels)

    ```json
    {
      "measure1": {
        "values": [1, 2, 3],
        "value_labels": ["Low", "High"]
      }
    }
    ```

# Migration from RDS

## Converting RDS to JSON

```r
# Single category
boilerplate_rds_to_json(
  rds_file = "measures_db.rds",
  json_file = "measures_db.json"
)

# Unified database
boilerplate_migrate_to_json(
  rds_file = "boilerplate_unified.rds",
  output_dir = "data/json/"
)
```

## Format Differences

### RDS Format

- Binary R object
- Preserves all R data types
- Not human-readable
- R-specific (readable only from R)

### JSON Format

- Text-based
- Limited data types
- Human-readable
- Cross-platform and language-independent

## Handling Special Cases

1. **NULL values**: Removed in JSON
2. **Factors**: Converted to character
3. **Dates**: Stored as ISO 8601 strings
4. **Attributes**: Stored in _meta fields

# Best Practices

## File Organisation

```
project/
├── data/
│   ├── boilerplate_unified.json    # Single unified file
│   └── categories/                 # Or separate files
│       ├── methods.json
│       ├── measures.json
│       └── results.json
```

## Naming Conventions

1. **Keys**: Use lowercase with underscores
2. **Categories**: Descriptive, hierarchical
3. **Measures**: Include instrument abbreviation

## Version Control

JSON files work well with git:

- Human-readable diffs
- Easy conflict resolution
- Track changes over time

## Performance Considerations

1. **File Size**: JSON files are larger than RDS
2. **Parse Time**: Slightly slower than RDS
3. **Recommendation**: Use the unified format for databases with fewer than 1000 entries

# Examples

## Creating a New Measures Entry

```json
{
  "demographic": {
    "age": {
      "name": "age",
      "description": "Participant age at time of assessment",
      "type": "continuous",
      "unit": "years",
      "range": [18, 100]
    }
  }
}
```

## Adding a Methods Entry with Variants

```json
{
  "sampling": {
    "random": {
      "default": "Participants were randomly selected from {{population}}.",
      "large": "We employed a stratified random sampling approach. The {{population}} was first divided into {{strata}} strata based on {{stratification_var}}. Within each stratum, participants were randomly selected using a random number generator with seed {{seed}} for reproducibility.",
      "brief": "Random sampling was used.",
      "reference": "@cochran1977sampling"
    }
  }
}
```

## Template Variables with Overrides

```json
{
  "global": {
    "software": "R",
    "version": "4.3.0"
  },
  "methods": {
    "software": "R version 4.3.0 with lme4 package"
  }
}
```

In this example, methods sections will use the more specific software description, while other sections use the global version.
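
The override rule can be sketched in base R. This is only an illustration, assuming the template JSON has already been parsed into a nested list (for example with `jsonlite::fromJSON()`). The helpers `resolve_vars()` and `fill_template()` are hypothetical names, not part of the boilerplate package, and the package's own resolution logic may differ.

```r
# Template variables as a nested list, mirroring the JSON example above
template_db <- list(
  global  = list(software = "R", version = "4.3.0"),
  methods = list(software = "R version 4.3.0 with lme4 package")
)

# Hypothetical helper: layer section-level values over the globals
resolve_vars <- function(db, section) {
  globals <- if (is.null(db[["global"]])) list() else db[["global"]]
  section_vars <- if (is.null(db[[section]])) list() else db[[section]]
  # modifyList() keeps every global and replaces those the section redefines
  modifyList(globals, section_vars)
}

# Hypothetical helper: replace each {{name}} placeholder with its value
fill_template <- function(text, vars) {
  for (name in names(vars)) {
    placeholder <- paste0("{{", name, "}}")
    text <- gsub(placeholder, as.character(vars[[name]]), text, fixed = TRUE)
  }
  text
}

# The methods section sees its own description; results falls back to global
methods_vars <- resolve_vars(template_db, "methods")
results_vars <- resolve_vars(template_db, "results")

fill_template("Analyses were run in {{software}}.", methods_vars)
fill_template("Analyses were run in {{software}}.", results_vars)
```

Because `modifyList()` starts from the globals, a section only needs to list the values it changes; `version` stays available to the methods section even though only `software` is overridden there.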