knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 12,
fig.height = 8,
warning = FALSE,
message = FALSE,
eval = FALSE
)
This vignette provides a comprehensive guide to ManyIVsNets’ revolutionary approach to creating instrumental variables from economic and geographic data patterns. Our methodology eliminates CSV file dependencies and creates 85 variables across 6 dimensions for 49 countries (1991-2021).
Traditional IV approaches often rely on arbitrary external instruments or questionable exclusion restrictions. ManyIVsNets takes a fundamentally different approach by:
Our analysis proves this approach works: Judge Historical SOTA achieves F = 7,155.39, the strongest instrument in environmental economics literature.
Geographic factors provide truly exogenous variation based on physical geography and natural connectivity constraints.
# Geographic isolation examples from our analysis
geographic_examples <- data.frame(
country = c("Australia", "Germany", "Japan", "Switzerland"),
geo_isolation = c(0.9, 0.1, 0.8, 0.1), # Higher = more isolated
island_isolation = c(1, 0, 1, 0), # 1 = island nation
landlocked_status = c(0, 0, 0, 1), # 1 = landlocked
interpretation = c("Highest isolation", "Core Europe", "Island nation", "Landlocked")
)
print(geographic_examples)
Key Variables Created: - geo_isolation
: Distance-based connectivity (0.1-0.9 scale) - Australia/New Zealand: 0.9 (highest isolation) - Core Europe (Germany/France): 0.1 (lowest isolation) - Japan/Korea: 0.8 (island/peninsula isolation) - island_isolation
: Binary island nation indicator - landlocked_status
: Continental accessibility constraints
Empirical Performance: - Geographic Single: F = 5.27 (Moderate strength) - Combined with other dimensions: F > 100 (Very Strong)
Technology adoption patterns reflect institutional quality, development trajectories, and historical connectivity advantages.
# Technology adoption patterns from our data
tech_examples <- data.frame(
country = c("USA", "Germany", "China", "Estonia"),
internet_adoption_lag = c(5, 8, 28, 20), # Years behind leaders
mobile_infrastructure_1995 = c(0.8, 0.8, 0.2, 0.2), # 1995 baseline
telecom_development_1995 = c(0.8, 0.7, 0.2, 0.2), # Communication infrastructure
tech_composite = c(1.68, 1.16, -0.81, -1.71) # Standardized composite
)
print(tech_examples)
Key Variables Created: - internet_adoption_lag
: Technology diffusion timing - Early adopters (USA, UK, Nordic): 5 years - Developed economies (Germany, Japan): 8 years
- Emerging markets (China, India): 28 years - mobile_infrastructure_1995
: Early mobile development baseline - telecom_development_1995
: Communication infrastructure foundation - tech_composite
: Factor analysis combination
Empirical Performance: - Technology Real (2 instruments): F = 139.42 (Very Strong) - Tech Composite (single): F = 188.47 (Very Strong)
Migration patterns reflect economic opportunities, network effects, and historical diaspora connections.
# Migration network examples from our analysis
migration_examples <- data.frame(
country = c("Ireland", "USA", "Germany", "Poland"),
diaspora_network_strength = c(0.9, 0.2, 0.4, 0.9), # Emigration history
english_language_advantage = c(1.0, 1.0, 0.8, 0.4), # Language effects
net_migration_1990s = c(4169, 172060, 160802, -46754), # Historical flows
migration_composite = c(5.48, -0.27, 0.71, 3.48) # Standardized composite
)
print(migration_examples)
Key Variables Created: - diaspora_network_strength
: Historical emigration patterns - High emigration countries (Ireland, Italy, Poland): 0.9 - Immigration destinations (USA, Canada, Australia): 0.2 - Mixed patterns (Germany, UK, France): 0.4 - english_language_advantage
: Language-based economic advantages - migration_cost_index
: Network-based cost measures (1 - diaspora_strength) - net_migration_1990s
: Historical migration flows
Empirical Performance: - Migration Real (2 instruments): F = 31.19 (Strong) - Migration Composite (single): F = 44.12 (Strong)
Historical political events and institutional transitions provide exogenous variation in economic structures.
# Geopolitical transition examples
geopolitical_examples <- data.frame(
country = c("Poland", "Germany", "USA", "Estonia"),
post_communist_transition = c(1, 0, 0, 1), # Transition economy
nato_membership_early = c(0, 1, 1, 0), # Early NATO member
eu_membership_year = c(2004, 1957, 9999, 2004), # EU accession timing
cold_war_western = c(0, 1, 1, 0), # Cold War alignment
geopolitical_composite = c(0.13, 2.07, 2.07, 0.13) # Standardized composite
)
print(geopolitical_examples)
Key Variables Created: - post_communist_transition
: Economic system transformation (28 countries) - nato_membership_early
: Security alliance timing (founding members vs. later) - eu_membership_year
: Economic integration chronology - Founding members (1957): Germany, France, Italy, Netherlands, Belgium, Luxembourg - First enlargement (1973): UK, Ireland, Denmark - Eastern enlargement (2004): Poland, Czech Republic, Hungary, Slovakia, Estonia, Latvia, Lithuania, Slovenia - cold_war_western
: Western bloc alignment
Empirical Performance: - Geopolitical Real (2 instruments): F = 259.44 (Very Strong) - Geopolitical Composite (single): F = 362.37 (Very Strong)
Financial system development affects economic structure, capital allocation, and environmental investment patterns.
# Financial development examples
financial_examples <- data.frame(
country = c("Switzerland", "Germany", "Poland", "China"),
financial_market_maturity = c(1.0, 0.95, 0.6, 0.5), # Market development
banking_development_1990 = c(0.9, 0.9, 0.4, 0.25), # 1990 baseline
financial_openness_1990 = c(0.95, 0.8, 0.4, 0.3), # Capital account openness
stock_market_development_1990 = c(0.9, 0.9, 0.3, 0.2), # Equity market development
financial_composite = c(5.99, 5.19, -1.97, -3.65) # Standardized composite
)
print(financial_examples)
Key Variables Created: - financial_market_maturity
: Financial system sophistication - Global financial centers (USA, UK, Switzerland): 1.0 - Developed markets (Germany, France, Japan): 0.95 - Emerging markets (Poland, Czech Republic): 0.6 - banking_development_1990
: Historical banking system baseline - financial_openness_1990
: Capital account liberalization measures - stock_market_development_1990
: Equity market foundation
Empirical Performance: - Financial Real (2 instruments): F = 94.12 (Very Strong) - Financial Composite (single): F = 113.77 (Very Strong)
Natural hazards and geographic risks provide truly exogenous variation unrelated to economic policies.
# Natural risk examples
risk_examples <- data.frame(
country = c("Japan", "Germany", "Chile", "Iceland"),
seismic_risk_index = c(0.9, 0.1, 0.9, 0.8), # Earthquake risk
volcanic_risk = c(0.9, 0.1, 0.9, 0.9), # Volcanic activity
climate_volatility_1960_1990 = c(0.49, 0.2, 0.7, 0.3), # Weather variability
island_isolation = c(1, 0, 0, 1), # Island status
risk_composite = c(6.06, -3.33, 4.17, 5.65) # Standardized composite
)
print(risk_examples)
Key Variables Created: - seismic_risk_index
: Earthquake vulnerability measures - High risk: Japan, Chile, Turkey, Greece, Italy (0.9) - Moderate risk: USA, China, Mexico (0.7) - Low risk: Core Europe, Nordic countries (0.1) - volcanic_risk
: Geological hazard exposure - climate_volatility_1960_1990
: Historical weather pattern variability - island_isolation
: Island nation geographic constraints
Empirical Performance: - Natural Risk Real (2 instruments): F = 38.41 (Strong) - Risk Composite (single): F = 40.67 (Strong)
The package combines individual instruments using factor analysis and standardization:
# Create composite instruments using factor analysis
instruments_complete <- create_composite_instruments(instruments)
# View composite structure
composite_summary <- instruments_complete %>%
select(country, tech_composite, migration_composite, geopolitical_composite,
risk_composite, financial_composite, multidim_composite) %>%
head(10)
print(composite_summary)
Composite Variables Created: - tech_composite
: Combined technology indicators (standardized) - migration_composite
: Combined migration indicators - geopolitical_composite
: Combined political indicators
- risk_composite
: Combined natural risk indicators - financial_composite
: Combined financial indicators - multidim_composite
: Overall multidimensional measure (all 6 dimensions)
Mathematical Approach:
# Example composite creation (simplified)
tech_composite = scale(internet_adoption_lag)[,1] +
scale(mobile_infrastructure_1995)[,1] +
scale(telecom_development_1995)[,1]
multidim_composite = scale(geo_isolation)[,1] +
scale(tech_composite)[,1] +
scale(migration_composite)[,1] +
scale(geopolitical_composite)[,1] +
scale(risk_composite)[,1] +
scale(financial_composite)[,1]
Beyond the six core dimensions, the package implements cutting-edge alternative approaches:
# Spatial lag instruments (geographic spillovers)
spatial_lag_ur = lag(lnUR, 1) # Previous period unemployment
spatial_lag_co2 = lag(lnCO2, 1) # Previous period emissions
# Network clustering instruments
network_clustering_1 = te_network_degree * te_network_betweenness
network_clustering_2 = te_integration * financial_composite
Performance: - Spatial Lag SOTA: F = 569.90 (Very Strong) - Network Clustering SOTA: F = 24.89 (Strong)
# Judge historical instruments (our strongest approach)
judge_historical_1 = post_communist_transition * time_trend
judge_historical_2 = nato_membership_early * (year - 1990)
judge_historical_3 = (eu_membership_year < 2000) * lnPCGDP
Performance: - Judge Historical SOTA: F = 7,155.39 (Exceptionally Strong!)
# Calculate comprehensive instrument strength
strength_results <- calculate_instrument_strength(final_data)
# View top performing instruments
top_instruments <- strength_results %>%
arrange(desc(F_Statistic)) %>%
head(10)
print(top_instruments)
Strength Classification: - Very Strong: F > 50 (8 approaches, 33.3%) - Strong: F > 10 (13 approaches, 54.2%)
- Moderate: F > 5 (2 approaches, 8.3%) - Weak: F ≤ 5 (1 approach, 4.2%)
Instruments must be correlated with the endogenous variable (unemployment):
# First stage regression example
first_stage <- lm(lnUR ~ geo_isolation + tech_composite + migration_composite +
lnPCGDP + lnTrade + lnRES + factor(country) + factor(year),
data = final_data)
# Check relevance
cat("First-stage R-squared:", round(summary(first_stage)$r.squared, 3))
cat("F-statistic:", round(summary(first_stage)$fstatistic, 2))
# Sargan test for overidentification
iv_model <- AER::ivreg(lnCO2 ~ lnPCGDP + lnTrade + lnRES + factor(country) + factor(year) |
geo_isolation + tech_composite + migration_composite +
lnPCGDP + lnTrade + lnRES + factor(country) + factor(year) | lnUR,
data = final_data)
# Check exogeneity
summary(iv_model, diagnostics = TRUE)
high_income_examples <- data.frame(
country = c("USA", "Germany", "Japan", "Switzerland"),
geo_isolation = c(0.4, 0.1, 0.8, 0.1),
tech_composite = c(1.68, 1.53, 1.08, 1.97),
financial_composite = c(5.99, 5.19, 5.19, 5.99),
interpretation = c("Large economy", "Core Europe", "Island developed", "Financial center")
)
print(high_income_examples)
transition_examples <- data.frame(
country = c("Poland", "Czech Republic", "Estonia", "Hungary"),
post_communist_transition = c(1, 1, 1, 1),
eu_membership_year = c(2004, 2004, 2004, 2004),
geopolitical_composite = c(0.13, 0.13, 0.13, 0.13),
interpretation = c("Large transition", "Central transition", "Baltic transition", "Central transition")
)
print(transition_examples)
emerging_examples <- data.frame(
country = c("China", "India", "Brazil", "Mexico"),
tech_composite = c(-0.81, -0.31, -0.96, -0.68),
migration_composite = c(1.95, 2.17, -0.47, 1.86),
financial_composite = c(-3.65, -2.79, -3.65, -3.23),
interpretation = c("Tech lag, diaspora", "Tech lag, diaspora", "Tech lag, internal", "Tech lag, diaspora")
)
print(emerging_examples)
# Create time interactions for dynamic effects
final_data <- final_data %>%
mutate(
geo_isolation_x_time = geo_isolation * time_trend,
tech_composite_x_time = tech_composite * (year - 1990),
eu_membership_x_time = ifelse(year >= eu_membership_year,
(year - eu_membership_year), 0)
)
# Create income-group specific effects
final_data <- final_data %>%
mutate(
geo_isolation_high_income = geo_isolation * (income_group == "High_Income"),
tech_composite_developing = tech_composite * (income_group != "High_Income"),
financial_composite_advanced = financial_composite * (income_group == "High_Income")
)
# Create regional instrument variations
final_data <- final_data %>%
mutate(
migration_europe = migration_composite * grepl("Europe", region_enhanced),
tech_asia = tech_composite * grepl("Asia", region_enhanced),
geopolitical_transition = geopolitical_composite * (post_communist_transition == 1)
)
Solutions: - Combine multiple instruments (our approach: 21/24 strong) - Use interaction terms (time, income, regional) - Consider alternative instrument definitions
Solutions: - Remove potentially endogenous instruments - Use subset of strongest instruments
- Focus on theoretically motivated combinations
Solutions: - Create time-varying instruments - Use country-specific historical events - Combine multiple dimensions (our multidim_composite approach)
Our comprehensive validation shows exceptional performance:
The multidimensional instrument approach in ManyIVsNets provides:
This approach enables credible identification of causal effects in Environmental Phillips Curve analysis while maintaining complete transparency and replicability. The exceptional empirical performance (F = 7,155.39 for best instrument) demonstrates the superiority of this methodology over traditional approaches. ```