The search.sur() function is one of the three main
functions in the ldt package. This vignette explains the
basic usage of this function using the World Bank dataset (World Bank, 2022). Output growth is a widely
discussed topic in economics. Several factors can influence
the rate and quality of output growth, including physical and human
capital, technological progress, institutions, trade openness, and
macroeconomic stability (Chirwa and Odhiambo,
2016). We use this package to identify the long-run
determinants of GDP per capita growth while making minimal
assumptions.
To minimize user discretion, we use all available data to select the set of potential regressors. Additionally, to avoid the endogeneity problem, we use information from before 2005 to explain the dependent variable after this year. This results in 571 potential regressors and 208 observations.
For this illustration, however, we use only the first 5 columns of the data.
Here are the last few observations from this subset of the data:
tail(data)
#>     NY.GDP.PCAP.KD NY.GDP.PCAP.KD.lag AG.AGR.TRAC.NO AG.CON.FERT.PT.ZS
#> WSM      0.6948973                 NA             NA                NA
#> XKX      3.5026405                 NA             NA                NA
#> YEM     -5.7036924                 NA             NA                NA
#> ZAF     -0.2084907         0.83394060      -1.533726         0.2149429
#> ZMB      1.9830446        -0.63088082             NA                NA
#> ZWE      1.2915497        -0.05297394             NA        -3.1003477
#>     AG.CON.FERT.ZS AG.LND.AGRI.K2
#> WSM     5.16498292    -0.65382289
#> XKX             NA             NA
#> YEM    14.88937834     0.01804223
#> ZAF     2.20864028    -0.08807695
#> ZMB     4.42032159     0.37414717
#> ZWE    -0.01642054     0.86883765

And here are some summary statistics for each variable:
sapply(as.data.frame(data), summary)
#>         NY.GDP.PCAP.KD NY.GDP.PCAP.KD.lag AG.AGR.TRAC.NO AG.CON.FERT.PT.ZS
#> Min.        -5.7036924         -2.7562067      -1.533726       -16.9560997
#> 1st Qu.     -0.1431228          0.7235014       1.308611        -2.9008332
#> Median       1.0235845          1.7697597       2.876800        -1.2511855
#> Mean         1.1094147          1.9232678       3.856278        -1.6759932
#> 3rd Qu.      2.4052532          2.8698123       5.600846         0.2268538
#> Max.         7.1613101         12.7823340      20.814750         7.3208970
#> NA's         9.0000000         73.0000000     134.000000       146.0000000
#>         AG.CON.FERT.ZS AG.LND.AGRI.K2
#> Min.         -6.526751    -6.62157767
#> 1st Qu.       1.310606    -0.29306446
#> Median        4.326556     0.01489903
#> Mean          4.329299     0.05915160
#> 3rd Qu.       6.856201     0.57567407
#> Max.         15.949830     2.23869809
#> NA's         80.000000     7.00000000

The columns of the data represent the following variables:
NY.GDP.PCAP.KD: GDP per capita (constant 2015 US$)
AG.AGR.TRAC.NO: Agricultural machinery, tractors
AG.CON.FERT.PT.ZS: Fertilizer consumption (% of fertilizer production)
AG.CON.FERT.ZS: Fertilizer consumption (kilograms per hectare of arable land)
AG.LND.AGRI.K2: Agricultural land (sq. km)
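The NA counts in the summary above matter because a regression can only use rows in which all of its variables are observed. The sketch below illustrates this with the six rows printed by tail(data), re-entered by hand and rounded. The helper usable_rows() is hypothetical and not part of ldt, and the assumption that incomplete rows are simply dropped is ours.

```r
# The six rows shown by tail(data), re-entered by hand (rounded to 3 decimals)
tail_rows <- data.frame(
  NY.GDP.PCAP.KD     = c(0.695, 3.503, -5.704, -0.208, 1.983, 1.292),
  NY.GDP.PCAP.KD.lag = c(NA, NA, NA, 0.834, -0.631, -0.053),
  AG.AGR.TRAC.NO     = c(NA, NA, NA, -1.534, NA, NA),
  AG.CON.FERT.PT.ZS  = c(NA, NA, NA, 0.215, NA, -3.100),
  AG.CON.FERT.ZS     = c(5.165, NA, 14.889, 2.209, 4.420, -0.016),
  AG.LND.AGRI.K2     = c(-0.654, NA, 0.018, -0.088, 0.374, 0.869),
  row.names = c("WSM", "XKX", "YEM", "ZAF", "ZMB", "ZWE")
)

# Hypothetical helper (not an ldt function): count the rows in which the
# dependent variable, its lag, and one extra regressor are all observed
usable_rows <- function(df, regressor) {
  cols <- c("NY.GDP.PCAP.KD", "NY.GDP.PCAP.KD.lag", regressor)
  sum(complete.cases(df[, cols]))
}

# Usable rows for each candidate regressor among these six countries
extra <- names(tail_rows)[3:6]
usable <- sapply(extra, function(v) usable_rows(tail_rows, v))
print(usable)
```

Among these six countries, the number of usable rows differs by regressor, so models with different regressors are estimated on different samples.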
We use the AIC metric to find the four best explanatory models. We
restrict the model set by capping the model size with the sizes
argument. The intercept and the lag of the dependent variable are
included in all equations through the numFixPartitions argument.
search_res <- search.sur(data = get.data(data, endogenous = 1),
                         combinations = get.combinations(sizes = c(1,2,3),
                                                         numTargets = 1,
                                                         numFixPartitions = 2),
                         metrics = get.search.metrics(typesIn = c("aic")),
                         items = get.search.items(bestK = 4))
print(search_res)
#> LDT search result:
#>  Method in the search process: SUR 
#>  Expected number of models: 5, searched: 5 , failed: 0 (0%)
#>  Elapsed time: 0.01687205 minutes 
#>  Length of results: 4 
#> --------
#>  Target (NY.GDP.PCAP.KD):
#>    Evaluation (aic):
#>       Best model:
#>        endogenous: NY.GDP.PCAP.KD
#>        exogenous: (3x1) (Intercept), NY.GDP.PCAP.KD.lag, AG.CON.FERT.PT.ZS
#>        metric: 213.8385
#> --------
#>  ** results for 4 best model(s) are saved

The output of the search.sur() function does not contain
any estimation results, only the information required to replicate
them. The summary() function returns a similar structure
with the estimation results included.
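The line "Expected number of models: 5" in the search output can be reproduced by hand. The sketch below uses base R only and rests on an assumption about how the arguments interact: each value in sizes is taken as the number of regressors in a model, and the first two regressors (the intercept and the lag) are always included. It is one reading that matches the reported count, not a description of the internals of ldt.

```r
# Regressor names as they appear in the search output and the data
regressors <- c("(Intercept)", "NY.GDP.PCAP.KD.lag", "AG.AGR.TRAC.NO",
                "AG.CON.FERT.PT.ZS", "AG.CON.FERT.ZS", "AG.LND.AGRI.K2")
fixed <- regressors[1:2]             # always included (numFixPartitions = 2)
free  <- setdiff(regressors, fixed)  # candidates that may enter or not
sizes <- c(1, 2, 3)

# Assumed rule: a model of size s holds the fixed regressors plus
# (s - length(fixed)) free ones; sizes too small for the fixed set give nothing
candidate_models <- list()
for (s in sizes) {
  n_free <- s - length(fixed)
  if (n_free < 0) next
  if (n_free == 0) {
    candidate_models[[length(candidate_models) + 1]] <- fixed
  } else {
    picks <- combn(free, n_free, simplify = FALSE)
    candidate_models <- c(candidate_models,
                          lapply(picks, function(p) c(fixed, p)))
  }
}
n_models <- length(candidate_models)
print(n_models)
```

Under this reading, size 1 yields no model, size 2 yields the model with only the fixed regressors, and size 3 yields one model per remaining regressor.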
The following code generates a table for presenting the result.
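# Assumption: search_sum is the summary of the search result
search_sum <- summary(search_res)
# Extract the estimated model for each of the four best ranks (info = 0..3)
models <- lapply(0:3, function(i)
  search_sum$results[which(sapply(search_sum$results, function(d)
    d$info==i && d$typeName=="best model"))][[1]]$value)
names(models) <- paste("Best",c(1:4))
# Build the coefficient table with observation counts, AIC, and SIC
table <- coefs.table(models, latex = FALSE,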
                     regInfo = c("obs", "aic", "sic"))

| | Best 1 | Best 2 | Best 3 | Best 4 |
|---|---|---|---|---|
| (Intercept) | 0.34 | 0.80* | 0.41 | 0.85*** |
| NY.GDP.PCAP.KD.lag | 0.41* | -0.10 | 0.20* | 0.04 |
| AG.CON.FERT.PT.ZS | 0.08 | | | |
| AG.AGR.TRAC.NO | | 0.07 | | |
| AG.CON.FERT.ZS | | | 0.08** | |
| AG.LND.AGRI.K2 | | | | 0.21 |
| obs | 51 | 58 | 106 | 133 |
| aic | 213.84 | 234.47 | 430.61 | 546.35 |
| sic | 219.63 | 240.65 | 438.60 | 555.02 |
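The aic and sic rows of the table are linked by a simple identity. With AIC = -2 log L + 2k and SIC = -2 log L + k log(n), it follows that SIC = AIC + k (log(n) - 2). The sketch below checks this against the reported values, re-entered by hand. The value k = 3 is an assumption inferred from the three coefficients in each model. How ldt counts parameters internally is not stated here.

```r
# Values copied from the table (obs, aic, sic for the four best models)
best <- data.frame(
  obs = c(51, 58, 106, 133),
  aic = c(213.84, 234.47, 430.61, 546.35),
  sic = c(219.63, 240.65, 438.60, 555.02),
  row.names = paste("Best", 1:4)
)

k <- 3  # assumed number of penalized parameters (three coefficients per model)

# SIC implied by the reported AIC and sample size
best$sic_from_aic <- best$aic + k * (log(best$obs) - 2)
print(round(best, 2))
```

The implied values agree with the reported sic row up to rounding. Each model here is estimated on a different number of observations.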
This package can be a useful tool for empirical studies that
aim to reduce assumptions and summarize the results of an uncertainty
analysis. This vignette is only a demonstration; the search.sur()
function offers other options. For
instance, you can use different evaluation metrics or
restrict the model set to suit your needs. An alternative
approach combines modeling with
Principal Component Analysis (PCA) (see the estim.sur()
function). I encourage you to experiment with these options in
your own analysis.