In the social sciences, we often ask locational questions, such as:
These questions make no mention of the specific distance between relative groups and instead focus on the order of outcome magnitudes. While the statistics applied to these questions are usually variants of the general linear model, there is no reason to impose the assumption of linearity on the reality underlying these tests. One alternative is to apply the general monotone model (GeMM) as proposed by Dougherty and Thomas (2012).
GeMM uses a search-and-scale procedure: it first finds the optimal relative weights for a set of predictors, then scales these weights to minimize the order-constrained squared error. The first, computationally intensive step is accomplished by using a genetic algorithm to optimize some fit criterion (e.g., Kendall’s \(\tau\)) between an observed outcome and a weighted set of predictors. Use of \(\tau\) in this case ensures relative weights that maximally reflect the monotone relationship between the outcome and model predictions. Other fit criteria penalize for complexity, but are based on transformations of \(\tau\). In the second step, we regress the original outcome onto the relative-weighted model predictions to compute an intercept and scaling factor that minimize squared error conditioned on this ordinal constraint.
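The two steps can be illustrated with a minimal base R sketch on simulated data. This is not the gemmR implementation: a plain random search stands in for the genetic algorithm, no complexity penalty is applied, and all names (`candidates`, `best`, `scale_fit`) are hypothetical.

```r
# Simplified illustration of search-and-scale on simulated data.
# NOTE: random search replaces the genetic algorithm used by GeMM.
set.seed(1)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
# Monotone but nonlinear relationship between predictors and outcome
y <- exp(0.8 * X[, "x1"] - 0.4 * X[, "x2"]) + rnorm(n, sd = 0.1)

# Step 1 (search): pick the candidate weights that maximize Kendall's tau
n_candidates <- 500
candidates <- matrix(runif(n_candidates * ncol(X), min = -1, max = 1),
                     ncol = ncol(X))
tau <- apply(candidates, 1, function(b) {
  cor(y, as.vector(X %*% b), method = "kendall")
})
best <- candidates[which.max(tau), ]

# Step 2 (scale): regress the outcome on the weighted predictions to
# obtain an intercept and a scaling factor for the relative weights
score <- as.vector(X %*% best)
scale_fit <- lm(y ~ score)
intercept <- coef(scale_fit)[["(Intercept)"]]
scaled_weights <- coef(scale_fit)[["score"]] * best
```

Because \(\tau\) depends only on the ordering of the predictions, the search step determines the weights only up to a positive scaling; the regression in the second step fixes that scale.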
We implement GeMM with the gemmR package, which uses Rcpp to speed up repeated calculation of Kendall’s \(\tau\) for use in the genetic search process. As GeMM serves as a functional replacement for the linear model, a similar syntax is used to fit a GeMM model.
library(gemmR)
data(culture)
mod <- gemm(murder.rate ~ pasture + gini + gnp, data = culture, n.chains = 3, 
    n.gens = 10, n.beta = 200, check.convergence = TRUE)
This produces a gemm object, which is modeled after the lm object.
The gemmR package includes a number of S3 methods and a few novel functions to help extract information from gemm objects.
summary displays some helpful information about the fitted gemm object.
summary(mod)
## Call:
## gemm.formula(formula = murder.rate ~ pasture + gini + gnp, data = culture, 
##     n.chains = 3, n.gens = 10, n.beta = 200, check.convergence = TRUE)
## 
## Coefficients:
##       intercept pasture      gini           gnp
## [1,]  0.5478463       0 0.2485879 -0.0001917586
## [2,]  0.2353015       0 0.2556735 -0.0001893033
## [3,] -2.7045577       0 0.3204726 -0.0001619418
## 
## bic
## [1] -45.56397 -45.08970 -43.91595
GeMM is a stochastic process, so multiple replications are advisable to ensure stability of parameter estimates. gemm runs four replications by default (three were requested here with n.chains = 3), all of which are displayed in order from best to worst on the fit criterion.
Below the coefficients for each chain are the corresponding values of the optimized fit criterion. While all fit criteria are calculated and contained in the gemm object, only the criterion used for selection is displayed with summary.
Though no method exists for verifying the results of a random search process on empirical data, one quick way to check the suitability of a solution is to demonstrate convergent results across starting conditions. For a given dataset, genetic algorithm performance can be checked by plotting the best criterion value across generations and chains.
plot(mod)
The predict function for gemm serves two roles. The first is to generate model predictions based on the best chain of a given model. predict will also generate the counts of concordances, discordances, outcome ties and predictor ties for a given model.
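To make these counts concrete, the following base R sketch tallies them by hand for two short toy vectors. The data and variable names are invented for illustration, and the code does not reproduce how gemmR computes or labels its tie structure.

```r
# Count pairwise agreement between an outcome and a set of predictions.
# Toy vectors, invented for illustration only.
y    <- c(1, 2, 3, 4, 5, 5)
yhat <- c(1.2, 0.8, 2.9, 4.1, 3.5, 6.0)

pairs <- combn(length(y), 2)               # all i < j index pairs
dy    <- sign(y[pairs[1, ]] - y[pairs[2, ]])
dyhat <- sign(yhat[pairs[1, ]] - yhat[pairs[2, ]])

n_pairs     <- ncol(pairs)
n_con       <- sum(dy * dyhat > 0)         # same ordering in both
n_dis       <- sum(dy * dyhat < 0)         # opposite ordering
n_ties_y    <- sum(dy == 0 & dyhat != 0)   # tied on the outcome only
n_ties_yhat <- sum(dy != 0 & dyhat == 0)   # tied on the prediction only
n_ties_both <- sum(dy == 0 & dyhat == 0)   # tied on both

# tau-a ignores ties in the denominator; tau-b corrects for them
tau_a <- (n_con - n_dis) / n_pairs
tau_b <- (n_con - n_dis) /
  sqrt((n_pairs - n_ties_y - n_ties_both) *
       (n_pairs - n_ties_yhat - n_ties_both))
```

With ties present, `tau_b` agrees with `cor(y, yhat, method = "kendall")`, whereas `tau_a` is slightly smaller in magnitude because tied pairs stay in its denominator.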
yhat <- predict(mod, tie.struct = TRUE)
head(yhat)
## [1]  6.5033527  7.7697543 11.0312726  8.8763036  2.1071913  0.5268963
attr(yhat, "tie.struct")
##       tau.a     tau.b n.pairs n.ties.1 n.ties.2 n.ties.both n.dis n.con
## 1 0.4878165 0.4879331    4186        2        0           0  1071  3113
The information criteria calculated by gemmR are based on ordinal statistics and cannot be directly compared with likelihood-based criteria. gemmR includes a number of methods so that traditional information criteria can be easily extracted for comparison with other models.
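For reference, the standard likelihood-based criteria relate to the log-likelihood as AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n), where k is the number of estimated parameters and n the number of observations. The sketch below checks this for an ordinary lm fit on the built-in mtcars data; it does not use a gemm object, and whether gemmR's methods follow exactly the same computation is an assumption.

```r
# Likelihood-based criteria for an ordinary linear model (base R only).
fit <- lm(mpg ~ wt + hp, data = mtcars)

ll <- logLik(fit)
k  <- attr(ll, "df")   # estimated parameters, including the error variance
n  <- nobs(fit)

aic_manual <- -2 * as.numeric(ll) + 2 * k
bic_manual <- -2 * as.numeric(ll) + log(n) * k

# Both comparisons should return TRUE
all.equal(aic_manual, AIC(fit))
all.equal(bic_manual, BIC(fit))
```

Criteria computed this way for two models are comparable only when both models are fit to the same outcome on the same observations.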
logLik(mod)
## 'log Lik.' -330.7949 (df=4)
AIC(mod)
## [1] 669.5899
BIC(mod)
## [1] 679.677
Chrabaszcz, Anna, and Nan Jiang. 2014. “The Role of the Native Language in the Use of the English Nongeneric Definite Article by L2 Learners: A Cross-Linguistic Comparison.” Second Language Research 30 (3). SAGE Publications: 351–79.
Dougherty, Michael R, and Rick P Thomas. 2012. “Robust Decision Making in a Nonlinear World.” Psychological Review 119 (2). American Psychological Association: 321.
Dougherty, Michael R, Rick P Thomas, Ryan P Brown, Jeffrey S Chrabaszcz, and Joe W Tidwell. 2014. “An Introduction to the General Monotone Model with Application to Two Problematic Datasets.”
Tidwell, Joe W, Michael R Dougherty, Jeffrey R Chrabaszcz, Rick P Thomas, and Jorge L Mendoza. 2014. “What Counts as Evidence for Working Memory Training? Problems with Correlated Gains and Dichotomization.” Psychonomic Bulletin & Review 21 (3). Springer: 620–28.