In the social sciences, we often ask locational questions, such as:
These questions make no mention of the specific distance between relative groups and instead focus on the order of outcome magnitudes.
While the statistics applied to these questions are usually variants of the general linear model, there is no reason to impose the assumption of linearity on the reality underlying these tests.
One alternative is to apply the general monotone model (GeMM
) as proposed by @dougherty2012robust.
GeMM
uses a search and scale procedure to find the optimal relative weights for a set of predictors and scale these weights to minimize the order-constrained squared error.
This first, computationally-intensive step is accomplished by using a genetic algorithm to optimize some fit criterion (e.g., Kendall's $\tau$) between an observed outcome and a weighted set of predictors.
Use of $\tau$ in this case assures relative weights that maximally reflect the monotone relationship between the outcome and model predictions.
Other fit criteria penalize for complexity, but are based on transformations of $\tau$.
We then regress the original outcome onto the relative-weighted model predictions to compute an intercept and scaling factor that minimizes squared error conditioned on this ordinal constraint.
We implement GeMM
with the gemmR
package, which uses Rcpp
to speed up repeated calculation of Kendall's $\tau$ for use in the genetic search process.
As GeMM
serves as a functional replacement for the linear model, a similar syntax is used to fit a GeMM
model.
library(gemmR) data(culture) mod <- gemm(murder.rate ~ pasture + gini + gnp, data = culture, n.chains = 3, n.gens = 10, n.beta = 200, check.convergence = TRUE)
This produces a gemm
object, which is modeled after the lm
object.
\section*{Helper functions}
The gemmR
package includes a number of S3 methods and a few novel functions to help extract information from gemm
objects.
summary
displays some helpful information about the fitted gemm
object.
summary(mod)
GeMM
is a stochastic process, so multiple replications are advisable to ensure stability of parameter estimates.
gemm
runs four replications by default, all of which are displayed by descending value on the fit criterion.
Below the four chains are the corresponding values of the optimized fit criterion.
While all fit criteria are calculated and contained in the gemm
object, only the criterion used for selection is displayed with summary
.
Though no method exists for verifying that results of a random search process on empirical data, one quick way to check the suitability of a solution is to demonstrate convergent results across starting conditions. A quick way to check genetic algorithm performance for a given dataset is to plot the best criterion value across generations and chains.
plot(mod)
The predict
function for gemm
serves two roles.
The first is to generate model predictions based on the best chain of a given model.
predict
will also generate the counts of concordances, disconcordances, outcome ties and predictor ties for a given model.
yhat <- predict(mod, tie.struct = TRUE) head(yhat) attr(yhat, "tie.struct")
gemmR
The information criteria calculated by gemmR
are based on ordinal statistics and cannot be directly compared with likelihood-based criteria.
gemmR
includes a number of methods so that traditional information criteria can be easily extracted for comparison with other models.
logLik(mod) AIC(mod) BIC(mod)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.