compare_models {maxent.ot}    R Documentation
Description

Compares two or more models fit to the same data set to determine which provides the best fit, using a variety of methods.
Usage

compare_models(..., method = "lrt")
Arguments

...
Two or more model objects to be compared. These objects should be in the same format as the objects returned by optimize_weights().
method
The method of comparison to use. This currently includes "lrt", "aic", "aic_c", and "bic".
Details

The available comparison methods are listed below; a short worked sketch of each calculation follows the list.
lrt: The likelihood ratio test. This method can be applied to exactly two models, and the parameters of these models (i.e., their constraints) must be in a strict subset/superset relationship. If your models do not meet these requirements, you should use a different method.
The likelihood ratio is calculated as follows:

LR = 2(LL_2 - LL_1)

where LL_2 is the log likelihood of the model with more parameters and LL_1 is the log likelihood of the model with fewer parameters. A p-value is calculated by conducting a chi-squared test with X^2 = LR and the degrees of freedom set to the difference in the number of parameters between the two models. This p-value tells us whether the difference in likelihood between the two models is significant (i.e., whether the extra parameters in the full model are justified by the increase in model fit).
aic: The Akaike Information Criterion. This is calculated as follows for each model:

AIC = 2k - 2LL

where k is the number of model parameters (i.e., constraints) and LL is the model's log likelihood.
aic_c: The Akaike Information Criterion corrected for small sample sizes. This is calculated as follows:

AIC_c = 2k - 2LL + \frac{2k^2 + 2k}{n - k - 1}

where n is the number of samples and the other parameters are identical to those used in the AIC calculation. As n approaches infinity, the final term converges to 0, and so this equation becomes equivalent to AIC. Please see the note below for information about sample sizes.
bic: The Bayesian Information Criterion. This is calculated as follows:

BIC = k\ln(n) - 2LL

As with aic_c, this calculation relies on the number of samples. Please see the discussion on sample sizes below before using this method.
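To make these formulas concrete, here is a minimal sketch of each calculation in base R. The log likelihoods, parameter counts, and sample size below are hypothetical stand-ins, not values produced by the package:

# Hypothetical nested models: a small model with 2 constraints and a
# larger model with 3 constraints, fit to the same 50-token data set
ll_small <- -110.0; k_small <- 2
ll_large <- -106.5; k_large <- 3
n <- 50

# Likelihood ratio test: LR = 2(LL_2 - LL_1), df = difference in parameters
lr <- 2 * (ll_large - ll_small)
p_value <- pchisq(lr, df = k_large - k_small, lower.tail = FALSE)

# Information criteria for the larger model
aic <- 2 * k_large - 2 * ll_large
aic_c <- aic + (2 * k_large^2 + 2 * k_large) / (n - k_large - 1)
bic <- k_large * log(n) - 2 * ll_large

c(lr = lr, p = p_value, aic = aic, aic_c = aic_c, bic = bic)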
A few caveats for several of the comparison methods:
The likelihood ratio test (lrt) method applies to exactly two models, and assumes that the parameters of these models are nested: that is, the constraints in the smaller model are a strict subset of the constraints in the larger model. This function will verify this to some extent based on the number and names of constraints.
The Akaike Information Criterion adjusted for small sample sizes (aic_c) and the Bayesian Information Criterion (bic) rely on sample sizes in their calculations. The sample size for a data set is defined as the sum of the column of surface form frequencies. If you want to apply these methods, it is important that the values in this column are token counts, not relative frequencies. Applying these methods to relative frequencies, which effectively ignore sample size, will produce invalid results.
The aic, aic_c, and bic comparison methods return raw AIC/AICc/BIC values as well as weights corresponding to these values. These weights are calculated similarly for each model:

W_i = \frac{\exp(-0.5 \delta_i)}{\sum_{j=1}^{m}{\exp(-0.5 \delta_j)}}

where \delta_i is the difference in score (AIC, AICc, BIC) between model i and the model with the best score, and m is the number of models being compared. These weights provide the relative likelihood or conditional probability of this model being the best model (by whatever definition of "best" is assumed by the measurement type) given the data and the selection of models it is being compared to.
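As an illustration, the weight calculation can be reproduced in a few lines of base R. The AIC scores below are hypothetical, and the frequency column name in the final comment is a placeholder:

# Hypothetical AIC scores for three models being compared
scores <- c(model_a = 213.0, model_b = 215.4, model_c = 221.1)
delta <- scores - min(scores)    # delta_i: distance from the best score
weights <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))
round(weights, 3)                # weights sum to 1

# The sample size used by aic_c and bic is the sum of the surface form
# frequency column, e.g. n <- sum(tableaux$Freq) (column name is a placeholder)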
Value

A data frame containing information about the comparison. The contents and size of this data frame vary depending on the method used.
lrt: A data frame with a single row and the following columns:

description: the names of the two models being compared. The name of the model with more parameters will be first.

chi_sq: the chi-squared value calculated during the test.

k_delta: the difference in parameters between the two models, used as degrees of freedom in the chi-squared test.

p_value: the p-value calculated by the test.
aic: A data frame with as many rows as there were models passed in. The models are sorted in ascending order of AIC (i.e., best first). This data frame has the following columns:

model: The name of the model.

k: The number of parameters.

aic: The model's AIC value.

aic.delta: The difference between this model's AIC value and the AIC value of the model with the smallest AIC value.

aic.wt: The model's AIC weight: this reflects the relative likelihood (or conditional probability) that this model is the "best" model in the set.

cum.wt: The cumulative sum of AIC weights up to and including this model.

ll: The log likelihood of this model.
aicc: The data frame returned here is analogous in structure to the AIC data frame, with AICc values replacing AICs and accordingly modified column names. There is one additional column:

n: The number of samples in the data the model is fit to.
bic: The data frame returned here is analogous in structure to the AIC and AICc data frames. Like the AICc data frame, it contains the n column.
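For instance, with the aic method the documented columns can be used to pull out the best model directly. This is a sketch assuming the small_model and large_model objects built in the examples below:

res <- compare_models(small_model, large_model, method = "aic")
res$model[1]    # name of the best (lowest-AIC) model
res$aic.wt[1]   # its AIC weight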
Examples

# Get paths to toy data files
# This file has two constraints
data_file_small <- system.file(
"extdata", "sample_data_frame.csv", package = "maxent.ot"
)
# This file has three constraints
data_file_large <- system.file(
"extdata", "sample_data_frame_large.csv", package = "maxent.ot"
)
# Fit weights to both data sets with no biases
tableaux_small <- read.csv(data_file_small)
small_model <- optimize_weights(tableaux_small)
tableaux_large <- read.csv(data_file_large)
large_model <- optimize_weights(tableaux_large)
# Compare models using likelihood ratio test. This is appropriate here
# because the constraints are nested.
compare_models(small_model, large_model, method='lrt')
# Compare models using AIC
compare_models(small_model, large_model, method='aic')
# Compare models using AICc
compare_models(small_model, large_model, method='aic_c')
# Compare models using BIC
compare_models(small_model, large_model, method='bic')