optilevels | R Documentation |
Finds the optimal number of factor levels given the data and a model using a likelihood-based agglomerative algorithm.
optilevels(y, x, z = NULL, alpha = 0, dist = "gaussian", ...) ## S3 method for class 'optilevels' bestmodel(object, ...)
y |
vector of observations. |
x |
a factor or a matrix of proportions (i.e. the values 0 and 1 should have
consistent meaning across the columns, often through a unit sum constraint).
It is the user's responsibility to ensure that values supplied
for |
z |
a design matrix with predictor variables besides the one(s) defined
via the argument |
alpha |
numeric [0-1], weighting factor for calculating information criteria for model selection (i.e. IC = (1-alpha)*AIC + alpha*BIC, also referred to as CAIC: consistent AIC). |
dist |
character, distribution argument passed to underlying functions,
see listed on the help page of |
object |
fitted object. |
... |
other arguments passed to the underlying functions, see |
An object of class 'optilevels' that is a list with the following elements:
"delta"
delta IC values along the selection path considering best models.
"ic"
IC values along the selection path considering best models.
"coef"
matrix of coefficients (linear predictor scale)
corresponding to argument x
along the selection path considering best models.
"zcoef"
matrix of coefficients (linear predictor scale)
corresponding to argument z
when not NULL
along the selection path considering best models, or NULL
.
"rank"
matrix ranks based on the coefficients
along the selection path considering best models.
Ranking uses the default ties.method = "average"
in rank
.
"deltalist"
delta IC values along the selection path considering all competing models.
"iclist"
IC values along the selection path considering all competing models.
"coeflist"
matrix of coefficients (linear predictor scale)
corresponding to argument x
along the selection path considering all competing models.
"zcoeflist"
matrix of coefficients (linear predictor scale)
corresponding to argument z
when not NULL
along the selection path considering all competing models, or NULL
.
"ranklist"
matrix ranks based on the coefficients along the selection path considering all competing models.
"levels"
list of (merged) factor levels along the selection path considering best models.
"Y"
vector of observations (argument y
).
"X"
design matrix component corresponding to argument x
.
"Z"
design matrix component corresponding to argument z
.
"alpha"
weighting argument.
"dist"
distribution argument.
"factor"
logical, indicating if argument x
is a factor
(TRUE
) or a matrix (FALSE
).
bestmodel
returns the best supported model for further
manipulation (e.g. prediction).
Peter Solymos <solymos@ualberta.ca>
opticut
and multicut
for fitting best binary and multi-level response models.
## --- Factor levels with Gaussian distribution ## simple example from Legendre 2013 ## Indicator Species: Computation, in ## Encyclopedia of Biodiversity, Volume 4 ## http://dx.doi.org/10.1016/B978-0-12-384719-5.00430-5 gr <- as.factor(paste0("X", rep(1:5, each=5))) spp <- cbind(Species1=rep(c(4,6,5,3,2), each=5), Species2=c(rep(c(8,4,6), each=5), 4,4,2, rep(0,7)), Species3=rep(c(18,2,0,0,0), each=5)) rownames(spp) <- gr ## must add some noise to avoid perfect fit spp[6, "Species1"] <- 7 spp[1, "Species3"] <- 17 spp ol <- optilevels(spp[,"Species3"], gr) ol[c("delta", "coef", "rank", "levels")] ## get the final factor level gr1 <- gr levels(gr1) <- ol$level[[length(ol$level)]] table(gr, gr1) ## compare the models o0 <- lm(spp[,"Species3"] ~ gr - 1) o1 <- lm(spp[,"Species3"] ~ gr1 - 1) data.frame(AIC(o0, o1), delta=AIC(o0, o1)$AIC - AIC(o0)) ol$delta # should be identical ## --- Proportions with Poisson distribution ## simulation set.seed(123) n <- 500 # number of observations k <- 5 # number of habitat types b <- c(-1, -0.2, -0.2, 0.5, 1) names(b) <- LETTERS[1:k] x <- replicate(k, exp(rnorm(n))) x <- x / rowSums(x) # proportions X <- model.matrix(~.-1, data=data.frame(x)) lam <- exp(drop(crossprod(t(X), b))) y <- rpois(n, lam) z <- optilevels(y, x, dist="poisson") ## best model refit bestmodel(z) ## estimates plogis(z$coef) plogis(b) ## optimal classification z$rank ## get the final matrix x1 <- mefa4::groupSums(x, 2, z$levels[[length(z$levels)]]) head(x) head(x1) ## compare the models m0 <- glm(y ~ x - 1, family="poisson") m1 <- glm(y ~ x1 - 1, family="poisson") data.frame(AIC(m0, m1), delta=AIC(m0, m1)$AIC - AIC(m0)) z$delta # should be identical ## Not run: ## dolina example with factor data(dolina) dolina$samp$stratum <- as.integer(dolina$samp$stratum) y <- dolina$xtab[dolina$samp$method == "Q", "ppyg"] x <- dolina$samp$mhab[dolina$samp$method == "Q"] z <- scale(model.matrix(~ stratum + lmoist - 1, dolina$samp[dolina$samp$method == "Q",])) ## without additional covariates dol1 <- optilevels(y, x, z=NULL, dist="poisson") dol1$rank summary(bestmodel(dol1)) ## with additional covariates dol2 <- optilevels(y, x, z, dist="poisson") dol2$rank summary(bestmodel(dol2)) ## compare the two models AIC(bestmodel(dol1), bestmodel(dol2)) ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.