anlvm.fit | R Documentation |
Fits an Auxiliary Nonlinear Variance Model (ANLVM) to estimate the error variances of a heteroskedastic linear regression model.
anlvm.fit(
  mainlm,
  g,
  M = NULL,
  cluster = FALSE,
  varselect = c("none", "hettest", "cv.linear", "cv.cluster", "qgcv.linear", "qgcv.cluster"),
  nclust = c("elbow.swd", "elbow.mwd", "elbow.both"),
  clustering = NULL,
  param.init = function(q) stats::runif(n = q, min = -5, max = 5),
  maxgridrows = 20L,
  nconvstop = 3L,
  zerosallowed = FALSE,
  maxitql = 100L,
  tolql = 1e-08,
  nestedql = FALSE,
  reduce2homosked = TRUE,
  cvoption = c("testsetols", "partitionres"),
  nfolds = 5L,
  ...
)
mainlm |
Either an object of |
g |
A numeric-valued function of one variable, or a character denoting
the name of such a function. |
M |
An n\times n annihilator matrix. If |
cluster |
A logical indicating whether the design matrix X should be replaced
with an n\times n_c matrix of ones and zeroes, with a single one in each
row, indicating the assignment of the n observations to n_c
clusters by an agglomerative hierarchical clustering algorithm. If so,
the dimensionality of γ is n_c rather than
p. Defaults to |
varselect |
Either a character indicating how variable selection should
be conducted, or an integer vector giving indices of columns of the
predictor matrix (
|
nclust |
A character indicating which elbow method to use to select
the number of clusters (ignored if |
clustering |
A list object of class |
param.init |
Specifies the initial values of the parameter vector to
use in the Gauss-Newton fitting algorithm. This can either be a function
for generating the initial values from a probability distribution, a
list containing named objects corresponding to the arguments of
|
maxgridrows |
An integer indicating the maximum number of initial
values of the parameter vector to try, in case of |
nconvstop |
An integer indicating how many times the quasi-likelihood
estimation algorithm should converge before the grid search across
different initial parameter values is truncated. Defaults to |
zerosallowed |
A logical indicating whether 0 values are acceptable
in the initial values of the parameter vector. Defaults to |
maxitql |
An integer specifying the maximum number of iterations to
run in the Gauss-Newton algorithm for quasi-likelihood estimation.
Defaults to |
tolql |
A double specifying the convergence criterion for the
Gauss-Newton algorithm; defaults to |
nestedql |
A logical indicating whether to use the nested updating step
suggested in Seber and Wild (2003). Defaults to
|
reduce2homosked |
A logical indicating whether the homoskedastic
error variance estimator e'e/(n-p) should be used if the
variable selection procedure does not select any variables. Defaults to
|
cvoption |
A character, either |
nfolds |
An integer specifying the number of folds K to use for
cross-validation, if the λ and/or n_c hyperparameters
are to be tuned using cross-validation. Defaults to |
... |
Other arguments that can be passed to (non-exported) helper functions, namely:
|
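The indicator matrix described under the cluster argument can be illustrated with base R. This is a minimal sketch, not the package's internal code: the choice to cluster standardised predictors, the Euclidean distance, the complete linkage, and the value n_c = 3 are all assumptions made for illustration (in anlvm.fit the number of clusters is chosen by the method named in nclust).

```r
# Sketch only: builds an n x n_c matrix of ones and zeroes from a
# hierarchical clustering. All tuning choices below are hypothetical.
X <- model.matrix(mpg ~ wt + qsec + am, data = mtcars)

# Cluster on the standardised non-intercept columns (an assumption)
feats <- scale(X[, -1])
tree <- stats::hclust(stats::dist(feats), method = "complete")

n_c <- 3L                                  # hypothetical number of clusters
labels <- stats::cutree(tree, k = n_c)     # cluster label for each observation

# Row i has a single one, in the column of the cluster observation i belongs to
Z <- outer(labels, seq_len(n_c), "==") * 1
```

With a design matrix of this form, g is applied to one parameter per cluster, which is why γ has n_c elements rather than p.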
The ANLVM model equation is
e_i^2 = \sum_{k=1}^{n} g(X_{k\cdot}'γ) m_{ik}^2 + u_i,
where e_i is the ith Ordinary Least Squares residual, X_{k\cdot} is a vector corresponding to the kth row of the n\times p design matrix X, m_{ik}^2 is the square of the (i,k)th element of the annihilator matrix M=I-X(X'X)^{-1}X', u_i is a random error term, γ is a p-vector of unknown parameters, and g(\cdot) is a continuous, differentiable function that need not be linear in γ, but must be expressible as a function of the linear predictor X_{k\cdot}'γ. This method has been developed as part of the author's doctoral research project.
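The pieces of the model equation can be computed directly in base R. The sketch below does not call anlvm.fit; the parameter vector gamma_demo is an arbitrary, hypothetical value chosen only to show how the terms combine, not an estimate.

```r
# Sketch only: evaluates the right-hand side of the ANLVM equation
# for a hypothetical parameter vector. Uses base R, not skedastic.
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)
X <- model.matrix(fit)                           # n x p design matrix
n <- nrow(X)

M <- diag(n) - X %*% solve(crossprod(X), t(X))   # annihilator matrix M
Msq <- M^2                                       # elementwise square, entries m_ik^2
e <- residuals(fit)                              # OLS residuals e_i

g <- function(x) x^2                             # heteroskedastic function
gamma_demo <- c(1, 0.1, 0.01, 0.5)               # hypothetical gamma, NOT an estimate

omega <- g(drop(X %*% gamma_demo))               # g(X_k' gamma) for each k
mean_e2 <- drop(Msq %*% omega)                   # sum over k of g(.) * m_ik^2
u <- e^2 - mean_e2                               # implied error terms u_i
```

Because M is the annihilator matrix, M applied to the response reproduces the OLS residuals, which is a quick way to check that M has been built correctly.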
The parameter vector γ is estimated using the maximum quasi-likelihood method as described in section 2.3 of Seber and Wild (2003). The optimisation problem is solved numerically using a Gauss-Newton algorithm.
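To give a feel for how a Gauss-Newton iteration operates on this model, the sketch below applies a damped Gauss-Newton update (with step halving) to a plain, unweighted sum-of-squares objective. This is a simplification swapped in for illustration: it is not the quasi-likelihood objective used by anlvm.fit, and the starting value gamma_start, the iteration cap, and the tolerances are hypothetical.

```r
# Sketch only: damped Gauss-Newton on an UNWEIGHTED least-squares
# objective. This is not the package's quasi-likelihood routine.
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)
X <- model.matrix(fit)
n <- nrow(X)
Msq <- (diag(n) - X %*% solve(crossprod(X), t(X)))^2
e2 <- unname(residuals(fit))^2

g <- function(x) x^2
gprime <- function(x) 2 * x                       # derivative of g

f <- function(gam) drop(Msq %*% g(drop(X %*% gam)))  # model mean of e_i^2
S <- function(gam) sum((e2 - f(gam))^2)              # objective to minimise

gamma_start <- c(1, 0, 0, 0)                      # hypothetical starting value
gam <- gamma_start
for (iter in seq_len(25L)) {
  eta <- drop(X %*% gam)
  J <- Msq %*% (gprime(eta) * X)                  # Jacobian of f with respect to gamma
  step <- qr.solve(J, e2 - f(gam))                # least-squares Gauss-Newton direction
  lambda <- 1
  # Halve the step until the objective does not increase
  while (lambda > 1e-6 && !isTRUE(S(gam + lambda * step) <= S(gam))) {
    lambda <- lambda / 2
  }
  if (lambda <= 1e-6) break                       # no acceptable step found
  gam <- gam + lambda * step
  if (max(abs(lambda * step)) < 1e-8) break       # step size below tolerance
}
```

Since the result depends on the starting value, anlvm.fit tries several initial parameter vectors (see param.init, maxgridrows and nconvstop) rather than relying on a single start as this sketch does.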
For further discussion of feature selection and the methods for choosing the
number of clusters to use with the clustering version of the model, see
alvm.fit.
An object of class "anlvm.fit", containing the following:
coef.est, a vector of parameter estimates, \hat{γ}
var.est, a vector of estimates \hat{ω} of the error variances for all observations
method, either "cluster" or "functionalform", depending on whether cluster was set to TRUE
ols, the lm object corresponding to the original linear regression model
fitinfo, a list containing four named objects: g (the heteroskedastic function), Msq (the elementwise square of the annihilator matrix M), Z (the design matrix used in the ANLVM, after feature selection if applicable), and clustering (a list object with results of the clustering procedure, if applicable)
selectinfo, a list containing two named objects: varselect (the value of the eponymous argument) and selectedcols (a numeric vector with column indices of X that were selected, with 1 denoting the intercept column)
qlinfo, a list containing nine named objects: converged (a logical, indicating whether the Gauss-Newton algorithm converged for at least one initial value of the parameter vector), iterations (the number of Gauss-Newton iterations used to obtain the parameter estimates returned), Smin (the minimum achieved value of the objective function used in the Gauss-Newton routine), and six arguments passed to the function (nested, param.init, maxgridrows, nconvstop, maxitql, and tolql)
alvm.fit, avm.ci
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars)
myanlvm <- anlvm.fit(mtcars_lm, g = function(x) x ^ 2, varselect = "qgcv.linear")
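A possible follow-up to the example above is to inspect the components of the returned object. The sketch assumes the skedastic package is installed and guards the call accordingly; the component names are those listed in the description of the return value. Because the default param.init draws initial values with stats::runif, setting a seed first makes the run reproducible (the seed value here is arbitrary).

```r
# Sketch only: runs if the skedastic package is available.
if (requireNamespace("skedastic", quietly = TRUE)) {
  set.seed(1234)  # arbitrary seed; default initial values are random
  mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars)
  myanlvm <- skedastic::anlvm.fit(mtcars_lm, g = function(x) x ^ 2,
                                  varselect = "qgcv.linear")

  print(myanlvm$coef.est)                  # parameter estimates
  print(head(myanlvm$var.est))             # first few error variance estimates
  print(myanlvm$selectinfo$selectedcols)   # columns of X that were selected
  print(myanlvm$qlinfo$converged)          # did Gauss-Newton converge at least once?
}
```

Checking qlinfo$converged before using var.est is a sensible habit, since the estimates are only meaningful if the fitting algorithm converged for at least one starting value.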