Maximizing the likelihood of the generalised additive index model with shape constraints
Description
This function searches for a maximum likelihood estimator (MLE) of the generalised additive index model with shape constraints, using a stochastic search strategy.
Each index is a linear combination of some (or all) of the covariates. Each additive component function of these index predictors is assumed to satisfy one of nine possible shape restrictions.
The output is an object of class scair, which contains all the information needed to plot the estimator using the plot method, or to evaluate it using the predict method.
Usage
scair(x, y, shape, family, weights, epsilon, delta, indexgen, iter, allnonneg)
Arguments
x 
Observed covariates in R^d, in the form of an n x d numeric matrix. 
y 
Observed responses, in the form of a numeric vector of length n. 
shape 
A vector that specifies the shape restriction for the additive component function
of each index (also called the ridge function), in the form of a string vector of
length m. Here, for the sake of identifiability, we require the number of
indices m <= d. The nine shape constraints considered (with the
corresponding abbreviations used in shape) are: linear ("l"), monotonically
increasing ("in"), monotonically decreasing ("de"), convex ("cvx"), convex
increasing ("cvxin"), convex decreasing ("cvxde"), concave ("ccv"), concave
increasing ("ccvin"), and concave decreasing ("ccvde"). 
family 
A description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function itself, or the result of a call to a family function. Currently only the following four common exponential families are allowed: Gaussian, Binomial, Poisson, and Gamma. By default the canonical link function is used. 
weights 
An optional vector of prior weights to be used when maximising the likelihood. It is a numeric vector of length n. By default equal weights are used. 
epsilon 
Positive convergence tolerance used when performing the
iteratively reweighted least squares (IRLS) method at each iteration of
the active set algorithm in scar. 
delta 
A tuning parameter used to avoid the perfect fit phenomenon, and to ensure identifiability. It represents the lower bound of the minimum eigenvalue of all possible A^T A subject to the identifiability conditions, where A is an index matrix. It should be smaller than 1. This parameter is NOT needed when d = 1, when the prediction function is convex or concave, or when all entries of the index matrix are nonnegative and all ridge functions are increasing or decreasing. 
indexgen 
It determines how candidate index matrices are generated in the stochastic
search. If its value is "unif", the entries of each candidate index matrix are
drawn from a uniform distribution; if it is "norm", they are drawn from a
normal distribution. 
iter 
Number of iterations of the stochastic search. 
allnonneg 
A boolean variable that specifies whether all the entries of the
index matrices are nonnegative. If it is true, then 
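As an illustration of the three accepted forms of the family argument described above, the snippet below uses only base R family objects (it does not call scair itself); the resolution step shown mirrors the standard approach used by glm:

```r
## Three equivalent ways of specifying the Poisson family:
f1 <- poisson     # a family function
f2 <- poisson()   # the result of a call to a family function
f3 <- "poisson"   # a character string naming a family function

## Such inputs are typically resolved to a single family object, e.g.:
if (is.character(f3)) f3 <- get(f3, mode = "function")
if (is.function(f3)) f3 <- f3()
if (is.function(f1)) f1 <- f1()

c(f1$family, f2$family, f3$family)  # all "poisson"
```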
Details
For i = 1,...,n, let X_i be the d-dimensional covariates, Y_i be the corresponding one-dimensional response and w_i be its weight. The generalised additive index model can be written as
g(mu) = f(A^T x),
where x = (x_1,...,x_d)^T, mu = E(Y | X = x), g is a known link function, A is a d x m index matrix, and f is an additive function. Our task is to estimate both the index matrix and the additive function.
Assuming the canonical link function is used, the maximum likelihood estimator of the generalised additive index model based on observations (X_1,Y_1), ..., (X_n,Y_n) is the function that maximises
[w_1 (Y_1 f(A^T X_1) - B(f(A^T X_1))) + ... + w_n (Y_n f(A^T X_n) - B(f(A^T X_n)))]/n
subject to the restrictions that for every j=1,...,m,
the jth additive component of f satisfies the constraint indicated by the
jth element of shape
. Here B(.) is the log-partition function of
the specified exponential family distribution, and w_i are the weights. For i.i.d. data,
w_i should be 1 for each i.
For any given A, the optimisation problem can be solved using the active set algorithm
implemented in scar
. Therefore, the original problem reduces to a finite-dimensional
optimisation over the index matrix. Here a simple stochastic search strategy is applied, though other methods,
such as the downhill simplex method, are also possible (and sometimes offer competitive performance).
All the implementation details can be found in Chen and Samworth (2014), where theoretical
justification of our estimator (i.e. uniform consistency) is also given.
For the identifiability of additive index models, we refer to Yuan (2011).
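For the Gaussian family with the identity (canonical) link, the log-partition function is B(theta) = theta^2/2, so the objective above can be coded directly. The sketch below is purely illustrative: gauss_obj and the toy ridge function f are our names, not part of the package.

```r
## Weighted log-likelihood objective (up to an additive constant) for the
## Gaussian family with identity link, where B(theta) = theta^2/2.
## x: n x d covariate matrix; y: responses; w: weights;
## A: d x m index matrix; f: additive function applied to the indices.
gauss_obj <- function(x, y, w, A, f) {
  theta <- f(x %*% A)                  # f(A^T X_i) for each observation
  mean(w * (y * theta - theta^2 / 2))  # [sum_i w_i (Y_i theta_i - B(theta_i))]/n
}

## Toy illustration with a single linear ridge function (shape "l")
set.seed(1)
x <- matrix(runif(20), ncol = 2)
y <- x[, 1] + x[, 2]          # noiseless responses
A <- matrix(c(1, 1), ncol = 1)
f <- function(z) z            # identity ridge function
gauss_obj(x, y, rep(1, nrow(x)), A, f)  # equals mean(y^2)/2 at the exact fit
```

With equal weights, maximising this objective is equivalent to minimising the residual sum of squares, which is why the Gaussian case reduces to shape-constrained least squares.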
Value
An object of class scair
, with the following components:
x 
Covariates copied from input. 
y 
Response copied from input. 
shape 
Shape vector copied from input. 
weights 
Vector of weights copied from input. 
family 
The exponential family copied from input. 
componentfit 
Value of the fitted component function at each observed
index (computed using the estimated index matrix), in the form of an
n x m numeric matrix. 
constant 
The estimated value of the constant c in the
additive function f (see the Details section of scar). 
deviance 
Up to a constant, minus twice the maximised log-likelihood.
Where applicable, the constant is chosen so that the saturated
model has zero deviance. 
nulldeviance 
The deviance for the null model. 
delta 
A parameter copied from input. 
iter 
Total number of iterations of the stochastic search algorithm. 
allnonneg 
Specifies whether all entries of the index matrix are nonnegative, copied from input. 
Author(s)
Yining Chen and Richard Samworth
References
Chen, Y. and Samworth, R. J. (2014). Generalised additive and index models with shape constraints. arXiv:1404.2957.
Yuan, M. (2011). On the identifiability of additive index models. Statistica Sinica, 21, 1901-1911.
See Also
plot.scair
, predict.scair
, scar
, decathlon
Examples
## An example in the Gaussian additive index regression setting:
## Define the additive function f on the scale of the predictors
f <- function(x){
return((0.5*x[,1] + 0.25*x[,2] - 0.25*x[,3])^2)
}
## Simulate the covariates and the responses
## covariates are drawn uniformly from [-1,1]^3
set.seed(10)
d = 3
n = 500
x = matrix(runif(n*d)*2-1, nrow=n, ncol=d)
y = f(x) + rnorm(n,sd=0.5)
## Single index model so no delta is required here
shape=c("cvx")
object = scair(x,y,shape=shape, family=gaussian(),iter = 100)
## Estimated index matrix
object$index
## Root squared error for the estimated index
sqrt(sum((object$index - c(0.5,0.25,-0.25))^2))
## Plot the estimated additive function for the single index
plot(object)
## Evaluate the estimated prediction function at 10^4 random points
## drawn from the interior of the support
testx = matrix(runif(10000*d)*1.96-0.98, ncol=d)
testf = predict(object,testx)
## and calculate the (estimated) absolute prediction error
mean(abs(testf - f(testx)))
## Here we can treat the obtained index matrix as a warm start and perform
## further optimization (on the second and third entry of the index)
## using e.g. the default R optimisation routine.
fn <- function(w){
dev = Inf
if (abs(w[1])+abs(w[2]) > 1) return(dev)
else {
wnew = matrix(c(1-abs(w[1])-abs(w[2]), w[1], w[2]), ncol=1)
dev = scar(x %*% wnew, y, shape = "cvx")$deviance
return(dev)
}
}
index23 = optim(object$index[2:3],fn)$par
newindex = matrix(c(1-sum(abs(index23)), index23), ncol=1); newindex
## Root squared error for the new estimated index
sqrt(sum((newindex - c(0.5,0.25,-0.25))^2))
## A further example is provided in the decathlon dataset
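The warm-start search in fn() above parametrises the index so that its entries sum to one in absolute value (the first entry absorbing whatever the other two leave over). A minimal sketch of that normalisation, where the helper name normalise_l1 is ours and not part of the package:

```r
## Rescale a candidate index vector so that its entries sum to one in
## absolute value -- the parametrisation used by fn() above.
normalise_l1 <- function(v) v / sum(abs(v))

w <- c(2, 1, -1)
normalise_l1(w)             # rescaled index vector
sum(abs(normalise_l1(w)))   # 1
```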
