core: Covariance Reduction


View source: R/core.R

Description

Method to reduce sample covariance matrices to an informational core that is sufficient to characterize the variance heterogeneity among different populations.

Usage

core(X, y, Sigmas = NULL, ns = NULL, numdir = 2,
        numdir.test = FALSE, ...)

Arguments

X

Data matrix with n rows of observations and p columns of predictors. The predictors are assumed to have a continuous distribution.

y

Vector of group labels. Observations with the same label are considered to be in the same group.

Sigmas

A list object of sample covariance matrices corresponding to the different populations.

ns

A vector giving the number of observations in the sample from each population.

numdir

Integer between 1 and p. It is the number of directions to estimate for the reduction.

numdir.test

Boolean. If FALSE, core computes the reduction for the specified number of directions numdir. If TRUE, it computes the reduction for each number of directions from 0 to numdir. A likelihood ratio test and information criteria are then used to estimate the true dimension of the sufficient reduction.

...

Other arguments to pass to GrassmannOptim.
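As an illustration of how the Sigmas and ns arguments relate to X and y, the following base R sketch builds per-group covariance matrices and sample sizes from a data matrix and a label vector. It uses the built-in iris data purely as a stand-in and does not call core. Note that cov() uses divisor n_y - 1; whether core expects that divisor or divisor n_y is not stated here, so treat the scaling as an assumption.

```r
# Stand-in data (assumption: any numeric matrix with group labels would do).
X <- as.matrix(iris[, 1:4])   # n x p matrix of predictors
y <- iris$Species             # vector of group labels

# Row indices of the observations belonging to each group.
groups <- split(seq_len(nrow(X)), y)

# Number of observations per group.
ns <- vapply(groups, length, integer(1))

# One p x p sample covariance matrix per group (divisor n_y - 1).
Sigmas <- lapply(groups, function(idx) cov(X[idx, , drop = FALSE]))
```

The resulting Sigmas is a list with one matrix per label, and ns is a vector of matching length.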

Details

Consider the problem of characterizing the covariance matrices Σ_y, y=1,...,h, of a random vector X observed in each of h normal populations. Let S_y = (n_y-1)\tilde{Σ}_y where \tilde{Σ}_y is the sample covariance matrix corresponding to Σ_y, and n_y is the number of observations corresponding to y. The goal is to find a semi-orthogonal matrix Γ \in R^{p \times d}, d < p, with the property that for any two populations j and k

S_j|(Γ' S_j Γ=B, n_j=m) \sim S_k|(Γ' S_k Γ=B, n_k=m).

That is, given Γ' S_g Γ and n_g, the conditional distribution of S_g must not depend on g. Thus Γ' S_g Γ is sufficient to account for the heterogeneity among the population covariance matrices. The central subspace \mathcal{S}, spanned by the columns of Γ, is obtained by maximizing the following log-likelihood function

L(\mathcal{S})= c-\frac{n}{2} \log|\tilde{Σ}| + \frac{n}{2} \log|P_{\mathcal{S}} \tilde{Σ} P_{\mathcal{S}}|-∑_{y=1}^{h}\frac{n_y}{2} \log|P_{\mathcal{S}} \tilde{Σ}_y P_{\mathcal{S}}|,

where c is a constant depending only on p and n_y; P_{\mathcal{S}} denotes the orthogonal projection onto \mathcal{S}; \tilde{Σ}_y, y=1,...,h, denotes the sample covariance matrix from population y computed with divisor n_y; and \tilde{Σ}=∑_{y=1}^{h} (n_y/n)\tilde{Σ}_y. The optimization is carried out over \mathcal{G}_{(d,p)}, the set of all d-dimensional subspaces in R^{p}, called the Grassmann manifold, which has dimension d(p-d).
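To make the objective concrete, here is a minimal base R sketch that evaluates L(\mathcal{S}) up to the constant c for a given semi-orthogonal basis. It is not the package's implementation. It assumes that the determinant of a projected matrix is read as the product of its nonzero eigenvalues, which for a semi-orthogonal Gamma equals det(Gamma' Σ Gamma). The function name core_objective and the use of iris are illustrative choices.

```r
# Evaluate the CORE objective, omitting the constant c.
# Gamma: p x d semi-orthogonal basis; Sigmas: list of group covariances
# (divisor n_y); ns: vector of group sample sizes.
core_objective <- function(Gamma, Sigmas, ns) {
  n <- sum(ns)
  # Pooled covariance: weighted average of the group covariances.
  Sigma <- Reduce(`+`, Map(function(S, m) (m / n) * S, Sigmas, ns))
  logdet <- function(A) as.numeric(determinant(A, logarithm = TRUE)$modulus)
  # log-determinant restricted to the subspace spanned by Gamma.
  proj_logdet <- function(S) logdet(crossprod(Gamma, S %*% Gamma))
  -n / 2 * logdet(Sigma) + n / 2 * proj_logdet(Sigma) -
    sum(ns / 2 * vapply(Sigmas, proj_logdet, numeric(1)))
}

# Stand-in data: group covariances rescaled to divisor n_y.
X <- as.matrix(iris[, 1:4])
idx <- split(seq_len(nrow(X)), iris$Species)
ns <- lengths(idx)
Sigmas <- lapply(idx, function(i) {
  cov(X[i, , drop = FALSE]) * (length(i) - 1) / length(i)
})

set.seed(1)
Gamma <- qr.Q(qr(matrix(rnorm(4 * 2), 4, 2)))  # random 4 x 2 basis
O <- qr.Q(qr(matrix(rnorm(4), 2, 2)))          # 2 x 2 orthogonal matrix

value <- core_objective(Gamma, Sigmas, ns)
value_rotated <- core_objective(Gamma %*% O, Sigmas, ns)
```

The two values agree because the objective depends on Gamma only through the subspace it spans, which is why the search takes place over a Grassmann manifold rather than over individual matrices.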

The dimension d is to be estimated. A sequential likelihood ratio test and information criteria (AIC, BIC) are implemented, following Cook and Forzani (2008).

Value

This command returns a list object of class ldr. The output depends on the argument numdir.test. If numdir.test=TRUE, a list of matrices is provided corresponding to the numdir values (1 through numdir) for each of the parameters Γ, Σ, and Σ_g. Otherwise, a single list of matrices is returned for the single value of numdir. A likelihood ratio test and information criteria are provided to estimate the dimension of the sufficient reduction when numdir.test=TRUE. The components loglik, aic, bic, and numpar are vectors with numdir elements if numdir.test=TRUE, and scalars otherwise. The following components are returned:

Gammahat

Estimate of Γ.

Sigmahat

Estimate of overall Σ.

Sigmashat

Estimate of group-specific Σ_g's.

loglik

Maximized value of the CORE log-likelihood.

aic

Akaike information criterion value.

bic

Bayesian information criterion value.

numpar

Number of parameters in the model.
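Once a basis estimate is available, the reduction amounts to projecting the data and the group covariances onto its columns. The sketch below shows this in base R with a stand-in basis: the matrix Gamma_standin is a hypothetical placeholder chosen for illustration, not an estimate produced by core.

```r
X <- as.matrix(iris[, 1:4])
y <- iris$Species

# Hypothetical 4 x 2 semi-orthogonal basis (first two coordinate axes),
# standing in for an estimated Gamma.
Gamma_standin <- diag(4)[, 1:2]

# Reduced predictors: n x d matrix.
Z <- X %*% Gamma_standin

# Reduced group covariances Gamma' Sigma_g Gamma, one d x d matrix per group.
idx <- split(seq_len(nrow(X)), y)
reduced_covs <- lapply(idx, function(i) {
  crossprod(Gamma_standin, cov(X[i, , drop = FALSE]) %*% Gamma_standin)
})
```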

Note

Currently loglik, aic, and bic are computed up to a constant. They are therefore meaningful only relative to one another (e.g. two loglik values can be subtracted to compute a likelihood ratio test statistic) and should not be treated as absolute quantities.
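The comparison described in this note can be sketched as follows. The loglik and numpar values are hypothetical placeholders, not output from core, and the chi-squared reference distribution is the usual large-sample choice for nested models, assumed here for illustration.

```r
# Hypothetical values for two nested fits (illustration only).
loglik <- c(d1 = -410.2, d2 = -401.7)
numpar <- c(d1 = 13, d2 = 17)

# The unknown additive constant is shared, so it cancels in the difference.
lrt_stat <- 2 * (loglik[["d2"]] - loglik[["d1"]])
df <- numpar[["d2"]] - numpar[["d1"]]

# Upper-tail probability under a chi-squared reference distribution.
p_value <- pchisq(lrt_stat, df = df, lower.tail = FALSE)
```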

Author(s)

Andrew Raim and Kofi P Adragni, University of Maryland, Baltimore County

References

Cook RD and Forzani L (2008). Covariance reducing models: An alternative to spectral modelling of covariance matrices. Biometrika, Vol. 95, No. 4, 799–812.

See Also

lad, pfc

Examples

data(flea)
fit1 <- core(X=flea[,-1], y=flea[,1], numdir.test=TRUE)
summary(fit1)

## Not run: 
data(snakes)
fit2 <- ldr(Sigmas=snakes[-3], ns=snakes[[3]], numdir = 4, 
	model = "core", numdir.test = TRUE, verbose=TRUE, 
	sim_anneal = TRUE, max_iter = 200, max_iter_sa=200)
summary(fit2)

## End(Not run)

ldr documentation built on May 2, 2019, 2:13 p.m.
