pfc: Principal fitted components



Description

Principal fitted components model for sufficient dimension reduction. This function estimates all parameters in the model.

Usage

pfc(X, y, fy = NULL, numdir = NULL, structure = c("iso", "aniso",
    "unstr", "unstr2"), eps_aniso = 1e-3, numdir.test = FALSE, ...)

Arguments

X

Design matrix with n rows of observations and p columns of predictors. The predictors are assumed to have a continuous distribution.

y

The response vector of n observations, continuous or categorical.

fy

Basis function, either obtained using bf or defined by the user. It is a function of y alone and has r independent column vectors. See bf for details.

numdir

The number of directions to be used in estimating the reduction subspace. The dimension must be less than or equal to the minimum of r and p. By default, numdir = min(r, p).

structure

Structure of var(X|Y). The following options are available: "iso" for isotropic (predictors, conditionally on the response, are independent and on the same measurement scale); "aniso" for anisotropic (predictors, conditionally on the response, are independent and on different measurement scales); "unstr" for unstructured variance. The fourth structure, "unstr2", refers to an extended PFC model with a heterogeneous error structure.

eps_aniso

Precision term used in estimating var(X|Y) for the anisotropic structure.

numdir.test

Boolean. If FALSE, pfc fits with the numdir provided only. If TRUE, PFC models are fit for all dimensions less than or equal to numdir.

...

Additional arguments to GrassmannOptim.
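
As an illustration of the fy argument, here is a minimal base-R sketch of a user-defined basis. The helper name poly_basis is hypothetical and not part of ldr, and centering the columns is an assumption made for this sketch; bf may construct its basis differently.

```r
# Hypothetical helper (not part of ldr): polynomial basis in y
# with r = degree columns, each centered at its sample mean.
poly_basis <- function(y, degree = 3) {
  B <- outer(y, seq_len(degree), `^`)     # columns y, y^2, ..., y^degree
  scale(B, center = TRUE, scale = FALSE)  # column-centering (our assumption)
}

set.seed(1)
y  <- rnorm(50)
fy <- poly_basis(y, degree = 3)

dim(fy)      # n rows, r columns
qr(fy)$rank  # equals r when the columns are linearly independent
```

A matrix built this way is a function of y alone and has r independent columns, which is what the fy argument requires.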

Details

Let X be a column vector of p predictors, and let Y be a univariate response variable. The principal fitted components (PFC) model is an inverse regression model for sufficient dimension reduction, given by X|(Y=y) \sim N(μ + Γ β f_y, Δ). The term Δ is assumed independent of y. Its simplest structure is the isotropic (iso) one, Δ=δ^2 I_p, where, conditionally on the response, the predictors are independent and on the same measurement scale. In that case the sufficient reduction is Γ^TX. The anisotropic (aniso) PFC model assumes that Δ=diag(δ_1^2, ..., δ_p^2), where, conditionally on the response, the predictors are independent and on different measurement scales. The unstructured (unstr) PFC model allows a general structure for Δ. With the anisotropic and unstructured Δ, the sufficient reduction is Γ^T Δ^{-1}X. Note that X \in R^{p}, while the data matrix supplied to the function is in R^{n \times p}.
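
The relation between the vector form of the model and the n \times p data matrix can be sketched in base R. This is only a simulation under assumed parameter values (n, p, d, r, Gamma, beta, mu, delta and Delta_example are all chosen for illustration); it does not reproduce how pfc estimates anything.

```r
set.seed(42)
n <- 200; p <- 6; d <- 1; r <- 2

y  <- rnorm(n)
fy <- scale(cbind(y, y^2), center = TRUE, scale = FALSE)  # n x r basis

Gamma <- qr.Q(qr(matrix(rnorm(p * d), p, d)))  # p x d, orthonormal columns
beta  <- matrix(c(2, -1), d, r)                # d x r
mu    <- rep(1, p)
delta <- 0.5

# Row i of the data matrix is t(mu + Gamma beta f_{y_i} + error_i),
# with error_i ~ N(0, delta^2 I_p): the isotropic structure.
X <- matrix(mu, n, p, byrow = TRUE) +
  fy %*% t(beta) %*% t(Gamma) +
  delta * matrix(rnorm(n * p), n, p)

Xc <- scale(X, center = TRUE, scale = FALSE)   # column-centered data matrix

# Isotropic reduction Gamma^T X, applied to every row at once.
R_iso <- Xc %*% Gamma                          # n x d

# Matrix form of Gamma^T Delta^{-1} X for a diagonal Delta.
# Delta_example is illustrative only; it is not the Delta simulated above.
Delta_example <- diag(seq(0.5, 1.5, length.out = p)^2)
R_aniso <- Xc %*% solve(Delta_example) %*% Gamma

dim(X); dim(R_iso); dim(R_aniso)
```

In matrix form the reduction multiplies the centered data matrix on the right, so each row of the result is the d-dimensional reduction of one observation.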

The error structure of the extended model (unstr2) has the following form:

Δ=Γ Ω Γ^T + Γ_0 Ω_0 Γ_0^T,

where Γ_0 is the orthogonal completion of Γ such that (Γ, Γ_0) is a p \times p orthogonal matrix. The matrices Ω \in R^{d \times d} and Ω_0 \in R^{(p-d) \times (p-d)} are assumed to be symmetric and full-rank. The sufficient reduction is Γ^{T}X. Let \mathcal{S}_{Γ} be the subspace spanned by the columns of Γ. The parameter space of \mathcal{S}_{Γ} is the set of all d-dimensional subspaces in R^p, called the Grassmann manifold and denoted by \mathcal{G}_{(d,p)}. Let \hat{Σ} and \hat{Σ}_{\mathrm{fit}} be the sample covariance matrix of X and the fitted covariance matrix, respectively, and let \hat{Σ}_{\mathrm{res}}=\hat{Σ} - \hat{Σ}_{\mathrm{fit}}. The MLE of \mathcal{S}_{Γ} under the unstr2 setup is obtained by maximizing the log-likelihood

L(\mathcal{S}_U) = - \log|U^T \hat{Σ}_{\mathrm{res}} U| - \log|V^T \hat{Σ}V|

over \mathcal{G}_{(d,p)}, where V is an orthogonal completion of U.
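
The objective above can be evaluated in base R as a sketch. Two assumptions are made here: \hat{Σ}_{\mathrm{fit}} is taken to be the covariance of the fitted values from a least-squares regression of the centered X on a centered basis, and all covariances use divisor n. The names logdet and objective are hypothetical, and no optimization over the Grassmann manifold is attempted.

```r
set.seed(7)
n <- 150; p <- 5; d <- 2

y  <- rnorm(n)
X  <- matrix(rnorm(n * p), n, p)
X[, 1] <- X[, 1] + y                           # make one predictor depend on y
Xc <- scale(X, center = TRUE, scale = FALSE)
Fc <- scale(cbind(y, y^2, y^3), center = TRUE, scale = FALSE)

Sigma_hat <- crossprod(Xc) / n                 # sample covariance (divisor n)
fitted    <- Fc %*% solve(crossprod(Fc), crossprod(Fc, Xc))
Sigma_fit <- crossprod(fitted) / n             # assumed form of the fitted covariance
Sigma_res <- Sigma_hat - Sigma_fit

# Log-determinant of a positive definite matrix.
logdet <- function(M) as.numeric(determinant(M, logarithm = TRUE)$modulus)

# L(S_U) = -log|U' Sigma_res U| - log|V' Sigma_hat V|,
# where V is an orthogonal completion of U.
objective <- function(U) {
  p <- nrow(U); d <- ncol(U)
  V <- qr.Q(qr(U), complete = TRUE)[, (d + 1):p, drop = FALSE]
  -logdet(crossprod(U, Sigma_res %*% U)) -
    logdet(crossprod(V, Sigma_hat %*% V))
}

U <- qr.Q(qr(matrix(rnorm(p * d), p, d)))      # a point on the manifold
Q <- qr.Q(qr(matrix(rnorm(d * d), d, d)))      # d x d orthogonal rotation

objective(U)
objective(U %*% Q)  # same value: only the span of U matters
```

The two values agree because rotating the basis U leaves both determinants unchanged, which is why the parameter is a subspace rather than a particular basis matrix.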

The dimension d of the sufficient reduction must be estimated. A sequential likelihood ratio test is implemented, as well as the Akaike and Bayesian information criteria, following Cook and Forzani (2008).

Value

This command returns a list object of class ldr. The output depends on the argument numdir.test. If numdir.test=TRUE, a list of matrices is provided corresponding to the numdir values (1 through numdir) for each of the parameters μ, β, Γ, Γ_0, Ω, and Ω_0. Otherwise, a single list of matrices is returned for the single value of numdir. The outputs of loglik, aic, bic, numpar are vectors of numdir elements if numdir.test=TRUE, and scalars otherwise. The following components are returned:

R

The reduced data matrix, obtained from the centered data matrix X. Each column of the data matrix X is centered at its sample mean.

Muhat

Estimate of μ.

Betahat

Estimate of β.

Deltahat

The estimate of the covariance Δ.

Gammahat

An estimated orthogonal basis representative of \hat{\mathcal{S}}_{Γ}, the subspace spanned by Γ.

Gammahat0

An estimated orthogonal basis representative of \hat{\mathcal{S}}_{Γ_0}, the subspace spanned by Γ_0.

Omegahat

The estimate of the covariance Ω if an extended model is used.

Omegahat0

The estimate of the covariance Ω_0 if an extended model is used.

loglik

The value of the log-likelihood for the model.

aic

Akaike information criterion value.

bic

Bayesian information criterion value.

numdir

The number of directions to estimate.

numpar

The number of parameters in the model.

evalues

The first numdir largest eigenvalues of \hat{Σ}_{\mathrm{fit}}.

Author(s)

Kofi Placid Adragni <kofi@umbc.edu>

References

Adragni, KP and Cook, RD (2009): Sufficient dimension reduction and prediction in regression. Phil. Trans. R. Soc. A 367, 4385–4405.

Cook, RD (2007): Fisher Lecture - Dimension Reduction in Regression (with discussion). Statistical Science, 22, 1–26.

Cook, RD and Forzani, L (2008): Principal fitted components for dimension reduction in regression. Statistical Science 23, 485–501.

See Also

core, lad

Examples

data(bigmac)
fit1 <- pfc(X=bigmac[,-1], y=bigmac[,1], fy=bf(y=bigmac[,1], case="poly",
        degree=3),numdir=3, structure="aniso")
summary(fit1)
plot(fit1)

fit2 <- pfc(X=bigmac[,-1], y=bigmac[,1], fy=bf(y=bigmac[,1], case="poly",
        degree=3), numdir=3, structure="aniso", numdir.test=TRUE)
summary(fit2)
	

Example output

Loading required package: GrassmannOptim
Loading required package: Matrix

Call:
pfc(X = bigmac[, -1], y = bigmac[, 1], fy = bf(y = bigmac[, 1], 
    case = "poly", degree = 3), numdir = 3, structure = "aniso")


Estimated Basis Vectors for Central Subspace:
         Dir1    Dir2    Dir3
 [1,]  0.0274 -0.1403 -0.4889
 [2,] -0.9876  0.0882  0.0629
 [3,] -0.0808 -0.2834 -0.2790
 [4,] -0.0284  0.0860 -0.2030
 [5,] -0.0073  0.0307  0.0691
 [6,] -0.1209 -0.5787 -0.4920
 [7,] -0.0418  0.2019  0.0878
 [8,]  0.0130  0.7127 -0.6184
 [9,]  0.0018 -0.0135 -0.0310

Call:
pfc(X = bigmac[, -1], y = bigmac[, 1], fy = bf(y = bigmac[, 1], 
    case = "poly", degree = 3), numdir = 3, structure = "aniso", 
    numdir.test = TRUE)

Estimated Basis Vectors for Central Subspace:
         Dir1    Dir2    Dir3
 [1,]  0.0274 -0.1403 -0.4889
 [2,] -0.9876  0.0882  0.0629
 [3,] -0.0808 -0.2834 -0.2790
 [4,] -0.0284  0.0860 -0.2030
 [5,] -0.0073  0.0307  0.0691
 [6,] -0.1209 -0.5787 -0.4920
 [7,] -0.0418  0.2019  0.0878
 [8,]  0.0130  0.7127 -0.6184
 [9,]  0.0018 -0.0135 -0.0310

Information Criterion:
         d=0      d=1      d=2      d=3
aic 3108.861 3305.375 3301.165 3307.941
bic 3141.381 3357.768 3369.818 3389.241

Large sample likelihood ratio test 
                   Stat df    p.value
0D vs >= 1D -145.080270 27 1.00000000
1D vs >= 2D   29.434033 16 0.02116847
2D vs >= 3D    7.223627  7 0.40597338

               Dir1   Dir2   Dir3
Eigenvalues  5.4793 0.5414 0.1685
R^2(OLS|pfc) 0.5148 0.5177 0.5404

ldr documentation built on May 2, 2019, 2:13 p.m.
