Ohit: Fit a high-dimensional linear regression model via...


Description

The first step sequentially selects input variables via the orthogonal greedy algorithm (OGA). The second step determines the number of OGA iterations using a high-dimensional information criterion (HDIC). The third step uses HDIC again to remove irrelevant variables that remain after the second step.
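The three steps can be illustrated with a minimal base-R sketch. It is written from the description above and from our reading of Ing and Lai (2011), and it is not the code inside the Ohit package: the helper names oga_path(), hdic_value() and ohit_sketch() are hypothetical, no intercept is fitted (as with intercept = FALSE), and omega_n defaults to log(n), i.e. HDBIC. The trimming rule used here keeps a variable only if removing it from the OGA+HDIC model would increase HDIC.

```r
# Hypothetical sketch of OGA + HDIC + Trim (not the Ohit package's implementation).
# Assumptions: no intercept, omega_n = log(n) (HDBIC) unless supplied.

# Step 1 (OGA): at each iteration, pick the column with the largest normalized
# inner product with the current residual, then refit least squares on all
# columns picked so far and update the residual.
oga_path <- function(X, y, Kn) {
  col_norms <- sqrt(colSums(X^2))
  selected <- integer(0)
  r <- y
  for (k in seq_len(Kn)) {
    score <- abs(drop(crossprod(X, r))) / col_norms
    score[selected] <- -Inf  # a column is never selected twice
    selected <- c(selected, which.max(score))
    r <- qr.resid(qr(X[, selected, drop = FALSE]), y)
  }
  selected
}

# HDIC of the least-squares fit on the columns in J, taking rmse = RSS / n.
hdic_value <- function(X, y, J, omega_n) {
  n <- nrow(X)
  rss <- sum(qr.resid(qr(X[, J, drop = FALSE]), y)^2)
  n * log(rss / n) + length(J) * omega_n * log(ncol(X))
}

ohit_sketch <- function(X, y, Kn, omega_n = log(nrow(X))) {
  J_OGA <- oga_path(X, y, Kn)
  # Step 2 (HDIC): evaluate HDIC along the OGA path and keep the first k_hat
  # selected variables, where k_hat minimizes HDIC.
  HDIC <- sapply(seq_len(Kn), function(k) hdic_value(X, y, J_OGA[seq_len(k)], omega_n))
  k_hat <- which.min(HDIC)
  J_HDIC <- J_OGA[seq_len(k_hat)]
  # Step 3 (Trim): keep a variable only if dropping it would increase HDIC.
  J_Trim <- J_HDIC
  if (k_hat > 1) {
    keep <- sapply(seq_len(k_hat), function(l)
      hdic_value(X, y, J_HDIC[-l], omega_n) > HDIC[k_hat])
    J_Trim <- J_HDIC[keep]
  }
  list(J_OGA = J_OGA, HDIC = HDIC, J_HDIC = sort(J_HDIC), J_Trim = sort(J_Trim))
}

# Usage on a small simulated problem where only columns 1 to 3 matter.
set.seed(1)
X <- matrix(rnorm(200 * 50), 200, 50)
y <- drop(X[, 1:3] %*% c(3, 2, 1.5)) + rnorm(200)
fit <- ohit_sketch(X, y, Kn = 10)
fit$J_Trim
```

Refitting on all selected columns at every iteration (rather than only regressing on the newest column) is what makes the greedy search "orthogonal": the residual is always orthogonal to every column chosen so far, so no column is picked twice in practice.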

Usage

Ohit(X, y, Kn = NULL, c1 = 5, HDIC_Type = "HDBIC", c2 = 2, c3 = 2.01,
  intercept = TRUE)

Arguments

X

Input matrix of n rows and p columns.

y

Response vector of length n.

Kn

The number of OGA iterations. Kn must be a positive integer between 1 and p. Default is Kn=max(1, min(floor(c1*sqrt(n/log(p))), p)), where c1 is a tuning parameter.
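The default for Kn can be checked with a short calculation. The helper default_Kn() below is hypothetical and simply evaluates the documented formula; for n = 400 and p = 4000 it gives 34, which matches the Kn reported in the Example output.

```r
# Hypothetical helper evaluating the documented default for Kn.
default_Kn <- function(n, p, c1 = 5) {
  max(1, min(floor(c1 * sqrt(n / log(p))), p))
}

default_Kn(400, 4000)  # 5 * sqrt(400 / log(4000)) is about 34.7, so Kn = 34
default_Kn(10, 3)      # the formula gives 15 here, but Kn is capped at p = 3
```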

c1

The tuning parameter for the number of OGA iterations. Default is c1=5.

HDIC_Type

The type of high-dimensional information criterion; must be one of "HDAIC", "HDBIC" or "HDHQ". The criterion is HDIC = n*log(rmse) + k_use*omega_n*log(p), where rmse is the residual mean squared error and k_use is the number of variables used to fit the model. HDIC_Type="HDAIC" uses omega_n=c2, HDIC_Type="HDBIC" uses omega_n=log(n), and HDIC_Type="HDHQ" uses omega_n=c3*log(log(n)). Default is HDIC_Type="HDBIC".
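The formula can be written out directly to see how much each added variable is penalized under the three choices. This is a sketch: hdic_omega() and hdic() are hypothetical helpers, and rmse is taken as RSS / n.

```r
# Hypothetical helpers evaluating the HDIC formula described above.
hdic_omega <- function(n, type = c("HDBIC", "HDAIC", "HDHQ"), c2 = 2, c3 = 2.01) {
  type <- match.arg(type)
  switch(type,
         HDAIC = c2,
         HDBIC = log(n),
         HDHQ  = c3 * log(log(n)))
}

# HDIC = n*log(rmse) + k_use*omega_n*log(p), with rmse = RSS / n.
hdic <- function(rss, n, p, k_use, type = "HDBIC", ...) {
  n * log(rss / n) + k_use * hdic_omega(n, type, ...) * log(p)
}

# Penalty added per extra variable when n = 400 and p = 4000.
penalty <- sapply(c("HDAIC", "HDBIC", "HDHQ"),
                  function(t) hdic_omega(400, t) * log(4000))
penalty
```

For n = 400 and p = 4000 the HDBIC penalty is about 49.7 per variable. In the Example output, HDIC is smallest at the 12th OGA iteration and then rises by roughly 35 to 41 per iteration, which is consistent with that penalty being partly offset by small decreases in n*log(rmse).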

c2

The tuning parameter for HDIC_Type="HDAIC". Default is c2=2.

c3

The tuning parameter for HDIC_Type="HDHQ". Default is c3=2.01.

intercept

Should an intercept be fitted? Default is intercept=TRUE.

Value

n

The number of observations.

p

The number of input variables.

Kn

The number of OGA iterations.

J_OGA

The index set of the Kn variables sequentially selected by OGA.

HDIC

The HDIC values along the OGA path.

J_HDIC

The index set of variables selected by OGA+HDIC.

J_Trim

The index set of variables selected by OGA+HDIC+Trim.

betahat_HDIC

The estimated regression coefficients of the model determined by OGA+HDIC.

betahat_Trim

The estimated regression coefficients of the model determined by OGA+HDIC+Trim.

Author(s)

Hai-Tang Chiou, Ching-Kang Ing and Tze Leung Lai.

References

Ing, C.-K. and Lai, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, 21, 1473–1513.

Examples

library(Ohit)

# Example setup (Example 3 in Section 5 of Ing and Lai (2011))
n = 400
p = 4000
q = 10
beta_1q = c(3, 3.75, 4.5, 5.25, 6, 6.75, 7.5, 8.25, 9, 9.75)
b = sqrt(3/(4 * q))

x_relevant = matrix(rnorm(n * q), n, q)
d = matrix(rnorm(n * (p - q), 0, 0.5), n, p - q)
x_relevant_sum = apply(x_relevant, 1, sum)
x_irrelevant = apply(d, 2, function(a) a + b * x_relevant_sum)
X = cbind(x_relevant, x_irrelevant)
epsilon = rnorm(n)
y = as.vector((x_relevant %*% beta_1q) + epsilon)

# Fit a high-dimensional linear regression model via OGA+HDIC+Trim
Ohit(X, y, intercept = FALSE)

Example output

$n
[1] 400

$p
[1] 4000

$Kn
[1] 34

$J_OGA
 [1]  976 2911   10    9    8    7    6    5    4    3    2    1  432  900 1867
[16]  282   77 3532  275 3190  508 2978 1895   37 3937 3792 2457 2254 1841  557
[31]  559 3418  823 3464

$HDIC
 [1] 1977.3699 1855.2058 1835.4618 1808.4484 1776.9847 1752.5538 1673.9168
 [8] 1600.5852 1536.5116 1421.7832 1286.3834  620.9388  655.9772  692.4024
[15]  729.9630  769.1924  808.4527  848.8927  889.5049  929.5755  966.6255
[22] 1007.0329 1044.2767 1083.2249 1122.3615 1160.7090 1200.2530 1239.7055
[29] 1279.8540 1320.4601 1360.0681 1397.7032 1436.6435 1476.5672

$J_HDIC
 [1]    1    2    3    4    5    6    7    8    9   10  976 2911

$J_Trim
 [1]  1  2  3  4  5  6  7  8  9 10

$betahat_HDIC

Call:
lm(formula = y ~ . - 1, data = X_HDIC)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9917 -0.6678 -0.0493  0.5657  3.1895 

Coefficients:
      Estimate Std. Error t value Pr(>|t|)    
X1     2.93495    0.06669  44.006   <2e-16 ***
X2     3.58349    0.06738  53.183   <2e-16 ***
X3     4.40725    0.06602  66.760   <2e-16 ***
X4     5.15546    0.07035  73.282   <2e-16 ***
X5     5.97555    0.07099  84.172   <2e-16 ***
X6     6.71770    0.07092  94.718   <2e-16 ***
X7     7.42118    0.07208 102.955   <2e-16 ***
X8     8.11738    0.07111 114.152   <2e-16 ***
X9     8.90003    0.07423 119.900   <2e-16 ***
X10    9.63881    0.07427 129.776   <2e-16 ***
X976   0.08101    0.11937   0.679    0.498    
X2911  0.16786    0.11148   1.506    0.133    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.048 on 388 degrees of freedom
Multiple R-squared:  0.9979,	Adjusted R-squared:  0.9979 
F-statistic: 1.551e+04 on 12 and 388 DF,  p-value: < 2.2e-16


$betahat_Trim

Call:
lm(formula = y ~ . - 1, data = X_Trim)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8830 -0.7174 -0.0498  0.6022  3.3112 

Coefficients:
    Estimate Std. Error t value Pr(>|t|)    
X1   2.99865    0.04986   60.14   <2e-16 ***
X2   3.64529    0.05154   70.73   <2e-16 ***
X3   4.46990    0.05033   88.82   <2e-16 ***
X4   5.22402    0.05173  101.00   <2e-16 ***
X5   6.03686    0.05555  108.68   <2e-16 ***
X6   6.78602    0.05266  128.88   <2e-16 ***
X7   7.49202    0.05303  141.27   <2e-16 ***
X8   8.19060    0.05191  157.77   <2e-16 ***
X9   8.97783    0.05265  170.52   <2e-16 ***
X10  9.71184    0.05316  182.68   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.049 on 390 degrees of freedom
Multiple R-squared:  0.9979,	Adjusted R-squared:  0.9979 
F-statistic: 1.859e+04 on 10 and 390 DF,  p-value: < 2.2e-16
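In the output above, OGA selects two irrelevant columns (976 and 2911) before any of the ten relevant ones, and the Trim step then removes them. A short calculation from the simulation design in the Example suggests why: every irrelevant column is built from b times the sum of the relevant columns, so its population correlation with y exceeds that of any single relevant column. The sketch below only evaluates these population quantities from the design (x_relevant and epsilon standard normal, d normal with standard deviation 0.5); it does not call Ohit.

```r
# Population correlations implied by the Example's simulation design.
q <- 10
beta_1q <- c(3, 3.75, 4.5, 5.25, 6, 6.75, 7.5, 8.25, 9, 9.75)
b <- sqrt(3 / (4 * q))

var_y <- sum(beta_1q^2) + 1     # Var(y) = sum(beta^2) + Var(epsilon)
var_irr <- 0.5^2 + b^2 * q      # Var(d_j + b * sum(x_relevant)), equal to 1
cov_irr_y <- b * sum(beta_1q)   # Cov(irrelevant x_j, y)

rho_irrelevant <- cov_irr_y / sqrt(var_irr * var_y)
rho_relevant <- beta_1q / sqrt(var_y)  # each relevant column has unit variance
c(rho_irrelevant, max(rho_relevant))   # about 0.82 versus 0.46
```

Because the first OGA step simply picks the column most correlated with y, an irrelevant column is the expected first pick in this design, which is why the second (Trim) stage matters here.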

Ohit documentation built on May 1, 2019, 8:43 p.m.
