# Ohit: Fit a high-dimensional linear regression model via... In Ohit: OGA+HDIC+Trim and High-Dimensional Linear Regression Models

## Description

Ohit fits the model in three steps. The first step sequentially selects input variables via the orthogonal greedy algorithm (OGA). The second step determines the number of OGA iterations using a high-dimensional information criterion (HDIC). The third step removes irrelevant variables that remain after the second step, again using HDIC.
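The first step can be illustrated with a minimal sketch of a greedy forward selection in the spirit of OGA. This is not the package's implementation: the function name `oga_sketch` is hypothetical, centering of `X` and `y` is an assumption, and the sketch assumes no column of `X` is constant. At each iteration it picks the column most aligned with the current residual, then refits by least squares on all columns selected so far.

```r
# Hypothetical helper, for illustration only (not part of the Ohit package).
oga_sketch <- function(X, y, Kn) {
  X <- scale(X, center = TRUE, scale = FALSE)  # assumption: center the inputs
  y <- y - mean(y)                             # assumption: center the response
  col_norm <- sqrt(colSums(X^2))               # column norms, used to normalise scores
  selected <- integer(0)
  resid <- y
  for (k in seq_len(Kn)) {
    # Normalised absolute inner product of each column with the residual
    score <- abs(drop(crossprod(X, resid))) / col_norm
    score[selected] <- -Inf                    # never pick the same column twice
    selected <- c(selected, which.max(score))
    # Orthogonal step: refit on every selected column and update the residual
    resid <- lm.fit(X[, selected, drop = FALSE], y)$residuals
  }
  selected
}

set.seed(1)
X_demo <- matrix(rnorm(100 * 20), 100, 20)
y_demo <- 5 * X_demo[, 3] + rnorm(100, sd = 0.1)
path_demo <- oga_sketch(X_demo, y_demo, Kn = 2)
```

In this toy data only column 3 carries signal, so it is the first entry of `path_demo`. The order of the returned indices is the order of selection, which is how `J_OGA` is described in the Value section.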

## Usage

    Ohit(X, y, Kn = NULL, c1 = 5, HDIC_Type = "HDBIC", c2 = 2, c3 = 2.01, intercept = TRUE)

## Arguments

| Argument | Description |
|---|---|
| `X` | Input matrix of n rows and p columns. |
| `y` | Response vector of length n. |
| `Kn` | The number of OGA iterations. `Kn` must be a positive integer between 1 and p. Default is `Kn=max(1, min(floor(c1*sqrt(n/log(p))), p))`, where `c1` is a tuning parameter. |
| `c1` | The tuning parameter for the number of OGA iterations. Default is `c1=5`. |
| `HDIC_Type` | High-dimensional information criterion. The value must be `"HDAIC"`, `"HDBIC"` or `"HDHQ"`. The formula is `n*log(rmse)+k_use*omega_n*log(p)`, where `rmse` is the residual mean squared error and `k_use` is the number of variables used to fit the model. For `HDIC_Type="HDAIC"`, it is HDIC with `omega_n=c2`. For `HDIC_Type="HDBIC"`, it is HDIC with `omega_n=log(n)`. For `HDIC_Type="HDHQ"`, it is HDIC with `omega_n=c3*log(log(n))`. Default is `HDIC_Type="HDBIC"`. |
| `c2` | The tuning parameter for `HDIC_Type="HDAIC"`. Default is `c2=2`. |
| `c3` | The tuning parameter for `HDIC_Type="HDHQ"`. Default is `c3=2.01`. |
| `intercept` | Should an intercept be fitted? Default is `intercept=TRUE`. |
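The two formulas in the argument descriptions can be written out directly. The sketch below is only a transcription of those formulas; the helper names `default_Kn` and `hdic` are hypothetical and are not functions exported by the package.

```r
# Default number of OGA iterations, transcribed from the description of Kn.
default_Kn <- function(n, p, c1 = 5) {
  max(1, min(floor(c1 * sqrt(n / log(p))), p))
}

# HDIC value, transcribed from the description of HDIC_Type.
# rmse  : residual mean squared error of the fitted model
# k_use : number of variables used to fit the model
hdic <- function(rmse, k_use, n, p, HDIC_Type = "HDBIC", c2 = 2, c3 = 2.01) {
  omega_n <- switch(HDIC_Type,
    HDAIC = c2,
    HDBIC = log(n),
    HDHQ  = c3 * log(log(n)),
    stop("HDIC_Type must be \"HDAIC\", \"HDBIC\" or \"HDHQ\"")
  )
  n * log(rmse) + k_use * omega_n * log(p)
}

Kn_example <- default_Kn(n = 400, p = 4000)  # 34
```

With n = 400 and p = 4000, the settings used in the Examples section, the default evaluates to 34, which agrees with the `$Kn` value in the example output.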

## Value

| Component | Description |
|---|---|
| `n` | The number of observations. |
| `p` | The number of input variables. |
| `Kn` | The number of OGA iterations. |
| `J_OGA` | The index set of `Kn` variables sequentially selected by OGA. |
| `HDIC` | The HDIC values along the OGA path. |
| `J_HDIC` | The index set of variables determined by OGA+HDIC. |
| `J_Trim` | The index set of variables determined by OGA+HDIC+Trim. |
| `betahat_HDIC` | The estimated regression coefficients of the model determined by OGA+HDIC. |
| `betahat_Trim` | The estimated regression coefficients of the model determined by OGA+HDIC+Trim. |
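The components listed above are returned as a list, so they can be read with `$`. The following is a small sketch on simulated data; the data sizes and coefficients are arbitrary choices for illustration, and the code is wrapped in a check so that it only runs when the Ohit package is installed.

```r
fit <- NULL
if (requireNamespace("Ohit", quietly = TRUE)) {
  set.seed(1)
  n <- 100
  p <- 200
  X <- matrix(rnorm(n * p), n, p)
  # Assumption for this toy example: only the first three columns matter
  y <- as.vector(X[, 1:3] %*% c(3, -2, 4) + rnorm(n))

  fit <- Ohit::Ohit(X, y, intercept = FALSE)

  print(fit$Kn)            # number of OGA iterations actually used
  print(fit$J_OGA)         # indices in order of selection
  print(fit$J_HDIC)        # indices kept after the HDIC step
  print(fit$J_Trim)        # indices kept after trimming
  print(fit$betahat_Trim)  # coefficient estimates of the trimmed model
}
```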

## Author(s)

Hai-Tang Chiou, Ching-Kang Ing and Tze Leung Lai.

## References

Ing, C.-K. and Lai, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, 21, 1473–1513.

## Examples

    library(Ohit)

    # Example setup (Example 3 in Section 5 of Ing and Lai (2011))
    n = 400
    p = 4000
    q = 10
    beta_1q = c(3, 3.75, 4.5, 5.25, 6, 6.75, 7.5, 8.25, 9, 9.75)
    b = sqrt(3/(4 * q))

    x_relevant = matrix(rnorm(n * q), n, q)
    d = matrix(rnorm(n * (p - q), 0, 0.5), n, p - q)
    x_relevant_sum = apply(x_relevant, 1, sum)
    x_irrelevant = apply(d, 2, function(a) a + b * x_relevant_sum)
    X = cbind(x_relevant, x_irrelevant)
    epsilon = rnorm(n)
    y = as.vector((x_relevant %*% beta_1q) + epsilon)

    # Fit a high-dimensional linear regression model via OGA+HDIC+Trim
    Ohit(X, y, intercept = FALSE)

### Example output

\$n
[1] 400

\$p
[1] 4000

\$Kn
[1] 34

\$J_OGA
[1]  976 2911   10    9    8    7    6    5    4    3    2    1  432  900 1867
[16]  282   77 3532  275 3190  508 2978 1895   37 3937 3792 2457 2254 1841  557
[31]  559 3418  823 3464

\$HDIC
[1] 1977.3699 1855.2058 1835.4618 1808.4484 1776.9847 1752.5538 1673.9168
[8] 1600.5852 1536.5116 1421.7832 1286.3834  620.9388  655.9772  692.4024
[15]  729.9630  769.1924  808.4527  848.8927  889.5049  929.5755  966.6255
[22] 1007.0329 1044.2767 1083.2249 1122.3615 1160.7090 1200.2530 1239.7055
[29] 1279.8540 1320.4601 1360.0681 1397.7032 1436.6435 1476.5672

\$J_HDIC
[1]    1    2    3    4    5    6    7    8    9   10  976 2911

\$J_Trim
[1]  1  2  3  4  5  6  7  8  9 10

\$betahat_HDIC

Call:
lm(formula = y ~ . - 1, data = X_HDIC)

Residuals:
Min      1Q  Median      3Q     Max
-2.9917 -0.6678 -0.0493  0.5657  3.1895

Coefficients:
Estimate Std. Error t value Pr(>|t|)
X1     2.93495    0.06669  44.006   <2e-16 ***
X2     3.58349    0.06738  53.183   <2e-16 ***
X3     4.40725    0.06602  66.760   <2e-16 ***
X4     5.15546    0.07035  73.282   <2e-16 ***
X5     5.97555    0.07099  84.172   <2e-16 ***
X6     6.71770    0.07092  94.718   <2e-16 ***
X7     7.42118    0.07208 102.955   <2e-16 ***
X8     8.11738    0.07111 114.152   <2e-16 ***
X9     8.90003    0.07423 119.900   <2e-16 ***
X10    9.63881    0.07427 129.776   <2e-16 ***
X976   0.08101    0.11937   0.679    0.498
X2911  0.16786    0.11148   1.506    0.133
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.048 on 388 degrees of freedom
Multiple R-squared:  0.9979,	Adjusted R-squared:  0.9979
F-statistic: 1.551e+04 on 12 and 388 DF,  p-value: < 2.2e-16

\$betahat_Trim

Call:
lm(formula = y ~ . - 1, data = X_Trim)

Residuals:
Min      1Q  Median      3Q     Max
-2.8830 -0.7174 -0.0498  0.6022  3.3112

Coefficients:
Estimate Std. Error t value Pr(>|t|)
X1   2.99865    0.04986   60.14   <2e-16 ***
X2   3.64529    0.05154   70.73   <2e-16 ***
X3   4.46990    0.05033   88.82   <2e-16 ***
X4   5.22402    0.05173  101.00   <2e-16 ***
X5   6.03686    0.05555  108.68   <2e-16 ***
X6   6.78602    0.05266  128.88   <2e-16 ***
X7   7.49202    0.05303  141.27   <2e-16 ***
X8   8.19060    0.05191  157.77   <2e-16 ***
X9   8.97783    0.05265  170.52   <2e-16 ***
X10  9.71184    0.05316  182.68   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.049 on 390 degrees of freedom
Multiple R-squared:  0.9979,	Adjusted R-squared:  0.9979
F-statistic: 1.859e+04 on 10 and 390 DF,  p-value: < 2.2e-16
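The relation between `$J_OGA`, `$HDIC` and `$J_HDIC` in this output can be checked by hand. The sketch below copies the printed `$J_OGA` and `$HDIC` values and assumes that the HDIC step keeps the first k variables on the OGA path, where k is the position of the smallest HDIC value. That reading is consistent with the output shown here; it is not taken from the package source.

```r
# Values copied from the example output
J_OGA <- c(976, 2911, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 432, 900, 1867,
           282, 77, 3532, 275, 3190, 508, 2978, 1895, 37, 3937, 3792,
           2457, 2254, 1841, 557, 559, 3418, 823, 3464)
HDIC <- c(1977.3699, 1855.2058, 1835.4618, 1808.4484, 1776.9847, 1752.5538,
          1673.9168, 1600.5852, 1536.5116, 1421.7832, 1286.3834, 620.9388,
          655.9772, 692.4024, 729.9630, 769.1924, 808.4527, 848.8927,
          889.5049, 929.5755, 966.6255, 1007.0329, 1044.2767, 1083.2249,
          1122.3615, 1160.7090, 1200.2530, 1239.7055, 1279.8540, 1320.4601,
          1360.0681, 1397.7032, 1436.6435, 1476.5672)

k_hat <- which.min(HDIC)                        # position of the smallest HDIC
J_HDIC_check <- sort(J_OGA[seq_len(k_hat)])     # first k_hat selections, sorted
```

Here `k_hat` is 12 and `J_HDIC_check` contains 1 to 10 together with 976 and 2911, the same set printed under `$J_HDIC`. The trimming step then drops 976 and 2911, whose coefficients in `$betahat_HDIC` are not significant, leaving the ten relevant variables in `$J_Trim`.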

Ohit documentation built on May 1, 2019, 8:43 p.m.