cor.fsreg: Correlation based forward regression.


View source: R/variable_selection.R

Description

Correlation based forward regression.

Usage

cor.fsreg(y, x, ystand = TRUE, xstand = TRUE, threshold = 0.05, 
tolb = 2, tolr = 0.02, stopping = "BIC") 

Arguments

y

A numerical vector.

x

A matrix with the data, where the columns are the predictor variables.

ystand

If this is TRUE, the response variable is centered, i.e. its mean is subtracted from every value.

xstand

If this is TRUE the independent variables are standardised.

threshold

The significance level, set to 0.05 by default. Bear in mind that its logarithm is used, because the logarithm of the p-value is calculated at every step. This avoids numerical problems, in particular very small p-values, below the machine epsilon, being returned as zero.
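
The effect of working on the log scale can be illustrated with base R alone. This is only a sketch: it uses a normal reference distribution and a made-up test statistic (`stat`), which are assumptions for illustration and not necessarily what cor.fsreg does internally.

```r
threshold <- 0.05
log_threshold <- log(threshold)   # the quantity the log p-value is compared against

stat <- 40                        # hypothetical, very large test statistic

# Plain p-value: too small to be represented, so it is returned as zero
p_plain <- 2 * pnorm(-stat)

# Logged p-value: computed directly on the log scale and remains finite
log_p <- log(2) + pnorm(-stat, log.p = TRUE)

p_plain                # the information about "how small" is lost
log_p                  # still a usable number
log_p < log_threshold  # significance is decided on the log scale
```

On the log scale, two extremely significant variables can still be ranked against each other, which is not possible once both p-values have collapsed to zero.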

tolb

If only the significance of the variables is considered, many of them may enter the linear regression model. For this reason, the BIC is also used to validate the inclusion of a candidate variable. If the BIC difference between two successive models is less than the tolerance value, the variable will not enter the model, even if it is statistically significant. Set it to 0 if you do not want this extra check.
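
The BIC difference between two successive models can be sketched with base R's `lm` and `BIC`. The simulated data and the object names below are assumptions made for illustration; the function itself may compute the BIC differently.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)                   # informative predictor
x2 <- rnorm(n)                   # pure noise predictor
y  <- 1 + 2 * x1 + rnorm(n)

m0 <- lm(y ~ 1)                  # intercept only
m1 <- lm(y ~ x1)                 # after adding x1
m2 <- lm(y ~ x1 + x2)            # after also adding x2

tolb <- 2

# Reduction in BIC when moving to the larger model
bic_drop_x1 <- BIC(m0) - BIC(m1)
bic_drop_x2 <- BIC(m1) - BIC(m2)

# A candidate is kept only if the reduction reaches the tolerance
enter_x1 <- bic_drop_x1 >= tolb
enter_x2 <- bic_drop_x2 >= tolb

c(bic_drop_x1 = bic_drop_x1, bic_drop_x2 = bic_drop_x2)
```

For nested Gaussian linear models the reduction equals n * log(RSS_small / RSS_large) - log(n), so one extra variable has to improve the fit by more than log(n) + tolb on that scale to be retained.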

tolr

This is an alternative to the BIC change and it uses the adjusted coefficient of determination. The procedure continues only if the increase in the adjusted R^2 is more than tolr.

stopping

The type of extra check to perform. For the BIC check, set it to "BIC". For the adjusted R^2 check, set it to "ar2". If you want both criteria to be satisfied, set it to "BICR2".
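
How tolb, tolr and stopping interact can be summarised by a small decision rule. The helper `extra_check` below is hypothetical and not part of Rfast; it simply encodes the argument descriptions above, assuming the BIC and adjusted R^2 of the previous and the candidate model are already available.

```r
# Hypothetical helper (not an Rfast function): should the candidate variable
# pass the extra check, given the fit measures before and after adding it?
extra_check <- function(bic_old, bic_new, ar2_old, ar2_new,
                        tolb = 2, tolr = 0.02, stopping = "BIC") {
  bic_ok <- (bic_old - bic_new) >= tolb   # BIC must drop by at least tolb
  ar2_ok <- (ar2_new - ar2_old) > tolr    # adjusted R^2 must rise by more than tolr
  switch(stopping,
         BIC   = bic_ok,
         ar2   = ar2_ok,
         BICR2 = bic_ok && ar2_ok,
         stop("stopping must be \"BIC\", \"ar2\" or \"BICR2\""))
}

# A candidate that lowers the BIC by 5 but raises the adjusted R^2 by only 0.01
extra_check(100, 95, 0.50, 0.51, stopping = "BIC")
extra_check(100, 95, 0.50, 0.51, stopping = "ar2")
extra_check(100, 95, 0.50, 0.51, stopping = "BICR2")
```

In this made-up case the variable passes the BIC check alone, but fails whenever the adjusted R^2 criterion is involved.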

Details

The forward regression tries the variables one by one using the F-test, essentially a partial F-test every time for the latest variable. This is the same as testing the significance of the coefficient of this latest entered variable. Alternatively, the correlation can be used, in this case the partial correlation coefficient. There is a direct relationship between the t-test statistic and the partial correlation coefficient. So, instead of having to calculate the test statistic, we calculate the partial correlation coefficient. Using Fisher's z-transform we get the variance immediately. The test based on the partial correlation coefficient with Fisher's z-transform and the partial F-test (or the coefficient's t-test statistic) are not identical, but they agree for large sample sizes.
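
The relationship described above can be checked numerically with base R. This is a sketch on simulated data with one variable already in the model; the variable names, and the standard approximation of 1 / (n - k - 3) for the variance of the Fisher-transformed partial correlation with k conditioning variables, are assumptions of the illustration and not taken from the package source.

```r
set.seed(1)
n <- 500
z <- rnorm(n)                        # variable already in the model
x <- 0.5 * z + rnorm(n)              # candidate variable
y <- 1 + z + 0.3 * x + rnorm(n)

# t statistic of the candidate's coefficient in the enlarged model
fit   <- lm(y ~ z + x)
tstat <- summary(fit)$coefficients["x", "t value"]
df    <- fit$df.residual             # n - 3 here

# Partial correlation of y and x given z, from the two sets of residuals
r <- cor(resid(lm(y ~ z)), resid(lm(x ~ z)))

# Exact link between the t statistic and the partial correlation
t_from_r <- r * sqrt(df / (1 - r^2))

# Fisher's z-transform: atanh(r) with approximate variance 1 / (n - k - 3)
k      <- 1                          # number of conditioning variables
z_stat <- atanh(r) * sqrt(n - k - 3)

c(t = tstat, t_from_r = t_from_r, fisher_z = z_stat)
```

The first two numbers coincide up to rounding error, while the Fisher statistic is close but not equal to them; the gap shrinks as n grows.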

Value

A matrix with columns giving the index of the selected variables, the logged p-value, the test statistic value and the BIC or adjusted R^2 of each model. In the case of stopping = "BICR2" both of these criteria will be returned.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@uoc.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

Draper, N.R. and Smith, H. (1998). Applied regression analysis. New York, Wiley, 3rd edition.

See Also

score.glms, univglms, logistic_only, poisson_only, regression

Examples

library(Rfast)
## 200 observations on 100 candidate predictor variables
x <- matrnorm(200,  100)
y <- rnorm(200)
system.time( cor.fsreg(y, x) )
x <- NULL

Example output

Loading required package: Rcpp
Loading required package: RcppZiggurat
   user  system elapsed 
  0.009   0.000   0.029 

Rfast documentation built on Dec. 11, 2021, 9:59 a.m.