cor.fsreg: Correlation based forward regression. In Rfast: A Collection of Efficient and Extremely Fast R Functions

Description

Correlation based forward regression.

Usage

```
cor.fsreg(y, x, ystand = TRUE, xstand = TRUE, threshold = 0.05,
tolb = 2, tolr = 0.02, stopping = "BIC")
```

Arguments

`y` A numerical vector.

`x` A matrix with data, the predictor variables.

`ystand` If this is TRUE the response variable is centered: the mean is subtracted from every value.

`xstand` If this is TRUE the independent variables are standardised.

`threshold` The significance level, set to 0.05 by default. Bear in mind that its logarithm is used, because the logarithm of the p-values is calculated at every step. This avoids numerical problems, in particular very small p-values (below the machine epsilon) being returned as zero.

`tolb` If only the significance of the variables is considered, many may enter the linear regression model. For this reason, the BIC is also used to validate the inclusion of a candidate variable. If the BIC difference between two successive models is less than the tolerance value, the variable will not enter the model, even if it is statistically significant. Set it to 0 if you do not want this extra check.

`tolr` This is an alternative to the BIC change and it uses the adjusted coefficient of determination. If the increase in the adjusted R^2 is more than `tolr`, the selection continues.

`stopping` This refers to the type of extra checking to do. Set it to "BIC" for the BIC check, to "ar2" for the adjusted R^2 check, or to "BICR2" if you want both criteria to be satisfied.
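The note on `threshold` can be illustrated with a minimal base R sketch. It does not call `cor.fsreg` and is not taken from the Rfast source; the test statistic value `z` is a made-up number chosen only to show why the comparison is made on the log scale.

```r
threshold <- 0.05
z <- 40   # hypothetical, very extreme test statistic (illustration only)

# Two-sided p-value on the raw scale: too small to be represented, returned as 0
p_raw <- 2 * pnorm(z, lower.tail = FALSE)

# The same p-value on the log scale stays finite
p_log <- log(2) + pnorm(z, lower.tail = FALSE, log.p = TRUE)

print(p_raw)
print(p_log)

# The significance check is then made against the logged threshold
print(p_log < log(threshold))
```

On the log scale two very small p-values can still be ranked against each other, whereas on the raw scale both would be reported as zero.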

Details

Forward regression tries the variables one by one using the F-test, in effect a partial F-test each time for the latest variable. This is the same as testing the significance of the coefficient of the latest entered variable. Alternatively, the correlation can be used, in this case the partial correlation coefficient. There is a direct relationship between the t-test statistic and the partial correlation coefficient, so instead of calculating the test statistic we calculate the partial correlation coefficient. Using Fisher's z-transform we get the variance immediately. The test based on the partial correlation coefficient with Fisher's z-transform and the partial F-test (or the coefficient's t-test) are not identical, but they agree for large sample sizes.
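The relationship between the t-test statistic and the partial correlation coefficient can be checked numerically. The following is a base R sketch on simulated data, not the Rfast implementation. The variable names, the simulated model, and the use of regression residuals to obtain the partial correlation are assumptions made for illustration.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)                       # variable assumed to be in the model already
x2 <- 0.5 * x1 + rnorm(n)            # candidate variable
y  <- 1 + x1 + 0.3 * x2 + rnorm(n)

# Partial correlation of y and x2 given x1: correlate the two sets of residuals
ry <- resid(lm(y ~ x1))
rx <- resid(lm(x2 ~ x1))
r  <- cor(ry, rx)

k  <- 1                              # number of variables conditioned on
df <- n - k - 2                      # residual degrees of freedom of lm(y ~ x1 + x2)

# t statistic recovered from the partial correlation
t_from_r <- r * sqrt(df / (1 - r^2))
# t statistic of the x2 coefficient in the full regression
t_lm <- summary(lm(y ~ x1 + x2))$coefficients["x2", "t value"]

# Fisher's z-transform: atanh(r) has approximate variance 1 / (n - k - 3)
z_stat <- atanh(r) * sqrt(n - k - 3)

# Two-sided logged p-values from each approach
logp_t <- log(2) + pt(abs(t_lm), df, lower.tail = FALSE, log.p = TRUE)
logp_z <- log(2) + pnorm(abs(z_stat), lower.tail = FALSE, log.p = TRUE)

print(c(t_from_r = t_from_r, t_lm = t_lm))
print(c(logp_t = logp_t, logp_z = logp_z))
```

The two t statistics coincide up to floating point error, while the two logged p-values differ slightly because Fisher's z-transform relies on a normal approximation.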

Value

A matrix whose columns hold the index of the selected variables, the logged p-value, the test statistic value, and the BIC or adjusted R^2 of each model. In the case of stopping="BICR2" both of these criteria will be returned.

Author(s)

Michail Tsagris

References

Draper, N.R. and Smith H. (1988). Applied regression analysis. New York, Wiley, 3rd edition.

See Also

`score.glms`, `univglms`, `logistic_only`, `poisson_only`, `regression`

Examples

```
library(Rfast)

## 200 observations and 100 predictor variables
x <- matrnorm(200, 100)
y <- rnorm(200)
system.time( cor.fsreg(y, x) )
x <- NULL
```

Example output

```Loading required package: Rcpp