epilogi: The epilogi Variable Selection Algorithm for Continuous Data.

View source: R/epilogi.R

epilogiR Documentation

The epilogi Variable Selection Algorithm for Continuous Data.

Description

The \epsilonpilogi Variable Selection Algorithm for Continuous Data.

Usage

epilogi(y, x, tol = 0.01, alpha = 0.05, parallel = FALSE)

Arguments

y

A vector with the continuous response variable.

x

A matrix with the continuous predictor variables.

tol

The tolerance value for the algortihm to terminate. This takes values greater than 0 and it refers to the change between two successive R^2-adjusted values.

alpha

The significance level to deem a predictor variable is statistically equivalent to a selected variable.

parallel

If set to TRUE, some of the computations take place in parallel (in C++).

Details

The \epsilonpilogi variable selection algorithm (Lakiotaki et al., 2023) is a generalisation of the \gamma-OMP algorithm (Tsagris et al. 2022). It applies the aforementioned algorithm with the addition that it returns possible statistically equivalent predictor(s) for each selected predictor. Once a variable is selected the algorithm searches for possible equivalent predictors using the partial correlation between the residuals.

The heuristic method to consider two predictors R and C informationally equivalent given the current selected predictor S is determined as follows: first, the residuals r of the model using S are computed. Then, if the following two conditions hold R and C are considered equivalent: Ind(R; r | C) and Ind(r ; C | R), where Ind(R; r | C) denotes the conditional independence of R with r given C. When linearity is assumed, the test can be implemented by testing for significance the corresponding partial correlation. The tests Ind return a p-value and independence is accepted when it is larger than a threshold (significance value, argument alpha). Intuitively, R and C are heuristically considered equivalent, if C is known, then R provides no additional information for the residuals r, and if R is known, then C provides no additional information for r.

Value

A list including:

runtime

The runtime of the algorithm.

result

A matrix with two columns. The selected predictor(s) and the adjusted R^2-values.

equiv

A list with the equivalent predictors (if any) corresponding to each selected predictor.

Author(s)

Michail Tsagris.

R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.

References

Lakiotaki K., Papadovasilakis Z., Lagani V., Fafalios S., Charonyktakis P., Tsagris M. and Tsamardinos I. (2023). Automated machine learning for Genome Wide Association Studies. Bioinformatics, 39(9): btad545

Tsagris M., Papadovasilakis Z., Lakiotaki K. and Tsamardinos I. (2022). The \gamma-OMP algorithm for feature selection with application to gene expression data. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 19(2): 1214–1224.

See Also

pcor.equiv

Examples

#simulate a dataset with continuous data
set.seed(1234)
n <- 500
x <- matrix( rnorm(n * 50, 0, 30), ncol = 50 )

#define a simulated class variable
y <- 2 * x[, 1] - 1.5 * x[, 2] + x[, 3] + rnorm(n, 0, 15)

# define some simulated equivalences
x[, 4] <- x[, 1] + rnorm(n, 0, 1)
x[, 5] <- x[, 2] + rnorm(n, 0, 1)

epilogi(y, x, tol = 0.05)

epilogi documentation built on April 4, 2025, 2:02 a.m.