# VarSelection: Variable Selection (LinkHD: a versatile framework to explore and integrate heterogeneous data)

## Description

Performs variable selection using a regression biplot methodology. The function fits the regression biplot on the compromise matrix. A biplot can be understood as the decomposition of a target matrix, $Y = XB$. Here $Y$ is the matrix containing all variables included in the analysis, $X$ is the matrix of explanatory variables (the coordinates of the compromise matrix), and $B$ is the matrix of regression coefficients to be estimated. The method can therefore be interpreted as a general linear regression on $X$: the least-squares estimate is $\hat{B} = (X^\top X)^{-1} X^\top Y$, so that $\hat{Y} = X(X^\top X)^{-1} X^\top Y$, where $X(X^\top X)^{-1} X^\top$ is the projection matrix onto the compromise configuration. A classical linear model is used to obtain the regression coefficients, although the model could be extended and alternative methods could be used. The quality of the regression biplot is measured by the proportion of variance explained by each regression (the adjusted $R^2$ coefficient).
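The base-R sketch below illustrates the regression described above on simulated data. It is not LinkHD's internal code: the matrices `X` (standing in for the first `nDims` compromise coordinates) and `Y` (the observed variables) are made up, and no intercept is fitted, mirroring the default `intercept = FALSE`.

```r
# Minimal sketch of a regression biplot fit (illustration only, not LinkHD's code).
# Assumption: X holds the first nDims compromise coordinates (n x nDims) and
# Y holds the observed variables (n x p); both are simulated here.
set.seed(1)
n <- 30; p <- 5; nDims <- 2
X <- matrix(rnorm(n * nDims), n, nDims)          # hypothetical compromise coordinates
B_true <- matrix(rnorm(nDims * p), nDims, p)     # hypothetical true coefficients
Y <- X %*% B_true + matrix(rnorm(n * p, sd = 0.5), n, p)
colnames(Y) <- paste0("var", seq_len(p))

# Least-squares estimate B_hat = (X'X)^{-1} X'Y (no intercept)
B_hat <- solve(crossprod(X), crossprod(X, Y))

# Projection (hat) matrix onto the column space of X, and fitted values Y_hat = H Y
H <- X %*% solve(crossprod(X)) %*% t(X)
Y_hat <- H %*% Y

# One linear model per column of Y; keep each model's adjusted R^2
adj_r2 <- apply(Y, 2, function(y) summary(lm(y ~ X - 1))$adj.r.squared)
print(round(adj_r2, 3))
```

Because no intercept is fitted, `summary.lm` reports an uncentered $R^2$ (computed relative to zero); fitting `lm(y ~ X)` instead would correspond to a model with an intercept.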

## Usage

```r
VarSelection(x, Data, intercept = FALSE, model = "LM", Crit = "Rsquare",
             perc = 0.9, nDims = 2, Normalize = FALSE)
```

## Arguments

- `x`: an object of class `DistStatis`, such as the output of `LinkData`.
- `Data`: a list of `data.frame` or `ExpressionSet` objects whose length equals the number of tables to be integrated. In each data frame, the observations (the elements common to all tables in STATIS) are in rows and the variables are in columns. These must be the same data used to obtain the compromise configuration. `Data` can also be a `MultiAssayExperiment` object; see the help page of `LinkData` and the package vignette.
- `intercept`: logical. If `TRUE`, models with an intercept are fitted; otherwise the intercept is fixed at zero.
- `model`: character. `'LM'` fits a classical `lm` model. Alternative models are planned for future versions.
- `Crit`: character string giving the variable selection criterion, either `'Rsquare'` or `'p-val'`.
- `perc`: numeric percentile that determines how many variables are selected.
- `nDims`: numeric; the number of compromise dimensions used to fit each model. Default is 2.
- `Normalize`: logical. If `TRUE`, the response variable in each model is normalized.
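The arguments `Crit` and `perc` together determine which variables are retained. The sketch below shows one plausible reading, which is an assumption not confirmed by this page: variables whose criterion value lies above the `perc` quantile of all values are kept. The helper `select_by_percentile` is hypothetical and the scores are made up. For `Crit = 'p-val'`, where smaller values are better, the comparison would presumably be reversed.

```r
# Hypothetical selection rule (an assumption, not LinkHD's documented behaviour):
# keep the variables whose criterion lies above the perc-quantile of all scores.
select_by_percentile <- function(scores, perc = 0.9) {
  stopifnot(perc >= 0, perc <= 1)
  threshold <- quantile(scores, probs = perc, names = FALSE)
  names(scores)[scores > threshold]
}

# Made-up adjusted R^2 values for ten variables: 0.05, 0.15, ..., 0.95
r2 <- setNames(seq(0.05, 0.95, by = 0.1), paste0("var", 1:10))
select_by_percentile(r2, perc = 0.9)
select_by_percentile(r2, perc = 0.5)
```

With ten scores, `perc = 0.9` leaves only the highest-scoring variable, while `perc = 0.5` keeps the upper half; under this reading, a larger `perc` means a stricter selection.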

## Value

An object of class `VarSelection` whose slots are filled in according to the given model.

## Author(s)

Laura M Zingaretti

## References

1. Gabriel, K. (1971). The biplot graphic display of matrices with application to principal component analysis. Biometrika 58(3), 453–467.

2. Gower, J. & Hand, D. (1996). Biplots. Monographs on Statistics and Applied Probability 54. London: Chapman and Hall, 277 pp.

3. Greenacre, M. J. (2010). Biplots in Practice. Fundación BBVA.

## Examples

```r
library(LinkHD)

data(Taraoceans)
pro.phylo <- Taraoceans$taxonomy[ , 'Phylum']
TaraOc <- list(Taraoceans$phychem, as.data.frame(Taraoceans$pro.phylo),
               as.data.frame(Taraoceans$pro.NOGs))

# Standardize the physicochemical table
TaraOc_1 <- scale(TaraOc[[1]])

# Compositional normalization of the two abundance tables
Normalization <- lapply(list(TaraOc[[2]], TaraOc[[3]]),
                        function(x) DataProcessing(x, Method = 'Compositional'))
colnames(Normalization[[1]]) <- pro.phylo
colnames(Normalization[[2]]) <- Taraoceans$GO

TaraOc <- list(TaraOc_1, Normalization[[1]], Normalization[[2]])
names(TaraOc) <- c('phychem', 'pro_phylo', 'pro_NOGs')
TaraOc <- lapply(TaraOc, as.data.frame)

# Build the compromise; the result is passed to VarSelection as `x`
Output <- LinkData(TaraOc, Scale = FALSE,
                   Distance = c('ScalarProduct', 'Euclidean', 'Euclidean'))

# Variable selection with the R-squared criterion
Selection <- VarSelection(Output, TaraOc, Crit = 'Rsquare', perc = 0.95)
```

LinkHD documentation built on Nov. 8, 2020, 5:08 p.m.