In epertham/xLLiM: High Dimensional Locally-Linear Mapping

preprocess_data

R Documentation

A proposed function for preprocessing high-dimensional data before running gllim, sllim or bllim

Description

preprocess_data() provides relevant clusters for initializing GLLiM, SLLiM, or BLLiM, coupled with feature selection for high-dimensional datasets. It is an alternative to the default initialization implemented in gllim(), sllim() and bllim().

Clusters are initialized with K-means, and variable selection is then performed with a LASSO (glmnet) within each cluster. The features selected in the individual clusters are merged into a single subset of variables, which can be used before running any prediction method of xLLiM.
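The two steps described above can be sketched roughly as follows. This is an illustration only, not the xLLiM source: the helper name `sketch_preprocess`, the choice to cluster on the stacked responses and covariates, the use of one cross-validated LASSO per response, and the `lambda.1se` penalty are all assumptions made for the example.

```r
library(glmnet)

# Hypothetical helper, NOT the xLLiM implementation.
sketch_preprocess <- function(tapp, yapp, in_K, ...) {
  # Step 1: K-means on subjects (columns). Clustering on the stacked
  # responses and covariates is an assumption of this sketch.
  km <- kmeans(t(rbind(tapp, yapp)), centers = in_K)

  selected <- integer(0)
  for (k in seq_len(in_K)) {
    idx <- which(km$cluster == k)
    x <- t(yapp[, idx, drop = FALSE])  # subjects x covariates
    # Step 2: one cross-validated LASSO per response within cluster k.
    for (l in seq_len(nrow(tapp))) {
      fit <- cv.glmnet(x, tapp[l, idx], ...)
      beta <- as.matrix(coef(fit, s = "lambda.1se"))[-1, 1]  # drop intercept
      # Step 3: merge the selections across clusters and responses.
      selected <- union(selected, which(beta != 0))
    }
  }
  list(selected.variables = sort(selected), clusters = km$cluster)
}

# Simulated data: L = 2 responses, D = 20 covariates, N = 100 subjects.
set.seed(1)
yapp <- matrix(rnorm(20 * 100), nrow = 20)
tapp <- rbind(2 * yapp[1, ] + rnorm(100, sd = 0.1),
              -3 * yapp[2, ] + rnorm(100, sd = 0.1))
res <- sketch_preprocess(tapp, yapp, in_K = 2)
```

Taking the union over clusters means that a covariate relevant in only one cluster is still kept for the final model.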

Usage

preprocess_data(tapp,yapp,in_K,...)

Arguments

`tapp`	An `L x N` matrix of training responses with variables in rows and subjects in columns
`yapp`	A `D x N` matrix of training covariates with variables in rows and subjects in columns
`in_K`	Initial number of components or number of clusters
`...`	Additional arguments passed on to glmnet
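Because both matrices are expected with variables in rows and subjects in columns, data stored in the more common subjects-in-rows layout must be transposed first. A minimal sketch with simulated data (the object names `responses_by_subject` and `covariates_by_subject` are hypothetical):

```r
set.seed(1)
N <- 50; D <- 10; L <- 2

# Typical layout: one row per subject.
responses_by_subject  <- matrix(rnorm(N * L), nrow = N, ncol = L)
covariates_by_subject <- matrix(rnorm(N * D), nrow = N, ncol = D)

# Layout expected by the arguments: variables in rows, subjects in columns.
tapp <- t(responses_by_subject)   # L x N
yapp <- t(covariates_by_subject)  # D x N

# Both matrices must describe the same N subjects.
stopifnot(ncol(tapp) == ncol(yapp))
```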

Value

`selected.variables`	Vector of indices of the selected variables. Selection is performed within each cluster, and the per-cluster selections are then merged.
`clusters`	Initial clusters obtained with K-means
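Since `selected.variables` indexes the rows of `yapp`, reducing the covariate matrix is a plain row subset. In the sketch below, `res` is a stand-in for the returned list with made-up indices, so that the snippet runs without the package:

```r
# Stand-in for the list returned by preprocess_data(); the indices are made up.
res <- list(selected.variables = c(2L, 5L, 7L))

set.seed(1)
yapp <- matrix(rnorm(10 * 50), nrow = 10)  # D = 10 covariates, N = 50 subjects

# Keep only the selected covariates; drop = FALSE preserves the matrix
# shape even when a single variable is selected.
yapp_sel <- yapp[res$selected.variables, , drop = FALSE]
dim(yapp_sel)  # 3 50
```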

Author(s)

Emeline Perthame (emeline.perthame@pasteur.fr), Emilie Devijver (emilie.devijver@kuleuven.be), Melina Gallopin (melina.gallopin@u-psud.fr)

References

[1] E. Devijver, M. Gallopin, E. Perthame. Nonlinear network-based quantitative trait prediction from transcriptomic data. Submitted, 2017, available at https://arxiv.org/abs/1701.07899.