Feature selection is critical in omics data analysis, both to extract restricted and meaningful molecular signatures from complex, high-dimensional data and to build robust classifiers. This package implements a new method to assess the relevance of the variables for the prediction performance of the classifier. The approach can be run in parallel with the PLS-DA, Random Forest, and SVM binary classifiers. The signatures and the corresponding 'restricted' models are returned, enabling future predictions on new datasets. A Galaxy implementation of the package is available within the Workflow4metabolomics.org online infrastructure for computational metabolomics.
Maintainer: Philippe Rinaudo <[email protected]>
## loading the package (provides biosign and the diaplasma dataset)
library(biosigner)

## loading the diaplasma dataset
data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example
featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## signature selection for all 3 classifiers
## only bootI = 5 bootstraps are used to keep this example fast
## we recommend keeping the default bootI = 50 value for your analyses
set.seed(123)
diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

detach(diaplasma)
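The description mentions that the signatures and 'restricted' models are returned so that new datasets can be predicted. A minimal sketch of that workflow follows, on the same m/z window as the example above. It assumes the accessors getSignatureLs and getAccuracyMN and the predict method with a newdata argument, as recalled from the package help pages; check them against your installed version. The 80/20 split and the names xMN, yFc, trainVi and testVi are illustrative choices, not part of the package.

```r
library(biosigner)

## access the dataset components directly instead of attaching the list
data(diaplasma)
keepVl <- diaplasma$variableMetadata[, "mzmed"] >= 490 &
  diaplasma$variableMetadata[, "mzmed"] < 500
xMN <- diaplasma$dataMatrix[, keepVl]
yFc <- diaplasma$sampleMetadata[, "type"]

## illustrative split: first 80% of the samples for training, the rest for testing
trainVi <- seq_len(floor(0.8 * nrow(xMN)))
testVi <- setdiff(seq_len(nrow(xMN)), trainVi)

## signature selection on the training subset only
## (bootI = 5 keeps the sketch fast; the default is bootI = 50)
set.seed(123)
diaTrain <- biosign(xMN[trainVi, ], yFc[trainVi], bootI = 5)

## selected features per classifier, and accuracies of the full and restricted models
sigLs <- getSignatureLs(diaTrain)
accMN <- getAccuracyMN(diaTrain)

## predictions of the restricted models on the held-out samples
testPredDF <- predict(diaTrain, newdata = xMN[testVi, ])
```

With so few bootstraps and such a narrow m/z window, a signature may be empty for one or more classifiers, in which case no prediction is available for that classifier.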