Trains the classifier, calculates the genes network...
geNetClassifier(eset, sampleLabels, plotsName = NULL,
    buildClassifier = TRUE, estimateGError = FALSE,
    calculateNetwork = TRUE, labelsOrder = NULL, geneLabels = NULL,
    numGenesNetworkPlot = 100,
    minGenesTrain = 1, maxGenesTrain = 100, continueZeroError = FALSE,
    numIters = 6, lpThreshold = 0.95, numDecimals = 3,
    removeCorrelations = FALSE, correlationsThreshold = 0.8,
    correlationMethod = "pearson",
    removeInteractions = FALSE, interactionsThreshold = 0.5,
    minProbAssignCoeff = 1, minDiffAssignCoeff = 0.8,
    IQRfilterPercentage = 0, skipInteractions = TRUE,
    precalcGenesNetwork = NULL, precalcGenesRanking = NULL,
    returnAllGenesRanking = TRUE, kernel = "linear", verbose = TRUE, ...)
eset |
ExpressionSet or matrix. Gene expression of the train samples (positive & non-logarithmic normalized values). |
sampleLabels |
Character. PhenoData variable (column name) containing the train samples class labels. |
labelsOrder |
Vector or Factor. Order in which the labels should be shown in the returned results and plots. |
plotsName |
Character. File name with which the plots should be saved. If not provided, no plots will be drawn. |
buildClassifier |
Logical. If TRUE trains a classifier with the given samples. |
estimateGError |
Logical. If TRUE, uses cross-validation to estimate the Generalization Error of a classifier trained with the given samples. |
calculateNetwork |
Logical. If TRUE calculates the coexpression network between the best genes. |
geneLabels |
Vector or Matrix. Gene name, ID or label which should be shown in the returned results and plots. |
numGenesNetworkPlot |
Integer. Number of genes to show in the coexpression network for each class. |
minGenesTrain |
Integer. Minimum number of genes per class to train the classifier with. |
maxGenesTrain |
Integer. Maximum number of genes per class to train the classifier with. |
continueZeroError |
Logical. If TRUE, the program will continue testing combinations with more genes even if error 0 has been reached. |
numIters |
Integer. Number of iterations to determine the optimum number of genes (between |
lpThreshold |
Numeric between 0 and 1. Required posterior probability value to consider a gene 'significant'. |
removeCorrelations |
Logical. If TRUE, no correlated genes will be chosen to train the classifier. |
correlationsThreshold |
Numeric between 0 and 1. Minimum Pearson's correlation coefficient to consider genes correlated. |
correlationMethod |
"pearson", "kendall" or "spearman". Type of correlation to calculate between genes. |
removeInteractions |
Logical. If TRUE, genes with Mutual Information coefficient over the threshold will not be chosen to train the classifier. |
interactionsThreshold |
Numeric between 0 and 1. Minimum Mutual Information coefficient to consider two genes equivalent. |
numDecimals |
Integer. Number of decimals to show in the statistics. |
minProbAssignCoeff |
Numeric. Allows modifying the required probability to assign a sample to a class in the internal crossvalidation. For details see: |
minDiffAssignCoeff |
Numeric. Allows modifying the difference of probabilities required between the most likely class and second most likely class to assign a sample. For details see: |
IQRfilterPercentage |
Integer. InterQuartile Range (IQR) filter applied to the initial data. Not recommended for more than two classes. |
skipInteractions |
Logical. If TRUE, the interactions between genes are not calculated (they will not appear on the genes network). Saves some execution time. Only available if |
precalcGenesNetwork |
|
precalcGenesRanking |
|
returnAllGenesRanking |
Logical. If TRUE, returns the whole genes ranking. If FALSE the returned ranking contains only the significant genes (genes over lpThreshold). |
verbose |
Logical. If TRUE, messages indicating the execution progress will be shown. |
kernel |
Character. Type of SVM kernel. Default: "linear", |
... |
Other arguments to pass to the |
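The arguments removeCorrelations, correlationsThreshold and correlationMethod describe a filter on correlated genes. The following is a minimal base-R sketch of that idea on a toy matrix, not the package's internal code. The toy data, the greedy keep-in-order rule and the use of the absolute correlation are all assumptions made for illustration.

```r
# Illustrative only: NOT geNetClassifier's implementation.
set.seed(1)

# Toy expression matrix: 4 genes (rows) x 20 samples (columns).
# Row order stands in for a gene ranking (best gene first).
g1 <- rnorm(20)
exprs <- rbind(
  geneA = g1,
  geneB = g1 + rnorm(20, sd = 0.05),  # nearly a copy of geneA
  geneC = rnorm(20),
  geneD = rnorm(20)
)

correlationsThreshold <- 0.8        # same default as in the usage above
correlationMethod     <- "pearson"  # or "kendall" / "spearman"

# cor() correlates columns, so transpose to get a gene-by-gene matrix.
# Taking abs() is an assumption: it treats strong negative correlation
# as redundancy too.
corMat <- abs(cor(t(exprs), method = correlationMethod))

# Walk the genes in ranking order; keep a gene only if it is not
# correlated above the threshold with any gene already kept.
kept <- character(0)
for (gene in rownames(exprs)) {
  if (all(corMat[gene, kept] < correlationsThreshold)) {
    kept <- c(kept, gene)
  }
}
kept
```

In this toy case geneB is dropped because it is almost identical to the higher-ranked geneA.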
A GeNetClassifierReturn object containing the classifier and the genes chosen to train it (classificationGenes), Cross-Validation statistics, the whole GenesRanking and each class' GenesNetwork (if requested).
Several plots saved as 'plotsName_....pdf' in the working directory.
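Since the plots are written to the working directory with names starting with plotsName, they can be located afterwards with base R. The sketch below runs on a temporary directory with dummy files so it is self-contained. The suffix "_example" is a hypothetical placeholder, not an actual file name produced by the function.

```r
plotsName <- "leukemiasClassifier"

# Stand-in directory with dummy files so the sketch runs on its own.
# After a real run, use getwd() instead of demoDir.
demoDir <- tempfile("plotsDemo")
dir.create(demoDir)
file.create(file.path(demoDir, c(
  paste0(plotsName, "_example.pdf"),  # hypothetical plot file name
  "unrelated.pdf",
  "notes.txt"
)))

# Match files named '<plotsName>_<anything>.pdf'
plotFiles <- list.files(demoDir,
                        pattern = paste0("^", plotsName, "_.*\\.pdf$"))
plotFiles
```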
Bioinformatics and Functional Genomics Group. Centro de Investigacion del Cancer (CIC-IBMCC, USAL-CSIC). Salamanca. Spain
Packages used by this function:
EBarrays: emfit (implements EM algorithm for gene expression mixture model) and ebPatterns, for calculating the gene ranking.
Ming Yuan, Michael Newton, Deepayan Sarkar and Christina Kendziorski (2007). EBarrays: Unified Approach for Simultaneous Gene Clustering and Differential Expression Identification. R package.
e1071: svm.
Evgenia Dimitriadou, Kurt Hornik, Friedrich Leisch, David Meyer and Andreas Weingessel (2011). e1071: Misc Functions of the Department of Statistics (e1071), TU Wien. R package. http://CRAN.R-project.org/package=e1071
ipred: kfoldcv (computes feasible sample sizes for the k groups in k-fold cv) for the cross-validations.
Andrea Peters and Torsten Hothorn (2012). ipred: Improved Predictors. R package. http://CRAN.R-project.org/package=ipred
minet for the Mutual Information network.
Patrick E. Meyer, Frederic Lafitte and Gianluca Bontempi (2008). MINET: An open source R/Bioconductor Package for Mutual Information based Network Inference. BMC Bioinformatics. http://www.biomedcentral.com/1471-2105/9/461
RColorBrewer (brewer.pal) for palettes in some of the plots.
Erich Neuwirth (2011). RColorBrewer: ColorBrewer palettes. R package. http://CRAN.R-project.org/package=RColorBrewer
igraph for the graphical representation of the networks.
Csardi G, Nepusz T (2006). The igraph software package for complex network research. InterJournal, Complex Systems 1695. http://igraph.sf.net
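The reference list above mentions computing feasible sample sizes for the k groups of a k-fold cross-validation. As a rough illustration of what that means, the base-R helper below splits N samples into k groups whose sizes differ by at most one. The function name foldSizes is hypothetical, and this is a stand-in for the idea rather than ipred's code, which may distribute the remainder differently.

```r
# Hypothetical helper: sizes of k cross-validation groups for N samples.
# The first (N %% k) groups receive one extra sample.
foldSizes <- function(N, k) {
  stopifnot(k >= 1, k <= N)
  base  <- N %/% k   # samples every group gets
  extra <- N %% k    # leftover samples to spread over the first groups
  base + (seq_len(k) <= extra)
}

sizes50 <- foldSizes(50, 5)  # evenly divisible: five groups of 10
sizes52 <- foldSizes(52, 5)  # two groups of 11, three groups of 10
sizes50
sizes52
```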
To query the classifier: queryGeNetClassifier
All functions in the package: geNetClassifier-package
########
# Load libraries and training data
########
# Load an expressionSet:
library(leukemiasEset)
data(leukemiasEset)
# Select the train samples:
trainSamples <- c(1:10, 13:22, 25:34, 37:46, 49:58)
# summary(leukemiasEset$LeukemiaType[trainSamples])
########
# Training
########
# NOTE: Training the classifier takes a while...
# Choose ONE of the following, or modify to suit your needs:
## Not run:
# "Basic" execution: All default parameters
leukemiasClassifier <- geNetClassifier(eset=leukemiasEset[,trainSamples],
    sampleLabels="LeukemiaType", plotsName="leukemiasClassifier")
# All default parameters, also estimating the classifier's Generalization Error:
# (by default: buildClassifier=TRUE, calculateNetwork=TRUE)
# Takes longer than the basic execution
leukemiasClassifier <- geNetClassifier(eset=leukemiasEset[,trainSamples],
    sampleLabels="LeukemiaType", plotsName="leukemiasClassifier",
    estimateGError=TRUE)
# Faster execution (a few minutes, depending on the computer):
# By skipping the calculation of the interactions (MI) between the genes,
# and reducing the number of genes to explore when training the classifier
# (100 by default), the execution time can be slightly reduced
leukemiasClassifier <- geNetClassifier(eset=leukemiasEset[,trainSamples],
    sampleLabels="LeukemiaType", plotsName="leukemiasClassifier",
    skipInteractions=TRUE, maxGenesTrain=20)
# To any of these examples, you can add/remove the argument geneLabels,
# in order to show/remove the gene name in the rankings and plots:
# The argument labelsOrder allows showing the classes in a specific order
# e.g.: labelsOrder=c("ALL","CLL","AML","CML","NoL")
save(leukemiasClassifier, file="leukemiasClassifier.RData") # Save execution result
# For loading the saved object in the future...
# (If it doesn't find it, use getwd() to make sure you are in the right directory)
#load("leukemiasClassifier.RData")
# To avoid having to train a classifier to continue learning to use the package,
# you can load the package's pre-executed example:
data(leukemiasClassifier)
#This example classifier was trained with the following code:
#leukemiasClassifier <- geNetClassifier(leukemiasEset[,trainSamples],
# "LeukemiaType", plotsName="leukemiasClassifier", buildClassifier=TRUE,
# estimateGError=TRUE, calculateNetwork=TRUE, geneLabels=geneSymbols)
########
# Explore the returned object:
########
names(leukemiasClassifier)
# More details on the class' help file:
?GeNetClassifierReturn
# Further options:
# The trained classifier can be used to find the class of new samples:
?queryGeNetClassifier
# The default plots can be modified and personalized to fit the user's needs:
?calculateGenesRanking
?plotNetwork
?plotDiscriminantPower
?plotExpressionProfiles
## End(Not run)
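The examples select the train samples with a hard-coded index vector. A more general way to pick the first few samples of each class is sketched below in base R. The label vector is a toy stand-in that assumes five classes of 12 consecutive samples each. That layout is an assumption for illustration and is not read from leukemiasEset.

```r
# Toy stand-in for a class label vector (assumed layout: 5 classes x 12 samples)
sampleLabels <- factor(rep(c("ALL", "AML", "CLL", "CML", "NoL"), each = 12))
nPerClass <- 10

# Sample indices grouped by class, keeping the first nPerClass of each
byClass <- split(seq_along(sampleLabels), sampleLabels)
trainSamples <- unlist(lapply(byClass, head, nPerClass), use.names = FALSE)

# The remaining samples can be held out to query the classifier later
testSamples <- setdiff(seq_along(sampleLabels), trainSamples)

# Check the class balance of the selection
table(sampleLabels[trainSamples])
```

With this toy layout the result matches the index vector used in the examples, c(1:10, 13:22, 25:34, 37:46, 49:58).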