Performs classification analysis on the uniformly-handled data by repeatedly re-assigning samples to a training set and a test set. More details can be found in Qin et al. (see reference).
seed: an integer used to initialize the pseudorandom number generator.
N: number of simulation runs.
biological.effect: the estimated biological-effect dataset. This dataset must have probes as rows and samples as columns.
norm.list: a list of strings naming the normalization methods to be compared in the simulation study. The built-in normalization methods are "NN" (No Normalization), "QN" (Quantile Normalization), "MN" (Median Normalization), and "VSN" (Variance Stabilizing Normalization). Users can add other normalization methods, provided the corresponding functions are supplied (see also norm.funcs).
class.list: a list of strings naming the classification methods to be compared in the simulation study. The built-in classification methods are "PAM" (prediction analysis for microarrays) and "LASSO" (least absolute shrinkage and selection operator). Users can add other classification methods, provided the corresponding model-building and predicting functions are supplied (see also class.funcs and pred.funcs).
norm.funcs: a list of strings naming the user-defined normalization functions, in the order of norm.list.
class.funcs: a list of strings naming the user-defined classification model-building functions, in the order of class.list.
pred.funcs: a list of strings naming the user-defined classification predicting functions, in the order of class.list.
The analysis of the uniformly-handled dataset consists of the following main steps:
(1) randomly split the data into a training set and a test set, balanced by the sample group of interest;
(2) preprocess the training data and the test data;
(3) build a classifier using the preprocessed training data;
(4) assess the misclassification error rate of the classifier using the preprocessed test data.
This analysis is repeated for N random splits into training and test sets.
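The split in step (1) and its repetition over N runs can be sketched in base R. This is a minimal illustration, not the package's code: `balanced_split` and `simulate_splits` are hypothetical helpers, and the half-and-half split within each sample group is an assumption, since the proportion used by uni.handled.simulate is not stated here.

```r
# Hypothetical helper (not part of the package): split sample indices into a
# training set and a test set, balanced by sample group. Assumes that half of
# each group (rounded down) goes to the training set.
balanced_split <- function(groups, train_fraction = 0.5) {
  train <- logical(length(groups))
  for (g in unique(groups)) {
    idx <- which(groups == g)
    n_train <- floor(length(idx) * train_fraction)
    # Index through sample.int() to avoid sample()'s special behavior
    # when idx happens to have length 1.
    train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  list(train = which(train), test = which(!train))
}

# Repeat the split for N simulation runs, seeding once for reproducibility
# (analogous to the seed and N arguments).
simulate_splits <- function(groups, N, seed) {
  set.seed(seed)
  lapply(seq_len(N), function(i) balanced_split(groups))
}

groups <- rep(c("A", "B"), each = 10)
splits <- simulate_splits(groups, N = 3, seed = 1)
table(groups[splits[[1]]$train])  # 5 samples from each group
```

Balancing within each group keeps the class proportions of the training and test sets equal to those of the full dataset in every run.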
Data preprocessing in (2) consists of three steps: log2 transformation; normalization of the training data and frozen normalization of the test data; and probe-set summarization using the median. Normalization methods are specified in norm.list.
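As an illustration of step (2), the sketch below applies log2 transformation, quantile normalization of the training data with frozen quantile normalization of the test data, and median summarization over probe sets. It assumes "QN" as the normalization method and that probes of the same probe set share a row name; the helpers `qn_train`, `qn_frozen`, `summarize_median`, and `preprocess_split` are hypothetical and are not the package's implementation. "Frozen" here means the reference distribution estimated from the training data is reused unchanged for the test data.

```r
# Hypothetical sketch (not the package's implementation). Rows are probes,
# columns are samples; probes of the same probe set share a row name.

# Quantile normalization of the training data. Also returns the reference
# distribution (mean of the sorted columns) so it can be frozen.
qn_train <- function(x) {
  ref <- rowMeans(apply(x, 2, sort))
  out <- apply(x, 2, function(col) ref[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(x)
  list(data = out, ref = ref)
}

# Frozen quantile normalization: map each test sample onto the training
# reference distribution without re-estimating it from the test data.
qn_frozen <- function(x, ref) {
  stopifnot(nrow(x) == length(ref))
  out <- apply(x, 2, function(col) ref[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(x)
  out
}

# Probe-set summarization: median of the probes within each probe set.
summarize_median <- function(x) {
  rows <- split(seq_len(nrow(x)), rownames(x))
  do.call(rbind, lapply(rows, function(i) apply(x[i, , drop = FALSE], 2, median)))
}

# The three preprocessing steps for one training/test split.
preprocess_split <- function(train_raw, test_raw) {
  fit <- qn_train(log2(train_raw))
  list(train = summarize_median(fit$data),
       test = summarize_median(qn_frozen(log2(test_raw), fit$ref)))
}

set.seed(1)
raw <- matrix(2^runif(24, 4, 12), nrow = 6,
              dimnames = list(rep(c("ps1", "ps2", "ps3"), each = 2), paste0("s", 1:4)))
pre <- preprocess_split(raw[, 1:2], raw[, 3:4])
dim(pre$train)  # 3 probe sets x 2 samples
```

Because the test data never contribute to the reference distribution, the test set stays independent of the training-side normalization.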
Classifier building in (3) includes choosing the tuning parameter for each method by five-fold cross-validation and measuring classifier accuracy by the misclassification error rate. Classification methods are specified in class.list.
The error rate is estimated both by external validation on the test data and by cross-validation on the training data. For user-defined normalization or classification methods, please refer to the vignette.
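The tuning and error-estimation logic of steps (3) and (4) can be sketched as follows. To stay self-contained, the sketch swaps in a simple nearest-centroid classifier on the top-k features; it is hypothetical and is neither PAM nor LASSO. Its tuning parameter k is chosen by five-fold cross-validation on the training data, the internal error is taken as the cross-validated error at the selected k, and the external error is the error on the held-out test set. All function names are illustrative.

```r
# Misclassification error rate: fraction of samples whose predicted class
# differs from the true class.
error_rate <- function(truth, pred) mean(truth != pred)

# Hypothetical stand-in classifier (NOT PAM or LASSO): nearest centroid on the
# k features with the largest absolute difference in class means.
# x: features x samples; y: character vector with two class labels.
fit_centroid <- function(x, y, k) {
  cls <- sort(unique(y))
  means <- sapply(cls, function(g) rowMeans(x[, y == g, drop = FALSE]))
  keep <- order(abs(means[, 1] - means[, 2]), decreasing = TRUE)[seq_len(k)]
  list(cls = cls, centroids = means[keep, , drop = FALSE], keep = keep)
}

# Assign each sample (column) to the class with the nearest centroid.
predict_centroid <- function(fit, x) {
  xs <- x[fit$keep, , drop = FALSE]
  d <- sapply(seq_along(fit$cls), function(j) colSums((xs - fit$centroids[, j])^2))
  fit$cls[apply(matrix(d, ncol = length(fit$cls)), 1, which.min)]
}

# Choose k by n-fold cross-validation on the training data only.
cv_tune <- function(x, y, grid, n_folds = 5) {
  folds <- sample(rep_len(seq_len(n_folds), ncol(x)))
  cv_err <- sapply(grid, function(k) {
    pred <- character(ncol(x))
    for (f in seq_len(n_folds)) {
      held <- folds == f
      fit <- fit_centroid(x[, !held, drop = FALSE], y[!held], k)
      pred[held] <- predict_centroid(fit, x[, held, drop = FALSE])
    }
    error_rate(y, pred)
  })
  list(best = grid[which.min(cv_err)], cv_error = cv_err)
}

# Internal (cross-validated) and external (test-set) error for one split.
evaluate_split <- function(train_x, train_y, test_x, test_y, grid) {
  tuned <- cv_tune(train_x, train_y, grid)
  fit <- fit_centroid(train_x, train_y, tuned$best)
  c(internal = min(tuned$cv_error),
    external = error_rate(test_y, predict_centroid(fit, test_x)))
}

set.seed(1)
y <- rep(c("A", "B"), each = 20)
x <- matrix(rnorm(50 * 40), nrow = 50)
x[1:5, y == "B"] <- x[1:5, y == "B"] + 2  # 5 informative features
train <- c(1:10, 21:30)
res <- evaluate_split(x[, train], y[train], x[, -train], y[-train], grid = c(1, 5, 20))
res
```

The tuning parameter is selected using the training data alone, so the test set only enters through the final external error.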
Benchmark analysis results: a list of training-and-test-set splits, fitted models, and misclassification error rates across simulation runs, with components:
assign_store: the random training-and-test-set splits.
model_store: the fitted models for each combination of normalization method and classification method.
error_store: the internal and external misclassification error rates for each combination of normalization method and classification method.
Qin LX, Huang HC, Begg CB. Cautionary note on cross validation in molecular classification. Journal of Clinical Oncology. 2016.
## Not run:
# Estimate the biological effects from the uniformly-handled data
biological.effect <- estimate.biological.effect(uhdata = uhdata.pl)
# Identify control probes (row names containing "NC") and remove them
ctrl.genes <- unique(rownames(uhdata.pl))[grep("NC", unique(rownames(uhdata.pl)))]
biological.effect.nc <- biological.effect[!rownames(biological.effect) %in% ctrl.genes, ]
# Run 3 simulations comparing two normalization and two classification methods
uni.handled.results <- uni.handled.simulate(seed = 1, N = 3,
                                            biological.effect = biological.effect.nc,
                                            norm.list = c("NN", "QN"),
                                            class.list = c("PAM", "LASSO"))
## End(Not run)