Perform the simulation study in Qin et al. (see reference).
precision.simulate(seed, N, biological.effect.tr, biological.effect.te,
  handling.effect.tr, handling.effect.te, group.id.tr, group.id.te,
  design.list = c("CC+", "CC-", "PC+", "PC-"), norm.list = c("NN", "QN"),
  class.list = c("PAM", "LASSO"), batch.id = NULL, icombat = FALSE,
  isva = FALSE, iruv = FALSE, biological.effect.tr.ctrl = NULL,
  handling.effect.tr.ctrl = NULL, norm.funcs = NULL, class.funcs = NULL,
  pred.funcs = NULL)
seed |
an integer used to initialize a pseudorandom number generator. |
N |
number of simulation runs. |
biological.effect.tr |
the training set of the estimated biological effects. This dataset must have rows as probes and columns as samples. |
biological.effect.te |
the test set of the estimated biological effects. This dataset must have rows as probes and columns as samples. It must have the same number of probes and the same probe names as the training set of the estimated biological effects. |
handling.effect.tr |
the training set of the estimated handling effects. This dataset must have rows as probes and columns as samples. It must have the same dimensions and the same probe names as the training set of the estimated biological effects. |
handling.effect.te |
the test set of the estimated handling effects. This dataset must have rows as probes and columns as samples. It must have the same dimensions and the same probe names as the training set of the estimated handling effects. |
group.id.tr |
a vector of sample-group labels for each sample of the training set of the estimated biological effects. It must be a 2-level non-numeric factor vector. |
group.id.te |
a vector of sample-group labels for each sample of the test set of the estimated biological effects. It must be a 2-level non-numeric factor vector. |
design.list |
a list of strings for study designs to be compared in the simulation study. The built-in designs are "CC+", "CC-", "PC+", "PC-", "BLK", and "STR" for "Complete Confounding 1", "Complete Confounding 2", "Partial Confounding 1", "Partial Confounding 2", "Blocking", and "Stratification" in Qin et al. |
norm.list |
a list of strings for normalization methods to be compared in the simulation study. The built-in normalization methods are "NN", "QN", "MN", and "VSN" for "No Normalization", "Quantile Normalization", "Median Normalization", and "Variance Stabilizing Normalization". Users can also provide their own normalization methods, provided the corresponding functions are supplied (also see norm.funcs). |
class.list |
a list of strings for classification methods to be compared in the simulation study. The built-in classification methods are "PAM" and "LASSO" for "prediction analysis for microarrays" and "least absolute shrinkage and selection operator". Users can also provide their own classification methods, provided the corresponding model-building and predicting functions are supplied (also see |
batch.id |
a list of array indices grouped by the batches in which the data were profiled.
The length of the list must be equal to the number of batches in the data;
the total number of array indices must be the same as the number of samples.
This is required if stratification study design is specified in |
icombat |
an indicator for ComBat adjustment. By default, |
isva |
an indicator for sva adjustment. By default, |
iruv |
an indicator for RUV-4 adjustment. By default, |
biological.effect.tr.ctrl |
the training set of the negative-control probe biological effect data if |
handling.effect.tr.ctrl |
the training set of the negative-control probe handling effect data if |
norm.funcs |
a list of strings for names of user-defined normalization method functions, in the order of |
class.funcs |
a list of strings for names of user-defined classification model-building functions, in the order of |
pred.funcs |
a list of strings for names of user-defined classification predicting functions, in the order of |
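The input requirements listed above can be checked before a long simulation is started. The following is a minimal sketch in base R; `check.inputs` is a hypothetical helper and not part of the package. It assumes that the indices in batch.id refer to the training arrays (as in the Examples), and it compares only probe names for the test handling effects, since the Examples use test sets with fewer arrays than the training sets.

```r
# Hypothetical helper (not part of the package): verifies the documented
# requirements on the inputs before calling precision.simulate.
check.inputs <- function(bio.tr, bio.te, hand.tr, hand.te,
                         group.tr, group.te, batch.id = NULL) {
  stopifnot(
    # test biological effects: same probes as the training biological effects
    nrow(bio.te) == nrow(bio.tr),
    identical(rownames(bio.te), rownames(bio.tr)),
    # training handling effects: same dimensions and probes as training biological effects
    identical(dim(hand.tr), dim(bio.tr)),
    identical(rownames(hand.tr), rownames(bio.tr)),
    # test handling effects: same probes as training handling effects
    identical(rownames(hand.te), rownames(hand.tr)),
    # one label per sample, non-numeric, exactly two groups
    length(group.tr) == ncol(bio.tr),
    length(group.te) == ncol(bio.te),
    !is.numeric(group.tr), !is.numeric(group.te),
    length(unique(group.tr)) == 2,
    length(unique(group.te)) == 2
  )
  if (!is.null(batch.id)) {
    # assumption: batch.id indexes the training arrays, each array exactly once
    stopifnot(is.list(batch.id),
              !anyDuplicated(unlist(batch.id)),
              setequal(unlist(batch.id), seq_len(ncol(hand.tr))))
  }
  invisible(TRUE)
}

# Toy inputs: 5 probes, 8 training and 4 test samples/arrays
set.seed(1)
probes  <- paste0("probe", 1:5)
bio.tr  <- matrix(rnorm(5 * 8), 5, 8, dimnames = list(probes, NULL))
bio.te  <- matrix(rnorm(5 * 4), 5, 4, dimnames = list(probes, NULL))
hand.tr <- matrix(rnorm(5 * 8), 5, 8, dimnames = list(probes, NULL))
hand.te <- matrix(rnorm(5 * 4), 5, 4, dimnames = list(probes, NULL))
ok <- check.inputs(bio.tr, bio.te, hand.tr, hand.te,
                   group.tr = rep(c("E", "V"), each = 4),
                   group.te = rep(c("E", "V"), each = 2),
                   batch.id = list(1:4, 5:8))
```

The function stops with an error at the first violated requirement and returns TRUE invisibly otherwise.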
The classification analysis of the simulation study consists of the following main steps:
First, precision.simulate requires the training and test sets for both estimated biological effects and estimated handling effects.
The effects can be estimated as follows (using estimate.biological.effect and estimate.handling.effect).
The uniformly-handled dataset is used to approximate the biological effect for each sample,
and the difference between the two arrays for the same sample (one from the uniformly-handled dataset and
the other from the nonuniformly-handled dataset, subtracting the former from the latter)
is used to approximate the handling effect for each array in the nonuniformly-handled dataset.
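The estimation idea described above can be illustrated on toy matrices. This is only a sketch of the subtraction step with made-up data (`uh` and `nuh` are hypothetical); the package functions estimate.biological.effect and estimate.handling.effect may perform additional processing that is not shown here.

```r
# Toy data: 4 probes x 3 samples, each sample profiled once in each dataset.
probes  <- paste0("probe", 1:4)
samples <- paste0("S", 1:3)
uh <- matrix(c(5, 6, 7, 8,
               6, 7, 8, 9,
               4, 5, 6, 7),
             nrow = 4, dimnames = list(probes, samples))
# Nonuniformly-handled data: the same samples plus an array-specific shift.
nuh <- uh + matrix(c(0.5, 0.5, 0.5, 0.5,
                     -1,  -1,  -1,  -1,
                     0,   0.2, 0,   0.2), nrow = 4)

# Biological effect: approximated by the uniformly-handled data.
bio.effect <- uh

# Handling effect: nonuniformly-handled minus uniformly-handled,
# with columns matched by sample name.
hand.effect <- nuh[, colnames(uh)] - uh
```

Each column of `hand.effect` then describes one array of the nonuniformly-handled dataset, independent of the sample that happened to be hybridized to it.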
The samples are randomly split into a training set and a test set, balanced by tumor type (in Qin et al., the training-to-test ratio is 2:1).
The arrays are then non-randomly split into a training set and a test set (in Qin et al., training set n = 128: the first 64 and last 64 arrays
in the order of array processing; test set n = 64: the middle 64 arrays).
This setup allows different pairings of arrays and samples across different training-and-test-set splits.
Furthermore, the biological signal strength and the confounding level of the handling effects can be modified
(using reduce.signal and amplify.handling.effect).
Second, for the training set, data are simulated through "virtual re-hybridization" (using rehybridize)
by first assigning arrays to sample groups using a confounding design or a balanced design, and
then summing the biological effect for a sample and the handling effect for its assigned array.
Rehybridization allows us to examine the use of various array-assignment schemes, specified in design.list.
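A minimal base-R sketch of the re-hybridization idea follows. It is not the package's rehybridize function: `toy.rehybridize`, the toy matrices, and the split of arrays into "early" and "late" halves are assumptions made for illustration only.

```r
set.seed(42)
n.probe <- 5
n <- 8
# Toy biological effects (one column per sample) and handling effects
# (one column per array).
bio <- matrix(rnorm(n.probe * n), n.probe, n,
              dimnames = list(paste0("probe", 1:n.probe), paste0("S", 1:n)))
hand <- matrix(rnorm(n.probe * n, sd = 0.3), n.probe, n,
               dimnames = list(rownames(bio), paste0("A", 1:n)))
group <- rep(c("E", "V"), each = n / 2)  # sample groups

# Assumption: arrays A1-A4 were processed early and A5-A8 late.
# Confounded assignment: group E gets only early arrays, group V only late ones.
confounded <- c(sample(1:4), sample(5:8))
# Balanced assignment: each group gets two early and two late arrays.
balanced <- c(sample(c(1:2, 5:6)), sample(c(3:4, 7:8)))

# Hypothetical helper: sample j is paired with array assignment[j], and the
# simulated value is biological effect + handling effect.
toy.rehybridize <- function(bio, hand, assignment) {
  out <- bio + hand[, assignment]
  colnames(out) <- paste(colnames(bio), colnames(hand)[assignment], sep = ":")
  out
}

sim.confounded <- toy.rehybridize(bio, hand, confounded)
sim.balanced   <- toy.rehybridize(bio, hand, balanced)
```

Because the biological effects stay fixed and only the array assignment changes, any difference between `sim.confounded` and `sim.balanced` comes from the study design alone.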
Third, the analysis for each simulated dataset follows the same steps as described
for the analysis of the uniformly-handled data (also see documentation on uni.handled.simulate):
(1) data preprocessing (normalization methods are specified in norm.list, and
batch-effect adjustment is specified with icombat, isva, and iruv);
(2) classifier training (classification methods are specified in class.list);
(3) classification error estimation using both cross-validation and external validation.
The external validation is based on the test data from the uniformly-handled dataset and serves as the gold standard for the misclassification error estimation.
For a given split of samples into training and test sets,
N datasets will be simulated and analyzed for each array-assignment scheme.
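The two error estimates in step (3) can be contrasted with a small base-R sketch. A nearest-centroid classifier is used here as a simple stand-in for PAM or LASSO, and all data are randomly generated; the function names `centroid.fit`, `centroid.predict`, and `make.data` are hypothetical and not part of the package.

```r
# Nearest-centroid classifier (stand-in for PAM/LASSO).
centroid.fit <- function(x, y) {
  # x: probes x samples; returns a probes x groups matrix of group means
  sapply(split(seq_along(y), y), function(idx) rowMeans(x[, idx, drop = FALSE]))
}
centroid.predict <- function(centroids, x) {
  # squared Euclidean distance from every sample to every centroid
  d <- sapply(seq_len(ncol(centroids)),
              function(k) colSums((x - centroids[, k])^2))
  d <- matrix(d, nrow = ncol(x))
  colnames(centroids)[apply(d, 1, which.min)]
}

set.seed(7)
n.probe <- 20
make.data <- function(n.per.group, shift) {
  y <- rep(c("E", "V"), each = n.per.group)
  x <- matrix(rnorm(n.probe * length(y)), n.probe)
  x[1:5, y == "V"] <- x[1:5, y == "V"] + shift  # 5 informative probes
  list(x = x, y = y)
}
train <- make.data(20, shift = 1.5)
test  <- make.data(10, shift = 1.5)

# Internal error: 5-fold cross-validation within the training set.
folds <- sample(rep(1:5, length.out = length(train$y)))
cv.wrong <- unlist(lapply(1:5, function(k) {
  held <- folds == k
  fit <- centroid.fit(train$x[, !held, drop = FALSE], train$y[!held])
  centroid.predict(fit, train$x[, held, drop = FALSE]) != train$y[held]
}))
internal.error <- mean(cv.wrong)

# External error: fit on the full training set, evaluate on the test set.
fit <- centroid.fit(train$x, train$y)
external.error <- mean(centroid.predict(fit, test$x) != test$y)
```

In the simulation study itself, the training data additionally carry handling effects while the test data come from the uniformly-handled dataset, which is why the external error is treated as the gold standard.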
For user-defined normalization or classification methods, please refer to the vignette.
simulation study results – a list of array-to-sample assignments, fitted models, and misclassification error rates across simulation runs:
assign_store |
array-to-sample assignments for each study design |
model_store |
models for each combination of study designs, normalization methods, and classification methods |
error_store |
internal and external misclassification error rates for each combination of study designs, normalization methods, and classification methods |
Qin LX, Huang HC, Begg CB. Cautionary note on cross validation in molecular classification. Journal of Clinical Oncology. 2016
## Not run:
set.seed(101)
biological.effect <- estimate.biological.effect(uhdata = uhdata.pl)
handling.effect <- estimate.handling.effect(uhdata = uhdata.pl,
                                            nuhdata = nuhdata.pl)
# names of the negative-control probes (probe names containing "NC")
ctrl.genes <- grep("NC", unique(rownames(uhdata.pl)), value = TRUE)
biological.effect.nc <- biological.effect[!rownames(biological.effect) %in% ctrl.genes, ]
handling.effect.nc <- handling.effect[!rownames(handling.effect) %in% ctrl.genes, ]
group.id <- substr(colnames(biological.effect.nc), 7, 7)
# randomly split biological effect data into training and test set with
# equal number of endometrial and ovarian samples
biological.effect.train.ind <- colnames(biological.effect.nc)[
  c(sample(which(group.id == "E"), size = 64),
    sample(which(group.id == "V"), size = 64))]
biological.effect.test.ind <- colnames(biological.effect.nc)[
  !colnames(biological.effect.nc) %in% biological.effect.train.ind]
biological.effect.train.test.split <-
  list("tr" = biological.effect.train.ind,
       "te" = biological.effect.test.ind)
# non-randomly split handling effect data into training and test set
handling.effect.train.test.split <-
  list("tr" = c(1:64, 129:192),
       "te" = 65:128)
biological.effect.nc.tr <- biological.effect.nc[, biological.effect.train.ind]
biological.effect.nc.te <- biological.effect.nc[, biological.effect.test.ind]
handling.effect.nc.tr <- handling.effect.nc[, c(1:64, 129:192)]
handling.effect.nc.te <- handling.effect.nc[, 65:128]
# Simulation without batch adjustment
precision.results <- precision.simulate(seed = 1, N = 3,
  biological.effect.tr = biological.effect.nc.tr,
  biological.effect.te = biological.effect.nc.te,
  handling.effect.tr = handling.effect.nc.tr,
  handling.effect.te = handling.effect.nc.te,
  group.id.tr = substr(colnames(biological.effect.nc.tr), 7, 7),
  group.id.te = substr(colnames(biological.effect.nc.te), 7, 7),
  design.list = c("PC-", "STR"),
  norm.list = c("NN", "QN"),
  class.list = c("PAM", "LASSO"),
  batch.id = list(1:40,
                  41:64,
                  (129:160) - 64,
                  (161:192) - 64))
# Simulation with RUV-4 batch adjustment
biological.effect.ctrl <- biological.effect[rownames(biological.effect) %in% ctrl.genes, ]
handling.effect.ctrl <- handling.effect[rownames(handling.effect) %in% ctrl.genes, ]
biological.effect.tr.ctrl <- biological.effect.ctrl[, biological.effect.train.test.split$tr]
handling.effect.tr.ctrl <- handling.effect.ctrl[, handling.effect.train.test.split$tr]
precision.ruv4.results <- precision.simulate(seed = 1, N = 3,
  biological.effect.tr = biological.effect.nc.tr,
  biological.effect.te = biological.effect.nc.te,
  handling.effect.tr = handling.effect.nc.tr,
  handling.effect.te = handling.effect.nc.te,
  group.id.tr = substr(colnames(biological.effect.nc.tr), 7, 7),
  group.id.te = substr(colnames(biological.effect.nc.te), 7, 7),
  design.list = c("PC-", "STR"),
  norm.list = c("NN", "QN"),
  class.list = c("PAM", "LASSO"),
  batch.id = list(1:40,
                  41:64,
                  (129:160) - 64,
                  (161:192) - 64),
  iruv = TRUE,
  biological.effect.tr.ctrl = biological.effect.tr.ctrl,
  handling.effect.tr.ctrl = handling.effect.tr.ctrl)
## End(Not run)