This function subsamples the columns (arrays) of a microarray data set and performs two-sample t-tests. Subsamples from each treatment group are obtained and combined. A t-test is conducted for each row (gene) of the combined subsample, and the p-value density at one is estimated for each combined subsample.
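As a rough illustration of what happens to a single combined subsample, the base-R sketch below draws columns from each group, runs a t-test per row, and estimates the p-value density at one. The toy data, the column layout (group 1 columns first), the equal-variance t-test, and the last-bin estimator are all assumptions made for the sketch, not the package's implementation.

```r
set.seed(1)

# Hypothetical toy data: 200 genes; assume 4 arrays from group 1
# come first, followed by 5 arrays from group 2.
n1 <- 4
n2 <- 5
dat <- matrix(rnorm(200 * (n1 + n2)), nrow = 200)

# Draw one combined subsample: m1 columns from group 1, m2 from group 2.
m1 <- 3
m2 <- 3
cols1 <- sample(seq_len(n1), m1)
cols2 <- n1 + sample(seq_len(n2), m2)

# Two-sample t-test for each row (gene) of the subsample.
# var.equal = TRUE is an assumption of this sketch.
pvals <- apply(dat, 1, function(x) {
  t.test(x[cols1], x[cols2], var.equal = TRUE)$p.value
})

# Crude estimate of the p-value density at 1: the height of the
# last histogram bin (an illustrative stand-in estimator).
bin_width <- 0.2
f1_hat <- mean(pvals > 1 - bin_width) / bin_width
f1_hat
```

With purely null data such as this, the estimate should be near 1, since the p-values are roughly uniform.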
dat | a numeric matrix, the microarray data set with each row being a gene, and each column being a subject. The first |
n1 | a positive integer, the original sample size in treatment group 1. |
n2 | a positive integer, the original sample size in treatment group 2. |
f1method | character, the name of the function to be used to estimate the p-value density at 1. The first argument of the function needs to be a vector of values. |
max.reps | a positive integer, the maximum number of subsamples to obtain per subsample size configuration. If this is set to |
balanced | logical, indicating whether only balanced subsamples are obtained. This is computationally faster and is good for initial exploration purposes. |
... | additional arguments used by |
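Since f1method names a function whose first argument is a vector of values, a user-defined estimator might look like the sketch below. The function name `my_lastbin` and its `width` parameter are hypothetical and chosen only for illustration; how extra arguments reach the estimator is not shown here.

```r
# Hypothetical estimator of the p-value density at 1: the proportion of
# p-values falling in (1 - width, 1], divided by the bin width.
my_lastbin <- function(p, width = 0.2) {
  stopifnot(is.numeric(p), width > 0, width <= 1)
  mean(p > 1 - width) / width
}

# Small made-up vector of p-values to exercise the function.
p <- c(0.01, 0.03, 0.50, 0.85, 0.90, 0.95, 0.99, 0.20, 0.60, 0.70)
f1 <- my_lastbin(p)
f1
```

Under the description above, such a function would presumably be selected by passing its name as a character string, for example f1method="my_lastbin".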
This function tries to get possible subsamples through combn2R.
For each total subsample size M=3,4,...,N, where N=n1+n2, do the following:
1. For each treatment 1 subsample size m1=1,2,...,n1, let m2=M-m1. If 1<=m2<=n2 and at least one of !balanced and m1=m2 is true, then do the following:
1.1 Randomly choose max.reps subsamples among all possible subsamples obtained by choosing m1 subjects from treatment group 1 and m2 subjects from treatment group 2, using the function combn2R with sample.method="diff2" and try.rest=TRUE. Note that this may not always be possible because of practical computational limitations. See combn2R for details.
1.2 For each subsample obtained in 1.1, (1) do a t-test for each gene (i.e., each row of the subsample), and (2) estimate the p-value density at one.
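The loop over subsample size configurations can be sketched in base R as follows. This is only an enumeration of the (m1, m2) pairs, assuming that balanced=TRUE restricts the loop to m1=m2 as the argument description states; the helper name `subsample_configs` is hypothetical, and the actual sampling through combn2R is not reproduced.

```r
# List the (M, m1, m2) configurations visited for given group sizes.
# Assumption: balanced = TRUE keeps only configurations with m1 == m2.
subsample_configs <- function(n1, n2, balanced = FALSE) {
  N <- n1 + n2
  stopifnot(N >= 3)
  out <- list()
  for (M in 3:N) {
    for (m1 in seq_len(n1)) {
      m2 <- M - m1
      if (m2 >= 1 && m2 <= n2 && (!balanced || m1 == m2)) {
        out[[length(out) + 1]] <- c(M = M, m1 = m1, m2 = m2)
      }
    }
  }
  # One row per configuration; NULL if none qualifies.
  do.call(rbind, out)
}

cfg_balanced <- subsample_configs(3, 3, balanced = TRUE)
cfg_all <- subsample_configs(3, 3)
cfg_balanced
cfg_all
```

For n1=n2=3, the balanced case keeps only (2,2) and (3,3), while the unbalanced case visits every pair except (1,1), whose total size of 2 lies below the starting value M=3.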
An object of class c("subt","matrix"), which is a G-by-3 numeric matrix, where G is nrow(dat), with column names 'f1', 'n1', and 'n2', corresponding to the p-value density at 1 and the subsample size in each treatment group. This object also has the following attributes:
n1 | the same as the argument |
n2 | the same as the argument |
f1method | the same as the argument |
max.reps | the same as the argument |
balanced | the same as the argument |
max.reps applies to each subsample size configuration. For example, 2 subjects subsampled from treatment group 1 and 3 subjects subsampled from treatment group 2 are considered a different subsample size configuration from 3 subjects subsampled from treatment group 1 and 2 subjects subsampled from treatment group 2. For the small sample sizes commonly seen in microarray data, a large max.reps is rarely a big computational burden. Be careful with very large sample sizes, however, as the number of all possible subsamples grows very fast.
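To get a feel for how quickly the number of possible subsamples grows, the count for one configuration can be computed with binomial coefficients. The helper name `n_subsamples` is hypothetical; the sketch only counts combinations and does not call the package.

```r
# Number of distinct subsamples for one configuration:
# choose m1 of n1 subjects in group 1 and m2 of n2 subjects in group 2.
n_subsamples <- function(n1, n2, m1, m2) {
  choose(n1, m1) * choose(n2, m2)
}

small_a <- n_subsamples(5, 5, 2, 3)     # 10 * 10 = 100
small_b <- n_subsamples(5, 5, 3, 2)     # a different configuration, also 100
large <- n_subsamples(20, 20, 10, 10)   # 184756^2, roughly 3.4e10
c(small_a, small_b, large)
```

With 5 subjects per group, exhaustive enumeration is cheap, whereas with 20 per group a single configuration already has tens of billions of subsamples, which is where a finite max.reps matters.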
Long Qu
Qu, L., Nettleton, D., Dekkers, J.C.M. Subsampling Based Bias Reduction in Estimating the Proportion of Differentially Expressed Genes from Microarray Data. Unpublished manuscript.
print.subt, plot.subt, extrp.pi0, matrix.t.test, combn2R, subex, lastbin, qvalue
## Not run:
set.seed(9992722)
## this is how the 'simulatedDat' data set in this package was generated
simulatedDat=sim.dat(G=5000)
## this is how the 'simulatedSubt' object in this package was generated
simulatedSubt=subt(simulatedDat,balanced=FALSE,max.reps=Inf)
## End(Not run)
data(simulatedSubt)
print(simulatedSubt)