subt: Subsampling a Microarray Data Set for Estimating Proportion...



Description

This function subsamples the columns (arrays) of a microarray data set and performs two-sample t-tests. Subsamples are drawn from each treatment group and combined. A t-test is then conducted for each row (gene) of the combined subsample, and the p-value density at 1 is estimated for each combined subsample.
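The procedure for a single combined subsample can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy data, the equal-variance t-test, and the helper `lastbin_sketch` (a simple last-bin histogram estimate of the p-value density at 1) are all hypothetical stand-ins.

```r
set.seed(1)

# Toy data (hypothetical): 200 genes in rows, 6 arrays per group in columns.
G <- 200; n1 <- 6; n2 <- 6
dat <- matrix(rnorm(G * (n1 + n2)), nrow = G)
dat[1:40, seq_len(n1)] <- dat[1:40, seq_len(n1)] + 2  # shift 40 genes in group 1

# Draw one subsample: m1 columns from group 1 and m2 columns from group 2.
m1 <- 3; m2 <- 4
cols1 <- sample(seq_len(n1), m1)
cols2 <- n1 + sample(seq_len(n2), m2)

# Row-wise two-sample t-test (equal variances are an assumption of this sketch).
pvals <- apply(dat, 1, function(x) {
  t.test(x[cols1], x[cols2], var.equal = TRUE)$p.value
})

# Hypothetical estimator: share of p-values in (1 - bw, 1], divided by bw.
lastbin_sketch <- function(p, bw = 0.2) mean(p > 1 - bw) / bw

f1 <- lastbin_sketch(pvals)
c(f1 = f1, n1 = m1, n2 = m2)  # one row of the kind of summary subt collects
```

Repeating these steps over many subsamples and subsample sizes would give one such row per combined subsample.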

Usage

subt(dat, n1 = round(ncol(dat)/2), n2 = ncol(dat) - n1,
     f1method = c("lastbin", "qvalue"),
     max.reps = if (balanced) 20 else 5, balanced = FALSE, ...)

Arguments

dat

a numeric matrix, the microarray data set, with each row being a gene and each column being a subject. The first n1 columns correspond to treatment group 1 and the remaining n2 columns correspond to treatment group 2.

n1

a positive integer, the original sample size in treatment group 1.

n2

a positive integer, the original sample size in treatment group 2.

f1method

character, the name of the function used to estimate the p-value density at 1. The first argument of that function must be a vector of p-values.

max.reps

a positive integer, the maximum number of subsamples to obtain per subsample size configuration. If this is set to Inf, all possible subsamples will be tried. However, see Note and the R argument of combn2R.

balanced

logical, indicating whether only balanced subsamples are obtained. Restricting to balanced subsamples is computationally faster and is suitable for initial exploration.

...

additional arguments used by f1method.

Details

This function tries to get possible subsamples through combn2R.
For each total subsample size M=3,4,...,N, where N=n1+n2, do the following,

Value

an object of class c("subt","matrix"), which is a G-by-3 numeric matrix, where G is nrow(dat), with column names 'f1', 'n1', and 'n2', corresponding to the p-value density at 1 and the subsample size in each treatment group. This object also has the following attributes:

n1

the same as the argument n1.

n2

the same as the argument n2.

f1method

the same as the argument f1method.

max.reps

the same as the argument max.reps.

balanced

the same as the argument balanced.
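As a rough illustration of working with a result of this shape, the sketch below builds a small mock object by hand and reads its columns and attributes with base R. The numbers are made up and the object is not produced by subt; only the column names, class, and attribute names follow the description above.

```r
# Mock object with the documented column names; all values are invented.
res <- cbind(f1 = c(0.90, 0.86, 0.84, 0.80),
             n1 = c(2, 2, 3, 5),
             n2 = c(3, 3, 2, 5))
attr(res, "n1") <- 5
attr(res, "n2") <- 5
attr(res, "f1method") <- "lastbin"
attr(res, "max.reps") <- 5
attr(res, "balanced") <- FALSE
class(res) <- c("subt", "matrix")

# Read back the attributes that mirror the arguments.
attr(res, "f1method")
attr(res, "max.reps")

# Average the estimated density at 1 within each (n1, n2) configuration.
m <- unclass(res)
df <- data.frame(f1 = m[, "f1"], n1 = m[, "n1"], n2 = m[, "n2"])
avg <- aggregate(f1 ~ n1 + n2, data = df, FUN = mean)
avg
```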

Note

max.reps applies to each subsample size configuration. For example, 2 subjects subsampled from treatment group 1 and 3 subjects subsampled from treatment group 2 are treated as a different subsample size configuration from 3 subjects subsampled from treatment group 1 and 2 subjects subsampled from treatment group 2. For the small sample sizes commonly seen in microarray data, a large max.reps is rarely a big computational burden. Be careful with very large sample sizes, however, because the number of possible subsamples grows very quickly.
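To get a feel for this growth, the number of distinct subsamples in a configuration with m1 arrays from group 1 and m2 arrays from group 2 is choose(n1, m1) * choose(n2, m2). The base R sketch below tabulates these counts. The helper names are hypothetical, and the rule used to enumerate configurations (at least one array per group and at least three arrays in total) is an assumption, not taken from the package.

```r
# Number of distinct subsamples for a configuration (m1, m2).
n_subsamples <- function(m1, m2, n1, n2) choose(n1, m1) * choose(n2, m2)

# Enumerate configurations. Assumption: at least one array per group
# and a total subsample size of at least 3.
configs <- function(n1, n2) {
  g <- expand.grid(m1 = seq_len(n1), m2 = seq_len(n2))
  g <- g[g$m1 + g$m2 >= 3, ]
  g$count <- n_subsamples(g$m1, g$m2, n1, n2)
  g
}

small <- configs(5, 5)    # typical small microarray study
large <- configs(20, 20)  # larger study

sum(small$count)  # 936 subsamples in total: exhaustive enumeration is cheap
max(large$count)  # choose(20, 10)^2, far too many to enumerate
```

With counts like the second one, a finite max.reps is the practical choice.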

Author(s)

Long Qu

References

Qu, L., Nettleton, D., Dekkers, J.C.M. Subsampling Based Bias Reduction in Estimating the Proportion of Differentially Expressed Genes from Microarray Data. Unpublished manuscript.

See Also

print.subt, plot.subt, extrp.pi0, matrix.t.test, combn2R, subex, lastbin, qvalue

Examples

## Not run: 
set.seed(9992722)
## this is how the 'simulatedDat' data set in this package was generated
simulatedDat=sim.dat(G=5000)        
## this is how the 'simulatedSubt' object in this package was generated
simulatedSubt=subt(simulatedDat,balanced=FALSE,max.reps=Inf) 

## End(Not run)
data(simulatedSubt)
print(simulatedSubt)

pi0 documentation built on July 9, 2017, 9:01 a.m.