PCS-package: Probability of Correct Selection (PCS)


Description

These functions calculate the probability of correct selection (PCS) with G-best, d-best, and L-best selection as described in Cui & Wilson (2008) and Cui, Zhao, & Wilson (2008). The specific parameters (G,d,L), distributional assumptions (normal, Student's t, non-parametric), and calculation method (exact, bootstrap) are user-settable.

Details

Package: PCS
Type: Package
Version: 1.0
Date: 2009-04-15
License: What license is it under?
LazyLoad: yes

The probability of correct selection (PCS) is the probability of selecting the best t of k populations. This package calculates the PCS for a given dataset. When k is large, the PCS may be small because of sampling variability, even if the best t populations differ significantly from the rest. To address this, Cui & Wilson (2008, 2009) introduced three tuning parameters that relax the definition of correct selection (d-best, G-best, L-best) to standards that are more realistic and acceptable for large k problems. This package implements these three definitions using several calculation methods.
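
To make the plain definition concrete, the following base R sketch estimates the PCS by simulation for normally distributed sample means and shows how it drops as k grows. It is only an illustration under stated assumptions: the function name sim_pcs and its arguments (theta, t, sd, n, reps) are hypothetical and not part of the PCS package, and the d-best, G-best, and L-best relaxations are not modelled.

```r
# Monte Carlo estimate of the plain PCS: the chance that the t populations
# with the largest sample means are exactly the t with the largest true means.
# sim_pcs and its arguments are illustrative names, not PCS package functions.
sim_pcs <- function(theta, t, sd = 1, n = 10, reps = 2000) {
  k <- length(theta)
  best <- order(theta, decreasing = TRUE)[seq_len(t)]  # indices of the true best t
  hits <- 0
  for (r in seq_len(reps)) {
    # Sample mean of n normal observations for each population
    xbar <- rnorm(k, mean = theta, sd = sd / sqrt(n))
    picked <- order(xbar, decreasing = TRUE)[seq_len(t)]
    if (setequal(picked, best)) hits <- hits + 1
  }
  hits / reps
}

set.seed(1)
theta_small <- seq(0, 1, length.out = 5)   # k = 5
theta_large <- seq(0, 1, length.out = 50)  # k = 50: same range, tighter spacing
pcs_small <- sim_pcs(theta_small, t = 2)
pcs_large <- sim_pcs(theta_large, t = 2)
```

With the same range of true means, packing more populations into it makes an exact match of the best t much less likely, which is the motivation for the relaxed definitions.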

The PCS package provides three primary user-facing functions: PCS.boot.par, PCS.boot.np, and PCS.exact. PCS.boot.par and PCS.boot.np use parametric and non-parametric bootstraps, respectively, to calculate d-best, G-best, and L-best PCS. PCS.boot.par is the fastest function for large k problems. It is expected to be the most commonly used, as the parametric distributional assumptions (normal and Student's t) are reasonable and moderately robust (Cui & Wilson 2009). When k is large and the distributional assumptions are not met, PCS.boot.np may be used. For information on the necessary sample size, see Cui & Wilson (2009). When k is small to moderate, PCS.exact may be used to obtain the PCS from the analytic formula.
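
The general idea behind a parametric bootstrap estimate can be sketched in base R as follows. This is a minimal sketch under the normal assumption, not the algorithm PCS.boot.par actually uses: boot_pcs_sketch and its arguments are hypothetical names, and features such as the Student's t option, the trunc argument, and the d-best, G-best, and L-best criteria are left out.

```r
# Parametric bootstrap sketch (normal case): treat the observed statistics as
# the true parameters, redraw noisy copies of them, and count how often the
# redrawn top t matches the observed top t.
# boot_pcs_sketch is a hypothetical helper, not part of the PCS package.
boot_pcs_sketch <- function(theta_hat, t, sde, B = 1000) {
  k <- length(theta_hat)
  best <- order(theta_hat, decreasing = TRUE)[seq_len(t)]
  hits <- replicate(B, {
    theta_star <- rnorm(k, mean = theta_hat, sd = sde)  # one bootstrap draw
    picked <- order(theta_star, decreasing = TRUE)[seq_len(t)]
    setequal(picked, best)
  })
  mean(hits)  # proportion of draws that reproduce the observed top t
}

set.seed(42)
theta_hat <- c(4.2, 2.5, 2.3, 1.7, 1.5, 1.0)
pcs_t1 <- boot_pcs_sketch(theta_hat, t = 1, sde = 1)
pcs_t2 <- boot_pcs_sketch(theta_hat, t = 2, sde = 1)
```

Here selecting only the single best population (4.2, well separated from the rest) should be more reliable than selecting the best two, because 2.5 and 2.3 are hard to tell apart at a standard error of 1.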

Note

I would like to thank Xinping Cui for her support while creating this package, Thomas Girke for use of the UCR Bioinformatics Cluster, Bushi Wang for stress testing the code, and God for all the help on the way.

Author(s)

Jason Wilson

Maintainer: Jason Wilson <jason.wilson@biola.edu>

References

Cui, X. and Wilson, J. 2007. On How to Calculate the Probability of Correct Selection for Large k Populations. University of California, Riverside Statistics Department Technical Report 297. http://www.bubbs.biola.edu/~jason.wilson/Article2_tech_techreport.pdf
Cui, X. and Wilson, J. 2008. On the Probability of Correct Selection for Large k Populations, with Application to Microarray Data. Biometrical Journal, 50:5, 870-883. http://www.bubbs.biola.edu/~jason.wilson/Article1_Revision.pdf
Cui, X. and Wilson, J. 2009. A Simulation Study on the Probability of Correct Selection for Large k Populations. Communications in Statistics - Simulation and Computation, 38:6. http://www.bubbs.biola.edu/~jason.wilson/Article2_sim_revised02.pdf
Cui, X., Zhao, H. and Wilson, J. 2008. Optimization of Gene Selection in Microarray Experiments. Submitted.

Examples

##### Small example for PCS.boot.par & PCS.exact
theta = c(4.2, 2.5, 2.3, 1.7, 1.5, 1.0)
theta = sort(theta)
T = c(1,2,3,5); G = 1:3; D = c(0.5,1,1.5); L = 1:2
PCS.boot.par(theta, T, G, D, L, B=100, SDE=1, dist="normal", df=14, trunc=6)
PCS.exact(theta, t=2, g=1, d=NULL, m=20, tol=1e-8)

##### Small example for PCS.boot.np
k=20 #number of populations
n=10 #sample size
SD=.1 #standard deviation
theta = seq(0,6,length.out=k)
X1 = rnorm(k*n,0,SD)   #Sample 1
X1 = matrix(X1,nrow=k,ncol=n,byrow=FALSE)
X2 = rnorm(k*n,theta,SD) #Sample 2, shifted
X2 = matrix(X2,nrow=k,ncol=n,byrow=FALSE)
T = c(1,2,3,5); G = 1:3; D = c(0.5,1,1.5)
PCS.boot.np(X1, X2, T, G, D, B=100, trunc=6)

##### Microarray example of t-statistics with PCS.boot.par
require(multtest)
data(golub)  	#Load microarray data
sub = 500	#Subset index for speed
ans = tindep(golub[1:sub,1:27], golub[1:sub,28:38], flag=1) #Obtain t-statistics
golub.T = sort(abs(ans[,1]))  	#Massage t-statistics
T=c(1,5,10,25,50,92); G=c(1,10,25,50,150); D=c(0,1,2) #Set PCS parameters
df=18 			  #Degrees of freedom from Satterthwaite approximation
sde=sqrt((df/(df-2))/19)  #Method-of-moments variance of a t(df) statistic, divided by mean sample size (27+11)/2 = 19
PCS.boot.par(golub.T, T, G, D, L=NULL, B=100, SDE=sde, dist="t", df=df) #Small B for speed

##### Microarray example of Golub's correlation statistics
##### (see reference) with PCS.boot.par
require(multtest)
data(golub)  					#Load microarray data
Pgc <- function(x,y) {  		#Function to calculate Golub's correlation statistics
  xbar1 = apply(x,1,mean)
  xbar2 = apply(y,1,mean)
  sd1   = apply(x,1,sd)
  sd2   = apply(y,1,sd)
  Pgc = abs((xbar1-xbar2))/(sd1+sd2)
  return(Pgc)
} #end function
sub = 500	#Subset index for speed
Pgc.gol = Pgc(golub[1:sub,1:27],golub[1:sub,28:38]) #Calculate correlation statistics
T=c(1,5,10,25,50,92); G=c(1,10,25,50,150); D=c(0,1,2) #Set PCS parameters
sde=0.20 		#Obtained by simulation on Golub data
PCS.boot.par(Pgc.gol, T, G, D, L=NULL, B=100, SDE=sde, dist="t", df=18) #Small B for speed

##### Microarray example using non-parametric bootstrap
require(multtest)
data(golub)  							#Load microarray data
T=c(1,5,10); G=c(1,3,5); D=c(0,1,2) 	#Set PCS parameters
sub = 100	#Subset index for speed
PCS.boot.np(golub[1:sub,1:27], golub[1:sub,28:38], T, G, D, B=10, trunc=6) #Small B for speed

PCS documentation built on May 2, 2019, 9:34 a.m.