calculatePRandom: Compute P_random
In saps: Significance Analysis of Prognostic Signatures


This function randomly samples gene sets of a given size and calculates P_pure (via calculatePPure) for each one. P_random is the proportion of randomly sampled gene sets whose P_pure is at least as significant as (that is, less than or equal to) the provided p_pure. This function is normally called by saps rather than directly.
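The sampling-and-proportion logic described above can be sketched in base R. This is a minimal illustration, not the saps implementation: `toy_p_pure` and `sketch_p_random` are hypothetical names, and `toy_p_pure` is only a stand-in for calculatePPure (it splits patients at the median of their mean expression and ignores censoring).

```r
# Hypothetical stand-in for calculatePPure (NOT the saps implementation):
# split patients at the median of their mean expression over the gene set,
# then compare survival times between the two groups.
toy_p_pure <- function(expr, survivalTimes) {
  score <- rowMeans(expr)
  group <- score > median(score)
  wilcox.test(survivalTimes[group], survivalTimes[!group],
              exact = FALSE)$p.value
}

# Hypothetical sketch of the P_random idea: draw random gene sets,
# compute a p-value for each, and report the proportion that is
# at least as significant as the candidate p_pure.
sketch_p_random <- function(dataSet, sampleSize, p_pure, survivalTimes,
                            random.samples = 100) {
  p_pures <- replicate(random.samples, {
    genes <- sample(colnames(dataSet), sampleSize)
    toy_p_pure(dataSet[, genes, drop = FALSE], survivalTimes)
  })
  list(p_random = mean(p_pures <= p_pure), p_pures = p_pures)
}

set.seed(1)
dat <- matrix(rnorm(25 * 100), nrow = 25,
              dimnames = list(NULL, as.character(1:100)))
times <- c(25, 27, 24, 21, 26, sample(1:3, 20, TRUE)) * 365
res <- sketch_p_random(dat, 5, 0.05, times, random.samples = 50)
res$p_random
```

The returned list mirrors the shape of the documented return value (`p_random` and `p_pures`), so the proportion can always be recomputed from the vector of sampled p-values.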

calculatePRandom(dataSet, sampleSize, p_pure, survivalTimes, followup, random.samples = 10000)

`dataSet`	A matrix, where the column names are gene identifiers and the values are gene expression levels. Each row should contain data for a single patient.
`sampleSize`	The desired size for the randomly sampled gene sets.
`p_pure`	The candidate P_pure against which to compare the P_pure values for the randomly generated gene sets.
`survivalTimes`	A vector of survival times. The length must equal the number of rows in `dataSet`.
`followup`	A vector of 0 or 1 values, indicating whether the patient was lost to followup (0) or not (1). The length must equal the number of rows (i.e. patients) in `dataSet`.
`random.samples`	The number of random gene sets to sample.
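The shape constraints on the arguments above can be checked before a long run. The helper below is a hypothetical sketch (`check_inputs` is not part of saps) and only encodes the requirements stated in the argument descriptions.

```r
# Hypothetical pre-flight check (not part of saps): verifies the
# documented constraints on dataSet, survivalTimes and followup.
check_inputs <- function(dataSet, survivalTimes, followup) {
  stopifnot(
    is.matrix(dataSet),
    !is.null(colnames(dataSet)),             # gene identifiers as column names
    length(survivalTimes) == nrow(dataSet),  # one survival time per patient
    length(followup) == nrow(dataSet),       # one follow-up flag per patient
    all(followup %in% c(0, 1))
  )
  invisible(TRUE)
}

dat <- matrix(rnorm(25 * 100), nrow = 25, ncol = 100)
colnames(dat) <- as.character(1:100)
ok <- check_inputs(dat, survivalTimes = rep(365, 25), followup = rep(1, 25))
```

A mismatch, such as a survival vector with 24 entries for 25 patients, stops with an error before any sampling is attempted.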

A list with the following elements:

`p_random`	The proportion of randomly sampled gene sets with a calculated p_pure at least as significant as the provided `p_pure`.
`p_pures`	A vector of the p_pure values calculated for the randomly generated gene sets, one per gene set.

Beck AH, Knoblauch NW, Hefti MM, Kaplan J, Schnitt SJ, et al. (2013) Significance Analysis of Prognostic Signatures. PLoS Comput Biol 9(1): e1002875. doi:10.1371/journal.pcbi.1002875

saps

# 25 patients, none lost to followup
followup <- rep(1, 25)

# first 5 patients have good survival (in days)
time <- c(25, 27, 24, 21, 26, sample(1:3, 20, TRUE))*365

# create data for 100 genes, 25 patients
dat <- matrix(rnorm(25*100), nrow=25, ncol=100)
colnames(dat) <- as.character(1:100)

# relatively low threshold
p_pure <- 0.05

p_random <- calculatePRandom(dat, 5, p_pure, time, followup, random.samples=100)
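# proportion of random gene sets at least as significant as p_pure
p_random$p_random
# distribution of P_pure across the random gene sets
hist(p_random$p_pures)
# number of random gene sets with P_pure <= p_pure
length(p_random$p_pures[p_random$p_pures <= p_pure])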

# set a more stringent threshold
p_pure <- 0.001

p_random <- calculatePRandom(dat, 5, p_pure, time, followup, random.samples=100)
length(p_random$p_pures[p_random$p_pures <= p_pure])