Simulate p-values for two related experiments

Share:

Description

The function simulates two vectors of p-values using the procedure described in Hwang et al.

Usage

1
2
simulation(n, GammaA, GammaB, epsilonM = 0,
epsilonSD = 1, r1, r2, DEfirst, DEsecond, DEcommon)

Arguments

n

Number of features to simulate

GammaA

Parameter of the Gamma distribution

GammaB

Parameter of the Gamma distribution

epsilonM

Parameter of the Gaussian noise specific to the genes and experiment

epsilonSD

Parameter of the Gaussian noise specific to the genes and experiment

r1

Additional experiment-specific noise

r2

Additional experiment-specific noise

DEfirst

Number of DE features in each experiment

DEsecond

Number of DE features in each experiment

DEcommon

Number of DE features in common between the two experiments

Details

Considering two experiments (k=1,2), each of them with two classes, and n genes, for each gene we simulate a true difference between the classes delta(g), drawn from a Gamma distribution with random sign. The true difference delta(g) is 0 if the gene is not differentially expressed. We then add two normal random noise components, r(k) that act as experiment specific components and epsilon(gk), that are the gene-experiment components. The former is assigned deterministically, whilst the latter is drawn from a standard Gaussian distribution. The log fold change (FC(gk)) is the sum of all these components for each gene and experiment. We assign the n genes to four groups: genes differentially expressed (DE) in both experiments, genes differentially expressed only in the first experiment, genes differentially expressed only in the second experiment and genes differentially expressed in neither experiment. When the genes are differentially expressed in both experiments, they share the same delta(g) and the only difference between them is given by the random components: FC(g1) = delta(g) + r(1) times epsilon(g1) FC(g2) = delta(g) + r(2) times epsilon(g2) This group represents the true positive genes (i.e. truly DE in both experiments) that we are interested in finding using our method. The two groups of genes differentially expressed only in one of the two experiments act like additional noise and make the simulation more realistic.

Then, as described in Hwang et al., a two tails T-test is performed for each FC(gk) and a p-value is generated as: P(gk) = 2 Normal cdf(-absolute value (FC(gk)/r(k))).

Value

names

Which group each simulated gene expression value belongs to

FC1

T statistic for the first experiment

FC2

T statistic for the second experiment

Pval

p-value for the experiments to be compared

Author(s)

Alberto Cassese, Marta Blangiardo

References

Hwang D, Rust A, Ramsey S, Smith J, Leslie D, Weston A, de Atauri P, Aitchison J, Hood L, Siegel A, Bolouri H (2005): A data integration methodology for system biology. PNAS 2005.

M.Blangiardo and S.Richardson (2007) Statistical tools for synthesizing lists of differentially expressed features in related experiments, Genome Biology, 8, R54.

Examples

1
2
3
data = simulation(n=500,GammaA=1,GammaB=1,
r1=0.5,r2=0.8,DEfirst=300,DEsecond=200,
DEcommon=100)