rnormPerGroup: Generate a data frame with random normal values sampled...


Description

Generate a data frame with random normal values sampled according to group-specific parameters. Columns correspond to individuals belonging to different groups characterized by specific means and/or standard deviations. Rows correspond to features.
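As a rough illustration of the sampling scheme described above, here is a minimal base-R sketch. It is not the stats4bioinfo source. The function name `rnorm_per_group_sketch` is hypothetical, and the sketch assumes that each group is drawn independently with `rnorm()`, that the group blocks are bound column-wise, and that class labels are integer group indices.

```r
# Hypothetical sketch of the sampling scheme: NOT the stats4bioinfo implementation.
rnorm_per_group_sketch <- function(n, mean, sd, nrow) {
  # The three parameter vectors are indexed by group, so their lengths must match.
  stopifnot(length(mean) == length(n), length(sd) == length(n))
  # One block of nrow x n[g] values per group, drawn from N(mean[g], sd[g]^2).
  blocks <- lapply(seq_along(n), function(g) {
    matrix(rnorm(nrow * n[g], mean = mean[g], sd = sd[g]), nrow = nrow)
  })
  # Rows are features, columns are individuals.
  x <- as.data.frame(do.call(cbind, blocks))
  # Assumed class labels: group index repeated n[g] times.
  cl <- rep(seq_along(n), times = n)
  list(x = x, cl = cl)
}

set.seed(1)
res <- rnorm_per_group_sketch(n = c(6, 4, 5), mean = c(-3, 0, 5),
                              sd = c(2, 1, 4), nrow = 10)
dim(res$x)     # 10 features x 15 individuals
table(res$cl)  # 6, 4 and 5 columns in groups 1, 2 and 3
```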

Usage

rnormPerGroup(n, mean, sd, nrow)

Arguments

n

A vector indicating the number of columns (individuals) per group.

mean

A vector indicating the mean per group. Must have the same length as n.

sd

A vector indicating the standard deviation per group. Must have the same length as n.

nrow

Number of rows (features) of the resulting data frame.
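The same-length requirement becomes concrete when the group-level parameters are expanded to one value per column, which is one plausible way to obtain the expected per-column values listed under Value. The snippet below assumes this `rep(..., times = n)` expansion; the variable names `grp.mean` and `grp.sd` are illustrative and do not come from the package.

```r
n        <- c(6, 4, 5)   # columns per group
grp.mean <- c(-3, 0, 5)  # mean per group
grp.sd   <- c(2, 1, 4)   # standard deviation per group

# Each group parameter is repeated once for every column of that group.
exp.mean.per.col <- rep(grp.mean, times = n)
exp.sd.per.col   <- rep(grp.sd, times = n)
exp.mean.per.col
# [1] -3 -3 -3 -3 -3 -3  0  0  0  0  5  5  5  5  5

# With mismatched lengths, rep() cannot align parameters with groups and stops.
res <- try(rep(c(-3, 0), times = n), silent = TRUE)
inherits(res, "try-error")  # TRUE
```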

Details

First version: 2015-04; last modification: 2015-04.

Value

A list with the following objects:

x

Data frame with the random numbers (rows = features, columns = individuals).

cl

Vector with the class label of each column.

mean.per.group

Data frame with feature-wise means (rows) for each group (column).

sd.per.group

Data frame with feature-wise standard deviations (rows) for each group (column).

exp.mean.per.col

Vector with the expected means per column.

mean.per.col

Vector with the observed mean of each column of x.

exp.sd.per.col

Vector with the expected standard deviation per column.

sd.per.col

Vector with the observed standard deviation of each column of x.

The element x of this list is the data frame of random normal values sampled with group-specific parameters.
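The summary elements above can be reproduced with base R on a simulated matrix. This is a hypothetical reconstruction, and the package may compute them differently. It also adds a check that the original does not contain: since the mean of `nrow` independent N(mu, sigma^2) values follows N(mu, sigma^2 / nrow), dividing the deviation of each column mean by `exp.sd.per.col / sqrt(nrow)` should give standard normal z-scores, so about 95% of them should fall within ±1.96. The group sizes below are enlarged (60, 40, 50 columns) only to make that proportion stable.

```r
# Illustrative reconstruction of the summary elements (hypothetical code).
set.seed(42)
n        <- c(60, 40, 50)  # columns per group
grp.mean <- c(-3, 0, 5)
grp.sd   <- c(2, 1, 4)
nrow     <- 10             # features

cl <- rep(seq_along(n), times = n)
exp.mean.per.col <- rep(grp.mean, times = n)
exp.sd.per.col   <- rep(grp.sd, times = n)

# Draw every column from its group's distribution (matrix() fills column by column).
x <- matrix(rnorm(nrow * length(cl),
                  mean = rep(exp.mean.per.col, each = nrow),
                  sd   = rep(exp.sd.per.col, each = nrow)),
            nrow = nrow)

# Observed statistics per column.
mean.per.col <- colMeans(x)
sd.per.col   <- apply(x, 2, sd)

# Feature-wise mean and sd within each group (rows = features, columns = groups).
groups <- seq_along(n)
mean.per.group <- as.data.frame(sapply(groups, function(g)
  rowMeans(x[, cl == g, drop = FALSE])))
sd.per.group <- as.data.frame(sapply(groups, function(g)
  apply(x[, cl == g, drop = FALSE], 1, sd)))

# Standardised deviation of each column mean from its expected value.
z <- (mean.per.col - exp.mean.per.col) / (exp.sd.per.col / sqrt(nrow))
mean(abs(z) < 1.96)  # should be close to 0.95
```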

Author(s)

Jacques van Helden (Jacques.van-Helden@univ-amu.fr)

Examples

################################################################
## Small test: generate a matrix, composed of three groups of different sizes, means and sds.
small.rnorm <- rnormPerGroup(n=c(6,4,5), mean=c(-3,0,5), sd=c(2,1,4), nrow=10)

## Check column means
plot(small.rnorm$exp.mean.per.col, small.rnorm$mean.per.col)
abline(a=0,b=1)

## Check column sd
plot(small.rnorm$exp.sd.per.col, small.rnorm$sd.per.col)
abline(a=0,b=1)

## Generate a larger matrix with two groups of 100 columns each and 1000 rows
rnorm.result <- rnormPerGroup(n=c(100,100), mean=c(0, 0.5), sd=c(1,2), nrow=1000)

## Check the means per column
boxplot(rnorm.result$mean.per.col ~ rnorm.result$cl)

## Check the sd per column
boxplot(rnorm.result$sd.per.col ~ rnorm.result$cl, main="SD per group")

## Check means per group
boxplot(rnorm.result$mean.per.group, main="Feature-wise mean per group")

################################################################
## Run a Student t test on each feature to check the power.
## We choose equal sds to satisfy the homoscedasticity assumption.
rnorm.result <- rnormPerGroup(n=c(50,50), mean=c(0, 0.5), sd=c(1,1), nrow=10000)
x.student <- tTestPerRow(x = rnorm.result$x, cl = rnorm.result$cl, var.equal=TRUE)

## Plot histogram of the observed differences between groups
hist(x.student$table$means.diff, breaks=100, main="Effect size distribution", xlab="Effect size")
grid(lty="solid",col="#BBBBBB")
abline(v=0.5, col="blue", lwd=2)

## Plot the histogram of p-values.
## Note: since all the data were generated under H1,
## the distribution should be mostly composed of low p-values.
hist(x.student$table$p.value, breaks=20, main="P-value distribution", xlab="p-value")

## Plot the empirical power curve (1 - beta) = f(alpha).
## In this configuration, where all features are under the alternative hypothesis,
## this corresponds to a Receiver Operating Characteristic (ROC) curve,
## with the empirical TPR versus the theoretical FPR.
plot(ecdf(x.student$table$p.value), 
   xlab=expression(FPR == alpha), ylab=expression(TPR == 1-beta),
   main=paste("Student ROC curve"), col="blue")
grid()
abline(v=c(0,1))
abline(h=c(0,1))
abline(a=0,b=1, lty="dashed")
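The empirical ROC curve can be checked against theory at a single point: for a two-sided, two-sample Student test, base R's `power.t.test()` returns the expected power at a chosen alpha. The sketch below swaps in base `t.test()` applied row by row for the package's `tTestPerRow()` so that it runs without stats4bioinfo; the 2000 rows and alpha = 0.05 are arbitrary choices, not values from the original example.

```r
set.seed(123)
n.per.group <- 50
delta       <- 0.5   # difference between group means, in units of sd = 1
nrow        <- 2000  # features, all generated under H1

x <- cbind(matrix(rnorm(nrow * n.per.group, mean = 0, sd = 1), nrow = nrow),
           matrix(rnorm(nrow * n.per.group, mean = delta, sd = 1), nrow = nrow))
cl <- rep(1:2, each = n.per.group)

# Student t test (equal variances) on each feature.
p.values <- apply(x, 1, function(row) {
  t.test(row[cl == 1], row[cl == 2], var.equal = TRUE)$p.value
})

alpha <- 0.05
# Empirical TPR at alpha: the ecdf of the p-values evaluated at alpha.
empirical.power <- ecdf(p.values)(alpha)
theoretical.power <- power.t.test(n = n.per.group, delta = delta, sd = 1,
                                  sig.level = alpha, type = "two.sample")$power
c(empirical = empirical.power, theoretical = theoretical.power)
```

With 50 columns per group and a mean shift of half a standard deviation, both values should be close to 0.7; the empirical value varies by roughly ±0.01 (binomial standard error for 2000 features) from one seed to another.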

jvanheld/stats4bioinfo documentation built on May 20, 2019, 5:16 a.m.