rTranscriptData: Simulation of artificial transcriptomic data
In simone: Statistical Inference for MOdular NEtworks (SIMoNe)


Simulates a Gaussian sample that mimics transcriptomic data according to a given network: time-course data if the network is directed, steady-state data if it is undirected. When several networks are given, one sample is generated per network.

rTranscriptData(n,
                graph,
                ...,
                mu    = rep(0, p),
                sigma = 0.1)

`n`	integer, or vector of integers, giving the sample size of each task (one entry per network)
`graph`	a `simone.network` object, typically generated by `rNetwork` or `coNetwork`
`...`	additional `simone.network` objects, for multiple sample generation
`mu`	if the network(s) are directed, `mu` is the offset of the VAR(1) model used to generate the time-course data; if undirected, `mu` is the mean (offset) of the Gaussian vector
`sigma`	standard deviation of the Gaussian noise term used in the simulation

If the network is directed, time-course data are simulated according to a VAR(1) model. If the network is undirected, steady-state data are generated by simulating an independent, identically distributed sample of a Gaussian vector.

In both cases, samples are generated from the matrix Θ stored in `graph$Theta`.

If the network is directed, samples are generated according to the following VAR(1) process:

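$$X_0 \sim \mathcal{N}(0, \sigma^2 I_p),$$

$$X_t = \mu + \Theta X_{t-1} + \varepsilon_t, \quad t = 1, \dots, n,$$

$$\varepsilon_t \sim \mathcal{N}(0, \sigma^2 I_p),$$

where $p$ is the number of genes and σ is `sigma`.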
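The recursion above can be sketched in base R as follows. This is a minimal illustration, not the simone implementation: the function name `simulate_var1` is hypothetical, `Theta` is an arbitrary stable coefficient matrix chosen for the demo, and returning only X_1, ..., X_n (dropping X_0) is an assumption.

```r
## Hypothetical base-R sketch of the VAR(1) recursion (not simone's code).
## X_t is stored as row t of X; Theta multiplies X_{t-1} as a column vector.
simulate_var1 <- function(n, Theta, mu = rep(0, nrow(Theta)), sigma = 0.1) {
  p <- nrow(Theta)
  x_prev <- rnorm(p, mean = 0, sd = sigma)   # X_0 ~ N(0, sigma^2 I_p)
  X <- matrix(NA_real_, nrow = n, ncol = p)
  for (t in seq_len(n)) {
    # X_t = mu + Theta X_{t-1} + eps_t, with eps_t ~ N(0, sigma^2 I_p)
    x_prev <- mu + drop(Theta %*% x_prev) + rnorm(p, mean = 0, sd = sigma)
    X[t, ] <- x_prev
  }
  X  # n x p matrix: time points in rows, genes in columns (X_0 not returned)
}

## Arbitrary upper-triangular Theta with eigenvalues inside the unit circle
Theta <- matrix(c(0.5, 0.2, 0.0,
                  0.0, 0.4, 0.3,
                  0.0, 0.0, 0.6), nrow = 3, byrow = TRUE)
set.seed(1)
X <- simulate_var1(20, Theta)
dim(X)
```

With `sigma = 0` the recursion becomes deterministic, which is a convenient way to check the update by hand.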

If the network is undirected, samples are generated according to the following Gaussian vector:

$$X_i = \mu + (\Theta^{-1/2})^\top U_i + \varepsilon_i, \quad i = 1, \dots, n,$$

$$U_i \sim \mathcal{N}(0, I_p),$$

$$\varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_p).$$

Numerically, $\Theta^{-1/2}$ is computed as the Cholesky factor of the pseudo-inverse $\Theta^{+}$ of Θ, so that $(\Theta^{-1/2})^\top \Theta^{-1/2} = \Theta^{+}$.
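A minimal base-R sketch of the steady-state scheme, assuming a positive-definite Θ (`chol()` fails on a singular matrix, and how simone handles that case is not covered here). The helper names `pseudo_inverse` and `simulate_gaussian` are hypothetical, and the pseudo-inverse is taken from an SVD rather than, for instance, `MASS::ginv()`. Because `chol()` returns an upper-triangular R with `t(R) %*% R` equal to its argument, the rows of `U %*% R` have covariance Θ⁺, so with `sigma = 0` the sample covariance should approach the inverse of Θ.

```r
## Hypothetical sketch (not simone's code): Moore-Penrose pseudo-inverse via SVD
pseudo_inverse <- function(A, tol = sqrt(.Machine$double.eps)) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)             # drop numerically zero singular values
  # V D^{-1} U^T, restricted to the retained singular values
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

## Hypothetical sketch of X_i = mu + t(R) U_i + eps_i, with R = chol(Theta^+)
simulate_gaussian <- function(n, Theta, mu = rep(0, nrow(Theta)), sigma = 0.1) {
  p <- nrow(Theta)
  R <- chol(pseudo_inverse(Theta))         # upper triangular, t(R) %*% R = Theta^+
  U <- matrix(rnorm(n * p), n, p)          # row i is U_i ~ N(0, I_p)
  E <- matrix(rnorm(n * p, 0, sigma), n, p)  # row i is eps_i ~ N(0, sigma^2 I_p)
  # As row vectors, t(R) %*% U_i becomes U_i^T R; add mu to every row
  sweep(U %*% R + E, 2, mu, "+")
}

Theta <- matrix(c(2, -1, -1, 2), 2)        # arbitrary positive-definite example
set.seed(42)
X <- simulate_gaussian(1e5, Theta, sigma = 0)
round(cov(X), 2)                            # close to solve(Theta)
```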

Returns a list with the following components:

`X`	matrix of simulated gene expression data, with observations (`n` per task) in rows and genes in columns
`tasks`	factor indicating which task (network) each row of `X` belongs to, in case of multiple networks
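The `tasks` factor is meant to index the rows of `X`. The hypothetical illustration below uses placeholder random data for two tasks of 20 observations each; the labels `1` and `2` match those used in the multitask example, while building `tasks` by hand with task 1 stacked first is an assumption about the layout, not a description of simone's output.

```r
## Hypothetical illustration with placeholder data (not output of rTranscriptData)
n <- c(20, 20)                                   # one sample size per task
p <- 5
tasks <- factor(rep(seq_along(n), times = n))    # assumed layout: task 1 rows first
X <- matrix(rnorm(sum(n) * p), nrow = sum(n), ncol = p)

X1 <- X[tasks == 1, ]                            # rows belonging to task 1
## split() works on row indices whatever the row order is
per_task <- lapply(split(seq_len(nrow(X)), tasks),
                   function(i) X[i, , drop = FALSE])
sapply(per_task, nrow)
```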

Author(s): J. Chiquet, C. Charbonnier

See also: `rNetwork`, `coNetwork`.

## time-course data generation
##-----------------------------
# generate a directed network
n <- 20
p <- 5
g <- rNetwork(p, pi=5, directed=TRUE)
# Generate the data, data2 noisier than data1
data1  <- rTranscriptData(n,g)
data2  <- rTranscriptData(n,g,sigma=1)
# one colour per gene (p columns), green shades for data1, blue for data2
matplot(1:n, data1$X, type = "l", xlab = "time points",
        ylab = "level of expression", col = rainbow(p, start = 2/6, end = 3/6),
        ylim = range(c(data1$X, data2$X)),
        main = "data2 (blue) generated with more noise than data1 (green)")
matlines(1:n, data2$X, type = "l", col = rainbow(p, start = 4/6, end = 5/6))

## steady-state data generation
##-----------------------------
# generate an undirected network
p <- 10
g <- rNetwork(p, pi=10)
data <- rTranscriptData(n=1000,g, sigma=0)
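X <- data$X
# Inference of Theta by least-squares regression of each gene on all the
# others (feasible here since p << n); column x of b estimates
# Theta[, x] / Theta[x, x], i.e. Theta up to a rescaling of its columns
b <- sapply(1:p, function(x) {
  tmp <- -solve(crossprod(X[, -x]), crossprod(X[, -x], X[, x]))
  res <- rep(NA, p)
  res[-x] <- tmp
  res[x] <- 1
  return(res)
})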
# comparison of theoretical Theta and inferred Theta
par(mfrow=c(1,2))
image(g$Theta, main = "Theoretical Theta")
image(b, main = "Inferred Theta")

## time-course multitask data generation
##--------------------------------------
# start by generating the networks
ancestor <- rNetwork(p=5, pi=5, name="ancestor", directed=TRUE)
child1   <- coNetwork(ancestor, 1, name = "child 1")
child2   <- coNetwork(ancestor, 1, name = "child 2")
# generate the data
n <- c(20,20)
data  <- rTranscriptData(n,child1,child2)
X <- data$X
tasks <- data$tasks
par(mfrow = c(2, 1))
matplot(1:n[1], X[tasks == 1, ], type = "l", main = "Dataset from child 1",
        xlab = "time points", ylab = "level of expression")
matplot(1:n[2], X[tasks == 2, ], type = "l", main = "Dataset from child 2",
        xlab = "time points", ylab = "level of expression")
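In the steady-state example, `b` is not Θ itself. When Θ is the precision matrix of the data (as in the undirected model with `sigma = 0`), regressing gene x on the other genes gives coefficients -Θ[-x, x] / Θ[x, x], so column x of `b` targets Θ[, x] / Θ[x, x]. The population-level check below uses an arbitrary positive-definite `Theta` made up for illustration and its exact covariance `solve(Theta)`, so no sampling is involved.

```r
## Population version of the regression used in the steady-state example
Theta <- matrix(c( 2, -1,  0,
                  -1,  3, -1,
                   0, -1,  2), 3, byrow = TRUE)  # arbitrary precision matrix
Sigma <- solve(Theta)                           # exact covariance
p <- nrow(Theta)
b <- sapply(seq_len(p), function(x) {
  beta <- solve(Sigma[-x, -x], Sigma[-x, x])    # regression of gene x on the others
  res <- numeric(p)
  res[-x] <- -beta
  res[x] <- 1
  res
})
## each column of Theta divided by its diagonal entry
all.equal(b, sweep(Theta, 2, diag(Theta), "/"))
```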