# rTranscriptData: Simulation of artificial transcriptomic data

In simone: Statistical Inference for MOdular NEtworks (SIMoNe)

## Description

Simulates a Gaussian sample that mimics transcriptomic data, either steady-state or time-course, according to a given network. When several networks are given, one sample is generated for each network.

## Usage

```r
rTranscriptData(n, graph, ..., mu = rep(0, p), sigma = 0.1)
```

## Arguments

| Argument | Description |
|----------|-------------|
| `n` | integer or vector of integers indicating the sample size of each task |
| `graph` | a `simone.network` object, typically generated by either `rNetwork` or `coNetwork` |
| `...` | additional `simone.network` objects, used when several samples are generated |
| `mu` | if the networks are directed, `mu` is the offset of the VAR(1) model used to generate the time-course data; if undirected, `mu` is the offset of the Gaussian vector |
| `sigma` | standard deviation of the noise term used in the simulation process |

## Details

If the network is directed, time-course data are simulated according to a VAR(1) model. If the network is undirected, steady-state data are generated by simulating an independent, identically distributed sample of a Gaussian vector.

In both cases, samples are generated on the basis of Θ, as provided by `graph$Theta`.

If the network is directed, samples are generated according to the following VAR(1) process:

$$X_0 \sim \mathcal{N}(0, \sigma), \qquad X_t = \mu + \Theta X_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \sigma), \quad t = 1, \dots, n.$$
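The VAR(1) recursion can be sketched in base R as follows. This is an illustrative sketch, not the package's own code: the function name `simulate_var1` is hypothetical, `sigma` is treated as a standard deviation applied independently to each coordinate, and the sketch assumes that only the rows for t = 1, ..., n are returned, with X0 discarded.

```r
# Minimal base-R sketch of a VAR(1) simulation (hypothetical helper,
# not part of simone).
simulate_var1 <- function(n, Theta, mu = rep(0, nrow(Theta)), sigma = 0.1) {
  p <- nrow(Theta)
  X <- matrix(NA_real_, nrow = n, ncol = p)
  # X_0 ~ N(0, sigma), drawn coordinate-wise (assumption)
  x_prev <- rnorm(p, mean = 0, sd = sigma)
  for (t in seq_len(n)) {
    eps <- rnorm(p, mean = 0, sd = sigma)             # noise term eps_t
    x_prev <- as.vector(mu + Theta %*% x_prev + eps)  # X_t = mu + Theta X_{t-1} + eps_t
    X[t, ] <- x_prev                                  # one time point per row
  }
  X
}

set.seed(42)
# Toy 2 x 2 interaction matrix, chosen only for illustration
Theta <- matrix(c(0.5, 0.0,
                  0.3, 0.4), nrow = 2, byrow = TRUE)
X <- simulate_var1(n = 20, Theta = Theta)
dim(X)
```

Storing time points in rows matches the layout of the `X` matrix described under Value.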

If the network is undirected, samples are generated according to the following Gaussian vector:

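$$X_i = \mu + \left(\Theta^{-1/2}\right)^{\top} U_i + \varepsilon_i, \quad i = 1, \dots, n, \qquad U_i \sim \mathcal{N}(0, 1), \quad \varepsilon_i \sim \mathcal{N}(0, \sigma).$$

Numerically, $\Theta^{-1/2}$ is computed with the Cholesky decomposition of the pseudo-inverse of $\Theta$.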
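The steady-state case can be sketched in base R in the same spirit. The function name `simulate_gaussian` is hypothetical, and the sketch assumes that Θ is positive definite, so that its pseudo-inverse coincides with the ordinary inverse and `solve()` can stand in for it. The package itself may handle singular matrices differently.

```r
# Minimal base-R sketch of the steady-state simulation (hypothetical helper,
# not part of simone). Assumes Theta is positive definite.
simulate_gaussian <- function(n, Theta, mu = rep(0, nrow(Theta)), sigma = 0.1) {
  p <- nrow(Theta)
  Sigma <- solve(Theta)  # equals the pseudo-inverse when Theta is invertible
  R <- chol(Sigma)       # upper triangular factor, t(R) %*% R == Sigma
  U <- matrix(rnorm(n * p), nrow = n, ncol = p)              # rows are U_i
  E <- matrix(rnorm(n * p, sd = sigma), nrow = n, ncol = p)  # rows are eps_i
  # Row i of U %*% R is t(t(R) %*% U_i), so row i is mu + t(R) U_i + eps_i
  sweep(U %*% R, 2, mu, "+") + E
}

set.seed(1)
# Toy precision matrix, chosen only for illustration
Theta <- matrix(c(2, -1,
                  -1, 2), nrow = 2, byrow = TRUE)
X <- simulate_gaussian(n = 5000, Theta = Theta, sigma = 0)
round(cov(X), 2)       # should be close to solve(Theta)
```

With `sigma = 0`, the empirical covariance of the sample approaches the inverse of Θ as `n` grows, which is a quick sanity check on the Cholesky step.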

## Value

Returns a list comprising:

| Component | Description |
|-----------|-------------|
| `X` | matrix of simulated gene expression data, with `n` observations in rows and genes in columns |
| `tasks` | factor indicating the task to which each simulated observation belongs, in case of multiple networks |
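To illustrate how the two components fit together, the sketch below builds a stand-in for the returned list by hand and splits `X` by task. The object `out` is hypothetical and is not produced by `rTranscriptData`. It assumes that observations from all tasks are stacked row-wise in `X` and that `tasks` has one level per network.

```r
# Hypothetical stand-in for the returned list with two tasks of 20
# observations each and 5 genes (not actual rTranscriptData output).
set.seed(1)
out <- list(
  X     = matrix(rnorm(40 * 5), nrow = 40, ncol = 5),
  tasks = factor(rep(1:2, times = c(20, 20)))
)

# Split the rows of X according to the task factor
X_by_task <- lapply(levels(out$tasks), function(k) {
  out$X[out$tasks == k, , drop = FALSE]
})
names(X_by_task) <- levels(out$tasks)

sapply(X_by_task, dim)  # one column of (rows, columns) per task
```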

## Author(s)

J. Chiquet, C. Charbonnier

## See Also

`rNetwork`, `coNetwork`.
## Examples

```r
## time-Course data generation
##-----------------------------
# generate a directed network
n <- 20
p <- 5
g <- rNetwork(p, pi=5, directed=TRUE)

# Generate the data, data2 noisier than data1
data1 <- rTranscriptData(n,g)
data2 <- rTranscriptData(n,g,sigma=1)

matplot(1:n, data1$X,type= "l", xlab = "time points",
        ylab = "level of expression",
        col=rainbow(n,start=2/6,end = 3/6),
        ylim = range(c(data1$X,data2$X)),
        main="data2 (blue) generated with more noise than data1 (green)")
matlines(1:n,data2$X,type= "l",col = rainbow(n,start=4/6,end=5/6))

## steady-state data generation
##-----------------------------
# generate an undirected network
p <- 10
g <- rNetwork(p, pi=10)
data <- rTranscriptData(n=1000,g, sigma=0)
attach(data)

# Inference of Theta (here without dimension problems since p << n)
b <- sapply(1:p,function(x){
  tmp <- -solve(t(X[,-x]) %*% X[,-x]) %*% t(X[,-x]) %*% X[,x]
  res <- rep(NA,10)
  res[-x] <- tmp
  res[x] <- 1
  return(res)
}
)
detach(data)

# comparison of theoretical Theta and inferred Theta
par(mfrow=c(1,2))
image(g$Theta, main = "Theoretical Theta")
image(b, main = "Inferred Theta")

## time-course multitask data generation
##--------------------------------------
# start by generating the networks
ancestor <- rNetwork(p=5, pi=5, name="ancestor", directed=TRUE)
child1 <- coNetwork(ancestor, 1, name = "child 1")
child2 <- coNetwork(ancestor, 1, name = "child 2")

# generate the data
n <- c(20,20)
data <- rTranscriptData(n,child1,child2)
attach(data)
par(mfrow=c(2,1))
matplot(1:(n[1]),X[tasks ==1,],type= "l", main="Dataset from child 1",
        xlab = "time points", ylab = "level of expression")
matplot(1:(n[2]),X[tasks == 2,], type= "l", main="Dataset from child 2",
        xlab = "time points", ylab = "level of expression")
detach(data)
```