
Simulation of gene expression data, drug sensitivity data, and the gene-pathway and drug-pathway association matrices.

```
data_simulation(K, G1, G2, J, eta0, eta1, density,
                alpha_tau = 1, beta_tau = 0.01, SNR = 0, file_name)
```

| Argument | Description |
| --- | --- |
| `K` | Number of latent factors (pathways) |
| `G1` | Number of genes in matrixY1 |
| `G2` | Number of drugs in matrixY2 |
| `J` | Number of samples (for example, cell lines) |
| `eta0` | The probability of having a true value of 1 for the entries in matrixZ with value 0 in matrixL |
| `eta1` | The probability of having a true value of 0 for the entries in matrixZ with value 1 in matrixL |
| `density` | Density of the prior-information matrixL |
| `alpha_tau` | The alpha parameter of the Gamma distribution used for the simulation of noise, default value=1 |
| `beta_tau` | The beta parameter of the Gamma distribution used for the simulation of noise, default value=0.01 |
| `SNR` | The signal-to-noise ratio, which is the ratio of the variance of individual genes to the variance of the noise term, default value=0 |
| `file_name` | The name of the simulated data file |

When SNR is set to a non-zero value, alpha_tau and beta_tau are not used for the simulation of the noise term.
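
The SNR definition given in the arguments (variance of an individual gene divided by the variance of its noise term) can be illustrated with a minimal sketch. The helper `add_noise_by_snr` below is hypothetical and is not taken from the package; it assumes independent Gaussian noise per row, with the signal variance estimated from each row of the noise-free matrix.

```r
# Hypothetical helper (not part of the package): adds Gaussian noise to each
# row so that var(signal row) / var(noise row) is approximately SNR.
add_noise_by_snr <- function(Y_mean, SNR) {
  stopifnot(is.matrix(Y_mean), SNR > 0)
  row_var  <- apply(Y_mean, 1, var)   # signal variance of each gene/drug
  noise_sd <- sqrt(row_var / SNR)     # noise SD implied by the requested SNR
  # noise_sd has one entry per row; recycling down the columns scales
  # element [g, j] by noise_sd[g].
  noise <- matrix(rnorm(length(Y_mean)), nrow = nrow(Y_mean)) * noise_sd
  Y_mean + noise
}

set.seed(1)
Y_mean <- matrix(rnorm(30 * 15), nrow = 30, ncol = 15)  # stand-in for W %*% X
Y <- add_noise_by_snr(Y_mean, SNR = 4)
dim(Y)
```

With SNR = 0 this ratio is undefined, which is consistent with the documented behaviour that the Gamma parameters are used in that case instead.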

A ".RData" file with the following components:

| Component | Description |
| --- | --- |
| `matrixL1,matrixL2` | The binary matrices of prior information about gene/drug-pathway associations. Dim(matrixL1)=G1*K, Dim(matrixL2)=G2*K. For example, matrixL1[g,k]=1 indicates that the g-th gene is known to be associated with the k-th pathway. This information can come from a well-known database, such as the KEGG pathway database. |
| `matrixPi1,matrixPi2` | The matrices of Bernoulli probabilities for the binary matrices matrixZ1 and matrixZ2, that is, matrixPi[g,k]=P(matrixZ[g,k]==1). Dim(matrixPi1)=G1*K, Dim(matrixPi2)=G2*K. These two matrices are determined by matrixL1, matrixL2 and eta0, eta1. |
| `matrixX` | The factor activity matrix with dimension K*J. matrixX[k,j] is the activity value of the k-th latent factor (e.g., pathway) in the j-th sample (e.g., cell line). Real continuous values with mean 0 and SD 1. |
| `matrixW1,matrixW2` | The factor loading matrices representing the degree of influence of the latent factors on individual genes/drugs. Dim(matrixW1)=G1*K. Dim(matrixW2)=G2*K. Real continuous values with mean 0 and SD 1. |
| `matrixY1,matrixY2` | The paired gene expression and drug response matrices measured across the same set of samples (cell lines). Dim(matrixY1)=G1*J. Dim(matrixY2)=G2*J. |
| `matrixZ1,matrixZ2` | The binary matrices indicating whether each entry of the loading matrices W1 and W2 is non-zero. For example, matrixZ1[g,k]=1 indicates that matrixW1[g,k] is non-zero, and vice versa. Dim(matrixZ1)=G1*K. Dim(matrixZ2)=G2*K. |
| `sigma1,sigma2` | The positive-definite symmetric matrices specifying the covariance of the noise term associated with each gene or drug. Dim(sigma1)=G1*G1. Dim(sigma2)=G2*G2. |
| `Y1_mean,Y2_mean` | The matrices of the mean values of matrixY before adding the noise term, calculated by multiplying matrixW and matrixX. Dim(Y1_mean)=G1*J. Dim(Y2_mean)=G2*J. |
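
To show how the components listed above relate to one another, here is a minimal sketch of one plausible generative process for a single data type (genes only). It is not the package's implementation. Several points are assumptions made for illustration: that `density` is the per-entry probability of a 1 in the prior matrix, that loadings are zeroed wherever Z is 0, that the Gamma draw is a per-gene noise precision parameterised by shape and rate, and that the noise covariance is diagonal. The object names are shortened and do not match the saved components.

```r
set.seed(42)
K <- 10; G <- 30; J <- 15                  # factors, genes, samples
eta0 <- 0.2; eta1 <- 0.2; density <- 0.1
alpha_tau <- 1; beta_tau <- 0.01

# Prior-information matrix: each entry is 1 with probability `density`
# (assumption about what "density" means).
L <- matrix(rbinom(G * K, 1, density), nrow = G, ncol = K)

# Pi[g, k] = P(Z[g, k] == 1): eta0 where L is 0, 1 - eta1 where L is 1.
Pi <- ifelse(L == 1, 1 - eta1, eta0)

# Z: Bernoulli draws with the probabilities in Pi.
Z <- matrix(rbinom(G * K, 1, Pi), nrow = G, ncol = K)

# W: standard-normal loadings, set to zero where Z is 0 (assumption).
W <- matrix(rnorm(G * K), nrow = G, ncol = K) * Z

# X: standard-normal factor activities.
X <- matrix(rnorm(K * J), nrow = K, ncol = J)

# Noise-free mean, then independent noise with per-gene precision tau
# (assumption: tau ~ Gamma(shape = alpha_tau, rate = beta_tau)).
Y_mean <- W %*% X
tau    <- rgamma(G, shape = alpha_tau, rate = beta_tau)
sigma  <- diag(1 / tau)                    # diagonal G x G noise covariance
Y      <- Y_mean + matrix(rnorm(G * J), nrow = G) / sqrt(tau)

dim(Y)
```

The same steps would be repeated with `G2` rows to obtain the drug-side matrices, sharing the same `X` so that both data types are driven by the same factor activities.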

Haisu Ma <haisu.ma@yale.edu>

```
data_simulation(K=10, G1=30, G2=30, J=15, eta0=c(0.2,0.2),
                eta1=c(0.2,0.2), density=c(0.1,0.1), alpha_tau=1,
                beta_tau=0.01, SNR=0, file_name="demo_data.RData")
```
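
Since the function writes its output to an ".RData" file, the result can be read back with base R's `load()`. The sketch below uses a hypothetical helper, `inspect_simulation`, which loads the file into a separate environment and lists the dimensions of every matrix it finds. It assumes the file was written to the current working directory.

```r
# Hypothetical helper (not part of the package): load an .RData file into its
# own environment and return the dimensions of each matrix stored in it.
inspect_simulation <- function(file_name) {
  env <- new.env()
  load(file_name, envir = env)
  objs <- mget(ls(env), envir = env)
  lapply(Filter(is.matrix, objs), dim)
}

# Only runs if the example above has already produced the file.
if (file.exists("demo_data.RData")) {
  str(inspect_simulation("demo_data.RData"))
}
```

According to the dimensions documented for the output, the call above with K=10, G1=30 and J=15 should give a matrixY1 of 30 rows by 15 columns and a matrixX of 10 rows by 15 columns.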
