generateDataset: Generate simulation data for different species

generateDataset    R Documentation

Generate simulation data for different species

Description

Generates simulated RNA-seq read counts for genes of two different species.

Usage

generateDataset(commonTags=15000, uniqueTags=c(1000, 3000),
                unmapped=c(4000, 2000), group=c(1, 2),
                libLimits=c(.9, 1.1)*1e6, empiricalDist=NULL,
                genelength, randomRate=1/100,
                pDifferential=.05, pUp=.5, foldDifference=2)

Arguments

commonTags

The number of genes that have the same expression level in both species.

uniqueTags

The number of genes expressed in only one species, given as one value per species.

unmapped

The number of genes present in only one species, given as one value per species.

group

The number of species.

libLimits

The lower and upper limits of the library size for the two species.

empiricalDist

The source from which random samples are drawn: an empirical distribution or a random exponential distribution. If NULL, the reads are drawn from a random exponential distribution.

genelength

A vector of gene lengths, one for each gene of the two species.

randomRate

The rate parameter of the exponential distribution.

pDifferential

The proportion of differentially expressed genes.

pUp

The probability that a differentially expressed gene has higher expression in the first species than in the second species.

foldDifference

The fold difference in expression for differentially expressed genes.
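
To make the interplay of these arguments concrete, the following base R sketch shows one way such a simulation could be put together. It is not the SCBN source code: the function name simulate_two_species(), the uniform draw of library sizes, and the Poisson sampling of counts are all assumptions made for illustration, and only the argument names are taken from this page.

```r
# Illustrative sketch only: NOT the SCBN implementation.
# simulate_two_species() and its internals are hypothetical.
simulate_two_species <- function(nGenes = 1000, empiricalDist = NULL,
                                 randomRate = 1/100,
                                 libLimits = c(.9, 1.1) * 1e6,
                                 pDifferential = .05, pUp = .5,
                                 foldDifference = 2) {
  # Baseline expression: resample the empirical values if supplied,
  # otherwise draw from an exponential distribution with rate randomRate.
  base <- if (is.null(empiricalDist)) {
    rexp(nGenes, rate = randomRate)
  } else {
    sample(empiricalDist, size = nGenes, replace = TRUE)
  }
  lambda1 <- base
  lambda2 <- base

  # Pick the differentially expressed genes and a direction for each one.
  nDiff <- round(pDifferential * nGenes)
  differentialInd <- sample(nGenes, nDiff)
  upInFirst <- runif(nDiff) < pUp
  up <- differentialInd[upInFirst]
  down <- differentialInd[!upInFirst]
  lambda1[up] <- lambda1[up] * foldDifference
  lambda2[down] <- lambda2[down] * foldDifference

  # Library sizes: assumed to be uniform between the two limits.
  libSizes <- runif(2, min = libLimits[1], max = libLimits[2])

  # Counts: assumed Poisson, with means scaled to each library size.
  counts1 <- rpois(nGenes, lambda1 / sum(lambda1) * libSizes[1])
  counts2 <- rpois(nGenes, lambda2 / sum(lambda2) * libSizes[2])

  list(counts1 = counts1, counts2 = counts2, libSizes = libSizes,
       differentialInd = differentialInd,
       lambda1 = lambda1, lambda2 = lambda2)
}

set.seed(1)
sim <- simulate_two_species(nGenes = 1000)
```

In this sketch, a gene selected as differential has its expected expression multiplied by foldDifference in exactly one species, so the ratio lambda1 / lambda2 is either foldDifference or 1 / foldDifference for those genes and 1 for all others.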

Value

A list with the following components: "DATAN", the read counts for the first species; "DATAM", the read counts for the second species; "trueFactors", the true scaling factor for the data; "group", the number of species; "libSizes", the library sizes for the data; "differentialInd", the IDs of the differentially expressed genes; "commonInd", the IDs of the commonly expressed genes.

Examples

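library(SCBN)

# Load the example data set of orthologous genes.
data(orthgenes)
# Round the values in columns 6 to 9 to whole numbers.
orthgenes[, 6:9] <- round(orthgenes[, 6:9])
# Keep only the genes with no missing values in columns 6 to 9.
orthgenes1 <- orthgenes[complete.cases(orthgenes[, 6:9]), ]
# Simulate data, using column 6 as the empirical distribution
# and column 2 as the gene lengths.
sim_data <- generateDataset(commonTags=5000, uniqueTags=c(100, 300),
                            unmapped=c(400, 200), group=c(1, 2),
                            libLimits=c(.9, 1.1)*1e6,
                            empiricalDist=orthgenes1[, 6],
                            genelength=orthgenes1[, 2], randomRate=1/100,
                            pDifferential=.05, pUp=.5, foldDifference=2)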

FocusPaka/SCBN documentation built on April 20, 2023, 3:54 a.m.