sample_size_distribution: sample_size_distribution
In slzhao/RnaSeqSampleSize: RnaSeqSampleSize

View source: R/sampleSize.R

Description

A function to estimate the sample size based on the read count and dispersion distributions in real data.

Usage

sample_size_distribution(
  power = 0.8,
  m = 10000,
  m1 = 100,
  f = 0.1,
  k = 1,
  w = 1,
  rho = 2,
  showMessage = FALSE,
  storeProcess = FALSE,
  distributionObject,
  libSize,
  minAveCount = 5,
  maxAveCount = 2000,
  repNumber = 100,
  dispersionDigits = 1,
  selectedGenes,
  pathway,
  species = "hsa",
  countFilterInRawDistribution = TRUE,
  selectedGeneFilterByCount = FALSE
)

Arguments

`power`	Power to detect prognostic genes.
`m`	Total number of genes for testing.
`m1`	Expected number of prognostic genes.
`f`	FDR level.
`k`	Ratio of sample size between two groups (Treatment/Control).
`w`	Ratio of normalization factors between two groups.
`rho`	Minimum fold change for prognostic genes between two groups (Treatment/Control).
`showMessage`	Logical. Whether to display messages during the estimation process.
`storeProcess`	Logical. Whether to store the power and n values from the sample size or power estimation process.
`distributionObject`	A DGEList object generated by the est_count_dispersion function, or the name of a built-in dataset. The RnaSeqSampleSizeData package contains 13 datasets from TCGA; to use one, set distributionObject to any of "TCGA_BLCA", "TCGA_BRCA", "TCGA_CESC", "TCGA_COAD", "TCGA_HNSC", "TCGA_KIRC", "TCGA_LGG", "TCGA_LUAD", "TCGA_LUSC", "TCGA_PRAD", "TCGA_READ", "TCGA_THCA", "TCGA_UCEC".
`libSize`	Numeric vector giving the total count for each sample. If not specified, the library sizes in distributionObject will be used.
`minAveCount`	Minimum average read count for each gene. Genes with smaller average read counts will not be used.
`maxAveCount`	Maximum average read count for each gene. Genes with larger average read counts will be treated as having maxAveCount.
`repNumber`	Number of genes used in estimation of read counts and dispersion distribution.
`dispersionDigits`	Number of decimal digits kept for dispersion values.
`selectedGenes`	Optional. Names of the genes of interest. Only the read count and dispersion distributions for these genes will be used in power estimation.
`pathway`	Optional. ID of the KEGG pathway of interest. Only the read count and dispersion distributions for genes in this pathway will be used in power estimation.
`species`	Optional. Species of the KEGG pathway of interest.
`countFilterInRawDistribution`	Logical. Whether the count filter is applied to the raw count distribution. If not, the count filter is applied to the libSize-scaled count distribution.
`selectedGeneFilterByCount`	Logical. Whether the count filter is applied to the selected genes when the selectedGenes parameter is used.
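
The `distributionObject` argument can also be an object built from your own pilot data rather than a TCGA dataset name. The following is a minimal sketch under stated assumptions: the count matrix is simulated purely for illustration, and the `counts` and `group` argument names of `est_count_dispersion` are assumed here, so check `?est_count_dispersion` in your installed version before relying on them.

```r
# Simulated pilot data (an assumption for illustration): 1000 genes x 6 samples,
# negative binomial counts with mean 100 and dispersion 0.1 (size = 1 / 0.1).
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 100, size = 10), ncol = 6)
group <- rep(0:1, each = 3)  # three control samples, then three treatment samples

# Only run the package calls when RnaSeqSampleSize is installed.
if (requireNamespace("RnaSeqSampleSize", quietly = TRUE)) {
  library(RnaSeqSampleSize)
  # Estimate the read count and dispersion distributions from the pilot data.
  pilotDistribution <- est_count_dispersion(counts = counts, group = group)
  # Pass the resulting object instead of a TCGA dataset name.
  # repNumber is kept small only so that the sketch runs quickly.
  sample_size_distribution(power = 0.8, f = 0.01,
                           distributionObject = pilotDistribution,
                           repNumber = 5)
}
```

With real data, `counts` would be your raw count matrix (genes in rows, samples in columns) and `group` the corresponding sample labels.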

Details

A function to estimate the sample size based on the read count and dispersion distributions in real data.

Value

The estimated sample size, or a list containing the parameters and the sample sizes recorded during the estimation process.
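
As a hedged illustration of the list form of the return value, the sketch below sets `storeProcess = TRUE` and inspects the result. The names of the list elements are not documented on this page, so the code prints the structure with `str()` instead of assuming particular element names. It also assumes the RnaSeqSampleSizeData package is installed so that "TCGA_READ" can be resolved.

```r
result <- NULL

# Only run the package call when both packages are installed.
if (requireNamespace("RnaSeqSampleSize", quietly = TRUE) &&
    requireNamespace("RnaSeqSampleSizeData", quietly = TRUE)) {
  library(RnaSeqSampleSize)
  # repNumber is kept small only so that the sketch runs quickly.
  result <- sample_size_distribution(power = 0.8, f = 0.01,
                                     distributionObject = "TCGA_READ",
                                     repNumber = 5, storeProcess = TRUE)
  # Show the top-level structure rather than relying on specific element names.
  str(result, max.level = 1)
}
```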

Examples

# Note: repNumber is set very low (5) here only to make the example code run faster.
# We suggest setting repNumber to at least 100 in a real analysis.
sample_size_distribution(power = 0.8, f = 0.01, distributionObject = "TCGA_READ",
                         repNumber = 5, showMessage = TRUE)
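
To restrict the estimation to genes of interest, the `selectedGenes` argument described above can be supplied. The sketch below is illustrative only: the gene list is hypothetical, and it assumes that the TCGA dataset identifies genes by symbol and contains these genes.

```r
# Hypothetical genes of interest (an assumption for illustration).
genesOfInterest <- c("A1BG", "A2M", "AAAS", "AACS", "AAK1", "AAMP", "AARS")

# Only run the package call when both packages are installed.
if (requireNamespace("RnaSeqSampleSize", quietly = TRUE) &&
    requireNamespace("RnaSeqSampleSizeData", quietly = TRUE)) {
  library(RnaSeqSampleSize)
  # Only the read count and dispersion distributions of the selected genes are used.
  # selectedGeneFilterByCount = FALSE leaves the count filter off for these genes.
  sample_size_distribution(power = 0.8, f = 0.01,
                           distributionObject = "TCGA_READ",
                           selectedGenes = genesOfInterest,
                           selectedGeneFilterByCount = FALSE,
                           repNumber = 5)
}
```

A KEGG pathway can be used in the same way through the `pathway` and `species` arguments instead of an explicit gene list.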

slzhao/RnaSeqSampleSize documentation built on March 24, 2022, 2:21 a.m.