sample_size_distribution: sample_size_distribution

View source: R/sampleSize.R

sample_size_distribution    R Documentation

sample_size_distribution

Description

A function to estimate the sample size based on the read count and dispersion distributions in real data.

Usage

sample_size_distribution(
  power = 0.8,
  m = 10000,
  m1 = 100,
  f = 0.1,
  k = 1,
  w = 1,
  rho = 2,
  showMessage = FALSE,
  storeProcess = FALSE,
  distributionObject,
  libSize,
  minAveCount = 5,
  maxAveCount = 2000,
  repNumber = 100,
  dispersionDigits = 1,
  selectedGenes,
  pathway,
  species = "hsa",
  countFilterInRawDistribution = TRUE,
  selectedGeneFilterByCount = FALSE
)

Arguments

power

Power to detect prognostic genes.

m

Total number of genes for testing.

m1

Expected number of prognostic genes.

f

False discovery rate (FDR) level.

k

Ratio of sample size between two groups (Treatment/Control).

w

Ratio of normalization factors between two groups.

rho

Minimum fold change for prognostic genes between two groups (Treatment/Control).

showMessage

Logical. Whether to display messages during the estimation process.

storeProcess

Logical. Whether to store the power and n values from the sample size or power estimation process.

distributionObject

A DGEList object generated by the est_count_dispersion function. The RnaSeqSampleSizeData package contains 13 datasets from TCGA; to use one of them, set distributionObject to any one of "TCGA_BLCA", "TCGA_BRCA", "TCGA_CESC", "TCGA_COAD", "TCGA_HNSC", "TCGA_KIRC", "TCGA_LGG", "TCGA_LUAD", "TCGA_LUSC", "TCGA_PRAD", "TCGA_READ", "TCGA_THCA", "TCGA_UCEC".

libSize

Numeric vector giving the total count for each sample. If not specified, the library sizes in distributionObject will be used.

minAveCount

Minimum average read count for each gene. Genes with smaller average read counts will not be used.

maxAveCount

Maximum average read count for each gene. Genes with larger average read counts will be treated as having maxAveCount.

repNumber

Number of genes used in estimating the read count and dispersion distributions.

dispersionDigits

Digits of dispersion.

selectedGenes

Optional. Names of the genes of interest. Only the read count and dispersion distributions of these genes will be used in power estimation.

pathway

Optional. ID of the KEGG pathway of interest. Only the read count and dispersion distributions of genes in this pathway will be used in power estimation.

species

Optional. Species of the KEGG pathway of interest.

countFilterInRawDistribution

Logical. Whether the count filter is applied to the raw count distribution. If not, the count filter is applied to the libSize-scaled count distribution.

selectedGeneFilterByCount

Logical. Whether the count filter is applied to the selected genes when the selectedGenes parameter is used.
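
The effect of the count filter described under minAveCount and maxAveCount can be illustrated with a small base R sketch. This is not the package's internal code: the gene names and counts are made up, the variable names are hypothetical, and treating the minimum as inclusive (>=) is an assumption.

```r
# Hypothetical average read counts per gene (made-up values).
ave_count <- c(geneA = 2, geneB = 40, geneC = 5000, geneD = 5)

min_ave_count <- 5     # mirrors the default minAveCount
max_ave_count <- 2000  # mirrors the default maxAveCount

# Assumption: genes at or above the minimum are kept; the rest are dropped.
kept <- ave_count[ave_count >= min_ave_count]

# Counts above the maximum are replaced by the maximum itself.
capped <- pmin(kept, max_ave_count)

print(capped)
```

In this sketch geneA is dropped for falling below the minimum, while geneC is retained but capped at the maximum.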

Details

A function to estimate the sample size based on the read count and dispersion distributions in real data.

Value

The estimated sample size, or a list including the parameters and the sample size from the estimation process.

Examples

# Note: repNumber is set very small (5) here to make the example code run faster.
# We suggest setting repNumber to at least 100 in a real analysis.
sample_size_distribution(power = 0.8, f = 0.01, distributionObject = "TCGA_READ",
                         repNumber = 5, showMessage = TRUE)

slzhao/RnaSeqSampleSize documentation built on March 24, 2022, 2:21 a.m.