pipe.DEtools    R Documentation

Pipes for Group-wise Differential Expression Tools like DESeq, EdgeR, SAM, etc.

Description

Wrapper functions to a family of published DE tools, to find significant differentially expressed genes between groups of samples.

Usage

pipe.DESeq(sampleIDset, speciesID = getCurrentSpecies(), annotationFile = "Annotation.txt", 
	optionsFile = "Options.txt", useMultiHits = TRUE, results.path = NULL, 
	groupColumn = "Group", colorColumn = "Color", folderName = "", 
	altGeneMap = NULL, altGeneMapLabel = NULL, targetID = NULL, Ngenes = 100, 
	geneColumnHTML = if (speciesID %in% MAMMAL_SPECIES) "NAME" else "GENE_ID", 
	keepIntergenics = FALSE, verbose = !interactive(), label = "", 
	doDE = TRUE, PLOT.FUN = NULL, ...)

pipe.EdgeR(sampleIDset, speciesID = getCurrentSpecies(), annotationFile = "Annotation.txt", 
	optionsFile = "Options.txt", useMultiHits = TRUE, results.path = NULL, 
	groupColumn = "Group", colorColumn = "Color", folderName = "", 
	altGeneMap = NULL, altGeneMapLabel = NULL, targetID = NULL, Ngenes = 100, 
	geneColumnHTML = if (speciesID %in% MAMMAL_SPECIES) "NAME" else "GENE_ID", 
	keepIntergenics = FALSE, verbose = !interactive(), label = "", 
	doDE = TRUE, PLOT.FUN = NULL, ...)

pipe.RankProduct(sampleIDset, speciesID = getCurrentSpecies(), annotationFile = "Annotation.txt", 
	optionsFile = "Options.txt", useMultiHits = TRUE, results.path = NULL, 
	groupColumn = "Group", colorColumn = "Color", folderName = "", 
	altGeneMap = NULL, altGeneMapLabel = NULL, targetID = NULL, Ngenes = 100, 
	geneColumnHTML = if (speciesID %in% MAMMAL_SPECIES) "NAME" else "GENE_ID", 
	keepIntergenics = FALSE, verbose = !interactive(), label = "", 
	doDE = TRUE, PLOT.FUN = NULL, ...)

pipe.RoundRobin(sampleIDset, speciesID = getCurrentSpecies(), annotationFile = "Annotation.txt", 
	optionsFile = "Options.txt", useMultiHits = TRUE, results.path = NULL, 
	groupColumn = "Group", colorColumn = "Color", folderName = "", 
	altGeneMap = NULL, altGeneMapLabel = NULL, targetID = NULL, Ngenes = 100, 
	geneColumnHTML = if (speciesID %in% MAMMAL_SPECIES) "NAME" else "GENE_ID", 
	keepIntergenics = FALSE, verbose = !interactive(), label = "", 
	doDE = TRUE, PLOT.FUN = NULL, ...)

pipe.SAM(sampleIDset, speciesID = getCurrentSpecies(), annotationFile = "Annotation.txt", 
	optionsFile = "Options.txt", useMultiHits = TRUE, results.path = NULL, 
	groupColumn = "Group", colorColumn = "Color", folderName = "", 
	altGeneMap = NULL, altGeneMapLabel = NULL, targetID = NULL, Ngenes = 100, 
	geneColumnHTML = if (speciesID %in% MAMMAL_SPECIES) "NAME" else "GENE_ID", 
	keepIntergenics = FALSE, verbose = !interactive(), label = "", 
	doDE = TRUE, PLOT.FUN = NULL, ...)

Arguments

sampleIDset

Character vector of SampleIDs, giving the full set of samples that will take part in the DE calculations.

speciesID

The SpeciesID for one single species. The DE tools do not operate on multiple species at one time.

annotationFile

File of sample annotation details, which specifies all needed sample-specific information about the samples under study. See DuffyNGS_Annotation.

optionsFile

File of processing options, which specifies all processing parameters that are not sample specific. See DuffyNGS_Options.

useMultiHits

Logical. By default, all DE tools use the RPKM or READ values from the transcriptomes that count all aligned reads, including alignments called "MultiHit" reads. If FALSE, only uniquely mapped reads are used. Because the transcriptomes store both ways of counting gene abundance, it is easy to rerun a tool with the other setting and see how the DE results change.

results.path

The top-level folder path where result files are written. By default, read from the Options file entry 'results.path'.

groupColumn

Character string specifying one column of the annotation table, to give the group name for each sample.

colorColumn

Character string specifying one column of the annotation table, to give the group color for each sample.

folderName

Required character string, with no embedded blanks, used to name the folder of DE results that will be generated by the DE tool. Typically, use a short but informative name that describes the groups being compared.

altGeneMap

An alternate data frame of gene annotations, in a format identical to getCurrentGeneMap, that has the gene names and locations to be measured for differential expression. By default, use the standard built-in gene map for this species.

altGeneMapLabel

A character string identifier for the alternate gene map, that becomes part of all created path and file names to indicate the gene map that produced the transcriptomes used in this DE analysis.

targetID

Optional character string giving the target organism(s) being compared. Used by the gene plotting tools, defaults to the current target.

Ngenes

Number of genes to show in the HTML results and to create gene plot images for.

geneColumnHTML

The name of one column in the current gene map that contains the identifier shown in the HTML results. Some genomes require complex compound GeneIDs to give genomic location specificity, but these are unwieldy for routine use. This argument lets a second, simpler identifier be used as a surrogate GeneID.

keepIntergenics

Logical. By default, all transcriptomes keep gene expression values for the intergenic "non-gene" regions defined in the gene map. This argument controls whether those intergenic regions are included in (TRUE) or excluded from (FALSE) the DE fold change comparisons and results.

label

A character string that is passed to the gene plot tool, for inclusion in the main plot header.

doDE

Logical, controls whether the complete DE analysis is performed, or whether to just use results already present in the DE subfolder. Typically used to just remake gene plot images.

PLOT.FUN

An alternative function to use for generating gene plot images, that accepts a single GeneID as its argument. Use NA to suppress all plotting.
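
As an illustration of the kind of function PLOT.FUN accepts, here is a minimal sketch in base R. The expression matrix, sample colors, and the name myGenePlot are all hypothetical; the only requirement taken from the description above is that the function accepts a single GeneID.

```r
# Hypothetical expression matrix: genes in rows, samples in columns.
set.seed(1)
exprM <- matrix(rexp(12, rate = 0.1), nrow = 3,
                dimnames = list(c("geneA", "geneB", "geneC"),
                                c("S1", "S2", "S3", "S4")))

# Hypothetical per-sample colors, e.g. as might be taken from a Color column.
sampleColors <- c("blue", "blue", "red", "red")

# A plotting function that takes exactly one GeneID argument.
myGenePlot <- function(gene) {
  barplot(exprM[gene, ], col = sampleColors,
          main = gene, ylab = "Expression")
  invisible(NULL)
}

# Draw to a null PDF device so the sketch also runs non-interactively.
pdf(NULL)
myGenePlot("geneB")
invisible(dev.off())
```

A function like this could then be supplied as PLOT.FUN = myGenePlot, while PLOT.FUN = NA would suppress plotting altogether.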

...

Other arguments passed down to the gene plotting function.
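
The following sketch shows how a call might be assembled from the arguments above. The sample IDs, the folder name, and the helper isValidFolderName are hypothetical placeholders, and the call is only attempted when the DuffyNGS package and the default annotation and options files are actually present.

```r
# Hypothetical inputs: these sample IDs and this folder name are placeholders.
sampleIDs  <- c("S1", "S2", "S3", "S4")
folderName <- "DrugA.vs.Control"

# folderName is required: a single non-empty string with no embedded blanks.
# (isValidFolderName is a helper made up for this sketch.)
isValidFolderName <- function(x) {
  is.character(x) && length(x) == 1 && nzchar(x) && !grepl("[[:space:]]", x)
}
stopifnot(isValidFolderName(folderName))

# Only attempt the run when the package and both input files are available.
canRun <- requireNamespace("DuffyNGS", quietly = TRUE) &&
  file.exists("Annotation.txt") && file.exists("Options.txt")

if (canRun) {
  DuffyNGS::pipe.DESeq(sampleIDs, folderName = folderName,
                       groupColumn = "Group", colorColumn = "Color",
                       Ngenes = 50)
} else {
  message("DuffyNGS or its input files not found; skipping the DE run.")
}
```

Because all five tools share the same argument list, swapping pipe.DESeq for pipe.EdgeR, pipe.RankProduct, pipe.RoundRobin, or pipe.SAM should need no other changes to the call.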

Details

Even though these 5 DE tools implement different methods of determining differential expression and take different input arguments, we use a common calling command line to simplify the use of all 5 tools and to standardize how they report their results.

The grouping column from the annotation file determines how the samples are combined into groups, the names of all result files, and the number of different groups being compared. When more than 2 groups are being compared, a K-way comparison is performed in which each group in turn is compared against all other groups combined, like an "Us against all other groups who are not us" strategy.
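
The "each group against all others" strategy can be sketched in base R as follows. This is only an illustration of the idea, not the package's internal code; the annotation table, sample IDs, and group names are made up.

```r
# Hypothetical annotation table. Only the "Group" column name matches the
# groupColumn default; the sample IDs and group names are invented.
annT <- data.frame(
  SampleID = c("S1", "S2", "S3", "S4", "S5", "S6"),
  Group    = c("Control", "Control", "DrugA", "DrugA", "DrugB", "DrugB"),
  stringsAsFactors = FALSE
)

groupNames <- unique(annT$Group)

# For each group, label its own samples "Us" and every other sample "Rest".
comparisons <- lapply(groupNames, function(g) {
  ifelse(annT$Group == g, "Us", "Rest")
})
names(comparisons) <- groupNames

# One two-class labeling per group, so K groups yield K comparisons.
print(comparisons$DrugA)
```

Each element of comparisons is a two-class labeling of the full sample set, which is the form a two-group DE test needs.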

Each comparison creates a family of result files, with suffix names "UP" and "DOWN", to convey the direction of each comparison. Note that in the case of just 2 groups, the UP and DOWN results are virtually symmetric, but that is never true for 3+ group comparisons. Each comparison file uses a composite naming strategy combining <Group>.<Species>.<DEtool>.<DirectionSuffix>.
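
To make the composite naming concrete, the sketch below builds one name per group and direction. The group names, the species prefix "Hs", and the tool label "DESeq" are assumed placeholders, and the real files may carry additional extensions that are not shown here.

```r
# All values below are hypothetical placeholders.
groups   <- c("Control", "DrugA", "DrugB")
species  <- "Hs"      # assumed species prefix
deTool   <- "DESeq"   # assumed DE tool label
suffixes <- c("UP", "DOWN")

# Every combination of direction and group; direction varies fastest.
combos <- expand.grid(direction = suffixes, group = groups,
                      stringsAsFactors = FALSE)

# Join the parts as <Group>.<Species>.<DEtool>.<DirectionSuffix>.
resultNames <- paste(combos$group, species, deTool, combos$direction,
                     sep = ".")
print(resultNames)
```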

Value

A subfolder of result files, with a name constructed from the current species prefix and folderName. For each group name, a set of DE result files in various formats:

Ratio.txt

A tab delimited file of all genes in the species, sorted by fold-change and P-value, that includes all DE metrics returned by that DE tool.

UP.html
DOWN.html

A pair of HTML files of gene expression showing just the top Ngenes genes that are most differentially expressed for that comparison group and direction.

All.GeneData.txt

A tab delimited matrix file of all genes in the species, giving the expression values used by that DE tool (RPKM for some, READ counts for DESeq & EdgeR).

Cluster & PCA

A set of .PNG plots that visually summarize the similarity of the transcriptomes. The Round Robin DE tool augments the clustering with "group average" transcriptomes as well.

Author(s)

Bob Morrison

References

  DESeq:       Anders,  Genome Biology (2010)
  EdgeR:       Robinson,  Biostatistics (2008)
  RankProduct: Breitling,  FEBS Letters (2004)
  RoundRobin:  Morrison (unpublished)
  SAM:         Tusher,  PNAS (2001)
  

See Also

pipe.DiffExpression for turning a set of transcriptomes into ratio files needed for RoundRobin.
pipe.MetaResults for dispatching all DE tools and merging their results.


robertdouglasmorrison/DuffyNGS documentation built on April 13, 2025, 8:48 p.m.