pipe.MetaResults    R Documentation

Dispatch Multiple Differential Expression Tools and then Merge Results.

Description

Top-level wrapper function that calls a family of published DE tools, each of which finds significantly differentially expressed genes between groups of samples. After all DE tools complete, it combines and merges the separate DE results into one folder of meta results.

Usage

pipe.MetaResults(sampleIDset, speciesID = getCurrentSpecies(), annotationFile = "Annotation.txt", 
		optionsFile = "Options.txt", useMultiHits = TRUE, results.path = NULL, 
		folderName = "", groupColumn = "Group", colorColumn = "Color", 
		average.FUN = sqrtmean, tools = c("RoundRobin", "RankProduct", "SAM", "EdgeR", "DESeq"), 
		altGeneMap = NULL, altGeneMapLabel = NULL, targetID = NULL, Ngenes = 100, 
		geneColumnHTML = if (speciesID %in% MAMMAL_SPECIES) "NAME" else "GENE_ID", 
		keepIntergenics = FALSE, verbose = TRUE, label = "", doDE = TRUE, 
		makePlots = doDE, copyPlots = makePlots, nFDRsimulations = 0, 
		addCellTypes = (speciesID %in% MAMMAL_SPECIES), 
		addLifeCycle = (speciesID %in% PARASITE_SPECIES), PLOT.FUN = NULL, ...)

Arguments

sampleIDset

Character vector of SampleIDs, giving the full set of samples that will take part in the DE calculations.

speciesID

The SpeciesID for one single species. The DE tools do not operate on multiple species at one time.

annotationFile

File of sample annotation details, which specifies all needed sample-specific information about the samples under study. See DuffyNGS_Annotation.

optionsFile

File of processing options, which specifies all processing parameters that are not sample specific. See DuffyNGS_Options.

useMultiHits

Logical. By default, all DE tools use the RPKM or READ values from the transcriptomes that correspond to keeping all aligned reads, including alignments called "MultiHit" reads. If FALSE, only uniquely mapped reads are used. Since the transcriptomes store both methods of counting gene abundance, switching between them to see how the DE results are affected is trivial.

results.path

The top-level folder path to write result files to. By default, read from the Options file entry 'results.path'.

folderName

Required character string, with no embedded blanks, used to name the folder of DE results that will be generated by the DE tools. Typically, use a short but informative name that describes the groups being compared.

groupColumn

Character string specifying one column of the annotation table, to give the group name for each sample.

colorColumn

Character string specifying one column of the annotation table, to give the group color for each sample.

average.FUN

The function used to average each gene's rank across the results of the DE tools. Passed down to metaResults.

tools

Character vector naming the DE tools to use. Defaults to all 5 currently implemented tools.

altGeneMap

An alternate data frame of gene annotations, in a format identical to getCurrentGeneMap, that has the gene names and locations to be measured for differential expression. By default, use the standard built-in gene map for this species.

altGeneMapLabel

A character string identifier for the alternate gene map, that becomes part of all created path and file names to indicate the gene map that produced the transcriptomes used in this DE analysis.

targetID

Optional character string giving the target organism(s) being compared. Used by the gene plotting tools, defaults to the current target.

Ngenes

Number of genes to show in the HTML results and to create gene plot images for.

geneColumnHTML

The name of one column in the current gene map that contains the identifier shown in the HTML results. Some genomes require complex compound GeneIDs to give genomic location specificity, but these are unwieldy for routine use. This argument lets a second, simpler identifier be used as a surrogate GeneID.

keepIntergenics

Logical. All transcriptomes keep gene expression values for the intergenic "non-gene" regions defined in the gene map. This argument controls whether those intergenic regions are included in, or excluded from, the DE fold-change comparisons and results; by default (FALSE) they are excluded.

label

A character string that is passed to the gene plot tool, for inclusion in the main plot header.

doDE

Logical, controls whether the complete DE analysis is performed, or whether to just use results already present in the DE subfolder. When FALSE, do not dispatch any DE tools, just do the final meta results combination steps.

makePlots

Logical, should the dispatched DE tools make plots or not.

copyPlots

Logical, should the final meta results step copy over plots from the dispatched DE tools.

nFDRsimulations

Passed down to metaResults, to perform an optional false discovery rate calculation.

addCellTypes

Logical, controls adding extra data columns and result files that associate an immune cell subset call with each gene, and then estimate cell subset enrichment for each DE group. Currently applicable only to mammalian genomes.

addLifeCycle

Logical, controls adding extra data columns and result files that associate a parasite lifecycle stage call with each gene, and then estimate lifecycle stage enrichment for each DE group. Currently applicable only to parasite genomes.

PLOT.FUN

An alternative function to use for generating gene plot images, that accepts a single GeneID as its argument. Use NA to suppress all plotting.

...

Other arguments passed down to the gene plotting function.
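As a sketch of the PLOT.FUN argument, the function below accepts a single GeneID plus optional extra arguments and draws a bar plot of that gene's values across samples. The expression matrix, its toy values, and the name myGenePlot are hypothetical, and exactly how pipe.MetaResults calls PLOT.FUN (for example, which extra arguments it forwards) is an assumption to check against the package source.

```r
# Hypothetical expression matrix: rows are GeneIDs, columns are samples (toy values)
exprMatrix <- matrix(c(5, 8, 2,
                       3, 9, 1),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("geneA", "geneB"), c("S1", "S2", "S3")))

# Hypothetical alternative plot function: takes one GeneID as its first argument
myGenePlot <- function(geneID, ...) {
  # Look up this gene's values across all samples
  values <- exprMatrix[geneID, ]
  # Draw a simple bar plot; any extra arguments are forwarded to barplot()
  barplot(values, main = geneID, ...)
  invisible(values)
}
```

Such a function could then be supplied as pipe.MetaResults(..., PLOT.FUN = myGenePlot), while PLOT.FUN = NA suppresses plotting altogether.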

Details

This pipe acts as a top level DE tool, dispatching independent DE tools with differing methods, and then merging and averaging all those results into a final meta DE result.
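The rank-averaging step can be illustrated with a small sketch. Each row holds one gene's rank from three DE tools, and a square-root mean serves as the average function. The definition of sqrt_mean_sketch (the square of the mean of square roots) is an assumption standing in for the package's sqrtmean, whose exact formula is not given here, and the ranks are made-up toy values.

```r
# Assumed stand-in for sqrtmean: square of the mean of the square roots
sqrt_mean_sketch <- function(x, na.rm = FALSE) mean(sqrt(x), na.rm = na.rm)^2

# Toy per-tool ranks for four genes (1 = most differentially expressed)
ranks <- data.frame(
  RoundRobin  = c(1, 2, 3, 4),
  RankProduct = c(2, 1, 4, 3),
  SAM         = c(1, 3, 2, 4),
  row.names   = c("geneA", "geneB", "geneC", "geneD")
)

# Average each gene's ranks across tools, then sort genes by that average
avgRank <- apply(ranks, 1, sqrt_mean_sketch)
meta <- ranks[order(avgRank), ]
meta$AvgRank <- avgRank[order(avgRank)]
meta
```

By Jensen's inequality this square-root mean never exceeds the arithmetic mean, so one poor rank from a single tool pulls a gene down somewhat less than a plain mean would.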

The grouping column from the annotation file determines how the samples are combined into groups, the names of all result files, and the number of different groups being compared. When more than 2 groups are compared, a K-way comparison is performed in which each group is compared against all other groups combined, an "us against all other groups who are not us" strategy.
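The K-way, one-versus-rest relabeling can be sketched as follows: for each group, every sample is labeled either as that group or as its complement. The annotation table, the "Not" prefix, and the helper name oneVsRest are hypothetical, chosen only to illustrate the strategy, not to reproduce the package's internal labels.

```r
# Toy annotation table (hypothetical sample IDs and groups)
annotation <- data.frame(
  SampleID = paste0("S", 1:6),
  Group    = c("A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# For each group, relabel samples as either that group or "Not<group>"
oneVsRest <- function(groups) {
  groupOrder <- unique(groups)
  out <- lapply(groupOrder, function(g) ifelse(groups == g, g, paste0("Not", g)))
  names(out) <- groupOrder
  out
}

comparisons <- oneVsRest(annotation$Group)
comparisons$B
```

With 3 groups this yields 3 separate two-way comparisons, one per group.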

Each comparison creates a family of result files, with the suffixes "UP" and "DOWN" conveying the direction of each comparison. Note that with just 2 groups, the UP and DOWN results are roughly symmetric, but this does not hold for comparisons of 3 or more groups. Each comparison file is named by combining <Group>.<Species>.Meta.<DirectionSuffix>.
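The composite naming pattern can be sketched as below. The helper metaFileNames and the species label "MySpecies" are hypothetical; the real species component and file extensions come from the package itself.

```r
# Build <Group>.<Species>.Meta.<DirectionSuffix>.<ext> file names
metaFileNames <- function(groups, species, suffixes = c("UP", "DOWN"), ext = "txt") {
  # expand.grid varies its first argument fastest, keeping each group's UP/DOWN pair together
  grid <- expand.grid(Direction = suffixes, Group = groups, stringsAsFactors = FALSE)
  paste(grid$Group, species, "Meta", grid$Direction, ext, sep = ".")
}

metaFileNames(c("A", "B"), "MySpecies")
```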

Value

A subfolder of result files, with a name constructed from the current species prefix and folderName. For each group name, a set of DE result files in various formats:

UP.txt
DOWN.txt

Tab-delimited files of all genes in the species, sorted by average fold-change, average P-value, and average rank across all the DE tools, with extra columns showing the actual gene rank position in each DE tool.

UP.html
DOWN.html

A pair of HTML files of gene expression showing just the top Ngenes genes that are most differentially expressed for that comparison group.

Cluster & PCA

A set of .PNG plots that visually summarize the similarity of the transcriptomes, augmented with "group average" transcriptomes as well.

Enrichment.csv

Optional CSV files of gene subset enrichment, by cell type or lifecycle stage.
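The enrichment files summarize how strongly a gene subset (a cell type or lifecycle stage) is overrepresented among a group's top DE genes. One common way to score such enrichment is a one-sided hypergeometric test; the sketch below uses it purely for illustration and is not necessarily the statistic DuffyNGS computes. The gene IDs, cell-type calls, and the enrichmentP helper are hypothetical.

```r
# Toy universe of 20 genes (hypothetical IDs)
allGenes <- paste0("g", 1:20)

# Hypothetical subset calls: 5 genes tagged "T cell", the rest "Other"
cellType <- setNames(rep("Other", 20), allGenes)
cellType[c("g1", "g2", "g3", "g4", "g5")] <- "T cell"

# Hypothetical top DE genes for one comparison group
topGenes <- c("g1", "g2", "g3", "g10")

enrichmentP <- function(topGenes, cellType, subset) {
  inSubset <- names(cellType)[cellType == subset]
  x <- sum(topGenes %in% inSubset)  # subset genes among the top genes
  m <- length(inSubset)             # subset genes in the whole universe
  n <- length(cellType) - m         # non-subset genes in the universe
  k <- length(topGenes)             # number of top genes drawn
  # One-sided P(X >= x) under the hypergeometric null of no enrichment
  phyper(x - 1, m, n, k, lower.tail = FALSE)
}

p <- enrichmentP(topGenes, cellType, "T cell")
p
```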

Note

By default, the function invokes multicore parallel processing to run the DE tools at the same time. As a result, all DE tools write their status to standard output concurrently, making it very hard to spot any runtime error messages. If an unexpected error occurs, turn off multicore behavior with multicore.setup(1) and rerun; the true cause of the error should then be easier to deduce.

Author(s)

Bob Morrison

See Also

pipe.MetaGeneSets for dispatching all Gene Set & Pathway tools and merging their results.


robertdouglasmorrison/DuffyNGS documentation built on March 24, 2024, 4:16 p.m.