DRIMSeqFilter: Filter data using filtering procedure built into _DRIMSeq_...

View source: R/utility_functions.R

DRIMSeqFilter    R Documentation

Description

Filter data using the filtering procedure built into DRIMSeq via the dmFilter function. The filtered versions of the datasets described in sumToGene are saved automatically.

Usage

DRIMSeqFilter(
  abGene,
  cntGene,
  key,
  min_samps_feature_expr,
  min_feature_expr,
  min_samps_feature_prop,
  min_feature_prop,
  min_samps_gene_expr,
  min_gene_expr,
  tx2gene,
  countsFromAbundance,
  sampstouse = NULL,
  failedinfRepsamps = NULL
)
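
As a rough illustration of the signature above, the six filtering thresholds can be collected in a named list and passed with do.call. This is only a sketch: the zero values follow the Details note about setting all filtering parameters to zero, countsFromAbundance = "scaledTPM" is an arbitrary choice among the documented values, and the objects abGene, cntGene, key, and tx2gene are assumed to have been created earlier in the session. The call is guarded so that it is skipped when those inputs or the package are not available.

```r
# Thresholds set to zero, per the Details section.
filter_args <- list(
  min_samps_feature_expr = 0,
  min_feature_expr       = 0,
  min_samps_feature_prop = 0,
  min_feature_prop       = 0,
  min_samps_gene_expr    = 0,
  min_gene_expr          = 0
)

# Assumption: these inputs were created earlier (e.g. by sumToGene and maketx2gene).
needed <- c("abGene", "cntGene", "key", "tx2gene")
inputs_ready <- all(vapply(needed, exists, logical(1)))

# Only attempt the call when both the inputs and the package are present.
if (inputs_ready && requireNamespace("CompDTUReg", quietly = TRUE)) {
  do.call(
    CompDTUReg::DRIMSeqFilter,
    c(list(abGene = abGene, cntGene = cntGene, key = key, tx2gene = tx2gene,
           countsFromAbundance = "scaledTPM"),  # assumed import setting
      filter_args)
  )
}
```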

Arguments

abGene

is the data.frame of abundances (TPMs) for each sample saved by sumToGene

cntGene

is the data.frame of counts and lengths for each sample saved by sumToGene

key

is a data.frame with columns "Sample" (the unique biological identifier for the analysis), "Condition" (the condition/treatment effect variable for the data), and "Identifier", whose values should be "Sample1", "Sample2", ... up to the number of rows of key. The "Identifier" column must be created this way even if the observations don't correspond to unique biological samples.

min_samps_feature_expr

From dmFilter documentation: Minimal number of samples where features (transcripts) should be expressed. Here a feature counts as expressed in a sample when its expression is at least min_feature_expr.

min_feature_expr

From dmFilter documentation: Minimal feature (transcript) expression.

min_samps_feature_prop

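From dmFilter documentation: Minimal number of samples where features (transcripts) should be expressed. Here a feature counts as expressed in a sample when its proportion within its gene is at least min_feature_prop.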

min_feature_prop

From dmFilter documentation: Minimal proportion for feature (transcript) expression. This value should be between 0 and 1.

min_samps_gene_expr

From dmFilter documentation: Minimal number of samples where genes should be expressed.

min_gene_expr

From dmFilter documentation: Minimal gene expression.

tx2gene

is a data.frame that matches transcripts to genes. It can be created by maketx2gene.

countsFromAbundance

character corresponding to the countsFromAbundance parameter used when importing the data with tximport. Possible values are "no", "scaledTPM", or "lengthScaledTPM".

sampstouse

is a vector of sample names (in the form of "Sample1", "Sample2", etc.) to be used in the analysis. Use this argument only if you want to run a subset of the sample IDs from key$Identifier.

failedinfRepsamps

is an optional parameter that gives the names of samples (in the form of "Sample1", "Sample2", etc.) for which the infRep sampler failed. This should not be needed, as newer versions of Salmon don't seem to have this issue, but it is kept for backward compatibility.
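
The layout required for key, and the way sampstouse relates to it, can be sketched as follows. The sample names and conditions are hypothetical and chosen only for illustration. Only the three column names and the "Sample1", "Sample2", ... pattern come from the argument descriptions above.

```r
# Hypothetical biological sample names and conditions, for illustration only.
key <- data.frame(
  Sample    = c("patientA", "patientB", "patientC", "patientD"),
  Condition = c("control", "control", "treated", "treated"),
  stringsAsFactors = FALSE
)

# Identifier must run "Sample1", "Sample2", ... up to nrow(key).
key$Identifier <- paste0("Sample", seq_len(nrow(key)))

# Optional: restrict the analysis to a subset of the identifiers.
sampstouse <- key$Identifier[key$Sample != "patientD"]

# Sanity check that every requested sample exists in the key.
stopifnot(all(sampstouse %in% key$Identifier))
```

Leaving sampstouse at its default of NULL corresponds to not requesting a subset.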

Details

This function internally calls dmFilter. See the documentation for that function for more information, including a discussion of setting all filtering parameters to zero so that only features with zero expression across all samples and genes with only one non-zero feature are removed (DTU analysis cannot be performed if a gene has only one transcript). See also the file (1)DataProcessing.R in the package's SampleCode folder for example code.
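
To make the effect of the thresholds more concrete, the following toy sketch applies a feature-level expression rule to a small count matrix for one hypothetical gene. It is a simplified illustration in base R, not the code used by dmFilter or DRIMSeqFilter; the helper keep_features and the counts are invented for this example.

```r
# Toy counts for one hypothetical gene: rows are transcripts, columns are samples.
counts <- matrix(
  c(10, 12,  0, 15,
     0,  0,  0,  0,
     3,  0,  1,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("tx1", "tx2", "tx3"), paste0("Sample", 1:4))
)

# Hypothetical helper: returns the names of transcripts that pass the rule.
keep_features <- function(counts, min_samps_feature_expr, min_feature_expr) {
  # Always drop transcripts with zero counts in every sample.
  nonzero <- rowSums(counts > 0) > 0
  # Keep transcripts that reach the expression threshold in enough samples.
  enough <- rowSums(counts >= min_feature_expr) >= min_samps_feature_expr
  rownames(counts)[nonzero & enough]
}

all_zero_thresholds <- keep_features(counts, min_samps_feature_expr = 0,
                                     min_feature_expr = 0)
stricter_thresholds <- keep_features(counts, min_samps_feature_expr = 3,
                                     min_feature_expr = 5)
```

With all thresholds at zero only the all-zero transcript tx2 is dropped, so tx1 and tx3 remain. With the stricter setting only tx1 remains, and a gene left with a single transcript could no longer be used for DTU analysis.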

Value

This function will save versions of abGene, cntGene, abDatasets, and cntDatasets containing information for only those genes and transcripts that pass filtering with the given input parameters. For more information on the output datasets see sumToGene.


skvanburen/CompDTUReg documentation built on Jan. 23, 2025, 9:01 a.m.