filter_transcripts: Filter lowly abundant transcripts.

View source: R/filter_transcripts.R

filter_transcripts R Documentation

Filter lowly abundant transcripts.

Description

filter_transcripts filters transcripts, before the data are loaded, according to their estimated transcript-level counts. The function returns a vector of the transcripts that meet all three filtering criteria (min_transcript_proportion, min_transcript_counts and min_gene_counts), evaluated across all samples.

Usage

filter_transcripts(
  gene_to_transcript,
  transcript_counts,
  min_transcript_proportion = 0.01,
  min_transcript_counts = 1,
  min_gene_counts = 10
)

Arguments

gene_to_transcript

a matrix or data.frame listing the gene-to-transcript correspondences. The first column contains the gene id, the second column the transcript id.

transcript_counts

a matrix or data.frame, with 1 column per sample and 1 row per transcript, containing the estimated abundances for each transcript in each sample.

min_transcript_proportion

the minimum relative abundance (i.e., proportion) of a transcript in a gene.

min_transcript_counts

the minimum overall abundance of a transcript (adding counts from all samples).

min_gene_counts

the minimum overall abundance of a gene (adding counts from all samples).

Details

Transcript pre-filtering is strongly recommended: it both improves the performance of the method and decreases its computational cost.
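
To make the three criteria concrete, here is a minimal base R sketch on toy data. It is not the package's implementation: the IDs and counts are hypothetical, and two details are assumptions, namely that the proportion is computed from counts summed over all samples, and that a value exactly equal to a threshold passes (>=).

```r
# Toy inputs (hypothetical IDs and counts, for illustration only).
gene_to_transcript = data.frame(
  gene_id = c("g1", "g1", "g1", "g2", "g2"),
  transcript_id = c("t1", "t2", "t3", "t4", "t5"),
  stringsAsFactors = FALSE
)
transcript_counts = matrix(
  c(100, 120,
     50,  40,
      1,   0,
      3,   2,
      2,   1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("t1", "t2", "t3", "t4", "t5"), c("sample1", "sample2"))
)

min_transcript_proportion = 0.01
min_transcript_counts = 10
min_gene_counts = 20

# Overall abundance of each transcript, adding counts from all samples.
tr_total = rowSums(transcript_counts)

# Gene of each transcript, aligned to the rows of the count matrix.
gene_of_tr = gene_to_transcript$gene_id[
  match(names(tr_total), gene_to_transcript$transcript_id)
]

# Overall abundance of each gene, repeated once per transcript of that gene.
gene_total = ave(tr_total, gene_of_tr, FUN = sum)

# Relative abundance of each transcript within its gene (assumed definition).
tr_prop = tr_total / gene_total

# A transcript is kept only if it passes all three thresholds (>= is assumed).
keep = tr_prop >= min_transcript_proportion &
  tr_total >= min_transcript_counts &
  gene_total >= min_gene_counts

toy_transcripts_to_keep = names(tr_total)[keep]
print(toy_transcripts_to_keep)
```

In this toy example t3 fails the proportion and transcript-count thresholds, while t4 and t5 are dropped because their gene has fewer than 20 counts overall, even though each makes up a large share of that gene.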

Value

A vector of the transcripts that meet the filtering criteria.
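
As a rough illustration of how such a vector could be used, the sketch below subsets a count matrix to the retained transcripts. It assumes the returned vector holds transcript ids that match the row names of the count matrix; toy_counts and toy_keep are hypothetical stand-ins, not package objects.

```r
# Toy count matrix (hypothetical values): 1 row per transcript, 1 column per sample.
toy_counts = matrix(
  c(100, 120,
     50,  40,
      1,   0),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("t1", "t2", "t3"), c("sample1", "sample2"))
)

# Stand-in for the vector returned by filter_transcripts (assumed to hold ids).
toy_keep = c("t1", "t2")

# Keep only the rows whose transcript id is in the vector;
# drop = FALSE preserves the matrix shape even if a single row remains.
filtered_counts = toy_counts[rownames(toy_counts) %in% toy_keep, , drop = FALSE]
print(dim(filtered_counts))
```

Using %in% rather than indexing by name avoids an error when the vector contains an id that is absent from the matrix.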

Author(s)

Simone Tiberi simone.tiberi@uzh.ch

See Also

filter_genes, create_data, BANDITS_data

Examples

# specify the directory of the internal data:
data_dir = system.file("extdata", package = "BANDITS")

# load gene_to_transcript matching:
data("gene_tr_id", package = "BANDITS")

# load the transcript-level estimated counts via tximport:
library(tximport)
quant_files = file.path(data_dir, "STAR-salmon", paste0("sample", seq_len(4)), "quant.sf")
txi = tximport(files = quant_files, type = "salmon", txOut = TRUE)
counts = txi$counts

# transcript pre-filtering:
transcripts_to_keep = filter_transcripts(gene_to_transcript = gene_tr_id,
                                         transcript_counts = counts,
                                         min_transcript_proportion = 0.01,
                                         min_transcript_counts = 10,
                                         min_gene_counts = 20)
head(transcripts_to_keep)


SimoneTiberi/BANDITS documentation built on Nov. 15, 2023, 2:35 p.m.