filterTranscripts: Filter transcripts by lengths

Description Usage Arguments Details Value Examples

Description

Filter transcripts to those who have leaders, CDS, trailers of some lengths, you can also pick the longest per gene.

Usage

1
2
3
4
5
6
7
8
9
filterTranscripts(
  txdb,
  minFiveUTR = 30L,
  minCDS = 150L,
  minThreeUTR = 30L,
  longestPerGene = TRUE,
  stopOnEmpty = TRUE,
  by = "tx"
)

Arguments

txdb

a TxDb file or a path to one of: (.gtf ,.gff, .gff2, .gff2, .db or .sqlite), if it is a GRangesList, it will return it self.

minFiveUTR

(integer) minimum bp for 5' UTR during filtering for the transcripts. Set to NULL if no 5' UTRs exists for annotation.

minCDS

(integer) minimum bp for CDS during filtering for the transcripts

minThreeUTR

(integer) minimum bp for 3' UTR during filtering for the transcripts. Set to NULL if no 3' UTRs exists for annotation.

longestPerGene

logical (TRUE), return only longest valid transcript per gene. NOTE: This is by priority longest cds isoform, if equal then pick longest total transcript. So if transcript is shorter but cds is longer, it will still be the one returned.

stopOnEmpty

logical TRUE, stop if no valid transcripts are found ?

by

a character, default "tx" Either "tx" or "gene". What names to output region by, the transcript name "tx" or gene names "gene"

Details

If a transcript does not have a trailer, then the length is 0, so they will be filtered out if you set minThreeUTR to 1. So only transcripts with leaders, cds and trailers will be returned. You can set the integer to 0, that will return all within that group.

If your annotation does not have leaders or trailers, set them to NULL, since 0 does mean there must exist a column called utr3_len etc. Genes with gene_id = NA will be be removed.

Value

a character vector of valid transcript names

Examples

1
2
3
4
5
6
gtf_file <- system.file("extdata", "annotations.gtf", package = "ORFik")
txdb <- GenomicFeatures::makeTxDbFromGFF(gtf_file)
txNames <- filterTranscripts(txdb, minFiveUTR = 1, minCDS = 30,
                             minThreeUTR = 1)
loadRegion(txdb, "mrna")[txNames]
loadRegion(txdb, "5utr")[txNames]

ORFik documentation built on March 27, 2021, 6 p.m.