optimizedTranscriptLengths: Load length and names of all transcripts

optimizedTranscriptLengthsR Documentation

Load length and names of all transcripts

Description

A speedup wrapper around transcriptLengths, default load time of lengths is ~ 15 seconds, if ORFik fst optimized lengths object has been made, load that file instead: load time reduced to ~ 0.1 second.

Usage

optimizedTranscriptLengths(
  txdb,
  with.utr5_len = TRUE,
  with.utr3_len = TRUE,
  create.fst.version = FALSE
)

Arguments

txdb

a TxDb file or a path to one of: (.gtf ,.gff, .gff2, .gff2, .db or .sqlite), if it is a GRangesList, it will return it self.

with.utr5_len

logical TRUE, include length of 5' UTRs, ignored if .fst exists

with.utr3_len

logical TRUE, include length of 3' UTRs, ignored if .fst exists

create.fst.version

logical, FALSE. If TRUE, creates a .fst version of the transcript length table (if it not already exists), reducing load time from ~ 15 seconds to ~ 0.01 second next time you run filterTranscripts with this txdb object. The file is stored in the same folder as the genome this txdb is created from, with the name:
paste0(ORFik:::remove.file_ext(metadata(txdb)[3,2]), "_", gsub(" \(.*| |:", "", metadata(txdb)[metadata(txdb)[,1] == "Creation time",2]), "_txLengths.fst")
Some error checks are done to see this is a valid location, if the txdb data source is a repository like UCSC and not a local folder, it will not be made.

Value

a data.table of loaded lengths 8 columns, 1 row per transcript isoform.

Examples

dt <- optimizedTranscriptLengths(ORFik.template.experiment())
dt
dt[cds_len > 0,] # All mRNA

Roleren/ORFik documentation built on April 25, 2024, 8:41 p.m.