readBiokitFeatureAnnotation: Read feature annotation from Biokit directory

View source: R/readBiokitAsDGEList.R

readBiokitFeatureAnnotation    R Documentation

Read feature annotation from Biokit directory

Description

Read feature annotation from Biokit directory

Usage

readBiokitFeatureAnnotation(dir, anno = c("refseq", "ensembl"))

Arguments

dir

Character string, a Biokit output directory.

anno

Character string indicating the annotation type, one of "refseq" or "ensembl".

Value

A data.frame containing feature annotation, with feature IDs (as character strings) as row names. The data frame contains the following columns, depending on the anno parameter:

  1. FeatureName: the feature name, as character, which serves as the primary key

  2. GeneID (refseq only) or EnsemblID (ensembl only)

  3. GeneSymbol

  4. mean: mean length

  5. median: median length

  6. longest_isoform: length of the longest isoform

  7. merged: total length of merged exons

The function depends on two files in the Biokit directory: refseq.annot.gz and refseq.geneLength.gz when anno is "refseq", or ensembl.annot.gz and ensembl.geneLength.gz when anno is "ensembl".

If the .annot.gz file is not found (which can happen, for instance, with older Biokit output directories), feature annotation is read from the count GCT file instead. In that case the resulting data.frame contains only two columns: FeatureName and Description.

If the .geneLength.gz file is not found, no gene length information is appended.
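
The file dependencies described above can be checked before calling the function. The following is a minimal sketch in base R, not the package's own implementation: the helper biokitAnnotationFiles is hypothetical, and it assumes that the .annot.gz and .geneLength.gz files sit directly in the top level of the Biokit directory, which the documentation does not state explicitly.

```r
# Hypothetical helper, not part of ribiosNGS: report which of the two
# annotation files named in the documentation exist for a given anno type.
# Assumption: the files are located directly under 'dir'.
biokitAnnotationFiles <- function(dir, anno = c("refseq", "ensembl")) {
  anno <- match.arg(anno)
  kinds <- c(annot = ".annot.gz", geneLength = ".geneLength.gz")
  paths <- file.path(dir, paste0(anno, kinds))
  data.frame(file = basename(paths),
             exists = file.exists(paths),
             row.names = names(kinds),
             stringsAsFactors = FALSE)
}

# Demonstration with a temporary directory holding only an (empty) annot file
demoDir <- file.path(tempdir(), "biokit-demo")
dir.create(demoDir, showWarnings = FALSE)
file.create(file.path(demoDir, "refseq.annot.gz"))

status <- biokitAnnotationFiles(demoDir, anno = "refseq")
print(status)
```

In this demonstration the geneLength row reports a missing file, which corresponds to the case where no gene length columns are appended. A missing annot row would correspond to the fallback to the count GCT file.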

Examples

## TODO add small example files
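
Until example files are available, a call might look like the sketch below. It uses only the signature shown under Usage. The wrapper readAnnotationIfAvailable is hypothetical, the path is a placeholder rather than a real Biokit directory, and the sketch assumes that readBiokitFeatureAnnotation is exported from the ribiosNGS package.

```r
# Hypothetical wrapper, not part of ribiosNGS: returns NULL instead of
# failing when the package is not installed or the directory is missing.
readAnnotationIfAvailable <- function(dir, anno = c("refseq", "ensembl")) {
  anno <- match.arg(anno)
  if (!requireNamespace("ribiosNGS", quietly = TRUE) || !dir.exists(dir)) {
    return(NULL)
  }
  # Assumption: the function is exported by ribiosNGS
  ribiosNGS::readBiokitFeatureAnnotation(dir, anno = anno)
}

# Placeholder path; replace with an actual Biokit output directory
featAnno <- readAnnotationIfAvailable("/path/to/biokit/output", anno = "ensembl")

if (!is.null(featAnno)) {
  # The columns present depend on which annotation files were found
  print(colnames(featAnno))
  print(head(featAnno))
}
```

The guard keeps the snippet harmless on machines without the package or the data, so it can be pasted into a script and adapted by changing only the path and the anno value.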

bedapub/ribiosNGS documentation built on Feb. 10, 2025, 12:34 a.m.