annotaTEs: Get RepeatMasker UCSC annotations

View source: R/annotations.R

annotaTEs {atena}    R Documentation

Get RepeatMasker UCSC annotations

Description

The annotaTEs() function fetches RepeatMasker transposable element (TE) annotations from the UCSC Genome Browser through AnnotationHub and parses them.

Usage

annotaTEs(
  genome = "hg38",
  parsefun = rmskidentity,
  verbose = TRUE,
  AHid = NULL,
  ...
)

Arguments

genome

The genome version of the desired RepeatMasker annotations (e.g. "hg38").

parsefun

A function to parse the annotations:

  • Function rmskidentity returns the RepeatMasker annotations as stored in AnnotationHub, without any processing.

  • Function rmskbasicparser parses annotations by removing low complexity regions, simple repeats, satellites, rRNA, scRNA, snRNA, srpRNA and tRNA. It also removes TEs whose strand is other than "+" or "-". It modifies the "repFamily" and "repClass" columns when they contain a "?" or are defined as "Unknown" or "Other". Finally, it assigns a unique id to each TE instance by appending the suffix "_dup" plus a number to the "repName".

  • Function rmskatenaparser parses RepeatMasker annotations and reconstructs fragmented TEs by assembling fragments of the same TE that lie close enough to each other. For LTR class TEs, it tries to reconstruct full-length and partial TEs following the LTR - internal region - LTR structure. Input is a GRanges object and output is a GRangesList object.

  • Function OneCodeToFindThemAll parses annotations following the 'One code to find them all' method (Bailly-Bechet et al. 2014). Input is a GRanges object and output is a GRangesList object.

  • User-defined function. Input and output should be GRanges objects.
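As a minimal sketch of the user-defined option, the hypothetical parser dropNonTEs below approximates only the class and strand filtering steps described for rmskbasicparser; it is not the package's implementation. It assumes the input GRanges carries a "repClass" metadata column with UCSC class labels such as "Low_complexity" and "Simple_repeat", and it uses a small toy object in place of real annotations.

```r
library(GenomicRanges)

# Hypothetical user-defined parser (not part of atena): drops non-TE repeat
# classes and instances whose strand is neither "+" nor "-".
# Assumes a 'repClass' metadata column with UCSC RepeatMasker class labels.
dropNonTEs <- function(gr) {
  stopifnot(is(gr, "GRanges"))
  nonTE <- c("Low_complexity", "Simple_repeat", "Satellite", "rRNA",
             "scRNA", "snRNA", "srpRNA", "tRNA")
  keep <- !(gr$repClass %in% nonTE) &
    as.character(strand(gr)) %in% c("+", "-")
  gr[keep]  # GRanges in, GRanges out
}

# Toy input standing in for real RepeatMasker annotations
toy <- GRanges(
  seqnames = rep("chr1", 4),
  ranges = IRanges(start = c(100, 500, 900, 1300), width = 200),
  strand = c("+", "-", "+", "*"),
  repName = c("L1HS", "(CA)n", "AluY", "MER1"),
  repClass = c("LINE", "Simple_repeat", "SINE", "DNA")
)
res <- dropNonTEs(toy)
res$repName  # the simple repeat and the unstranded instance are removed
```

With atena loaded, such a function would be supplied as annotaTEs(genome = "hg38", parsefun = dropNonTEs).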

verbose

(Default TRUE) Logical value indicating whether to report progress.

AHid

AnnotationHub unique identifier, of the form AH12345, of an object with TE annotations. This optional argument specifies a concrete AnnotationHub resource, for instance when more than one RepeatMasker annotation is available for a specific genome version. If AHid is not specified, the latest RepeatMasker annotation is used.

...

Arguments passed to parsefun.
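The hypothetical parser keepClasses below sketches how an extra argument can reach parsefun through `...`. The classes parameter is invented for this illustration and the "repClass" column is assumed as above; calling the parser directly on a toy object shows its behaviour without downloading any annotations.

```r
library(GenomicRanges)

# Hypothetical parser with an extra argument 'classes' (illustration only).
# When used through annotaTEs(), 'classes' would be supplied via '...'.
keepClasses <- function(gr, classes = "LINE") {
  stopifnot(is(gr, "GRanges"), is.character(classes))
  gr[gr$repClass %in% classes]
}

toy <- GRanges(
  seqnames = rep("chr1", 3),
  ranges = IRanges(start = c(100, 500, 900), width = 200),
  strand = c("+", "-", "+"),
  repName = c("L1HS", "HERVK-int", "AluY"),
  repClass = c("LINE", "LTR", "SINE")
)

# Direct call on the toy data
res <- keepClasses(toy, classes = c("LINE", "LTR"))
res$repName

# Equivalent call through annotaTEs() (requires atena and network access):
# te <- annotaTEs(genome = "hg38", parsefun = keepClasses,
#                 classes = c("LINE", "LTR"))
```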

Details

Given a specific genome version, the annotaTEs() function fetches RepeatMasker annotations from the UCSC Genome Browser using the AnnotationHub package. Since RepeatMasker annotates not only TEs but also low complexity DNA sequences and other types of repeats, parsefun can be set to a function that parses these annotations (e.g. rmskbasicparser or a user-defined function). If no parsing is required, parsefun can be set to rmskidentity.

Value

A GRanges object with transposable element annotations, or a GRangesList object when parsefun returns one (e.g. rmskatenaparser or OneCodeToFindThemAll).

See Also

AnnotationHub

Examples

library(atena)
rmskid <- annotaTEs(genome = "hg19", parsefun = rmskidentity)
rmskid



functionalgenomics/atena documentation built on Nov. 4, 2024, 7:33 p.m.