ERVmapParam-class: ERVmap parameter class
In functionalgenomics/atena: Analysis of Transposable Elements

ERVmapParam-class

R Documentation

ERVmap parameter class

Description

This is a class for storing parameters provided to the ERVmap algorithm. It is a subclass of the 'QuantifyParam-class'.

Build an object of the class ERVmapParam

Usage

ERVmapParam(
  bfl,
  teFeatures,
  aggregateby = character(0),
  ovMode = "ovUnion",
  geneFeatures = NULL,
  singleEnd = TRUE,
  ignoreStrand = TRUE,
  strandMode = 1L,
  fragments = !singleEnd,
  maxMismatchRate = 0.02,
  suboptimalAlignmentTag = "auto",
  suboptimalAlignmentCutoff = 5,
  geneCountMode = "all",
  verbose = TRUE
)

## S4 method for signature 'ERVmapParam'
show(object)

Arguments

`bfl`	A `BamFile` or `BamFileList` object, or a character string vector of BAM filenames.
`teFeatures`	A `GRanges` or `GRangesList` object with the transposable element (TE) annotated features to be quantified. Elements in this object should have names, which are used as a grouping factor for genomic ranges forming a common locus, unless other metadata column names are specified in the `aggregateby` parameter.
`aggregateby`	Character vector with column names in the annotation to be used to aggregate quantifications. By default, this is an empty vector, which means that the names of the input `GRanges` or `GRangesList` object given in the `teFeatures` parameter are used to aggregate quantifications.
`ovMode`	Character vector indicating the overlapping mode. Available options are: "ovUnion" (default) and "ovIntersectionStrict", which implement the corresponding methods from HTSeq (https://htseq.readthedocs.io/en/release_0.11.1/count.html). Ambiguous alignments (alignments overlapping > 1 feature) are addressed as in the original ERVmap algorithm.
`geneFeatures`	(Default NULL) A `GRanges` or `GRangesList` object with the gene annotated features to be quantified. Overlaps with unique reads are first tallied with respect to these gene features. Elements should have names indicating the gene name/id. In case that `geneFeatures` is a `GRanges` and contains a metadata column named `type`, only the elements with `type` = `exon` are considered for the analysis. Then, exon counts are summarized to the gene level. If NULL, gene expression is not quantified.
`singleEnd`	(Default TRUE) Logical value indicating if reads are single (`TRUE`) or paired-end (`FALSE`).
`ignoreStrand`	(Default TRUE) A logical which defines if the strand should be taken into consideration when computing the overlap between reads and annotated features. When `ignore_strand = FALSE`, an aligned read is considered to be overlapping an annotated feature as long as they have a non-empty intersecting genomic range on the same strand, while when `ignoreStrand = TRUE` the strand is not considered.
`strandMode`	(Default 1) Numeric vector which can take values 0, 1 or 2. The strand mode is a per-object switch on `GAlignmentPairs` objects that controls the behavior of the strand getter. See `GAlignmentPairs` class for further detail. If `singleEnd = TRUE`, then `strandMode` is ignored.
`fragments`	(Default not `singleEnd`) A logical; applied to paired-end data only. When `fragments=TRUE`, the read-counting method in the original ERVmap algorithm is applied: each mate of a paired-end read is counted (including ambiguous and not properly paired reads). When `fragments=FALSE`, if the two mates of a paired-end read map to the same element, they are counted as a single hit and singletons, reads with unmapped pairs and other ambiguous or not properly paired fragments are not counted (see "Pairing criteria" in `readGAlignments()`).
`maxMismatchRate`	(Default 0.02) Numeric value storing the maximum mismatch rate employed by the ERVmap algorithm to discard aligned reads whose rate of sum of hard and soft clipping or whose rate of the edit distance over the genome reference to the length of the read is above this threshold.
`suboptimalAlignmentTag`	(Default "auto") Character string storing the tag name in the BAM files that stores the suboptimal alignment score used in the third filter of ERVmap; see Tokuyama et al. (2018). The default, `suboptimalAlignmentTag="auto"`, first extracts the name of the read mapper software from one or more BAM files. If BAM files were generated by BWA, the suboptimal alignment scores are obtained from a tag called `XS`. For other read mappers, the suboptimal alignment score is considered to be missing since, except from BWA, no other aligner provides a tag with suboptimal alignment scores. In this case, the available secondary alignments are used to implement an analogous approach to that of the third ERVmap filter. When `suboptimalAlignmentTag="none"`, it also performs the latter approach even when the tag `XS` is available. When this parameter is different from `"auto"` and `"none"`, a tag with the given name is used to extract the suboptimal alignment score.
`suboptimalAlignmentCutoff`	(Default 5) Numeric value storing the cutoff above which the difference between the alignment score and the suboptimal alignment score is considered sufficiently large to retain the alignment. When this value is set to `NA`, the filtering step based on suboptimal alignment scores is skipped.
`geneCountMode`	(Default `"all"`) Character string indicating if the ERVmap read filters applied to quantify TEs expression should also be applied when quantifying gene expression (`"ervmap"`) or not (`"all"`), in which case all primary alignments mapping to genes are counted.
`verbose`	(Default `TRUE`) Logical value indicating whether to report progress.
`object`	A ERVmapParam object.

Details

This is the constructor function for objects of the class ERVmapParam-class. This type of object is the input to the function qtex() for quantifying expression of transposable elements using the ERVmap method Tokuyama et al. (2018). The ERVmap algorithm processes reads following conservative filtering criteria to provide reliable raw count data for each TE.

Value

A ERVmapParam object.

Slots

readMapper: The name of the software used to align reads, obtained from the BAM file header.
singleEnd: (Default FALSE) Logical value indicating if reads are single (TRUE) or paired-end (FALSE).
strandMode: (Default 1) Numeric vector which can take values 0, 1 or 2. The strand mode is a per-object switch on GAlignmentPairs objects that controls the behavior of the strand getter. See GAlignmentPairs class for further detail. If singleEnd = TRUE, then strandMode #' is ignored.
ignoreStrand: (Default TRUE) A logical which defines if the strand should be taken into consideration when computing the overlap between reads and TEs in the annotations. When ignore_strand = FALSE, only those reads which overlap the TE and are on the same strand are counted. On the contrary, when ignore_strand = TRUE, any read overlapping an element in teFeatures is counted regardless of the strand.
fragments: (Default not singleEnd) A logical; applied to paired-end data only. When fragments=TRUE, the read-counting method in the original ERVmap algorithm is applied: each mate of a paired-end read is counted (including ambiguous and not properly paired reads). When fragments=FALSE, if the two mates of a paired-end read map to the same element, they are counted as a single hit and singletons, reads with unmapped pairs and other ambiguous or not properly paired fragments are not counted (see "Pairing criteria" in readGAlignments()).
maxMismatchRate: (Default 0.02) Numeric value storing the maximum mismatch rate employed by the ERVmap algorithm to discard aligned reads whose rate of sum of hard and soft clipping, or of the edit distance over the genome reference, to the length of the read is above this threshold.
suboptimalAlignmentTag: (Default "auto") Character string storing the tag name in the BAM files that stores the suboptimal alignment score used in the third filter of ERVmap; see Tokuyama et al. (2018). The default, suboptimalAlignmentTag="auto", assumes that either the BAM files were generated by BWA and include a tag called XS that stores the suboptimal alignment score or, if the XS tag is not available, then it uses the available secondary alignments to implement an analogous approach to that of the third ERVmap filter. When suboptimalAlignmentTag="none", it also performs the latter approach even when the tag XS is available. When this parameter is different from "auto" and "none", a tag with the given name is used to extract the suboptimal alignment score. The absence of that tag will prompt an error.
suboptimalAlignmentCutoff: (Default 5) Numeric value storing the cutoff above which the difference between the alignment score and the suboptimal alignment score is considered sufficiently large to retain the alignment. When this value is set to NA, then the filtering step based on suboptimal alignment scores is skipped.
geneCountMode: (Default "all") Character string indicating if the ERVmap read filters applied to quantify TEs expression should also be applied when quantifying gene expression ("ervmap") or not ("all"), in which case all primary alignments mapping to genes are counted.

References

Tokuyama M et al. ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses. PNAS. 2018;115(50):12565-12572. DOI: https://doi.org/10.1073/pnas.1814589115

Examples

bamfiles <- list.files(system.file("extdata", package="atena"),
                       pattern="*.bam", full.names=TRUE)
## Not run: 
rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser,
                    strict=FALSE, insert=500)
rmskLTR <- getLTRs(rmskat, relLength=0.8,
                   fullLength=TRUE,
                   partial=TRUE,
                   otherLTR=TRUE)

## End(Not run)

## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION
## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS
rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds",
                               package="atena"))

## build a parameter object for ERVmap
empar <- ERVmapParam(bamfiles,
                     teFeatures=rmskLTR,
                     singleEnd=TRUE,
                     ignoreStrand=TRUE,
                     suboptimalAlignmentCutoff=NA)
empar

functionalgenomics/atena documentation built on April 17, 2025, 11:51 a.m.