atenaParam-class: atena parameter class

atenaParam-class R Documentation

atena parameter class

Description

This class stores the parameters used to quantify transposable element (TE) expression, and optionally gene expression, with the atena method. It is a subclass of the 'QuantifyParam-class'.

Build an object of the class atenaParam.

Usage

atenaParam(
  bfl,
  teFeatures,
  aggregateby = character(0),
  ovMode = "ovUnion",
  geneFeatures = NULL,
  singleEnd = TRUE,
  strandMode = 1L,
  ignoreStrand = FALSE,
  fragments = TRUE,
  pi_prior = 0L,
  theta_prior = 0L,
  em_epsilon = 1e-07,
  maxIter = 100L,
  reassign_mode = "exclude",
  conf_prob = 0.9,
  verbose = TRUE
)

## S4 method for signature 'atenaParam'
show(object)

Arguments

bfl

A BamFile or BamFileList object, or a character string vector of BAM filenames.

teFeatures

A GRanges or GRangesList object. Elements in this object should have names, which are used as a grouping factor for genomic ranges forming a common locus. This grouping is performed prior to TE expression quantification. In contrast, the aggregation requested through the aggregateby parameter is performed after individual TE instances are quantified.

aggregateby

Character vector with column names from the annotation to be used to aggregate quantifications. By default, this is an empty vector, which means that the names of the input GRanges or GRangesList object given in the teFeatures parameter are used to aggregate quantifications.
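
The aggregation step described above can be pictured with a small base R sketch. It is only an illustration under stated assumptions: the count matrix, the TE instance names and the annotation column repName are hypothetical, and rowsum() stands in for whatever atena does internally.

```r
## Hypothetical per-instance counts (rows: TE instances, columns: samples).
counts <- matrix(c(10, 4, 0, 7,
                   3, 1, 2, 5),
                 ncol = 2,
                 dimnames = list(c("ROO_1", "ROO_2", "GYPSY_1", "GYPSY_2"),
                                 c("s1", "s2")))

## Hypothetical annotation with one column that could be named in 'aggregateby'.
annot <- data.frame(repName = c("ROO", "ROO", "GYPSY", "GYPSY"),
                    row.names = rownames(counts))

## Sum the counts of all instances sharing the same annotation value.
agg <- rowsum(counts, group = annot[rownames(counts), "repName"])
agg
```

Here the four instance-level rows collapse into one row per repName value, which mirrors aggregation happening after each TE instance has been quantified.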

ovMode

Character vector indicating the overlapping mode. Available options are: "ovUnion" (default) and "ovIntersectionStrict", which implement the corresponding methods from HTSeq (https://htseq.readthedocs.io/en/release_0.11.1/count.html). Ambiguous alignments (alignments overlapping > 1 feature) are not counted.
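
The difference between the two overlap modes can be sketched in base R on a toy annotation. This follows the HTSeq definitions of "union" and "intersection-strict" and is not atena's implementation; the coordinates, the helper names featuresAt() and assignRead(), and the "__ambiguous" label (borrowed from HTSeq) are assumptions made for illustration.

```r
## Toy features on one chromosome (hypothetical coordinates).
features <- data.frame(name  = c("TE1", "TE2"),
                       start = c(100, 180),
                       end   = c(200, 300))

## Names of the features covering a single position.
featuresAt <- function(pos)
  features$name[features$start <= pos & pos <= features$end]

## Assign one read spanning [start, end] under either overlap mode.
assignRead <- function(start, end,
                       ovMode = c("ovUnion", "ovIntersectionStrict")) {
  ovMode <- match.arg(ovMode)
  ## Feature set at every position covered by the read.
  sets <- lapply(seq(start, end), featuresAt)
  hit <- if (ovMode == "ovUnion") Reduce(union, sets) else Reduce(intersect, sets)
  if (length(hit) == 1) {
    hit
  } else if (length(hit) == 0) {
    "__no_feature"
  } else {
    "__ambiguous"   # overlaps > 1 feature: not counted
  }
}

assignRead(90, 150, "ovUnion")                # read partly outside TE1
assignRead(90, 150, "ovIntersectionStrict")
assignRead(150, 190, "ovUnion")               # read reaching into TE2
assignRead(150, 190, "ovIntersectionStrict")
```

In this toy case the union mode tolerates a read hanging off the edge of a feature but flags a read touching two features, whereas the strict intersection requires every position of the read to lie inside the same feature.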

geneFeatures

(Default NULL) A GRanges or GRangesList object with the annotated gene features to be quantified. Unique reads are first tallied with respect to these gene features, whereas multi-mapping reads are preferentially assigned to TEs. Elements should have names indicating the gene name/id. If geneFeatures is a GRanges object and contains a metadata column named type, only the elements with type = exon are considered for the analysis, and exon counts are then summarized to the gene level. If NULL, gene expression is not quantified.

singleEnd

(Default TRUE) Logical value indicating if reads are single (TRUE) or paired-end (FALSE).

strandMode

(Default 1) Numeric vector which can take values 0, 1 or 2. The strand mode is a per-object switch on GAlignmentPairs objects that controls the behavior of the strand getter. See GAlignmentPairs class for further detail. If singleEnd = TRUE, then strandMode is ignored.

ignoreStrand

(Default FALSE) A logical which defines if the strand should be taken into consideration when computing the overlap between reads and annotated features. When ignoreStrand = FALSE, an aligned read is considered to be overlapping an annotated feature as long as they have a non-empty intersecting genomic range on the same strand, while when ignoreStrand = TRUE the strand is not considered.

fragments

(Default TRUE) A logical; applied to paired-end data only. When fragments=FALSE, the read-counting method only counts ‘mated pairs’ from opposite strands (non-ambiguous properly paired reads), while when fragments=TRUE same-strand pairs, singletons, reads with unmapped pairs and other ambiguous or not properly paired fragments are also counted (see "Pairing criteria" in readGAlignments()). For further details see summarizeOverlaps().

pi_prior

(Default 0) A positive numeric object indicating the prior on pi. The same prior can be specified for all features by setting pi_prior to a scalar, or each feature can have a specific prior by setting pi_prior to a vector whose names() correspond to all feature names. Setting a pi prior of n is equivalent to adding n unique reads.

theta_prior

(Default 0) A positive numeric object indicating the prior on theta. The same prior can be specified for all features by setting theta_prior to a scalar, or each feature can have a specific prior by setting theta_prior to a vector whose names() correspond to all feature names. Setting a theta prior of n is equivalent to adding n non-unique reads.
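
The scalar and per-feature forms of the two priors can be built as follows. This is a minimal sketch: the feature names are hypothetical, and in a real call they would have to match the feature names used in the quantification.

```r
## Hypothetical feature names; replace with the real feature names.
featNames <- c("TE1", "TE2", "TE3")

## Same prior for every feature: a scalar is enough.
piPriorScalar <- 1

## Feature-specific prior: one named entry per feature.
piPrior <- setNames(rep(0, length(featNames)), featNames)
piPrior["TE2"] <- 5   # per the description above, like adding 5 unique reads

piPrior
```

Either object could then be supplied as pi_prior, or analogously as theta_prior, in the atenaParam() call.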

em_epsilon

(Default 1e-7) A numeric scalar indicating the epsilon cutoff used as convergence threshold of the EM algorithm.

maxIter

A positive integer scalar storing the maximum number of iterations of the EM SQUAREM algorithm (Du and Varadhan, 2020). Default is 100 and this value is passed to the maxiter parameter of the squarem() function.
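
The roles of em_epsilon and maxIter can be illustrated with a toy EM loop in base R. This sketch is a plain fixed-point EM on a hypothetical read-by-feature compatibility matrix; it is not the SQUAREM-accelerated procedure and not atena's actual model (it has no theta parameter and no priors). The function name emSketch() and the matrix Q are assumptions made for illustration.

```r
## Toy compatibility matrix: 1 if the read aligns to the feature.
Q <- rbind(r1 = c(TE1 = 1, TE2 = 0),
           r2 = c(TE1 = 1, TE2 = 0),
           r3 = c(TE1 = 1, TE2 = 1),   # multi-mapping read
           r4 = c(TE1 = 0, TE2 = 1))

emSketch <- function(Q, em_epsilon = 1e-7, maxIter = 100L) {
  pi_est <- rep(1 / ncol(Q), ncol(Q))   # start from equal proportions
  for (iter in seq_len(maxIter)) {
    ## E-step: posterior probability of each read over the features.
    z <- sweep(Q, 2, pi_est, `*`)
    z <- z / rowSums(z)
    ## M-step: updated feature proportions.
    pi_new <- colMeans(z)
    ## Stop once the largest parameter change falls below em_epsilon.
    converged <- max(abs(pi_new - pi_est)) < em_epsilon
    pi_est <- pi_new
    if (converged) break
  }
  list(pi = pi_est, iterations = iter, converged = converged)
}

res <- emSketch(Q)
res$pi
res$converged
```

Lowering maxIter, for instance emSketch(Q, maxIter = 2L), stops the loop before the change drops below em_epsilon, which is the trade-off the two parameters control.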

reassign_mode

(Default 'exclude') Character vector indicating the reassignment mode after the EM step. Available methods are 'exclude' (reads with more than one best assignment are excluded from the final counts), 'choose' (when reads have more than one best assignment, one of them is randomly chosen), 'average' (the read count is divided evenly among the best assignments) and 'conf' (only assignments that exceed a certain threshold, defined by the conf_prob parameter, are accepted; the read count is then divided proportionally among the assignments above conf_prob).
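
The four reassignment modes can be sketched in base R on a toy matrix of posterior probabilities. The helper reassignSketch() and the matrix post are hypothetical and only follow the verbal description above; atena's own implementation may differ in details such as tie handling.

```r
## Hypothetical helper, not part of atena: applies a reassignment mode to a
## reads x features matrix of posterior probabilities and returns counts.
reassignSketch <- function(post,
                           mode = c("exclude", "choose", "average", "conf"),
                           conf_prob = 0.9) {
  mode <- match.arg(mode)
  counts <- setNames(numeric(ncol(post)), colnames(post))
  for (i in seq_len(nrow(post))) {
    p <- post[i, ]
    if (mode == "conf") {
      ## Keep assignments above conf_prob and split the read among them
      ## in proportion to their probabilities.
      keep <- which(p > conf_prob)
      if (length(keep) > 0)
        counts[keep] <- counts[keep] + p[keep] / sum(p[keep])
      next
    }
    best <- which(p == max(p))
    if (length(best) == 1) {
      counts[best] <- counts[best] + 1
    } else if (mode == "choose") {
      pick <- best[sample.int(length(best), 1)]   # random tie-break
      counts[pick] <- counts[pick] + 1
    } else if (mode == "average") {
      counts[best] <- counts[best] + 1 / length(best)
    }
    ## mode == "exclude": reads with tied best assignments add nothing
  }
  counts
}

post <- rbind(r1 = c(TE1 = 0.95, TE2 = 0.05, TE3 = 0.00),
              r2 = c(TE1 = 0.50, TE2 = 0.50, TE3 = 0.00),   # tied read
              r3 = c(TE1 = 0.10, TE2 = 0.30, TE3 = 0.60))

reassignSketch(post, "exclude")
reassignSketch(post, "average")
reassignSketch(post, "conf", conf_prob = 0.9)
```

In this toy input only read r2 has tied best assignments, so it is the read on which 'exclude', 'choose' and 'average' disagree, while 'conf' additionally drops r3 because none of its assignments exceeds conf_prob.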

conf_prob

(Default 0.9) Minimum probability for high confidence assignment.

verbose

(Default TRUE) Logical value indicating whether to report progress.

object

An atenaParam object.

Details

This is the constructor function for objects of the class atenaParam-class. Objects of this class are the input to the function qtex(), which calls the atena method to quantify the expression of transposable elements. The atena method uses a multiple '__no_feature' approach to represent alignments mapping outside annotations: it uses as many '__no_feature' features as there are different overlapping patterns of multimapping reads in the overlapping matrix.

Value

An atenaParam object.

Slots

singleEnd

(Default TRUE) Logical value indicating if reads are single (TRUE) or paired-end (FALSE).

strandMode

(Default 1) Numeric vector which can take values 0, 1 or 2. The strand mode is a per-object switch on GAlignmentPairs objects that controls the behavior of the strand getter. See GAlignmentPairs class for further detail. If singleEnd = TRUE, then strandMode is ignored.

ignoreStrand

(Default FALSE) A logical which defines if the strand should be taken into consideration when computing the overlap between reads and annotated features. When ignoreStrand = FALSE, an aligned read is considered to be overlapping an annotated feature as long as they have a non-empty intersecting genomic range on the same strand, while when ignoreStrand = TRUE the strand is not considered.

fragments

(Default TRUE) A logical; applied to paired-end data only. When fragments=FALSE, the read-counting method only counts ‘mated pairs’ from opposite strands (non-ambiguous properly paired reads), while when fragments=TRUE same-strand pairs, singletons, reads with unmapped pairs and other ambiguous or not properly paired fragments are also counted (see "Pairing criteria" in readGAlignments()). For further details see summarizeOverlaps().

pi_prior

(Default 0) A positive numeric object indicating the prior on pi. The same prior can be specified for all features by setting pi_prior to a scalar, or each feature can have a specific prior by setting pi_prior to a vector whose names() correspond to all feature names. Setting a pi prior of n is equivalent to adding n unique reads.

theta_prior

(Default 0) A positive numeric object indicating the prior on theta. The same prior can be specified for all features by setting theta_prior to a scalar, or each feature can have a specific prior by setting theta_prior to a vector whose names() correspond to all feature names. Setting a theta prior of n is equivalent to adding n non-unique reads.

em_epsilon

(Default 1e-7) A numeric scalar indicating the epsilon cutoff used as convergence threshold of the EM algorithm.

maxIter

A positive integer scalar storing the maximum number of iterations of the EM SQUAREM algorithm (Du and Varadhan, 2020). Default is 100 and this value is passed to the maxiter parameter of the squarem() function.

reassign_mode

(Default 'exclude') Character vector indicating the reassignment mode after the EM step. Available methods are 'exclude' (reads with more than one best assignment are excluded from the final counts), 'choose' (when reads have more than one best assignment, one of them is randomly chosen), 'average' (the read count is divided evenly among the best assignments) and 'conf' (only assignments that exceed a certain threshold, defined by the conf_prob parameter, are accepted; the read count is then divided proportionally among the assignments above conf_prob).

conf_prob

(Default 0.9) Minimum probability for high confidence assignment.

Examples

bamfiles <- list.files(system.file("extdata", package="atena"),
                       pattern="\\.bam$", full.names=TRUE)
## Not run: 
## use the following two instructions to fetch annotations, they are here
## commented out to enable running this example quickly when building and
## checking the package
rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser,
                    strict=FALSE, insert=500)
rmskLTR <- getLTRs(rmskat, relLength=0.8,
                   fullLength=TRUE,
                   partial=TRUE,
                   otherLTR=TRUE)

## End(Not run)

## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION
## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS
rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds",
                               package="atena"))

## build a parameter object for the atena method
atpar <- atenaParam(bfl=bamfiles,
                    teFeatures=rmskLTR,
                    singleEnd=TRUE,
                    ignoreStrand=TRUE)
atpar



functionalgenomics/atena documentation built on May 7, 2024, 10:33 a.m.