ovlp.res: Overlappings resolution

ovlp.res R Documentation

Overlappings resolution

Description

This function resolves overlapping repeats assigned to the same transcript and returns a data frame of repeats with no overlaps. The user can define the criterion used to resolve the overlaps: higher score (HS), longer element (LS) or lower Kimura distance (LD).

Usage

ovlp.res(
  RepMask,
  anot,
  gff3,
  stranded = T,
  outdir,
  rm.cotrans = F,
  trpt.length = NULL,
  align,
  threads = 1,
  ignore.aln.pos = T,
  over.res = c("HS", "LS", "LD"),
  by = "classRep",
  ...
)
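
As a hedged illustration of the signature above, a call with rm.cotrans = F might be assembled as follows. The file paths are hypothetical placeholders, the package name ExplorATE is an assumption based on the repository name, and whether anot and gff3 can really be omitted in this mode is not confirmed by this page.

```r
# All file paths below are hypothetical placeholders.
args <- list(
  RepMask    = "RM_transcripts.fa.out",
  outdir     = "ovlp_out",
  rm.cotrans = FALSE,  # RepMask must then be free of co-transcribed repeats
  stranded   = TRUE,
  align      = "RM_transcripts.fa.align",
  threads    = 1,
  over.res   = "HS"    # resolve overlaps by higher score
)

# Only attempt the call when the package (assumed name: ExplorATE)
# is installed and the input file actually exists.
if (requireNamespace("ExplorATE", quietly = TRUE) && file.exists(args$RepMask)) {
  res <- do.call(ExplorATE::ovlp.res, args)
}
```

Building the argument list first keeps the chosen options visible in one place and lets the call be skipped cleanly when the inputs are missing.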

Arguments

RepMask

RepeatMasker output file. If rm.cotrans = F, then you must supply a RepeatMasker output file without co-transcribed repeats.

anot

Annotation file in outfmt6 format. Required when rm.cotrans = T.

gff3

gff3 file. Required when rm.cotrans = T.

stranded

logical vector indicating whether the library is strand-specific

outdir

Output directory

trpt.length

A data.frame with two columns: the first column must contain the name of the transcripts, and the second the length corresponding to each transcript. The default is trpt.length=NULL, and the lengths for each transcript are taken from the RepeatMasker file.
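
As a minimal sketch of the expected shape, such a table could be built as follows. The transcript names, lengths and column names are made up for illustration; the documentation only specifies the order of the two columns.

```r
# Hypothetical transcript names and lengths, for illustration only.
trpt_len <- data.frame(
  transcript = c("TRINITY_DN1_c0_g1_i1", "TRINITY_DN2_c0_g1_i1", "TRINITY_DN3_c0_g1_i1"),
  length     = c(1520L, 874L, 2310L),
  stringsAsFactors = FALSE
)

# Basic sanity checks before passing the table as trpt.length:
# two columns, names first, positive numeric lengths second,
# and no transcript listed twice.
stopifnot(
  ncol(trpt_len) == 2,
  is.character(trpt_len[[1]]),
  is.numeric(trpt_len[[2]]),
  all(trpt_len[[2]] > 0),
  !anyDuplicated(trpt_len[[1]])
)
```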

align

RepeatMasker alignments file (.align)

threads

Number of cores to use in the processing. By default threads = 1

ignore.aln.pos

The RepeatMasker alignments file may have discrepancies in the repeat positions with respect to the output file. If you select over.res = "LD", you can choose whether to take into account the positions in the alignments file or to use the average per repeat class (default).

over.res

Indicates the method by which repeat overlaps are resolved ("HS" by default).
HS: higher score, bases are assigned to the element with the highest score.
LS: longer element, bases are assigned to the longest element.
LD: lower divergence, bases are assigned to the element with the least divergence.
In all cases, if both elements have the same characteristics, the bases are assigned to the first element.
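
The selection rule described above can be sketched on a toy pair of overlapping repeats. This is only an illustration of the criteria, not the package's implementation; the function pick_winner and the fields start, end, score and div are hypothetical.

```r
# Toy illustration of the selection rule only.
# Each repeat is a list with hypothetical fields: start, end, score, div.
pick_winner <- function(a, b, method = c("HS", "LS", "LD")) {
  method <- match.arg(method)
  key <- switch(method,
    HS = c(a$score, b$score),                          # higher score wins
    LS = c(a$end - a$start + 1, b$end - b$start + 1),  # longer element wins
    LD = -c(a$div, b$div)                              # lower divergence wins
  )
  # which.max returns the first maximum, so ties go to the first element
  which.max(key)
}

rep1 <- list(start = 100, end = 400, score = 900,  div = 12.5)
rep2 <- list(start = 350, end = 800, score = 1500, div = 20.1)

pick_winner(rep1, rep2, "HS")  # 2: rep2 has the higher score
pick_winner(rep1, rep2, "LS")  # 2: rep2 is longer (451 vs 301 bases)
pick_winner(rep1, rep2, "LD")  # 1: rep1 has the lower divergence
pick_winner(rep1, rep1, "HS")  # 1: tie, so the first element is kept
```

The returned index says which of the two elements would receive the overlapping bases (positions 350 to 400 in this example).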

rm.cotrans

logical vector indicating whether co-transcribed repeats should be removed

cleanTEsProt

logical vector indicating whether the search for TE-related proteins should be carried out (e.g. transposases, integrases, env, reverse transcriptase, etc.). We recommend using a curated annotations file from which these genes have been excluded; therefore the default option is F. When T is selected, a search is performed against a database obtained from UniProt, so we recommend that the subject sequence id in the annotations file follow the UniProt format (e.g. "CO1A2_MOUSE"/"sp|Q01149|CO1A2_MOUSE"/"tr|H9GLU4|H9GLU4_ANOCA").
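
As a rough sketch, subject ids can be checked against the three example formats before running with cleanTEsProt = T. The regular expression below is an assumption derived from those examples only, and the ids are made up.

```r
# Hypothetical subject ids as they might appear in an outfmt6 annotations file.
sseqid <- c("CO1A2_MOUSE", "sp|Q01149|CO1A2_MOUSE",
            "tr|H9GLU4|H9GLU4_ANOCA", "NP_001234.1")

# Optional "sp|ACC|" or "tr|ACC|" prefix followed by an ENTRY_SPECIES name.
uniprot_like <- grepl("^((sp|tr)\\|[A-Z0-9]+\\|)?[A-Z0-9]+_[A-Z0-9]+$", sseqid)
uniprot_like

# Reduce the matching ids to the bare entry name, e.g. "CO1A2_MOUSE".
entry_name <- sub("^(sp|tr)\\|[A-Z0-9]+\\|", "", sseqid[uniprot_like])
entry_name
```

Ids that fail the check, such as the RefSeq-style "NP_001234.1" here, may need to be converted or filtered out first.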

featureSum

Returns statistics related to the characteristics of the transcripts. Requires a gff3 file. If TRUE, returns a list of the


FemeniasM/ExplorATEproject documentation built on Nov. 30, 2022, 5:26 p.m.