rm.cotransRep: Excludes co-transcribed repeats

View source: R/rm.cotransRep.r

rm.cotransRep R Documentation

Excludes co-transcribed repeats

Description

Excludes repeats that are co-transcribed with gene coding sequences and returns a clean RepeatMasker file for downstream analysis. If featureSum is TRUE, it also writes a table of the repeats that overlap the 3' UTR, 5' UTR, or CDS regions.

Usage

rm.cotransRep(
  RepMask,
  anot,
  gff3,
  stranded = T,
  cleanTEsProt = F,
  featureSum = F,
  outdir
)
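As an illustration of the signature above, a call might be assembled as in the sketch below. All file paths are hypothetical placeholders, and the sketch assumes the package is installed under the name ExplorATE. The call is guarded so that it only runs when the package and the input files are actually present.

```r
# Sketch only: every path below is a hypothetical placeholder.
args <- list(
  RepMask      = "path/to/genome.fa.out",       # RepeatMasker output file
  anot         = "path/to/annotation.outfmt6",  # annotation file
  gff3         = "path/to/annotation.gff3",     # GFF3 file
  stranded     = TRUE,                          # strand-specific library
  cleanTEsProt = FALSE,                         # annotation assumed already curated
  featureSum   = TRUE,                          # also write the feature summary files
  outdir       = "path/to/output"               # output directory
)

# Only attempt the call if the package and all input files are available.
inputs_ready <- all(file.exists(unlist(args[c("RepMask", "anot", "gff3")])))
if (requireNamespace("ExplorATE", quietly = TRUE) && inputs_ready) {
  RM.clean <- do.call(ExplorATE::rm.cotransRep, args)
}
```

Spelling out TRUE and FALSE instead of T and F is a defensive choice, since T and F are ordinary variables in R and can be reassigned.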

Arguments

RepMask

RepeatMasker output file

anot

annotation file in BLAST tabular format (outfmt 6)

gff3

GFF3 file

stranded

logical indicating whether the library is strand-specific (default: TRUE)

cleanTEsProt

logical indicating whether to search for TE-related proteins (e.g. transposases, integrases, env, reverse transcriptase). We recommend working with a curated annotation file from which these genes have already been excluded, so the default is FALSE. When TRUE, the search is performed against a database obtained from UniProt, so the subject sequence IDs in the annotation file should follow the UniProt format (e.g. "CO1A2_MOUSE", "sp|Q01149|CO1A2_MOUSE" or "tr|H9GLU4|H9GLU4_ANOCA").

featureSum

logical; if TRUE, the function returns a summary of the protein-coding transcripts that contain repeats (default: FALSE). Three files are created in the output directory: features.summary.csv, with the transcripts and their characteristics; features.summary.pdf, with a bar plot of the number of repeats of each TE family found in the 5' UTR, 3' UTR and CDS regions; and the clean RepeatMasker file RM.clean.out.

outdir

Output directory
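
The cleanTEsProt argument expects UniProt-style subject sequence IDs, so it can help to check the annotation file before setting it to TRUE. The base R sketch below makes two assumptions: the annotation file uses the standard 12-column BLAST tabular layout (outfmt 6) with the subject ID in the second column, and the regular expression is only an approximation built from the three example IDs above, not the check the package itself performs. The toy rows are made-up values that stand in for a real file.

```r
# Column names of the standard 12-column BLAST tabular output (outfmt 6).
outfmt6_cols <- c("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Toy annotation rows (made-up values) standing in for a real file.
toy <- paste(
  "tx1\tsp|Q01149|CO1A2_MOUSE\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-100\t380",
  "tx2\tCO1A2_MOUSE\t97.0\t150\t4\t0\t1\t150\t5\t154\t1e-80\t300",
  "tx3\tXP_012345.1\t90.0\t120\t12\t0\t1\t120\t1\t120\t1e-50\t200",
  sep = "\n"
)
anot <- read.delim(text = toy, header = FALSE, col.names = outfmt6_cols,
                   stringsAsFactors = FALSE)

# Approximate pattern: optional "sp|ACCESSION|" or "tr|ACCESSION|" prefix,
# followed by an entry name of the form NAME_SPECIES.
uniprot_pattern <- "^((sp|tr)\\|[A-Z0-9]+\\|)?[A-Z0-9]+_[A-Z0-9]+$"
is_uniprot_id <- function(ids) grepl(uniprot_pattern, ids)

# Flag each row and collect the subject IDs that do not match.
anot$uniprot_like <- is_uniprot_id(anot$sseqid)
bad_ids <- anot$sseqid[!anot$uniprot_like]
print(bad_ids)
```

For a real file, the text argument would be replaced by the file path, for example read.delim("annotation.outfmt6", header = FALSE, col.names = outfmt6_cols), where the file name is again a placeholder.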


FemeniasM/ExplorATEproject documentation built on Nov. 30, 2022, 5:26 p.m.