View source: R/rm.cotransRep.r
rm.cotransRep: R Documentation
Excludes repeats that are co-transcribed with gene coding sequences and returns a clean RepeatMasker file for downstream analysis. If featureSum is TRUE, it also writes a table of the repeats that overlap the 5'UTR, 3'UTR, or CDS regions.
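The overlap test behind this description can be illustrated with a short base R sketch. It assumes that repeat and feature coordinates have already been parsed into data frames with a transcript ID and 1-based, inclusive start/end positions. The object names, column names and toy values are hypothetical. The rule shown (drop a repeat only if it overlaps an annotated 5'UTR, CDS or 3'UTR) is an illustration of the idea, not the package's implementation, and strand is ignored here.

```r
# Hypothetical toy inputs: transcript coordinates, 1-based and inclusive.
repeats <- data.frame(
  transcript = c("tx1", "tx1", "tx2", "tx3"),
  start      = c(10, 400, 50, 5),
  end        = c(60, 450, 90, 30),
  family     = c("LINE/L1", "SINE/Alu", "DNA/hAT", "LTR/ERV1"),
  stringsAsFactors = FALSE
)
features <- data.frame(
  transcript = c("tx1", "tx1", "tx1", "tx2"),
  start      = c(1, 101, 301, 1),
  end        = c(100, 300, 500, 40),
  type       = c("five_prime_UTR", "CDS", "three_prime_UTR", "CDS"),
  stringsAsFactors = FALSE
)

# Closed intervals [s1, e1] and [s2, e2] overlap when s1 <= e2 and s2 <= e1.
overlaps <- function(s1, e1, s2, e2) s1 <= e2 & s2 <= e1

# Pair every repeat with every feature on the same transcript,
# then keep only the pairs whose intervals overlap.
hits <- merge(repeats, features, by = "transcript", suffixes = c(".rep", ".feat"))
hits <- hits[overlaps(hits$start.rep, hits$end.rep, hits$start.feat, hits$end.feat), ]

# In this sketch, a repeat with at least one hit counts as co-transcribed and is dropped.
key     <- paste(repeats$transcript, repeats$start, repeats$end)
hit_key <- paste(hits$transcript, hits$start.rep, hits$end.rep)
clean   <- repeats[!key %in% hit_key, ]

# Counts of overlapping repeats per TE family and feature type,
# similar in spirit to the featureSum summary.
summary_tab <- table(hits$family, hits$type)
```

Here the L1 repeat falls in the 5'UTR of tx1 and the Alu repeat in its 3'UTR, so only the repeats on tx2 and tx3 remain in `clean`.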
```r
rm.cotransRep(RepMask, anot, gff3, stranded = T, cleanTEsProt = F, featureSum = F, outdir)
```
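A possible call, written as a sketch. All file and directory names below are hypothetical placeholders. The call only runs if `rm.cotransRep` has already been loaded into the session. Creating `outdir` beforehand is a precaution based on an assumption, since the page does not say whether the function creates it.

```r
# Hypothetical input/output paths; replace them with your own files.
args <- list(
  RepMask      = "transcriptome.fasta.out", # RepeatMasker output file
  anot         = "annotation.outfmt6",      # annotation file in outfmt6 format
  gff3         = "transcriptome.gff3",      # GFF3 file
  stranded     = TRUE,                      # library is strand-specific
  cleanTEsProt = FALSE,                     # annotation already curated
  featureSum   = TRUE,                      # also write the features.summary.* files
  outdir       = "rm_cotrans_out"           # output directory
)

# Run only when the function is available in the current session.
if (exists("rm.cotransRep", mode = "function")) {
  # Assumption: the output directory may need to exist beforehand.
  dir.create(args$outdir, showWarnings = FALSE)
  do.call(rm.cotransRep, args)
}
```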
| Argument | Description |
|---|---|
| `RepMask` | RepeatMasker output file. |
| `anot` | Annotation file in outfmt6 (BLAST tabular) format. |
| `gff3` | GFF3 file. |
| `stranded` | Logical; whether the library is strand-specific. |
| `cleanTEsProt` | Logical; whether to search for TE-related proteins (e.g. transposases, integrases, env, reverse transcriptases). We recommend working with a curated annotation file from which these genes have already been excluded, so the default is `F`. When `T`, the search is performed against a database obtained from UniProt, so the subject sequence IDs in the annotation file should follow UniProt formats such as `CO1A2_MOUSE`, `sp\|Q01149\|CO1A2_MOUSE` or `tr\|H9GLU4\|H9GLU4_ANOCA`. |
| `featureSum` | Logical; if `T`, the function returns a summary of the protein-coding transcripts that contain repeats. Three files are written to the output directory: features.summary.csv, listing the transcripts and their characteristics; features.summary.pdf, a bar plot of the number of repeats of each TE family found in the 5'UTR, 3'UTR and CDS regions; and RM.clean.out, the cleaned RepeatMasker file. |
| `outdir` | Output directory. |
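For `cleanTEsProt`, the subject IDs must be recognizable as UniProt entries. The sketch below shows one way to read an outfmt6 file in base R and reduce the three accepted ID styles to a bare entry name. It assumes the 12 default columns of BLAST `-outfmt 6`. The toy lines, the query IDs and the helper `entry_name()` are hypothetical, and the package's own parsing is not reproduced here.

```r
# The 12 default columns of BLAST tabular output (-outfmt 6).
outfmt6_cols <- c("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Toy annotation lines standing in for a real outfmt6 file.
txt <- c(
  "tx1\tsp|Q01149|CO1A2_MOUSE\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t550",
  "tx2\ttr|H9GLU4|H9GLU4_ANOCA\t75.0\t200\t50\t0\t1\t200\t5\t204\t1e-60\t230",
  "tx3\tCO1A2_MOUSE\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t180"
)
anot <- read.table(text = txt, sep = "\t", col.names = outfmt6_cols,
                   stringsAsFactors = FALSE)

# Hypothetical helper: strip an "sp|ACC|" or "tr|ACC|" prefix, leaving the
# entry name; bare entry names such as "CO1A2_MOUSE" pass through unchanged.
entry_name <- function(id) sub("^(sp|tr)\\|[^|]+\\|", "", id)

anot$entry <- entry_name(anot$sseqid)
```

With the IDs normalized this way, each hit can be matched against a list of TE-related UniProt entries regardless of which of the three styles the annotation uses.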