meta.summarize: Summarize (concatenate) all predictions of a 'LTRpred.meta'...

View source: R/meta.summarize.R

meta.summarize R Documentation

Summarize (concatenate) all predictions of a LTRpred.meta run

Description

Crawls through all genome predictions generated by LTRpred.meta and concatenates the per-species prediction files found in the meta result folder into a single meta-species data.frame.

Usage

meta.summarize(
  result.folder,
  ltr.similarity = 70,
  quality.filter = TRUE,
  n.orfs = 0,
  strategy = "default"
)

Arguments

result.folder

path to meta result folder generated by LTRpred.meta.

ltr.similarity

only count elements that have an LTR similarity >= this threshold.

quality.filter

apply quality filtering to remove potential false positives (e.g. duplicated genes). See Details for further information on the filter criteria.

n.orfs

minimum number of Open Reading Frames that must be found between the LTRs (if quality.filter = TRUE). See Details for further information on quality control.

strategy

quality filter strategy. Options are

  • strategy = "default" : see section Quality Filtering

  • strategy = "stringent" : in addition to the filter criteria specified in section Quality Filtering, the filter criterion (!is.na(protein_domain)) | (dfam_target_name != "unknown") is applied

Details

This function crawls through each genome stored in the meta result folder generated by LTRpred.meta and performs the following procedures:

  • Step 1: For each genome: Read the *._LTRpred_DataSheet.csv file generated by LTRpred.

  • Step 2: For each genome: If quality.filter = TRUE, perform quality filtering and keep only elements having at least ltr.similarity sequence similarity between their LTRs. Otherwise no quality filtering is performed.

  • Step 3: Summarize all genome predictions in the meta-folder to one meta-species data.frame.
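
The three steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the folder layout (one sub-folder per genome), the comma-separated file format, the species column, and the helper name summarize_sketch are all hypothetical, and Step 2 is reduced to the LTR similarity threshold.

```r
# Build a toy "meta result folder" in a temp directory (layout is an assumption).
meta_dir <- file.path(tempdir(), "meta_results_demo")
unlink(meta_dir, recursive = TRUE)
genomes <- list(
  Athaliana = data.frame(ID = c("a1", "a2"),
                         ltr_similarity = c(95.2, 64.0)),
  Alyrata   = data.frame(ID = c("l1", "l2", "l3"),
                         ltr_similarity = c(88.1, 71.5, 99.0))
)
for (g in names(genomes)) {
  dir.create(file.path(meta_dir, g), recursive = TRUE)
  write.csv(genomes[[g]],
            file.path(meta_dir, g, paste0(g, "_LTRpred_DataSheet.csv")),
            row.names = FALSE)
}

# Hypothetical helper mirroring Steps 1 to 3.
summarize_sketch <- function(result_folder, ltr_similarity = 70,
                             quality_filter = TRUE) {
  sheets <- list.files(result_folder, pattern = "_LTRpred_DataSheet\\.csv$",
                       recursive = TRUE, full.names = TRUE)
  per_genome <- lapply(sheets, function(path) {
    # Step 1: read the data sheet of one genome.
    pred <- read.csv(path, stringsAsFactors = FALSE)
    # Step 2: keep elements at or above the similarity threshold (only if requested).
    if (quality_filter) {
      keep <- !is.na(pred$ltr_similarity) & pred$ltr_similarity >= ltr_similarity
      pred <- pred[keep, , drop = FALSE]
    }
    # Record which genome folder each row came from.
    pred$species <- rep(basename(dirname(path)), nrow(pred))
    pred
  })
  # Step 3: concatenate all genomes into one data.frame.
  do.call(rbind, per_genome)
}

meta_df <- summarize_sketch(meta_dir, ltr_similarity = 70)
print(meta_df)
```

Calling summarize_sketch(meta_dir, quality_filter = FALSE) skips Step 2 and returns every row from every data sheet.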

Quality Filtering

The aim of the quality filtering step is to reduce the number of potential false positive LTR transposons predicted by LTRpred. These false positives can be duplicated genes or other homologous repetitive elements that fulfill the LTR similarity criterion but lack features such as a Primer Binding Site, Open Reading Frames, or Gag and Pol proteins. To reduce the number of false positives, the following filters are applied.

  • ltr.similarity: minimum similarity between LTRs. All TEs not matching this criterion are discarded.

  • n.orfs: minimum number of Open Reading Frames that must be found between the LTRs. All TEs not matching this criterion are discarded.

  • PBS or Protein Match: elements must have either a predicted Primer Binding Site or a match to at least one protein (Gag, Pol, Rve, ...) between their LTRs. All TEs not matching this criterion are discarded.
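
The filters listed above can be read as one logical mask over the prediction table. The sketch below illustrates that reading on toy data; it is not the package's code. The column names ltr_similarity, orfs and PBS_start, and the helper quality_filter_sketch, are assumptions made for this example; only protein_domain and dfam_target_name appear in the documentation above.

```r
# Toy predictions (values invented for illustration only).
pred <- data.frame(
  ID               = c("te1", "te2", "te3", "te4", "te5"),
  ltr_similarity   = c(98, 65, 85, 90, 92),
  orfs             = c(2, 3, 0, 1, 1),
  PBS_start        = c(120, 88, 40, NA, NA),
  protein_domain   = c("RVT_1", "rve", NA, NA, "Gag"),
  dfam_target_name = c("Copia-1", "unknown", "unknown", "unknown", "unknown"),
  stringsAsFactors = FALSE
)

# Hypothetical helper combining the three default filters, plus the
# additional criterion of strategy = "stringent".
quality_filter_sketch <- function(pred, ltr_similarity = 70, n_orfs = 0,
                                  strategy = c("default", "stringent")) {
  strategy    <- match.arg(strategy)
  has_pbs     <- !is.na(pred$PBS_start)       # assumed: NA means no PBS predicted
  has_protein <- !is.na(pred$protein_domain)  # NA means no protein match
  keep <- pred$ltr_similarity >= ltr_similarity &
    pred$orfs >= n_orfs &
    (has_pbs | has_protein)
  if (strategy == "stringent") {
    keep <- keep & (has_protein | pred$dfam_target_name != "unknown")
  }
  # which() drops rows where the mask evaluates to NA.
  pred[which(keep), , drop = FALSE]
}

default_set   <- quality_filter_sketch(pred)
stringent_set <- quality_filter_sketch(pred, strategy = "stringent")
print(default_set$ID)
print(stringent_set$ID)
```

In this toy table, te3 passes the default filters through its predicted PBS alone, so it is the element that the stringent strategy additionally removes.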

Value

An LTRpred.tbl storing the LTRpred prediction data.frames for all species in the meta result folder generated by LTRpred.meta.

Author(s)

Hajk-Georg Drost
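
Examples

A minimal sketch of a typical call, using only the arguments listed under Usage. The folder path is a hypothetical placeholder, and the call is guarded so that the snippet does nothing harmful when the LTRpred package or the folder is not available. It assumes meta.summarize is exported from the LTRpred namespace.

```r
# Hypothetical path; replace with the folder written by LTRpred.meta.
result_folder <- file.path("path", "to", "meta_results")

if (requireNamespace("LTRpred", quietly = TRUE) && dir.exists(result_folder)) {
  meta_tbl <- LTRpred::meta.summarize(
    result.folder  = result_folder,
    ltr.similarity = 80,          # keep elements with LTR similarity >= 80
    quality.filter = TRUE,        # apply the filters described in Details
    n.orfs         = 1,           # require at least one ORF between the LTRs
    strategy       = "stringent"  # add the protein domain / Dfam criterion
  )
  str(meta_tbl)
} else {
  message("LTRpred or the result folder is not available; skipping the call.")
}
```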


HajkD/LTRpred documentation built on April 22, 2022, 4:35 p.m.