meta.summarize: Summarize (concatenate) all predictions of a 'LTRpred.meta'...
In HajkD/LTRpred: De novo functional annotation of retrotransposons

meta.summarize

R Documentation

Summarize (concatenate) all predictions of a `LTRpred.meta` run

Description

Crawl through all genome predictions generated by LTRpred.meta and concatenate the per-species prediction files stored in the meta result folder into a single meta-species data.frame.

Usage

meta.summarize(
  result.folder,
  ltr.similarity = 70,
  quality.filter = TRUE,
  n.orfs = 0,
  strategy = "default"
)

Arguments

`result.folder`	path to meta result folder generated by `LTRpred.meta`.
`ltr.similarity`	only count elements that have an LTR similarity >= this threshold.
`quality.filter`	apply quality filtering to remove potential false positives (e.g. duplicated genes). See `Details` for further information on the filter criteria.
`n.orfs`	minimum number of Open Reading Frames that must be found between the LTRs (if `quality.filter = TRUE`). See `Details` for further information on quality control.
`strategy`	quality filter strategy. Options are: `strategy = "default"`: apply the filter criteria described in section `Quality Filtering`; `strategy = "stringent"`: in addition to the filter criteria described in section `Quality Filtering`, apply the criterion `(!is.na(protein_domain)) | (dfam_target_name != "unknown")`.
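
As a rough illustration of the additional `stringent` criterion, the sketch below evaluates the expression on a toy data.frame. The rows are invented, the column names `protein_domain` and `dfam_target_name` are taken from the expression in the `strategy` argument, and encoding a missing protein domain as `NA` is an assumption rather than a confirmed detail of the LTRpred output.

```r
# Toy predictions (invented values, not real LTRpred output).
pred <- data.frame(
  ID               = c("te_1", "te_2", "te_3"),
  protein_domain   = c("RVT_1", NA, NA),
  dfam_target_name = c("unknown", "Copia-1", "unknown"),
  stringsAsFactors = FALSE
)

# Extra criterion of strategy = "stringent": keep an element if it has
# a protein domain annotation OR a Dfam target other than "unknown".
keep <- (!is.na(pred$protein_domain)) | (pred$dfam_target_name != "unknown")
stringent <- pred[keep, , drop = FALSE]

print(stringent$ID)
```

In this toy input, `te_3` has neither a protein domain nor a known Dfam target, so it is the only element dropped.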

Details

This function crawls through each genome stored in the meta result folder generated by LTRpred.meta and performs the following procedures:

Step 1: For each genome: Read the *._LTRpred_DataSheet.csv file generated by LTRpred.
Step 2: For each genome: Perform quality filtering and selection of elements having at least ltr.similarity sequence similarity between their LTRs (if quality.filter = TRUE). Otherwise no quality filtering is performed.
Step 3: Summarize all genome predictions in the meta-folder to one meta-species data.frame.
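
The three steps above can be sketched in base R as follows. This is a minimal illustration and not the LTRpred implementation: the helper `summarize_sketch()`, the one-subfolder-per-species layout, the comma separator, and the column name `ltr_similarity` are all assumptions made for the example, and only the similarity part of the quality filter is shown.

```r
# Hypothetical helper sketching Steps 1-3; NOT the LTRpred implementation.
summarize_sketch <- function(result.folder, ltr.similarity = 70,
                             quality.filter = TRUE) {
  # Step 1: locate one data sheet per genome (file name pattern assumed).
  sheets <- list.files(result.folder, pattern = "_LTRpred_DataSheet\\.csv$",
                       recursive = TRUE, full.names = TRUE)
  per_genome <- lapply(sheets, function(f) {
    df <- read.csv(f, stringsAsFactors = FALSE)  # separator assumed: comma
    df$species <- basename(dirname(f))           # subfolder name as species label
    # Step 2: optional filtering (only the LTR similarity threshold is shown).
    if (quality.filter) {
      df <- df[which(df$ltr_similarity >= ltr.similarity), , drop = FALSE]
    }
    df
  })
  # Step 3: concatenate all genomes into one data.frame.
  do.call(rbind, per_genome)
}

# Build a throwaway folder with two invented genomes to exercise the sketch.
root <- file.path(tempdir(), "meta_sketch")
for (sp in c("species_a", "species_b")) {
  dir.create(file.path(root, sp), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(ID = paste0(sp, "_", 1:2), ltr_similarity = c(65, 90)),
            file.path(root, sp, paste0(sp, "_LTRpred_DataSheet.csv")),
            row.names = FALSE)
}

meta <- summarize_sketch(root, ltr.similarity = 70)
print(meta)
```

Each invented genome contributes one element above the threshold, so the combined data.frame holds one row per species.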

Quality Filtering

The aim of the quality filtering step is to reduce the number of potential false positives among the LTR transposons predicted by LTRpred. These false positives can be duplicated genes or other homologous repetitive elements that fulfill the LTR similarity criterion but lack features such as a Primer Binding Site, Open Reading Frames, or Gag and Pol proteins. To reduce the number of false positives, the following filters are applied:

ltr.similarity: minimum similarity between LTRs. All TEs not matching this criterion are discarded.
n.orfs: minimum number of Open Reading Frames that must be found between the LTRs. All TEs not matching this criterion are discarded.
PBS or Protein Match: elements must have either a predicted Primer Binding Site or a match to at least one protein (Gag, Pol, Rve, ...) between their LTRs. All TEs not matching this criterion are discarded.
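
The three filters listed above could be combined roughly as in the following sketch. The helper `quality_filter_sketch()` and the column names `ltr_similarity`, `orfs`, `pbs_start`, and `protein_domain` are hypothetical and chosen for illustration only; the actual column names in the LTRpred data sheet may differ.

```r
# Hypothetical filter sketch; column names are assumptions, not LTRpred's schema.
quality_filter_sketch <- function(pred, ltr.similarity = 70, n.orfs = 0) {
  has_pbs     <- !is.na(pred$pbs_start)       # a Primer Binding Site was predicted
  has_protein <- !is.na(pred$protein_domain)  # at least one protein match
  keep <- pred$ltr_similarity >= ltr.similarity &  # filter 1: LTR similarity
          pred$orfs >= n.orfs &                    # filter 2: minimum number of ORFs
          (has_pbs | has_protein)                  # filter 3: PBS or protein match
  pred[which(keep), , drop = FALSE]
}

# Toy predictions (invented values, not real LTRpred output).
pred <- data.frame(
  ID             = c("te_1", "te_2", "te_3", "te_4"),
  ltr_similarity = c(95, 60, 88, 91),
  orfs           = c(2, 3, 0, 1),
  pbs_start      = c(120, 80, NA, NA),
  protein_domain = c(NA, "RVT_1", NA, "rve"),
  stringsAsFactors = FALSE
)

filtered <- quality_filter_sketch(pred, ltr.similarity = 70, n.orfs = 1)
print(filtered$ID)
```

Here `te_2` fails the similarity threshold and `te_3` has too few ORFs and neither a PBS nor a protein match, so only `te_1` and `te_4` remain.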