repbase.query: Query the RepBase to annotate putative LTRs

repbase.query    R Documentation

Description

Validate or annotate putative LTR transposons that have been predicted using LTRharvest or LTRdigest.

Usage

repbase.query(
  seq.file,
  repbase.path,
  output = "RepbaseOutput.txt",
  max.hits = 5000,
  eval = 1e-30,
  cores = 1
)

Arguments

seq.file

file path to the putative LTR transposon sequences in fasta format.

repbase.path

file path to the RepBase file in fasta format.

output

file name of the BLAST output. Default is output = "RepbaseOutput.txt".

max.hits

maximum number of hits to retrieve that still satisfy the e-value criterion. Default is max.hits = 5000.

eval

e-value threshold for BLAST hit detection. Default is eval = 1E-30.

cores

number of cores to use for parallel computation. Default is cores = 1.

Details

The RepBase database provides a collection of curated transposable element annotations.

This function allows users to validate or annotate putative LTR transposons predicted by LTRharvest or LTRdigest by running BLAST searches of the predicted sequences against transposons that are already annotated in other species (e.g. Arabidopsis thaliana).

Internally, this function performs a blastn search of the putative LTR transposons predicted by LTRharvest or LTRdigest against the RepBase fasta file specified by the user.
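
As a rough illustration of what such a search involves, the sketch below maps the arguments of repbase.query() onto standard blastn command line options. The helper blastn_args() and the database name "repbase_db" are hypothetical and not part of LTRpred, and the options the package actually passes to BLAST+ may differ.

```r
# Hypothetical helper (not part of LTRpred): assemble blastn arguments
# from values that mirror the repbase.query() defaults.
blastn_args <- function(seq.file, db, output = "RepbaseOutput.txt",
                        max.hits = 5000, eval = 1e-30, cores = 1) {
  c("-query", seq.file,                          # putative LTR transposons (fasta)
    "-db", db,                                   # BLAST database built from the RepBase fasta
    "-out", output,                              # file that receives the BLAST hits
    "-evalue", format(eval, scientific = TRUE),  # e-value threshold
    "-max_target_seqs", as.character(max.hits),  # upper bound on retrieved hits
    "-num_threads", as.character(cores),         # parallel threads
    "-outfmt", "6")                              # tabular output (assumed format)
}

args <- blastn_args("path/to/LTRtransposonSeqs.fasta", "repbase_db", cores = 4)
print(args)

# With BLAST+ installed, the search could then be started like this
# (left commented out so the sketch runs without BLAST+):
# system2("makeblastdb", c("-in", "path/to/Athaliana_repbase.ref",
#                          "-dbtype", "nucl", "-out", "repbase_db"))
# system2("blastn", args)
```

Note that system2() pastes its arguments into a single command line, so paths containing spaces would need to be wrapped in shQuote().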

This requires a working installation of BLAST+ on the user's machine.
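
Before calling the function, it may help to confirm that the BLAST+ executables can be found from within R. The following is a minimal sketch using base R only; the assumption that both blastn and makeblastdb are needed is ours and is not stated by the package.

```r
# Check whether the (assumed) required BLAST+ executables are on the PATH.
blast_tools <- c("blastn", "makeblastdb")

# Sys.which() returns "" for executables that cannot be located.
found <- nzchar(Sys.which(blast_tools))
names(found) <- blast_tools

if (all(found)) {
  message("BLAST+ executables found.")
} else {
  message("Missing BLAST+ tools: ",
          paste(blast_tools[!found], collapse = ", "))
}
```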

Author(s)

Hajk-Georg Drost

References

http://www.girinst.org/repbase/

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.

Gish, W. & States, D.J. (1993) "Identification of protein coding regions by database similarity search." Nature Genet. 3:266-272.

Madden, T.L., Tatusov, R.L. & Zhang, J. (1996) "Applications of network BLAST server." Meth. Enzymol. 266:131-141.

Altschul, S.F., Madden, T.L., Schaeffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402.

Zhang, Z., Schwartz, S., Wagner, L. & Miller, W. (2000) "A greedy algorithm for aligning DNA sequences." J. Comput. Biol. 7(1-2):203-214.

Examples

## Not run: 
# Example annotation run against the A. thaliana RepBase using 4 cores
q <- repbase.query(seq.file     = "path/to/LTRtransposonSeqs.fasta",
                   repbase.path = "path/to/Athaliana_repbase.ref",
                   cores        = 4)

# For each query, keep the hit(s) with the highest bit score
Annot <- dplyr::select(dplyr::filter(dplyr::group_by(q, query_id),
                                     bit_score == max(bit_score)),
                       query_id:q_len, evalue, bit_score, scope)
# Select only hits with a scope >= 0.1
Annot.HighMatches <- dplyr::filter(Annot, scope >= 0.1)
# Plot how many distinct matched subject IDs carry each annotation label
# (the second "_"-separated field of subject_id), in decreasing order
barplot(sort(table(unlist(lapply(stringr::str_split(
        names(table(Annot.HighMatches$subject_id)), "_"),
        function(x) x[2]))), decreasing = TRUE))

## End(Not run)
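
The filtering steps in the example above can be tried without BLAST+ or RepBase on a small mock table. The sketch below uses base R only; every value in it, including the layout of the subject_id strings, is invented for illustration and does not come from a real repbase.query() run.

```r
# Mock BLAST-like hit table; all values are invented for illustration.
hits <- data.frame(
  query_id   = c("ltr1", "ltr1", "ltr2", "ltr3"),
  subject_id = c("ATCOPIA1_Copia_At", "ATGP1_Gypsy_At",
                 "ATGP2_Gypsy_At", "ATCOPIA2_Copia_At"),
  bit_score  = c(250, 900, 400, 120),
  scope      = c(0.30, 0.85, 0.05, 0.60),
  stringsAsFactors = FALSE
)

# Keep the hit(s) with the maximum bit score within each query_id.
is_best <- hits$bit_score == ave(hits$bit_score, hits$query_id, FUN = max)
best <- hits[is_best, ]

# Keep only best hits with a scope >= 0.1.
high <- best[best$scope >= 0.1, ]

# Take the second "_"-separated field of each distinct subject ID
# and count how often each label occurs.
labels <- vapply(strsplit(unique(high$subject_id), "_", fixed = TRUE),
                 function(x) x[2], character(1))
label_counts <- sort(table(labels), decreasing = TRUE)
print(label_counts)
```

Using ave() keeps ties: if two hits of the same query share the top bit score, both are retained, which matches the behaviour of the bit_score == max(bit_score) filter in the example.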

HajkD/LTRpred documentation built on April 22, 2022, 4:35 p.m.