R/pred2orfseqs.R

Defines functions pred2orfseqs

Documented in pred2orfseqs

#' @title Save the ORF sequences of the predicted LTR retrotransposons in a fasta file
#' @description This function saves the open reading frame (ORF) sequences of the predicted LTR retrotransposons in a fasta file.
#' @param LTRpred.tbl the \code{\link{data.frame}} generated by \code{\link{LTRpred}}.
#' @param orf.seq.file the fasta file storing the open reading frame sequences of predicted retroelements,
#' as returned by the \code{\link{LTRpred}} function.
#' @param output the fasta file in which the output sequences shall be stored.
#' @author Hajk-Georg Drost
#' @details 
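#' The output \code{data.frame}s returned by \code{\link{LTRpred}} contain all information about the predicted
#' LTR retrotransposons that can be used for post-filtering steps. After these post-filtering steps, the ORF
#' sequences of the remaining (filtered) candidates can be retrieved with this function. Candidates are matched
#' via the \code{orf.id} column of \code{LTRpred.tbl} against the fasta headers in \code{orf.seq.file}
#' (only the part of each header before the first \code{|} is used).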
#' @seealso \code{\link{LTRharvest}}, \code{\link{LTRdigest}}, \code{\link{LTRpred}}, \code{\link{read.prediction}} 
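#' @examples
#' \dontrun{
#' # A minimal usage sketch, not taken from the package itself.
#' # Assumptions: `pred` is a data.frame previously returned by LTRpred(), and
#' # "orfs.fa" is a placeholder path to the ORF fasta file of that same run.
#' # Post-filter the predictions (here: keep only rows that carry an ORF ID) ...
#' filtered <- pred[!is.na(pred$orf.id), ]
#' # ... then write the ORF sequences of the remaining candidates to a new fasta file.
#' pred2orfseqs(filtered, orf.seq.file = "orfs.fa", output = "filtered_orfs.fa")
#' }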
#' @export 

pred2orfseqs <- function(LTRpred.tbl, orf.seq.file, output = "output.fa"){
  
  if (!file.exists(orf.seq.file))
    stop("The file '", orf.seq.file, "' does not seem to exist. Please provide a valid file path to a fasta file storing the ORF sequences of all predicted retrotransposons.", call. = FALSE)
  
  PutativeLTRSeqs <- Biostrings::readDNAStringSet(orf.seq.file)
  
  # Keep only the part of each fasta header before the first "|" as the sequence ID
  names(PutativeLTRSeqs) <- vapply(stringr::str_split(names(PutativeLTRSeqs), "[|]"), function(x) x[1], character(1))
  
  # Positions of the sequences whose ID occurs in the (filtered) prediction table
  selection_subset <- stats::na.omit(match(LTRpred.tbl$orf.id, names(PutativeLTRSeqs)))
  if (length(selection_subset) == 0)
    stop("No matching TEs were found in the input fasta file. Please check whether the names match ... ", " Example: LTRpred.tbl$orf.id[1] = ", LTRpred.tbl$orf.id[1], " and names(orf.seq.file)[1] = ", names(PutativeLTRSeqs)[1], call. = FALSE)
 
  Biostrings::writeXStringSet(PutativeLTRSeqs[selection_subset], output)
  
}
HajkD/LTRpred documentation built on April 22, 2022, 4:35 p.m.