ragp: Mining for Hydroxyproline rich glycoprotein sequences

Documented in get_signalp.AAStringSet get_signalp.character get_signalp.data.frame get_signalp.default get_signalp.list

#' Query SignalP web server.
#'
#' SignalP 4.1 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.
#'
#' @aliases get_signalp get_signalp.default get_signalp.character get_signalp.data.frame get_signalp.list get_signalp.AAStringSet
#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding ids in another. Alternatively, a path to a .fasta file with protein sequences, a list with elements of class \code{\link[seqinr]{SeqFastaAA}} resulting from a \code{\link[seqinr]{read.fasta}} call, or an \code{\link[Biostrings]{AAStringSet}} object. Should be left blank if vectors are provided to the sequence and id arguments.
#' @param sequence A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param id A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param org_type One of c("euk", "gram-", "gram+"), defaults to "euk". Which model should be used for prediction.
#' @param Dcut_type One of c("default", "sensitive", "user"), defaults to "default". The default cutoff values for SignalP 4 are chosen to optimize the performance measured as Matthews Correlation Coefficient (MCC). This results in a lower sensitivity (true positive rate) than SignalP 3.0 had. Setting this argument to "sensitive" will yield the same sensitivity as SignalP 3.0. This will make the false positive rate slightly higher, but still better than that of SignalP 3.0.
#' @param Dcut_noTM A numeric value, with range 0 - 1, defaults to 0.45. For experimenting with cutoff values.
#' @param Dcut_TM A numeric value, with range 0 - 1, defaults to 0.5. For experimenting with cutoff values.
#' @param method One of c("best", "notm"), defaults to "best". SignalP 4.1 contains two types of neural networks. SignalP-TM has been trained with sequences containing transmembrane segments in the data set, while SignalP-noTM has been trained without those sequences. By default, SignalP 4.1 uses SignalP-TM as a preprocessor to determine whether to use SignalP-TM or SignalP-noTM in the final prediction (if 4 or more positions are predicted to be in a transmembrane state, SignalP-TM is used, otherwise SignalP-noTM). An exception is Gram-positive bacteria, where SignalP-TM is always used. If you are confident that there are no transmembrane segments in your data, you can get slightly better performance by setting this argument to "notm" (the "Input sequences do not include TM regions" option of the web server), which tells SignalP 4.1 to always use SignalP-noTM.
#' @param minlen An integer value corresponding to the minimal predicted signal peptide length, at default set to 10. SignalP 4.0 could, in rare cases, erroneously predict signal peptides shorter than 10 residues. These errors have in SignalP 4.1 been eliminated by imposing a lower limit on the cleavage site position (signal peptide length). The minimum length is by default 10, but you can adjust it. Signal peptides shorter than 15 residues are very rare. If you want to disable this length restriction completely, enter 0 (zero).
#' @param trunc An integer value corresponding to the N-terminal truncation of input sequence, at default set to 70. By default, the predictor truncates each sequence to max. 70 residues before submitting it to the neural networks. If you want to predict extremely long signal peptides, you can try a higher value, or disable truncation completely by entering 0 (zero).
#' @param splitter An integer indicating the number of sequences to be in each .fasta file that is to be sent to the server. Default is 1000. Change only in case of a server side error. Accepted values are in range of 1 to 2000.
#' @param attempts Integer, number of attempts if server unresponsive, at default set to 2.
#' @param progress Boolean, whether to show the progress bar, at default set to FALSE.
#' @param ... currently no additional arguments are accepted apart from the ones documented above.
#'
#' @return  A data frame with columns:
#' \describe{
#'   \item{id}{Character, as from input}
#'   \item{Cmax}{Numeric, C-score (raw cleavage site score). The output from the CS networks, which are trained to distinguish signal peptide cleavage sites from everything else. Note the position numbering of the cleavage site: the C-score is trained to be high at the position immediately after the cleavage site (the first residue in the mature protein).}
#'   \item{Cmax.pos}{Integer, position of Cmax. position immediately after the cleavage site (the first residue in the mature protein).}
#'   \item{Ymax}{Numeric, Y-score (combined cleavage site score), A combination (geometric average) of the C-score and the slope of the S-score, resulting in a better cleavage site prediction than the raw C-score alone. This is due to the fact that multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The Y-score distinguishes between C-score peaks by choosing the one where the slope of the S-score is steep.}
#'   \item{Ymax.pos}{Integer, position of Ymax}
#'   \item{Smax}{Numeric, S-score (signal peptide score). The output from the SP networks, which are trained to distinguish positions within signal peptides from positions in the mature part of the proteins and from proteins without signal peptides.}
#'   \item{Smax.pos}{Integer, position of Smax}
#'   \item{Smean}{Numeric, The average S-score of the possible signal peptide (from position 1 to the position immediately before the maximal Y-score)}
#'   \item{Dmean}{Numeric, D-score (discrimination score). A weighted average of the mean S and the max. Y scores. This is the score that is used to discriminate signal peptides from non-signal peptides.}
#'   \item{is.sp}{Character, "Y" if the sequence is predicted to contain an N-terminal signal peptide (N-sp), "N" otherwise}
#'   \item{Dmaxcut}{Numeric, as from input, Dcut_noTM if SignalP-noTM network used and Dcut_TM if SignalP-TM network used}
#'   \item{Networks.used}{Character, which network was used for the prediction: SignalP-noTM or SignalP-TM}
#'   \item{is.signalp}{Logical, did SignalP predict the presence of a signal peptide}
#'   \item{sp.length}{Integer, length of the predicted signal peptide.}
#'   }
#'
#' @note This function creates temporary files in the working directory.
#'
#' @source \url{https://services.healthtech.dtu.dk/service.php?SignalP-4.1}
#' @references Petersen TN, Brunak S, von Heijne G, Nielsen H. (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods 8: 785-786
#'
#' @seealso \code{\link[ragp]{get_signalp5}} \code{\link[ragp]{get_phobius}} \code{\link[ragp]{get_targetp}}
#'
#' @examples
#' library(ragp)
#' signalp_pred <- get_signalp(data = at_nsp[1:10,],
#'                             sequence,
#'                             Transcript.id)
#' signalp_pred
#'
#' @import seqinr
#' @import httr
#' @import xml2
#' @export 

get_signalp <- function (data, ...){
  if (missing(data) || is.null(data)) get_signalp.default(...)
  else UseMethod("get_signalp")
}
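
# The generic above routes the call to get_signalp.default() whenever data is
# missing or NULL, and otherwise dispatches on the class of data. The toy
# generic below is a minimal sketch of that pattern only; describe_input() and
# its methods are hypothetical names used for illustration and are not part of
# ragp.

```r
# Hypothetical toy generic mirroring the dispatch logic of get_signalp()
describe_input <- function(data, ...) {
  if (missing(data) || is.null(data)) describe_input.default(...)
  else UseMethod("describe_input")
}

# One method per supported input type (return values are illustrative labels)
describe_input.character <- function(data, ...) {
  "character method: path to a .fasta file"
}
describe_input.data.frame <- function(data, ...) {
  "data.frame method: sequence and id columns"
}
describe_input.default <- function(data = NULL, sequence, id, ...) {
  "default method: sequence and id vectors"
}

describe_input("proteins.fasta")
describe_input(data.frame(id = "a", sequence = "MKT"))
# data is missing here, so the default method is called directly
describe_input(sequence = "MKT", id = "a")
```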

#' @rdname get_signalp
#' @method get_signalp character
#' @export

get_signalp.character <- function(data,
                                  org_type = c("euk", "gram-", "gram+"),
                                  Dcut_type = c("default", "sensitive", "user"),
                                  Dcut_noTM = 0.45,
                                  Dcut_TM = 0.5,
                                  method = c("best", "notm"),
                                  minlen = NULL,
                                  trunc = 70L,
                                  splitter = 1000L,
                                  attempts = 2,
                                  progress = FALSE,
                                  ...){
  if (missing(splitter)) {
    splitter <- 1000L
  }
  if (length(splitter) > 1){
    splitter <- 1000L
    warning("splitter should be of length 1, setting to default: splitter = 1000",
            call. = FALSE)
  }
  if (!is.numeric(splitter)){
    splitter <- as.numeric(splitter)
    warning("splitter is not numeric, converting using 'as.numeric'",
            call. = FALSE)
  }
  if (is.na(splitter)){
    splitter <- 1000L
    warning("splitter was set to NA, setting to default: splitter = 1000",
            call. = FALSE)
  }
  if (is.numeric(splitter)) {
    splitter <- floor(splitter)
  }
  if (!(splitter %in% 1:2000)) {
    splitter <- 1000L
    warning("Illegal splitter input, splitter will be set to 1000",
            call. = FALSE)
  }
  if (!missing(trunc)){
    if (length(trunc) > 1){
      stop("trunc should be of length 1.",
           call. = FALSE)
    }
    if (!is.numeric(trunc)){
      stop("trunc is not numeric.",
           call. = FALSE)
    }
    if (is.na(trunc)){
      stop("trunc was set to NA.",
           call. = FALSE)
    }
    if (is.numeric(trunc)){
      trunc <- floor(trunc)
    }
    if (trunc < 0){
      stop("trunc was set to a negative number.",
           call. = FALSE)
    }
    if (trunc == 0){
      trunc <- 1000000L
    }
  }
  if (length(attempts) > 1){
    attempts <- 2L
    warning("attempts should be of length 1, setting to default: attempts = 2",
            call. = FALSE)
  }
  if (!is.numeric(attempts)){
    attempts<- as.numeric(attempts)
    warning("attempts is not numeric, converting using 'as.numeric'",
            call. = FALSE)
  }
  if (is.na(attempts)){
    attempts <- 2L
    warning("attempts was set to NA, setting to default: attempts = 2",
            call. = FALSE)
  }
  if (is.numeric(attempts)) {
    attempts <- floor(attempts)
  }
  if (attempts < 1) {
    attempts <- 2L
    warning("attempts was set to less than 1, setting to default: attempts = 2",
            call. = FALSE)
  }
  if (missing(progress)) {
    progress <- FALSE
  }
  if (length(progress) > 1){
    progress <- FALSE
    warning("progress should be of length 1, setting to default: progress = FALSE",
            call. = FALSE)
  }
  if (!is.logical(progress)){
    progress <- as.logical(progress)
    warning("progress is not logical, converting using 'as.logical'",
            call. = FALSE)
  }
  if (is.na(progress)){
    progress <- FALSE
    warning("progress was set to NA, setting to default: progress = FALSE",
            call. = FALSE)
  }
  if (missing(org_type)) {
    org_type <- "euk"
  }
  # check the length first: if() needs a length-one condition
  if (length(org_type) != 1L || !org_type %in% c("euk", "gram-", "gram+")) {
    stop("org_type should be one of: 'euk', 'gram-', 'gram+'",
         call. = FALSE)
  }
  if (missing(Dcut_type)) {
    Dcut_type <- "default"
  }
  # check the length first: if() needs a length-one condition
  if (length(Dcut_type) != 1L || !Dcut_type %in% c("default", "sensitive", "user")) {
    stop("Dcut_type should be one of: 'default', 'sensitive', 'user'",
         call. = FALSE)
  }
  if (missing(Dcut_noTM)) {
    Dcut_noTM <- "0.45"
  }  else {
    Dcut_noTM <- as.character(Dcut_noTM)[1]
  }
  # as.numeric() always returns a numeric vector, so test the converted value for NA
  Dcut_noTM_num <- suppressWarnings(as.numeric(Dcut_noTM))
  if (is.na(Dcut_noTM_num)){
    Dcut_noTM <- "0.45"
    warning("Dcut_noTM is NA or could not be converted to numeric, setting to default: Dcut_noTM = '0.45'",
            call. = FALSE)
  } else if (Dcut_noTM_num < 0 || Dcut_noTM_num > 1) {
    Dcut_noTM <- "0.45"
    warning("Dcut_noTM must take values in the range 0 - 1, it was set to the default: Dcut_noTM = '0.45'",
            call. = FALSE)
  }
  if (missing(Dcut_TM)) {
    Dcut_TM <- "0.5"
  } else {
    Dcut_TM <- as.character(Dcut_TM)[1]
  }
  # as.numeric() always returns a numeric vector, so test the converted value for NA
  Dcut_TM_num <- suppressWarnings(as.numeric(Dcut_TM))
  if (is.na(Dcut_TM_num)){
    Dcut_TM <- "0.5"
    warning("Dcut_TM is NA or could not be converted to numeric, setting to default: Dcut_TM = '0.5'",
            call. = FALSE)
  } else if (Dcut_TM_num < 0 || Dcut_TM_num > 1) {
    Dcut_TM <- "0.5"
    warning("Dcut_TM must take values in the range 0 - 1, it was set to the default: Dcut_TM = '0.5'",
            call. = FALSE)
  }
  if (missing(method)) {
    method <- "best"
  }
  # check the length first: if() needs a length-one condition
  if (length(method) != 1L || !method %in% c("best", "notm")){
    stop("method should be one of: 'best', 'notm'",
         call. = FALSE)
  }
  if (missing(minlen)) {
    minlen <- ""
  }  else {
    minlen <- as.character(minlen)[1]
  }
  if(length(data) > 1){
    stop("one fasta file per function call can be supplied",
         call. = FALSE)
  }
  if (file.exists(data)){
    file_name <- data
  } else {
    stop("cannot find file in the specified path",
         call. = FALSE)
  }
  url <- "https://services.healthtech.dtu.dk/cgi-bin/webface2.fcgi"
  cfg_file <- "/var/www/html/services/SignalP-4.1/webface.cf"
  file_list <- ragp::split_fasta(path_in = file_name,
                                 path_out = "tmp_signalp_",
                                 num_seq = splitter,
                                 trunc = trunc)
  if(grepl("temp_", file_name)){
    unlink(file_name)
  }
  for_pb <- length(file_list)
  if(progress){
    pb <- utils::txtProgressBar(min = 0,
                                max = for_pb,
                                style = 3)
  }
  output <- vector("list", length(file_list))
  for(k in seq_along(file_list)){
    x <- file_list[[k]]
    
    file_up <- httr::upload_file(x)
    if (trunc == 1000000L){
      trunc <- ""
    }
    
    res <- httr::POST(url = url,
                      encode = "multipart",
                      body = list(configfile = cfg_file,
                                  SEQSUB = file_up,
                                  orgtype = org_type,
                                  `Dcut-type` = Dcut_type,
                                  `Dcut-noTM` = Dcut_noTM,
                                  `Dcut-TM` = Dcut_TM,
                                  graphmode = NULL,
                                  format = "short",
                                  minlen = minlen,
                                  method = method,
                                  trunc = as.character(trunc)))
    if(!grepl("jobid=", res$url)){
      stop("something went wrong on server side")
    }
    
    res <- sub("https://services.healthtech.dtu.dk/cgi-bin/webface2.cgi?jobid=",
               "",
               res$url,
               fixed = TRUE)
    
    res <- sub("&wait=20",
               "",
               res,
               fixed = TRUE)
    
    jobid <- res
    
    time1 <- Sys.time()
    
    repeat {
      res2 <- httr::GET(url = url,
                        query = list(jobid = jobid,
                                     wait = "20"))
      code <- res2$status_code
      
      if(code != 200){
        res2_split <- NULL
        warning(paste0("server returned status code ", code,
                       ". Problem in file: ",
                       x))
      } else {
        res2 <- as.character(
          xml2::xml_find_all(
            httr::content(res2,
                          as = "parsed"),
            ".//pre")
        )
        res2_split <- unlist(
          strsplit(res2,
                   "\n")
        )
      }
      Sys.sleep(2)
      
      if (any(grepl("Cmax", res2_split))) {
        break
      }
      
      time2 <- Sys.time()
      
      max.time <- as.difftime(pmax(100, splitter * 1.5),
                              units = "secs")
      
      if ((time2 - time1) > max.time) {
        res2_split <- NULL
        if(progress) message(
          "file ",
          x,
          " took longer than expected")
        break
      }
    }
    if (is.null(res2_split)) {
      tms <- 0
      
      while(tms < attempts && is.null(res2_split)){
        if(progress) message(
          "reattempting file ",
          x)
        
        file_up <-  httr::upload_file(x)
        
        res <- httr::POST(url = url,
                          encode = "multipart",
                          body = list(configfile = cfg_file,
                                      SEQSUB = file_up,
                                      orgtype = org_type,
                                      `Dcut-type` = Dcut_type,
                                      `Dcut-noTM` = Dcut_noTM,
                                      `Dcut-TM` = Dcut_TM,
                                      graphmode = NULL,
                                      format = "short",
                                      minlen = minlen,
                                      method = method,
                                      trunc = as.character(trunc)))
        if(!grepl("jobid=", res$url)){
          stop("something went wrong on server side")
        }
        res <- sub("https://services.healthtech.dtu.dk/cgi-bin/webface2.cgi?jobid=",
                   "",
                   res$url,
                   fixed = TRUE)
        
        res <- sub("&wait=20",
                   "",
                   res,
                   fixed = TRUE)
        jobid <- res
        
        time1 <- Sys.time()
        
        repeat {
          res2 <- httr::GET(url = url,
                            query = list(jobid = jobid,
                                         wait = "20"))
          code <- res2$status_code
          
          if(code != 200){
            res2_split <- NULL
            warning(paste0("server returned status code ", code,
                           ". Problem in file: ",
                           x))
          } else {
            res2 <- as.character(
              xml2::xml_find_all(
                httr::content(res2,
                              as = "parsed"),
                ".//pre")
            )
            res2_split <- unlist(
              strsplit(res2,
                       "\n")
            )
          }
          Sys.sleep(1)
          if (any(grepl("Cmax", res2_split))) {
            break
          }
          
          time2 <- Sys.time()
          
          max.time <- as.difftime(pmax(100, splitter * 1.5),
                                  units = "secs")
          
          if ((time2 - time1) > max.time) {
            res2_split <- NULL
            break
          }
        }
        tms <- tms + 1
      }
    }

    if (is.null(res2_split)){
      output <- do.call(rbind,
                        output)
      output$is.signalp <- output$is.sp == "Y"
      if(progress){
        utils::setTxtProgressBar(pb,
                                 for_pb)
        close(pb)
      }
      warning(
        "maximum attempts reached at ",
        x,
        ", returning finished queries",
        call. = FALSE)
      return(output)
    }
    unlink(x)
    
    res2_split <- res2_split[(which(grepl("name",
                                          res2_split))[1] +
                                1):(which(grepl("/pre",
                                                res2_split ))[1] - 1)]
    
    if(any(grepl("hr", res2_split))){
      res2_split <- res2_split[1:(which(grepl("<hr>",
                                              res2_split))[1] - 1)]
    }
    res2_split <- strsplit(res2_split,
                           " +")
    res2_split <- do.call(rbind,
                          res2_split)
    res2_split <- as.data.frame(res2_split,
                                stringsAsFactors = FALSE)
    colnames(res2_split) <- c("id",
                              "Cmax",
                              "Cmax.pos",
                              "Ymax",
                              "Ymax.pos",
                              "Smax",
                              "Smax.pos",
                              "Smean",
                              "Dmean",
                              "is.sp",
                              "Dmaxcut",
                              "Networks.used")
    res2_split$Ymax.pos <- as.integer(as.character(res2_split$Ymax.pos))
    res2_split$Cmax.pos <- as.integer(as.character(res2_split$Cmax.pos)) 
    res2_split$Smax.pos <- as.integer(as.character(res2_split$Smax.pos)) 
    res2_split$Cmax <- as.numeric(as.character(res2_split$Cmax))
    res2_split$Ymax <- as.numeric(as.character(res2_split$Ymax)) 
    res2_split$Smax <- as.numeric(as.character(res2_split$Smax)) 
    res2_split$Smean <- as.numeric(as.character(res2_split$Smean)) 
    res2_split$Dmean <- as.numeric(as.character(res2_split$Dmean))
    
    if(progress){
      utils::setTxtProgressBar(pb,
                               k)
    }
    output[[k]] <- res2_split
  }
  
  if(progress){
    utils::setTxtProgressBar(pb,
                             for_pb)
    close(pb)
  }
  
  output <- do.call(rbind,
                    output)
  
  output$is.signalp <- output$is.sp == "Y"
  output$sp.length <- output$Ymax.pos
  return(output)
}
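
# The parsing step at the end of get_signalp.character() splits each line of the
# server's "short" output on runs of spaces and then converts the score and
# position columns. The sketch below repeats that step on two made-up result
# lines; the identifiers and all values are invented for illustration and are
# not real SignalP output.

```r
# Two invented lines laid out like the whitespace-separated "short" format
short_lines <- c(
  "prot_a   0.812  25  0.846  25  0.983  12  0.921  0.886  Y  0.450  SignalP-noTM",
  "prot_b   0.110  30  0.120  30  0.201   3  0.105  0.112  N  0.500  SignalP-TM"
)

# Split on one or more spaces and bind the pieces into a character matrix
fields <- do.call(rbind, strsplit(short_lines, " +"))
parsed <- as.data.frame(fields, stringsAsFactors = FALSE)
colnames(parsed) <- c("id", "Cmax", "Cmax.pos", "Ymax", "Ymax.pos",
                      "Smax", "Smax.pos", "Smean", "Dmean",
                      "is.sp", "Dmaxcut", "Networks.used")

# Convert scores to numeric and positions to integer, column by column
score_cols <- c("Cmax", "Ymax", "Smax", "Smean", "Dmean")
pos_cols <- c("Cmax.pos", "Ymax.pos", "Smax.pos")
parsed[score_cols] <- lapply(parsed[score_cols], as.numeric)
parsed[pos_cols] <- lapply(parsed[pos_cols], as.integer)

# Logical flag derived from the "Y"/"N" column, as in the function above
parsed$is.signalp <- parsed$is.sp == "Y"
str(parsed)
```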

#' @rdname get_signalp
#' @method get_signalp data.frame
#' @export

get_signalp.data.frame <- function(data,
                                   sequence,
                                   id,
                                   ...){
  if(missing(sequence)){
    stop("the column name with the sequences must be specified",
         call. = FALSE)
  }
  if(missing(id)){
    stop("the column name with the sequence id's must be specified",
         call. = FALSE)
  }
  id <- as.character(substitute(id))
  sequence <- as.character(substitute(sequence))
  if (length(id) != 1L){
    stop("exactly one column name must be specified for 'id'",
         call. = FALSE)
  }
  if (length(sequence) != 1L){
    stop("exactly one column name must be specified for 'sequence'",
         call. = FALSE)
  }
  id <- if(id %in% colnames(data)){
    data[[id]]
  } else {
    stop("specified 'id' not found in data",
         call. = FALSE)
  }
  id <- as.character(id)
  sequence  <- if(sequence %in% colnames(data)){
    data[[sequence]]
  } else {
    stop("specified 'sequence' not found in data",
         call. = FALSE)
  }
  sequence <- toupper(as.character(sequence))
  sequence <- sub("\\*$",
                  "",
                  sequence)
  aa_regex <- "[^ARNDCQEGHILKMFPSTWYV]"
  if (any(grepl(aa_regex, sequence))){
    warning(paste("sequences: ",
                  paste(id[grepl(aa_regex,
                                 sequence)],
                        collapse = ", "),
                  " contain symbols not corresponding to amino acids",
                  sep = ""),
            call. = FALSE)
  }
  file_name <- paste("temp_",
                     gsub("^X",
                          "",
                          make.names(Sys.time())),
                     ".fasta",
                     sep = "")
  seqinr::write.fasta(sequence = strsplit(sequence, ""),
                      name = id,
                      file = file_name)
  res <- get_signalp.character(data = file_name, ...)
  return(res)
}
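
# The data.frame method accepts the sequence and id columns either quoted or
# unquoted because it captures the argument expression with substitute() and
# converts it to a string. The helper below is a minimal sketch of that idea;
# pull_column() and the toy data frame are hypothetical and not part of ragp.
# Note that a variable holding a column name is not resolved by this approach:
# the variable's own name is captured instead.

```r
# Hypothetical helper: capture a quoted or unquoted column name as a string
pull_column <- function(data, column) {
  column <- as.character(substitute(column))
  if (length(column) != 1L || !column %in% colnames(data)) {
    stop("specified column not found in data", call. = FALSE)
  }
  data[[column]]
}

# Invented example data
toy <- data.frame(Transcript.id = c("t1", "t2"),
                  sequence = c("MKTA", "MSSL"),
                  stringsAsFactors = FALSE)

pull_column(toy, sequence)          # unquoted column name
pull_column(toy, "Transcript.id")   # quoted column name
```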

#' @rdname get_signalp
#' @method get_signalp list
#' @export


get_signalp.list <- function(data,
                             ...){
  if(inherits(data[[1]], "SeqFastaAA")){
    dat <- lapply(data,
                  paste0,
                  collapse ="")
    id <- names(dat)
    sequence <- toupper(as.character(unlist(dat)))
    sequence <- sub("\\*$",
                    "",
                    sequence)
    aa_regex <- "[^ARNDCQEGHILKMFPSTWYV]"
    if (any(grepl(aa_regex, sequence))){
      warning(paste("sequences: ",
                    paste(id[grepl(aa_regex,
                                   sequence)],
                          collapse = ", "),
                    " contain symbols not corresponding to amino acids",
                    sep = ""),
              call. = FALSE)
    }
    file_name <- paste("temp_",
                       gsub("^X",
                            "",
                            make.names(Sys.time())),
                       ".fasta",
                       sep = "")
    seqinr::write.fasta(sequence = strsplit(sequence, ""),
                        name = id,
                        file = file_name)
  } else {
    stop("only lists containing objects of class SeqFastaAA are supported")
  }
  res <- get_signalp.character(data = file_name, ...)
  return(res)
}

#' @rdname get_signalp
#' @method get_signalp default
#' @export

get_signalp.default <- function(data = NULL,
                                sequence,
                                id,
                                ...){
  if (missing(sequence)){
    stop("protein sequence must be provided to obtain predictions",
         call. = FALSE)
  }
  if (missing(id)){
    stop("protein id must be provided to obtain predictions",
         call. = FALSE)
  }
  id <- as.character(id)
  sequence <- toupper(as.character(sequence))
  if (length(sequence) != length(id)){
    stop("id and sequence vectors are not of same length",
         call. = FALSE)
  }
  sequence <- sub("\\*$",
                  "",
                  sequence)
  aa_regex <- "[^ARNDCQEGHILKMFPSTWYV]"
  if (any(grepl(aa_regex, sequence))){
    warning(paste("sequences: ",
                  paste(id[grepl(aa_regex,
                                 sequence)],
                        collapse = ", "),
                  " contain symbols not corresponding to amino acids",
                  sep = ""),
            call. = FALSE)
  }
  file_name <- paste("temp_",
                     gsub("^X",
                          "",
                          make.names(Sys.time())),
                     ".fasta",
                     sep = "")
  seqinr::write.fasta(sequence = strsplit(sequence, ""),
                      name = id,
                      file = file_name)
  res <- get_signalp.character(data = file_name, ...)
  return(res)
}
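
# Each method cleans the sequences the same way before writing the temporary
# .fasta file: convert to upper case, drop a trailing "*" (stop symbol), and
# warn about any symbol outside the 20 standard amino acids. The sketch below
# runs those steps on invented ids and sequences to show which entries would be
# named in the warning.

```r
# Anything that is not one of the 20 standard one-letter amino acid codes
aa_regex <- "[^ARNDCQEGHILKMFPSTWYV]"

# Invented example input
id <- c("p1", "p2", "p3")
sequence <- c("mktaiv*", "MSXLLA", "MAB")

# Upper-case the sequences and strip a single trailing stop symbol
sequence <- sub("\\*$", "", toupper(sequence))

# Ids of the sequences that still contain a non-standard symbol
flagged <- id[grepl(aa_regex, sequence)]
sequence
flagged
```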

#' @rdname get_signalp
#' @method get_signalp AAStringSet
#' @export

get_signalp.AAStringSet <-  function(data,
                                     ...){
  sequence <- as.character(data)
  id <- names(sequence)
  sequence <- unname(sequence)
  sequence <- toupper(sequence)
  sequence <- sub("\\*$",
                  "",
                  sequence)
  
  res <- get_signalp.default(sequence = sequence,
                             id = id,
                             ...)
  return(res)
}
missuse/ragp documentation built on Jan. 4, 2022, 10:49 a.m.
R/get_signalp.R
In missuse/ragp: Mining for Hydroxyproline rich glycoprotein sequences

Defines functions get_signalp.AAStringSet get_signalp.default get_signalp.list get_signalp.data.frame get_signalp.character
