
View source: R/Structure.R

computeStructure    R Documentation

Computation of the Secondary Structural Features of RNA or Protein Sequences

Description

The function computeStructure computes the secondary structural features of RNA or protein sequences. It requires the program RNAsubopt from the ViennaRNA package (for RNA) and the program Predator (for protein).

Usage

computeStructure(
  seqs,
  seqType = c("RNA", "Pro"),
  args.RNAsubopt = NULL,
  args.Predator = NULL,
  structureRNA.num = 6,
  structurePro = c("ChouFasman", "DeleageRoux", "Levitt"),
  Fourier.len = 10,
  workDir.Pro = getwd(),
  as.list = TRUE,
  path.RNAsubopt = "RNAsubopt",
  path.Predator = "Predator/predator",
  path.stride = "Predator/stride.dat",
  verbose = FALSE,
  parallel.cores = 2,
  cl = NULL
)

Arguments

seqs

sequences loaded by the function read.fasta from the seqinr package, or a list of RNA/protein sequences. Each sequence should be a vector of single characters. RNA sequences will be converted to lower-case letters; protein sequences will be converted to upper-case letters, and non-amino-acid letters will be ignored.
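A minimal sketch of preparing seqs from plain strings, based only on what is stated above (each sequence is a vector of single characters; RNA is lower-cased; protein is upper-cased and non-amino-acid letters are dropped). The helpers toCharSeqs and normaliseSeqs are hypothetical and are not part of the package; the 20-letter amino acid alphabet used for filtering is an assumption, and the package may treat letters such as "X" differently.

```r
# Hypothetical helper (not part of the package): split named strings into
# the list-of-character-vectors format expected by `seqs`.
toCharSeqs <- function(strings) {
  lapply(strings, function(s) strsplit(s, "")[[1]])
}

# Hypothetical helper mimicking the documented normalisation:
# RNA -> lower case; protein -> upper case with non-amino-acid letters dropped.
# Assumption: only the 20 standard amino acids are kept.
normaliseSeqs <- function(seqs, seqType = c("RNA", "Pro")) {
  seqType <- match.arg(seqType)
  aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
          "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")
  lapply(seqs, function(x) {
    if (seqType == "RNA") {
      tolower(x)
    } else {
      x <- toupper(x)
      x[x %in% aa]
    }
  })
}

rna <- normaliseSeqs(toCharSeqs(c(r1 = "AUGGCU")), "RNA")
pro <- normaliseSeqs(toCharSeqs(c(p1 = "mkv*xla")), "Pro")
```

Sequences read with seqinr::read.fasta are already lists of single-character vectors, so the splitting step is only needed for sequences stored as plain strings.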

seqType

a string specifying the type of the sequences: "RNA" or "Pro" (protein). If the input consists of DNA sequences and seqType = "RNA", they will be converted to RNA sequences automatically. Default: "RNA".
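The DNA-to-RNA conversion mentioned above amounts to replacing thymine with uracil. The sketch below assumes that is all the conversion does; dnaToRna is a hypothetical helper, not the package's own code.

```r
# Hypothetical helper (not part of the package): convert DNA characters to
# RNA by replacing thymine (T/t) with uracil (U/u), then lower-case the
# result as documented for RNA input.
dnaToRna <- function(x) {
  tolower(chartr("Tt", "Uu", x))
}

dnaToRna(c("A", "T", "G", "c", "t"))
# [1] "a" "u" "g" "c" "u"
```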

args.RNAsubopt

a string (e.g. "-N -z -S 1.07") specifying additional arguments for RNAsubopt, except "-p", which is already determined by structureRNA.num. Use it to control the behaviour of RNAsubopt; see the RNAsubopt manual for the available arguments. Default: NULL.
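To show how args.RNAsubopt relates to structureRNA.num, here is a sketch of assembling an RNAsubopt command line. The helper buildSuboptCmd is hypothetical, and the argument order the package actually uses is an assumption; the only documented constraint is that "-p" comes from structureRNA.num and must not appear in args.RNAsubopt.

```r
# Hypothetical helper (not part of the package): assemble an RNAsubopt
# command line in which "-p" is always taken from structureRNA.num, and
# reject args.RNAsubopt strings that try to set "-p" themselves.
buildSuboptCmd <- function(path.RNAsubopt = "RNAsubopt",
                           structureRNA.num = 6,
                           args.RNAsubopt = NULL) {
  if (!is.null(args.RNAsubopt) &&
      grepl("(^|\\s)-p(\\s|$)", args.RNAsubopt)) {
    stop('"-p" is set via structureRNA.num; remove it from args.RNAsubopt')
  }
  # c() drops a NULL args.RNAsubopt, so no trailing space is produced.
  paste(c(path.RNAsubopt, "-p", structureRNA.num, args.RNAsubopt),
        collapse = " ")
}

buildSuboptCmd(args.RNAsubopt = "-N -z -S 1.07")
# [1] "RNAsubopt -p 6 -N -z -S 1.07"
```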

args.Predator

a string specifying additional arguments for Predator, except "-a" and "-b". Use it to control the behaviour of Predator; see the Predator manual for the available arguments. Default: NULL.

structureRNA.num

integer. The number of random samples of suboptimal structures. Default: 6.

structurePro

strings specifying which secondary structural information is extracted from protein sequences. Ignored if seqType = "RNA". Options: "ChouFasman", "DeleageRoux" and "Levitt" (Ref: [2-4]); multiple options can be selected at the same time. See Details below. Default: all three.

Fourier.len

a positive integer specifying the length of the Fourier series used as features. Each input sequence should be at least Fourier.len residues long (i.e. Fourier.len should be <= the length of every input sequence). Default: 10.
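The sketch below illustrates why the sequence length and Fourier.len are linked. It assumes, for illustration only, that the features are the first Fourier.len coefficients of a discrete Fourier transform of a per-residue numeric profile; the package's exact lncPro-style formula may differ, and fourierFeatures is a hypothetical name.

```r
# Generic illustration (NOT the package's exact encoding): keep the first
# Fourier.len discrete Fourier coefficients of a per-residue numeric
# profile, which requires the profile to have at least Fourier.len values.
fourierFeatures <- function(profile, Fourier.len = 10) {
  n <- length(profile)
  if (n < Fourier.len) {
    stop("sequence length (", n, ") is shorter than Fourier.len (",
         Fourier.len, ")")
  }
  coefs <- stats::fft(profile)[seq_len(Fourier.len)]
  Mod(coefs) / n  # normalised magnitudes, one feature per coefficient
}

f <- fourierFeatures(rep(1, 12), Fourier.len = 10)
```

For a constant profile, all the signal sits in the first (zero-frequency) coefficient, so f[1] is 1 and the remaining nine values are numerically zero.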

workDir.Pro

a string specifying the directory for the temporary files created when seqType = "Pro". The temporary files are deleted automatically when the calculation is completed. Default: getwd().

as.list

logical. If TRUE, the result is returned as a list; if FALSE, as a data frame. Default: TRUE.

path.RNAsubopt

a string specifying the location of the RNAsubopt program (Ref: [6]). Default: "RNAsubopt".

path.Predator

a string specifying the location of the Predator program (Ref: [5]). Default: "Predator/predator".

path.stride

a string specifying the location of the file "stride.dat" required by Predator. Default: "Predator/stride.dat".

verbose

logical. Whether to print progress information during the calculation (only available on Linux). Default: FALSE.

parallel.cores

an integer specifying the number of cores used for parallel computation. Set parallel.cores = -1 to use all available cores; otherwise it must be >= 1. Default: 2.
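A sketch of the rule stated above for parallel.cores, written as a hypothetical helper resolveCores (not part of the package). It relies on parallel::detectCores(), which can return NA on platforms where the core count cannot be determined.

```r
# Hypothetical helper (not part of the package): translate parallel.cores
# into a concrete number of workers following the documented rule
# (-1 = all cores, otherwise an integer >= 1).
resolveCores <- function(parallel.cores = 2) {
  if (parallel.cores == -1) {
    return(parallel::detectCores())
  }
  if (parallel.cores < 1) {
    stop("parallel.cores should be == -1 or >= 1")
  }
  as.integer(parallel.cores)
}

resolveCores(2)
# [1] 2
```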

cl

a cluster object (for example, one created with parallel::makeCluster) to be used for the parallel computation. Default: NULL.

Details

The secondary structures of RNA and protein are computed by RNAsubopt and Predator, respectively. The protein secondary structural features are then encoded using three amino acid scales:

  1. Chou & Fasman conformational parameter (Ref: [2])

  2. Deleage & Roux conformational parameter (Ref: [3])

  3. Levitt normalised frequency (Ref: [4])
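Encoding a protein with an amino acid scale means mapping each residue to a numeric value, giving a per-residue profile that later steps can summarise. The sketch below shows only that mapping. The scale values are toy numbers invented for the demonstration, not the published Chou & Fasman, Deleage & Roux or Levitt parameters, and encodeWithScale is a hypothetical helper.

```r
# Illustration of encoding a protein sequence with an amino acid scale.
# The scale values below are TOY NUMBERS for demonstration only; they are
# NOT the published Chou & Fasman, Deleage & Roux or Levitt parameters.
toyScale <- c(A = 1.2, G = 0.6, K = 1.1, L = 1.3, V = 0.9)

encodeWithScale <- function(residues, scale) {
  values <- unname(scale[residues])
  # Residues missing from the scale become NA; drop them here.
  values[!is.na(values)]
}

encodeWithScale(c("M", "K", "V", "L", "A"), toyScale)
# [1] 1.1 0.9 1.3 1.2
```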

The feature encoding strategy is based on lncPro (Ref: [7]).

This function depends on the program RNAsubopt from the ViennaRNA package (http://www.tbi.univie.ac.at/RNA/index.html) and on the program Predator (https://bioweb.pasteur.fr/packages/pack@predator@2.1.2).

On UNIX/Linux, path.RNAsubopt can usually be left at its default value, "RNAsubopt". On other operating systems, such as Windows, users may need to set path.RNAsubopt explicitly if the directory containing RNAsubopt has not been added to the PATH environment variable (e.g. path.RNAsubopt = '"C:/Program Files/ViennaRNA/RNAsubopt.exe"').
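A small sketch for deciding whether path.RNAsubopt needs to be set. It uses base R's Sys.which(), which returns an empty string for programs it cannot find on the PATH. The helper findRNAsubopt is hypothetical, and the Windows install location is the example path from the paragraph above, not a guaranteed location.

```r
# Check whether RNAsubopt can be found on the PATH; Sys.which() returns ""
# for programs it cannot locate.
findRNAsubopt <- function(default = "RNAsubopt") {
  hit <- Sys.which(default)
  if (nzchar(hit)) unname(hit) else NA_character_
}

if (is.na(findRNAsubopt())) {
  # Not on the PATH: give the full location. The embedded double quotes keep
  # the space in "Program Files" intact (example path; adjust as needed).
  path.RNAsubopt <- '"C:/Program Files/ViennaRNA/RNAsubopt.exe"'
} else {
  path.RNAsubopt <- "RNAsubopt"
}
```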

Program "Predator" is only available on UNIX/Linux and 32-bit Windows OS.
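Before running the protein examples, a script can check whether it is on a platform where Predator is documented to run. The guard below is a hypothetical helper, not part of the package. It treats every Unix-alike (including macOS, which R also reports as "unix") as supported, which is an assumption. On Windows it checks the operating system rather than the R build: a 32-bit process on 64-bit Windows reports PROCESSOR_ARCHITECTURE = "x86" but also sets PROCESSOR_ARCHITEW6432.

```r
# Hypothetical guard (not part of the package): Predator is documented to
# run on UNIX/Linux and on 32-bit Windows only.
predatorSupported <- function() {
  if (.Platform$OS.type == "unix") {
    return(TRUE)  # assumption: all Unix-alikes, macOS included
  }
  # On Windows, the OS is 32-bit only if the process is x86 and is not
  # running under WOW64 (which sets PROCESSOR_ARCHITEW6432).
  Sys.getenv("PROCESSOR_ARCHITECTURE") == "x86" &&
    Sys.getenv("PROCESSOR_ARCHITEW6432") == ""
}
```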

Value

This function returns a data frame if as.list = FALSE or returns a list if as.list = TRUE.

References

[1] Han S, Yang X, Sun H, et al. LION: an integrated R package for effective prediction of ncRNA–protein interaction. Brief. Bioinform. 2022; 23(6):bbac420

[2] Chou PY, Fasman GD. Prediction of the secondary structure of proteins from their amino acid sequence. Adv. Enzymol. Relat. Areas Mol. Biol. 1978; 47:45-148

[3] Deleage G, Roux B. An algorithm for protein secondary structure prediction based on class prediction. Protein Eng. 1987; 1:289-294

[4] Levitt M. Conformational preferences of amino acids in globular proteins. Biochemistry 1978; 17:4277-85

[5] Frishman D, Argos P. Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Eng. 1996; 9:133-42

[6] Lorenz R, Bernhart SH, Höner zu Siederdissen C, et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 2011; 6:26

[7] Lu Q, Ren S, Lu M, et al. Computational prediction of associations between long non-coding RNAs and proteins. BMC Genomics 2013; 14:651

See Also

runRNAsubopt, runPredator, featureStructure

Examples



data(demoPositiveSeq)
seqsRNA <- demoPositiveSeq$RNA.positive
seqsPro <- demoPositiveSeq$Pro.positive

# Replace these paths with the locations of Predator and stride.dat on your own system:

path.Predator <- "/mnt/external_drive_1/hansy/predator/predator"
path.stride <- "/mnt/external_drive_1/hansy/predator/stride.dat"

structureRNA <- computeStructure(seqsRNA, seqType = "RNA", structureRNA.num = 6,
                                 Fourier.len = 10, as.list = FALSE,
                                 path.RNAsubopt = "RNAsubopt", parallel.cores = 2)

structurePro <- computeStructure(seqsPro, seqType = "Pro",
                                 structurePro = c("ChouFasman", "DeleageRoux",
                                                  "Levitt"),
                                 Fourier.len = 10, workDir.Pro = getwd(),
                                 as.list = TRUE, path.Predator = path.Predator,
                                 path.stride = path.stride, parallel.cores = 2)


HAN-Siyu/ncProR documentation built on Nov. 3, 2023, 12:08 a.m.