computePhysChem: Computation of the Physicochemical Features of RNA or Protein...

View source: R/PhysicochemicalProperty.R

computePhysChem    R Documentation

Computation of the Physicochemical Features of RNA or Protein Sequences

Description

The function computePhysChem computes the physicochemical features of RNA or protein sequences.

Usage

computePhysChem(
  seqs,
  seqType = c("RNA", "Pro"),
  Fourier.len = 10,
  physchemRNA = c("hydrogenBonding", "vanderWaal"),
  physchemPro = c("polarity.Grantham", "polarity.Zimmerman", "bulkiness.Zimmerman",
    "isoelectricPoint.Zimmerman", "hphob.BullBreese", "hphob.KyteDoolittle",
    "hphob.Eisenberg", "hphob.HoppWoods"),
  as.list = TRUE,
  parallel.cores = 2,
  cl = NULL
)

Arguments

seqs

sequences loaded by the function read.fasta from the seqinr package, or a list of RNA/protein sequences. Each sequence should be a vector of single characters. RNA sequences are converted to lower case and protein sequences to upper case.

seqType

a string that specifies the nature of the sequence: "RNA" or "Pro" (protein). If the input is DNA sequence and seqType = "RNA", the DNA sequence will be converted to RNA sequence automatically. Default: "RNA".
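The input format described for seqs and seqType can be illustrated with a short base R sketch. The helper prepareSeqs is hypothetical and not part of the package; it only shows what a list of single-character vectors looks like after the case conversion and DNA-to-RNA conversion described above, and the package's own conversion code may differ.

```r
# Hypothetical helper (not part of the package): split strings into
# single-character vectors and normalise them as the argument text describes.
prepareSeqs <- function(x, seqType = c("RNA", "Pro")) {
  seqType <- match.arg(seqType)
  lapply(x, function(s) {
    chars <- strsplit(s, "", fixed = TRUE)[[1]]
    if (seqType == "RNA") {
      chars <- tolower(chars)        # RNA sequences use lower case
      chars[chars == "t"] <- "u"     # treat DNA input as RNA
      chars
    } else {
      toupper(chars)                 # protein sequences use upper case
    }
  })
}

rna <- prepareSeqs(c(seq1 = "ATGCGT", seq2 = "augcc"), seqType = "RNA")
pro <- prepareSeqs(c(p1 = "mktay"), seqType = "Pro")
str(rna)
str(pro)
```

Lists built this way have the same shape as the sequences returned by read.fasta: one character vector per sequence, with the sequence names as list names.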

Fourier.len

positive integer specifying the length of the Fourier series that will be used as features. Fourier.len should be <= the length of the input sequence. Default: 10.

physchemRNA

strings specifying the physicochemical properties that are computed for RNA sequences. Ignored if seqType = "Pro". Options: "hydrogenBonding" for hydrogen bonding and "vanderWaal" for van der Waals interaction. Multiple elements can be selected at the same time. (Ref: [2])

physchemPro

strings specifying the physicochemical properties that are computed in protein sequences. Ignored if seqType = "RNA". Options: "polarity.Grantham", "polarity.Zimmerman", "bulkiness.Zimmerman", "isoelectricPoint.Zimmerman", "hphob.BullBreese", "hphob.KyteDoolittle", "hphob.Eisenberg", and "hphob.HoppWoods". Multiple elements can be selected at the same time. See details below. (Ref: [3-9])

as.list

logical. If TRUE, the result is returned as a list; if FALSE, as a data frame. Default: TRUE.

parallel.cores

an integer that indicates the number of cores for parallel computation. Default: 2. Set parallel.cores = -1 to run with all the cores. parallel.cores should be == -1 or >= 1.

cl

parallel cores to be passed to this function.
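The rule stated for parallel.cores (-1 for all cores, otherwise an integer >= 1) can be sketched as below. The helper name resolveCores is hypothetical and not part of the package; this page does not show how the package itself resolves the value, so treat this as one plausible reading.

```r
# Hypothetical helper (not part of the package): turn a parallel.cores
# value into a concrete number of worker processes.
resolveCores <- function(parallel.cores = 2) {
  stopifnot(length(parallel.cores) == 1,
            parallel.cores == -1 || parallel.cores >= 1)
  if (parallel.cores == -1) {
    n <- parallel::detectCores()   # may return NA on some platforms
    if (is.na(n)) n <- 1L          # fall back to a single core
    return(as.integer(n))
  }
  as.integer(parallel.cores)
}

resolveCores(2)
resolveCores(-1)
```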

Details

The default physicochemical properties are selected or derived from the tools "catRAPID" (Ref: [10]) and "lncPro" (Ref: [11]). "catRAPID" uses Fourier.len = 50, while "lncPro" uses Fourier.len = 10.

  • The physicochemical properties of RNA

    1. Hydrogen-bonding ("hydrogenBonding") (Ref: [2])

    2. Van der Waals interaction ("vanderWaal") (Ref: [2])

  • The physicochemical properties of protein sequence

    1. polarity "polarity.Grantham" (Ref: [3])

    2. polarity "polarity.Zimmerman" (Ref: [4])

    3. bulkiness "bulkiness.Zimmerman" (Ref: [4])

    4. isoelectric point "isoelectricPoint.Zimmerman" (Ref: [4])

    5. hydropathicity "hphob.BullBreese" (Ref: [5])

    6. hydropathicity "hphob.KyteDoolittle" (Ref: [6])

    7. hydropathicity "hphob.Eisenberg" (Ref: [7])

    8. hydropathicity "hphob.HoppWoods" (Ref: [8])
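To illustrate how a physicochemical scale and Fourier.len could combine into a fixed-length feature vector, the sketch below maps a protein sequence to its Kyte-Doolittle hydropathy profile (Ref: [6]) and keeps the first Fourier.len coefficients of a cosine series. The helper fourierFeatures and the choice of a DCT-II style series are assumptions made for illustration; the transform and scaling used inside the package may differ.

```r
# Kyte-Doolittle hydropathy values (Ref: [6]).
kyteDoolittle <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5,
                   Q = -3.5, E = -3.5, G = -0.4, H = -3.2, I = 4.5,
                   L = 3.8, K = -3.9, M = 1.9, F = 2.8, P = -1.6,
                   S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

# Hypothetical helper (not part of the package): numeric profile ->
# first Fourier.len coefficients of a DCT-II style cosine series.
fourierFeatures <- function(aa, propScale, Fourier.len = 10) {
  stopifnot(Fourier.len >= 1)
  profile <- unname(propScale[toupper(aa)])
  profile <- profile[!is.na(profile)]   # drop residues without a value
  len <- length(profile)
  n <- seq_len(len) - 1                 # positions 0 .. len - 1
  k <- seq_len(Fourier.len) - 1         # frequencies 0 .. Fourier.len - 1
  vapply(k, function(kk) {
    sqrt(2 / len) * sum(profile * cos(pi / len * (n + 0.5) * kk))
  }, numeric(1))
}

pro <- strsplit("MKTAYIAKQRQISFVKSHFSRQ", "", fixed = TRUE)[[1]]
feat <- fourierFeatures(pro, kyteDoolittle, Fourier.len = 8)
length(feat)
```

Whatever the sequence length, the result has Fourier.len values per property, which is what lets sequences of different lengths share one feature table.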

Value

This function returns a data frame if as.list = FALSE or returns a list if as.list = TRUE.

References

[1] Han S, Yang X, Sun H, et al. LION: an integrated R package for effective prediction of ncRNA–protein interaction. Briefings in Bioinformatics. 2022; 23(6):bbac420

[2] Morozova N, Allers J, Myers J, et al. Protein-RNA interactions: exploring binding patterns with a three-dimensional superposition analysis of high resolution structures. Bioinformatics 2006; 22:2746-52

[3] Grantham R. Amino acid difference formula to help explain protein evolution. Science 1974; 185:862-4

[4] Zimmerman JM, Eliezer N, Simha R. The characterization of amino acid sequences in proteins by statistical methods. J. Theor. Biol. 1968; 21:170-201

[5] Bull HB, Breese K. Surface tension of amino acid solutions: a hydrophobicity scale of the amino acid residues. Arch. Biochem. Biophys. 1974; 161:665-670

[6] Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 1982; 157:105-132

[7] Eisenberg D, Schwarz E, Komaromy M, et al. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J. Mol. Biol. 1984; 179:125-42

[8] Hopp TP, Woods KR. Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. U. S. A. 1981; 78:3824-8

[9] Kawashima S, Kanehisa M. AAindex: amino acid index database. Nucleic Acids Res. 2000; 28:374

[10] Bellucci M, Agostini F, Masin M, et al. Predicting protein associations with long noncoding RNAs. Nat. Methods 2011; 8:444-445

[11] Lu Q, Ren S, Lu M, et al. Computational prediction of associations between long non-coding RNAs and proteins. BMC Genomics 2013; 14:651

See Also

featurePhysChem

Examples

data(demoPositiveSeq)
seqsRNA <- demoPositiveSeq$RNA.positive
seqsPro <- demoPositiveSeq$Pro.positive

# Return a data frame:
physChemRNA <- computePhysChem(seqs = seqsRNA, seqType = "RNA",
                               Fourier.len = 10, as.list = FALSE)

# Return a list:
physChemPro <- computePhysChem(seqs = seqsPro, seqType = "Pro", Fourier.len = 8,
                               physchemPro = c("polarity.Grantham",
                                               "polarity.Zimmerman",
                                               "hphob.BullBreese",
                                               "hphob.KyteDoolittle",
                                               "hphob.Eisenberg",
                                               "hphob.HoppWoods"),
                               as.list = TRUE)


HAN-Siyu/ncProR documentation built on Nov. 3, 2023, 12:08 a.m.