read_SS: Read Secondary Structure Information


View source: R/LncFinder.R

Description

This function reads secondary structure information from your own files instead of obtaining it from the function run_RNAfold. It is useful when users already have secondary structure sequences (Dot-Bracket Notation).

Usage

read_SS(
  oneFile.loc,
  seqRNA.loc,
  seqSS.loc,
  separateFile = TRUE,
  withMFE = TRUE
)

Arguments

oneFile.loc

String. The location of your sequence file. This file should contain one (and only one) RNA sequence and its secondary structure sequence in Dot-Bracket Notation. This parameter needs to be defined only when separateFile = FALSE. See Details for more information.

seqRNA.loc

String. The location of your RNA sequence file (FASTA format). If your RNA sequences and secondary structure sequences are in two files, you need to define the location of each file. Files with multiple sequences are supported for this option. This parameter needs to be defined only when separateFile = TRUE. The location of the secondary structure sequence file is also needed (parameter seqSS.loc). See Details for more information.

seqSS.loc

String. The location of your secondary structure sequence file (FASTA format). Used together with seqRNA.loc when separateFile = TRUE.

separateFile

Logical. Are your RNA sequence(s) and secondary structure sequence(s) in separate files? If separateFile = FALSE, your file should have one (and only one) RNA sequence and its secondary structure sequence. There is no limit on the number of sequences when separateFile = TRUE.

withMFE

Logical. Whether the MFE (minimum free energy) is provided at the end of the secondary structure sequence. If withMFE = TRUE, the MFE will be extracted. The format should follow the output format of RNAfold.

Details

Users who want to predict sequences with secondary structure (SS) features may already have their own secondary structure sequences. With this function, users can read SS information from their files. Two kinds of input are supported: the RNA sequence and SS sequence in one file (separateFile = FALSE) or in separate files (separateFile = TRUE).

separateFile = FALSE is intended for secondary structures obtained from popular programs such as RNAfold. In this case, the output file contains only one RNA sequence and its SS, and has only two rows: the RNA sequence and its SS sequence. This option is therefore preferable when the file has only one sequence and follows the output format of RNAfold.
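To illustrate what "MFE at the end of the secondary structure sequence" means, the following base-R sketch splits an RNAfold-style structure line into its Dot-Bracket string and its MFE value. The helper split_ss_mfe and the example line are hypothetical and are not part of LncFinder. The sketch assumes that the MFE is a number in parentheses at the end of the line, and it is not necessarily how read_SS parses the file internally.

```r
# Hypothetical helper (not part of LncFinder): split an RNAfold-style
# structure line into the Dot-Bracket string and the MFE value.
split_ss_mfe <- function(ss_line) {
  # Assumption: the MFE is a parenthesised number at the very end of the
  # line, e.g. "( -1.20)". Brackets of the structure itself never enclose
  # digits, so they are not matched by this pattern.
  mfe_pattern <- "[[:space:]]*\\([[:space:]]*-?[0-9]+(\\.[0-9]+)?[[:space:]]*\\)[[:space:]]*$"
  hit <- regmatches(ss_line, regexpr(mfe_pattern, ss_line))
  mfe <- if (length(hit) == 1) {
    # Drop the parentheses and whitespace, keep the signed number.
    as.numeric(gsub("[()[:space:]]", "", hit))
  } else {
    NA_real_  # no MFE found on this line
  }
  structure <- trimws(sub(mfe_pattern, "", ss_line))
  list(structure = structure, mfe = mfe)
}

# Made-up example line in the style of RNAfold output.
parsed <- split_ss_mfe("((((....)))) ( -1.20)")
print(parsed$structure)
print(parsed$mfe)
```

If a line carries no MFE, the helper returns NA for the energy, which corresponds to the case where withMFE = FALSE would be the appropriate setting.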

If users obtained the SS sequences from experiments, the RNA sequences and SS sequences may be in two files. In this case, users can select separateFile = TRUE. Both files should be in FASTA format, and each file can hold multiple sequences. The sequences in the two files should be in the same order. If your data are obtained from experiments or other sources, it is highly recommended to build a new model with these data, since the SS sequences of the pre-built models were obtained from RNAfold and may differ considerably from experimental data.
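Because the two files must list their sequences in the same order, it can help to check the pairing before calling read_SS. The following base-R sketch is one way to do that. The helpers read_fasta_simple and check_paired_fasta, the file contents, and the record names are all hypothetical. The sketch also makes assumptions that go beyond this page: that paired records share the same FASTA name, that every header is followed by at least one sequence line, and that the structure file carries no MFE suffix.

```r
# Hypothetical helper (not part of LncFinder): minimal FASTA reader that
# returns a named character vector, one element per record.
read_fasta_simple <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]          # drop empty lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)            # record index of every line
  # Join the sequence lines that belong to the same record.
  seqs <- vapply(split(lines[!is_header], record[!is_header]),
                 paste, character(1), collapse = "")
  # Use the first token of each header as the record name.
  ids <- sub("^>([^[:space:]]*).*$", "\\1", lines[is_header])
  stats::setNames(unname(seqs), ids)
}

# Hypothetical check: same record count, same order, same lengths.
check_paired_fasta <- function(rna_file, ss_file) {
  rna <- read_fasta_simple(rna_file)
  ss <- read_fasta_simple(ss_file)
  if (length(rna) != length(ss))
    stop("The two files hold different numbers of records.")
  if (!identical(names(rna), names(ss)))
    stop("The records are not in the same order.")
  if (any(nchar(rna) != nchar(ss)))
    stop("A sequence and its structure differ in length.")
  TRUE
}

# Made-up demo data written to temporary files.
rna_file <- tempfile(fileext = ".fa")
ss_file <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "GGGAAACCC", ">seq2", "GCGCAAAAGCGC"), rna_file)
writeLines(c(">seq1", "(((...)))", ">seq2", "((((....))))"), ss_file)
paired_ok <- check_paired_fasta(rna_file, ss_file)

# With LncFinder installed, the checked files could then be read with:
# result <- LncFinder::read_SS(seqRNA.loc = rna_file, seqSS.loc = ss_file,
#                              separateFile = TRUE, withMFE = FALSE)
```

The check stops with an error message at the first mismatch, so a misordered pair of files is caught before any features are computed from it.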

Value

A data frame. The first row is the RNA sequence, the second row is the Dot-Bracket Notation of the secondary structure sequence, and the third row is the MFE (if the MFE is provided).

Author(s)

HAN Siyu

See Also

run_RNAfold

Examples

## Not run: 
### Load sequence data
data("demo_DNA.seq")
Seqs <- demo_DNA.seq[1:4]
### Convert sequences from vector to string.
Seqs <- sapply(Seqs, seqinr::getSequence, as.string = TRUE)
### Write a fasta file.
seqinr::write.fasta(Seqs, names = names(Seqs), file.out = "tmp.RNA.fa", as.string = TRUE)

### For Windows system: (Your path of RNAfold.)
RNAfold.path <- '"E:/Program Files/ViennaRNA/RNAfold.exe"'
### Define the parameters of RNAfold. See documents of RNAfold for more information.
RNAfold.command <- paste(RNAfold.path, " --noPS -i tmp.RNA.fa -o output", sep = "")
### Run RNAfold and output four result files.
system(RNAfold.command)

### Read secondary structure information from one file.
result_1 <- read_SS(oneFile.loc = "output_ENST00000510062.1.fold",
                    separateFile = FALSE, withMFE = TRUE)
### Read secondary structure sequences from multiple files.
### The pattern is a regular expression; "\\.fold$" matches only file names ending in ".fold".
filePath <- dir(pattern = "\\.fold$")
result_2 <- sapply(filePath, read_SS, separateFile = FALSE, withMFE = TRUE)
result_2 <- as.data.frame(result_2)

## End(Not run)

LncFinder documentation built on Dec. 11, 2021, 9:39 a.m.