readStructure: Read secondary structure file

Description Usage Arguments Details Value Author(s) Examples

Description

Reads in secondary structure text files into a helix data frame.

Usage

1
2
3
4

Arguments

file

A text file in connect format, see details for format specifications.

palette

Used to colour basepairs by bracket type. See viennaToHelix for more details.

Details

Helix: Files start with a header line beginning with # followed by the sequence length, followed by a four-column tab-delimited table (with column names), where each row corresponds to a helix in the structure. The four columns are i and j for the left-most and right-most basepair positions respectively, the length of the helix (converging inwards from i and j, and finally an arbitrary value assigned to the helix.

Vienna: Dot-bracket notation from Vienna package programs, where each structure consists of matched brackets for basepairs and periods for unbased pairs. Valid brackets are (, , [, <, A, B, C, D matched with ), , ], >, a, b, c, d, respectively. An energy value can be appended to the end of any dot-bracket structure. The function will accept slight variations of the format, including those with FASTA-like headers (in which case line breaks are allows), and those without FASTA-like headers (in which case line breaks are NOT allowed), with both types allowing for a preceding (NOT following) nucleotide sequence for the structure. Multiple entries of the same length may be in a single file, which will be returned as a single helix structure, with respectively energy values (if specified).

Connect: Output from mfold and other programs, this format is expected to be a text file beginning with a header line that starts with the sequence length, with an optional Energy/dG value, followed by a six-column tab-delimited table where columns 1 and 5 denote the position that are basepaired (unpaired when column 5 is 0). Other columns are ignored, but for completeness, column 2 is the nucleotide, column 3 and 4 are the positions of the bases left and right of the base specified in column 1 respectively (with 0 denoting non-existance), and column 6 a copy of column 1. Multiple entries of the same length may be in a single file, which will be returned as a single helix structure. All helices will be assigned the energy value extracted from their respective structure header lines.

Bpseq: Format used by the Gutell Lab's Comparative RNA Website. The file may optionally begin several header lines (e.g. Filename, Organism, Accession, etc.), followed by a 3-column tab-delimited table for the structure, where column 1 is the base position, base 2 is the nucleotide base, and column 3 is the paired position (0 if unpaired). Certain pieces of header information will be parsed and returned as attributes of the output data frame. Multiple structures can be within a single file, returned as a single helix data frame, with attributes set to those of the first entry.

Value

Returns a helix format data frame.

Author(s)

Daniel Lai, Jeff Proctor

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
    file <- system.file("extdata", "helix.txt", package = "R4RNA")
    helix <- readHelix(file)
    head(helix)

    file <- system.file("extdata", "connect.txt", package = "R4RNA")
    connect <- readConnect(file)
    head(connect)
    message("Note connect data assigns structure energy level to all basepairs")

    file <- system.file("extdata", "vienna.txt", package = "R4RNA")
    vienna <- readVienna(file)
    head(vienna)
    message("Note vienna data assigns structure energy level to all basepairs")

    file <- system.file("extdata", "bpseq.txt", package = "R4RNA")
    bpseq <- readBpseq(file)
    head(bpseq)
    message("Note bpseq data has no value assigned to basepairs")

Example output

Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package:BiocGenericsThe following objects are masked frompackage:parallel:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked frompackage:stats:

    IQR, mad, sd, var, xtabs

The following objects are masked frompackage:base:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package:S4VectorsThe following object is masked frompackage:base:

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package:BiostringsThe following object is masked frompackage:base:

    strsplit

    i   j length value
1  37 150      6     0
2  37 150      5     0
3  77 113      5     0
4  78 112      4     0
5  83  94      5     0
6 100 141      4     0
  i  j length value
1 1 72      1 -51.6
2 2 71      1 -51.6
3 3 70      1 -51.6
4 4 69      1 -51.6
5 5 68      1 -51.6
6 6 67      1 -51.6
Note connect data assigns structure energy level to all basepairs
  i  j length value
1 1 72      1 -31.5
2 2 71      1 -31.5
3 3 70      1 -31.5
4 4 69      1 -31.5
5 5 68      1 -31.5
6 6 67      1 -31.5
Note vienna data assigns structure energy level to all basepairs
   i   j length value
1  5 120      1    NA
2  6 119      1    NA
3  7 118      1    NA
4  8 117      1    NA
5  9 116      1    NA
6 10 115      1    NA
Note bpseq data has no value assigned to basepairs

R4RNA documentation built on Nov. 8, 2020, 5:15 p.m.