# alignmentStatistics: Compute statistics for a multiple sequence alignment in R4RNA: An R package for RNA visualization and analysis

## Description

Functions to compute covariation, conservation (percent identity), and the percentage of canonical basepairs for a multiple sequence alignment, optionally together with a secondary structure. Statistics can be computed for a single base, a basepair, a helix, or the entire alignment.

## Usage

```r
baseConservation(msa, pos)
basepairConservation(msa, pos.5p, pos.3p)
basepairCovariation(msa, pos.5p, pos.3p)
basepairCanonical(msa, pos.5p, pos.3p)
helixConservation(helix, msa)
helixCovariation(helix, msa)
helixCanonical(helix, msa)
alignmentConservation(msa)
alignmentCovariation(msa, helix)
alignmentCanonical(msa, helix)
alignmentPercentGaps(msa)
```

## Arguments

- `helix`: A helix data.frame.
- `msa`: A multiple sequence alignment, given either as a `Biostrings` `XStringSet` object or as a named character vector of aligned sequences, such as one obtained by converting an `XStringSet` with `as.character`.
- `pos, pos.5p, pos.3p`: Positions of the bases (`pos`) or of the 5' and 3' sides of the basepairs (`pos.5p`, `pos.3p`) for which the statistics are calculated.

## Details

Conservation values have a range of [0, 1], where 0 is the absence of primary sequence conservation (all bases different), and 1 is full primary sequence conservation (all bases identical).
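To make the [0, 1] range concrete, the sketch below scores one alignment column as the fraction of sequence pairs that share the same base, which gives exactly 0 when all bases differ and 1 when all are identical. This is an illustrative definition chosen to match the stated range, not necessarily the formula R4RNA uses; `columnConservation` and the toy alignment `aln` are hypothetical.

```r
# Hypothetical helper, not part of R4RNA: conservation of a single column,
# defined here as the fraction of sequence pairs that share the same base.
# All bases identical -> 1; all bases different -> 0.
columnConservation <- function(seqs, pos) {
  bases <- substr(seqs, pos, pos)    # one character per sequence
  pairs <- combn(length(bases), 2)   # every unordered pair of sequences
  mean(bases[pairs[1, ]] == bases[pairs[2, ]])
}

# Toy alignment as a named character vector (illustrative data only)
aln <- c(s1 = "GGACUUCC", s2 = "GGAUUUCC", s3 = "GCACUUGC")
columnConservation(aln, 1)  # G, G, G -> 1
columnConservation(aln, 4)  # C, U, C -> 1/3
```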

Canonical values have a range of [0, 1], where 0 is a complete lack of basepair potential, and 1 indicates that all basepairs are valid.
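A minimal sketch of the canonical score for one basepair: the fraction of sequences whose bases at the two positions can pair. It assumes uppercase RNA letters and counts Watson-Crick pairs plus G-U wobble as canonical, with gaps never pairing; R4RNA's exact set of valid pairs may differ. `pairCanonical` and `aln` are hypothetical.

```r
# Hypothetical helper, not part of R4RNA. Assumes uppercase RNA letters and
# treats Watson-Crick pairs plus G-U wobble as canonical; gaps never pair.
canonicalPairs <- c("AU", "UA", "GC", "CG", "GU", "UG")

pairCanonical <- function(seqs, pos.5p, pos.3p) {
  pairs <- paste0(substr(seqs, pos.5p, pos.5p), substr(seqs, pos.3p, pos.3p))
  mean(pairs %in% canonicalPairs)  # fraction of sequences that can pair
}

aln <- c(s1 = "GGACUUCC", s2 = "GGAUUUCC", s3 = "GCACUUGC")
pairCanonical(aln, 2, 7)  # GC, GC, CG -> 1
pairCanonical(aln, 4, 5)  # CU, UU, CU -> 0
```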

Covariation values have a range of [-2, 2], where -2 is a complete lack of basepair potential and sequence conservation, 0 is complete sequence conservation regardless of basepairing potential, and 2 is a complete lack of sequence conservation while maintaining full basepair potential.
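One simple pairwise scheme reproduces this [-2, 2] range. In the sketch below, which is an assumption and not R4RNA's documented formula, every pair of sequences contributes the number of basepair positions that differ between them (0, 1 or 2), counted positively when both sequences can form a canonical pair and negatively otherwise; the contributions are then averaged. A fully conserved basepair scores 0, compensatory changes that keep pairing push the score toward 2, and changes that break pairing push it toward -2. `pairCovariation` and `aln` are hypothetical.

```r
# Hypothetical sketch, not R4RNA's implementation. Assumes uppercase RNA and
# counts Watson-Crick pairs plus G-U wobble as valid basepairs.
pairCovariation <- function(seqs, pos.5p, pos.3p) {
  b5 <- substr(seqs, pos.5p, pos.5p)
  b3 <- substr(seqs, pos.3p, pos.3p)
  valid <- paste0(b5, b3) %in% c("AU", "UA", "GC", "CG", "GU", "UG")
  pairs <- combn(length(seqs), 2)  # all unordered sequence pairs
  i <- pairs[1, ]
  j <- pairs[2, ]
  changes <- (b5[i] != b5[j]) + (b3[i] != b3[j])  # 0, 1 or 2 differing positions
  # Reward changes that keep pairing potential, penalize changes that do not
  mean(ifelse(valid[i] & valid[j], changes, -changes))
}

aln <- c(s1 = "GGACUUCC", s2 = "GGAUUUCC", s3 = "GCACUUGC")
pairCovariation(aln, 1, 8)  # GC in every sequence, no changes -> 0
pairCovariation(aln, 2, 7)  # GC, GC, CG, compensatory change -> 4/3
pairCovariation(aln, 4, 5)  # CU, UU, CU, changes without pairing -> -2/3
```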

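The `helix*` functions return the average of the base or basepair values over each helix. The `alignment*` functions average over all helices when they take a `helix` argument (`alignmentCovariation`, `alignmentCanonical`) and over all alignment columns otherwise (`alignmentConservation`).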
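The helix-level averaging can be sketched as follows, assuming the helix convention that a row with columns `i`, `j` and `length` stands for the basepairs (i, j), (i + 1, j - 1), ..., (i + length - 1, j - length + 1). The per-basepair statistic here is a toy canonical score (Watson-Crick plus G-U wobble); `helixAverage`, `pairCanonical` and the example data are hypothetical, and the package may weight or filter basepairs differently.

```r
# Hypothetical sketch, not R4RNA's code. Toy per-basepair statistic:
# fraction of sequences whose two bases form a canonical pair.
pairCanonical <- function(seqs, pos.5p, pos.3p) {
  pairs <- paste0(substr(seqs, pos.5p, pos.5p), substr(seqs, pos.3p, pos.3p))
  mean(pairs %in% c("AU", "UA", "GC", "CG", "GU", "UG"))
}

# Average a per-basepair statistic over every basepair of each helix row,
# returning one value per row of the helix data frame
helixAverage <- function(seqs, helix, stat) {
  vapply(seq_len(nrow(helix)), function(r) {
    offset <- seq_len(helix$length[r]) - 1
    values <- mapply(function(p5, p3) stat(seqs, p5, p3),
                     helix$i[r] + offset, helix$j[r] - offset)
    mean(values)
  }, numeric(1))
}

aln <- c(s1 = "GGACUUCC", s2 = "GGAUUUCC", s3 = "GCACUUGC")
helix <- data.frame(i = c(1, 4), j = c(8, 5), length = c(3, 1), value = c(1, 1))
helixAverage(aln, helix, pairCanonical)  # one value per helix row
```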

`alignmentPercentGaps` returns, for each sequence in the alignment, the fraction of its positions that are gaps (a proportion between 0 and 1, as the example output shows).
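The same per-sequence gap fraction can be sketched in base R. `gapFraction` is hypothetical, and treating `-` and `.` as the gap characters is an assumption; the result is a numeric vector named by sequence, mirroring the shape of the example output.

```r
# Hypothetical sketch, not R4RNA's code: fraction of gap characters in each
# aligned sequence. Treating "-" and "." as gaps is an assumption.
gapFraction <- function(seqs, gapChars = c("-", ".")) {
  # strsplit keeps the sequence names, so vapply returns a named vector
  vapply(strsplit(seqs, ""), function(chars) mean(chars %in% gapChars),
         numeric(1))
}

aln <- c(s1 = "GG-CUU.C", s2 = "GGAUUUCC")
gapFraction(aln)  # s1: 2 of 8 positions are gaps; s2: none
```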

## Value

`baseConservation`, `basepairConservation`, `basepairCovariation`, `basepairCanonical`, `alignmentConservation`, `alignmentCovariation`, and `alignmentCanonical` return a single decimal value.

`helixConservation`, `helixCovariation`, and `helixCanonical` return a numeric vector with one value per row of `helix`.

`alignmentPercentGaps` returns a numeric vector with one value per sequence in the multiple sequence alignment, named by sequence.

## Author(s)

Jeff Proctor, Daniel Lai

## Examples

```r
library(R4RNA)
data(helix)  # example data used below (`helix`, `fasta`)

# Single base and single basepair statistics
baseConservation(fasta, 9)
basepairConservation(fasta, 9, 18)
basepairCovariation(fasta, 9, 18)
basepairCanonical(fasta, 9, 18)

# Per-helix statistics
helixConservation(helix, fasta)
helixCovariation(helix, fasta)
helixCanonical(helix, fasta)

# Whole-alignment statistics
alignmentConservation(fasta)
alignmentCovariation(fasta, helix)
alignmentCanonical(fasta, helix)
alignmentPercentGaps(fasta)
```

### Example output

```
G
1
G
0.7619048
GU
0.4761905
[1] 1
[1] 0.2644841 0.2857143 0.3666667 0.4285714 0.6619048 0.4583333 0.2678571
[8] 0.2644841 0.3809524 0.7857143 0.4761905 0.5238095 0.2857143 0.3452381
[15] 0.5918367 0.5416667 0.4970238 0.4095238 0.6507937 0.5158730 0.6269841
[22] 0.7380952 0.6944444 0.6598639 0.2952381 0.7321429 0.8690476 0.3690476
[29] 0.5238095 0.5238095 0.4764286 0.4080357 0.2869048 0.5095238 0.8928571
[36] 0.5190476 0.6250000 0.5476190 0.3125000 0.4482684 0.5555556 0.7500000
[43] 0.9642857 0.8666667 0.4136364 0.8857143 0.9047619 0.8392857 0.8650794
[50] 0.8285714 0.4190476 0.8333333 0.8630952 0.8928571 0.8452381 0.5306122
[1]  1.03174603  1.42857143  1.05714286  1.14285714  0.67619048  0.86904762
[7]  0.94047619  1.03174603 -1.23809524  0.42857143 -1.04761905  0.95238095
[13]  1.42857143 -0.83333333  0.81632653  0.75000000  0.93452381  0.24761905
[19]  0.60317460  0.79894180  0.63492063 -0.04761905  0.51587302  0.50340136
[25]  0.51428571  0.53571429  0.02380952  0.67460317 -0.95238095  0.95238095
[31]  0.67142857  0.96428571 -0.69047619  0.35238095  0.07142857  0.44761905
[37]  0.34523810  0.40952381  0.30357143  0.51948052  0.36507937  0.19047619
[43] -0.07142857 -0.03809524  0.51515152  0.00000000  0.04761905 -0.17857143
[49] -0.07936508 -0.11428571  0.40000000  0.00000000  0.13095238  0.07142857
[55]  0.16666667  0.13605442
[1] 0.9285714 1.0000000 0.9714286 1.0000000 1.0000000 0.9642857 0.9285714
[8] 0.9285714 0.1428571 1.0000000 0.7142857 1.0000000 1.0000000 0.4285714
[15] 1.0000000 0.9642857 0.9821429 0.7714286 0.9761905 0.9682540 0.9761905
[22] 0.8571429 0.8571429 0.9591837 0.8285714 1.0000000 0.9285714 0.8809524
[29] 0.5000000 1.0000000 0.9000000 0.9642857 0.5000000 0.8285714 0.9642857
[36] 0.8857143 0.8928571 0.8857143 0.7678571 0.8441558 0.8809524 0.9142857
[43] 0.9642857 0.9142857 0.8181818 0.9428571 0.9642857 0.8571429 0.9047619
[50] 0.8571429 0.8000000 0.8928571 0.9642857 0.9642857 0.9642857 0.7959184
[1] 0.523439
[1] 0.4796748
[1] 0.902439
AF183905.1/5647-5848 AF218039.1/6028-6228 AB017037.1/6286-6484
0.03809524           0.04285714           0.05238095
AB006531.1/6003-6204 AF014388.1/6078-6278 AF022937.1/6935-7121
0.03809524           0.04285714           0.10952381
AF178440.1/5925-6123
0.05238095
```

R4RNA documentation built on Nov. 8, 2020, 5:15 p.m.