codonUsage-expressivity: Calculate CU expressivity measures.


Description

Calculate values of the CU expressivity measure for every sequence in the given codonTable object. The following methods are implemented:

MELP, CU expressivity measure based on the Measure Independent of Length and Composition (MILC), Supek and Vlahovicek (2005);

E, gene expression measure, Karlin and Mrazek (2000);

CAI, Codon Adaptation Index, Sharp and Li (1987);

Fop, frequency of optimal codons, Ikemura (1981);

GCB, gene codon bias, Merkl (2003).
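As a rough illustration of what one of these measures captures, the sketch below computes CAI for a toy gene in base R, following the Sharp and Li (1987) definition of CAI as the geometric mean of relative adaptiveness values. The reference codon counts, the `aa_of` mapping and the toy gene are made-up illustration data. This is not coRdon's implementation, which may differ in details such as the handling of stop codons and pseudocounts.

```r
# Hypothetical reference-set codon counts for two amino acids.
ref_counts <- c(AAA = 30, AAG = 10,   # lysine codons
                GAA = 5,  GAG = 20)   # glutamate codons

# Amino acid encoded by each codon (same order as ref_counts).
aa_of <- c(AAA = "K", AAG = "K", GAA = "E", GAG = "E")

# Relative adaptiveness: each codon's count divided by the largest
# count among the codons for the same amino acid.
w <- ref_counts / ave(ref_counts, aa_of, FUN = max)

# A toy gene, written as a vector of codons.
gene <- c("AAA", "AAG", "GAG", "GAA")

# CAI is the geometric mean of w over the codons of the gene.
cai <- exp(mean(log(w[gene])))
cai
```

A gene made only of the most frequent codon for each amino acid would score 1 in this sketch; rarer codons pull the value towards 0.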

Usage

MELP(cTobject, subsets = list(), ribosomal = FALSE,
  id_or_name2 = "1", alt.init = TRUE, stop.rm = FALSE,
  filtering = "none", len.threshold = 80)

## S4 method for signature 'codonTable'
MELP(cTobject, subsets = list(),
  ribosomal = FALSE, id_or_name2 = "1", alt.init = TRUE,
  stop.rm = FALSE, filtering = "none", len.threshold = 80)

E(cTobject, subsets = list(), ribosomal = FALSE, id_or_name2 = "1",
  alt.init = TRUE, stop.rm = FALSE, filtering = "none",
  len.threshold = 80)

## S4 method for signature 'codonTable'
E(cTobject, subsets = list(), ribosomal = FALSE,
  id_or_name2 = "1", alt.init = TRUE, stop.rm = FALSE,
  filtering = "none", len.threshold = 80)

CAI(cTobject, subsets = list(), ribosomal = FALSE, id_or_name2 = "1",
  alt.init = TRUE, stop.rm = FALSE, filtering = "none",
  len.threshold = 80)

## S4 method for signature 'codonTable'
CAI(cTobject, subsets = list(),
  ribosomal = FALSE, id_or_name2 = "1", alt.init = TRUE,
  stop.rm = FALSE, filtering = "none", len.threshold = 80)

Fop(cTobject, subsets = list(), ribosomal = FALSE, id_or_name2 = "1",
  alt.init = TRUE, stop.rm = FALSE, filtering = "none",
  len.threshold = 80)

## S4 method for signature 'codonTable'
Fop(cTobject, subsets = list(),
  ribosomal = FALSE, id_or_name2 = "1", alt.init = TRUE,
  stop.rm = FALSE, filtering = "none", len.threshold = 80)

GCB(cTobject, seed = logical(), ribosomal = FALSE, perc = 0.05,
  id_or_name2 = "1", alt.init = TRUE, stop.rm = FALSE,
  filtering = "none", len.threshold = 80)

## S4 method for signature 'codonTable'
GCB(cTobject, seed = logical(),
  ribosomal = FALSE, perc = 0.05, id_or_name2 = "1",
  alt.init = TRUE, stop.rm = FALSE, filtering = "none",
  len.threshold = 80)

Arguments

cTobject

A codonTable object.

subsets

A (named) list of logical vectors, the length of each equal to getlen(cTobject), i.e. the number of sequences in the set, or character vectors (of any length) containing KEGG/eggNOG annotations, or codonTable objects (of any length). Not used for ENC, SCUO and GCB calculations.
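A minimal sketch of how a named list of logical vectors could be put together for `subsets`, using base R only. The number of sequences `n` and the `is_membrane` annotation are hypothetical stand-ins; with a real codonTable, `n` would be the number of sequences in the object.

```r
# Hypothetical number of sequences in the set.
n <- 6

# Hypothetical per-sequence flag, used only to build a second subset.
is_membrane <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)

# Each element is a logical vector with one entry per sequence;
# the names would label the corresponding output columns.
subsets <- list(
  first_half = seq_len(n) <= n / 2,
  membrane   = is_membrane
)

# Sanity check: every logical vector covers all sequences.
stopifnot(all(lengths(subsets) == n))
```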

ribosomal

Logical, if TRUE, CU statistic is also calculated against the average CU of the ribosomal genes in the sequence set. Not used for ENC and SCUO calculations. For GCB calculations, if TRUE, ribosomal genes are used as a seed, and if FALSE (default), seed has to be specified.

id_or_name2

A single string that uniquely identifies the genetic code to extract. Should be one of the values in the id or name2 columns of GENETIC_CODE_TABLE.

alt.init

Logical, whether to use alternative initiation codons. Default is TRUE.

stop.rm

Logical, whether to remove stop codons. Default is FALSE.

filtering

Character vector, one of c("none", "soft", "hard"). Specifies whether sequences shorter than a threshold length (in codons), len.threshold, should be excluded from calculations. If "none" (default), the length of sequences is not checked; if "soft", a warning is printed if there are shorter sequences; and if "hard", these sequences are excluded from the calculation.
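The three filtering modes can be pictured with the base R sketch below. The helper `keep_mask()` and the example lengths are hypothetical and only mimic the behaviour described above; they are not part of coRdon. How a sequence of exactly `len.threshold` codons is treated is an assumption here (it is kept).

```r
# Hypothetical sequence lengths, in codons.
lens <- c(120, 45, 300, 60)

# Returns a logical mask of the sequences kept for calculation.
# Assumption: sequences of exactly len.threshold codons are kept.
keep_mask <- function(lens, filtering = c("none", "soft", "hard"),
                      len.threshold = 80) {
  filtering <- match.arg(filtering)
  short <- lens < len.threshold
  if (filtering == "soft" && any(short)) {
    # "soft" only warns; nothing is dropped.
    warning(sum(short), " sequence(s) shorter than ", len.threshold, " codons")
  }
  # Only "hard" actually excludes the short sequences.
  if (filtering == "hard") !short else rep(TRUE, length(lens))
}

kept_none <- keep_mask(lens, "none")
kept_soft <- suppressWarnings(keep_mask(lens, "soft"))
kept_hard <- keep_mask(lens, "hard")
```

In this sketch "none" and "soft" keep every sequence, and only "hard" changes which sequences contribute to the result.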

len.threshold

Optional numeric, specifying sequence length, in codons, used for filtering.

seed

A logical vector of length equal to getlen(cTobject), or a character vector (of any length) containing KEGG/eggNOG annotations, or a codonTable object (of any length). Used only in GCB calculation. Indicates a set of genes, or their CU, to be used as a target in the first iteration of the algorithm.

perc

Percent of top-ranking genes to be used as a target set for the next iteration of the algorithm that calculates GCB. Default is 0.05.
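The selection step that `perc` controls can be sketched in base R as follows. This assumes that `perc` is interpreted as a fraction of the gene set (so 0.05 would mean the top 5%) and that the count is rounded to a whole number of genes; both are assumptions made for illustration, and the scores are made-up data rather than real GCB values.

```r
# Hypothetical per-gene scores from one iteration (illustration data only).
scores <- sin(seq_len(40))
perc <- 0.05

# Number of genes in the next target set; at least one gene is kept.
# Assumption: perc is a fraction and the count is rounded.
n_top <- max(1L, as.integer(round(perc * length(scores))))

# Indices of the top-ranking genes, highest score first.
next_target <- order(scores, decreasing = TRUE)[seq_len(n_top)]
next_target
```

With 40 genes and `perc = 0.05`, this sketch selects the two highest-scoring genes as the target set for the following iteration.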

Value

A matrix (for GCB a numeric vector) with CU expressivity values for every specified subset (subsets, self, ribosomal) in columns.

Examples

# load example DNA sequences
exampledir <- system.file("extdata", package = "coRdon")
cT <- codonTable(readSet(exampledir))

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
# In the examples below, MELP values are calculated for all sequences; 
# any other CU expressivity measure can be calculated in the same way,
# the only exception being GCB, which takes the `seed` parameter instead
# of `subsets`. (The examples for GCB calculation are further below.)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

# calculate MELP with respect to the CU
# of ribosomal genes among the example DNA sequences
melp <- MELP(cT, ribosomal = TRUE)
head(melp)

# calculate MELP with respect to the average CU
# of the first 20 example DNA sequences
# (i.e. the first half of the example DNA set)
melp <- MELP(cT, subsets = list(half = c(rep(TRUE, 20), rep(FALSE, 20))))

# alternatively, you can specify codonTable as a subset
halfcT <- codonTable(codonCounts(cT)[1:20,])
melp2 <- MELP(cT, subsets = list(half = halfcT))
all.equal(melp, melp2) # TRUE

# filtering
MELP(cT, ribosomal = TRUE,
     filtering = "hard", len.threshold = 80) # MELP for 9 sequences
                                             # (note that, coincidentally,
                                             # all are ribosomal)
sum(getlen(cT) > 80) # 9 sequences are longer than 80 codons
melp1 <- MELP(cT, ribosomal = TRUE, filtering = "none") # no filtering
melp2 <- MELP(cT, ribosomal = TRUE, filtering = "soft") # warning
all.equal(melp1, melp2) # TRUE

# options for genetic code
melp <- MELP(cT, ribosomal = TRUE,
             stop.rm = TRUE) # don't use stop codons in calculation
melp <- MELP(cT, ribosomal = TRUE,
             alt.init = FALSE) # don't use alternative start codons
melp <- MELP(cT, ribosomal = TRUE,
             id_or_name2 = "2") # use different genetic code, for help
                                # see `?Biostrings::GENETIC_CODE`
                                
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
# GCB calculation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

# calculate GCB with CU of ribosomal genes among the example DNA sequences
# used as a target (seed) in the first iteration of the algorithm
gcb <- GCB(cT, ribosomal = TRUE)
head(gcb)

# calculate GCB with the first 20 example DNA sequences
# (i.e. the first half of the example DNA set) as a seed
gcb <- GCB(cT, seed = c(rep(TRUE, 20), rep(FALSE, 20)))

# alternatively, you can specify codonTable as a seed
halfcT <- codonTable(codonCounts(cT)[1:20,])
gcb2 <- GCB(cT, seed = halfcT)
all.equal(gcb, gcb2) # TRUE

# options for genetic code
gcb <- GCB(cT, ribosomal = TRUE,
           stop.rm = TRUE) # don't use stop codons in calculation
gcb <- GCB(cT, ribosomal = TRUE,
           alt.init = FALSE) # don't use alternative start codons
gcb <- GCB(cT, ribosomal = TRUE,
           id_or_name2 = "2") # use different genetic code, for help
                              # see `?Biostrings::GENETIC_CODE`
                              

coRdon documentation built on Nov. 8, 2020, 5:28 p.m.