codonUsage: Calculate CU measures.

Description

Calculate values of the codon usage (CU) measure for every sequence in the given codonTable object. The following methods are implemented:

MILC, Measure Independent of Length and Composition, Supek & Vlahovicek (2005);
B, codon usage bias, Karlin et al. (2001);
ENC, effective number of codons, Wright (1990);
ENCprime, effective number of codons prime (ENC'), Novembre (2002);
MCB, maximum-likelihood codon bias, Urrutia and Hurst (2001);
SCUO, synonymous codon usage orderliness, Wan et al. (2004).
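As an illustration of what one of these measures captures, the sketch below computes the codon homozygosity F that underlies ENC in Wright (1990), for a single amino acid. For the standard genetic code, ENC then combines the average F of each degeneracy class as 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The helper name `codon_homozygosity` and the counts are hypothetical and written in base R only; this is not coRdon's implementation, which may differ in details such as the handling of rare amino acids.

```r
# Codon homozygosity F for one amino acid, following Wright (1990):
# F = (n * sum(p^2) - 1) / (n - 1), where n is the total codon count
# for that amino acid and p are the observed codon frequencies.
# `codon_homozygosity` is a hypothetical helper, not a coRdon function.
codon_homozygosity <- function(counts) {
  n <- sum(counts)
  stopifnot(n > 1)
  p <- counts / n
  (n * sum(p^2) - 1) / (n - 1)
}

# Made-up counts for a four-fold degenerate amino acid
uniform_use <- c(10, 10, 10, 10)  # all four codons used equally
biased_use  <- c(40, 0, 0, 0)     # a single codon used exclusively

f_uniform <- codon_homozygosity(uniform_use)  # 9/39, near 1/4
f_biased  <- codon_homozygosity(biased_use)   # 1
```

Equal usage drives F toward 1/k for k synonymous codons, while exclusive use of one codon gives F = 1, which is why a lower ENC indicates stronger codon bias.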

Usage

MILC(cTobject, subsets = list(), self = TRUE, ribosomal = FALSE,
  id_or_name2 = "1", alt.init = TRUE, stop.rm = FALSE,
  filtering = "none", len.threshold = 80)

## S4 method for signature 'codonTable'
MILC(cTobject, subsets = list(), self = TRUE,
  ribosomal = FALSE, id_or_name2 = "1", alt.init = TRUE,
  stop.rm = FALSE, filtering = "none", len.threshold = 80)

B(cTobject, subsets = list(), self = TRUE, ribosomal = FALSE,
  id_or_name2 = "1", alt.init = TRUE, stop.rm = FALSE,
  filtering = "none", len.threshold = 80)

## S4 method for signature 'codonTable'
B(cTobject, subsets = list(), self = TRUE,
  ribosomal = FALSE, id_or_name2 = "1", alt.init = TRUE,
  stop.rm = FALSE, filtering = "none", len.threshold = 80)

MCB(cTobject, subsets = list(), self = TRUE, ribosomal = FALSE,
  id_or_name2 = "1", alt.init = TRUE, stop.rm = FALSE,
  filtering = "none", len.threshold = 80)

## S4 method for signature 'codonTable'
MCB(cTobject, subsets = list(), self = TRUE,
  ribosomal = FALSE, id_or_name2 = "1", alt.init = TRUE,
  stop.rm = FALSE, filtering = "none", len.threshold = 80)

ENCprime(cTobject, subsets = list(), self = TRUE, ribosomal = FALSE,
  id_or_name2 = "1", alt.init = TRUE, stop.rm = TRUE,
  filtering = "none", len.threshold = 80)

## S4 method for signature 'codonTable'
ENCprime(cTobject, subsets = list(),
  self = TRUE, ribosomal = FALSE, id_or_name2 = "1",
  alt.init = TRUE, stop.rm = TRUE, filtering = "none",
  len.threshold = 80)

ENC(cTobject, id_or_name2 = "1", alt.init = TRUE, stop.rm = TRUE,
  filtering = "none", len.threshold = 80)

## S4 method for signature 'codonTable'
ENC(cTobject, id_or_name2 = "1",
  alt.init = TRUE, stop.rm = TRUE, filtering = "none",
  len.threshold = 80)

SCUO(cTobject, id_or_name2 = "1", alt.init = TRUE, stop.rm = FALSE,
  filtering = "none", len.threshold = 80)

## S4 method for signature 'codonTable'
SCUO(cTobject, id_or_name2 = "1",
  alt.init = TRUE, stop.rm = FALSE, filtering = "none",
  len.threshold = 80)

Arguments

cTobject

A codonTable object.

subsets

A (named) list of logical vectors, the length of each equal to getlen(cTobject), i.e. the number of sequences in the set, or character vectors (of any length) containing KEGG/eggNOG annotations, or codonTable objects (of any length). Not used for ENC, SCUO and GCB calculations.
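A minimal sketch of building the logical-vector form of subsets in base R. The number of sequences (40) is an assumption taken from the Examples section, and the subset names `half` and `odd` are arbitrary; with a real codonTable the length would come from length(getlen(cTobject)).

```r
# Assumed number of sequences in the set (40, as in the Examples section)
n_seq <- 40

# One logical value per sequence: TRUE marks membership in the subset
first_half  <- seq_len(n_seq) <= 20      # sequences 1..20
every_other <- seq_len(n_seq) %% 2 == 1  # sequences 1, 3, 5, ...

# Named list; the names are arbitrary labels chosen for this sketch
subsets <- list(half = first_half, odd = every_other)

# Each logical vector must have exactly one entry per sequence
stopifnot(all(lengths(subsets) == n_seq))
```

A list built this way could then be passed as the subsets argument, for example MILC(cT, self = FALSE, subsets = subsets).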

self

Logical, if TRUE (default), CU statistic is also calculated against the average CU of the entire set of sequences. Not used for ENC, SCUO and GCB calculations.

ribosomal

Logical, if TRUE, CU statistic is also calculated against the average CU of the ribosomal genes in the sequence set. Not used for ENC and SCUO calculations. For GCB calculations, if TRUE, ribosomal genes are used as a seed, and if FALSE (default), seed has to be specified.

id_or_name2

A single string that uniquely identifies the genetic code to extract. Should be one of the values in the id or name2 columns of GENETIC_CODE_TABLE.

alt.init

Logical, whether to use alternative initiation codons. Default is TRUE.

stop.rm

Logical, whether to remove stop codons. Default is FALSE for MILC, B, MCB and SCUO, and TRUE for ENCprime and ENC.

filtering

Character vector, one of c("none", "soft", "hard"). Specifies whether sequences shorter than a threshold length (in codons), len.threshold, should be excluded from calculations. If "none" (default), sequence length is not checked; if "soft", a warning is printed if there are shorter sequences; and if "hard", these sequences are excluded from the calculation.

len.threshold

Optional numeric, specifying sequence length, in codons, used for filtering. Default is 80.
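The interaction of filtering and len.threshold can be sketched in base R on a plain vector of sequence lengths. The helper `filter_by_length` is hypothetical and not part of coRdon; it only mimics the behaviour described above. Treating a sequence of exactly len.threshold codons as long enough is an assumption based on the wording "shorter than".

```r
# Hypothetical helper, not a coRdon function: returns a logical vector
# marking which sequences would be kept under each filtering mode.
filter_by_length <- function(lens,
                             filtering = c("none", "soft", "hard"),
                             len.threshold = 80) {
  filtering <- match.arg(filtering)
  # Assumption: "shorter than" means strictly below the threshold
  short <- lens < len.threshold
  if (filtering == "soft" && any(short)) {
    warning(sum(short), " sequence(s) shorter than ",
            len.threshold, " codons")
  }
  # Only "hard" drops short sequences; "none" and "soft" keep everything
  if (filtering == "hard") !short else rep(TRUE, length(lens))
}

lens <- c(45, 80, 120, 79, 300)  # made-up sequence lengths, in codons

keep_none <- filter_by_length(lens, "none")
keep_soft <- suppressWarnings(filter_by_length(lens, "soft"))
keep_hard <- filter_by_length(lens, "hard")
```

Under these assumptions "none" and "soft" keep all five sequences, and "hard" keeps only those with at least 80 codons.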

Value

A matrix or a numeric vector with CU measure values. For MILC, B, MCB and ENCprime, the matrix has one column of values for every specified subset (subsets, self, ribosomal). For ENC and SCUO, a numeric vector is returned.

Examples

# load example DNA sequences
exampledir <- system.file("extdata", package = "coRdon")
cT <- codonTable(readSet(exampledir))

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
# In the examples below, MILC values are calculated for all sequences; 
# B and ENCprime can be calculated in the same way.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

# calculate MILC distance to the average CU of the example DNA sequences
milc <- MILC(cT)
head(milc)

# also calculate MILC distance to the average CU
# of ribosomal genes among the example DNA sequences
milc <- MILC(cT, ribosomal = TRUE)
head(milc)

# calculate MILC distance to the average CU
# of the first 20 example DNA sequences
# (i.e. the first half of the example DNA set)
milc <- MILC(cT, self = FALSE,
             subsets = list(half = c(rep(TRUE, 20), rep(FALSE, 20))))

# alternatively, you can specify codonTable as a subset
halfcT <- codonTable(codonCounts(cT)[1:20,])
milc2 <- MILC(cT, self = FALSE, subsets = list(half = halfcT))
all.equal(milc, milc2) # TRUE

# filtering
MILC(cT, filtering = "hard", len.threshold = 80) # MILC for 9 sequences
sum(getlen(cT) > 80) # 9 sequences are longer than 80 codons
milc1 <- MILC(cT, filtering = "none") # no filtering
milc2 <- MILC(cT, filtering = "soft") # warning
all.equal(milc1, milc2) # TRUE

# options for genetic code
milc <- MILC(cT, stop.rm = TRUE) # don't use stop codons in calculation
milc <- MILC(cT, alt.init = FALSE) # don't use alternative start codons
milc <- MILC(cT, id_or_name2 = "2") # use different genetic code, for help
                                    # see `?Biostrings::GENETIC_CODE`

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
# In the examples below, ENC values are calculated for all sequences; 
# SCUO values can be calculated in the same way.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

# calculate ENC
enc <- ENC(cT)
head(enc)

# filtering
ENC(cT, filtering = "hard", len.threshold = 80) # ENC for 9 sequences
sum(getlen(cT) > 80) # 9 sequences are longer than 80 codons
enc1 <- ENC(cT, filtering = "none") # no filtering
enc2 <- ENC(cT, filtering = "soft") # warning
all.equal(enc1, enc2) # TRUE

# options for genetic code
enc <- ENC(cT, stop.rm = TRUE) # don't use stop codons in calculation
enc <- ENC(cT, alt.init = FALSE) # don't use alternative start codons
enc <- ENC(cT, id_or_name2 = "2") # use different genetic code, for help
                                  # see `?Biostrings::GENETIC_CODE`

BioinfoHR/coRdon documentation built on May 6, 2019, 8:35 p.m.