LDA: Discriminant analysis of codon usage by taxon

LDA R Documentation

Description

Perform linear discriminant analysis (LDA) on codon usage in a reference database and use the fitted model to classify sequences of unknown taxonomic affinity. The data can optionally be transformed with the Box-Cox power transformation.
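
As an illustration of the Box-Cox power transformation mentioned above, here is a minimal base R sketch. The helper `box_cox()` and the fixed `lambda` values are assumptions made for this example only; this page does not say how the package chooses lambda or which implementation it uses.

```r
# Hypothetical helper, not part of codondiffR: Box-Cox power transform
# for strictly positive values and a fixed, user-supplied lambda.
box_cox <- function(x, lambda) {
    stopifnot(all(x > 0))
    if (abs(lambda) < 1e-8) {
        log(x)                      # limiting case as lambda approaches 0
    } else {
        (x^lambda - 1) / lambda
    }
}

freqs <- c(0.010, 0.025, 0.040, 0.005)  # made-up codon frequencies
box_cox(freqs, lambda = 0.5)
box_cox(freqs, lambda = 0)              # same as log(freqs)
```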

Perform random sampling using different subsets of the reference database to assess the impact on model accuracy.
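
The repeated train/test procedure described above can be sketched as follows. This is not the package's implementation: it assumes the underlying classifier is `MASS::lda()` (suggested by the `lda` class documented under Value), uses the built-in `iris` data in place of the codon reference database, and the name `n_rep` is chosen for this example.

```r
library(MASS)  # provides lda()

set.seed(1)
propTrain <- 0.8   # proportion of rows used for training
n_rep     <- 20    # number of random train/test splits

accuracy <- vapply(seq_len(n_rep), function(i) {
    # Draw a random training subset; the remaining rows form the test set.
    idx  <- sample(nrow(iris), size = floor(propTrain * nrow(iris)))
    fit  <- lda(Species ~ ., data = iris[idx, ])
    pred <- predict(fit, newdata = iris[-idx, ])$class
    # Proportion of test rows assigned to the correct group.
    mean(pred == iris$Species[-idx])
}, numeric(1))

mean(accuracy)
```

Averaging over many random splits gives a more stable estimate of classification accuracy than a single split.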

Usage

LDA(
  exclude = character(length = 0),
  minlen = 600,
  trans = FALSE,
  propTrain = 1,
  rank = "Phylum",
  corCut = 0.9
)

bootstrap_LDA(
  rep = 100,
  trans = FALSE,
  propTrain = 0.8,
  rank = "Phylum",
  exclude = character(length = 0),
  minlen = 600,
  corCut = 0.9,
  norm = FALSE
)

predict_LDA(
  cFobj,
  ldaObj,
  rank = "Phylum",
  minlen = 600,
  fname = NA_character_,
  units = "in",
  width = 10,
  height = 7,
  dpi = 600,
  norm = FALSE,
  plot = FALSE,
  identifier = NA_character_
)

## S4 method for signature 'ANY'
LDA(
  exclude = character(length = 0),
  minlen = 600,
  trans = FALSE,
  propTrain = 1,
  rank = "Phylum",
  corCut = 0.9
)

## S4 method for signature 'numeric'
bootstrap_LDA(
  rep = 100,
  trans = FALSE,
  propTrain = 0.8,
  rank = "Phylum",
  exclude = character(length = 0),
  minlen = 600,
  corCut = 0.9,
  norm = FALSE
)

Arguments

exclude

A character vector of codons to be excluded from comparisons.

minlen

Numeric, the minimum length of sequence (in codons) to be included in the analysis. Default = 600.

trans

Logical; if TRUE, a Box-Cox transformation will be applied to the data. Default = FALSE.

propTrain

Numeric, proportion of the reference database to use for the LDA training set (must be in the range [0,1]). Default = 1 for LDA() and 0.8 for bootstrap_LDA().

rank

Character, taxonomic rank to be used for categorisation. Options are "Domain", "Kingdom", and "Phylum". Default = "Phylum".

corCut

Numeric, correlation cutoff used for dropping codons that exhibit collinearity. Default = 0.9.

rep

Numeric, number of bootstrap replicates to perform. Default = 100.

norm

Logical, should the codon abundances be normalised? If TRUE, codon abundances will be converted to codon bias scores, such that the scores for the codons of each amino acid sum to 1. Default = FALSE.
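
To make the normalisation described for `norm` concrete, here is a small base R sketch. The counts and the codon-to-amino-acid lookup `aa` are made up for illustration, and the package's own code may differ.

```r
# Made-up codon counts for one sequence.
counts <- c(GCT = 12, GCC = 30, GCA = 8, GCG = 10, AAA = 15, AAG = 45)
# Lookup from codon to the amino acid it encodes.
aa <- c(GCT = "Ala", GCC = "Ala", GCA = "Ala", GCG = "Ala",
        AAA = "Lys", AAG = "Lys")

# Divide each count by the total count for its amino acid.
groups <- unname(aa[names(counts)])
bias   <- counts / ave(counts, groups, FUN = sum)

bias
tapply(bias, groups, sum)   # per-amino-acid totals
```

After this step each score reflects relative preference among synonymous codons rather than how often the amino acid itself occurs.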

Value

An object of class lda, with the following components:

"prior": the prior probabilities used (determined from the training set).

"means": the group means.

"scaling": a matrix which transforms observations to discriminant functions, normalised so that the within-group covariance matrix is spherical.

"svd": the singular values, which give the ratio of the between- and within-group standard deviations on the linear discriminant variables. Their squares are the canonical F-statistics.

"N": the number of observations used.

"call": the (matched) function call to lda().

A named list with the following components:

"codons": the codons used for model construction, after filtering criteria have been applied.

"taxa": the taxa used for model construction and testing.

"accuracy": the proportion of accurate classifications using the test subset of the data.

Examples

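    # Fit a model on the full reference database (propTrain = 1),
    # excluding two codons from the comparison.
    exclCod <- c("ATT", "TGT")
    LDA_tmp <- LDA(
        exclude = exclCod, rank = "Phylum", trans = FALSE,
        propTrain = 1, corCut = 0.95, minlen = 600
    )
    names(LDA_tmp)

    # Run 100 replicates, each trained on a random 80% of the database.
    exclCod <- c("ATT", "TGT")
    boot <- bootstrap_LDA(
        rep = 100, propTrain = 0.8, trans = FALSE, rank = "Phylum",
        exclude = exclCod, minlen = 600, corCut = 0.95
    )
    names(boot)
    # Mean classification accuracy across replicates
    mean(boot$accuracy)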
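
Both calls above pass `corCut`, the correlation cutoff for dropping collinear codons. The base R sketch below shows one simple way such a filter could work, on made-up data. It is an assumption for illustration; the package may select which codons to drop differently.

```r
set.seed(42)
# Made-up data: 50 sequences by 4 codons; AAG is nearly a copy of AAA.
x <- matrix(rnorm(200), ncol = 4,
            dimnames = list(NULL, c("AAA", "AAC", "AAG", "AAT")))
x[, "AAG"] <- x[, "AAA"] + rnorm(50, sd = 0.05)

corCut <- 0.9
cm <- abs(cor(x))
cm[lower.tri(cm, diag = TRUE)] <- 0   # consider each pair only once

# Drop the later column of every pair whose |r| exceeds the cutoff.
drop <- colnames(cm)[apply(cm, 2, function(col) any(col > corCut))]
keep <- setdiff(colnames(x), drop)
keep
```

Removing near-duplicate predictors helps keep the within-group covariance matrix well conditioned, which LDA relies on.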


adamd3/codondiffR documentation built on Sept. 3, 2022, 2:26 a.m.