LDA: Discriminant analysis of codon usage by taxon
In adamd3/codondiffR: Codon usage comparisons across taxa

LDA	R Documentation

Discriminant analysis of codon usage by taxon

Description

Perform linear discriminant analysis (LDA) on codon usage in a reference database and use it to classify sequences of unknown taxonomic affinity. The data can optionally be transformed using the Box-Cox power transformation.
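
As a rough illustration of what a Box-Cox power transformation does, the following base R sketch applies the standard formula to a vector of positive values. The helper `box_cox` and the example counts are hypothetical; this page does not state how the package chooses lambda, so it is supplied by hand here.

```r
# Box-Cox power transformation for strictly positive values.
# `lambda` is supplied by hand; how codondiffR selects it is not stated.
box_cox <- function(x, lambda) {
    stopifnot(is.numeric(x), all(x > 0))
    if (abs(lambda) < 1e-8) {
        log(x)                      # limiting case as lambda approaches 0
    } else {
        (x^lambda - 1) / lambda
    }
}

# Hypothetical codon counts for one codon across five sequences.
counts <- c(12, 30, 7, 55, 21)
box_cox(counts, lambda = 0.5)
box_cox(counts, lambda = 0)         # identical to log(counts)
```

The transformation is monotonic, so it changes the shape of the distribution without reordering the observations.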

bootstrap_LDA repeats the analysis on random subsets of the reference database to assess how the choice of training data affects model accuracy.
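
The general idea of repeated random sub-sampling can be sketched with `MASS::lda()` on a stand-in data set. This is not the package's internal code: the built-in `iris` data replaces the codon usage database, and the names `prop_train` and `n_rep` are hypothetical analogues of `propTrain` and `rep`.

```r
library(MASS)  # provides lda()

set.seed(1)
# Stand-in data: the iris measurements play the role of codon frequencies,
# and Species plays the role of the taxonomic rank.
dat <- iris
prop_train <- 0.8
n_rep <- 20

accuracy <- vapply(seq_len(n_rep), function(i) {
    # Draw a random training subset; the remainder is the test subset.
    train_idx <- sample(nrow(dat), size = floor(prop_train * nrow(dat)))
    fit <- lda(Species ~ ., data = dat[train_idx, ])
    pred <- predict(fit, newdata = dat[-train_idx, ])$class
    # Proportion of test observations classified correctly.
    mean(pred == dat$Species[-train_idx])
}, numeric(1))

summary(accuracy)
```

The spread of `accuracy` across replicates indicates how sensitive the classifier is to the particular training subset drawn.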

Usage

LDA(
  exclude = character(length = 0),
  minlen = 600,
  trans = FALSE,
  propTrain = 1,
  rank = "Phylum",
  corCut = 0.9
)

bootstrap_LDA(
  rep = 100,
  trans = FALSE,
  propTrain = 0.8,
  rank = "Phylum",
  exclude = character(length = 0),
  minlen = 600,
  corCut = 0.9,
  norm = FALSE
)

predict_LDA(
  cFobj,
  ldaObj,
  rank = "Phylum",
  minlen = 600,
  fname = NA_character_,
  units = "in",
  width = 10,
  height = 7,
  dpi = 600,
  norm = FALSE,
  plot = FALSE,
  identifier = NA_character_
)

## S4 method for signature 'ANY'
LDA(
  exclude = character(length = 0),
  minlen = 600,
  trans = FALSE,
  propTrain = 1,
  rank = "Phylum",
  corCut = 0.9
)

## S4 method for signature 'numeric'
bootstrap_LDA(
  rep = 100,
  trans = FALSE,
  propTrain = 0.8,
  rank = "Phylum",
  exclude = character(length = 0),
  minlen = 600,
  corCut = 0.9,
  norm = FALSE
)

Arguments

`exclude`	A character vector of codons to be excluded from comparisons.
`minlen`	Numeric, the minimum length of sequence (in codons) to be included in the analysis. Default = 600.
`trans`	Logical; if TRUE, a Box-Cox transformation will be applied to the data. Default = FALSE.
`propTrain`	Numeric, proportion of the reference database to use for the LDA training set (must be in the range [0,1]). Default = 1 for LDA and 0.8 for bootstrap_LDA.
`rank`	Character, taxonomic rank to be used for categorisation. Options are "Domain", "Kingdom", and "Phylum". Default = "Phylum".
`corCut`	Numeric, correlation cutoff used for dropping codons that exhibit collinearity. Default = 0.9.
`rep`	Numeric, number of bootstrap replicates to perform (default = 100).
`norm`	Logical, should the codon abundances be normalised? If TRUE, codon abundances will be converted to codon bias scores, such that the scores for each amino acid sum to 1. Default = FALSE.
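
To make the `norm` description concrete, here is a minimal base R sketch of converting codon counts to per-amino-acid bias scores. The counts are invented for illustration and only three amino acids are shown; the package's own implementation may differ.

```r
# Hypothetical counts for a handful of codons in one sequence.
counts <- c(TTT = 10, TTC = 30, AAA = 5, AAG = 15, ATG = 8)
# Amino acid encoded by each codon (standard genetic code).
amino <- c(TTT = "Phe", TTC = "Phe", AAA = "Lys", AAG = "Lys", ATG = "Met")

# Divide each count by the total count for its amino acid.
bias <- counts / ave(counts, amino[names(counts)], FUN = sum)
bias

# Check: the scores for each amino acid sum to 1.
tapply(bias, amino[names(bias)], sum)
```

Scores of this kind describe the preference among synonymous codons, so they do not depend on how often each amino acid occurs in the sequence.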

Value

An object of class lda, with the following components:

"prior": the prior probabilities used (determined from training set).

"means": the group means.

"scaling": a matrix which transforms observations to discriminant functions, normalized so that the within-group covariance matrix is spherical.

"svd": the singular values, which give the ratio of the between- and within-group standard deviations on the linear discriminant variables. Their squares are the canonical F-statistics.

"N": the number of observations used.

"call": the (matched) function call to lda().
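
The components listed above can be inspected on any object returned by `MASS::lda()`. The sketch below fits a model to the built-in `iris` data as a stand-in, since the codon usage database is not available here.

```r
library(MASS)  # provides lda()

# Stand-in fit; Species plays the role of the taxonomic rank.
fit <- lda(Species ~ ., data = iris)

names(fit)      # includes "prior", "means", "scaling", "svd", "N", "call"
fit$prior       # prior probabilities, here the class proportions
fit$N           # number of observations used

# Share of between-group variance captured by each discriminant.
fit$svd^2 / sum(fit$svd^2)
```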

A named list with the following components:

"codons": the codons used for model construction, after filtering criteria have been applied.

"taxa": the taxa used for model construction/testing.

"accuracy": the proportion of accurate classifications using the test subset of the data.
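
One of the filtering criteria that determines the "codons" component is the correlation cutoff (`corCut`). How the package applies that cutoff is not described on this page, so the following is only one plausible approach: a greedy base R filter that repeatedly drops a column from the most correlated pair. The helper `drop_collinear` and the simulated frequencies are hypothetical.

```r
# Greedy filter: while any pair of columns has |correlation| above `cutoff`,
# drop the member of the most correlated pair whose mean absolute
# correlation with the remaining columns is larger.
drop_collinear <- function(x, cutoff = 0.9) {
    cors <- abs(cor(x))
    diag(cors) <- 0
    keep <- colnames(x)
    repeat {
        sub <- cors[keep, keep, drop = FALSE]
        if (length(keep) < 2 || max(sub) <= cutoff) break
        pair <- which(sub == max(sub), arr.ind = TRUE)[1, ]
        means <- colMeans(sub[, pair, drop = FALSE])
        keep <- setdiff(keep, colnames(sub)[pair][which.max(means)])
    }
    keep
}

# Simulated codon frequencies: GCT and GCC are nearly collinear.
set.seed(42)
a <- rnorm(200)
freqs <- cbind(GCT = a, GCC = a + rnorm(200, sd = 0.05), AAA = rnorm(200))
kept <- drop_collinear(freqs, cutoff = 0.9)
kept
```

Removing near-duplicate predictors in this way helps keep the within-group covariance matrix well conditioned, which LDA relies on.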

Examples

    exclCod <- c("ATT", "TGT")
    LDA_tmp <- LDA(
        exclude = exclCod, rank = "Phylum", trans = FALSE,
        propTrain = 1, corCut = 0.95, minlen = 600
    )
    names(LDA_tmp)

    exclCod <- c("ATT", "TGT")
    boot <- bootstrap_LDA(
        rep = 100, propTrain = 0.8, trans = FALSE, rank = "Phylum",
        exclude = exclCod, minlen = 600, corCut = 0.95
    )
    names(boot)
    mean(boot$Accuracy)

adamd3/codondiffR documentation built on Sept. 3, 2022, 2:26 a.m.