plots: LDA plots

predict_LDA,codonFreq-method    R Documentation

LDA plots

Description

Predict taxonomic classifications for sequences in a codonFreq object, using linear discriminants from an lda model. Plot discriminants and predictions.

Plot the distribution of MCUFD values per taxon

Usage

## S4 method for signature 'codonFreq'
predict_LDA(
  cFobj,
  ldaObj,
  rank = "Phylum",
  minlen = 600,
  fname = NA_character_,
  units = "in",
  width = 10,
  height = 7,
  dpi = 600,
  norm = FALSE,
  plot = FALSE,
  identifier = NA_character_
)

## S4 method for signature 'list'
MCUFD_plot(
  cFres,
  type = "bar",
  n = NA_real_,
  rank = "Phylum",
  fname = NA_character_,
  units = "in",
  width = 10,
  height = 7,
  dpi = 600,
  save = FALSE
)

Arguments

cFobj

An object of class codonFreq.

ldaObj

Object of class lda - produced using the lda() function.
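
As a rough illustration of what an lda object and its predictions look like, the sketch below fits a model with lda() from the MASS package on the built-in iris data. Both choices are assumptions made for the example only: this page does not say which package provides lda(), and iris merely stands in for codon-frequency data. It is not the code used inside predict_LDA().

```r
library(MASS)

# Assumption: iris stands in for a table of codon frequencies,
# with Species playing the role of the taxonomic rank.
fit <- lda(Species ~ ., data = iris)

# predict() on an lda object returns a list with the predicted class,
# posterior probabilities, and linear discriminant scores (x).
pred <- predict(fit, newdata = iris)

# Tabulate predicted classes, most frequent first
head(sort(table(pred$class), decreasing = TRUE))

# Discriminant scores that could be plotted (one column per discriminant)
head(pred$x)
```

With three groups there are at most two linear discriminants, so pred$x has two columns here.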

rank

Character, taxonomic rank to be used for categorisation of the CUFD hit sequences. Options are "Domain", "Kingdom", and "Phylum". Default = "Phylum".

minlen

Numeric, the minimum length of sequence (in codons) to be included in the analysis. Default = 600.

fname

Character, name of figure generated.

units

Character, units to be used for defining the plot size. Options are "in" (default), "cm", and "mm".

width

Numeric, width of the figure (in units).

height

Numeric, height of the figure (in units).

dpi

Numeric, resolution of the figure (default = 600).
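
To see how units, width, height, and dpi combine, the following sketch computes the pixel dimensions of a saved raster figure. It assumes the arguments are interpreted in the usual way for raster output (physical size multiplied by dots per inch); the page does not state which function writes the file, so treat this as arithmetic only.

```r
# Default values from the usage section above
width  <- 10
height <- 7
dpi    <- 600
units  <- "in"

# Conversion factors from each supported unit to inches
per_inch <- c("in" = 1, "cm" = 1 / 2.54, "mm" = 1 / 25.4)

# Pixel dimensions = size in inches * dots per inch
px <- c(width, height) * per_inch[[units]] * dpi
px
```

Under these assumptions, lowering dpi or switching to smaller physical dimensions is the simplest way to reduce file size.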

norm

Logical, should the codon abundances be normalised? If TRUE, codon abundances will be converted to codon bias scores, such that the scores for each amino acid sum to 1. Default = FALSE.
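
The normalisation described for norm can be sketched in base R as follows. The counts and the codon-to-amino-acid mapping are hypothetical values chosen for illustration, and the code is not taken from the package; it only shows what "scores for each amino acid sum to 1" means.

```r
# Hypothetical codon counts for one sequence (illustrative values only)
counts <- c(GCT = 10, GCC = 30, GCA = 40, GCG = 20, TGT = 3, TGC = 1)

# Amino acid encoded by each codon, in the same order as counts
aa <- c("Ala", "Ala", "Ala", "Ala", "Cys", "Cys")

# Divide each count by the total for its amino acid
bias <- counts / ave(counts, aa, FUN = sum)
bias

# Check: scores within each amino acid sum to 1
tapply(bias, aa, sum)
```

Normalising this way removes the effect of amino acid composition, so two sequences with different protein content can still be compared on codon preference alone.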

plot

Logical, should the enrichment results be plotted? Default = FALSE.

identifier

Character, optional group label to be assigned to sequences in the codonFreq object. If not supplied, each sequence will be labelled individually on the plot.

cFres

List of data frames containing MCUFD results.

type

Character, the type of plot to make. Options are "bar" (default) and "line".

n

Numeric, the number of top-ranked reference taxa to be plotted per input sequence. The signature default is NA_real_; the examples below use n = 100.

save

Logical, should the figure be saved to file? Default = FALSE.

Value

A ggplot object.


Examples

    # predict_LDA: classify the example sequences with an LDA model
    virusSet <- readSeq(example = TRUE)
    virusCF <- codonFreq(virusSet)
    # Codons to exclude from the analysis
    exclCod <- c("ATT", "TGT")
    LDA_tmp <- LDA(
        exclude = exclCod, rank = "Phylum", trans = FALSE,
        propTrain = 1, corCut = 0.95, minlen = 600
    )
    predLDA <- predict_LDA(
        virusCF, LDA_tmp, rank = "Phylum", plot = TRUE,
        minlen = 600, fname = "lda_tmp2", height = 5, width = 7
    )
    # Most frequently predicted classes
    head(sort(table(predLDA$class), decreasing = TRUE))

    # MCUFD_plot: bar plot of MCUFD results, grouped by Kingdom
    virusSet <- readSeq(example = TRUE)
    virusCF <- codonFreq(virusSet)
    exclCod <- c("ATT", "TGT")
    MCUFD_tmp <- MCUFD(virusCF, exclude = exclCod, norm = TRUE)
    MCUFD_plot(
        cFres = MCUFD_tmp, type = "bar", save = TRUE,
        fname = "MCUFD_bar_kingdom", n = 100, rank = "Kingdom"
    )


adamd3/codondiffR documentation built on Sept. 3, 2022, 2:26 a.m.