plot_n_grams: Plot n-grams


View source: R/plot-n-grams.R

Description

This function takes a trained PPM model and plots transition probabilities computed by tabulating n-grams of length 1 and 2.

Usage

plot_n_grams(
  mod,
  pos = 1L,
  time = 0,
  max_alphabet_size = 30L,
  zero_indexed = FALSE,
  heights = c(0.25, 0.75),
  bigram_fill_scale = ggplot2::scale_fill_viridis_c("Probability (relative)")
)

Arguments

mod

A PPM model object as produced by (for example) new_ppm_simple or new_ppm_decay, and subsequently trained on input sequences using model_seq.

pos

(Integerish scalar) The nominal 'position' at which the n-gram counts are retrieved (only relevant for decay-based models).

time

(Numeric scalar) The nominal 'time' at which the n-grams are retrieved (only relevant for decay-based models).

max_alphabet_size

(Integerish scalar) The largest alphabet size that the function will plot. If the model's alphabet size exceeds this value, the function throws an error, which protects the user from trying to plot prohibitively large transition matrices.

zero_indexed

(Logical scalar) If zero_indexed = FALSE (default), then the alphabet is mapped to ascending integers beginning at 1; otherwise, the alphabet is mapped to ascending integers beginning at 0 (i.e. all symbols are decremented by 1).

heights

A numeric vector of length 2 specifying the relative heights of the top and bottom plot panels, respectively.

bigram_fill_scale

A ggplot2 scale for the fill aesthetic of the bigram plot.

Details

The output comprises two panels. The top panel plots the empirical probability distribution of 1-grams; this captures the relative frequencies of different symbols in the alphabet. The bottom panel plots conditional probability distributions computed from 2-grams. Each row corresponds to a maximum-likelihood probability distribution for the next symbol conditioned on the preceding symbol indexed by that row. Each column corresponds to a different continuation. These 2-gram conditional probabilities are not plotted directly, but are instead plotted relative to the corresponding 1-gram probabilities (i.e. the 2-gram probability minus the 1-gram probability). This helps the reader to separate 2-gram structure from 1-gram structure.
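
The quantities described above can be illustrated with a minimal base R sketch. It does not use the package's internals: the toy sequence and all variable names are made up for illustration, and it assumes that the "corresponding" 1-gram probability is that of the continuation symbol (the column).

```r
# Toy sequence over a 3-symbol alphabet, coded as integers 1..3
# (illustrative data, not taken from the package).
symbols <- c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 3L, 1L)
alphabet_size <- 3L

# 1-gram probabilities: relative frequency of each symbol.
unigram_counts <- tabulate(symbols, nbins = alphabet_size)
unigram_probs <- unigram_counts / sum(unigram_counts)

# 2-gram counts: rows index the preceding symbol, columns the continuation.
context <- factor(head(symbols, -1), levels = seq_len(alphabet_size))
continuation <- factor(tail(symbols, -1), levels = seq_len(alphabet_size))
bigram_counts <- table(context, continuation)

# Maximum-likelihood conditional distributions: normalise each row to sum to 1.
bigram_probs <- prop.table(bigram_counts, margin = 1)

# Relative probabilities: subtract each continuation's 1-gram probability
# from its column (assumption: "corresponding" means the continuation symbol).
relative_probs <- sweep(bigram_probs, 2, unigram_probs, FUN = "-")

print(round(relative_probs, 2))
```

Under this assumption, positive cells mark continuations that are more likely after the given context than their overall frequency would suggest, and negative cells mark continuations that are less likely.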

Note

This function requires the following additional packages: dplyr, ggplot2, and egg, each of which can be installed using install.packages from CRAN.
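
As a hedged usage sketch, the following checks that the required packages are available and then trains a simple PPM model on a toy sequence before plotting it. The random training sequence and the variable names are illustrative assumptions; the call pattern for new_ppm_simple and model_seq follows the argument description above, and the plotting step is skipped if any package is missing.

```r
# Packages needed for plotting, plus ppm itself.
needed <- c("ppm", "dplyr", "ggplot2", "egg")
have_all <- all(vapply(needed, requireNamespace, logical(1), quietly = TRUE))

# Toy integer-coded sequence over an alphabet of 5 symbols (illustrative data).
set.seed(1)
train_seq <- sample(5L, size = 200, replace = TRUE)

if (have_all) {
  # Create a simple (non-decay) PPM model and train it on the sequence.
  mod <- ppm::new_ppm_simple(alphabet_size = 5L)
  res <- ppm::model_seq(mod, train_seq)

  # Top panel: 1-gram probabilities; bottom panel: relative 2-gram probabilities.
  ppm::plot_n_grams(mod)
} else {
  message("Install the missing packages first, e.g. with install.packages().")
}
```

For a model created with new_ppm_simple, the pos and time arguments can be left at their defaults, since they are only relevant for decay-based models.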


pmcharrison/ppm documentation built on June 4, 2021, 9:45 a.m.