R/mask_histogram.R

#' Mask Histogram
#'
#' Display a histogram of mask bits.
#'
#' After a full embedding search, it is sometimes useful to see which bits
#' appear in a subset of the masks, for example the masks with the lowest Gamma
#' values.  Filter the search results before calling this function; it counts
#' the bits in every row it is given.  The histogram can show which predictors
#' are generally useful.  It is less helpful for selecting an effective mask
#' than you might expect: because each bit is counted independently, it does
#' not show interactions between predictors, so for mask selection it would
#' only be reliable when the target is a linear combination of the inputs.
#'
#' @export
#' @param fe_result Output data frame from fe_search.  Normally you would filter
#' this first, for example by keeping the top 100 rows of that output.  If the
#' whole fe_search result were passed in, every mask bit would have the same
#' frequency and the histogram would be flat.
#' @param dimension Integer, the number of bits in each mask; this is ncol() of
#' the predictors given to fe_search.
#' @param tick_step Integer spacing between tick marks on the x axis, starting
#' at position 1.
#' @param caption A character string shown as the plot caption, useful for
#' identifying this graph.
#' @return A ggplot object: a histogram of how often each mask bit is set in the
#' fe_search results passed to it.
#' @examples
#' e6 <- embed(mgls, 7)
#' t <- e6[ ,1]
#' p <- e6[ ,2:7]
#' full_search <- fe_search(predictors = p, target = t)
#' goodies <- head(full_search, 20)
#' mask_histogram(goodies, 6, caption = "mask bits in top 20 Gammas")
#' baddies <- tail(full_search, 20)
#' mask_histogram(baddies, 6, caption = "bits appearing in 20 worst Gammas")
mask_histogram <- function(fe_result,
                           dimension,
                           tick_step = 2,
                           caption = "") {
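  # Sum the 0/1 bit vectors of every mask to count how often each bit is set.
  # seq_len() avoids iterating over c(1, 0) when fe_result has no rows.
  temp <- integer(dimension)
  for (i in seq_len(nrow(fe_result))) {
    temp <- temp + int_to_intMask(fe_result[i, ]$mask, dimension)
  }
  # x-axis breaks at every tick_step-th bit position, starting at 1
  ticks <- seq(from = 1, to = dimension, by = tick_step)
  # idx is a column name used inside aes(); binding it here silences the
  # "no visible binding for global variable" note from R CMD check
  idx <- NULL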
  ggplot(data = data.frame(idx = 1:dimension, temp),
         aes(x = idx, y = temp)) +
    geom_col(fill = "blue") +
    scale_x_continuous(breaks = ticks) +
    labs(
      title = "Distribution of masks",
      caption = caption,
      x = "Position",
      y = "Count"
    )
}
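The counting step above can be sketched in plain base R, without sr or ggplot2, to show what the histogram does and does not capture. This is a minimal sketch under stated assumptions: `mask_bits()` and `count_mask_bits()` are hypothetical helpers, not part of sr, and the bit order (most significant bit first) is an assumption that may not match sr's own `int_to_intMask()`. It checks the flat-histogram claim for an unfiltered search, and shows that two different mask sets can produce identical per-bit counts, which is why the histogram cannot reveal interactions between predictors.

```r
# Hypothetical helper (not part of sr): expand an integer mask into a 0/1
# integer vector of length `dimension`. ASSUMPTION: most significant bit first;
# sr's int_to_intMask() may order bits differently.
mask_bits <- function(mask, dimension) {
  # intToBits() returns 32 raw bits, least significant first
  as.integer(rev(intToBits(mask)[seq_len(dimension)]))
}

# Hypothetical helper: tally how often each bit position is set across masks,
# the same accumulation mask_histogram() performs before plotting.
count_mask_bits <- function(masks, dimension) {
  counts <- integer(dimension)
  for (m in masks) {
    counts <- counts + mask_bits(m, dimension)
  }
  counts
}

# Every non-empty 6-bit mask: each bit is set in 2^5 = 32 of the 63 masks,
# so an unfiltered search gives a flat histogram.
flat <- count_mask_bits(1:63, 6)
print(flat)  # 32 32 32 32 32 32

# A filtered subset: 000011, 000111, 100011
subset_counts <- count_mask_bits(c(3L, 7L, 35L), 6)

# Only per-bit totals survive. These two sets pair the first four bits
# differently (110000 + 001100 versus 101000 + 010100) yet give the same counts.
set_a <- count_mask_bits(c(48L, 12L), 6)
set_b <- count_mask_bits(c(40L, 20L), 6)
identical(set_a, set_b)
```

Because the sets in the last step share a histogram while grouping predictors differently, a bar for a bit says how often that predictor was useful, not which predictors it was useful alongside.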

sr documentation built on March 31, 2023, 9:40 p.m.