R/gofplot.R

Defines functions plot.univariate plot.rocpr plot.pr plot.roc plot.boxplot plot.gof print.gof print.univariate print.rocpr print.pr print.roc print.boxplot

Documented in plot.boxplot plot.gof plot.pr plot.roc plot.rocpr plot.univariate print.boxplot print.gof print.pr print.roc print.rocpr print.univariate

#' Plot and print methods for GOF output
#' 
#' Plot and print methods for goodness-of-fit output for network models.
#' 
#' These plot and print methods serve to display the output generated by the
#' \code{gof} function and its methods. See the help page of
#' \code{\link{gof-methods}} for details on how to compute goodness-of-fit
#' statistics.
#' 
#' @param add Add the ROC and/or PR curve to an existing plot?
#' @param avg Averaging pattern for the ROC and PR curve(s) if multiple target
#'   time steps were used. Allowed values are \code{"none"} (plot all curves
#'   separately), \code{"horizontal"} (horizontal averaging), \code{"vertical"}
#'   (vertical averaging), and \code{"threshold"} (threshold (= cutoff)
#'   averaging). Note that while threshold averaging is always feasible,
#'   vertical and horizontal averaging are not well-defined if the graph cannot
#'   be represented as a function x->y and y->x, respectively. More information
#'   can be obtained from the help pages of the \pkg{ROCR} package, the
#'   functions of which are employed here.
#' @param border Color of the borders of the boxplots.
#' @param boxplot.lwd Line width of boxplot.
#' @param col Color of the ROC or PR curve.
#' @param lwd Line width.
#' @param main Main title of a GOF plot.
#' @param mean Plot the mean curve for the observed network?
#' @param mean.col Color of the mean of the observed network statistic.
#' @param mean.lty Line type of mean line. For example "dashed" or "solid".
#' @param mean.lwd Line width of mean line.
#' @param median Plot the median curve for the observed network?
#' @param median.col Color of the median of the observed network statistic.
#' @param median.lty Line type of median line. For example "dashed" or "solid".
#' @param median.lwd Line width of median line.
#' @param mfrow Should the GOF plots come out separately (\code{mfrow = FALSE}),
#'   or should all statistics be aligned in a single diagram
#'   (\code{mfrow = TRUE})? Returning the plots separately can be helpful if the
#'   output is redirected to a multipage PDF or TIFF file.
#' @param obs.adjust Bandwidth adjustment parameter for the density curve.
#' @param obs.bar Draw a bar for the median of the statistic for the observed
#'   networks?
#' @param obs.col Color for the observed network(s).
#' @param obs.density Draw a density curve fot the statistic for the observed
#'   networks?
#' @param obs.hist Draw a histogram for the observed networks?
#' @param obs.lwd Line width for the observed network(s).
#' @param outline Print outliers in the boxplots?
#' @param pr.avg Averaging pattern for the PR curve(s) if multiple target time
#'   steps were used. Allowed values are \code{"none"} (plot all curves
#'   separately), \code{"horizontal"} (horizontal averaging), \code{"vertical"}
#'   (vertical averaging), and \code{"threshold"} (threshold (= cutoff)
#'   averaging). Note that while threshold averaging is always feasible,
#'   vertical and horizontal averaging are not well-defined if the graph cannot
#'   be represented as a function x->y and y->x, respectively. More information
#'   can be obtained from the help pages of the \pkg{ROCR} package, the
#'   functions of which are employed here.
#' @param pr.col Color of the PR curve.
#' @param pr.lwd Line width.
#' @param pr.poly If a value of \code{0} is set, nothing special happens. If a
#'   value of \code{1} is set, a straight line is fitted through the PR curve
#'   and displayed. Values between \code{2} and \code{9} fit higher-order
#'   polynomial curves through the PR curve and display the resulting curve.
#'   This argument allows to check whether the imputation of the first precision
#'   value in the PR curve yielded a reasonable result (in case the value had to
#'   be imputed).
#' @param pr.random.col Color of the PR curve of the random graph prediction.
#' @param pr.rgraph Should an PR curve also be drawn for a random graph? This
#'   serves as a baseline against which to compare the actual PR curve.
#' @param pr.spread.estimate When multiple target time steps are used and curve
#'   averaging is enabled, the variation around the average curve can be
#'   visualized as standard error bars (\code{"stderror"}), standard deviation
#'   bars (\code{"stddev"}), or by using box plots (\code{"boxplot"}). Note that
#'   the function plotCI, which is used internally by the \pkg{ROCR} package to
#'   draw error bars, might raise a warning if the spread of the curves at
#'   certain positions is 0. More details can be found in the documentation of
#'   the \pkg{ROCR} package, the functions of which are employed here.
#' @param random.col Color of the ROC or PR curve of the random graph
#'   prediction.
#' @param relative Print relative frequencies (as opposed to absolute
#'   frequencies) of a statistic on the y axis?
#' @param rgraph Should an ROC or PR curve also be drawn for a random graph?
#'   This serves as a baseline against which to compare the actual ROC or PR
#'   curve.
#' @param roc.avg Averaging pattern for the ROC curve(s) if multiple target time
#'   steps were used. Allowed values are \code{"none"} (plot all curves
#'   separately), \code{"horizontal"} (horizontal averaging), \code{"vertical"}
#'   (vertical averaging), and \code{"threshold"} (threshold (= cutoff)
#'   averaging). Note that while threshold averaging is always feasible,
#'   vertical and horizontal averaging are not well-defined if the graph cannot
#'   be represented as a function x->y and y->x, respectively. More information
#'   can be obtained from the help pages of the \pkg{ROCR} package, the
#'   functions of which are employed here.
#' @param roc.col Color of the ROC curve.
#' @param roc.lwd Line width.
#' @param roc.random.col Color of the ROC curve of the random graph prediction.
#' @param roc.rgraph Should an ROC curve also be drawn for a random graph? This
#'   serves as a baseline against which to compare the actual ROC curve.
#' @param roc.spread.estimate When multiple target time steps are used and curve
#'   averaging is enabled, the variation around the average curve can be
#'   visualized as standard error bars (\code{"stderror"}), standard deviation
#'   bars (\code{"stddev"}), or by using box plots (\code{"boxplot"}). Note that
#'   the function plotCI, which is used internally by the \pkg{ROCR} package to
#'   draw error bars, might raise a warning if the spread of the curves at
#'   certain positions is 0. More details can be found in the documentation of
#'   the \pkg{ROCR} package, the functions of which are employed here.
#' @param sim.adjust Bandwidth adjustment parameter for the density curve.
#' @param sim.bar Draw a bar for the median of the statistic for the simulated
#'   networks?
#' @param sim.col Color for the simulated networks.
#' @param sim.density Draw a density curve fot the statistic for the simulated
#'   networks?
#' @param sim.hist Draw a histogram for the simulated networks?
#' @param sim.lwd Line width for the simulated networks.
#' @param spread.estimate When multiple target time steps are used and curve
#'   averaging is enabled, the variation around the average curve can be
#'   visualized as standard error bars (\code{"stderror"}), standard deviation
#'   bars (\code{"stddev"}), or by using box plots (\code{"boxplot"}). Note that
#'   the function plotCI, which is used internally by the \pkg{ROCR} package to
#'   draw error bars, might raise a warning if the spread of the curves at
#'   certain positions is 0. More details can be found in the documentation of
#'   the \pkg{ROCR} package, the functions of which are employed here.
#' @param transform A function which transforms the y values used for the
#'   boxplots. For example, if some of the values become very large and make the
#'   output illegible, \code{transform = function(x) x^0.1} or a similar
#'   transformation of the values can be used. Note that logarithmic
#'   transformations often produce infinite values because \code{log(0) = -Inf},
#'   so one should rather use something like
#'   \code{transform = function(x) log1p} to avoid infinite values.
#' @param x An object created by one of the \code{gof} methods.
#' @param xlab Label of the x-axis of a GOF plot.
#' @param xlim Horizontal limit of the boxplots. Only the maximum value must be
#'   provided, e.g., \code{xlim = 8}.
#' @param ylab Label of the y-axis of a GOF plot.
#' @param ... Arbitrary further arguments.
#' 
#' @references
#' Leifeld, Philip, Skyler J. Cranmer and Bruce A. Desmarais (2018): Temporal
#' Exponential Random Graph Models with btergm: Estimation and Bootstrap
#' Confidence Intervals. \emph{Journal of Statistical Software} 83(6): 1--36.
#' \doi{10.18637/jss.v083.i06}.
#' 
#' @docType methods
#' @name gof-plot
#' @aliases gofplot plotgof plot-gof print,gof-method print,univariate-method
#'   print,boxplot-method print,roc-method print,pr-method print,rocpr-method
#'   plot,gof-method plot,univariate-method plot,boxplot-method plot,roc-method
#'   plot,pr-method plot,rocpr-method plot,gof-method
#' @importFrom graphics plot
NULL

#' @rdname gof-plot
#' @export
print.boxplot <- function(x, ...) {
  message(x$label, "\n")
  printCoefmat(x$stats, has.Pvalue = TRUE, cs.ind=numeric(0), 
      signif.legend = FALSE)
  message(paste("\nNote: Small p-values indicate a significant difference",
      "\n      between simulations and observed network(s)."))
}

#' @rdname gof-plot
#' @export
print.roc <- function(x, ...) {
  message(x$label, "\n")
  auc <- cbind(x$auc.roc, x$auc.roc.rgraph)
  colnames(auc) <- c("ROC model", "ROC random")
  rownames(auc) <- 1:nrow(auc)
  print(auc)
}

#' @rdname gof-plot
#' @export
print.pr <- function(x, ...) {
  message(x$label, "\n")
  auc <- cbind(x$auc.pr, x$auc.pr.rgraph)
  colnames(auc) <- c("PR model", "PR random")
  rownames(auc) <- 1:nrow(auc)
  print(auc)
}

#' @rdname gof-plot
#' @export
print.rocpr <- function(x, ...) {
  message(x$label, "\n")
  auc <- cbind(x$auc.roc, x$auc.roc.rgraph, x$auc.pr, x$auc.pr.rgraph)
  colnames(auc) <- c("ROC model", "ROC random", "PR model", "PR random")
  rownames(auc) <- 1:nrow(auc)
  print(auc)
}

#' @rdname gof-plot
#' @export
print.univariate <- function(x, ...) {
  message(x$label, "\n")
  message("Observed:")
  print(summary(x$obs))
  message("\nSimulated:")
  print(summary(x$sim))
}

#' @rdname gof-plot
#' @export
print.gof <- function(x, ...) {
  for (i in 1:length(x)) {
    cat("\n")
    print(x[[i]])
  }
}

#' @rdname gof-plot
#' @importFrom graphics plot
#' @export
plot.gof <- function(x, mfrow = TRUE, ...) {
  if (mfrow == TRUE) {
    l <- length(x)
    if (l == 0) {
      stop("'x' does not contain any gof objects.")
    } else if (l == 1) {
      # do nothing
    } else if (l == 2) {
      par(mfrow = c(1, 2))
    } else if (l == 3) {
      par(mfrow = c(1, 3))
    } else if (l == 4) {
      par(mfrow = c(2, 2))
    } else if (l == 5 || l == 6) {
      par(mfrow = c(2, 3))
    } else if (l > 6 && l < 10) {
      par(mfrow = c(3, 3))
    } else if (l > 9 && l < 13) {
      par(mfrow = c(4, 3))
    } else if (l > 12 && l < 17) {
      par(mfrow = c(4, 4))
    } else if (l > 16 && l < 21) {
      par(mfrow = c(5, 4))
    } else if (l > 20 && l < 26) {
      par(mfrow = c(5, 5))
    } else {
      warning("Too many gof objects for tiling. Switching off 'mfrow'.")
    }
  }
  for (i in 1:length(x)) {
    plot(x[[i]], ...)
  }
}

#' @rdname gof-plot
#' @importFrom graphics abline boxplot hist lines par points
#' @importFrom Matrix t
#' @export
plot.boxplot <- function(x, relative = TRUE, transform = function(x) x, 
    xlim = NULL, main = x$label, xlab = x$label, ylab = "Frequency", 
    border = "darkgray", boxplot.lwd = 0.8, outline = FALSE, median = TRUE, 
    median.col = "black", median.lty = "solid", median.lwd = 2, mean = TRUE, 
    mean.col = "black", mean.lty = "dashed", mean.lwd = 1, ...) {
  
  # transform data
  mat <- t(x$raw)
  mat <- apply(mat, 1:2, transform)
  if (ncol(x$stats == 9)) {
    x$stats[, 2] <- transform(x$stats[, 2])
  }
  x$stats[, 1] <- transform(x$stats[, 1])
  
  # convert to relative frequencies
  if (relative == TRUE) {
    rs <- rowSums(mat)
    for (i in 1:nrow(mat)) {
      if (rs[i] == 0) {
        mat[i, ] <- 0
      } else {
        mat[i, ] <- mat[i, ] / rs[i]
      }
    }
    cs <- colSums(x$stats)[1:2]
    for (i in 1:2) {
      if (cs[i] == 0) {
        x$stats[, i] <- 0
      } else {
        x$stats[, i] <- x$stats[, i] / cs[i]
      }
    }
  }
  
  # find minimum and maximum values for plotting
  if (ncol(x$stats) == 9) { # several observed networks
    obs.mean <- x$stats[, 1]
    obs <- x$stats[, 2]
  } else { # only one observed network
    obs <- x$stats[, 1]
  }
  if (any(is.infinite(c(mat, obs)))) {
    stop(paste("Simulated or observed values contain infinite values.", 
        "Check the 'transform' argument."))
  }
  
  # impose xlim
  if (!is.null(xlim)) {
    if (colnames(mat)[ncol(mat)] == "Inf") {
      if (xlim > nrow(x$stats) - 2) {
        xlim <- nrow(x$stats) - 1
        warning("'xlim' was out of bounds. Replaced by maximum possible.")
      }
      mat <- mat[, c(1:(xlim), ncol(mat))]
      obs <- obs[c(1:(xlim), length(obs))]
      if (exists("obs.mean")) {
        obs.mean <- obs.mean[c(1:(xlim), length(obs.mean))]
      }
    } else {
      if (xlim > ncol(mat) - 1) {
        warning("'xlim' was out of bounds. Replaced by maximum possible.")
      } else {
        mat <- mat[, c(1:(xlim + 1))]
        obs <- obs[c(1:(xlim + 1))]
        if (exists("obs.mean")) {
          obs.mean <- obs.mean[c(1:(xlim + 1))]
        }
      }
    }
  }
  
  # set empty main title if NULL
  if (is.null(main)) {
    main <- ""
  }
  
  # plot boxplots and curves
  boxplot(mat, border = border, xlab = xlab, ylab = ylab, main = main, 
      outline = outline, lwd = boxplot.lwd, ...)
  if (ncol(x$stats) == 9 && mean == TRUE) {
    lines(obs.mean, lwd = mean.lwd, type = "l", lty = mean.lty, col = mean.col)
  }
  if (median == TRUE) {
    lines(obs, lwd = median.lwd, type = "l", lty = median.lty, col = median.col)
  }
}

#' @rdname gof-plot
#' @export
plot.roc <- function(x, add = FALSE, main = x$label, avg = c("none", 
    "horizontal", "vertical", "threshold"), spread.estimate = c("boxplot", 
    "stderror", "stddev"), lwd = 3, rgraph = FALSE, col = "#bd0017", 
    random.col = "#bd001744", ...) {
  
  plot(x$roc, avg = avg[1], spread.estimate = spread.estimate[1], 
      add = add, col = col, main = main, lwd = lwd, 
      boxplot.boxcol = col, ylim = c(0, 1), ...)
  if (rgraph == TRUE) {
    plot(x$roc.rgraph, avg = avg[1], spread.estimate = "none", 
        col = random.col, add = TRUE, lwd = lwd, ylim = c(0, 1), ...)
  }
}

#' @rdname gof-plot
#' @export
plot.pr <- function(x, add = FALSE, main = x$label, avg = c("none", 
    "horizontal", "vertical", "threshold"), spread.estimate = c("boxplot", 
    "stderror", "stddev"), lwd = 3, rgraph = FALSE, col = "#5886be", 
    random.col = "#5886be44", pr.poly = 0, ...) {
  
  plot(x$pr, avg = avg[1], spread.estimate = spread.estimate[1], 
      add = add, col = col, main = main, lwd = lwd, 
      boxplot.boxcol = col, ylim = c(0, 1), ...)
  if (rgraph == TRUE) {
    plot(x$pr.rgraph, avg = avg[1], spread.estimate = "none", 
        col = random.col, add = TRUE, lwd = lwd, ylim = c(0, 1), ...)
  }
  
  # fit polynomial curve through PR curve
  if (pr.poly > 0) {
    for (i in 1:length(x$pr@y.values)) {
      prec <- x$pr@y.values[[i]]
      prec[1] <- NaN
      rec <- x$pr@x.values[[i]]
      p <- data.frame(poly(rec, pr.poly, raw = TRUE))
      fit <- lm(prec ~ ., data = p)
      yhat <- predict(fit, newdata = p)
      for (j in 1:length(yhat)) {
        if (yhat[j] > 1) {
          yhat[j] <- 1
        }
        if (yhat[j] < 0) {
          yhat[j] <- 0
        }
      }
      lines(rec, yhat, type = "l", lty = "dashed", col = "red", 
          lwd = lwd)
      points(rec[1], yhat[1], pch = 1, col = "red", cex = 2)
      points(x$pr@x.values[[i]][1], x$pr@y.values[[i]][1], pch = 1, 
          col = col, cex = 2)
    }
  }
}

#' @rdname gof-plot
#' @export
plot.rocpr <- function(x, main = x$label, roc.avg = c("none", "horizontal", 
    "vertical", "threshold"), roc.spread.estimate = c("boxplot", "stderror", 
    "stddev"), roc.lwd = 3, roc.rgraph = FALSE, roc.col = "#bd0017", 
    roc.random.col = "#bd001744", pr.avg = c("none", "horizontal", 
    "vertical", "threshold"), pr.spread.estimate = c("boxplot", "stderror", 
    "stddev"), pr.lwd = 3, pr.rgraph = FALSE, pr.col = "#5886be", 
    pr.random.col = "#5886be44", pr.poly = 0, ...) {
  
  plot(x$roc, avg = roc.avg[1], spread.estimate = roc.spread.estimate[1], 
      col = roc.col, main = main, lwd = roc.lwd, boxplot.boxcol = roc.col, 
      ylim = c(0, 1), ylab = "TPR / PPV", xlab = "FPR / TPR", ...)
  if (roc.rgraph == TRUE) {
    plot(x$roc.rgraph, avg = roc.avg[1], spread.estimate = "none", 
        col = roc.random.col, add = TRUE, lwd = roc.lwd, ylim = c(0, 1), ...)
  }
  
  plot(x$pr, avg = pr.avg[1], spread.estimate = pr.spread.estimate[1], 
      col = pr.col, main = main, lwd = pr.lwd, boxplot.boxcol = pr.col, 
      ylim = c(0, 1), add = TRUE, ...)
  if (pr.rgraph == TRUE) {
    plot(x$pr.rgraph, avg = pr.avg[1], spread.estimate = "none", 
        col = pr.random.col, add = TRUE, lwd = pr.lwd, ylim = c(0, 1), ...)
  }
  
  # fit polynomial curve through PR curve
  if (pr.poly > 0) {
    for (i in 1:length(x$pr@y.values)) {
      prec <- x$pr@y.values[[i]]
      prec[1] <- NaN
      rec <- x$pr@x.values[[i]]
      p <- data.frame(poly(rec, pr.poly, raw = TRUE))
      fit <- lm(prec ~ ., data = p)
      yhat <- predict(fit, newdata = p)
      for (j in 1:length(yhat)) {
        if (yhat[j] > 1) {
          yhat[j] <- 1
        }
        if (yhat[j] < 0) {
          yhat[j] <- 0
        }
      }
      lines(rec, yhat, type = "l", lty = "dashed", col = "red", 
          lwd = pr.lwd)
      points(rec[1], yhat[1], pch = 1, col = "red", cex = 2)
      points(x$pr@x.values[[i]][1], x$pr@y.values[[i]][1], pch = 1, 
          col = pr.col, cex = 2)
    }
  }
}

#' @rdname gof-plot
#' @export
plot.univariate <- function(x, main = x$label, sim.hist = TRUE, sim.bar = TRUE, 
    sim.density = TRUE, obs.hist = FALSE, obs.bar = TRUE, obs.density = TRUE, 
    sim.adjust = 1, obs.adjust = 1, sim.lwd = 2, obs.lwd = 2, 
    sim.col = "black", obs.col = "red", ...) {
  d.sim <- density(x$sim, adjust = sim.adjust)
  d.sim.max.y <- max(d.sim$y)
  d.sim.max.x <- max(d.sim$x)
  d.sim.min.y <- min(d.sim$y)
  d.sim.min.x <- min(d.sim$x)
  d.sim.median <- median(x$sim)
  if (length(x$obs) > 1) {
    d.obs <- density(x$obs, adjust = obs.adjust)
    d.obs.max.y <- max(d.obs$y)
    d.obs.max.x <- max(d.obs$x)
    d.obs.min.y <- min(d.obs$y)
    d.obs.min.x <- min(d.obs$x)
    d.obs.median <- median(x$obs)
  } else {
    d.obs.max.y <- 0
    d.obs.max.x <- x$obs
    d.obs.min.y <- 0
    d.obs.min.x <- x$obs
  }
  h <- hist(x$sim, plot = FALSE)
  y.max <- max(c(d.sim.max.y, d.obs.max.y, h$density))
  x.max <- max(c(d.sim.max.x, d.obs.max.x, h$breaks))
  y.min <- min(c(d.sim.min.y, d.obs.min.y, h$density))
  x.min <- min(c(d.sim.min.x, d.obs.min.x, h$breaks))
  plot(0, xlim = c(x.min, x.max), ylim = c(y.min, y.max), 
      xlab = x$label, ylab = "Density", main = main, type = "n")
  if (sim.hist == TRUE) {
    hist(x$sim, add = TRUE, lwd = sim.lwd, freq = FALSE)
  }
  if (sim.bar == TRUE) {
    lines(rep(d.sim.median, 2), c(y.min, y.max), col = sim.col, lwd = sim.lwd)
  }
  if (sim.density == TRUE) {
    lines(d.sim, lwd = sim.lwd)
  }
  if (obs.hist == TRUE) {
    hist(x$obs, add = TRUE, lwd = sim.lwd, freq = FALSE, border = obs.col)
  }
  if (obs.bar == TRUE) {
    if (length(x$obs) > 1) {
      lines(rep(d.obs.median, 2), c(y.min, y.max), col = obs.col, lwd = obs.lwd)
    } else {
      lines(rep(x$obs, 2), c(y.min, y.max), col = obs.col, lwd = obs.lwd)
    }
  }
  if (obs.density == TRUE) {
    if (length(x$obs) > 1) {
      lines(d.obs, col = obs.col, lwd = obs.lwd)
    }
  }
}
leifeld/btergm documentation built on April 3, 2024, 4:49 a.m.