plot_replace_missing: plot_replace_missing

View source: R/plot_replace_missing.R

plot_replace_missing R Documentation

plot_replace_missing

Description

Plots the count of missing values for each variable of a data frame and optionally replaces those values.

A ggplot2 bar plot of numeric missing value counts is produced along with the option to replace the values. TODO: Be able to count/replace non-numeric missing values of a data frame.

If the argument 'replace_fun' is NULL then only a bar chart showing the missing value count for each variable is returned.

Usage

plot_replace_missing(
  df,
  variables = NULL,
  replace_fun = NULL,
  miss_values = NULL,
  title = NULL,
  subtitle = NULL,
  center_titles = FALSE,
  x_title = "Variables",
  y_title = "Missing Counts",
  rot_x_tic_angle = 0,
  bar_fill = "gray",
  bar_color = "black",
  bar_alpha = 1,
  bar_lwd = 0.7,
  bar_width = NULL,
  y_limits = NULL,
  y_major_breaks = waiver(),
  do_coord_flip = FALSE,
  order_bars = NULL,
  bar_labels = FALSE,
  bar_label_sz = 6,
  bar_label_color = "black"
)

Arguments

df

The source data frame with numeric and character variables.

variables

A character vector of numeric variable names from 'df' to be included in the plot and possible value replacement.

replace_fun

A character string or function that sets the aggregate function for replacing missing values. Acceptable string values are "mean", "median", "locf" (last observation carried forward), and "nocb" (next observation carried backward). The parameter can also be a user-defined function that accepts a vector of the non-missing values of a column (as determined by 'miss_values') and returns a single replacement value. See the Examples section.

miss_values

A vector of numeric and character values that are treated as missing in addition to NA and NaN. For example, a vector with "na", "N/A", and 999.

title

A string that sets the plot title.

subtitle

A string that sets the plot subtitle.

center_titles

A logical which if TRUE centers both the 'title' and 'subtitle'.

x_title

A string that sets the x axis title. The default is "Variables". If NULL then the x axis title does not appear.

y_title

A string that sets the y axis title. The default is "Missing Counts". If NULL then the y axis title does not appear.

rot_x_tic_angle

A numeric that sets the angle of rotation for the x tic label. When x tic labels are long,

bar_fill

A string that sets the fill color for the bars.

bar_color

A string that sets the outline color for the bars.

bar_alpha

A numeric that sets the alpha component to 'bar_color'.

bar_lwd

A numeric that sets the outline thickness of the bars.

bar_width

A numeric that sets the width of the bars.

y_limits

A 2-element numeric vector that sets the minimum and maximum for the y axis. Use NA to refer to the existing minimum and maximum.

y_major_breaks

A numeric vector or function that defines the exact major tic locations along the y axis.

do_coord_flip

A logical which if TRUE will flip the x and y axes.

order_bars

A string that orders the bars in a specific direction. Acceptable values are "asc" or "desc".

bar_labels

A logical which if TRUE will label each bar with its value.

bar_label_sz

A numeric that sets the size of the bar labels.

bar_label_color

A string that sets the color of the bar labels.
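
The replacement strategies named under 'replace_fun' can be illustrated with a minimal base R sketch. This is not the package's internal code. The helpers 'locf', 'nocb' and 'midrange' are hypothetical names introduced only for this illustration, and the sketch assumes missing values are already coded as NA.

```r
# Illustration only: base R versions of the strategies named under
# 'replace_fun'. 'locf', 'nocb' and 'midrange' are hypothetical helpers,
# not functions exported by the package.
x <- c(NA, 2, NA, NA, 5, NA)

# "locf": carry the last non-missing value forward.
# A leading NA has nothing before it and stays NA.
locf <- function(v) {
  for (i in seq_along(v)[-1]) {
    if (is.na(v[i])) v[i] <- v[i - 1]
  }
  v
}

# "nocb": carry the next non-missing value backward
# (locf applied to the reversed vector). A trailing NA stays NA.
nocb <- function(v) rev(locf(rev(v)))

locf_x <- locf(x)   # NA 2 2 2 5 5
nocb_x <- nocb(x)   # 2 2 5 5 5 NA

# "mean": every missing entry receives the mean of the non-missing values.
mean_x <- replace(x, is.na(x), mean(x, na.rm = TRUE))

# User-defined aggregate: takes the non-missing values and returns one value.
midrange <- function(v) (max(v) + min(v)) / 2
custom_x <- replace(x, is.na(x), midrange(x[!is.na(x)]))
```

Note that "locf" and "nocb" depend on row order, while "mean", "median" and a user-defined aggregate give every missing entry in a column the same value.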

Value

Returns a named list with:

  1. "missing_plot" – a ggplot2 plot object where additional aesthetics may be added.

  2. "replacement_df" – a data.table copy of 'df' with missing values replaced if 'replace_fun' is not NULL.

Examples

library(ggplot2)
library(data.table)
library(mlbench)
library(RplotterPkg)

# Load the data and convert every column except the first to numeric.
data("Soybean", package = "mlbench")
for(i in 2:ncol(Soybean)){
  Soybean[,i] <- as.numeric(Soybean[,i])
}

columns_of_interest <- colnames(Soybean)[2:ncol(Soybean)]
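# Seed a few missing values and missing-value markers for the demonstration.
# Assigning a string such as "N/A" coerces the 'leaves' column to character.
Soybean$date[[3]] <- NA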
Soybean$date[[4]] <- 99
Soybean$leaves[[4]] <- NA
Soybean$leaves[[5]] <- "N/A"
Soybean$leaves[[6]] <- "na"
Soybean$leaves[[7]] <- NA
Soybean$leaves[[8]] <- NaN

# User-defined replacement: half the range of the non-missing values.
missing_val_fun <- function(x){
  xx <- as.numeric(x)
  return((max(xx) - min(xx))/2)
}

soybean_missing_lst <- RregressPkg::plot_replace_missing(
  df = Soybean,
  variables = columns_of_interest,
  replace_fun = missing_val_fun,
  miss_values = c("N/A", "na", 99),
  title = "Count of Missing Values",
  subtitle = "mlbench::Soybean data set",
  x_title = "Variable",
  y_title = "Count of Missing Values",
  bar_lwd = 0.6,
  bar_color = "white",
  bar_labels = TRUE,
  bar_label_sz = 3,
  do_coord_flip = TRUE,
  order_bars = "asc"
)
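
The way a mixed 'miss_values' vector such as c("N/A", "na", 99) can flag entries is sketched below in base R. This only illustrates the idea and is not necessarily how the function matches values internally. The vector 'leaves' is made-up data, and matching with %in% is an assumption of this sketch.

```r
# Illustration only: flag entries that are NA or listed in 'miss_values'.
# Matching with %in% is an assumption of this sketch, not the package's code.
miss_values <- c("N/A", "na", 99)   # mixed types are coerced to character
leaves <- c("1", NA, "N/A", "na", "99", "0")   # made-up column values

# An entry is missing if it is NA or matches one of the markers.
is_missing <- is.na(leaves) | leaves %in% miss_values
n_missing <- sum(is_missing)   # 4

# The remaining entries are what a replacement function would receive.
# They are still character here, hence the as.numeric() conversion.
non_missing <- as.numeric(leaves[!is_missing])   # 1 0
```

Because the column is character once markers such as "N/A" are present, a user-defined 'replace_fun' like 'missing_val_fun' above converts its input with as.numeric() before aggregating.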


deandevl/RregressPkg documentation built on Feb. 5, 2025, 12:11 p.m.