Lab4Package: Educational Tool to Assist With Visualizing Variables in ggplot2

Documented in graph_code

#' Code and Info for Graphs
#'
#' graph_code is an educational function for learning how to write ggplot2 code for the different types of graphs returned by the what_graph function {GGHelper}.
#'
#' @param df A data frame.
#' @param type The name (in quotes) of the graph of interest.
#' @param x The first variable of interest (will go on the x axis of the graph)
#' @param y An additional variable of interest (will go on the y axis of the graph)
#' @param color An additional variable of interest (will be used as the color argument of the graph)
#'
#' @return graph_code returns a printed output which contains the code to create the graph of interest with variable names in their correct places, along with information about when to use the graph of interest.
#'
#' @details
#' Before this function is used, the columns of interest must have the correct data type.
#' Specifically, a categorical or discrete variable needs to have a data type of 'factor' and a continuous variable needs to have a data type of 'numeric'.
#'
#' Additionally, the variables passed to the function must be columns of the data frame passed as 'df'.
#'
#' @examples
#' graph_code(iris, "Scatter Plot", x = Sepal.Length, y = Sepal.Width)
#'
#' @import dplyr
#'
#' @export

graph_code <- function(df, type, x, y = NULL, color = NULL){

  y <- deparse(substitute(y))
  color <- deparse(substitute(color))

  # 1 variable - continuous

  if (type == "Histogram"){
    cont <- df %>%
      pull({{x}})
    if(is.numeric(cont)){
      df <- deparse(substitute(df))
      x <- deparse(substitute(x))
      return(cat(sep = "", "Code for histogram:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ")) + \n   geom_histogram()", "\n\nThis graph should be used when analyzing a distribution of a singular continuous variable. \nInside the geom_histogram function you can set the range of each bar of the histogram with 'binwidth ='\nYou can also set the number of bars with 'bins = '"))
    }
    else{stop("The 'x' argument for a histogram needs to have a numeric data type")}

  } else if (type == "Density Plot"){
    cont <- df %>%
      pull({{x}})
    if(is.numeric(cont)){
      df <- deparse(substitute(df))
      x <- deparse(substitute(x))
      return(cat(sep = "","Code for density plot:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ")) + \n   geom_density()", "\n\nThis graph should be used when analyzing a distribution of a singular continuous variable. \nIt is the continuous and smoothed version of a histogram."))
    }
    else{stop("The 'x' argument for a density plot needs to have a numeric data type")}

  } else if (type == "Normal Probability Plot"){
    cont <- df %>%
      pull({{x}})
    if(is.numeric(cont)){
      df <- deparse(substitute(df))
      x <- deparse(substitute(x))
      return(cat(sep = "", "Code for normal probability plot:\n\n", df, " %>% \n   ggplot(mapping = aes(sample = ", x, ")) + \n   geom_qq() + \n   geom_qq_line()", "\n\nThis graph should be used when analyzing if a singular continuous variable follows a normal distribution. \nIf the plotted points seem to follow a straight diagonal line, the continuous variable most likely follows a normal distribution."))
    }
    else{stop("The 'sample' or 'x' argument for a normal probability plot needs to have a numeric data type")}


    # 1 variable - discrete

  } else if (type == "Bar Chart"){
    discrete <- df %>%
      pull({{x}})
    if(is.factor(discrete)){
      df <- deparse(substitute(df))
      x <- deparse(substitute(x))
      return(cat(sep = "", "Code for bar chart:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ")) + \n   geom_bar()", "\n\nThis graph should be used to represent the distribution of a categorical variable. \nEach bar represents the count of each category."))
    }
    else{stop("The 'x' argument for a bar chart needs to have a factor data type")}


    # 2 variables - both continuous

  } else if (type == "Scatter Plot"){
    if(y == "NULL"){
      stop("A scatter plot should take a continuous variable for argument 'y'")
    }
    else {
      cont1 <- df %>%
        pull({{x}})
      # Unquote the captured name so a column literally named 'y' cannot shadow it
      cont2 <- df %>%
        pull(!!y)
      if(is.numeric(cont1) & is.numeric(cont2)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for scatter plot:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", y = ", y, ")) + \n   geom_point()", "\n\nThis graph should be used to observe the relationship between two continuous variables. \nIf the points overlap, you can shake them up and better see them using geom_point(position = 'jitter'). \nYou can also better see the distribution of data on each axis by typing geom_point() + geom_rug(position = 'jitter'). \nLastly, to better visualize the relationship between x and y with a fitted line, use geom_point() + geom_smooth()."))
      }
      else{stop("A scatter plot takes two numeric continuous variables for arguments 'x' and 'y'")}
    }


    # 2 variables - 1 discrete 1 continuous

  } else if (type == "Column Chart"){
    if(y == "NULL"){
      stop("A column chart should take a continuous variable for argument 'y'")
    }
    else {
      discrete <- df %>%
        pull({{x}})
      cont <- df %>%
        pull(!!y)
      if(is.factor(discrete) & is.numeric(cont)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for column chart:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", y = ", y, ")) + \n   geom_col()", "\n\nThis graph should be used to compare numerical values across different categories of a discrete variable. \nIt is different than a bar chart, because geom_bar only counts the occurrences of categories, \nwhile geom_col represents the sum of values of another variable for each category."))
      }
      else{stop("A column chart should take a discrete variable for argument 'x' and a continuous variable for argument 'y'")}
    }

  } else if (type == "Stacked Histogram"){
    if(color == "NULL"){
      stop("A stacked histogram should take a discrete variable for argument 'color'")
    }
    else {
      cont <- df %>%
        pull({{x}})
      discrete <- df %>%
        pull(!!color)
      if(is.numeric(cont) & is.factor(discrete)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for stacked histogram:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", fill = ", color, ")) + \n   geom_histogram()", "\n\nThis graph should be used when analyzing a distribution of a singular continuous variable.\nUnlike a normal histogram, it allows you to see how the distribution differs across categories of a discrete variable. \nInside the geom_histogram function you can set the range of each bar of the histogram with 'binwidth ='\nYou can also set the number of bars with 'bins = ' "))
      }
      else{stop("A stacked histogram should take a continuous variable for argument 'x' and a discrete variable for argument 'color'")}
    }

  } else if (type == "Box Plot"){
    if(y == "NULL"){
      stop("A box plot should take a continuous variable for argument 'y'")
    }
    else {
      discrete <- df %>%
        pull({{x}})
      cont <- df %>%
        pull(!!y)
      if(is.factor(discrete) & is.numeric(cont)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for box plot:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", y = ", y, ")) + \n   geom_boxplot()", "\n\nThis graph should be used when analyzing the distribution of a numeric variable across categories of a discrete variable. \nFor each category, a box plot displays the min, max, median, interquartile range, and outliers of the numeric variable."))
      }
      else{stop("A box plot should take a discrete variable for argument 'x' and a continuous variable for argument 'y'")}
    }

  } else if (type == "Violin Plot"){
    if(y == "NULL"){
      stop("A violin plot should take a continuous variable for argument 'y'")
    }
    else {
      discrete <- df %>%
        pull({{x}})
      cont <- df %>%
        pull(!!y)
      if(is.factor(discrete) & is.numeric(cont)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for violin plot:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", y = ", y, ")) + \n   geom_violin()", "\n\nThis graph should be used when analyzing the distribution of a numeric variable across categories of a discrete variable. \nUnlike a box plot, instead of focusing on summary statistics, a violin plot displays the density of the numeric variable."))
      }
      else{stop("A violin plot should take a discrete variable for argument 'x' and a continuous variable for argument 'y'")}
    }

  } else if (type == "Multi Density Plot"){
    if(color == "NULL"){
      stop("A multi density plot should take a discrete variable for argument 'color'")
    }
    else {
      cont <- df %>%
        pull({{x}})
      discrete <- df %>%
        pull(!!color)
      if(is.numeric(cont) & is.factor(discrete)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for multi density plot:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", color = ", color, ")) + \n   geom_density()", "\n\nThis graph should be used when analyzing a distribution of a singular continuous variable across multiple categories.\nAdding a color argument allows you to compare multiple smoothed density plots at once."))
      }
      else{stop("A multi density plot should take a continuous variable for argument 'x' and a discrete variable for argument 'color'")}
    }


    # 2 variables - both discrete

  } else if (type == "Stacked Bar Chart"){
    if(color == "NULL"){
      stop("A stacked bar chart should take a discrete variable for argument 'color'")
    }
    else {
      discrete1 <- df %>%
        pull({{x}})
      discrete2 <- df %>%
        pull(!!color)
      if(is.factor(discrete1) & is.factor(discrete2)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for stacked bar chart:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", fill = ", color, ")) + \n   geom_bar()", "\n\nThis graph should be used to represent the distribution of a categorical variable across another categorical variable. \n- Each bar represents the count of each category. \n- Each color represents the distribution of the counts for the other categorical variable of interest. \nUnlike a grouped bar chart, the colors are stacked rather than next to each other."))
      }
      else{stop("A stacked bar chart should take discrete variables for arguments 'x' and 'color'")}
    }

  } else if (type == "Grouped Bar Chart"){
    if(color == "NULL"){
      stop("A grouped bar chart should take a discrete variable for argument 'color'")
    }
    else {
      discrete1 <- df %>%
        pull({{x}})
      discrete2 <- df %>%
        pull(!!color)
      if(is.factor(discrete1) & is.factor(discrete2)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for grouped bar chart:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", fill = ", color, ")) + \n   geom_bar(position = 'dodge')", "\n\nThis graph should be used to represent the distribution of a categorical variable across another categorical variable. \nUnlike a stacked bar chart, the colors are next to each other rather than on top of each other."))
      }
      else{stop("A grouped bar chart should take discrete variables for arguments 'x' and 'color'")}
    }

  } else if (type == "Bubble Plot"){
    if(y == "NULL"){
      stop("A bubble plot should take a discrete variable for argument 'y'")
    }
    else {
      discrete1 <- df %>%
        pull({{x}})
      discrete2 <- df %>%
        pull(!!y)
      if(is.factor(discrete1) & is.factor(discrete2)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for bubble plot:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", y = ", y, ")) + \n   geom_count()", "\n\nThis graph should be used in order to compare the number of observations at each combination of two discrete variables."))
      }
      else{stop("A bubble plot should take discrete variables for arguments 'x' and 'y'")}
    }


    # 3 variables - all continuous

  } else if (type == "Gradient Scatter Plot"){
    if(y == "NULL" || color == "NULL"){
      stop("A gradient scatter plot should take continuous variables for arguments 'y' and 'color'")
    }
    else {
      cont1 <- df %>%
        pull({{x}})
      cont2 <- df %>%
        pull(!!y)
      cont3 <- df %>%
        pull(!!color)
      if(is.numeric(cont1) & is.numeric(cont2) & is.numeric(cont3)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for gradient scatter plot:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", y = ", y, ", color = ", color, ")) + \n   geom_point()", "\n\nThis graph should be used to observe the relationship between three continuous variables. \nAdding the color argument to a scatter plot allows you to visualize a 3 dimensional relationship. \nIf the points overlap, you can shake them up and better see them using geom_point(position = 'jitter'). \nYou can also better see the distribution of data on each axis by typing geom_point() + geom_rug(position = 'jitter'). \nLastly, to better see the relationship between x and y with a fitted line, use geom_point() + geom_smooth()."))
      }
      else{stop("A gradient scatter plot should take continuous variables for arguments 'x', 'y', and 'color'")}
    }

  } else if (type == "Grouped Scatter Plot"){
    if(y == "NULL" || color == "NULL"){
      stop("A grouped scatter plot should take a continuous variable for argument 'y' and a discrete variable for argument 'color'")
    }
    else {
      cont1 <- df %>%
        pull({{x}})
      cont2 <- df %>%
        pull(!!y)
      discrete <- df %>%
        pull(!!color)
      if(is.numeric(cont1) & is.numeric(cont2) & is.factor(discrete)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for grouped scatter plot:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", y = ", y, ", color = ", color, ")) + \n   geom_point()", "\n\nThis graph should be used to display the relationship between two quantitative variables and one categorical variable. \n- The result of this graph is color coded points on a scatter plot. \nIf the points overlap, you can shake them up and better see them using geom_point(position = 'jitter'). \nYou can also better see the distribution of data on each axis by typing geom_point() + geom_rug(position = 'jitter'). \nUsing geom_point() + geom_smooth() will generate a smoothing line for each color."))
      }
      else{stop("A grouped scatter plot should take continuous variables for arguments 'x' and 'y', and a discrete variable for argument 'color'")}
    }

  } else if (type == "Stacked Column Chart"){
    if(y == "NULL" || color == "NULL"){
      stop("A stacked column chart should take a continuous variable for argument 'y' and a discrete variable for argument 'color'")
    }
    else {
      discrete1 <- df %>%
        pull({{x}})
      cont <- df %>%
        pull(!!y)
      discrete2 <- df %>%
        pull(!!color)
      if(is.factor(discrete1) & is.numeric(cont) & is.factor(discrete2)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for stacked column chart:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", y = ", y, ", fill = ", color, ")) + \n   geom_col()", "\n\nThis graph should be used to compare numerical values across different categories of two discrete variables. \nIt is different than a stacked bar chart, because geom_bar only counts the occurrences of categories, \nwhile geom_col represents the sum of values of another variable for each category. \nEach color represents the distribution of the values for the other categorical variable of interest."))
      }
      else{stop("A stacked column chart should take discrete variables for arguments 'x' and 'color', and a continuous variable for argument 'y'")}
    }

  } else if (type == "Grouped Box Plot"){
    if(y == "NULL" || color == "NULL"){
      stop("A grouped box plot should take a continuous variable for argument 'y' and a discrete variable for argument 'color'")
    }
    else {
      discrete1 <- df %>%
        pull({{x}})
      cont <- df %>%
        pull(!!y)
      discrete2 <- df %>%
        pull(!!color)
      if(is.factor(discrete1) & is.numeric(cont) & is.factor(discrete2)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for grouped box plot:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", y = ", y, ", fill = ", color, ")) + \n   geom_boxplot()", "\n\nThis graph should be used when analyzing the distribution of a numeric variable across categories of two discrete variables. \nFor each combination of categories, a box plot displays the min, max, median, interquartile range, and outliers of the numeric variable. \nOne discrete variable is distinguished by labels on the x axis, and the other is distinguished by color."))
      }
      else{stop("A grouped box plot should take discrete variables for arguments 'x' and 'color', and a continuous variable for argument 'y'")}
    }

  } else if (type == "Grouped Violin Plot"){
    if(y == "NULL" || color == "NULL"){
      stop("A grouped violin plot should take a continuous variable for argument 'y' and a discrete variable for argument 'color'")
    }
    else {
      discrete1 <- df %>%
        pull({{x}})
      cont <- df %>%
        pull(!!y)
      discrete2 <- df %>%
        pull(!!color)
      if(is.factor(discrete1) & is.numeric(cont) & is.factor(discrete2)){
        df <- deparse(substitute(df))
        x <- deparse(substitute(x))
        return(cat(sep = "", "Code for grouped violin plot:\n\n", df, " %>% \n   ggplot(mapping = aes(x = ", x, ", y = ", y, ", fill = ", color, ")) + \n   geom_violin()", "\n\nThis graph should be used when analyzing the distribution of a numeric variable across categories of two discrete variables. \nUnlike a box plot, instead of focusing on summary statistics, a violin plot displays the density of the numeric variable. \nIn a grouped violin plot, one discrete variable is distinguished by labels on the x axis, and the other by color."))
      }
      else{stop("A grouped violin plot should take discrete variables for arguments 'x' and 'color', and a continuous variable for argument 'y'")}
    }
  } else{stop("Name of graph not recognized in 'type' argument. Please input the type of graph in quotes. \nUse the what_graph function to return possible values for 'type'.")}
}
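
The @details section above requires discrete variables to be stored as 'factor' and continuous variables as 'numeric'. The following is a minimal sketch of checking and converting a column before calling graph_code, using only base R and the built-in mtcars data set. The helper column_kind is hypothetical and not part of Lab4Package; it only mirrors the is.factor() and is.numeric() checks used in the function body.

```r
# Hypothetical helper (not part of Lab4Package): classifies a column the same
# way the is.factor()/is.numeric() checks in graph_code would.
column_kind <- function(df, col) {
  values <- df[[col]]
  if (is.factor(values)) {
    "discrete"
  } else if (is.numeric(values)) {
    "continuous"
  } else {
    "unsupported"
  }
}

cars <- mtcars                      # built-in data set; cyl is stored as numeric
before <- column_kind(cars, "cyl")
cars$cyl <- factor(cars$cyl)        # convert so cyl can serve as a discrete variable
after <- column_kind(cars, "cyl")
cat("before:", before, "| after:", after, "\n")
```

After the conversion, a call such as graph_code(cars, "Box Plot", x = cyl, y = mpg) should pass the type checks, since cyl is then a factor and mpg is numeric.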
rachael-ryan/Lab4Package documentation built on June 11, 2022, 7:20 a.m.
R/graph_code.R
Defines functions graph_code