library(tigerstats)

A ggplot2 Addin, as a Package

The aim of the project is to write an R package that provides an Addin for interactively constructing a plot using the ggplot2 package. The user highlights the name of a data frame in an R script or R Markdown document, calls up the Addin, and uses it to create a plot based upon variables in the selected data frame. When the user exits the Addin, the ggplot2 code needed to produce the plot is emitted to the user's source document in place of the name of the data frame.
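As a rough illustration of that last step, here is a minimal sketch of how plot code might be assembled as text and then written back to the document. The helper `build_plot_code()` and its arguments are hypothetical, not part of this package or of ggplot2. The commented lines at the end use `getActiveDocumentContext()` and `insertText()` from the rstudioapi package; whether the finished Addin uses exactly these calls is an assumption.

```r
# Hypothetical helper: assemble ggplot2 code as text from the user's choices.
# data_name, x and y are names given as character strings; geom is the name
# of a ggplot2 geom function.
build_plot_code <- function(data_name, x, y = NULL, geom = "geom_point") {
  aes_args <- paste0("x = ", x)
  if (!is.null(y)) {
    aes_args <- paste0(aes_args, ", y = ", y)
  }
  paste0(
    "ggplot(data = ", data_name, ", mapping = aes(", aes_args, ")) +\n",
    "  ", geom, "()"
  )
}

code <- build_plot_code("m111survey", x = "fastest", y = "GPA")
cat(code)

# Inside a running RStudio session, the Addin could replace the highlighted
# data frame name with this text (assumes the rstudioapi package is installed):
# context <- rstudioapi::getActiveDocumentContext()
# rstudioapi::insertText(location = context$selection[[1]]$range, text = code)
```

Keeping the code generation in a plain function, separate from the RStudio calls, means it can be tested outside of RStudio.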

Examples of Basic Plots

Here are a few examples of the most basic plots we would like the user to be able to create:

A scatterplot:

ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(na.rm = TRUE)

A bar chart based on a single factor variable:

ggplot(data = m111survey, mapping = aes(x = seat)) +
  geom_bar(na.rm = TRUE)

A bar chart based on two factor variables:

ggplot(data = m111survey, mapping = aes(x = sex)) +
  geom_bar(na.rm = TRUE, mapping = aes(fill = weight_feel), position = "dodge")

A histogram:

ggplot(data = m111survey, mapping = aes(x = fastest)) +
  geom_histogram()

A density plot:

ggplot(data = m111survey, mapping = aes(x = fastest)) +
  geom_density()

A box-and-whiskers plot (crazy hard to make!):

ggplot(data = m111survey, mapping = aes(x = factor(0), y = fastest)) +
  geom_boxplot() + scale_x_discrete(breaks = NULL) + xlab("")

A box plot based upon a numerical and a factor variable:

ggplot(data = m111survey, mapping = aes(x = seat, y = fastest)) +
  geom_boxplot()

How about violin plots? (Density plots pasted together!)

ggplot(data = m111survey, mapping = aes(x = seat, y = fastest)) +
  geom_violin()

Titles and axis-labels:

We want the user to be able to set the title of the plot, and to label the axes, for example:

ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(na.rm = TRUE) +
  labs(title = "GPA vs. Speed", x = "fastest speed ever driven (mph)",
       y = "Grade-Point Average")
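When the Addin turns labels typed by the user into code, the text has to be quoted safely. The following is a minimal sketch, assuming a hypothetical helper `labs_code()` (not part of ggplot2 or of this package). It relies on base R's `deparse()` to quote and escape each string.

```r
# Hypothetical helper: build a labs() call as text from user-supplied labels.
# NULL or empty labels are skipped; deparse() quotes and escapes each string.
labs_code <- function(title = NULL, x = NULL, y = NULL) {
  labels <- list(title = title, x = x, y = y)
  keep <- vapply(labels, function(l) !is.null(l) && nzchar(l), logical(1))
  labels <- labels[keep]
  if (length(labels) == 0) {
    return("")
  }
  args <- paste0(names(labels), " = ", vapply(labels, deparse, character(1)))
  paste0("labs(", paste(args, collapse = ", "), ")")
}

cat(labs_code(title = "GPA vs. Speed", y = "Grade-Point Average"))
```

Skipping empty labels keeps the emitted code short: if the user sets only a title, only `title = ...` appears in the call.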

Grouping in a Plot

Within a plot we would like the user to be able to group by an additional variable, distinguishing the groups by a characteristic such as color or shape. For example, grouping by color:

ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(mapping = aes(color = sex), na.rm = TRUE)

Grouping by shape:

ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(mapping = aes(shape = sex), na.rm = TRUE)

Combining Geoms

It makes sense for some plots to be combined. For example, a density plot and a histogram:

ggplot(data = m111survey, mapping = aes(x = fastest)) +
  geom_histogram(aes(y = ..density..)) + geom_density()

Another option we should have is a combination of box plots, violin plots, and the individual values jittered:

ggplot(data = m111survey, mapping = aes(x = sex, y = fastest)) +
  geom_boxplot(fill = "grey", width = 0.5, outlier.size = 0) + 
  geom_violin(alpha = 0.5, fill = "burlywood") +
  geom_jitter(width = 0.2)

Scatter plots should be able to include regression lines:

ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(mapping = aes(color = sex), na.rm = TRUE) +
  geom_smooth(mapping = aes(color = sex), method = "lm", se = FALSE, na.rm = TRUE)

They should also be able to include loess curves:

ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(mapping = aes(color = sex), na.rm = TRUE) +
  geom_smooth(mapping = aes(color = sex), se = FALSE, na.rm = TRUE)

Both regression lines and loess curves can come with confidence bands, if you like:

ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(mapping = aes(color = sex), na.rm = TRUE) +
  geom_smooth(mapping = aes(color = sex), method = "lm", na.rm = TRUE)
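The three smoothing examples differ only in two choices: the method (regression line or loess curve) and whether the confidence band is drawn. A minimal sketch of how the Addin might turn those two choices into code follows; `smooth_code()` is a hypothetical helper, not a ggplot2 function.

```r
# Hypothetical helper: build a geom_smooth() call as text.
# method: "loess" (what ggplot2 picks by default for small data sets) or "lm".
# se: TRUE to draw the confidence band, FALSE to omit it.
smooth_code <- function(method = c("loess", "lm"), se = TRUE) {
  method <- match.arg(method)
  args <- character(0)
  if (method == "lm") {
    args <- c(args, "method = \"lm\"")
  }
  if (!se) {
    args <- c(args, "se = FALSE")
  }
  args <- c(args, "na.rm = TRUE")
  paste0("geom_smooth(", paste(args, collapse = ", "), ")")
}

cat(smooth_code("lm", se = FALSE), "\n")  # regression line, no band
cat(smooth_code("loess"), "\n")           # loess curve with band
```

Only arguments that differ from ggplot2's defaults are written out, so the emitted code stays close to what a person would type by hand.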

Facetting

We would like to be able to make separate panels for each of the values of a new factor variable. This is called facetting:

ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(mapping = aes(color = sex), na.rm = TRUE) +
  facet_grid(. ~ weight_feel)

We should be able to go the other way, too:

ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(mapping = aes(color = sex), na.rm = TRUE) +
  facet_grid(weight_feel ~ .)

If possible, we should be able to facet by two factor variables at once:

ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(mapping = aes(color = sex), na.rm = TRUE) +
  facet_grid(weight_feel ~ seat)
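All three facetting cases come down to filling in the two sides of the formula passed to `facet_grid()`, with a `.` standing in for an unused side. A minimal sketch of how the Addin might build that call as text, assuming a hypothetical helper `facet_code()`:

```r
# Hypothetical helper: build a facet_grid() call as text.
# rows and cols are variable names given as strings, or NULL when that
# direction is not used.
facet_code <- function(rows = NULL, cols = NULL) {
  if (is.null(rows) && is.null(cols)) {
    return("")
  }
  lhs <- if (is.null(rows)) "." else rows
  rhs <- if (is.null(cols)) "." else cols
  paste0("facet_grid(", lhs, " ~ ", rhs, ")")
}

cat(facet_code(cols = "weight_feel"), "\n")                 # panels side by side
cat(facet_code(rows = "weight_feel"), "\n")                 # panels stacked
cat(facet_code(rows = "weight_feel", cols = "seat"), "\n")  # grid of panels
```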

Even More?

There are many more possibilities for customization. It's unlikely that you will be able to cover all of them in one semester, but here are some to be thinking about.

When working with factor variables, we may want to choose our own names for the values. Thus, instead of:

ggplot(data = m111survey, mapping = aes(x = seat, y = fastest)) +
  geom_violin()

We might have:

ggplot(data = m111survey, mapping = aes(x = seat, y = fastest)) +
  geom_violin() +
  scale_x_discrete(labels = c("front", "middle", "back"))

You can change the title of a legend, and the position of that title relative to the rest of the legend:

ggplot(data = m111survey, mapping = aes(x = sex)) +
  geom_bar(mapping = aes(y = (..count..)/sum(..count..)*100, fill = weight_feel), 
           position = "dodge", 
           na.rm = TRUE) + ylab("Percentage") +
  scale_fill_discrete(guide = guide_legend(title = "Feeling about weight",
                                           title.position = "top"))

You can change the number of rows in a legend, the size of a legend title, the position of the legend itself, and more!

ggplot(data = m111survey, mapping = aes(x = sex)) +
  geom_bar(mapping = aes(y = (..count..)/sum(..count..)*100, fill = weight_feel), 
           position = "dodge", 
           na.rm = TRUE) + ylab("Percentage") +
  scale_fill_discrete(guide = guide_legend(title = "Feeling about weight", nrow = 2,
                                           title.position = "bottom")) +
  theme(legend.title = element_text(size = rel(0.9)),
        legend.position = "top")


homerhanumat/addinggplot2 documentation built on May 17, 2019, 4:50 p.m.