library(tigerstats)  # provides the m111survey data frame used below
library(ggplot2)
The aim of the project is to write an R package that provides an RStudio Addin for interactively constructing a plot with the ggplot2 package. The user highlights the name of a data frame in an R script or R Markdown document, calls up the Addin, and uses it to build a plot from variables in the selected data frame. When the user exits the Addin, the ggplot2 code needed to produce the plot is emitted to the user's source document in place of the name of the data frame.
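To make the emitting step concrete, here is a minimal sketch of how the Addin might assemble that code as a string once the user has made their choices. The function `build_plot_code()` and its arguments are hypothetical, not part of ggplot2 or any existing package. A real Addin would also need an interactive interface (for example a Shiny gadget) and something like `rstudioapi::insertText()` to write the result over the highlighted name; neither is shown here.

```r
# Hypothetical helper (not from any package): build the ggplot2 code the Addin
# would emit in place of the highlighted data frame name.
#   data_name  - the highlighted text, e.g. "m111survey"
#   aesthetics - named character vector, e.g. c(x = "fastest", y = "GPA")
#   layers     - character vector of layer calls, e.g. "geom_point(na.rm = TRUE)"
build_plot_code <- function(data_name, aesthetics, layers) {
  stopifnot(is.character(aesthetics), !is.null(names(aesthetics)),
            length(layers) >= 1)
  # e.g. "x = fastest, y = GPA"
  aes_args <- paste(names(aesthetics), aesthetics, sep = " = ", collapse = ", ")
  base <- sprintf("ggplot(data = %s, mapping = aes(%s))", data_name, aes_args)
  # Join the base call and the layers with " + ", as a user would type them
  paste(c(base, layers), collapse = " + ")
}

code <- build_plot_code("m111survey", c(x = "fastest", y = "GPA"),
                        "geom_point(na.rm = TRUE)")
cat(code, "\n")
# prints: ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) + geom_point(na.rm = TRUE)

# The string is ordinary R source, so it can be syntax-checked without
# running it (or even loading ggplot2):
invisible(parse(text = code))
```

Generating plain text keeps the emitted code readable and editable by the user. Building calls with `bquote()` and converting them with `deparse()` is an alternative, but `deparse()` may wrap long calls across several lines.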
Here are a few examples of the most basic plots we would like the user to be able to create:
A scatterplot:
ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) + geom_point(na.rm = TRUE)
A bar chart based on a single factor variable:
ggplot(data = m111survey, mapping = aes(x = seat)) + geom_bar(na.rm = TRUE)
A bar chart based on two factor variables:
ggplot(data = m111survey, mapping = aes(x = sex)) + geom_bar(na.rm = TRUE, mapping = aes(fill = weight_feel), position = "dodge")
A histogram:
ggplot(data = m111survey, mapping = aes(x = fastest)) + geom_histogram()
A density plot:
ggplot(data = m111survey, mapping = aes(x = fastest)) + geom_density()
A box-and-whiskers plot (crazy hard to make!):
ggplot(data = m111survey, mapping = aes(x = factor(0), y = fastest)) + geom_boxplot() + scale_x_discrete(breaks = NULL) + xlab("")
A box plot based upon a numerical and a factor variable:
ggplot(data = m111survey, mapping = aes(x = seat, y = fastest)) + geom_boxplot()
How about violin plots? (Each violin is a density plot pasted together with its mirror image!)
ggplot(data = m111survey, mapping = aes(x = seat, y = fastest)) + geom_violin()
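The examples so far pair each combination of variable types with particular geoms. One way the Addin could decide which plot types to offer is a small lookup keyed on those types; the sketch below simply encodes the list above and nothing more. `suggest_geoms()` is a hypothetical name, and treating character columns like factors is our own assumption.

```r
# Hypothetical helper: which of the basic geoms above fit the chosen variables?
# x and y are columns (vectors); y = NULL means only one variable was chosen.
suggest_geoms <- function(x, y = NULL) {
  kind <- function(v) {
    if (is.numeric(v)) return("numeric")
    if (is.factor(v) || is.character(v)) return("factor")  # assumption: text is categorical
    "other"
  }
  key <- paste(kind(x), if (is.null(y)) "none" else kind(y))
  switch(key,
    "numeric none"    = c("geom_histogram", "geom_density", "geom_boxplot"),
    "factor none"     = "geom_bar",
    "numeric numeric" = "geom_point",
    "factor numeric"  = c("geom_boxplot", "geom_violin"),
    "factor factor"   = "geom_bar",  # second factor mapped to fill, position = "dodge"
    character(0))                    # combination not covered by the examples
}

# Made-up data for illustration
toy <- data.frame(speed = c(100, 120, 90),
                  seat  = factor(c("front", "back", "front")))
suggest_geoms(toy$speed)            # histogram, density, boxplot
suggest_geoms(toy$seat, toy$speed)  # boxplot, violin
```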
We want the user to be able to set the title of the plot, and to label the axes, for example:
ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(na.rm = TRUE) +
  labs(title = "GPA vs. Speed", x = "fastest speed ever driven (mph)", y = "Grade-Point Average")
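Because titles and axis labels are free text typed by the user, the Addin has to quote them before emitting them. Here is a minimal sketch with a hypothetical `build_labs_code()`: blank fields are dropped, and base R's `encodeString()` adds the surrounding double quotes and escapes any quotes or backslashes in the text.

```r
# Hypothetical helper: turn title/axis text from the Addin into a labs() call.
build_labs_code <- function(title = "", x = "", y = "") {
  fields <- c(title = title, x = x, y = y)
  fields <- fields[nzchar(fields)]             # skip fields the user left blank
  if (length(fields) == 0) return(character(0))  # nothing to add to the plot
  # encodeString() wraps each value in double quotes and escapes inner ones
  values <- encodeString(fields, quote = '"')
  sprintf("labs(%s)", paste(names(fields), values, sep = " = ", collapse = ", "))
}

cat(build_labs_code("GPA vs. Speed", "fastest speed ever driven (mph)",
                    "Grade-Point Average"), "\n")
# prints: labs(title = "GPA vs. Speed", x = "fastest speed ever driven (mph)", y = "Grade-Point Average")
cat(build_labs_code(title = 'The "fastest" speed'), "\n")
# prints: labs(title = "The \"fastest\" speed")
```

Returning `character(0)` for an empty form lets the result be appended to a list of layers without leaving a dangling `+`.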
Within a plot we would like the user to be able to group the data by an additional variable, distinguishing the groups by an aesthetic such as color or shape. For example, grouping by color:
ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) + geom_point(mapping = aes(color = sex), na.rm = TRUE)
Grouping by shape:
ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) + geom_point(mapping = aes(shape = sex), na.rm = TRUE)
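Grouping works because `color = sex` sits inside `aes()`, which maps a variable to the aesthetic; a constant such as `fill = "grey"` in the boxplot-and-violin example is set outside `aes()`. The Addin has to keep those two cases apart when it writes a layer. Here is a hedged sketch with a hypothetical `build_geom_code()`; using `deparse()` to write the constants is our own choice, not something the project specifies.

```r
# Hypothetical helper: write one layer call.
#   mapped - named character vector of variables, placed inside aes()
#   fixed  - named list of constants, placed outside aes()
build_geom_code <- function(geom, mapped = character(0), fixed = list(),
                            na_rm = TRUE) {
  args <- character(0)
  if (length(mapped) > 0) {
    aes_args <- paste(names(mapped), mapped, sep = " = ", collapse = ", ")
    args <- c(args, sprintf("mapping = aes(%s)", aes_args))
  }
  if (length(fixed) > 0) {
    # deparse() quotes strings ("grey") and leaves numbers bare (0.5)
    consts <- vapply(fixed, deparse, character(1))
    args <- c(args, paste(names(fixed), consts, sep = " = "))
  }
  if (na_rm) args <- c(args, "na.rm = TRUE")
  sprintf("%s(%s)", geom, paste(args, collapse = ", "))
}

build_geom_code("geom_point", mapped = c(color = "sex"))
# -> geom_point(mapping = aes(color = sex), na.rm = TRUE)
build_geom_code("geom_violin", fixed = list(alpha = 0.5, fill = "burlywood"),
                na_rm = FALSE)
# -> geom_violin(alpha = 0.5, fill = "burlywood")
```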
It makes sense for some plots to be combined. For example, a density plot and a histogram:
ggplot(data = m111survey, mapping = aes(x = fastest)) + geom_histogram(aes(y = after_stat(density))) + geom_density()
Another option we should have is a combination of boxplots, violin plots, and the individual values, jittered:
ggplot(data = m111survey, mapping = aes(x = sex, y = fastest)) +
  geom_boxplot(fill = "grey", width = 0.5, outlier.shape = NA) +  # hide outliers; the jittered points show them
  geom_violin(alpha = 0.5, fill = "burlywood") +
  geom_jitter(width = 0.2, height = 0)  # jitter horizontally only, so the values stay exact
Scatter plots should be able to include regression lines:
ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(mapping = aes(color = sex), na.rm = TRUE) +
  geom_smooth(mapping = aes(color = sex), method = "lm", se = FALSE, na.rm = TRUE)
They should also be able to include loess curves:
ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(mapping = aes(color = sex), na.rm = TRUE) +
  geom_smooth(mapping = aes(color = sex), method = "loess", se = FALSE, na.rm = TRUE)
Both regression lines and loess curves can come with confidence bands, if you like:
ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) +
  geom_point(mapping = aes(color = sex), na.rm = TRUE) +
  geom_smooth(mapping = aes(color = sex), method = "lm", na.rm = TRUE)
We would like to be able to make a separate panel for each level of an additional factor variable. This is called faceting:
ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) + geom_point(mapping = aes(color = sex), na.rm = TRUE) + facet_grid(. ~ weight_feel)
We should be able to go the other way, too, with one row of panels per level:
ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) + geom_point(mapping = aes(color = sex), na.rm = TRUE) + facet_grid(weight_feel ~ .)
If possible, we should be able to facet by two factor variables at once:
ggplot(data = m111survey, mapping = aes(x = fastest, y = GPA)) + geom_point(mapping = aes(color = sex), na.rm = TRUE) + facet_grid(weight_feel ~ seat)
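The three faceting variants differ only in which side of the formula holds a variable, with `.` standing in for "no variable". A minimal sketch of how the Addin could emit them, using a hypothetical `build_facet_code()`:

```r
# Hypothetical helper: build a facet_grid() call from an optional row variable
# and an optional column variable; "." fills whichever side is empty.
build_facet_code <- function(rows = NULL, cols = NULL) {
  if (is.null(rows) && is.null(cols)) return(character(0))  # no faceting
  lhs <- if (is.null(rows)) "." else rows
  rhs <- if (is.null(cols)) "." else cols
  sprintf("facet_grid(%s ~ %s)", lhs, rhs)
}

build_facet_code(cols = "weight_feel")   # -> facet_grid(. ~ weight_feel)
build_facet_code(rows = "weight_feel")   # -> facet_grid(weight_feel ~ .)
build_facet_code("weight_feel", "seat")  # -> facet_grid(weight_feel ~ seat)
```

ggplot2 3.0.0 and later also accept `facet_grid(rows = vars(weight_feel), cols = vars(seat))`; the formula form is used here only because it matches the examples above.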
There are many more possibilities for customization. It's unlikely that you will be able to cover all of them in one semester, but here are some to think about.
When working with factor variables, we may want to choose our own names for the levels. For example, instead of:
ggplot(data = m111survey, mapping = aes(x = seat, y = fastest)) + geom_violin()
We might have:
ggplot(data = m111survey, mapping = aes(x = seat, y = fastest)) + geom_violin() + scale_x_discrete(labels = c("front", "middle", "back"))
You can change the title of a legend, and the position of that title relative to the rest of the legend:
ggplot(data = m111survey, mapping = aes(x = sex)) +
  geom_bar(mapping = aes(y = after_stat(count / sum(count) * 100), fill = weight_feel),
           position = "dodge", na.rm = TRUE) +
  ylab("Percentage") +
  scale_fill_discrete(guide = guide_legend(title = "Feeling about weight", title.position = "top"))
You can change the number of rows in a legend, the size of a legend title, the position of the legend itself, and more!
ggplot(data = m111survey, mapping = aes(x = sex)) +
  geom_bar(mapping = aes(y = after_stat(count / sum(count) * 100), fill = weight_feel),
           position = "dodge", na.rm = TRUE) +
  ylab("Percentage") +
  scale_fill_discrete(guide = guide_legend(title = "Feeling about weight", nrow = 2,
                                           title.position = "bottom")) +
  theme(legend.title = element_text(size = rel(0.9)), legend.position = "top")
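Both legend examples put percentages on the y axis by dividing each bar's count by `sum(count)`. That sum runs over every bar in the plot, so each bar shows a percentage of all respondents, not a percentage within its own sex. If the Addin offers a percentage option, it may be worth making that distinction visible to the user. The base-R sketch below computes the two quantities side by side; the toy data and level names are made up for illustration and are not taken from `m111survey`.

```r
# Made-up data standing in for sex and weight_feel (not the m111survey values)
sex  <- factor(c("female", "female", "male", "male", "male"))
feel <- factor(c("about_right", "overweight", "about_right", "about_right",
                 "underweight"))
counts <- table(sex, feel)

# What count / sum(count) * 100 gives: each cell as a percentage of everyone
pct_of_total <- 100 * prop.table(counts)

# The alternative: percentages within each sex (each row sums to 100)
pct_within_sex <- 100 * prop.table(counts, margin = 1)

round(pct_of_total, 1)
round(pct_within_sex, 1)
```

If within-group percentages are wanted, one option is to compute them before plotting and draw them with `geom_col()`.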