library(learnr)
library(gradethis)
library(knitr)
gradethis::gradethis_setup()
tutorial_options(exercise.timelimit = 60)
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
# Ensure that library is loaded.
library(tidyverse)
# Ensure that the data is loaded for the remainder of this tutorial.
consumers <- UsingRTutorials::consumers
# Alternatives: readr::read_csv("http://82.196.4.233:3838/www/consumers.csv") or readr::read_csv("data/consumers.csv")
Dr. Christin Scholz, c.scholz@uva.nl
Assistant Professor Health & Persuasive Communication
ACHC Communication, Brain & Society Lab, www.cobras-lab.com
Dr. Wouter de Nooy
Senior Lecturer Research Methods, w.denooy@uva.nl
Tell us about yourself: study programme and programming experience.
At the end of this class, you will NOT know EVERYTHING about R.
We hope you will:
Book:
Hadley Wickham & Garrett Grolemund (2016). R for Data Science. O’Reilly.
Online version: http://r4ds.had.co.nz/. Different chapter numbering!
AFTER EXTENSIVE trial and error: online help is available for exercises (e.g. https://jrnold.github.io/r4ds-exercise-solutions/)
Additional materials: Canvas.
data.frame(Week = c(1, 1, 2, 2, 3, 3, 4, 4), Date = c("Jan 10", "Jan 12", "Jan 17", "Jan 19", "Jan 24", "Jan 26", "Jan 31", "Feb 2"), Topics = c( " A New Way of Working: Preface, Part I, Ch. 1-2 {1-4}", " Descriptive Statistics and Reports: Ch. 3-6, 21 {5-8, 27}", " Principles of Database Management: Ch. 7-10 {10-13}", " Handling Special Types of Data: Ch. 11-13 {14-16}", " Programming: Ch. 14-17 up to p. 322 {18-21 up to 21.3}", " Modelling: Sections 3-4 in _Help, My Collaborator Uses R!_ instead of Ch. 18-20 {23-25}", " Communicating results: Ch. (21) 22-24 {(27) 28-30}", " Project presentation and final report submission" )) %>% knitr::kable(format = "html", col.names = c("Week", "Date", "Topics & Chapters"), align = c("l", "l", "l")) %>% kableExtra::kable_styling(bootstrap_options = "striped", full_width = TRUE) %>% kableExtra::add_footnote(label = "{} = online chapter numbers", notation = "none")
Course Content (~1.5 hours)
Data Project Collaboration (~1.5 hours)
Thu: New problem set (due Sunday).
Use your study buddy, come prepared, and bring questions.
…an optional source of knowledge, advice, and support.
Why have a study buddy?
Recommendations
We will assign a study buddy to you.
Final grade:
Selection of exercises like those in R for Data Science.
Assigned to student on Thursday as Canvas assignment.
Submission deadline: following Sunday.
knitr::include_graphics("images/wolf.png")
Source: https://interaktiv.morgenpost.de/woelfe-in-deutschland/
A visualization requires a lot of data wrangling:
Some examples of data visualizations:
Books:
SCRUM is a framework for managing teamwork in a systematic and empirically informed manner.
What is SCRUM(-Light)?
Why SCRUM(-Light)?
The Canvas data project data sets module page contains the data and data descriptions.
Who will work on which complex dataset (and with whom)?
You installed the latest versions, didn’t you?
In addition, install this tutorial:
remotes package: install.packages("remotes") in the RStudio console.
learnr package: install.packages("learnr")
gradethis package: remotes::install_github("rstudio-education/gradethis")
learnr package.
DONE (gradethis).
remotes::install_github("WdeNooy/UsingRTutorials")
Answer no to the question "Do you want to install from sources the package which needs compilation?"
DONE (UsingRTutorials).
Now you can start the first tutorial: learnr::run_tutorial("session1", "UsingRTutorials") in the RStudio console.
Alternatively:
Your tutorial answers are saved until you press Start Over.
Use button to stop the tutorial.
We use functions in R to accomplish something.
function_name(argument_name = value, ...)
Function arguments specify the input for a function:
seq(from = 1, to = 5): arguments are matched by name.
seq(1, 5): arguments are matched by position.
We recommend named arguments.
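As a quick check of the two calling styles, the sketch below calls seq() once with named and once with positional arguments and confirms that the results are the same. It also shows one reason to prefer names: with named arguments, the order no longer matters. Only base R is used; the object names are made up for this illustration.

```r
# Named arguments: each value is matched to an argument by its name.
by_name <- seq(from = 1, to = 5)

# Positional arguments: values are matched by their order (from, to).
by_position <- seq(1, 5)

# Both calls return the same sequence.
identical(by_name, by_position)

# With names, the order of the arguments does not matter.
reordered <- seq(to = 5, from = 1)
identical(by_name, reordered)
```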
seq(1, 15, 2)
seq(from = 1, to = 15, by = 2)
gradethis::grade_code( incorrect = "Supply each argument name with an equals sign." )
seq(1, 15, length.out = )
seq(10, 1, 1)
We can assign the results of a function to a data object:
left_hand_object <- function_name(argument_name = value, ...)
If left-hand data object:
In the previous exercises, the function results were sent to the screen.
seq(from = 1, to = 20, by = 1)
seq(from = 10, to = 20, by = 1)
seq(from = 10, to = 10, by = 1)
my_output <- seq(from = 1, to = 20, by = 1)
my_Output <- seq(from = 10, to = 20, by = 1)
my_output <- seq(from = 10, to = 10, by = 1)
gradethis::grade_code()
gradethis::grade_result( pass_if(10, "The first data object created with the name `my_output` is overwritten by the data object last created because they have the same name. In this way, a data object may not contain the data that you initially intended it to contain. Be careful!"), fail_if(~ TRUE, "If you run 'my_output', you can see what it contains.") )
# Ensure that the data objects are available in the tutorial.
my_Output <- seq(10, 20, 1)
my_output <- seq(10, 10, 1)
rm()
rm(my_output, my_Output)
gradethis::grade_code( incorrect = "Names of data objects must be used only once and be separated by commas." )
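The points from the last two exercises can be replayed in one short sketch: object names are case-sensitive, assigning to an existing name replaces its contents, and rm() removes objects. The names my_numbers and my_Numbers are made up for this illustration.

```r
# Object names are case-sensitive: these are two different objects.
my_numbers <- seq(from = 1, to = 3, by = 1)
my_Numbers <- seq(from = 10, to = 12, by = 1)

# Assigning to an existing name silently replaces its contents.
my_numbers <- seq(from = 5, to = 5, by = 1)
my_numbers

# rm() removes one or more objects; separate the names with commas.
rm(my_numbers, my_Numbers)

# exists() reports whether an object with that name can still be found.
exists("my_numbers")
```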
The trend is towards visualizing data properties rather than tabulating them.
# Summary of brand awareness by gender and wom.
means <- consumers %>%
  group_by(`Gender` = gender, `Word of mouth` = wom) %>%
  summarise(`Average brand awareness` = round(mean(brand_aw), digits = 1))
# Numeric summary.
knitr::kable(means, "html") %>%
  kableExtra::kable_styling(font_size = 16, full_width = FALSE, position = "left")
means %>%
  ggplot() +
  geom_bar(
    aes(x = `Word of mouth`, y = `Average brand awareness`, fill = Gender),
    stat = "identity", position = "dodge"
  ) +
  theme_bw(base_size = 12) +
  scale_x_discrete(name = "Heard of brand by word of mouth") +
  scale_fill_discrete(name = "") +
  theme(legend.position = "top", plot.background = element_blank())
# see ggplot book
rm(means)
ggplot()
Philosophy: Grammar of Graphics (Leland Wilkinson)
We will use some (fake) consumers data about a particular brand and exposure to an advertising campaign for the brand. These are the variables:
tibble::tibble( `Variable name` = c("ad_expo", "wom", "gender", "brand_aw", "firstname"), `Variable Label` = c("Exposure to the campaign", "Heard about the brand through word of mouth", "Gender of the respondent", "Awareness of the brand", "Respondent's first name"), `Value Labels`= c("1 = No exposure; 10 = Max exposure", "yes, no", "female, male", "1 = Not aware; 10 = Max aware", "")) %>% knitr::kable(booktabs = TRUE)
# Standard ggplot plot with title and axis labels.
ggplot2::ggplot(data = consumers, mapping = aes(x = ad_expo, y = brand_aw)) +
  geom_point(mapping = aes(color = gender, shape = wom), size = 4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  labs(
    title = "Does brand awareness depend on exposure, word-of-mouth, and gender?",
    x = "Exposure to the campaign",
    y = "Brand awareness"
  )
We are going to recreate the above plot in steps.
ggplot( data = consumers )
ggplot( data = consumers, mapping = aes(x = ad_expo, y = brand_aw) )+ geom_point()
gradethis::grade_code()
# Standard ggplot plot with title and axis labels.
ggplot2::ggplot(data = consumers, mapping = aes(x = ad_expo, y = brand_aw)) +
  geom_point(mapping = aes(color = gender, shape = wom), size = 4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  labs(
    title = "Does brand awareness depend on exposure, word-of-mouth, and gender?",
    x = "Exposure to the campaign",
    y = "Brand awareness"
  )
# Copy the solution to the preceding exercise here as your starting point.
ggplot( data = consumers )
ggplot( data = consumers, mapping = aes(x = ad_expo, y = brand_aw) )+ geom_point( mapping = aes(shape = wom, color = gender), size = 4 )
gradethis::grade_code( correct = "Dot colour and shape are linked to a variable, so they must be inside an aes() function. In contrast, dot size is constant (one value for all dots), so it must be outside an aes() function." )
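The difference between mapping and setting can be tried on any data set. The sketch below uses R's built-in mtcars data instead of the consumers data (an assumption, made so that the example runs without the tutorial package): colour is mapped to a variable inside aes(), whereas size is set to a constant outside aes().

```r
library(ggplot2)

p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(
    # Mapped: colour varies with a variable, so it goes inside aes().
    mapping = aes(color = factor(cyl)),
    # Set: one constant size for all dots, so it stays outside aes().
    size = 4
  )
p
```

If size = 4 were moved inside aes(), ggplot2 would treat 4 as data and add a legend for it instead of simply enlarging the dots.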
# Standard ggplot plot with title and axis labels.
ggplot2::ggplot(data = consumers, mapping = aes(x = ad_expo, y = brand_aw)) +
  geom_point(mapping = aes(color = gender, shape = wom), size = 4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  labs(
    title = "Does brand awareness depend on exposure, word-of-mouth, and gender?",
    x = "Exposure to the campaign",
    y = "Brand awareness"
  )
# Copy the solution to the preceding exercise here as your starting point.
ggplot( data = consumers )
ggplot( data = consumers, mapping = aes(x = ad_expo, y = brand_aw) )+ geom_point( mapping = aes(shape = wom, color = gender), size = 4 ) + geom_smooth( method = "lm", se = FALSE, color = "black" )
gradethis::grade_code()
# Standard ggplot plot with title and axis labels.
ggplot2::ggplot(data = consumers, mapping = aes(x = ad_expo, y = brand_aw)) +
  geom_point(mapping = aes(color = gender, shape = wom), size = 4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  labs(
    title = "Does brand awareness depend on exposure, word-of-mouth, and gender?",
    x = "Exposure to the campaign",
    y = "Brand awareness"
  )
# Copy the solution to the preceding exercise here as your starting point.
ggplot( data = consumers )
ggplot( data = consumers, mapping = aes(x = ad_expo, y = brand_aw) )+ geom_point( mapping = aes(shape = wom, color = gender), size = 4 ) + geom_smooth( method = "lm", se = FALSE, color = "black" ) + labs( title = "Does brand awareness depend on exposure, word-of-mouth, and gender?", x = "Exposure to the campaign", y = "Brand awareness" )
gradethis::grade_code()
Different graphs for different groups of observations.
# Adaptation: word of mouth as facets instead of shape.
ggplot(data = consumers, mapping = aes(x = ad_expo, y = brand_aw)) +
  geom_point(mapping = aes(color = gender), size = 4) +
  geom_smooth(mapping = aes(color = gender), method = "lm", se = FALSE) +
  labs(
    title = "Does brand awareness depend on exposure, word-of-mouth, and gender?",
    x = "Exposure to the campaign",
    y = "Brand awareness"
  ) +
  facet_wrap(vars(wom))
# Copy the solution to the preceding exercise here as your starting point.
ggplot()
# Adaptation: word of mouth as facets instead of shape.
ggplot(data = consumers, mapping = aes(x = ad_expo, y = brand_aw)) +
  geom_point(mapping = aes(color = gender), size = 4) +
  geom_smooth(mapping = aes(color = gender), method = "lm", se = FALSE) +
  labs(
    title = "Does brand awareness depend on exposure, word-of-mouth, and gender?",
    x = "Exposure to the campaign",
    y = "Brand awareness"
  ) +
  facet_wrap(~wom)
gradethis::grade_code( correct = "You can also use facet_wrap(vars(wom))." )
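To see that the two facet notations lead to the same panels, the sketch below builds one plot with the formula notation and one with vars(). It uses the built-in mtcars data as a stand-in for the consumers data (an assumption, so that it runs without the tutorial package); the object names are made up.

```r
library(ggplot2)

base_plot <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

# Formula notation and vars() notation select the same facetting variable.
p_formula <- base_plot + facet_wrap(~cyl)
p_vars <- base_plot + facet_wrap(vars(cyl))

# Both plots have one panel per value of cyl.
p_formula
p_vars
```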
We may want to know who is the outlier in the plot, having an exceptionally low exposure score. Use geom_text() to add the participants' first names to the plot, as shown below. Carefully inspect the arguments for this geom.
# Adaptation: add firstname as label.
ggplot2::ggplot(data = consumers, mapping = aes(x = ad_expo, y = brand_aw)) +
  geom_point(mapping = aes(shape = wom, color = gender), size = 4) +
  geom_smooth(mapping = aes(color = gender), method = "lm", se = FALSE) +
  labs(
    title = "Does brand awareness depend on exposure, word-of-mouth, and gender?",
    x = "Exposure to the campaign",
    y = "Brand awareness"
  ) +
  geom_text(aes(label = firstname), nudge_y = 0.4, check_overlap = TRUE)
ggplot( data = consumers, mapping = aes(x = ad_expo, y = brand_aw) )+ geom_point( mapping = aes(shape = wom, color = gender), size = 4 ) + geom_smooth( mapping = aes(color = gender), method = "lm", se = FALSE, ) + labs( title = "Does brand awareness depend on exposure, word-of-mouth, and gender?", x = "Exposure to the campaign", y = "Brand awareness" ) + geom_text()
# Adaptation: add firstname as label.
ggplot(data = consumers, mapping = aes(x = ad_expo, y = brand_aw)) +
  geom_point(mapping = aes(shape = wom, color = gender), size = 4) +
  geom_smooth(mapping = aes(color = gender), method = "lm", se = FALSE) +
  labs(
    title = "Does brand awareness depend on exposure, word-of-mouth, and gender?",
    x = "Exposure to the campaign",
    y = "Brand awareness"
  ) +
  geom_text(aes(label = firstname), nudge_y = 0.4, check_overlap = TRUE)
gradethis::grade_code( correct = "The example plot uses `geom_text()` because the boxes created by `geom_label()` obscure the data." )
Actually, we can draw all kinds of shapes on the plot, for example, an arrow drawing attention to the extremely low exposure score.
# Adaptation: Add arrow pointing to extreme value.
ggplot2::ggplot(data = consumers, mapping = aes(x = ad_expo, y = brand_aw)) +
  geom_point(mapping = aes(shape = wom, color = gender), size = 4) +
  geom_smooth(mapping = aes(color = gender), method = "lm", se = FALSE) +
  labs(
    title = "Does brand awareness depend on exposure, word-of-mouth, and gender?",
    x = "Exposure to the campaign",
    y = "Brand awareness"
  ) +
  geom_segment(
    x = 1, xend = 1, y = 4, yend = 2,
    arrow = arrow(type = "closed")
  )
ggplot( data = consumers, mapping = aes(x = ad_expo, y = brand_aw) )+ geom_point( mapping = aes(shape = wom, color = gender), size = 4 ) + geom_smooth( mapping = aes(color = gender), method = "lm", se = FALSE ) + labs( title = "Does brand awareness depend on exposure, word-of-mouth, and gender?", x = "Exposure to the campaign", y = "Brand awareness" ) + geom_segment()
# Adaptation: Add arrow pointing to extreme value.
ggplot(data = consumers, mapping = aes(x = ad_expo, y = brand_aw)) +
  geom_point(mapping = aes(shape = wom, color = gender), size = 4) +
  geom_smooth(mapping = aes(color = gender), method = "lm", se = FALSE) +
  labs(
    title = "Does brand awareness depend on exposure, word-of-mouth, and gender?",
    x = "Exposure to the campaign",
    y = "Brand awareness"
  ) +
  geom_segment(
    x = 1, xend = 1, y = 4, yend = 2,
    arrow = arrow(type = "closed")
  )
gradethis::grade_code( correct = "And yes, you can change the color and size of the arrow just like you can change them for points and other geoms." )
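As a sketch of how the arrow's appearance can be changed, the example below sets a colour, a line width, and an arrowhead length for geom_segment(). It uses the built-in mtcars data and made-up coordinates (both assumptions, so that it runs without the consumers data). Note that linewidth requires ggplot2 3.4.0 or later; older versions use size for line width.

```r
library(ggplot2)

p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  geom_segment(
    # Constant start and end coordinates, so they are set outside aes().
    x = 5, xend = 5.3, y = 20, yend = 15.5,
    # Colour and line width are set like for any other geom.
    colour = "red",
    linewidth = 1,
    # The arrowhead itself is styled with arrow().
    arrow = arrow(length = unit(0.3, "cm"), type = "closed")
  )
p
```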
Once you master the ggplot2 package and other packages for creating plots (see the Fancy stuff part of this tutorial), you should start thinking about what you want to convey with a plot and whether the plot conveys your message in a clear and attractive way.
We will use the following criteria in this course:
The data visualization is sufficiently complex to tell a story. For example, it presents both a pattern (as a first impression) and deviations from this pattern (inviting reflection about the pattern). The deviations, however, should not be overwhelming because of too much information.
The data visualization is self-explanatory. It should be comprehensible if it is presented by itself. For example, use informative, readable labels.
The data visualization gives an accurate, not a biased view of the data. For example, sizes should accurately reflect quantity.
The data visualization uses graphic features (size, font type, colors, line styles) in such a way that the main parts are stressed and unimportant parts remain visually in the background. Justify your choices with comments in the code.
If you see a plot (or any other graphic):
fluidPage( fluidRow( column(2, radioButtons("radio", label = h3("Select a plot"), choices = list("Plot 1" = 1, "Plot 2" = 2, "Plot 3" = 3, "Plot 4" = 4, "Plot 5" = 5, "Plot 6" = 6), selected = 1), sliderInput("slider", label = "", min = 0, max = 20, value = 0, step = 1, ticks = FALSE, animate = TRUE ) ), column(10, plotOutput("regPlot") ) ) )
output$regPlot <- renderPlot({ g <- ggplot( data = consumers, aes(x = ad_expo, y = brand_aw) ) + scale_x_continuous(name = "Exposure", breaks = 1:10, limits = c(1, 10)) + scale_y_continuous(name = "Brand awareness", breaks = 1:10, limits = c(1, 10)) + theme_bw( base_size = 14 ) ## Create plot versions if (input$radio == 1) { # just regression line and confidence interval g <- g + geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "black") } else if (input$radio == 2) { # point size reflects brand_aw g <- g + geom_point(aes(size = brand_aw), color = "grey") + geom_smooth(method = "lm", formula = y ~ x, se = FALSE) } else if (input$radio == 3) { # outlier not visible (and regression line without outlier) g <- g + geom_point(size = 2, color = "grey") + geom_smooth(data = consumers[consumers$ad_expo > 1,], mapping = aes(x = ad_expo, y = brand_aw), method = "lm", formula = y ~ x, se = FALSE) + scale_x_continuous(limits = c(4, 10)) } else if (input$radio == 4) { # grey total regression line, red regression line without outlier g <- g + geom_point(size = 2, aes(color = ifelse(ad_expo > 1, "grey", "red")), show.legend = FALSE) + geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey", size = 2) + geom_smooth( data = consumers[consumers$ad_expo > 1,], mapping = aes(x = ad_expo, y = brand_aw), method = "lm", formula = y ~ x, se = FALSE, color = "red" ) + geom_text(x = 3.9, y = 4.3, label = "without outlier", size = 3, color = "red", alpha = 0.6) + scale_color_manual(values = c("red", "grey")) } else if (input$radio == 5) { # regression line per gender, with additional regression line for males without outlier g <- g + geom_point(aes(color = gender), size = 2) + geom_smooth(method = "lm", formula = y ~ x, se = FALSE, aes(color = gender)) + geom_smooth( data = consumers[consumers$ad_expo > 1 & consumers$gender == "male",], mapping = aes(x = ad_expo, y = brand_aw), method = "lm", formula = y ~ x, se = FALSE, color = "blue", linetype = "dashed" ) + 
geom_text(x = 4.1, y = 3.4, label = "without outlier", size = 3, color = "blue", alpha = 0.6) } else if (input$radio == 6) { # density conours and regression lines per gender g <- g + geom_density2d(aes(color = gender), adjust = 2) + geom_smooth(method = "lm", formula = y ~ x, se = FALSE, aes(color = gender)) } ## Show plot, depending on slider value if (input$slider == 0) { #show press button text ggplot(data = consumers) + geom_text(x = 0.5, y = 0.5, label = "Press the little play button", size = 12) } else if ((input$slider > 0 & input$slider < 4) | (input$slider > 10)) { # ask user to reset the slider if (input$slider == 20) { if (input$radio == 3 ) {hor_pos = 7} else {hor_pos = 5.5} g <- g + geom_text(x = hor_pos, y = 8.5, label = "Reset the slider to 0\nbefore you watch another plot.", size = 8) } #show plot g } else { # What did you see? ggplot(data = consumers) + geom_text(x = 0.5, y = 0.5, label = "What did you see?", size = 18) } })
Book on ggplot2:
Interactive training:
Your friend tried to create a non-stacked bar chart showing the proportion of females in the consumers data set who heard by word of mouth against the proportion who did not, as well as the proportion of males who heard and who did not hear by word of mouth.
ggplot(data consumers) + geom_bar( mapping = aes( x = wom, color = gender, position = "dodge" )
ggplot(data = consumers) + geom_bar( mapping = aes( x = wom, y = ..prop.., group = gender, fill = gender ), position = "dodge" )
gradethis::grade_code( correct = "To get proportions, you must indeed specify both the y argument (indicating that you want proportions) and the group argument (specifying which total to use for calculating proportions).", incorrect = "If you don't see the problems, build up the graph from zero, step by step." )
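To make the role of the group argument concrete, the sketch below draws the same kind of dodged bar chart for a tiny made-up data set (hypothetical values, not the consumers data) and then prints the proportions that ggplot2 calculated. It uses after_stat(prop), the newer notation for ..prop.., which assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical toy data, not the consumers data from the tutorial.
toy <- data.frame(
  gender = c("female", "female", "female", "female", "male", "male"),
  wom    = c("yes", "yes", "yes", "no", "yes", "no")
)

p <- ggplot(data = toy) +
  geom_bar(
    mapping = aes(
      x = wom,
      y = after_stat(prop), # newer notation for ..prop..
      group = gender,       # proportions are calculated within each gender
      fill = gender
    ),
    position = "dodge"
  )
p

# Inspect the proportions that ggplot2 calculated for the bars.
layer_data(p, 1)[, c("x", "group", "prop")]
```

In this toy data, three of the four females heard by word of mouth, so the proportions add up to 1 within each gender rather than across the whole data set.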
gganimate package
If you want to animate your ggplot plot, the package gganimate provides you with tools to create an animated GIF (with the gifski package) or a video (with the av package).
The code below creates an animated gif using Gapminder data on life expectancy, GDP per capita, and population size by country.
Note that it takes quite some time to generate the animation.
# Install the following packages if they haven't been installed.
library(gganimate)
library(gifski)
library(gapminder) # data used
# This code creates an animated ggplot.
g <- ggplot(gapminder::gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  # Here come the gganimate-specific bits.
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') +
  transition_time(year) +
  ease_aes('linear') +
  shadow_wake(wake_length = 0.2)
# And here, we show the animated plot.
gganimate::animate(
  plot = g, # gganimate plot to be shown
  nframes = 78, # 1 frame for each year from 1952 to 2007 plus 2x11 additional frames for start and end
  renderer = gifski_renderer( # save as animated GIF
    file = "gapminder.gif",
    loop = TRUE
  ),
  start_pause = 12, # first frame shows 12 times
  end_pause = 12, # last frame shows 12 times
  rewind = FALSE # do not roll back to the start
)
If you have a look at the gapminder data (e.g., with View(gapminder)), you will see that the data are available for 1952, 1957, 1962, 1967, and so on. The gganimate animation creates frames for the years in between. In a way, the data for the in-between years are fabricated; they may give a wrong view of reality.
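What "fabricated" means here can be illustrated with plain linear interpolation in base R. This is only a sketch of the idea, not gganimate's actual code, and the life expectancy values are hypothetical.

```r
# Hypothetical life expectancy values for two observed years (not real data).
observed <- data.frame(
  year    = c(1952, 1957),
  lifeExp = c(30, 35)
)

# Linear interpolation for every year from 1952 to 1957.
in_between <- approx(
  x = observed$year,
  y = observed$lifeExp,
  xout = 1952:1957
)

# Only the first and last values were measured; the rest are interpolated.
data.frame(year = in_between$x, lifeExp = in_between$y)
```

The four values for 1953 to 1956 look like data, but they only reflect the assumption that change between the two measurements was perfectly gradual.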
Movement is fascinating but it can also be frustrating if the user cannot pause or determine the speed of the animation. Evaluate the pros and cons of animations critically.
ggplotly() in the plotly package
The plotly library is designed for creating interactive graphics. It has its own language for creating graphs, but for the ggplot user it provides the ggplotly() function to change a ggplot plot into an interactive plotly plot.
The plotly library offers the option to zoom, select items in the graph, and see additional information about the items in the graph.
If you carefully position your cursor on a dot, the respondent's first name will pop up (works better if you select the Compare data on hover option).
# these packages have been installed by UsingRTutorials
library(plotly)
library(gapminder) # contains the data used here
# Step 1: create a ggplot
g <- ggplot(data = consumers, aes(x = ad_expo, y = brand_aw)) +
  geom_point(aes(color = gender), size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, aes(color = gender)) +
  geom_smooth(
    data = consumers[consumers$ad_expo > 1 & consumers$gender == "male",],
    mapping = aes(x = ad_expo, y = brand_aw),
    method = "lm", formula = y ~ x, se = FALSE,
    color = "blue", linetype = "dashed"
  ) +
  geom_text(
    aes(label = firstname),
    alpha = 0 # trick: make labels invisible (transparent)
  ) +
  scale_x_continuous(name = "Exposure", breaks = 1:10, limits = c(1, 10)) +
  scale_y_continuous(name = "Brand awareness", breaks = 1:10, limits = c(1, 10)) +
  theme_bw()
# Step 2: Turn the ggplot into a plotly plot and use plotly options.
ggplotly(g, tooltip = c("text"), dynamicTicks = TRUE)
There are (limited) options for animation, as demonstrated in the plot below. Run the code to see the plot.
# Step 1: create a ggplot and use a variable to define the frames of the animation
p <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = continent)) +
  geom_point(aes(size = pop, frame = year, ids = country)) +
  scale_x_log10() # this changes the scale to a log scale, so very large differences are compressed
# Step 2: Turn the ggplot into a plotly plot.
ggplotly(p)
p <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = continent)) +
  geom_point(aes(size = pop, frame = year, ids = country)) +
  scale_x_log10() # this changes the scale to a log scale, so very large differences are compressed
ggplotly(p, tooltip = "country")
gradethis::grade_code()
The shiny package is the RStudio contribution to interactive R products. Actually, this tutorial is made with Shiny and it contains a Shiny app, namely, the animated plots used in the Evaluating a plot section.
Here is the code for these plots. You cannot run the code here because we cannot start a Shiny app from within a Shiny app (this tutorial). If you create a new Shiny app in RStudio (File > New File > Shiny Web App) and copy and paste the below code into the Shiny app file (replacing all existing contents), you can run the app from RStudio (use the Run App button).
# load the shiny package
library(shiny)
# load ggplot2 and the data, which the tutorial normally provides
library(ggplot2)
consumers <- UsingRTutorials::consumers

# first part of the app: the User Interface (ui)
ui <- fluidPage(
  fluidRow( # the first (and only) row in the interface
    column(2, # the first (left) column, width 2 out of 12
      radioButtons("radio",
        label = h3("Select a plot"),
        choices = list(
          "Plot 1" = 1, # label and value if selected
          "Plot 2" = 2, "Plot 3" = 3, "Plot 4" = 4, "Plot 5" = 5, "Plot 6" = 6
        ),
        selected = 1
      ),
      # only a slider can be animated
      sliderInput("slider",
        label = "", min = 0, max = 20, value = 0, step = 1,
        ticks = FALSE, animate = TRUE
      )
    ),
    column(10, # the second (right) column, width 10 out of 12
      plotOutput("regPlot") # containing the plot named regPlot
    )
  )
)

# second part of the app: the R code
server <- function(input, output, session) {
  # create a plot named regPlot to be shown in the output
  output$regPlot <- renderPlot({
    # the basic (empty) plot
    g <- ggplot(data = consumers, aes(x = ad_expo, y = brand_aw)) +
      scale_x_continuous(name = "Exposure", breaks = 1:10, limits = c(1, 10)) +
      scale_y_continuous(name = "Brand awareness", breaks = 1:10, limits = c(1, 10)) +
      theme_bw(base_size = 14)
    ## Create a plot version for each radio option
    if (input$radio == 1) {
      # just add regression line and confidence interval to the empty plot
      g <- g +
        geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "black")
    } else if (input$radio == 2) {
      # point size reflects brand_aw
      g <- g +
        geom_point(aes(size = brand_aw), color = "grey") +
        geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
    } else if (input$radio == 3) {
      # outlier not visible (and regression line without outlier)
      g <- g +
        geom_point(size = 2, color = "grey") +
        geom_smooth(
          data = consumers[consumers$ad_expo > 1,],
          mapping = aes(x = ad_expo, y = brand_aw),
          method = "lm", formula = y ~ x, se = FALSE
        ) +
        scale_x_continuous(limits = c(4, 10))
    } else if (input$radio == 4) {
      # grey total regression line, red regression line without outlier
      g <- g +
        geom_point(
          size = 2,
          aes(color = ifelse(ad_expo > 1, "grey", "red")),
          show.legend = FALSE
        ) +
        geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey", size = 2) +
        geom_smooth(
          # trick: base R to create a subset of all consumers, omitting the outlier
          data = consumers[consumers$ad_expo > 1,],
          mapping = aes(x = ad_expo, y = brand_aw),
          method = "lm", formula = y ~ x, se = FALSE, color = "red"
        ) +
        geom_text(x = 3.9, y = 4.3, label = "without outlier", size = 3, color = "red", alpha = 0.6) +
        scale_color_manual(values = c("red", "grey"))
    } else if (input$radio == 5) {
      # regression line per gender, with additional regression line
      # for males without outlier
      g <- g +
        geom_point(aes(color = gender), size = 2) +
        geom_smooth(method = "lm", formula = y ~ x, se = FALSE, aes(color = gender)) +
        geom_smooth(
          data = consumers[consumers$ad_expo > 1 & consumers$gender == "male",],
          mapping = aes(x = ad_expo, y = brand_aw),
          method = "lm", formula = y ~ x, se = FALSE,
          color = "blue", linetype = "dashed"
        ) +
        geom_text(x = 4.1, y = 3.4, label = "without outlier", size = 3, color = "blue", alpha = 0.6)
    } else if (input$radio == 6) {
      # density contours and regression lines per gender
      g <- g +
        geom_density2d(aes(color = gender), adjust = 2) +
        geom_smooth(method = "lm", formula = y ~ x, se = FALSE, aes(color = gender))
    }
    # Show plot, depending on slider value
    # This is the animation trick: slider values range from 0 to 20.
    # If it is 0, a text is shown; if it is 1 to 3 or above 10,
    # the plot is shown; a text is shown from 4 to 10; and
    # a text is added to the plot if the slider is 20.
    if (input$slider == 0) {
      # show press button text
      ggplot(data = consumers) +
        geom_text(x = 0.5, y = 0.5, label = "Press the little play button", size = 12)
    } else if ((input$slider > 0 & input$slider < 4) | (input$slider > 10)) {
      # ask user to reset the slider
      if (input$slider == 20) {
        if (input$radio == 3) {hor_pos <- 7} else {hor_pos <- 5.5}
        g <- g +
          geom_text(
            x = hor_pos, y = 8.5,
            label = "Reset the slider to 0\nbefore you watch another plot.",
            size = 8
          )
      }
      # show plot
      g
    } else {
      # What did you see?
      ggplot(data = consumers) +
        geom_text(x = 0.5, y = 0.5, label = "What did you see?", size = 18)
    }
  })
}

# the command to create and run the app
shinyApp(ui, server)
Check out the Shiny demos gallery for inspiration. Start simple!
If you want to create a Shiny app, first create your plot with ggplot(), then add it to the Shiny app.
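A minimal sketch of this two-step workflow is given below. It first defines the plot in an ordinary function that can be tested outside Shiny, and then wraps it in a small app. The built-in mtcars data, the helper make_plot(), and the input and output names size and scatter are all assumptions made for this illustration.

```r
library(shiny)
library(ggplot2)

# Step 1: create and check the plot outside Shiny first.
# (mtcars is a stand-in for your own data.)
make_plot <- function(point_size) {
  ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
    geom_point(size = point_size)
}

# Step 2: wrap the plot in a minimal app with one input and one output.
ui <- fluidPage(
  sliderInput("size", label = "Dot size", min = 1, max = 6, value = 3),
  plotOutput("scatter")
)

server <- function(input, output, session) {
  output$scatter <- renderPlot({
    make_plot(point_size = input$size)
  })
}

# Create the app object; start it with runApp(app) or the Run App button.
app <- shinyApp(ui, server)
```

Keeping the plotting code in its own function means that you can debug the plot with make_plot(3) in the console before any Shiny code is involved.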
Plots are very useful for getting a first idea of your data.
As a first step, use ggplot plots to describe individual variables and relations between variables in (one of) your Data Project file(s).
We use the tidyverse approach to R programming.
knitr::include_graphics("images/tidyverse2.png")
Source: https://r-unimelb.gitbook.io/rbook/into-the-tidyverse/the-tidyverse
tidyverse suite of packages:
Cheat sheets!
NOTE: The tidyverse package(s) are loaded in the tutorials, so they can be used.
Time to start exploring (visualizing) your Data Project data,
The tutorial must be able to find this data set, so you have to make the Data Project directory your working directory:
Copy the setwd() command from the RStudio console to the code box below. Now, you can work with the data set within this code box.
# Set your Data Project directory as working directory.
# (Copy the setwd() command from RStudio here.)
# Load your data in the object myData.
myData <- read_csv("filename.csv")
# Have a look at the variables in your data.
str(myData)
# Create your first plot.
ggplot()
# Create a second plot.
ggplot()
# And a third?
ggplot()
Note: As long as you do not press the Start Over button, the code (and plots) are preserved in this tutorial, so you can use the code later.