library(ggplot2) library(umdpsyc) library(learnr) library(gradethis) data(mtcars) data("exam_scores") knitr::opts_chunk$set(exercise.checker = gradethis::grade_learnr) knitr::opts_chunk$set(echo = FALSE) # learnr::tutorial_options(exercise.timelimit = 60)
One of the first steps that you might take once you have some data in R is to look at descriptive statistics. Remember, the types of descriptive statistics that you use will be different depending on whether it is categorical or numerical data. We will look at numerical descriptive statistics and graphs here; check out the categorical-descriptives
tutorial in order to learn about how to do descriptive statistics and graphs for categorical data.
In the examples for this tutorial, we will use the exam scores dataset.
head(exam_scores)
This is a fictional dataset, created for the purposes of the tutorials provided in the umdpsyc
package, but are based on real studies and real results. It contains data about exam scores as well as variables that might be associated with exam scores, such as anxiety levels, method of note-taking, and amount of sleep that the student got before taking the exam.
Many of the functions for getting numerical summaries of center and spread are intuitive in R.
|Function|Type of Plot|
|---|---|
|mean(x)
|Finds the mean of a vector x
|
| median(x)
| Finds the median of a vector x
.
| sd(x)
| Finds the standard deviation of a vector x
|
| variance(x)
| Finds the variance of a vector x
|
| IQR(x)
| Finds the IQR of a vector x
|
| summary(x)
| Finds the min, 1st quartile, median, mean, 3rd quartile and max of a vector x
, or of each column in a data frame x
|
In general, these are used on a vector, rather than a data frame, so you'll need to be careful to specify the correct column within the data frame. Remember that you can use $
notation to extract a specific column from the data frame.
mean(exam_scores$exam1) sd(exam_scores$exam1)
How would you find the median of the exam 2 scores?
grade_result( pass_if(~ identical(.result, median(exam_scores$exam2)), "Good Job!"), fail_if(~ TRUE) )
The basic functions for plotting numerical variables in R
are shown in the table below.
|Function|Type of Plot|
|---|---|
|hist(x)
|histogram|
| boxplot(x)
| boxplot |
| plot(x,y)
| scatterplot |
For example, to get a simple histogram, we can use the following command.
hist(exam_scores$exam1)
We can make adjustments to this graph by adding additional arguments to the hist
function. Arguments are the inputs that we provide a function. Above, we only provided one argument: the data to be plotted. We can also specify that we want a different number of bins, for example.
hist(exam_scores$exam1, breaks = 20) # The "breaks = 20" argument sets the number of bins
We can make boxplots using the boxplots
function.
boxplot(exam_scores$exam1)
Notice that these are very bare-bones plots, without titles or axes labels in some cases. Let's add a simple title to the boxplot above. We can do this using the main
argument.
boxplot(exam_scores$exam1, main = "Scores on Exam 1")
Note that the quotations around the text you want to add as the title are very crucial! The main
argument takes in a string as the input, so you need to provide it a string variable.
qplot
As we mentioned before in the intro-to-r
tutorial, there are external packages that contain various functions, data, and other tools to help us do more with R. One of these is the ggplot2
package, which contains many functions for generating visually pleasing graphs.
# Run install.packages if you do not have it installed already. # install.packages('ggplot2') library(ggplot2)
The ggplot2
package is incredibly rich and highly customizable for an experienced user. There is an entire book you can read through on using ggplot2
to create graphics, and the code can get quite complicated. We won't go over all of that in this short introduction to producing graphs. Rather, in this tutorial, we'll focus on one function, qplot
, which is a quick way of getting nice looking graphs without doing much work.
qplot
The idea behind qplot
is that it can be used to generate graphs very quickly using just one function. To do this, it tries to detect the type of data being plotted, and generates an appropriate graph based on this. Let's try using qplot
with exam scores.
qplot(exam_scores$exam1)
Instead of using breaks
as an argument to specify the number of bins we want for the histogram, we use bins
with qplot
. In addition, we can use the data
argument to make it easier to read and parse what is happening in the function. We show two ways of doing the same thing below, with the first commented out.
# This is the same thing # qplot(exam_scores$exam1, bins = 10) qplot(exam1, data = exam_scores, bins = 10)
This made a histogram, exactly like we might have wanted. If we wanted a boxplot instead, we can add an additional argument to specify this. Note that we specify y=
because we want to plot the graph on the y-axis. The qplot
function for boxplots only works for putting the continuous variable on the y-axis -- in the categorical-descriptives
, we'll go over side-by-side boxplots which can show the distributions by group in a categorical variable, and we'll have to specify x=
there. For now, just remember to specify y=
when using qplot
for boxplots.
qplot(y = exam1, data = exam_scores, geom = 'boxplot')
If we want to make a scatterplot, we can use it very similar to how we would use plot
.
# This does the same thing # qplot(exam_scores$exam1, exam_scores$exam2) qplot(exam1, exam2, data = exam_scores)
We can add a title like we did before using the main
argument.
qplot(exam1, exam2, data = exam_scores, main = "Scatterplot of scores on two exams")
Consider the exam_scores
dataset.
data(exam_scores) head(exam_scores)
What is the overall mean of all exam scores for exam 2?
grade_result( pass_if(~ identical(.result, mean(exam_scores$exam2)), "Good Job!"), fail_if(~ TRUE) )
What is the mean of exam 1 scores for students who took notes with paper?
grade_result( pass_if(~ identical(.result, mean(exam_scores$exam1[exam_scores$note_mode == 1])), "Good Job!"), fail_if(~ TRUE) )
Create a histogram of the exam 1 scores for all students who used a computer as their method of note-taking. Use 15 bins in the histogram, and use the base R method (not using qplot
).
grade_result( pass_if(~ identical(.result, hist(exam_scores[exam_scores$note_mode == 2,'exam1'], breaks = 15)), "Good Job!"), pass_if(~ identical(.result, hist(exam_scores$exam1[exam_scores$note_mode == 2], breaks = 15)), "Good Job!"), # pass_if(~ isTRUE(all.equal(.result, qplot(exam_scores[exam_scores$note_mode == 2,'exam1'], bins = 15))), "Good Job!"), fail_if(~ TRUE) )
submission_code <- function(UID){ httr::sha1_hash('numdesc',as.character(UID)) }
Generate your submission code by putting in your UID in the function below. For example, if your UID is 2
, then your code should look like submission_code(UID = 2)
# Replace the number below with your UID submission_code(2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.