The aim of this lab session is to teach you the basics of the R programming language. We will follow the first chapters of Hands-On Programming with R by Garrett Grolemund, a free and open-source resource.
Those interested in more advanced topics are invited to consult Advanced R by Hadley Wickham (another free and open-source resource) or the official R documentation.
To introduce you to the language, we will develop a small project: we want to simulate the rolling of two dice.
The first question to ask ourselves is therefore how to represent a die. A common die has six faces, each showing a number from $1$ to $6$.
We need to understand how to tell R that we wish to save some numbers. First of all, notice that we can type commands directly into the R console.
For instance, we can try some basic operations such as
2 + 2
You will notice the [1] that appears before the result. It is R's way of telling you the position of the first element shown on that line. In the case of a single number, there is only one element.
Type some basic operations in the R console and observe the results.
Write the R code required to add two plus two:
This changes when the output spans several lines, for instance with a long vector such as
91:115
Try it!
If you type a command that is incomplete, R will prompt you with a + sign until a complete expression is formed. For instance, typing 15 / and pressing Enter makes R wait for the rest of the division:
> 15 /
+ 3
[1] 5
On the other hand, R returns an error when it cannot interpret the expression you typed:
3 % 5
Let's play some magic
To see if you've got the hang of how R works, let us play a game:
We are now ready to take the next step. We know how to manipulate numbers with basic operations; now we want to tell R to save the results.
To do so, we use the <- operator.
a <- 42
a
die <- 1:6
die
You can name objects however you please, but bear in mind that names are case sensitive and that there are things you cannot do: a name cannot start with a number, nor can it contain special symbols such as ^, !, $, @, +, -, / or *. Also notice that R will overwrite objects without asking you, so that
a <- 42
a
a <- 49
a
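The naming rules above can be checked directly in the console. This is a minimal sketch: face and Face are illustrative names chosen here, and parse(), which reads R code from a string without running it, is used only to show that a name starting with a number is rejected as a syntax error.

```r
# Names are case sensitive: face and Face are two different objects
face <- 1
Face <- 2
face == Face   # FALSE

# A name cannot start with a number: parse() reports a syntax error,
# which tryCatch() turns into a short string instead of stopping
result <- tryCatch(
  parse(text = "2dice <- 1"),
  error = function(e) "syntax error"
)
result         # "syntax error"
```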
Notice that the objects you create are added to the list in the Environment pane of RStudio.
To see which objects you have created, you can type ls().
ls()
Everybody loves algebra and I am sure you do too. Let us see some basic operations.
die - 1
die / 2
die^2      # R also accepts die ** 2
exp(die)
You have probably noticed that something seemingly bizarre has happened: R has expanded each scalar into a vector of the same length as die and then computed the operation element by element.
To see it, let us consider what happens when we compute die * die.
die * die
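The expansion of scalars described above can be made explicit. In this sketch, rep() builds by hand the vector of six 1s that R creates behind the scenes, and identical() confirms that both computations give the same result.

```r
die <- 1:6
# Subtracting a scalar...
with_scalar <- die - 1
# ...is the same as subtracting a vector of six 1s
with_vector <- die - rep(1, length(die))
identical(with_scalar, with_vector)   # TRUE
with_scalar                           # 0 1 2 3 4 5
```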
What happens if we use two vectors of different lengths? R repeats the shorter vector until it matches the length of the longer one. This feature is called vector recycling. Let us see an example.
die + 1:2
R issues a warning whenever the length of the longer vector is not a multiple of the length of the shorter one.
die + 1:5
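Recycling can also be spelled out explicitly. The sketch below compares die + 1:2 with the same sum written using rep(), then uses tryCatch() to capture the warning produced by die + 1:5 as a string, so that the message can be inspected.

```r
die <- 1:6
# 1:2 is recycled to 1 2 1 2 1 2 before the addition
identical(die + 1:2, die + rep(1:2, times = 3))   # TRUE

# 6 is not a multiple of 5: R still computes a result, but warns.
# tryCatch() captures the warning message instead of printing it.
message_text <- tryCatch(
  die + 1:5,
  warning = function(w) conditionMessage(w)
)
message_text
```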
The inner and outer products can be obtained with the %*% and %o% operators.
die %*% die   # inner product, returned as a 1 x 1 matrix
die %o% die   # outer product, a 6 x 6 matrix
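To connect these operators with their definitions, the sketch below checks that the inner product is the sum of the element-wise products (1 + 4 + 9 + 16 + 25 + 36 = 91), that each entry of the outer product is a product of two elements, and that %o% gives the same matrix as the base function outer().

```r
die <- 1:6
inner <- die %*% die        # 1 x 1 matrix
inner[1, 1]                 # 91
sum(die * die)              # also 91: a sum of element-wise products
outer_prod <- die %o% die   # 6 x 6 matrix
dim(outer_prod)             # 6 6
outer_prod[2, 3]            # 6, i.e. die[2] * die[3]
identical(outer_prod, outer(die, die))  # TRUE
```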
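R comes with many built-in functions that are useful for a variety of tasks.
round(3.14159265359)  # 3
mean(die)             # 3.5
factorial(5)          # 120
log10(10)             # 1
log(1)                # 0: log() is the natural logarithm
When we nest function calls, they are evaluated from the inside out.
round(mean(die + 1))
# The same computation, step by step from the inside out:
die + 1                # 2 3 4 5 6 7
mean(die + 1)          # 4.5
round(mean(die + 1))   # 4 (R rounds a half to the even digit)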
This is very convenient. We are interested in selecting a random face of a die, and the function sample is what we need.
sample(die)
sample(die)
sample(die)
It looks like something is out of place: we obtain a complete shuffle of the die. Maybe we need to have a closer look at the sample function.
To better understand a function, say foo, we use the help function to read its documentation. The args function returns the arguments of foo.
args(sample)
help(sample, help_type = "text")
sample   # typing a function's name without parentheses prints its code
Notice that sample takes four arguments:
- x: the vector from which we want to sample;
- size: the number of elements we wish to sample;
- replace: whether to sample with replacement, i.e. whether every draw can pick any element of x, including those already drawn;
- prob: a vector of probabilities, one for each element of x, giving the probability of obtaining it.
Let us use the function to obtain one random element of the die:
sample(die, size = 1)
Notice what we have done here: we have called a function, passing it two arguments:
- die
- size = 1
In R, arguments can be passed either by name, setting them explicitly (as we did with size = 1), or by position, in the order in which the function expects them (die was matched to x because it comes first).
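The two ways of passing arguments are interchangeable. In this sketch, set.seed() (with the arbitrary seed 123) makes the random draw reproducible, so that a call by position and a call by name, with the arguments even given in a different order, can be shown to return the same face.

```r
die <- 1:6
set.seed(123)                          # fix the random numbers for the comparison
by_position <- sample(die, 1)          # die is matched to x, 1 to size, by position
set.seed(123)
by_name <- sample(size = 1, x = die)   # both arguments matched by name
identical(by_position, by_name)        # TRUE
```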
Also, the output of args(sample) shows that the last two arguments already have a value: replace = FALSE and prob = NULL. These are default values: if you do not specify these arguments, R uses the defaults.
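Overriding a default changes the behaviour of the function. This sketch shows that with the default replace = FALSE we cannot draw more values than die contains (tryCatch() stores the error message in err instead of stopping), while setting replace = TRUE allows ten draws with repeated faces.

```r
die <- 1:6
# With the default replace = FALSE, 10 draws from 6 faces is impossible
err <- tryCatch(
  sample(die, size = 10),
  error = function(e) conditionMessage(e)
)
err
# Overriding the default lets the same face appear several times
rolls <- sample(die, size = 10, replace = TRUE)
length(rolls)   # 10
```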
As an aside, notice that any text following a # is a comment and will be ignored by R. This gives you an opportunity to document your code: use it!
We are now ready to write our first function. The function should return the sum of two rolled dice.
The syntax to write a function is quite straightforward.
my_function <- function() {}
In this case, we have called the function my_function. The function constructor is followed by parentheses (), which contain the arguments. Between the curly braces {} we put the body of the function: the set of instructions the function should execute.
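As a minimal illustration of this syntax before moving to dice, here is a hypothetical function add_one() (not used elsewhere in the lab) with a single argument x and a one-line body.

```r
# add_one() is a hypothetical example function
add_one <- function(x) {   # x is the only argument
  x + 1                    # body: the value of the last line is returned
}
add_one(41)                # 42
```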
Let us try to write the first bit.
roll <- function() {
  # This function defines a die with 6 faces and prints two values
  die <- 1:6
  extraction <- sample(die, size = 2)
  print(extraction)
}
roll()
roll()
roll()
roll()
Pretty neat, right?
Try it again, this time selecting a single value!
Each time a function is called, R creates a new environment in which to perform the computation. What is an environment? We may think of it as a box where computations are made. Each box knows which bigger box contains it, and looks there for any object it cannot find inside itself. At the same time, with the usual <- assignment, a function can only modify the objects in its own box.
So, how do we pass the values we have computed to the outer box? Easy! Just put the value on the last line of the function: R returns the value of the last expression evaluated in the body.
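The box picture can be tested with a small experiment. In this sketch, counter and increment() are hypothetical names: the function reads counter from the outer box, creates its own local copy when it assigns to it, and hands its result back through its last line, while the outer counter stays unchanged.

```r
counter <- 0
increment <- function() {
  # counter is not in the function's box yet, so R looks it up in the
  # outer box and finds 0; the assignment then creates a new, local
  # counter inside the function's box
  counter <- counter + 1
  counter   # last line: its value is returned to the caller
}
result <- increment()
result    # 1
counter   # 0: the outer counter has not been modified
```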
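Try to complete the function so that it returns the sum of the two values, using the sum function. Notice that we now also set replace = TRUE: the two dice are rolled independently, so both can show the same face.
roll <- function() {
  # This function defines a die with 6 faces and rolls it twice
  die <- 1:6
  extraction <- sample(die, size = 2, replace = TRUE)
  # Complete here: return the sum of the two values
}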
roll <- function() {
  # This function defines a die with 6 faces, rolls it twice
  # and returns the sum of the two values
  die <- 1:6
  extraction <- sample(die, size = 2, replace = TRUE)
  sum(extraction)
}
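Why does replace = TRUE matter for dice? A simple simulation makes the difference visible. This sketch (the seed 2024 and the 10000 repetitions are arbitrary choices) draws two faces many times with and without replacement and records how often they are equal: without replacement a double can never occur, while with replacement it happens about once every six rolls.

```r
die <- 1:6
set.seed(2024)   # arbitrary seed, for reproducibility
n <- 10000
# Draw two faces n times and record whether they are equal (a "double")
doubles_without <- replicate(n, {
  x <- sample(die, size = 2)
  x[1] == x[2]
})
doubles_with <- replicate(n, {
  x <- sample(die, size = 2, replace = TRUE)
  x[1] == x[2]
})
mean(doubles_without)   # 0: without replacement a double is impossible
mean(doubles_with)      # close to 1/6
```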
Everything seems to be working fine. What if, however, we want our function to be more general and take as input any number of dice with any number of faces? We need to pass some arguments!
roll <- function(die, number_die) {
  # Roll number_die dice with faces die and return their sum
  extraction <- sample(die, size = number_die, replace = TRUE)
  sum(extraction)
}
die <- 1:6
number_die <- 2
print(roll(die, number_die))
print(roll(die, number_die))
print(roll(die, number_die))
print(roll(die, number_die))
It looks great, except for one thing. Look at what happens if we now just call roll without arguments.
roll()
Can you fix it?
roll <- function(die, number_die) {
  extraction <- sample(die, size = number_die, replace = TRUE)
  sum(extraction)
}
roll <- function(die = 1:6, number_die = 2) {
  # Default values: two six-sided dice
  extraction <- sample(die, size = number_die, replace = TRUE)
  sum(extraction)
}
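With default values in place, roll() works with or without arguments. This sketch assumes replace = TRUE, so that the dice are rolled independently, and then uses replicate() to roll two dice 10000 times (the seed is arbitrary): the table of sums should show 7 as the most frequent result, since it can be obtained in more ways than any other sum.

```r
roll <- function(die = 1:6, number_die = 2) {
  extraction <- sample(die, size = number_die, replace = TRUE)
  sum(extraction)
}
roll()                             # two six-sided dice
roll(die = 1:20, number_die = 3)   # overriding both defaults
set.seed(1)
sums <- replicate(10000, roll())   # 10000 rolls of two dice
table(sums)                        # 7 should be the most frequent sum
```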
So, after all this work, it may be worth saving our function so that we can use it in the future. How can we do so?
RStudio allows you to create R scripts. An R script is a set of instructions stored in a .R file that can be executed at a later time. It is a very convenient feature, as it allows you to keep your (well-documented) code. To create a new script, you can either open one manually in RStudio or press Ctrl+Shift+N on Windows or Cmd+Shift+N on macOS.
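A saved script can be loaded back into a session with source(). In practice you would write the script in the RStudio editor and save it in your project folder (for instance as roll.R, a hypothetical name); in this sketch the file is written with writeLines() to a temporary path from tempfile(), only to keep the example self-contained.

```r
# Write the function to a temporary .R file
script <- tempfile(fileext = ".R")
writeLines(c(
  "# roll(): returns the sum of number_die dice with faces die",
  "roll <- function(die = 1:6, number_die = 2) {",
  "  extraction <- sample(die, size = number_die, replace = TRUE)",
  "  sum(extraction)",
  "}"
), script)

if (exists("roll")) rm(roll)   # forget any roll() defined earlier...
source(script)                 # ...and define it again from the script
roll()
```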
During an R session, objects are created and stored by name. As we have seen, to list all the objects that are currently stored, we can use the ls() command.
Moreover, objects can be removed using the rm() function. For instance,
rm(x, y, z, ink, junk, temp, foo, bar)
will remove all the objects with those names.
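A quick way to see rm() at work is to check for an object before and after removing it. This sketch uses exists(), which returns TRUE when an object with the given name can be found.

```r
temp <- 42
exists("temp")   # TRUE
rm(temp)
exists("temp")   # FALSE
# rm(list = ls()) would remove every object in the workspace: use with care!
```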
At the end of each session, R asks whether you wish to save the objects in your current session to a file.
Such objects are stored in a file called .RData in the current working directory.
Moreover, the commands you have typed are stored in a .Rhistory file.
When R is later restarted in the same directory, it reloads the workspace from .RData, together with the command history.
However, saving .RData is increasingly seen as bad practice. The reason is simple. Imagine that you have performed a statistical analysis whose results depend on an object that was restored from .RData rather than created by your code. Nobody else will be able to run your code and obtain the same results!
Explicit is almost always better than implicit!
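If you do need to keep a specific object, one explicit alternative to .RData is to save it on its own with saveRDS() and load it with readRDS(). In this sketch the temporary path from tempfile() is a stand-in for a file in your project folder.

```r
roll <- function(die = 1:6, number_die = 2) {
  sum(sample(die, size = number_die, replace = TRUE))
}
path <- tempfile(fileext = ".rds")   # throwaway path for this example
saveRDS(roll, path)                  # explicitly save a single object
roll_loaded <- readRDS(path)         # explicitly load it, under any name
roll_loaded()
```

Unlike .RData, the object is restored only when your code asks for it, so anyone reading the script can see where it comes from.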
Congratulations! You have taken your very first steps in the R programming language.