library(learnr)
knitr::opts_chunk$set(echo = FALSE)
tutorial_options(exercise.timelimit = 60, exercise.blanks = "___+")
Let's start by practicing using some standard math functions. There are two main flavors:
Functions that operate on individual numbers:
sqrt(): square root
abs(): absolute value
log(): natural logarithm
exp(): exponential
Functions that operate on sets of numbers to estimate summary statistics:
mean()
median()
max()
Try some of these out!
What happens if you apply mathematical functions like sqrt() to a numeric vector (rather than just a single value)?
num_vec <- 1:10
num_vec <- 1:10
sqrt(num_vec)
What about if you supply a vector of strings to such a function?
string_vec <- c('it', 'was', 'the', 'best', 'of', 'times')
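One way to inspect what happens without halting a script is to wrap the call in tryCatch(). This is only a sketch; the name `result` is illustrative and not part of the exercise.

```r
string_vec <- c('it', 'was', 'the', 'best', 'of', 'times')

# sqrt() expects numeric input, so a character vector raises an error.
# tryCatch() captures the error message instead of stopping execution.
result <- tryCatch(
  sqrt(string_vec),
  error = function(e) conditionMessage(e)
)

print(result)
```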
Try out some statistics functions, like mean and median, on a numeric vector.
num_vec <- 1:10
Multiply your vector by 2 and show that the mean and median of the resulting vector change in the expected way.
num_vec <- 1:10
num_vec <- 1:10
mean(2 * num_vec)
median(2 * num_vec)
Recall the general components of a function:
output <- do_the_thing(input1, input2)
Inputs (arguments) can include:
the objects on which the function acts
additional parameters that specify how the function acts (options)
These often have default values, so you won't need to specify anything.
Outputs are any results generated by the function.
For example, result <- sqrt(2) provides the number 2 as the only input to the function sqrt, and the result is assigned to the variable named result. If we just do sqrt(2), the result is printed to the console.
Whenever you're working with a new function, it's best to look at the 'help documentation' for a quick overview of what the function does and of its inputs and outputs. There's often a lot of information, and you don't need to go through all of it, but it's very helpful to get a feel for how to use it.
The thing to look at first (beyond the basic description of what the function does) is the 'Usage'. This gives you a quick sense of the inputs.
For example, for the log function it looks like this: log(x, base = exp(1)).
The inputs are x and base. Inputs that are assigned a default value (in this case base) are optional, so if you don't specify the base, it will be exp(1) (giving the natural logarithm). The x input has no default value, so if you don't supply it you will get an error. Try it out:
log()
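To see the default and an overridden base side by side, here is a small sketch. The variable names are illustrative only.

```r
# With no base given, log() falls back to the default base exp(1).
natural <- log(exp(1))

# Supplying base explicitly overrides the default.
base2 <- log(8, base = 2)
base10 <- log(100, base = 10)

# Approximately 1, 3 and 2
print(c(natural, base2, base10))
```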
Look at the help docs for the mean function, using the help function. What are the 'required' inputs? What are the optional inputs?
Test your knowledge by computing the mean of the following vector x, ignoring any values that are missing (NA).
x <- c(1, 2, 3, NA, 5)
x <- c(1, 2, 3, NA, 5)
mean(x, na.rm = TRUE)
What happens when you compute mean as follows? Why?
mean(1, 2, 3, 4, 5)
mean(1, 2, 3, 4, 5)
## This returns 1 because only the first argument, the number 1, is used as the data x. The remaining values are matched to other arguments of mean() rather than being included in the average.
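A short comparison makes the difference concrete. The names `correct` and `surprising` are illustrative.

```r
# Combine the values into a single vector so that all of them are passed as x.
correct <- mean(c(1, 2, 3, 4, 5))

# Without c(), only the first value is treated as x.
surprising <- mean(1, 2, 3, 4, 5)

print(correct)     # 3
print(surprising)  # 1
```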
Let's try another example where we need to look at a function's help documentation. Look up how to use the round function to round pi to the nearest hundredth.
pi
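One possible answer, sketched with the digits argument described in the help page for round (the name `rounded` is illustrative):

```r
# digits sets how many decimal places to keep; 2 rounds to the nearest hundredth.
rounded <- round(pi, digits = 2)

print(rounded)  # 3.14
```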
name_of_function <- function(argument1, argument2) {
  *statements or code that does something*
  return(some_data)
}
Write a function (call it square) that takes a number as input and returns its square.
Does your function work on numeric vectors also?
square <- function(num) {
  num_square <- num^2
  return(num_square)
}
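To check the vector question, the function can be called on both a single number and a vector. This sketch redefines square so that it runs on its own.

```r
square <- function(num) {
  num^2
}

# Arithmetic in R is vectorized, so the same function handles a whole vector.
single <- square(4)
many <- square(1:5)

print(single)  # 16
print(many)    # 1 4 9 16 25
```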
Now make a function (call it raise_to_power) that takes as input a number x and a power p and raises x to the pth power. Give the input p a default value of 2, and ensure that when you call raise_to_power(x) without providing a value of p it gives the square of x.
raise_to_power <- function(x, p = 2) {
  return(x^p)
}
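A few calls show the default being used and overridden. The function is redefined here so the sketch is self-contained, and the result names are illustrative.

```r
raise_to_power <- function(x, p = 2) {
  x^p
}

# p is omitted, so the default of 2 applies.
default_power <- raise_to_power(3)

# p can be supplied by name or by position.
named_power <- raise_to_power(2, p = 3)
positional_power <- raise_to_power(2, 10)

print(c(default_power, named_power, positional_power))  # 9 8 1024
```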
lubridate is a tidyverse package which has helper functions for working with dates.
This package has a function now (which doesn't have any inputs). What happens if you try to use it now? Why?
Now load the lubridate package and try the now and today functions.
library(lubridate)
now()
today()
Run the following code chunks, look at the error messages, and then try to fix the errors.
x <- '1'
y <- 2
x + y
## The error comes from adding a string ('1') and a number together
x <- 1
y <- 2
x + y
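If the value really does arrive as a string, an alternative fix is to convert it before adding. This is a sketch of that option rather than the tutorial's own solution, and the name `total` is illustrative.

```r
x <- '1'
y <- 2

# as.numeric() converts the string '1' to the number 1 before the addition.
total <- as.numeric(x) + y

print(total)  # 3
```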
my_numeric_vector <- c(1, 2, 3, 4)
mean(my_numerc_vector)
## The variable name is misspelled: my_numerc_vector should be my_numeric_vector
my_numeric_vector <- c(1, 2, 3, 4)
mean(my_numeric_vector)
x <- list(1, 2, 3)
mean(x)
x <- c(1, 2, 3)
mean(x)
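When the data is already stored in a list, another option is to flatten it with unlist() instead of rebuilding it with c(). A minimal sketch, with `x_vec` and `list_mean` as illustrative names:

```r
x <- list(1, 2, 3)

# mean() needs a numeric (or logical) vector, and a list is neither,
# so calling mean(x) directly returns NA with a warning.
x_vec <- unlist(x)  # flatten the list into a numeric vector

list_mean <- mean(x_vec)
print(list_mean)  # 2
```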
Below are some more useful functions that are good to be familiar with. Test them out, and also look at the help documentation for each to get a feel for what they do.
Create a numeric vector and try applying the functions range and sd.
num_vec <- 1:10
range(num_vec)
sd(num_vec)
Create a second numeric vector and compute the Pearson correlation using the cor function. Then try using the same function to calculate the 'Spearman' correlation.
num_vec <- 1:10
num_vec <- 1:10
num_vec2 <- 10:1
cor(num_vec, num_vec2, method = 'pearson')
cor(num_vec, num_vec2, method = 'spearman')
Use t.test to do a t-test of whether the mean of your numeric vector is significantly different from 0.
num_vec <- -50:50
t.test(num_vec)
num_vec_2 <- 50:150
t.test(num_vec_2)
Now use the same function to do a two-sample t-test of whether the means of your two numeric vectors are different (this is what happens when you give t.test an x and a y vector as input). Use the same vectors you made to test the cor function above.
num_vec <- 1:10
num_vec2 <- 10:1
t.test(num_vec, num_vec2)
Now try making the t.test a 'paired' t-test. Don't worry if you don't know what this means at this point; it should be clear how to do this from the help documentation.
num_vec <- 1:10
num_vec2 <- 10:1
t.test(num_vec, num_vec2, paired = TRUE)
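The printed summary is convenient to read, but t.test also returns an object whose pieces can be pulled out with $. A sketch using the same vectors, with `paired_result` as an illustrative name:

```r
num_vec <- 1:10
num_vec2 <- 10:1

# Store the test result instead of only printing it.
paired_result <- t.test(num_vec, num_vec2, paired = TRUE)

# The paired differences (-9, -7, ..., 7, 9) average to zero,
# so the t statistic is 0 and the p-value is 1 for these vectors.
print(paired_result$statistic)
print(paired_result$p.value)
```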
Use the seq function to make a vector of numbers from 0 to 100 in steps of 3. Then use the length function to see how many such numbers there are.
seq(0, 100, 3)
length(seq(0, 100, 3))
Use the function sample to create a vector of 20 random integers between 1 and 5.
sample(5, size = 20, replace = TRUE)
num_vec <- 1:5
sample(num_vec, size = 20, replace = TRUE)
Now use the function unique to check how many unique values are in your random vector.
num_vec <- 1:5
rand_nums <- sample(num_vec, size = 20, replace = TRUE)
unique(rand_nums)
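To get the count itself rather than the list of values, unique can be combined with length. In this sketch the seed value is arbitrary and only makes the draw reproducible; `n_unique` is an illustrative name.

```r
set.seed(42)  # arbitrary seed, chosen only so the result can be reproduced

rand_nums <- sample(1:5, size = 20, replace = TRUE)

# unique() drops repeated values; length() then counts what remains.
n_unique <- length(unique(rand_nums))
print(n_unique)

# table() shows how often each value was drawn.
print(table(rand_nums))
```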
Make two vectors of strings, then experiment using the functions intersect, union and setdiff. At the end you should be able to relate these functions to different regions of a Venn Diagram.
color_vec1 <- c('red', 'green', 'blue', 'purple')
color_vec2 <- c('green', 'orange', 'pink', 'black', 'white')
intersect(color_vec1, color_vec2)
union(color_vec1, color_vec2)
setdiff(color_vec1, color_vec2)
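Unlike intersect and union, setdiff depends on the order of its arguments, which is what separates the two outer regions of the Venn diagram. A sketch with illustrative result names:

```r
color_vec1 <- c('red', 'green', 'blue', 'purple')
color_vec2 <- c('green', 'orange', 'pink', 'black', 'white')

both <- intersect(color_vec1, color_vec2)       # overlap of the two circles
only_first <- setdiff(color_vec1, color_vec2)   # left-only region
only_second <- setdiff(color_vec2, color_vec1)  # right-only region
either <- union(color_vec1, color_vec2)         # everything in either circle

print(both)         # "green"
print(only_first)   # "red" "blue" "purple"
print(only_second)  # "orange" "pink" "black" "white"
print(length(either))  # 8
```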