
library(learnr)
knitr::opts_chunk$set(echo = FALSE)
tutorial_options(exercise.timelimit = 60, exercise.blanks = "___+")

Basic math functions

Let's start by practicing using some standard math functions. There are two main flavors:

Functions that operate on individual numbers:
sqrt(): square root
abs(): absolute value
log(): natural logarithm
exp(): exponential
Functions that operate on sets of numbers to estimate summary statistics:
mean()
median()
max()

Try some of these out!

What happens if you apply mathematical functions like sqrt() to a numeric vector (rather than just a single value)?

num_vec <- 1:10

num_vec <- 1:10
sqrt(num_vec)

What about if you supply a vector of strings to such a function?

string_vec <- c('it', 'was', 'the', 'best', 'of', 'times')
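As a minimal sketch of what to expect: sqrt() does not accept character input and signals an error. The tryCatch() wrapper and the name err_msg are additions for illustration only, so that the example runs to completion and the message can be inspected.

```r
string_vec <- c('it', 'was', 'the', 'best', 'of', 'times')

# sqrt() signals an error for character input; tryCatch() captures the
# error message so the script keeps running
err_msg <- tryCatch(
  sqrt(string_vec),
  error = function(e) conditionMessage(e)
)
print(err_msg)
```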

Try out some statistics functions, like mean and median on a numeric vector.

num_vec <- 1:10
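One possible way to try this is sketched below. The second vector, skewed_vec, is a hypothetical example added to show a case where the mean and median disagree.

```r
num_vec <- 1:10
mean(num_vec)    # 5.5
median(num_vec)  # 5.5

# Hypothetical vector with one large value: the mean is pulled upward,
# while the median stays at the middle value
skewed_vec <- c(1, 2, 3, 4, 100)
mean(skewed_vec)    # 22
median(skewed_vec)  # 3
```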

Multiply your vector by 2 and show that the mean and median of the resulting vector change in the expected way.

num_vec <- 1:10

num_vec <- 1:10
mean(2*num_vec)
median(2*num_vec)

Function help

Recall the general components of a function:

output <- do_the_thing(input1, input2)

Inputs (arguments) can include:
the objects on which the function acts
additional parameters that specify how the function acts (options)
These often have default values, so you won't need to specify anything.
Outputs are any results generated by the function.

For example, result <- sqrt(2) provides the number 2 as the only input to the function sqrt, and the result is assigned to the variable named result. If we just do sqrt(2), the result is printed to the console.

Whenever you're working with a new function, it's best to take a look at the 'help documentation' to get a quick overview of what the function does, and its inputs and outputs. There's often lots of info, and you don't need to go through all of it, but it's very helpful to get a feel for how to use this information.

The thing to look at first (beyond the basic description of what the function does) is the 'Usage'. This gives you a quick sense of the inputs.

For example, for the log function it looks like this:

log(x, base = exp(1))

The inputs are x and base. Inputs that are assigned a default value (in this case base) are optional: if you don't specify base, it defaults to exp(1), which gives the natural logarithm. The x input does not have a default value, so calling the function without it gives an error. Try it out:

log()
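For comparison, here is a short sketch of calls that do supply x, with and without the optional base argument. The value 100 and the variable names are chosen only for illustration.

```r
x <- 100

natural_log    <- log(x)             # base defaults to exp(1)
log_base_10    <- log(x, base = 10)  # base given by name: 2
log_positional <- log(x, 10)         # base given by position: same result

print(c(natural_log, log_base_10, log_positional))
```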

Look at the help docs for the mean function, using the help function. What are the 'required' inputs? What are the optional inputs?
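A sketch of two ways to approach this: opening the help page interactively, or listing the arguments programmatically. Inspecting mean.default with formals() is an extra step not covered in the tutorial, shown here as one option for seeing which inputs have defaults.

```r
# In an interactive session, either of these opens the help page:
# help(mean)
# ?mean

# The arguments of the default method can also be listed directly.
# x has no default value (required); trim and na.rm have defaults (optional).
arg_names <- names(formals(mean.default))
print(arg_names)
```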

Test your knowledge by computing the mean of the following vector x, ignoring any values that are missing (NA).

x <- c(1, 2, 3, NA, 5)

x <- c(1, 2, 3, NA, 5)
mean(x, na.rm = TRUE)

What happens when you compute mean as follows? Why?

mean(1, 2, 3, 4, 5)

mean(1, 2, 3, 4, 5)
## The result is 1 because only the first argument, the number 1, is treated as the data (x).
## The remaining values are matched to other arguments of mean (trim, na.rm, ...) rather than averaged.
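A minimal sketch of the usual fix: combine the values into a single vector with c() so that they are all passed as the first input. The variable names are illustrative.

```r
# Only the first value is used as the data here
separate_args <- mean(1, 2, 3, 4, 5)

# Wrapping the values in c() passes them as one numeric vector
one_vector <- mean(c(1, 2, 3, 4, 5))

print(separate_args)  # 1
print(one_vector)     # 3
```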

Let's try another example where we need to look at a function's help documentation. Look up how to use the round function to round pi to the nearest hundredth (two decimal places).

pi
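One way to do this, sketched here, uses the digits argument of round, which sets the number of decimal places and defaults to 0.

```r
round(pi)                           # digits defaults to 0: 3
pi_hundredths <- round(pi, digits = 2)
print(pi_hundredths)                # 3.14
```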

Writing simple functions

name_of_function <- function(argument1, argument2) {
    # statements or code that does something
    return(some_data)
}

Write a function (call it square) that takes a number as input and returns its square. Does your function work on numeric vectors also?

square <- function(num) {
  num_square <- num^2
  return(num_square)
}
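To check the vector question, the sketch below redefines square so the example is self-contained and calls it on a single number and on a vector. Arithmetic operators in R work element-wise, so the same function handles both.

```r
square <- function(num) {
  num^2
}

single_result <- square(4)    # 16
vector_result <- square(1:5)  # 1 4 9 16 25

print(single_result)
print(vector_result)
```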

Now make a function (call it raise_to_power) that takes as input a number x and a power p and raises x to the pth power. Give the input p a default value of 2, and ensure that when you call raise_to_power(x) without providing a value of p it gives the square of x.

raise_to_power <- function(x, p=2) {
  return(x^p)
}
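A short usage sketch, with the function redefined so the example stands alone, shows the default being used and then overridden by position and by name.

```r
raise_to_power <- function(x, p = 2) {
  x^p
}

raise_to_power(3)          # p defaults to 2: 9
raise_to_power(2, 3)       # p given by position: 8
raise_to_power(9, p = 0.5) # p given by name: 3
raise_to_power(1:3)        # works element-wise on vectors: 1 4 9
```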

Using packages

lubridate is a tidyverse package which has helper functions for working with dates.

This package has a function now (which doesn't need any inputs). What happens if you try to use it before loading the package? Why?

Now load the lubridate package and try the now and today functions.

library(lubridate)
now()
today()
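An alternative to library() is to call a function through its package name with the :: operator, which works without attaching the package. The sketch below assumes lubridate may or may not be installed, so it checks first with requireNamespace(); the variable name current_time is illustrative.

```r
# Call now() via the package namespace, without library(lubridate)
if (requireNamespace("lubridate", quietly = TRUE)) {
  current_time <- lubridate::now()
  print(current_time)
} else {
  message("lubridate is not installed; install.packages('lubridate') would add it")
}
```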

Fixing errors using error messages

Run the following code chunks, look at the error messages, and then try to fix the errors.

x <- '1'
y <- 2
x+y

## Adding a string and a number together

x <- 1
y <- 2
x+y
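If the value has to stay a string when it is created (for example, because it was read from a text file), another possible fix is to convert it with as.numeric() before adding. This is a sketch of that alternative.

```r
x <- '1'
y <- 2

# Convert the string to a number before doing arithmetic
total <- as.numeric(x) + y
print(total)  # 3
```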

my_numeric_vector <- c(1, 2, 3, 4)
mean(my_numerc_vector)

## The variable name is misspelled: my_numerc_vector instead of my_numeric_vector

my_numeric_vector <- c(1, 2, 3, 4)
mean(my_numeric_vector)

x <- list(1, 2, 3)
mean(x)

x <- c(1, 2, 3)
mean(x)
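Note that mean() on a list gives a warning and returns NA rather than stopping with an error. If the data already exists as a list, one option, sketched below, is to flatten it into a vector with unlist() instead of recreating it with c().

```r
x <- list(1, 2, 3)

# unlist() flattens the list into a numeric vector that mean() accepts
list_mean <- mean(unlist(x))
print(list_mean)  # 2
```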

Testing some useful functions

Below are some more useful functions that are good to be familiar with. Test them out, and also look at the help documentation for each to get a feel for what they do.

Stats functions

Create a numeric vector and try applying the functions range and sd.

num_vec <- 1:10
range(num_vec)
sd(num_vec)
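As a small follow-up sketch, the two outputs can be related to other base functions: range returns the minimum and maximum as a length-2 vector, and sd is the square root of the sample variance given by var.

```r
num_vec <- 1:10

vec_range <- range(num_vec)  # 1 10
diff(vec_range)              # spread between max and min: 9

# sd() matches the square root of var()
vec_sd <- sd(num_vec)
all.equal(vec_sd, sqrt(var(num_vec)))
```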

Create a second numeric vector and compute the Pearson correlation using the cor function. Then try using the same function to calculate the 'Spearman' correlation.

num_vec <- 1:10

num_vec <- 1:10
num_vec2 <- 10:1

cor(num_vec, num_vec2, method = 'pearson')
cor(num_vec, num_vec2, method = 'spearman')
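With the two vectors above, both methods give -1 because the relationship is perfectly linear. The sketch below uses a hypothetical second vector (the cube of the first) to show a case where the two differ: Spearman works on ranks, so any strictly increasing relationship gives 1, while Pearson measures linear association only.

```r
num_vec <- 1:10
cubed_vec <- num_vec^3  # hypothetical non-linear but increasing relationship

pearson_cor  <- cor(num_vec, cubed_vec, method = 'pearson')
spearman_cor <- cor(num_vec, cubed_vec, method = 'spearman')

print(pearson_cor)   # below 1
print(spearman_cor)  # 1
```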

Use t.test to do a t-test of whether the mean of your numeric vector is significantly different from 0.

num_vec <- -50:50
t.test(num_vec)

num_vec_2 <- 50:150
t.test(num_vec_2)
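The printed output of t.test can also be stored and queried. The sketch below saves the result and pulls out individual pieces by name; the variable name test_result is illustrative.

```r
num_vec <- -50:50
test_result <- t.test(num_vec)

test_result$estimate  # sample mean: 0
test_result$p.value   # 1, since the sample mean is exactly 0
test_result$conf.int  # 95% confidence interval for the mean
```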

Now use the same function to do a two-sample t-test of whether the means of your two numeric vectors are different (this is what happens when you give t.test an x and a y vector as input). Use the same vectors you made to test the cor function above.

num_vec <- 1:10
num_vec2 <- 10:1

t.test(num_vec, num_vec2)

Now try making the t.test a 'paired' t-test (don't worry if you don't know what this means at this point; it should be clear how to do this from the help documentation).

num_vec <- 1:10
num_vec2 <- 10:1

t.test(num_vec, num_vec2, paired = TRUE)
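For intuition, a paired t-test is the same as a one-sample t-test on the element-wise differences. The sketch below checks this with hypothetical before and after measurements made up for illustration.

```r
# Hypothetical paired measurements (same five subjects, two time points)
before <- c(5.1, 4.8, 6.0, 5.5, 5.9)
after  <- c(4.9, 4.5, 5.8, 5.6, 5.2)

paired_result     <- t.test(before, after, paired = TRUE)
difference_result <- t.test(before - after)

# Both approaches give the same p-value
all.equal(paired_result$p.value, difference_result$p.value)
```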

Vector tools

Use the seq function to make a vector of numbers from 0 to 100 in steps of 3. Then use the length function to see how many such numbers there are.

seq(0,100,3)
length(seq(0,100,3))
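The same call can be written with named arguments, which makes the role of each number easier to read. This sketch also shows that the sequence stops at the last value that does not exceed the upper limit.

```r
by_threes <- seq(from = 0, to = 100, by = 3)

length(by_threes)   # 34
tail(by_threes, 1)  # last value is 99, since 102 would exceed 100
```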

Use the function sample to create a vector of 20 random integers between 1 and 5.

sample(5, size = 20, replace = TRUE)

num_vec <- 1:5
sample(num_vec, size = 20, replace = TRUE)
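Because sample is random, each run gives a different vector. If a repeatable result is wanted, one option, sketched here, is to call set.seed() first. The seed value 42 is arbitrary.

```r
num_vec <- 1:5

set.seed(42)  # arbitrary seed, chosen only for illustration
first_draw <- sample(num_vec, size = 20, replace = TRUE)

set.seed(42)  # resetting the same seed reproduces the same draw
second_draw <- sample(num_vec, size = 20, replace = TRUE)

identical(first_draw, second_draw)  # TRUE
```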

Now use the function unique to check which unique values are in your random vector.

num_vec <- 1:5
rand_nums <- sample(num_vec, size = 20, replace = TRUE)
unique(rand_nums)
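To go from the unique values to a count, unique can be combined with length, and table gives the number of times each value appears. This is a sketch; the exact values depend on the random draw.

```r
num_vec <- 1:5
rand_nums <- sample(num_vec, size = 20, replace = TRUE)

n_unique <- length(unique(rand_nums))  # at most 5
value_counts <- table(rand_nums)       # how often each value was drawn

print(n_unique)
print(value_counts)
```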

Set functions

Make two vectors of strings, then experiment using the functions intersect, union and setdiff. At the end you should be able to relate these functions to different regions of a Venn Diagram.

color_vec1 <- c('red', 'green', 'blue', 'purple')
color_vec2 <- c('green', 'orange', 'pink', 'black', 'white')

intersect(color_vec1, color_vec2)
union(color_vec1, color_vec2)
setdiff(color_vec1, color_vec2)
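One detail worth checking when mapping these onto a Venn diagram: intersect and union give the same elements whichever vector comes first, but setdiff depends on the argument order. A short sketch with the same two vectors:

```r
color_vec1 <- c('red', 'green', 'blue', 'purple')
color_vec2 <- c('green', 'orange', 'pink', 'black', 'white')

only_in_1 <- setdiff(color_vec1, color_vec2)  # "red" "blue" "purple"
only_in_2 <- setdiff(color_vec2, color_vec1)  # "orange" "pink" "black" "white"
in_both   <- intersect(color_vec1, color_vec2) # "green"

# The three regions together make up the union
length(union(color_vec1, color_vec2)) ==
  length(only_in_1) + length(only_in_2) + length(in_both)
```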