(PART) Programming {-}

Function Basics

To understand computations in R, two slogans are helpful:
- Everything that exists is an object.
- Everything that happens is a function call.

-- John Chambers

Introduction to Functions

Functions are a central part of robust R programming, and we will spend a significant amount of time writing functions. Thinking of functions in the mathematical sense will make their properties more apparent than any other framework.

Your First Function

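All R functions have three parts:

- the formals(), the list of arguments that controls how you can call the function
- the body(), the code inside the function
- the environment(), which determines where the function looks up the values of its variables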

When you print a function in R, it shows you these three important components. If the environment isn't displayed, it means that the function was created in the global environment.

myadd <- function(x, y) {
  message(paste0("x = ", x, "\n"))
  message(paste0("y = ", y, "\n"))
  x + y
}
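The three components can be inspected directly with formals(), body(), and environment(). The sketch below uses a hypothetical function, add_two, that is not part of the original notes.

```r
# Hypothetical function used only to illustrate the three components.
add_two <- function(x, y = 2) {
  x + y
}

formals(add_two)         # the argument list: x (no default) and y = 2
body(add_two)            # the code inside the function
environment(add_two)     # where the function was created
                         # (the global environment when run at top level)

names(formals(add_two))  # "x" "y"
```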

When calling a function you can pass the arguments by position, by name, or with a combination of both.

myadd(1, 3)            # arguments by position
myadd(x = 1, y = 3)    # arguments by name
myadd(y = 3, x = 1)    # name order doesn't matter
myadd(y = 3, 1)        # combination

```{block2, type='rmdtip'}
Even though it’s legal, I don’t recommend messing around with the order of the arguments too much, since it can lead to confusion. The convention is to pass arguments in the order the function defines them, and to use the argument names if the function takes more than 2 or 3 arguments.
```

You can also specify default values for your arguments.  Default values _should_ be the values most often used.  `rnorm` uses the default of `mean = 0` and `sd = 1`.  We usually want to sample from the standard normal distribution, but we are not forced to. 
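As a small illustration of the `rnorm` defaults mentioned above, the sketch below reads the defaults from the function's formals and then overrides them. The seed and the values `mean = 10` and `sd = 2` are arbitrary choices for this example.

```r
set.seed(1)                        # arbitrary seed so the draws are reproducible

formals(rnorm)$mean                # 0, the default mean
formals(rnorm)$sd                  # 1, the default standard deviation

z <- rnorm(5)                      # uses the defaults: standard normal draws
w <- rnorm(5, mean = 10, sd = 2)   # overrides both defaults
```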

```r
myadd2 <- function(x = 3, y = 0) {
  cat(paste0("x = ", x, "\n"))
  cat(paste0("y = ", y, "\n"))
  x + y
}
myadd2()              # use the defaults
myadd2(x = 1)         # override x, keep the default y
myadd2(y = 1)         # keep the default x, override y
myadd2(x = 1, y = 1)  # override both
```

By default, the value of the last expression evaluated in the function is returned. Thus, there is no reason to call `return()` explicitly unless you are returning from the function early. Inside functions, use `stop()` to signal an error, `warning()` to signal a warning, and `message()` to print a message to the console. `stopifnot()` is useful for argument checking: it checks that each of its arguments is TRUE and produces a generic error message if not.

f <- function(age) {

  stopifnot(is.numeric(age), length(age) == 1L)

  if (age < 0) {
    stop("age must be a non-negative number")
  }

  if (age < 18) {
    warning("Check your data.  We only care about adults.")
  }

  message(paste0("Your person is ", age, " years old"))
}

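f("A")   # error: stopifnot() rejects a non-numeric age
f(-10)   # error: negative age
f(10)    # warning and message
f(30)    # message only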
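The early-return case mentioned above can be sketched as follows. The function mean_or_na is a hypothetical helper written for this example; it is not part of the original notes.

```r
# Hypothetical helper: mean of the non-missing values, or NA if none remain.
mean_or_na <- function(x) {
  stopifnot(is.numeric(x))
  x <- x[!is.na(x)]
  if (length(x) == 0L) {
    return(NA_real_)   # early exit: an explicit return() is needed here
  }
  sum(x) / length(x)   # last expression: returned implicitly
}

mean_or_na(c(1, 2, NA, 3))   # 2
mean_or_na(numeric(0))       # NA
```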

Lazy Evaluation

R is lazy. Arguments to functions are evaluated lazily; that is, they are evaluated only when they are needed in the body of the function.

In this example, the function f() has two arguments: a and b.

f <- function(a, b) {
  a^2
} 

f(2)     # this works
f(2, 1)  # this does too

This function never actually uses the argument b, so calling f(2) or f(2, 1) will not produce an error because the 2 gets positionally matched to a. It’s common to write a function that does not use an argument and not notice it simply because R never throws an error.
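A minimal sketch of the same idea, using hypothetical function names: an argument that is never used is never evaluated, even if evaluating it would fail, while a missing argument only causes an error once the body actually needs it.

```r
# Hypothetical functions; the names are chosen for illustration only.
square_first <- function(a, b) {
  a^2            # b is never touched, so it is never evaluated
}

# stop() is passed as b but never runs, because b is never used.
square_first(2, stop("b was evaluated"))   # 4

use_both <- function(a, b) {
  a + b          # b is needed here, so it is evaluated at this point
}

# Leaving out b now fails, but only when the body reaches b.
result <- tryCatch(
  use_both(2),
  error = function(e) conditionMessage(e)
)
result           # the error message about the missing argument b
```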

The Dot-dot-dot (...) Argument

There is a special argument in R known as the ... argument, which indicates a variable number of arguments that are usually passed on to other functions. The two most common cases for using ... in a function are:

  1. The number of arguments passed to the function cannot be known in advance.
  2. You are extending another function and don’t want to copy the entire argument list of the original function.

Number of arguments passed to the function cannot be known in advance.

The ... argument is necessary when the number of arguments passed to the function cannot be known in advance. This is clear in functions like paste(), cat(), and sum().

args(paste)
args(cat)
args(sum)

Because paste() and cat() both combine multiple character vectors (paste() returns the combined text and cat() prints it to the console), it is impossible for those functions to know in advance how many character vectors the user will pass. So the first argument to either function is ..., and the same holds for sum().
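You can write your own functions the same way. The sketch below uses two hypothetical helpers, sum_of_squares and count_args, to show how the values passed through ... can be collected with c(...) or list(...).

```r
# Hypothetical helpers; neither is part of base R.
sum_of_squares <- function(...) {
  values <- c(...)     # collect everything passed through ... into one vector
  sum(values^2)
}

sum_of_squares(1, 2, 3)     # 14
sum_of_squares(1:3, 4)      # 30

count_args <- function(...) {
  length(list(...))    # number of arguments supplied by the caller
}

count_args("a", "b", "c")   # 3
```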

One catch with ... is that any arguments that appear after ... on the argument list must be named explicitly and cannot be partially matched or matched positionally.

Take a look at the arguments to the paste() function.

args(paste)

With the paste() function, the arguments sep and collapse must be named explicitly and in full if the default values are not going to be used.
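A short illustration of this catch with paste(): a separator that is passed positionally or under a partial name is absorbed by ... and treated as one more string to paste.

```r
paste("a", "b", sep = "-")   # "a-b":   sep is matched by its full name
paste("a", "b", "-")         # "a b -": the third value is absorbed by ...
paste("a", "b", se = "-")    # "a b -": the partial name 'se' is not matched to sep
```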

Extending another function

For example, a custom plotting function may want to make use of the default plot() function along with its entire argument list. The function below changes the default for the type argument to the value type = "l" (the original plot default is type = "p").

mylineplot <- function(x, y, ...) {
  plot(x, y, type = "l", ...)   # pass '...' on to the 'plot' function
}
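The same pattern works for any function, not just plot(). As a sketch that is easier to check at the console, the hypothetical wrapper mean_no_na below changes the default of na.rm and forwards everything else to mean().

```r
# Hypothetical wrapper: like mean(), but drops missing values by default.
mean_no_na <- function(x, ...) {
  mean(x, na.rm = TRUE, ...)   # everything else is forwarded to mean()
}

mean_no_na(c(1, 2, 3, NA))                     # 2
mean_no_na(c(1, 2, 3, 100, NA), trim = 0.25)   # 2.5, 'trim' is passed via ...
```

As with mylineplot, passing the fixed argument again (here na.rm) through ... would cause an error, because it would be matched twice.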

Sometimes you will combine both in one function.

commas <- function(...) {
  paste(..., sep = "", collapse = ", ")
}

commas(letters[1:10])

Environments & Scoping

An environment is a collection of (symbol, value) pairs; e.g., after x <- 10, x is a symbol and 10 is its value. Every environment has a parent environment, and it is possible for an environment to have multiple “children”. The only environment without a parent is the empty environment.
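These ideas can be explored with a few base R functions. The sketch below creates two environments by hand; the names e and child are arbitrary.

```r
e <- new.env(parent = globalenv())   # parent given explicitly, for clarity
assign("x", 10, envir = e)           # store the pair (x, 10) in e
get("x", envir = e)                  # 10

child <- new.env(parent = e)         # e is now the parent of child
identical(parent.env(child), e)      # TRUE

get("x", envir = child)                        # 10, found in the parent
exists("x", envir = child, inherits = FALSE)   # FALSE, x is not in child itself

# The empty environment has no parent, so asking for one is an error.
no_parent <- tryCatch(parent.env(emptyenv()), error = function(err) err)
inherits(no_parent, "error")         # TRUE
```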

Scoping is the set of rules that govern how R looks up the value of a symbol. In the example below, scoping is the set of rules that R applies to go from the symbol x to its value 10:

x <- 10
x

R has two types of scoping: lexical scoping, implemented automatically at the language level, and dynamic scoping, used in select functions to save typing during interactive analysis. We discuss lexical scoping here because it is intimately tied to function creation. Dynamic scoping is an advanced topic and is discussed in Advanced R.

How do we associate a value with a free variable, that is, a variable used in a function that is neither one of its arguments nor assigned in its body? The search process goes as follows:

If the value of a symbol is not found in the environment in which a function was defined, then the search is continued in the parent environment. The search continues up the sequence of parent environments until we hit the top-level environment; this is usually the global environment (workspace) or the namespace of a package. After the top-level environment, the search continues down the search list until we hit the empty environment. If a value for a given symbol cannot be found once the empty environment is reached, then an error is thrown.

x <- 0

f <- function(x = -1) {
  x <- 1
  y <- 2
  c(x, y)
}

g <- function(x = -1) {
  y <- 1
  c(x, y)
}

h <- function() {
  y <- 1
  c(x, y)
}

What do the following return?
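The calls themselves are not shown in this text. Assuming they were calls such as f(), g(), and h(), one way to check your answers is the sketch below, which repeats the definitions from above so that it runs on its own.

```r
x <- 0

f <- function(x = -1) {
  x <- 1         # local assignment wins over the default and the global x
  y <- 2
  c(x, y)
}

g <- function(x = -1) {
  y <- 1
  c(x, y)        # x comes from the argument (default -1)
}

h <- function() {
  y <- 1
  c(x, y)        # x is a free variable, found where h was defined
}

f()        # 1 2
g()        # -1 1
g(x = 5)   # 5 1
h()        # 0 1
```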

"First class objects"

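Functions in R are "first class objects", which means that they can be treated much like any other R object. Importantly,

- functions can be passed as arguments to other functions and returned as results, and
- functions can be nested, so that you can define a function inside another function.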

If you're familiar with a common language like C, these features might appear a bit strange. However, they are really important in R and can be useful for data analysis.

Since functions ARE objects you can pass functions as arguments and return functions as results.

my_summary <- function(x, funs = c(mean, sd), ...) {
  # apply each function to x; extra arguments in '...' are passed along
  lapply(funs, function(f) f(x, na.rm = TRUE, ...))
}

y <- 1:10
my_summary(y)
my_summary(y, c(mean, median, sd, IQR, mad))
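A variation on the same idea, sketched with a hypothetical helper summarise_with: passing a named list of functions and using vapply() gives a named numeric vector instead of an unnamed list.

```r
# Hypothetical helper: apply each function in a named list to x.
summarise_with <- function(x, funs) {
  # each function must return a single number
  vapply(funs, function(f) f(x), numeric(1))
}

stats <- summarise_with(1:10, list(mean = mean, median = median, sd = sd))
stats            # named vector with elements mean, median and sd
stats[["mean"]]  # 5.5
```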

Unlike in C, you can define a function within a function and/or return a function. The nested function exists only inside the parent function unless the parent returns it.

make.power <- function(n) {
  # [TBD] checks on n
  pow <- function(x) {
      # [TBD] checks on x
    x^n 
  }
  pow
}

make.power(4)  # returns a function
pow(x=4)       # Note: `pow` does not exist outside of the `make.power` function

cube <- make.power(3)          
as.list(environment(cube))
cube(2)

square <- make.power(2)
squareroot <- make.power(.5)


square(8)
squareroot(9)
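The checks marked [TBD] above are left open in the notes. One possible way to fill them in, under the assumption that both n and x should be numeric, is sketched below as a separate hypothetical function, make_power_checked.

```r
# Hypothetical variant of make.power with simple argument checks.
make_power_checked <- function(n) {
  # assumed check: n is a single number (this also forces evaluation of n)
  stopifnot(is.numeric(n), length(n) == 1L)
  function(x) {
    stopifnot(is.numeric(x))   # assumed check: x is numeric
    x^n                        # n is looked up in the enclosing environment
  }
}

cube <- make_power_checked(3)
cube(2)                 # 8
environment(cube)$n     # 3, the value captured when cube was created
```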

Best Practices

Writing functions is an iterative process; they will not be perfect on your first try. As with most things, the more you practice, the closer your first pass will get to the final version.

Advice from the R Inferno:

Make your functions as simple as possible. Simple has many advantages:

Tips for writing "good" functions

{block type='rmdimportant'} Software testing is important, but, in part because it is frustrating and boring, many of us avoid it. 'testthat' is a testing framework for R that is easy to learn and use, and integrates with your existing 'workflow'.

Exercises

  1. Create a function that takes a numeric year-quarter (e.g. 20183) and returns the quarter n quarters before or after it. For example, two quarters before 20183 is 20181.
  2. Come up with 5 functions (you don't have to program them) that will operate on your data (e.g. create a demographics table).
  3. Create a read_* function that
     - reads in the data file,
     - converts columns to the appropriate data types, and
     - “tidies” your data (if appropriate).

     The first argument to your read function should be the file name to read in. Are there additional parameters that are needed? Think beyond the Subscriber Report: what other things do you typically do upon first reading in a data file?


DavisBrian/rclassnotes documentation built on May 17, 2019, 8:19 a.m.