In DavisBrian/rclassnotes: Does not matter.

(PART) Programming {-}

Function Basics

To understand computations in R, two slogans are helpful:
- Everything that exists is an object.
- Everything that happens is a function call.

-- John Chambers

Introduction to Functions

Functions are a central part of robust R programming, and we will spend a significant amount of time writing them. Thinking of functions in the mathematical sense (inputs go in, one output comes out) makes their properties more apparent than any other framework.

- Functions are a means of abstraction. A concept or computation is encapsulated and isolated from the rest of the code by a function.
- Functions should do one thing, and do it well (compute, or plot, or save, ... not all in one go).
- Side effects: your functions should not have any (unless, of course, that is the main point of the function: plotting, writing to disk, ...). Functions shouldn't make changes in any environment. They only return their output.
- Do not use global variables. Everything the function needs is passed as an argument. A function must be self-contained.
- Functions streamline code and processes.
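
To make the point about global variables concrete, here is a small sketch. The names `tax_rate`, `add_tax_global`, and `add_tax` are made up for illustration and are not part of the course material.

```r
# Hypothetical example: all names are illustrative only.
tax_rate <- 0.10

# Depends on a global variable: the result changes whenever `tax_rate`
# is modified somewhere else in the session.
add_tax_global <- function(price) {
  price * (1 + tax_rate)
}

# Self-contained: everything the function needs is passed as an argument.
add_tax <- function(price, rate) {
  price * (1 + rate)
}

a <- add_tax_global(100)
tax_rate <- 0.20
b <- add_tax_global(100)          # same call as before, different answer

c1 <- add_tax(100, rate = 0.10)
c2 <- add_tax(100, rate = 0.10)   # same inputs, same output every time
```

The second version is easier to test and to reason about, because its result depends only on what is written in the call.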

Your First Function

All R functions have three parts:

the body(), the code inside the function.
the formals(), the list of arguments which controls how you can call the function.
the environment(), the "map" of the location of the function's variables.

When you print a function in R, it shows you these three important components. If the environment isn't displayed, it means that the function was created in the global environment.

myadd <- function(x, y) {
  message(paste0("x = ", x, "\n"))
  message(paste0("y = ", y, "\n"))
  x + y
}

The body of the function is everything between the { }. Note that this does the computation AND returns the result.
x and y are the arguments to the function.
The environment this function lives in is the global environment. (We'll discuss environments more in the next section.)
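
The three parts can also be inspected directly. The sketch below uses a throwaway function named `area` (a hypothetical name chosen for this illustration) so that it runs on its own.

```r
# Hypothetical function used only to illustrate the three parts.
area <- function(width, height = 1) {
  width * height
}

formals(area)          # the argument list: width (no default) and height = 1
names(formals(area))   # "width" "height"
body(area)             # the code between the braces
environment(area)      # the global environment when defined at top level
```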

When calling a function you can pass the arguments by position, by name, or by a combination of the two.

myadd(1, 3)            # arguments by position
myadd(x = 1, y = 3)    # arguments by name
myadd(y = 3, x = 1)    # name order doesn't matter
myadd(y = 3, 1)        # combination

```{block2, type='rmdtip'}
Even though it’s legal, I don’t recommend rearranging the order of the arguments too much, since it can lead to confusion. The convention is to pass arguments in the order the function defines them, and to use the argument names if the function takes more than 2 or 3 arguments.
```

You can also specify default values for your arguments.  Default values _should_ be the values most often used.  `rnorm` uses the default of `mean = 0` and `sd = 1`.  We usually want to sample from the standard normal distribution, but we are not forced to. 

```r
myadd2 <- function(x = 3, y = 0){
  cat(paste0("x = ", x, "\n"))
  cat(paste0("y = ", y, "\n"))
  x + y
}
myadd2()              # use the defaults
myadd2(x = 1)
myadd2(y = 1)
myadd2(x = 1, y = 1)
```

By default a function returns the value of the last expression it evaluates. Thus, there is no reason to call return() explicitly unless you are returning from the function early. Inside functions use stop() to signal an error, warning() to issue a warning, and message() to print an informational message to the console. stopifnot() is useful for argument checking: it checks that each of its arguments is TRUE, and produces a generic error message if not.

f <- function(age) {

  stopifnot(is.numeric(age), length(age) == 1L)

  if (age < 0) {
    stop("age must be a non-negative number")
  }

  if (age < 18) {
    warning("Check your data.  We only care about adults.")
  }

  message(paste0("Your person is ", age, " years old"))
}

f("A")
f(-10)
f(10)
f(30)
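
The example above does not show an early return, so here is a minimal sketch of one. The helper `safe_mean` is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: the mean of the non-missing values, or NA if there are none.
safe_mean <- function(x) {
  stopifnot(is.numeric(x))

  if (all(is.na(x))) {
    return(NA_real_)        # early exit: there is nothing to average
  }

  mean(x, na.rm = TRUE)     # last expression, so this is the return value
}

safe_mean(c(1, 2, NA))             # 1.5
safe_mean(c(NA_real_, NA_real_))   # NA
```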

Lazy Evaluation

R is lazy. Arguments to functions are evaluated lazily, that is they are evaluated only as needed in the body of the function.

In this example, the function f() has two arguments: a and b.

f <- function(a, b) {
  a^2
} 

f(2)     # this works
f(2, 1)  # this does too

This function never actually uses the argument b, so calling f(2) or f(2, 1) will not produce an error because the 2 gets positionally matched to a. It’s common to write a function that does not use an argument and not notice it simply because R never throws an error.
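
One way to see laziness in action is to pass an argument that would fail if it were ever evaluated. The functions `ignores_b` and `uses_b` below are hypothetical names for this sketch.

```r
# `b` is never used in the body, so the stop() passed as `b` is never run.
ignores_b <- function(a, b) {
  a^2
}
ignores_b(2, stop("b was evaluated"))   # returns 4, no error

# Here `b` is used, so a missing `b` fails only when that line is reached.
uses_b <- function(a, b) {
  message("a = ", a)   # runs fine
  a + b                # error here: argument "b" is missing, with no default
}
res <- try(uses_b(2), silent = TRUE)
inherits(res, "try-error")              # TRUE
```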

The Dot-dot-dot (`...`) Argument

There is a special argument in R known as the ... argument, which indicates a variable number of arguments that are usually passed on to other functions. The two most common cases for using ... in a function are:

The number of arguments passed to the function cannot be known in advance.
You are extending another function and you don’t want to copy the entire argument list of the original function.

Number of arguments passed to the function cannot be known in advance.

The ... argument is necessary when the number of arguments passed to the function cannot be known in advance. This is clear in functions like paste(), cat(), and sum().

args(paste)
args(cat)
args(sum)

Because both paste() and cat() combine multiple character vectors (paste() returns the combined string, cat() prints it to the console), it is impossible for those functions to know in advance how many character vectors the user will pass. So the first argument to either function is .... The same is true of sum(), which adds up any number of vectors.

One catch with ... is that any arguments that appear after ... on the argument list must be named explicitly and cannot be partially matched or matched positionally.

Take a look at the arguments to the paste() function.

args(paste)

With the paste() function, the arguments sep and collapse must be named explicitly and in full if the default values are not going to be used.
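
A short sketch of what happens when sep is not spelled out in full:

```r
paste("a", "b", sep = "-")   # "a-b"

# `se` is not partially matched to `sep`. It is absorbed by `...` and pasted
# as a third string, using the default separator " ".
paste("a", "b", se = "-")    # "a b -"

# Positional matching does not reach `sep` either.
paste("a", "b", "-")         # "a b -"
```

No error or warning is raised in the last two calls, which is why this mistake is easy to miss.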

Extending another function

For example, a custom plotting function may want to make use of the default plot() function along with its entire argument list. The function below changes the default for the type argument to the value type = "l" (the original plot default is type = "p").

mylineplot <- function(x, y, ...) {
        plot(x, y, type = "l", ...)         ## Pass '...' to 'plot' function
}

Sometimes you will combine both in one function.

commas <- function(...) {
  paste(..., sep = "", collapse = ", ")
}

commas(letters[1:10])
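
If your own function needs to work with the values in ... rather than just pass them on, one common approach is to collect them with list(...). The helper `arg_lengths` below is a hypothetical name for this sketch.

```r
# Hypothetical helper: report the length of each argument that was supplied.
arg_lengths <- function(...) {
  args <- list(...)                   # evaluates and collects the `...` arguments
  vapply(args, length, integer(1))    # one integer per argument
}

arg_lengths(1:3, "a", x = letters)
```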

Environments & Scoping

An environment is a collection of (symbol, value) pairs; e.g., after x <- 10, x is a symbol and 10 is its value. Every environment has a parent environment, and it is possible for an environment to have multiple “children”. The only environment without a parent is the empty environment.
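
These ideas can be explored from the console with base R functions. The following is a minimal sketch; the variable names `e` and `env` are arbitrary.

```r
e <- new.env()                # a new, empty environment
assign("x", 10, envir = e)    # bind the symbol x to the value 10 inside e
get("x", envir = e)           # 10
ls(e)                         # "x"

# Walk from the global environment up through its parents
# until the empty environment is reached.
env <- globalenv()
while (!identical(env, emptyenv())) {
  print(environmentName(env))
  env <- parent.env(env)
}
```

The names printed by the loop depend on which packages are attached in your session.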

Scoping is the set of rules that govern how R looks up the value of a symbol. In the example below, scoping is the set of rules that R applies to go from the symbol x to its value 10:

x <- 10
x

R has two types of scoping: lexical scoping, implemented automatically at the language level, and dynamic scoping, used in select functions to save typing during interactive analysis. We discuss lexical scoping here because it is intimately tied to function creation. Dynamic scoping is an advanced topic and is discussed in Advanced R.

How do we associate a value with a free variable, that is, a symbol used inside a function that is neither an argument nor a local variable? R first looks inside the function itself, and then the search proceeds as follows:

If the value of a symbol is not found in the environment in which a function was defined, then the search continues in the parent environment. The search continues up the sequence of parent environments until we hit the top-level environment; this is usually the global environment (workspace) or the namespace of a package. After the top-level environment, the search continues down the search list until we hit the empty environment. If a value for a given symbol cannot be found once the empty environment is reached, then an error is thrown.
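
The key consequence of this search order is that a free variable is looked up where the function was defined, not where it is called. Here is a small sketch with hypothetical names (`rate`, `scale_it`, `caller`):

```r
rate <- 2                  # defined alongside scale_it()

scale_it <- function(v) {
  v * rate                 # `rate` is a free variable here
}

caller <- function() {
  rate <- 100              # local to caller(); scale_it() never sees it
  scale_it(1)
}

caller()                   # 2, not 100
```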

x <- 0

f <- function(x = -1) {
  x <- 1
  y <- 2
  c(x, y)
}

g <- function(x = -1) {
  y <- 1
  c(x, y)
}

h <- function() {
  y <- 1
  c(x, y)
}

What do the following return?

f()
g()
h()
g(h())
f(g())
g(f())

"First class objects"

Functions in R are "first class objects", which means that they can be treated much like any other R object. Importantly,

Functions can be passed as arguments to other functions. This is very handy for the various apply functions, like lapply() and sapply().
Functions can be nested, so that you can define a function inside of another function.

If you're familiar with a common language like C, these features might appear a bit strange. However, they are really important in R and can be useful for data analysis.

Since functions ARE objects you can pass functions as arguments and return functions as results.

my_summary <- function(x, funs = c(mean, sd), ...) {
  # Apply each function to x, forwarding any extra arguments in `...`
  lapply(funs, function(f) f(x, na.rm = TRUE, ...))
}

y <- 1:10
my_summary(y)
my_summary(y, c(mean, median, sd, IQR, mad))

Unlike in languages such as C, you can define a function within a function and/or return a function. The nested function exists only inside the parent function unless it is returned.

make.power <- function(n) {
  # [TBD] checks on n
  pow <- function(x) {
      # [TBD] checks on x
    x^n 
  }
  pow
}

make.power(4)  # returns a function
pow(x=4)       # Note: `pow` does not exist outside of the `make.power` function

cube <- make.power(3)          
as.list(environment(cube))
cube(2)

square <- make.power(2)
squareroot <- make.power(.5)


square(8)
squareroot(9)

Best Practices

Writing functions is an iterative process; they will not be perfect on your first try. As with most things, the more you practice, the closer your first pass will be to the final version.

Advice from the R Inferno:

Make your functions as simple as possible. Simple has many advantages:

Simple functions are likely to be human efficient: they will be easy to understand and to modify.
Simple functions are likely to be computer efficient.
Simple functions are less likely to be buggy, and bugs will be easier to fix.
(Perhaps ironically) simple functions may be more general—thinking about the heart of the matter often broadens the application.

Tips for writing "good" functions

Get the correct answer first, then worry about efficiency or speed.
Write test cases and save the script!
Don't be afraid of very short and simple functions. They can greatly add to the readability of your code.
What is a good name for the function? Since functions act on objects good function names tend to be verbs. Think about the auto-complete feature in RStudio.
What are the inputs to the function?
What should the user (not) be able to control?
Error check user input, but be reasonable.
Give useful and clear error messages.
Use sensible defaults to arguments.
Reduce duplicate code as much as possible.
Where are changes likely to occur and is there a way to abstract that piece away?
Think about corner cases, missing values, empty arguments.
Think about piping.
Write test cases and save the script!
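
Several of these tips (input checks, clear error messages, sensible defaults, corner cases) can be seen together in one small function. The sketch below is only an illustration; `rescale01` is a hypothetical name, and the choices it makes for corner cases are assumptions rather than the only reasonable ones.

```r
# Hypothetical example: rescale a numeric vector to the range [0, 1].
rescale01 <- function(x, na.rm = TRUE) {
  # Check the input, with an error message that says what was expected.
  if (!is.numeric(x)) {
    stop("`x` must be numeric, not ", class(x)[1], call. = FALSE)
  }
  # Corner cases: empty input, or nothing but missing values.
  if (length(x) == 0L || all(is.na(x))) {
    return(x)
  }
  rng <- range(x, na.rm = na.rm)
  # Corner case: all values equal, so the range has zero width.
  if (isTRUE(rng[1] == rng[2])) {
    return(ifelse(is.na(x), NA_real_, 0))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(1, 2, 3, NA))   # 0.0 0.5 1.0 NA
```

Because the data is the first argument, the function also works at the end of a pipe.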

{block type='rmdimportant'} Software testing is important, but, in part because it is frustrating and boring, many of us avoid it. 'testthat' is a testing framework for R that is easy to learn and use, and integrates with your existing workflow.
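
As a rough idea of what a testthat test looks like, here is a minimal sketch. The function `add_two` is hypothetical, and the tests are skipped if testthat is not installed.

```r
# Hypothetical function under test.
add_two <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y))
  x + y
}

# Run the tests only if the testthat package is available.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("add_two adds numbers and rejects bad input", {
    testthat::expect_equal(add_two(1, 3), 4)
    testthat::expect_error(add_two("a", 1))
  })
}
```

Saving tests like these in a script means they can be rerun every time the function changes.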

Exercises

Create a function that takes a numeric year-quarter (e.g. 20183) and returns the quarter n quarters before or after it. Example: two quarters before 20183 is 20181.
Come up with 5 functions (you don't have to program them) that will operate on your data. (Ex. Create a demographics table)
Create a read_* function that
- reads in the data file
- converts columns to the appropriate data types
- “tidies” your data (if appropriate)
The first argument to your read function should be the name of the file to read in. Are there additional parameters that are needed? Think beyond the Subscriber Report: what other things do you typically do upon first reading in a data file?