To understand computations in R, two slogans are helpful:
- Everything that exists is an object.
- Everything that happens is a function call.-- John Chambers
Functions are an central part of robust R programming and we will spend a significant amount of time writing functions. Thinking of functions in the mathematical sense will make the properties more apparent than any other framework.
All R functions have three parts:
the body()
, the code inside the function.
the formals()
, the list of arguments which controls how you can call the function.
the environment()
, the "map" of the location of the function's variables.
When you print a function in R, it shows you these three important components. If the environment isn't displayed, it means that the function was created in the global environment.
myadd <- function(x, y) { message(paste0("x = ", x, "\n")) message(paste0("y = ", y, "\n")) x + y }
{ }
. Note this does the computation AND returns the result.x
and y
are the arguments to the function.When calling a function you can pass the parameters in order, by name, or a combination.
myadd(1, 3) # arguments by position myadd(x = 1, y = 3) # arguments by name myadd(y = 3, x = 1) # name order doesn't matter myadd(y = 3, 1) # combination
```{block2, type='rmdtip'} Even though it’s legal, I don’t recommend messing around with the order of the arguments too much, since it can lead to some confusion. Convention is to pass arguments in the order the function defines them, and to use the arguments names if the function takes more than 2 or 3 arguments.
You can also specify default values for your arguments. Default values _should_ be the values most often used. `rnorm` uses the default of `mean = 0` and `sd = 1`. We usually want to sample from the standard normal distribution, but we are not forced to. ```r myadd2 <- function(x = 3, y = 0){ cat(paste0("x = ", x, "\n")) cat(paste0("y = ", y, "\n")) x + y } myadd2() # use the defaults myadd2(x = 1) myadd2(y = 1) myadd2(x = 1, y = 1)
By default the last line of the function is returned. Thus, there is no reason to explicitly call return
, unless you are returning from the function early. Inside functions use stop
to return error messages, warning
to return warning messages, and message
to print a message to the console. stopifnot
is useful for argument checking. It checks that each argument is TRUE, and produces a generic error message if not.
f <- function(age) { stopifnot(is.numeric(age), length(age) == 1L) if (age < 0) { stop("age must be a positive number") } if (age < 18) { warning("Check your data. We only care about adults.") } message(paste0("Your person is ", age, " years old")) } f("A") f(-10) f(10) f(30)
R is lazy. Arguments to functions are evaluated lazily, that is they are evaluated only as needed in the body of the function.
In this example, the function f()
has two arguments: a
and b
.
f <- function(a, b) { a^2 } f(2) # this works f(2, 1) # this does too
This function never actually uses the argument b
, so calling f(2)
or f(2, 1)
will not produce an error because the 2 gets positionally matched to a. It’s common to write a function that does not use an argument and not notice it simply because R never throws an error.
...
) ArgumentThere is a special argument in R known as the ...
argument, which indicate a variable number of arguments that are usually passed on to other functions. The two most common cases for using ...
in a function are:
Number of arguments passed to the function cannot be known in advance.
The ...
argument is also necessary when the number of arguments passed to the function cannot be known in advance. This is clear in functions like paste()
, cat()
, and sum()
.
args(paste) args(cat) args(sum)
Because both paste()
and cat()
print out text to the console by combining multiple character vectors together, it is impossible for those functions to know in advance how many character vectors will be passed to the function by the user. So the first argument to either function is ...
. Similarly with sum()
.
One catch with ...
is that any arguments that appear after ...
on the argument list must be named explicitly and cannot be partially matched or matched positionally.
Take a look at the arguments to the paste()
function.
args(paste)
With the paste()
function, the arguments sep
and collapse
must be named explicitly and in full if the default values are not going to be used.
Extending another function
For example, a custom plotting function may want to make use of the default plot()
function along with its entire argument list. The function below changes the default for the type
argument to the value type = "l"
(the original plot
default is type = "p"
).
mylineplot <- function(x, y, ...) { plot(x, y, type = "l", ...) ## Pass '...' to 'plot' function }
Sometimes you will combine both in one function.
commas <- function(...) { paste(..., sep = "", collapse = ", ") } commas(letters[1:10])
An environment is a collection of (symbol, value) pairs, i.e. x <- 10
, x
is a symbol and 10
might be its value. Every environment has a parent environment and it is possible for an environment to have multiple “children”. The only environment without a parent is the empty environment.
Scoping is the set of rules that govern how R looks up the value of a symbol. In the example below, scoping is the set of rules that R applies to go from the symbol x
to its value 10
:
x <- 10 x
R has two types of scoping: lexical scoping, implemented automatically at the language level, and dynamic scoping, used in select functions to save typing during interactive analysis. We discuss lexical scoping here because it is intimately tied to function creation. Dynamic scoping is an advanced topic and is discussed in Advanced R.
How do we associate a value to a free variable? There is a search process that occurs that goes as follows:
If the value of a symbol is not found in the environment in which a function was defined, then the search is continued in the parent environment. The search continues up the sequence of parent environments until we hit the top-level environment; this usually the global environment (workspace) or the namespace of a package. After the top-level environment, the search continues down the search list until we hit the empty environment. If a value for a given symbol cannot be found once the empty environment is arrived at, then an error is thrown.
x <- 0 f <- function(x = -1) { x <- 1 y <- 2 c(x, y) } g <- function(x = -1) { y <- 1 c(x, y) } h <- function() { y <- 1 c(x, y) }
What do the following return?
f()
g()
h()
g(h())
f(g())
g(f())
Functions in R are "first class objects", which means that they can be treated much like any other R object. Importantly,
Functions can be passed as arguments to other functions. This is very handy for the various apply functions, like lapply()
and sapply()
.
Functions can be nested, so that you can define a function inside of another function
If you're familiar with common language like C, these features might appear a bit strange. However, they are really important in R and can be useful for data analysis.
Since functions ARE objects you can pass functions as arguments and return functions as results.
my_summary <- function(x, funs = c(mean, sd), ...) { lapply(funs, function(f) f(x, na.rm = TRUE)) } y <- 1:10 my_summary(y) my_summary(y, c(mean, median, sd, IQR, mad))
Unlike most languages you can define a function within a function and / or return a function. This nested function only lives inside the parent function.
make.power <- function(n) { # [TBD] checks on n pow <- function(x) { # [TBD] checks on x x^n } pow } make.power(4) # returns a function pow(x=4) # Note: `pow` does not exist outside of the `make.power` function cube <- make.power(3) as.list(environment(cube)) cube(2) square <- make.power(2) squareroot <- make.power(.5) square(8) squareroot(9)
Writing functions is an iterative process, they will not be perfect on your first try. As with most things the more you practice the closer to the final version you will get on your first pass.
Make your functions as simple as possible. Simple has many advantages:
Tips for writing "good" functions
{block type='rmdimportant'}
Software testing is important, but, in part because it is frustrating and boring, many of us avoid it. 'testthat' is a testing framework for R that is easy learn and use, and integrates with your existing 'workflow'.
read_*
function thatAdd the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.