# Iteration

## Control Structures

Control structures allow you to put some "logic" into your R code, rather than just always executing the same R code every time. Control structures allow you to respond to inputs or to features of the data and execute different R expressions accordingly.

Commonly used control structures are `if`-`else`, `for`, `while`, `repeat`, `next`, and `break`.

### `if`-`else`

The `if`-`else` combination is probably the most commonly used control structure in R (or perhaps any language). This structure allows you to test a condition and act on it depending on whether it's true or false.

For starters, you can just use the `if` statement.

```r
if(<condition>) {
        # do something
}
# Continue with rest of code
```

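The above code does nothing if the condition is false. If you have an action you want to execute when the condition is false, then you need an `else` clause. Note that `else` must appear on the same line as the closing brace of the `if` block; at the top level, R treats an `else` that starts a new line as a syntax error.

```r
if(<condition>) {
        # do something
} else {
        # do something else
}
```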

You can have a series of tests by following the initial `if` with any number of `else if` clauses.

```r
if(<condition1>) {
        # do something
} else if(<condition2>) {
        # do something different
} else {
        # do something else different
}
```
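As a concrete illustration of the template above, here is a minimal sketch. The variable `x`, the cutoffs, and the labels are made up for this example.

```r
## Hypothetical value to classify; the cutoffs are arbitrary
x <- 7

if(x > 10) {
        size <- "large"
} else if(x > 5) {
        size <- "medium"
} else {
        size <- "small"
}
print(size)
```

Only the first branch whose condition is true is executed; the remaining tests are skipped.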

```{block, type='rmdwarning'}
There is also an `ifelse` function, which is a vectorized version. It is essentially an `if`-`else` wrapped in a `for` loop, so that the condition and the action are applied to each element of a vector.
```
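To make the comparison concrete, the sketch below applies `ifelse` to a small vector and then builds the same result with an explicit loop. The input vector and the labels are invented for illustration.

```r
## Hypothetical input vector
x <- c(-2, 0, 3, 8)

## Vectorized: one call handles every element
sign_label <- ifelse(x > 0, "positive", "non-positive")

## The same result built with an explicit loop
loop_label <- character(length(x))
for(i in seq_along(x)) {
        if(x[i] > 0) {
                loop_label[i] <- "positive"
        } else {
                loop_label[i] <- "non-positive"
        }
}

identical(sign_label, loop_label)
```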

### `for` Loops

For loops are pretty much the only looping construct that you will
need in R. While you may occasionally find a need for other types of
loops, in my experience doing data analysis, I've found very few
situations where a for loop wasn't sufficient. 

In R, for loops take an iterator variable and assign it successive
values from a sequence or vector. For loops are most commonly used for
iterating over the elements of an object (list, vector, etc.).

The following three loops all have the same behavior.

```r
x <- c("a", "b", "c", "d")

for(i in 1:length(x)) {
        ## Print out each element of 'x'
        print(x[i])
}
```

The `seq_along()` function is commonly used in conjunction with `for` loops in order to generate an integer sequence based on the length of an object (in this case, the object `x`).

```r
## Generate a sequence based on length of 'x'
for(i in seq_along(x)) {
        print(x[i])
}
```

It is not necessary to use an index-type variable.

```r
for(letter in x) {
        print(letter)
}
```
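One practical reason to prefer `seq_along()` over `1:length(x)` shows up with empty objects. The sketch below uses a made-up empty vector named `empty` to show the difference.

```r
## An empty vector (hypothetical edge case)
empty <- character(0)

1:length(empty)     ## counts down from 1 to 0, so a loop would run twice
seq_along(empty)    ## zero-length sequence, so a loop would not run at all

n_runs <- 0
for(i in seq_along(empty)) {
        n_runs <- n_runs + 1
}
n_runs
```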

```{block2, type='rmdtip'}
Nested loops are commonly needed for multidimensional or hierarchical data structures (e.g. matrices, lists). Be careful with nesting though. Nesting beyond 2 to 3 levels often makes it difficult to read and understand the code. If you find yourself in need of a large number of nested loops, you may want to break up the loops by using functions (discussed later).
```
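A two-level nested loop over a matrix might look like the following sketch. The matrix `m` and the running `total` are invented for this example.

```r
## Hypothetical 2 x 3 matrix
m <- matrix(1:6, nrow = 2, ncol = 3)

total <- 0
for(i in seq_len(nrow(m))) {
        for(j in seq_len(ncol(m))) {
                ## Visit every cell, row by row
                total <- total + m[i, j]
        }
}
total
```

In practice `sum(m)` gives the same answer; the loop is only meant to show the nesting.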

### `while` Loops

While loops begin by testing a condition. If it is true, then they
execute the loop body. Once the loop body is executed, the condition
is tested again, and so forth, until the condition is false, after
which the loop exits.

```r
count <- 0
while(count < 10) {
        print(count)
        count <- count + 1
}
```

```{block2, type='rmdwarning'}
While loops can potentially result in infinite loops if not written properly. Use with care!
```

Sometimes there will be more than one condition in the test.

```r
z <- 5
set.seed(1)

while(z >= 3 && z <= 10) {
        coin <- rbinom(1, 1, 0.5)

        if(coin == 1) {  ## random walk
                z <- z + 1
        } else {
                z <- z - 1
        }
}
print(z)
```

Conditions joined with `&&` are always evaluated from left to right, and evaluation stops as soon as the result is known. For example, in the above code, if `z` were less than 3, the second test would not have been evaluated.
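The left-to-right behavior can be observed directly. In this sketch, `check()` is a hypothetical helper that records which tests were actually evaluated.

```r
## Record of the tests that were evaluated
evaluated <- character(0)

## Hypothetical helper: note the label, then return the value
check <- function(label, value) {
        evaluated <<- c(evaluated, label)
        value
}

z <- 2
result <- check("z >= 3", z >= 3) && check("z <= 10", z <= 10)

result
evaluated
```

Because the first test is false, the second one never runs, so only its label is missing from `evaluated`.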

### `repeat` Loops

`repeat` initiates an infinite loop right from the start. These are not commonly used in statistical or data analysis applications, but they do have their uses. The only way to exit a `repeat` loop is to call `break`.

One possible paradigm might be in an iterative algorithm where you may be searching for a solution and you don't want to stop until you're close enough to the solution. In this kind of situation, you often don't know in advance how many iterations it's going to take to get "close enough" to the solution.

```r
x0 <- 1
tol <- 1e-8

repeat {
        x1 <- computeEstimate()

        if(abs(x1 - x0) < tol) {  ## Close enough?
                break
        } else {
                x0 <- x1
        }
}
```

```{block, type='rmdnote'}
The above code will not run since the `computeEstimate()` function is not defined. I just made it up for the purposes of this demonstration.
```

The loop above is a bit dangerous because there's no guarantee it will ever stop. You could get in a situation where the values of `x0` and `x1`
oscillate back and forth and never converge. Better to set a hard
limit on the number of iterations by using a `for` loop and then
report whether convergence was achieved or not.
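A minimal sketch of that safer pattern follows. It assumes a stand-in `computeEstimate()` that takes the current value and performs one Babylonian step toward the square root of 2; that definition, `max_iter`, and `converged` are all invented here for illustration.

```r
## Hypothetical update step: one Babylonian iteration for sqrt(2)
computeEstimate <- function(x) {
        (x + 2 / x) / 2
}

x0 <- 1
tol <- 1e-8
max_iter <- 100      ## hard limit, chosen arbitrarily
converged <- FALSE

for(iter in seq_len(max_iter)) {
        x1 <- computeEstimate(x0)

        if(abs(x1 - x0) < tol) {  ## Close enough?
                converged <- TRUE
                break
        }
        x0 <- x1
}

if(converged) {
        message("Converged after ", iter, " iterations: ", x1)
} else {
        message("No convergence within ", max_iter, " iterations")
}
```

The loop can stop in only two ways, and the `converged` flag tells the caller which one happened.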

### `next`, `break`

While not used very often, these are nice to know about.

`next` is used to skip an iteration of a loop. 

```r
for(i in 1:100) {
        if(i <= 20) {
                ## Skip the first 20 iterations
                next
        }
        ## Do something here
}
```

`break` is used to exit a loop immediately, regardless of what iteration the loop may be on.

```r
for(i in 1:100) {
        print(i)

        if(i >= 20) {
                ## Stop loop after 20 iterations
                break
        }
}
```
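The two statements can be combined in one loop. This sketch, with a made-up accumulator `kept`, skips odd values with `next` and leaves the loop with `break` once `i` passes 10.

```r
kept <- integer(0)

for(i in 1:100) {
        if(i %% 2 == 1) {
                ## Skip odd values
                next
        }
        if(i > 10) {
                ## Leave the loop once 'i' passes 10
                break
        }
        kept <- c(kept, i)
}
kept
```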

## Looping

For loops are so common that R has some functions which implement looping in a compact form to make your life easier.
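To see what "compact form" means, the sketch below computes the same result twice, once with an explicit `for` loop and once with `sapply`. The input vector `values` is made up for this example.

```r
## Hypothetical input
values <- c(1, 4, 9, 16)

## Explicit loop: preallocate the result, then fill it in
roots_loop <- numeric(length(values))
for(i in seq_along(values)) {
        roots_loop[i] <- sqrt(values[i])
}

## Compact form: the loop is handled inside sapply()
roots_sapply <- sapply(values, sqrt)

identical(roots_loop, roots_sapply)
```

Since `sqrt` is itself vectorized, `sqrt(values)` would be the idiomatic choice here; the point is only to show how the two looping styles correspond.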

```r
# Two dimensional matrix
M <- matrix(sample(1:16), 4, 4)
M
# apply min to rows
apply(M, 1, min)
# apply max to columns
apply(M, 2, max)
```

If you want row/column means or sums for a 2D matrix, be sure to investigate the highly optimized, lightning-quick `colMeans`, `rowMeans`, `colSums`, and `rowSums`.
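These functions return the same values as the corresponding `apply` calls, as the following sketch checks on a small made-up matrix named `mat`.

```r
## Hypothetical 4 x 4 matrix
mat <- matrix(1:16, 4, 4)

## Row means two ways
row_ok <- all.equal(apply(mat, 1, mean), rowMeans(mat))

## Column sums two ways
col_ok <- all.equal(apply(mat, 2, sum), colSums(mat))

row_ok
col_ok
```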

```r
x <- list(a = 1, b = 1:3, c = 10:25)
x
lapply(x, FUN = length)
sapply(x, FUN = length)
vapply(x, FUN = length, FUN.VALUE = 0L)

x <- 1:20
y <- factor(rep(letters[1:5], each = 4)) # a vector of the same length as x
tapply(x, y, sum)

# Sums the 1st elements, the 2nd elements, etc.
mapply(sum, 1:5, 1:5, 1:5)

# find the mean of 10 random normal variables, 5 times
replicate(5, mean(rnorm(10)))
```

`tapply` is in a similar spirit to a common data analysis paradigm called split-apply-combine, where we split our data set based on a group, apply a function or code to each piece, and combine the results back together. We will revisit this paradigm in greater detail when we get to R for Data Science.
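The three steps can be spelled out with base R functions. This sketch reuses the `x` and `y` from the `tapply` example; the intermediate names `pieces`, `sums`, and `combined` are invented for illustration.

```r
x <- 1:20
y <- factor(rep(letters[1:5], each = 4))

pieces   <- split(x, y)          ## split: one vector per level of 'y'
sums     <- lapply(pieces, sum)  ## apply: sum each piece
combined <- unlist(sums)         ## combine: back into one named vector

combined

## Same values as the one-step version
all(combined == tapply(x, y, sum))
```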



DavisBrian/rclassnotes documentation built on May 17, 2019, 8:19 a.m.