library(BST02R)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
data(pain)
mydata <- pain

Often we want to calculate some things only for a specific subgroup of our patient group. (For example we want to calculate the average for the variables age and weight, but only for the women in the data set and not for the men). So it is important to select specific variables and observations. In R making these selections is called indexing. We start by showing how we can make selections within a single variable (a vector). Afterwards we will see how selecting variables and observations in a data.frame works.

Indexing a vector

As an example we will work with age variable in the data.frame mydata ; So let's create a variable that contains this single variable.

ages <- mydata$age

Making selections based on the position in a vector

The easiest way of selecting elements in a vector is by using the position of the elements in the vector. This can be done by providing the positions of the elements we want to select between square brackets after the name of the vector. So when we want to select the age of the first patient we can run the following code:

ages[1]

We can also do this for multiple elements at the same time.

selection <- ages[c(2, 3)]
selection

The vector of indices does not have to be sorted and indices may be duplicated. For example:

selection[c(2, 1, 1)]

When we use a vector with negative integers we will select all observations except those on the specified positions. So if we want to select all the ages except for the first three we can use.

ages[-c(1, 2, 3)]

It is not possible to combine positive and negative indices this way.

Selections based on a condition (TRUE / FALSE)

The second way to select elements from a vector is by using a vector of logical values (i.e.. TRUE/FALSE values) between the square brackets. In this way we select all values for which the value between the brackets is TRUE.

some_values<- c('foo', 'bar', 'baz')
some_values[c(TRUE, FALSE, TRUE)]

This way of selecting is especially useful when we use some comparison or condition to select variables.

ages[mydata$sex=='Female']
mydata$ptno[ages>65]

Often the conditions are based on the comparison operators: | Operator | Meaning | |----------|--------------------------| | == | Equal to | | != | Not equal to | | > | Greater than | | < | Smaller than | | >= | Greater than or equal to | | <= | Smaller than or equal to |

Conditions can be combined using & (and) and | or. Finally there is the logical negation operator ! ()

Making selections using names

When the elements in a vector all have a name we can use these names to select the elements.

bp_with_name <- c(sys=135, dia=85)
bp_with_name['dia']

Making selections in a list

To make selections in a list, we can use single square brackets, double square brackets and the dollar sign. The use of single square brackets works in the same way as it does for vectors.

mylist <- list(
  foo=c(1,2, 3),
  bar=c('a', 'b'),
  baz=list(TRUE, c(2, 4))
  )
mylist[c(1,3)]
mylist[c(1,2,3)==2]
mylist['bar']

We can also use double square brackets and the dollar sign for a list. There are two important differences between using single and double square brackets: 1. Using double square brackets only allows us to select a single element from the list. 2. The result of a selection with double brackets is the element itself, while if we make the selection with single brackets the result is a list consisting of the selected elements.

mylist[[1]]
mode(mylist[[1]]) # a vector
mode(mylist[1])   # a list with with a numeric vector as its single element   

There is another way to select a variable from a list which is by using the dollar sign ('$'). This is an alternative to using double square brackets in combination with a name. When we use this the result is always a vector.

mylist$foo      # results is a vector

Instead of using the dollar sign we can use double square brackets. We now need to put the name between quotes like for single square brackets. We can also use the position of the variable using these double square brackets.

mydata[['treatm']]
mydata[[1]]

It is not possible to select more than one element using double brackets; The result will always be a vector (instead of a data.frame)

Selecting observations and variables in a data.frame

Selecting observations and variables in a data.frame works more or less the same for data.frames as it does for lists. However because a data.frame is two dimensional we can two indices between the square brackets. The first one corresponds to the observations (rows) and the second corresponds to the variables (columns). So, as an example, we can select the first two observations from the third variable in the data.frame using the syntax:

mydata[c(1, 2), 3]

When the first or second position is left blank all rows or columns are selected. For example:

mydata[, 3]       # sex (3rd variable) for all patients
mydata[c(1,2), ]  # all variables for the first two patients
mydata[c(-3,-4), 'sex'] # Negative numbers and names can also be used

We have to be careful when we want to select a single variable from a data.frame, as we do above. The result will now no longer be a data.frame but it is transformed to a vector. When we want to prevent this we can use drop=FALSE, as follows:

mydata[, 3, drop=FALSE]       # data.frame met een variabele

We can also use a single index between the square brackets. This works as if the data.frame was a list of variables (it's columns).

mydata[1]
mydata[[1]]


stenw/BST02R documentation built on May 26, 2019, 4:35 a.m.