There are many ways to perform rowwise jobs in R.
In this Article, we consider these alternatives in turn.
We stick to the example about drug use shown in the introduction.
The idea is to compare alternative ways to create a new variable named everused,
which indicates whether each respondent has used any of the considered pain relievers for non-medical purposes.
This Article requires you to load the following packages:
```r
library(lay)        ## for lay() and the data
library(dplyr)      ## for many things
library(tidyr)      ## for pivot_longer() and pivot_wider()
library(purrr)      ## for pmap_lgl()
library(slider)     ## for slide_vec()
library(data.table) ## for an alternative to base and dplyr
```
Please install them if they are not present on your system.
Vectorized operators
One solution is to simply do the following:
```r
drugs_full |>
  mutate(everused = codeine | hydrocd | methdon | morphin |
                    oxycodp | tramadl | vicolor)
```
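It is certainly very efficient from a computational point of view, but coding this way presents two main limitations: every column must be named explicitly, which becomes tedious when there are many columns; and it only works because the operator used (here |) is vectorized, so it does not carry over to arbitrary functions applied to each row.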
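To make the second limitation concrete, here is a base R sketch on a small hypothetical data frame, toy, which only mimics the structure of drugs (an id plus logical columns). Writing codeine | hydrocd | ... amounts to folding | over the columns, which Reduce() can do without naming them; but either way it only works because | combines whole columns element by element. A summary function such as any() cannot be used like this, because it collapses all of its inputs into a single value.

```r
# Hypothetical toy stand-in for the drugs data (three drug columns only)
toy <- data.frame(
  caseid  = 1:4,
  codeine = c(FALSE, TRUE,  FALSE, FALSE),
  hydrocd = c(FALSE, FALSE, FALSE, TRUE),
  morphin = c(FALSE, FALSE, FALSE, FALSE)
)

# The explicit vectorized expression: every column is named by hand
explicit <- toy$codeine | toy$hydrocd | toy$morphin

# The same computation as a fold of `|` over the columns (names not typed)
folded <- Reduce(`|`, toy[setdiff(names(toy), "caseid")])

explicit
#> [1] FALSE  TRUE FALSE  TRUE
identical(explicit, folded)
#> [1] TRUE

# any() is not vectorized across rows: it returns one value for everything
any(toy$codeine, toy$hydrocd, toy$morphin)
#> [1] TRUE
```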
rowwise()
```r
drugs |>
  rowwise() |>
  mutate(everused = any(c_across(-caseid))) |>
  ungroup()
```
It is easy to use, since c_across() turns its input into a vector and rowwise() ensures that this vector only holds one row at a time. Yet, for now, it remains quite slow on large datasets (see the benchmarks Article).
pivot_longer() and pivot_wider()
```r
library(tidyr) ## requires {tidyr} to be installed
drugs |>
  pivot_longer(-caseid) |>
  group_by(caseid) |>
  mutate(everused = any(value)) |>
  ungroup() |>
  pivot_wider() |>
  relocate(everused, .after = last_col())
```
Here the trick is to turn the rowwise problem into a column problem by pivoting the values, and then to pivot the results back. Many find that this involves a little too much intellectual gymnastics. It is also not particularly efficient on large datasets, both in terms of computation time and of the memory required to pivot the tables.
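The pivoting trick can be sketched in base R, again on a hypothetical toy data frame rather than the real drugs data. The three steps mirror pivot_longer(), the grouped mutate() and pivot_wider(); note that the long table has one row per respondent and per drug, which is where the extra memory goes.

```r
# Hypothetical toy stand-in for the drugs data
toy <- data.frame(
  caseid  = 1:4,
  codeine = c(FALSE, TRUE,  FALSE, FALSE),
  hydrocd = c(FALSE, FALSE, FALSE, TRUE),
  morphin = c(FALSE, FALSE, FALSE, FALSE)
)
drug_cols <- setdiff(names(toy), "caseid")

# 1. Pivot longer: one row per (caseid, drug) pair
long <- data.frame(
  caseid = rep(toy$caseid, times = length(drug_cols)),
  name   = rep(drug_cols, each = nrow(toy)),
  value  = unlist(toy[drug_cols], use.names = FALSE)
)
nrow(long)  # nrow(toy) * length(drug_cols)
#> [1] 12

# 2. The rowwise job is now a grouped column job: any() within each caseid
per_case <- tapply(long$value, long$caseid, any)

# 3. Pivot back: match each caseid to its result in the original layout
toy$everused <- as.vector(per_case[as.character(toy$caseid)])
toy$everused
#> [1] FALSE  TRUE FALSE  TRUE
```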
pmap_lgl()
```r
library(purrr) ## requires {purrr} to be installed
drugs |>
  mutate(everused = pmap_lgl(pick(-caseid), ~ any(...)))
```
This is a perfectly fine solution, and it is actually part of what one implementation of lay() relies on (if .method = "tidy"), but from a user perspective it is a little too geeky and intimidating.
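The formula ~ any(...) works because pmap_lgl() walks over all selected columns in parallel and, for each row, passes one element from every column as a separate argument. Base R's mapply() follows the same logic; here is a sketch on a hypothetical toy data frame (not the real drugs data):

```r
# Hypothetical toy stand-in for the drugs data
toy <- data.frame(
  caseid  = 1:4,
  codeine = c(FALSE, TRUE,  FALSE, FALSE),
  hydrocd = c(FALSE, FALSE, FALSE, TRUE),
  morphin = c(FALSE, FALSE, FALSE, FALSE)
)
drug_cols <- setdiff(names(toy), "caseid")

# mapply() calls FUN once per row, with one argument per column; for row i
# the call is any(codeine = codeine[i], hydrocd = hydrocd[i], morphin = morphin[i])
everused <- do.call(
  mapply,
  c(list(FUN = function(...) any(...)), as.list(toy[drug_cols]))
)
everused
#> [1] FALSE  TRUE FALSE  TRUE
```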
slide_vec()
```r
library(slider) ## requires {slider} to be installed
drugs |>
  mutate(everused = slide_vec(pick(-caseid), any))
```
{slider} is a powerful package that provides several sliding window functions. It can be used to perform rowwise operations and is quite similar to {lay} in terms of syntax. It is, however, not as efficient as {lay}, and I am not sure it supports the automatic splicing demonstrated in the introduction.
{data.table}
```r
library(data.table) ## requires {data.table} to be installed
drugs_dt <- data.table(drugs)
drugs_dt[, ..I := .I]                                           # add a row index
drugs_dt[, everused := any(.SD), by = ..I, .SDcols = -"caseid"] # one group per row
drugs_dt[, ..I := NULL]                                         # drop the row index
as_tibble(drugs_dt)
```
This is a solution for those using {data.table}. It is not particularly efficient, nor particularly easy to remember for those who do not use {data.table} frequently.
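Grouping by a row index is what makes the {data.table} version work: each group holds exactly one row, so .SD is that row and any(.SD) reduces it to a single value. The same one-group-per-row idea can be sketched in base R with split(), on a hypothetical toy data frame:

```r
# Hypothetical toy stand-in for the drugs data
toy <- data.frame(
  caseid  = 1:4,
  codeine = c(FALSE, TRUE,  FALSE, FALSE),
  hydrocd = c(FALSE, FALSE, FALSE, TRUE),
  morphin = c(FALSE, FALSE, FALSE, FALSE)
)
drug_cols <- setdiff(names(toy), "caseid")

# One group per row: split the drug columns by a row index
row_groups <- split(toy[drug_cols], seq_len(nrow(toy)))

# Each group is a one-row data frame; any() collapses it to one value
everused <- vapply(row_groups, function(row) any(unlist(row)), logical(1),
                   USE.NAMES = FALSE)
everused
#> [1] FALSE  TRUE FALSE  TRUE
```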
apply()
```r
drugs |>
  mutate(everused = apply(pick(-caseid), 1L, any))
```
This is the base R solution. It is very efficient and is actually part of the default method used in lay().
Our implementation of lay() removes the need to specify the margin (the 1L above) and offers the automatic splicing and the lambda syntax shown in the introduction.
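Under the hood, apply() works on matrices: a data frame is first converted with as.matrix(), and each row (MARGIN = 1L) is then passed to the function as a plain vector. The sketch below, on a hypothetical toy data frame, also defines a hypothetical helper, row_apply(), that hard-codes the margin; it only illustrates the idea of hiding 1L and is not how lay() is implemented. The last lines show a caveat of the matrix conversion: if any selected column is character, every row becomes a character vector.

```r
# Hypothetical toy stand-in for the drugs data
toy <- data.frame(
  caseid  = 1:4,
  codeine = c(FALSE, TRUE,  FALSE, FALSE),
  hydrocd = c(FALSE, FALSE, FALSE, TRUE),
  morphin = c(FALSE, FALSE, FALSE, FALSE)
)
drug_cols <- setdiff(names(toy), "caseid")

# apply() turns the data frame into a logical matrix, then calls any()
# on each row (MARGIN = 1L)
everused <- apply(toy[drug_cols], 1L, any)

# Hypothetical helper that hard-codes the row margin (not lay()'s code)
row_apply <- function(data, fn) apply(data, 1L, fn)
identical(row_apply(toy[drug_cols], any), everused)
#> [1] TRUE

# The matrix conversion matters: one character column coerces whole rows
mixed <- data.frame(id = c("a", "b"), flag = c(TRUE, FALSE))
apply(mixed, 1L, class)
#> [1] "character" "character"
```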
for (i in ...) {...}
```r
drugs$everused <- NA
columns_in <- !colnames(drugs) %in% c("caseid", "everused")
for (i in seq_len(nrow(drugs))) {
  drugs$everused[i] <- any(drugs[i, columns_in])
}
drugs
```
This is another base R solution, which does not involve any external package. It is not very pretty, nor particularly efficient.
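Part of the cost of this loop comes from subsetting and modifying a data frame at every iteration. A common base R pattern, sketched here on a hypothetical toy data frame and not benchmarked, converts the columns to a matrix once, fills a preallocated logical vector, and only adds the column at the end:

```r
# Hypothetical toy stand-in for the drugs data
toy <- data.frame(
  caseid  = 1:4,
  codeine = c(FALSE, TRUE,  FALSE, FALSE),
  hydrocd = c(FALSE, FALSE, FALSE, TRUE),
  morphin = c(FALSE, FALSE, FALSE, FALSE)
)
drug_cols <- setdiff(names(toy), "caseid")

m <- as.matrix(toy[drug_cols])   # logical matrix, built once
everused <- logical(nrow(m))     # preallocated result (all FALSE)
for (i in seq_len(nrow(m))) {
  everused[i] <- any(m[i, ])     # m[i, ] is a plain logical vector
}
toy$everused <- everused         # a single assignment into the data frame
toy$everused
#> [1] FALSE  TRUE FALSE  TRUE
```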
There are probably other ways. If you think of a nice one, please open an issue and we will add it here!
The results of benchmarks comparing alternative implementations for our simple rowwise job are shown in another Article (see benchmarks).
As you will see, lay() is not just simple and powerful, it is also quite efficient!