library("purrr")
library("tallyr")

Overview

The main goal of tallyr is to provide a quick and easy way to monitor progress whilst iterating through a data frame, applying a function to each row at a time. This can be useful when the time taken for this step is sufficiently long enough to run the script in the background, coming back to the console at regular intervals to check the progress. To help with monitoring this progress, the number of rows remaining can be displayed in the console so that you can see at a glance how far the script has progressed and how far there is left to go.

Installation

The package is on github and can be installed through the remotes package

# install.packages("remotes")
remotes::install_github("gcfrench/tallyr")

There are two function in the package

Using the counter

The counter is initiated by passing the data frame into the tally_counter() function. This can be done within a pipeline containing the iteration step, for example in conjuction with the tidyverse suite of packages.

To demonstrate this take the first five rows of the iris dataset passing each row, one at a time, into a function that returns the sepal length for that particular flower.

output <- iris[1:5, ] %>%
  pmap(
    function(...) {
      message(paste0("The sepal length is ", pluck(list(...), "Sepal.Length")), " cm")
      Sys.sleep(0.25)
    })

In this example case there is a small number of rows within the data frame but if it was larger enough to run the script in the background then there is no indication of how far the script is progressing and how many more rows there is left to go.

This is where using the tally counter can help. This counter can be initiated just before running the iteration step by adding the tally_counter() function within the pipeline. On running the pipe an initial message will show the counter set to the number of rows to be iterated over.

output <- iris[1:5, ] %>%
  tally_counter() %>%
  pmap(
    function(...) {
      message(paste0("The sepal length is ", pluck(list(...), "Sepal.Length")), " cm")
      Sys.sleep(0.25)
    })

Displaying the count during each iteration step is now easily done by adding the click() function to a message within your function. Each time the function is run the counter will be updated and displayed in the console.

output <- iris[1:5, ] %>%
  tally_counter() %>%
  pmap(
    function(...) {
      message(paste0(click(), ": The sepal length is ", pluck(list(...), "Sepal.Length")), " cm")
      Sys.sleep(0.25)
    })

Changing the counter behaviour

By default the counter counts down from the total number of rows to be iterated over, allowing a quick assessment of how far there is to go with the iteration step, but this behaviour can be changed so that instead the counter counts up from zero.

You can alter this count behaviour via the type argument. On passing add to this argument the counter will increase sequentially, whilst passing subtract the counter will decrease sequentially.

output <- iris[1:5, ] %>%
  tally_counter(type = "add") %>%
  pmap(
    function(...) {
      message(paste0(click(), ": The sepal length is ", pluck(list(...), "Sepal.Length")), " cm")
      Sys.sleep(0.25)
    })

The number of digits displayed by the counter is set to four to mimic the appearance of a real tally counter. However this number of digits is not fixed and will increase to accommodate increasing number of iterations, so for example above 9,999 iterations the number of digits increases to five, above 99,999 iterations six digits will be displayed and so on.



gcfrench/tallyr documentation built on Oct. 8, 2019, 5:17 p.m.