dfCount: Count number of rows per group


View source: R/dfCount.R

Description

Count how many times each distinct value of a data.frame column is observed.

Usage

dfCount(df, col, sort = TRUE, name = "total")

Arguments

df

A data.frame.

col

The name of the column to count, given as a character string.

sort

Whether to sort the result by the total column. Defaults to TRUE.

name

The name of the total column. Defaults to "total".

Details

dfCount(x, "y") is similar in functionality to table(x$y), but performs better on large datasets (according to my not-so-thorough testing).

There are two main differences between dfCount and table:

1. dfCount returns a data.frame instead of a table object.

2. dfCount includes a row for the number of NA observations, whereas table omits NA values by default.
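The NA difference can be seen with base R alone. The sketch below uses a small made-up vector x (not from the package) and shows that table drops NA values unless useNA is set, and that a table can be converted to a data.frame to get a two-column layout. It only illustrates the behaviour of table; it is not how dfCount is implemented.

```r
# Hypothetical input: five observations, two of them missing.
x <- c("a", "b", NA, "a", NA)

# By default, table() ignores NA values entirely.
default_tab <- table(x)

# useNA = "ifany" adds an NA entry when missing values are present.
na_tab <- table(x, useNA = "ifany")

# Convert to a data.frame with a value column and a count column,
# naming the count column "total" to mirror dfCount's default.
na_df <- as.data.frame(na_tab, responseName = "total",
                       stringsAsFactors = FALSE)
print(na_df)
```

With dfCount the NA row is included without any extra argument, so this conversion step is only needed when staying in base R.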

Value

A data.frame with two columns. The first column contains the distinct values of the given column, and the second column contains the number of rows with each value.

Performance

On large datasets this function is much faster than the equivalent table call, even though table does not sort its results. Most of the speedup comes from using 'dplyr' to do the counting.

For example, with the following data.frame

df <- data.frame(a = rep(1:50, 100000))

running dfCount(df, "a") 50 times on my machine is, on average, roughly 10x faster than table(df$a) (217 milliseconds vs 2112 milliseconds).
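A comparison along these lines can be tried with system.time. The sketch below times a single run of table and, if dplyr is installed, a single run of dplyr::count, which is used here only as a stand-in for a dplyr-based count (an assumption; it is not necessarily what dfCount does internally). Timings depend on the machine and on package versions, so the figures above should not be expected to reproduce exactly.

```r
# Same data as in the text: 50 distinct values, 5 million rows.
df <- data.frame(a = rep(1:50, 100000))

# Elapsed time in seconds for one run of the base R approach.
t_table <- system.time(tab <- table(df$a))[["elapsed"]]
print(c(table = t_table))

# Time a dplyr-based count only when dplyr is available.
# dplyr::count is a stand-in here, not dfCount itself.
if (requireNamespace("dplyr", quietly = TRUE)) {
  t_dplyr <- system.time(
    cnt <- dplyr::count(df, a, name = "total")
  )[["elapsed"]]
  print(c(dplyr = t_dplyr))
}
```

A single run is noisy; repeating each call several times and averaging, as was done for the numbers in the text, gives a steadier estimate.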

See the package vignette for more benchmarking analysis.

Note

The dplyr package is required for this function.

See Also

plotCount

Examples

if (requireNamespace("nycflights13", quietly = TRUE)) {
  flights <- nycflights13::flights
  dfCount(flights, "dest")
  dfCount(flights, "dest", sort = FALSE)
  dfCount(flights, "dest", name = "flights")
}

dfCount(infert, "education")
dfCount(infert, "education", sort = FALSE)
data.frame(table(infert$education))

daattali/rsalad documentation built on Oct. 28, 2019, 12:16 p.m.