Count how many times each distinct value of a data.frame column is observed.
df | A data.frame.
col | The column to count.
sort | Whether or not to sort the resulting total column.
name | The name of the total column.
dfCount(x, "y") is similar in functionality to table(x$y), but performs better on large datasets (according to my not-so-thorough testing).
There are two main differences between dfCount and table:
1. dfCount returns a data.frame instead of a table object.
2. dfCount includes a row for the number of NA observations, whereas table does not by default.
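To make the two differences concrete, here is a minimal base R sketch of how table behaves on a column containing NA. The toy data and the column name "n" are illustrative assumptions, not taken from the package; the final data.frame only approximates the two-column shape described for dfCount.

```r
# Toy data with one missing value (illustrative, not from the package).
x <- data.frame(y = c("a", "b", "a", NA))

# table() returns an object of class "table" and drops NA by default.
t_default <- table(x$y)
class(t_default)
names(t_default)

# useNA = "ifany" adds a cell for the NA observations.
t_with_na <- table(x$y, useNA = "ifany")

# Converting to a data.frame gives one row per distinct value plus its count.
# The column names "y" and "n" are chosen here for illustration only.
counts <- as.data.frame(t_with_na, stringsAsFactors = FALSE)
names(counts) <- c("y", "n")
counts
```

Getting an NA row out of table takes an extra argument and a conversion step, which is the convenience gap the list above describes.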
A data.frame with two columns: The first column is the distinct values of the given variable, the second column shows the total number of rows with that value.
This function performs much faster than its equivalent table call on large datasets, even though the table function does not sort the results. The main speed boost comes from using 'dplyr'.
For example, with the following data.frame
df <- data.frame(a = rep(1:50, 100000))
running dfCount(df, "a") on my machine 50 times is, on average, 10x faster than table(df$a) (217 milliseconds vs 2112 milliseconds).
See the package vignette for more benchmarking analysis.
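A comparison of this kind can be sketched with system.time. The helper count_col below is a hypothetical stand-in built on dplyr::count, not the package's actual dfCount implementation, and the data.frame is smaller than the one in the text so the sketch runs quickly. Timings depend on the machine, so no expected numbers are shown.

```r
# Smaller than the data.frame used in the text, to keep the run short.
df <- data.frame(a = rep(1:50, 10000))
reps <- 5

# Average elapsed time of table() over a few repetitions.
t_table <- system.time(
  for (i in seq_len(reps)) res_table <- table(df$a)
)[["elapsed"]] / reps

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Hypothetical stand-in for dfCount(); the real function may differ.
  count_col <- function(df, col, sort = TRUE, name = "n") {
    dplyr::count(df, !!as.name(col), sort = sort, name = name)
  }

  # Average elapsed time of the dplyr-based count over the same repetitions.
  t_dplyr <- system.time(
    for (i in seq_len(reps)) res_dplyr <- count_col(df, "a")
  )[["elapsed"]] / reps

  cat(sprintf("table: %.1f ms, dplyr::count: %.1f ms\n",
              1000 * t_table, 1000 * t_dplyr))
}
```

For more stable measurements, a dedicated benchmarking package would be a better fit than averaging system.time by hand.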
The dplyr package is required for this function.
if (requireNamespace("nycflights13", quietly = TRUE)) {
  flights <- nycflights13::flights
  dfCount(flights, "dest")
  dfCount(flights, "dest", sort = FALSE)
  dfCount(flights, "dest", name = "flights")
}

dfCount(infert, "education")
dfCount(infert, "education", sort = FALSE)
data.frame(table(infert$education))