dt_counts_and_percents: Group by, count, and percent count in a data.table

dt_counts_and_percents R Documentation

Group by, count, and percent count in a data.table

Description

This function takes a data.table and the (quoted) name of a column to group by, counts the occurrences of each level of that column, sorts the counts in descending order, and adds each level's percent of the total count.
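
The steps described above can be sketched in plain data.table code. This is only an illustration of the idea, not libbib's actual source: the helper name count_and_percent and the output column names count and percent are assumptions made for this example.

```r
library(data.table)

# Hypothetical helper, NOT the libbib implementation
count_and_percent <- function(DT, group_by_this) {
  # Count rows per level of the column whose name is stored in group_by_this
  out <- DT[, .(count = .N), by = group_by_this]
  # Sort by count, largest first
  setorder(out, -count)
  # Percent of all rows (out of 100), rounded to two decimal places
  out[, percent := round(100 * count / sum(count), 2)]
  out[]
}

mt <- as.data.table(mtcars)
count_and_percent(mt, "cyl")
```

Passing the column name as a string keeps the helper usable inside other functions, which is presumably why the documented function takes a quoted column as well.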

Usage

dt_counts_and_percents(DT, group_by_this, percent.cutoff = 0, big.mark = FALSE)

Arguments

DT

The data.table object to operate on

group_by_this

A quoted column to group by

percent.cutoff

A percent (out of 100). All levels whose percent of the total count is lower than this number are grouped into "OTHER" in the returned data.table (default is 0)

big.mark

If FALSE (default), the "count" column is left as an integer. Otherwise, it must be a character used to separate every three digits of the count (for example, ","); this turns the count column into a character column.
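
As a rough illustration of why a separator turns the counts into strings, base R's format() behaves the same way. Whether the function calls format() internally is an assumption; the counts below are made up for the example.

```r
# Made-up counts for illustration
counts <- c(1234567L, 8900L, 42L)

# Insert "," every three digits; trim = TRUE drops the padding spaces
formatted <- format(counts, big.mark = ",", trim = TRUE)

formatted                # "1,234,567" "8,900" "42"
is.character(formatted)  # TRUE: the values are no longer integers
```

Because the result is character, it sorts alphabetically rather than numerically, so a separator is best applied only to a table meant for display.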

Details

For long-tailed count distributions, a cutoff can be placed on the percent: levels whose percent of the total count is lower than the cutoff are grouped into a single category called "OTHER". The cutoff is a number out of 100.

The final row is a total count.
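
A minimal sketch of the cutoff and total-row behaviour, under stated assumptions: the counts are invented, the label "TOTAL" and the position of the "OTHER" row are guesses, and none of this is taken from libbib's source.

```r
library(data.table)

# Invented per-level counts, already sorted in descending order
counts <- data.table(
  level = c("a", "b", "c", "d"),
  count = c(60L, 30L, 6L, 4L)
)
cutoff <- 10  # percent, out of 100

counts[, percent := 100 * count / sum(count)]

# Relabel every level below the cutoff, then re-aggregate by label
counts[percent < cutoff, level := "OTHER"]
collapsed <- counts[, .(count = sum(count)), by = level]
collapsed[, percent := round(100 * count / sum(count), 2)]

# Append a total row; the label "TOTAL" is an assumption of this sketch
with_total <- rbind(
  collapsed,
  data.table(level = "TOTAL", count = sum(collapsed$count), percent = 100)
)
with_total
```

Here levels c and d (6% and 4%) fall below the 10% cutoff and are merged into one "OTHER" row holding their combined count.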

The quoted group-by variable must be a character or factor. If it is not, it will be temporarily converted into one and a warning will be issued.

Value

Returns a data.table with three columns: the grouped-by column, a count column, and a percent column (out of 100) rounded to two decimal places.

Examples


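library(data.table)  # provides as.data.table()
library(libbib)      # provides dt_counts_and_percents()
# Count and percent of each iris species
iris_dt <- as.data.table(iris)
dt_counts_and_percents(iris_dt, "Species")
# Convert cyl to a factor first so no conversion warning is issued
mt <- as.data.table(mtcars)
mt[, cyl := factor(cyl)]
dt_counts_and_percents(mt, "cyl")
# Group levels that make up less than 25% of the rows into "OTHER"
dt_counts_and_percents(mt, "cyl", percent.cutoff = 25)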


libbib documentation built on Nov. 10, 2022, 6:16 p.m.