fmode: Fast (Grouped, Weighted) Statistical Mode for Matrix-Like...

View source: R/fmode.R

fmodeR Documentation

Fast (Grouped, Weighted) Statistical Mode for Matrix-Like Objects

Description

fmode is a generic function and returns the (column-wise) statistical mode i.e. the most frequent value of x, (optionally) grouped by g and/or weighted by w. The TRA argument can further be used to transform x using its (grouped, weighted) mode. Ties between multiple possible modes can be resolved by taking the minimum, maximum, (default) first or last occurring mode.

Usage

fmode(x, ...)

## Default S3 method:
fmode(x, g = NULL, w = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
      use.g.names = TRUE, ties = "first", nthreads = .op[["nthreads"]], ...)

## S3 method for class 'matrix'
fmode(x, g = NULL, w = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
      use.g.names = TRUE, drop = TRUE, ties = "first", nthreads = .op[["nthreads"]], ...)

## S3 method for class 'data.frame'
fmode(x, g = NULL, w = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
      use.g.names = TRUE, drop = TRUE, ties = "first", nthreads = .op[["nthreads"]], ...)

## S3 method for class 'grouped_df'
fmode(x, w = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
      use.g.names = FALSE, keep.group_vars = TRUE, keep.w = TRUE, stub = .op[["stub"]],
      ties = "first", nthreads = .op[["nthreads"]], ...)

Arguments

x

a vector, matrix, data frame or grouped data frame (class 'grouped_df').

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

w

a numeric vector of (non-negative) weights, may contain missing values.

TRA

an integer or quoted operator indicating the transformation to perform: 0 - "na" | 1 - "fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See TRA.

na.rm

logical. Skip missing values in x. Defaults to TRUE and implemented at very little computational cost. If na.rm = FALSE, NA is treated as any other value.

use.g.names

logical. Make group-names and add to the result as names (default method) or row-names (matrix and data frame methods). No row-names are generated for data.table's.

ties

an integer or character string specifying the method to resolve ties between multiple possible modes i.e. multiple values with the maximum frequency or sum of weights:

Int. String Description
1 "first" take the first occurring mode.
2 "min" take the smallest of the possible modes.
3 "max" take the largest of the possible modes.
4 "last" take the last occurring mode.

Note: "min"/"max" don't work with character data. See also Details.

nthreads

integer. The number of threads to utilize. Parallelism is across groups for grouped computations and at the column-level otherwise.

drop

matrix and data.frame method: Logical. TRUE drops dimensions and returns an atomic vector if g = NULL and TRA = NULL.

keep.group_vars

grouped_df method: Logical. FALSE removes grouping variables after computation.

keep.w

grouped_df method: Logical. Retain sum of weighting variable after computation (if contained in grouped_df).

stub

character. If keep.w = TRUE and stub = TRUE (default), the summed weights column is prefixed by "sum.". Users can specify a different prefix through this argument, or set it to FALSE to avoid prefixing.

...

arguments to be passed to or from other methods. If TRA is used, passing set = TRUE will transform data by reference and return the result invisibly.

Details

fmode implements a pretty fast C-level hashing algorithm inspired by the kit package to find the statistical mode.

If na.rm = FALSE, NA is not removed but treated as any other value (i.e. its frequency is counted). If all values are NA, NA is always returned.

The weighted mode is computed by summing up the weights for all distinct values and choosing the value with the largest sum. If na.rm = TRUE, missing values will be removed from both x and w i.e. utilizing only x[complete.cases(x,w)] and w[complete.cases(x,w)].

It is possible that multiple values have the same mode (the maximum frequency or sum of weights). Typical cases are simply when all values are either all the same or all distinct. In such cases, the default option ties = "first" returns the first occurring value in the data reaching the maximum frequency count or sum of weights. For example in a sample x = c(1, 3, 2, 2, 4, 4, 1, 7), the first mode is 2 as fmode goes through the data from left to right. ties = "last" on the other hand gives 1. It is also possible to take the minimum or maximum mode, i.e. fmode(x, ties = "min") returns 1, and fmode(x, ties = "max") returns 4. It should be noted that options ties = "min" and ties = "max" give unintuitive results for character data (no strict alphabetic sorting, similar to using < and > to compare character values in R). These options are also best avoided if missing values are counted (na.rm = FALSE) since no proper logical comparison with missing values is possible: With numeric data it depends, since in C++ any comparison with NA_real_ evaluates to FALSE, NA_real_ is chosen as the min or max mode only if it is also the first mode, and never otherwise. For integer data, NA_integer_ is stored as the smallest integer in C++, so it will always be chosen as the min mode and never as the max mode. For character data, NA_character_ is stored as the string "NA" in C++ and thus the behavior depends on the other character content.

fmode preserves all the attributes of the objects it is applied to (apart from names or row-names which are adjusted as necessary in grouped operations). If a data frame is passed to fmode and drop = TRUE (the default), unlist will be called on the result, which might not be sensible depending on the data at hand.

Value

The (w weighted) statistical mode of x, grouped by g, or (if TRA is used) x transformed by its (grouped, weighed) mode.

See Also

fmean, fmedian, Fast Statistical Functions, Collapse Overview

Examples

x <- c(1, 3, 2, 2, 4, 4, 1, 7, NA, NA, NA)
fmode(x)                            # Default is ties = "first"
fmode(x, ties = "last")
fmode(x, ties = "min")
fmode(x, ties = "max")
fmode(x, na.rm = FALSE)             # Here NA is the mode, regardless of ties option
fmode(x[-length(x)], na.rm = FALSE) # Not anymore..

## World Development Data
attach(wlddev)
## default vector method
fmode(PCGDP)                      # Numeric mode
head(fmode(PCGDP, iso3c))         # Grouped numeric mode
head(fmode(PCGDP, iso3c, LIFEEX)) # Grouped and weighted numeric mode
fmode(region)                     # Factor mode
fmode(date)                       # Date mode (defaults to first value since panel is balanced)
fmode(country)                    # Character mode (also defaults to first value)
fmode(OECD)                       # Logical mode
                                  # ..all the above can also be performed grouped and weighted
## matrix method
m <- qM(airquality)
fmode(m)
fmode(m, na.rm = FALSE)         # NA frequency is also counted
fmode(m, airquality$Month)      # Groupwise
fmode(m, w = airquality$Day)    # Weighted: Later days in the month are given more weight
fmode(m>50, airquality$Month)   # Groupwise logical mode
                                # etc..
## data.frame method
fmode(wlddev)                      # Calling unlist -> coerce to character vector
fmode(wlddev, drop = FALSE)        # Gives one row
head(fmode(wlddev, iso3c))         # Grouped mode
head(fmode(wlddev, iso3c, LIFEEX)) # Grouped and weighted mode

detach(wlddev)

SebKrantz/collapse documentation built on Oct. 10, 2024, 2:38 p.m.