grouped_functions: Estimate Measures of Central Tendency for Already Grouped...

Description

Estimates the mean, median, and mode of already grouped data given the interval ranges and the frequencies of each group.

Usage

grouped_mean(frequencies, intervals, sep = NULL, trim = NULL)

grouped_mode(frequencies, intervals, sep = NULL, trim = NULL, method = 1)

grouped_median(frequencies, intervals, sep = NULL, trim = NULL)

Arguments

frequencies

A vector of frequencies.

intervals

A 2-column matrix with one row per element of frequencies, where the first column holds the lower class boundary and the second column holds the upper class boundary. Alternatively, intervals may be a character vector, in which case you specify sep (and, if needed, trim) so that the function can create the required matrix automatically.

sep

Optional character that separates the lower and upper class boundaries if intervals is entered as a character vector.

trim

Optional leading or trailing characters to trim from the character vector being used for intervals. There is a built-in pattern for trimming the breakpoint labels created by base::cut(). If you are using a grouped_* function on the output of cut() (where, for some reason, you no longer have access to the original data), you can use trim = "cut".

method

A single value (1 or 2) determining which method will be used to estimate the grouped mode. See the Details section for the different approaches.
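
As a rough illustration of what sep and trim are for, the sketch below builds the 2-column matrix by hand from character labels. It is not the package's internal code: the variable names are made up for this example, and dropping the first and last character is only an assumption about what trimming cut()-style labels involves.

```r
# Labels with a plain separator, as in "1500-1600"
salary <- c("1500-1600", "1600-1700", "1700-1800")
salary_bounds <- matrix(as.numeric(unlist(strsplit(salary, "-", fixed = TRUE))),
                        ncol = 2, byrow = TRUE)

# Labels produced by cut(), as in "(0,10]": drop the enclosing bracket
# characters first (assumed trimming step), then split on the comma
cut_labels <- levels(cut(c(5, 15, 25), breaks = c(0, 10, 20, 30)))
stripped <- substr(cut_labels, 2, nchar(cut_labels) - 1)
cut_bounds <- matrix(as.numeric(unlist(strsplit(stripped, ",", fixed = TRUE))),
                     ncol = 2, byrow = TRUE)

salary_bounds  # column 1: lower boundaries, column 2: upper boundaries
cut_bounds
```

In both cases the result has one row per class, which is the shape the intervals argument expects when a matrix is supplied directly.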

Details

Calculation of Grouped Mean

The following formula is used to calculate the grouped mean:

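M = sum(f * x) / n

Where:

M = the grouped mean
f = the frequency of each class
x = the midpoint of each class
n = the total number of observations (the sum of all frequencies)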
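
The grouped mean can be worked through by hand as a check. The sketch below assumes x is the midpoint of each class and n is the sum of the frequencies, and uses the salary data from the Examples section. The helper name grouped_mean_sketch is hypothetical and is not part of the package.

```r
# Hypothetical helper (not the package function): grouped mean from midpoints
grouped_mean_sketch <- function(frequencies, intervals) {
  midpoints <- rowMeans(intervals)                  # x: midpoint of each class
  sum(frequencies * midpoints) / sum(frequencies)   # sum(f * x) / n
}

freq <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)
lower <- seq(1500, 2400, by = 100)
bounds <- matrix(c(lower, lower + 100), ncol = 2)

mean_est <- grouped_mean_sketch(freq, bounds)
mean_est  # 4557000 / 2400 = 1898.75
```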

Calculation of Grouped Median

The following formula is used to calculate the grouped median:

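M = L + ((n/2 - cf) / f) * c

Where:

M = the grouped median
L = the lower class boundary of the median class (the class containing the n/2-th observation)
n = the total number of observations
cf = the cumulative frequency of the classes preceding the median class
f = the frequency of the median class
c = the width of the median class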
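
A minimal sketch of the grouped median calculation follows, assuming the median class is the first class whose cumulative frequency reaches n/2, cf is the cumulative frequency before that class, and c is its width. The helper name grouped_median_sketch is hypothetical, and the package's own handling of edge cases may differ.

```r
# Hypothetical helper (not the package function): interpolated grouped median
grouped_median_sketch <- function(frequencies, intervals) {
  n <- sum(frequencies)
  cum <- cumsum(frequencies)
  k <- which(cum >= n / 2)[1]             # index of the median class
  cf <- if (k > 1) cum[k - 1] else 0      # cumulative frequency before it
  L <- intervals[k, 1]                    # lower boundary of the median class
  width <- intervals[k, 2] - intervals[k, 1]
  L + (n / 2 - cf) / frequencies[k] * width
}

freq <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)
lower <- seq(1500, 2400, by = 100)
bounds <- matrix(c(lower, lower + 100), ncol = 2)

median_est <- grouped_median_sketch(freq, bounds)
median_est  # 1900 + (1200 - 1070) / 850 * 100, about 1915.29
```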

Calculation of Grouped Mode

The following formula is used to calculate the grouped mode if method = 1:

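Z = L + ((f1 - f0) / (2 * f1 - f0 - f2)) * c

Where:

Z = the grouped mode
L = the lower class boundary of the modal class
f1 = the frequency of the modal class
f0 = the frequency of the class preceding the modal class
f2 = the frequency of the class following the modal class
c = the width of the modal class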

Keep in mind that while it may be easy to identify the modal group, the mode of the source data may not fall in that group at all. Additionally, the data may have more than one mode or, conversely, no mode.
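
The method = 1 formula can be sketched as below. This assumes the modal class is the first class with the highest frequency and that a missing neighbouring class counts as frequency 0. Both are assumptions made for this illustration rather than documented behaviour, and the helper name grouped_mode_sketch is hypothetical.

```r
# Hypothetical helper (not the package function): interpolated grouped mode
grouped_mode_sketch <- function(frequencies, intervals) {
  k <- which.max(frequencies)             # first class with the highest frequency
  f1 <- frequencies[k]
  f0 <- if (k > 1) frequencies[k - 1] else 0                    # preceding class
  f2 <- if (k < length(frequencies)) frequencies[k + 1] else 0  # following class
  L <- intervals[k, 1]                    # lower boundary of the modal class
  width <- intervals[k, 2] - intervals[k, 1]
  L + (f1 - f0) / (2 * f1 - f0 - f2) * width
}

freq <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)
lower <- seq(1500, 2400, by = 100)
bounds <- matrix(c(lower, lower + 100), ncol = 2)

mode_est <- grouped_mode_sketch(freq, bounds)
mode_est  # 1900 + (850 - 460) / (1700 - 460 - 250) * 100, about 1939.39
```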

The following formula is used to calculate the grouped mode if method = 2:

M = (3 * x) - (2 * y)

Where:

M = the grouped mode
x = the grouped median
y = the grouped mean
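
The method = 2 formula has the shape of the familiar empirical relation mode = 3 * median - 2 * mean. The worked sketch below assumes that x is the grouped median and y is the grouped mean, and recomputes both by hand for the salary data from the Examples section. It illustrates the arithmetic only and is not the package's code.

```r
freq <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)
lower <- seq(1500, 2400, by = 100)
upper <- lower + 100
n <- sum(freq)

# y: grouped mean from class midpoints
y <- sum(freq * (lower + upper) / 2) / n

# x: grouped median by interpolation within the median class
cum <- cumsum(freq)
k <- which(cum >= n / 2)[1]
cf <- if (k > 1) cum[k - 1] else 0
x <- lower[k] + (n / 2 - cf) / freq[k] * (upper[k] - lower[k])

# Empirical estimate of the mode (assumed meaning of x and y)
mode_est2 <- 3 * x - 2 * y
mode_est2  # about 1948.38
```

For these data the two approaches give estimates roughly 9 units apart (about 1939.39 for method = 1 versus about 1948.38 here).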

Value

A single numeric value representing the grouped mean, median, or mode, depending on which function was called.

Examples

mydf <- structure(list(salary = c("1500-1600", "1600-1700", "1700-1800",
        "1800-1900", "1900-2000", "2000-2100", "2100-2200", "2200-2300",
        "2300-2400", "2400-2500"), number = c(110L, 180L, 320L, 460L,
        850L, 250L, 130L, 70L, 20L, 10L)), .Names = c("salary", "number"),
        class = "data.frame", row.names = c(NA, -10L))
mydf

## Example with intervals given as a character vector split on "-"
with(mydf, grouped_median(frequencies = number, intervals = salary, sep = "-"))

## Example with intervals manually specified
Freq <- mydf$number
X <- cbind(c(1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400),
           c(1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500))

grouped_median(Freq, X)

## Example using the output of `cut`
set.seed(1)
x <- sample(100, 100, replace = TRUE)
y <- data.frame(table(cut(x, 10)))

with(y, grouped_mean(Freq, Var1, sep = ",", trim = "cut"))
mean(x)

with(y, grouped_median(Freq, Var1, sep = ",", trim = "cut"))
median(x)

## Note that the mode might be really far off depending on the approach used
with(y, grouped_mode(Freq, Var1, sep = ",", trim = "cut"))
with(y, grouped_mode(Freq, Var1, sep = ",", trim = "cut", method = 2))
tail(sort(table(x)))

mrdwab/mathrrr documentation built on July 20, 2020, 11:14 p.m.