# aggregating: Aggregating functions In mosaic: Project MOSAIC Statistics and Mathematics Teaching Utilities

## Description

The `mosaic` package makes several summary statistic functions (like `mean` and `sd`) formula-aware.

## Usage

```
mean_(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

mean(x, ...)

median(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

range(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

sd(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

max(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

min(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

sum(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

IQR(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

fivenum(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

iqr(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

prod(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

sum(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

favstats(x, ..., data = NULL, groups = NULL, na.rm = TRUE)

quantile(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

var(x, y = NULL, na.rm = getOption("na.rm", FALSE), ..., data = NULL)

cor(x, y = NULL, ..., data = NULL)

cov(x, y = NULL, ..., data = NULL)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | a numeric vector or a formula |
| `...` | additional arguments |
| `data` | a data frame in which to evaluate formulas (or bare names). Note that the default is `data = parent.frame()`. This makes it convenient to use this function interactively by treating the working environment as if it were a data frame. But this may not be appropriate for programming uses. When programming, it is best to use an explicit `data` argument, ideally supplying a data frame that contains the variables mentioned. |
| `groups` | a grouping variable, typically a name of a variable in `data` |
| `na.rm` | a logical indicating whether `NA`s should be removed before computing |
| `y` | a numeric vector or a formula |
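The `na.rm = getOption("na.rm", FALSE)` default that appears in most of the signatures means the default can be changed for a whole session with `options()`. The following is a minimal sketch of that pattern using only base R. The helper `mean_with_option()` is hypothetical and not part of `mosaic`; it only mimics the default-argument pattern to show how such a default behaves.

```r
# Hypothetical helper (not part of mosaic): mimics the na.rm default pattern.
mean_with_option <- function(x, na.rm = getOption("na.rm", FALSE)) {
  base::mean(x, na.rm = na.rm)
}

x <- c(1, 2, NA, 4)

old <- options(na.rm = NULL)       # make sure the option starts unset

default_result  <- mean_with_option(x)                # NA: the missing value is kept
explicit_result <- mean_with_option(x, na.rm = TRUE)  # mean of 1, 2 and 4

options(na.rm = TRUE)              # change the session-wide default
option_result <- mean_with_option(x)                  # NAs are now dropped by default

options(old)                       # restore the previous setting
```

Because the default is looked up when the function is called, changing the option affects later calls without editing them. An explicit `na.rm` argument still takes precedence.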

## Details

Many of these functions mask core R functions to provide an additional formula interface. Existing behavior should be unchanged when the first argument is not a formula. If the first argument is a formula, that formula and `data` together are used to generate the numeric vector(s) to be summarized. Formulas of the shape `x ~ a` or `~ x | a` produce summaries of `x` for each subset defined by `a`. Two-way aggregation can be achieved using formulas of the form `x ~ a + b` or `x ~ a | b`. See the examples.
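To make the formula shapes concrete, the sketch below computes the same kind of grouped summaries with base R and then calls the `mosaic` formula interface, but only if the package is installed. The built-in `mtcars` data set is an assumption made for illustration. The base R lines are rough analogues of the grouping, not a description of how `mosaic` is implemented.

```r
# Base R analogue of a one-way summary such as mean(mpg ~ cyl, data = mtcars)
by_cyl <- tapply(mtcars$mpg, mtcars$cyl, base::mean)

# Base R analogue of a two-way summary such as mean(mpg ~ cyl + am, data = mtcars)
by_cyl_am <- aggregate(mpg ~ cyl + am, data = mtcars, FUN = base::mean)

print(by_cyl)
print(by_cyl_am)

# The mosaic formula interface, run only when the package is available
if (requireNamespace("mosaic", quietly = TRUE)) {
  print(mosaic::mean(mpg ~ cyl, data = mtcars))       # one summary per value of cyl
  print(mosaic::mean(~ mpg | cyl, data = mtcars))     # same grouping, conditioning syntax
  print(mosaic::mean(mpg ~ cyl + am, data = mtcars))  # one summary per cyl/am combination
}
```

Passing `data = mtcars` explicitly follows the advice given for the `data` argument when the call is used in programs rather than interactively.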

## Note

Earlier versions of these functions supported a "bare name + data frame" interface. This functionality has been removed since it was (a) ambiguous in some cases, (b) unnecessary, and (c) difficult to maintain.

## Examples

```
library(mosaic)  # attaches the formula-aware versions of mean(), sd(), etc.

mean(HELPrct$age)
mean( ~ age, data = HELPrct)
mean( ~ drugrisk, na.rm = TRUE, data = HELPrct)
mean(age ~ shuffle(sex), data = HELPrct)
mean(age ~ shuffle(sex), data = HELPrct, .format = "table")
# wrap in data.frame() to auto-convert awkward variable names
data.frame(mean(age ~ shuffle(sex), data = HELPrct, .format = "table"))
mean(age ~ sex + substance, data = HELPrct)
mean( ~ age | sex + substance, data = HELPrct)
mean( ~ sqrt(age), data = HELPrct)
sum( ~ age, data = HELPrct)
sd(HELPrct$age)
sd( ~ age, data = HELPrct)
sd(age ~ sex + substance, data = HELPrct)
var(HELPrct$age)
var( ~ age, data = HELPrct)
var(age ~ sex + substance, data = HELPrct)
IQR(width ~ sex, data = KidsFeet)
iqr(width ~ sex, data = KidsFeet)
favstats(width ~ sex, data = KidsFeet)

cor(length ~ width, data = KidsFeet)
cov(length ~ width, data = KidsFeet)
tally(is.na(mcs) ~ is.na(pcs), data = HELPmiss)
cov(mcs ~ pcs, data = HELPmiss)                    # NA because of missing data
cov(mcs ~ pcs, data = HELPmiss, use = "complete")  # ignore missing data
# alternative approach using filter explicitly
cov(mcs ~ pcs, data = HELPmiss %>% filter(!is.na(mcs) & !is.na(pcs)))
```

### Example output

```
Loading required package: dplyr

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

filter, lag

The following objects are masked from 'package:base':

intersect, setdiff, setequal, union

New to ggformula?  Try the tutorials:
learnr::run_tutorial("introduction", package = "ggformula")
learnr::run_tutorial("refining", package = "ggformula")

The 'mosaic' package masks several functions from core packages in order to add
additional features.  The original behavior of these functions should not be affected by this.

Attaching package: 'mosaic'

The following object is masked from 'package:Matrix':

mean

The following objects are masked from 'package:dplyr':

count, do, tally

The following objects are masked from 'package:stats':

IQR, binom.test, cor, cor.test, cov, fivenum, median, prop.test,
quantile, sd, t.test, var

The following objects are masked from 'package:base':

max, mean, min, prod, range, sample, sum

[1] 35.65342
[1] 35.65342
[1] 1.887168
  female     male
35.71028 35.63584
  shuffle(sex)     mean
1       female 35.94393
2         male 35.56358
  shuffle.sex.     mean
1       female 35.28972
2         male  35.7659
female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
      39.16667       37.95035       34.85366       34.36036       34.66667
   male.heroin
      33.05319
female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
      39.16667       37.95035       34.85366       34.36036       34.66667
   male.heroin
      33.05319
[1] 5.936703
[1] 16151
[1] 7.710266
[1] 7.710266
female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
      7.980333       7.575644       6.195002       6.889772       8.035839
   male.heroin
      7.973568
[1] 59.4482
[1] 59.4482
female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
      63.68571       57.39037       38.37805       47.46896       64.57471
   male.heroin
      63.57779
   B    G
0.75 0.60
   B    G
0.75 0.60
  sex min    Q1 median    Q3 max     mean        sd  n missing
1   B 8.4 8.875   9.15 9.625 9.8 9.190000 0.4517801 20       0
2   G 7.9 8.550   8.80 9.150 9.5 8.784211 0.4935846 19       0
[1] 0.6410961
[1] 0.4304453
          is.na(pcs)
is.na(mcs) TRUE FALSE
     TRUE     2     0
     FALSE    0   468
[1] NA
[1] 13.46433
[1] 13.46433
```

mosaic documentation built on Jan. 18, 2021, 5:09 p.m.