# aggregating: Aggregating functions

In mosaic: Project MOSAIC Statistics and Mathematics Teaching Utilities

## Description

The `mosaic` package makes several summary statistic functions (like `mean` and `sd`) formula-aware.
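As a minimal sketch of what "formula-aware" means here, the snippet below computes the same mean through the vector interface and through the formula interface. It assumes that `mosaic` is installed and that attaching it also makes the `HELPrct` data set (used in the Examples on this page) available.

```r
library(mosaic)  # assumed to also attach mosaicData, which supplies HELPrct

# Vector interface: behaves like the base R function
v <- mean(HELPrct$age)

# Formula interface: the one-sided formula names the variable,
# and `data` says where to find it
f <- mean( ~ age, data = HELPrct)

# Both routes should report the same value
all.equal(as.numeric(v), as.numeric(f))
```

The formula form keeps the data frame and the variable name separate, which is the same calling pattern used by modelling and plotting functions that take `formula` and `data` arguments.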

## Usage

```r
mean_(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
mean(x, ...)
median(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
range(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
sd(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
max(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
min(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
sum(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
IQR(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
fivenum(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
iqr(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
prod(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
favstats(x, ..., data = NULL, groups = NULL, na.rm = TRUE)
quantile(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))
var(x, y = NULL, na.rm = getOption("na.rm", FALSE), ..., data = NULL)
cor(x, y = NULL, ..., data = NULL)
cov(x, y = NULL, ..., data = NULL)
```

## Arguments

| Argument | Description |
|----------|-------------|
| `x` | a numeric vector or a formula |
| `...` | additional arguments |
| `data` | a data frame in which to evaluate formulas (or bare names). Note that the default is `data = parent.frame()`. This makes it convenient to use this function interactively by treating the working environment as if it were a data frame. But this may not be appropriate for programming uses. When programming, it is best to use an explicit `data` argument, ideally supplying a data frame that contains the variables mentioned. |
| `groups` | a grouping variable, typically a name of a variable in `data` |
| `na.rm` | a logical indicating whether `NA`s should be removed before computing |
| `y` | a numeric vector or a formula |
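The advice about passing `data` explicitly, and the `na.rm = getOption("na.rm", FALSE)` default shown in Usage, can be sketched as follows. The data frame `df` and the helper `sd_of_x()` are made up for illustration and are not part of `mosaic`; the sketch also assumes the `na.rm` option has not already been set in the session.

```r
library(mosaic)

# Small illustrative data frame (not from the package) with one missing value
df <- data.frame(x = c(1, 2, NA, 4))

# Hypothetical helper: passes `data` explicitly rather than relying on
# whatever happens to be in the calling environment
sd_of_x <- function(d) sd( ~ x, data = d)

# With na.rm left at FALSE, the missing value propagates to the result
sd_of_x(df)

# Setting the "na.rm" option changes the default read by getOption("na.rm", FALSE)
old <- options(na.rm = TRUE)
sd_of_x(df)    # the missing value is now dropped before computing
options(old)   # restore the previous setting
```

Changing the option affects every later call in the session that relies on the default, so passing `na.rm = TRUE` per call is often the more predictable choice inside functions.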

## Details

Many of these functions mask core R functions to provide an additional formula interface. Old behavior should be unchanged. But if the first argument is a formula, that formula, together with `data`, is used to generate the numeric vector(s) to be summarized. Formulas of the shape `x ~ a` or `~ x | a` can be used to produce summaries of `x` for each subset defined by `a`. Two-way aggregation can be achieved using formulas of the form `x ~ a + b` or `x ~ a | b`. See the examples.
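The formula shapes described above can be tried side by side. This is a small sketch that assumes the `HELPrct` data set from the Examples is available once `mosaic` is attached; the cross-check against `tapply()` is added here for illustration only and is not part of the package documentation.

```r
library(mosaic)

# One grouping variable: `x ~ a` and `~ x | a` describe the same summary
m1 <- mean(age ~ sex, data = HELPrct)
m2 <- mean( ~ age | sex, data = HELPrct)

# Two grouping variables: one value per sex/substance combination
m3 <- mean(age ~ sex + substance, data = HELPrct)
m4 <- mean(age ~ sex | substance, data = HELPrct)

# Cross-check the one-way result against base R's tapply()
base_means <- tapply(HELPrct$age, HELPrct$sex, base::mean)
all.equal(as.numeric(m1), as.numeric(base_means))
```

The grouped results are named vectors by default, so individual groups can be picked out by name.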

## Note

Earlier versions of these functions supported a "bare name + data frame" interface. This functionality has been removed since it was (a) ambiguous in some cases, (b) unnecessary, and (c) difficult to maintain.

## Examples

```r
library(mosaic)  # attaches the formula-aware versions and the example data sets

mean(HELPrct$age)
mean( ~ age, data = HELPrct)
mean( ~ drugrisk, na.rm = TRUE, data = HELPrct)
mean(age ~ shuffle(sex), data = HELPrct)
mean(age ~ shuffle(sex), data = HELPrct, .format = "table")
# wrap in data.frame() to auto-convert awkward variable names
data.frame(mean(age ~ shuffle(sex), data = HELPrct, .format = "table"))
mean(age ~ sex + substance, data = HELPrct)
mean( ~ age | sex + substance, data = HELPrct)
mean( ~ sqrt(age), data = HELPrct)
sum( ~ age, data = HELPrct)
sd(HELPrct$age)
sd( ~ age, data = HELPrct)
sd(age ~ sex + substance, data = HELPrct)
var(HELPrct$age)
var( ~ age, data = HELPrct)
var(age ~ sex + substance, data = HELPrct)
IQR(width ~ sex, data = KidsFeet)
iqr(width ~ sex, data = KidsFeet)
favstats(width ~ sex, data = KidsFeet)
cor(length ~ width, data = KidsFeet)
cov(length ~ width, data = KidsFeet)
tally(is.na(mcs) ~ is.na(pcs), data = HELPmiss)
cov(mcs ~ pcs, data = HELPmiss)                    # NA because of missing data
cov(mcs ~ pcs, data = HELPmiss, use = "complete")  # ignore missing data
# alternative approach using filter explicitly
cov(mcs ~ pcs, data = HELPmiss %>% filter(!is.na(mcs) & !is.na(pcs)))
```

### Example output

```
Loading required package: dplyr

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

New to ggformula?  Try the tutorials:
	learnr::run_tutorial("introduction", package = "ggformula")
	learnr::run_tutorial("refining", package = "ggformula")

The 'mosaic' package masks several functions from core packages in order to add
additional features.  The original behavior of these functions should not be affected by this.

Attaching package: 'mosaic'

The following object is masked from 'package:Matrix':

    mean

The following objects are masked from 'package:dplyr':

    count, do, tally

The following objects are masked from 'package:stats':

    IQR, binom.test, cor, cor.test, cov, fivenum, median, prop.test,
    quantile, sd, t.test, var

The following objects are masked from 'package:base':

    max, mean, min, prod, range, sample, sum

[1] 35.65342
[1] 35.65342
[1] 1.887168
  female     male
35.71028 35.63584
  shuffle(sex)     mean
1       female 35.94393
2         male 35.56358
  shuffle.sex.     mean
1       female 35.28972
2         male  35.7659
female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
      39.16667       37.95035       34.85366       34.36036       34.66667
   male.heroin
      33.05319
female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
      39.16667       37.95035       34.85366       34.36036       34.66667
   male.heroin
      33.05319
[1] 5.936703
[1] 16151
[1] 7.710266
[1] 7.710266
female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
      7.980333       7.575644       6.195002       6.889772       8.035839
   male.heroin
      7.973568
[1] 59.4482
[1] 59.4482
female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
      63.68571       57.39037       38.37805       47.46896       64.57471
   male.heroin
      63.57779
   B    G
0.75 0.60
   B    G
0.75 0.60
  sex min    Q1 median    Q3 max     mean        sd  n missing
1   B 8.4 8.875   9.15 9.625 9.8 9.190000 0.4517801 20       0
2   G 7.9 8.550   8.80 9.150 9.5 8.784211 0.4935846 19       0
[1] 0.6410961
[1] 0.4304453
          is.na(pcs)
is.na(mcs) TRUE FALSE
     TRUE     2     0
     FALSE    0   468
[1] NA
[1] 13.46433
[1] 13.46433
```

mosaic documentation built on Jan. 18, 2021, 5:09 p.m.