Summary statistics for a single numeric variable, possibly separated by the levels of one or more factor variables. This function is very similar to `summary` for a numeric variable.

```r
Summarize(object, ...)

## Default S3 method:
Summarize(
  object,
  digits = getOption("digits"),
  na.rm = TRUE,
  exclude = NULL,
  nvalid = c("different", "always", "never"),
  percZero = c("different", "always", "never"),
  ...
)

## S3 method for class 'formula'
Summarize(
  object,
  data = NULL,
  digits = getOption("digits"),
  na.rm = TRUE,
  exclude = NULL,
  nvalid = c("different", "always", "never"),
  percZero = c("different", "always", "never"),
  ...
)
```

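| Argument | Description |
|---|---|
| `object` | A vector of numeric data (default method) or a formula (formula method). |
| `...` | Not implemented. |
| `digits` | A single numeric that indicates the number of decimals to which the numeric summaries are rounded. |
| `na.rm` | A logical that indicates whether numeric missing values (`NA`) should be removed (`TRUE`, the DEFAULT) before the summaries are computed. |
| `exclude` | A string that contains the level that should be excluded from a factor variable. |
| `nvalid` | A string that indicates how the “nvalid” result will be handled. If `"always"`, it is always shown; if `"never"`, it is never shown; if `"different"` (the DEFAULT), it is shown only when it differs from the sample size “n”. |
| `percZero` | A string that indicates how the “percZero” result will be handled. If `"always"`, it is always shown; if `"never"`, it is never shown; if `"different"` (the DEFAULT), it is shown only when it is greater than zero. |
| `data` | A data.frame that contains the variables in `object`. |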
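The display rules for `nvalid` and `percZero` can be illustrated with a small sketch. `apply_display_rules()` is a hypothetical helper written only for this illustration, not a function in the package; it assumes a named vector of results that contains `n`, `nvalid`, and `percZero` elements.

```r
# Hypothetical helper (not part of the package): applies the "different",
# "always", and "never" display rules to a named vector of results.
apply_display_rules <- function(res,
                                nvalid = c("different", "always", "never"),
                                percZero = c("different", "always", "never")) {
  nvalid <- match.arg(nvalid)
  percZero <- match.arg(percZero)
  # "different": hide nvalid when it equals n (i.e., there are no NAs)
  drop_nvalid <- nvalid == "never" ||
    (nvalid == "different" && res[["nvalid"]] == res[["n"]])
  # "different": hide percZero when no values are zero
  drop_percZero <- percZero == "never" ||
    (percZero == "different" && res[["percZero"]] == 0)
  keep <- !(names(res) == "nvalid" & drop_nvalid) &
          !(names(res) == "percZero" & drop_percZero)
  res[keep]
}

res <- c(n = 5, nvalid = 5, mean = 2, percZero = 0)
apply_display_rules(res)                                          # n and mean only
apply_display_rules(res, nvalid = "always", percZero = "always")  # all four kept
```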

This function is primarily used with formulas of the following types (where `quant` and `factor` generically represent quantitative/numeric and factor variables, respectively):

| Formula | Description of Summary |
|---|---|
| `~quant` | Numerical summaries (see below) of `quant`. |
| `quant~factor` | Summaries of `quant` separated by levels in `factor`. |
| `quant~factor1*factor2` | Summaries of `quant` separated by the combined levels in `factor1` and `factor2`. |
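The idea of summarizing `quant` within the combined levels of `factor1` and `factor2` can be sketched in base R with `split()` on a list of factors. This only illustrates the grouping; it is not how the package implements the formula method, and the small data frame is made up for the example.

```r
# Made-up data: one numeric variable and two factors
d <- data.frame(y  = c(1, 2, 3, 4, 5, 6, NA, 8),
                g1 = factor(c("A", "A", "B", "B", "A", "A", "B", "B")),
                g2 = factor(c("m", "f", "m", "f", "m", "f", "m", "f")))

# split() with a list of factors groups by their combined levels ("A.f", ...)
groups <- split(d$y, list(d$g1, d$g2), drop = TRUE)

# One row per combination: sample size, valid sample size, and mean
stats <- t(sapply(groups, function(v) c(n      = length(v),
                                        nvalid = sum(!is.na(v)),
                                        mean   = mean(v, na.rm = TRUE))))
stats
```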

Numerical summaries include all results from `summary` (min, Q1, mean, median, Q3, and max), plus the sample size, the valid sample size (sample size minus the number of `NA`s), the standard deviation (i.e., `sd`), and, depending on `percZero=`, the percentage of values that are zero. `NA` values are removed from the calculations when `na.rm=TRUE` (the DEFAULT). The number of digits in the returned results is controlled with `digits=`.
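To make the list of statistics concrete, the following base-R sketch computes them for a single vector. `summarize_sketch()` is a hypothetical helper, not the package's implementation; it assumes quartiles from `quantile()`'s default method (the one `summary` uses) and computes `percZero` relative to the valid sample size, so details may differ from the package.

```r
# Hypothetical helper (not part of the package): a base-R sketch of the
# summary statistics described above for one numeric vector.
summarize_sketch <- function(x, digits = getOption("digits")) {
  xv <- x[!is.na(x)]                               # valid values (NAs removed)
  qs <- quantile(xv, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  res <- c(n = length(x),                          # sample size, including NAs
           nvalid = length(xv),                    # sample size minus number of NAs
           mean = mean(xv), sd = sd(xv),
           min = qs[1], Q1 = qs[2], median = qs[3], Q3 = qs[4], max = qs[5],
           percZero = 100 * sum(xv == 0) / length(xv))  # % of valid values equal to 0
  round(res, digits)
}

summarize_sketch(c(0, 0, NA, 1, 2, 3, 4), digits = 3)
# n = 7, nvalid = 6, mean = 1.667, sd = 1.633, min = 0, Q1 = 0.25,
# median = 1.5, Q3 = 2.75, max = 4, percZero = 33.333
```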

A named vector of summary statistics for the numeric data, or a data frame of them when a quantitative variable is separated by one or two factor variables.

Students often need to examine basic statistics of a quantitative variable separately for different levels of a categorical variable. These results may be obtained with `tapply`, `by`, or `aggregate` (or with functions in other packages), but how to use these functions is not obvious to newbie students, or the functions return results in a format that newbie students find hard to interpret. Thus, the formula method of `Summarize` allows newbie students to use a common notation (i.e., a formula) to easily compute summary statistics for a quantitative variable separated by the levels of a factor.
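For comparison, the following sketch shows the base-R route with `tapply` and `aggregate` on a small made-up data frame. It only illustrates the output formats those functions return.

```r
# Made-up data: one numeric variable and one factor
d <- data.frame(y = c(1, 2, 3, 4, 5, NA),
                g = factor(c("A", "A", "A", "B", "B", "B")))

# tapply(): one statistic per call, returned as a named 1-d array
means <- tapply(d$y, d$g, mean, na.rm = TRUE)

# aggregate(): formula interface, but several statistics come back as a
# matrix column nested inside the data frame
agg <- aggregate(y ~ g, data = d,
                 FUN = function(v) c(n = length(v), mean = mean(v)))
str(agg)
```

Note that the formula interface of `aggregate` drops rows with missing values by default (`na.action = na.omit`), so `n` above counts only valid values, unlike the sample size reported by `Summarize`.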

Derek H. Ogle, derek@derekogle.com

See `summary` for related one-dimensional functionality. See `tapply`, `summaryBy` in doBy, `describe` in psych, `describe` in prettyR, and `basicStats` in fBasics for similar “by” functionality.

```r
library(FSA)   # package that provides Summarize()
set.seed(1)    # make the random example data reproducible

## Create a data.frame of "data"
n <- 102
d <- data.frame(y=c(0,0,NA,NA,NA,runif(n-5)),
                w=sample(7:9,n,replace=TRUE),
                v=sample(0:2,n,replace=TRUE),
                g1=factor(sample(c("A","B","C",NA),n,replace=TRUE)),
                g2=factor(sample(c("male","female","UNKNOWN"),n,replace=TRUE)),
                g3=sample(c("a","b","c","d"),n,replace=TRUE),
                stringsAsFactors=FALSE)

# typical output of summary() for a numeric variable
summary(d$y)

# this function
Summarize(d$y,digits=3)
Summarize(~y,data=d,digits=3)
Summarize(y~1,data=d,digits=3)

# note that nvalid is not shown if there are no NAs and
# percZero is not shown if there are no zeros
Summarize(~w,data=d,digits=3)
Summarize(~v,data=d,digits=3)

# note that the nvalid and percZero results can be forced to be shown
Summarize(~w,data=d,digits=3,nvalid="always",percZero="always")

## Numeric vector by levels of a factor variable
Summarize(y~g1,data=d,digits=3)
Summarize(y~g2,data=d,digits=3)
Summarize(y~g2,data=d,digits=3,exclude="UNKNOWN")

## Numeric vector by levels of two factor variables
Summarize(y~g1+g2,data=d,digits=3)
Summarize(y~g1+g2,data=d,digits=3,exclude="UNKNOWN")

## What happens if the RHS of the formula is not a factor
Summarize(y~w,data=d,digits=3)

## Summarizing multiple variables in a data.frame (must reduce to numerics)
lapply(as.list(d[,1:3]),Summarize,digits=4)
```

