Summarize (R Documentation)
Summary statistics for a single numeric variable, possibly separated by the levels of one or more factor variables. This function is very similar to summary() applied to a numeric variable.
Summarize(object, ...)
## Default S3 method:
Summarize(
object,
digits = getOption("digits"),
na.rm = TRUE,
exclude = NULL,
nvalid = c("different", "always", "never"),
percZero = c("different", "always", "never"),
...
)
## S3 method for class 'formula'
Summarize(
object,
data = NULL,
digits = getOption("digits"),
na.rm = TRUE,
exclude = NULL,
nvalid = c("different", "always", "never"),
percZero = c("different", "always", "never"),
...
)
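The usage above defines Summarize as an S3 generic with a default method (for a numeric vector) and a formula method. The sketch below is not FSA's code. It uses a hypothetical generic named summ to show how R chooses a method from the class of object: a plain vector goes to the default method, and anything of class "formula" (such as ~y or y~g) goes to the formula method.

```r
# Hypothetical S3 generic (not FSA's code) illustrating method dispatch.
summ <- function(object, ...) UseMethod("summ")

# Used when object has no more specific method, e.g. a numeric vector.
summ.default <- function(object, ...) "default method"

# Used when object is a formula such as ~y or y~g.
summ.formula <- function(object, data = NULL, ...) "formula method"

summ(c(1.5, 2, 3))                     # numeric vector: default method
summ(~y, data = data.frame(y = 1:3))   # one-sided formula: formula method
```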
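Argument | Description |
object | A vector of numeric data (default method) or a formula (formula method). |
... | Not implemented. |
digits | A single numeric that indicates the number of decimals to which the numeric summaries are rounded. |
na.rm | A logical that indicates whether numeric missing values (NA) are removed (TRUE, the default) before the summaries are computed. |
exclude | A string that contains the level that should be excluded from a factor variable. |
nvalid | A string that indicates how the “validn” result is handled. If "always", “validn” is always shown; if "never", it is never shown; if "different" (the default), it is shown only when it differs from the sample size (i.e., when there are missing values). |
percZero | A string that indicates how the “percZero” result is handled. If "always", “percZero” is always shown; if "never", it is never shown; if "different" (the default), it is shown only when it is greater than zero (i.e., when the data contain zeros). |
data | A data.frame that contains the variables in the formula given in object. |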
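The three choices for nvalid= and percZero= amount to a small decision rule. The helper show_stat() below is hypothetical (it is not an FSA function) and only encodes the rule as described in the table: "always" and "never" ignore the data, while "different" defers to whether the statistic is informative for the data at hand.

```r
# Hypothetical helper (not part of FSA): should a statistic be reported?
show_stat <- function(rule = c("different", "always", "never"), differs) {
  rule <- match.arg(rule)        # defaults to the first choice, "different"
  switch(rule,
         always    = TRUE,       # report regardless of the data
         never     = FALSE,      # never report
         different = differs)    # report only when it carries information
}

x <- c(0, 0, NA, 1, 2)
show_stat("different", differs = sum(!is.na(x)) != length(x))  # validn differs from n
show_stat("different", differs = any(x == 0, na.rm = TRUE))    # the data contain zeros
show_stat("never", differs = TRUE)
```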
This function is primarily used with formulas of the following types (where quant and factor generically represent quantitative/numeric and factor variables, respectively):
Formula | Description of Summary |
~quant | Numerical summaries (see below) of quant. |
quant~factor | Summaries of quant separated by the levels in factor. |
quant~factor1*factor2 | Summaries of quant separated by the combined levels in factor1 and factor2. |
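The combined levels of factor1 and factor2 in the last row of the table are the level combinations (cells) of the two factors. The base-R sketch below is unrelated to FSA's internals and uses made-up toy vectors. It makes these cells explicit with interaction() and then computes one mean per cell with split() and sapply().

```r
y  <- c(1, 2, 3, 4, 5, 6, 7, 8)
f1 <- factor(c("A", "A", "A", "A", "B", "B", "B", "B"))
f2 <- factor(c("m", "f", "m", "f", "m", "f", "m", "f"))

# One level per combination of f1 and f2: A:f, B:f, A:m, B:m
cells <- interaction(f1, f2, sep = ":")

# Mean of y within each combined level
sapply(split(y, cells), mean)
```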
Numerical summaries include all results from summary() (min, Q1, median, mean, Q3, and max) plus the sample size, the valid sample size (sample size minus the number of NAs), and the standard deviation (i.e., from sd()). Whether the valid sample size and the percentage of zeros are reported is controlled by nvalid= and percZero=. NA values are removed from the calculations with na.rm=TRUE (the default). The number of digits in the returned results is controlled with digits=.
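The statistics listed above can be reproduced in base R. The function summarize_sketch() below is hypothetical and is not FSA's implementation. Its quartiles come from quantile() with its default type 7, which is also what summary() uses for numeric vectors. Computing percZero as a percentage of the non-missing values is an assumption made for illustration.

```r
# Hypothetical helper, NOT FSA's implementation: a base-R sketch of the
# numerical summaries described above for a single numeric vector.
summarize_sketch <- function(x, digits = getOption("digits")) {
  n      <- length(x)            # sample size, including NAs
  nvalid <- sum(!is.na(x))       # sample size minus number of NAs
  xv     <- x[!is.na(x)]         # NAs removed, as with na.rm=TRUE
  # Quartiles and median, type 7 (the default of quantile() and summary())
  q <- quantile(xv, probs = c(0.25, 0.5, 0.75), names = FALSE)
  res <- c(n = n, nvalid = nvalid,
           mean = mean(xv), sd = sd(xv),
           min = min(xv), Q1 = q[1], median = q[2], Q3 = q[3], max = max(xv),
           # Assumed definition: percentage of non-missing values equal to zero
           percZero = 100 * sum(xv == 0) / nvalid)
  round(res, digits)
}

summarize_sketch(c(0, 0, NA, 1, 2, 3, 4), digits = 3)
```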
A named vector or data frame (when a quantitative variable is separated by one or two factor variables) of summary statistics for numeric data.
Students often need to examine basic statistics of a quantitative variable separated by the levels of a categorical variable. These results may be obtained with tapply(), by(), or aggregate() (or with functions in other packages), but these functions are not obvious for novice students to use, or they return results in a format that such students find hard to interpret. Thus, the formula method of Summarize allows novice students to use a common notation (i.e., a formula) to easily compute summary statistics for a quantitative variable separated by the levels of a factor.
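For comparison, the base-R routes mentioned above look like the following when computing a single statistic per group. The toy data frame is made up for illustration. Note that tapply() returns a named array while aggregate() returns a data frame, so the result shape depends on the function chosen.

```r
df <- data.frame(y = c(1, 2, 3, 4, 5, 6),
                 g = factor(c("A", "A", "B", "B", "C", "C")))

# Named array: one mean per level of g
tapply(df$y, df$g, mean)

# Data frame: one row per level of g, using formula notation
aggregate(y ~ g, data = df, FUN = mean)
```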
Derek H. Ogle, DerekOgle51@gmail.com
See summary() for related one-dimensional functionality. See tapply(), summaryBy() in doBy, describe() in psych, describe() in prettyR, and basicStats() in fBasics for similar “by” functionality.
library(FSA)  # provides Summarize()
## Create a data.frame of example data
n <- 102
d <- data.frame(y=c(0,0,NA,NA,NA,runif(n-5)),
w=sample(7:9,n,replace=TRUE),
v=sample(0:2,n,replace=TRUE),
g1=factor(sample(c("A","B","C",NA),n,replace=TRUE)),
g2=factor(sample(c("male","female","UNKNOWN"),n,replace=TRUE)),
g3=sample(c("a","b","c","d"),n,replace=TRUE),
stringsAsFactors=FALSE)
# typical output of summary() for a numeric variable
summary(d$y)
# Summarize() on the vector, with a one-sided formula, and with y~1
Summarize(d$y,digits=3)
Summarize(~y,data=d,digits=3)
Summarize(y~1,data=d,digits=3)
# note that nvalid is not shown if there are no NAs and
# percZero is not shown if there are no zeros
Summarize(~w,data=d,digits=3)
Summarize(~v,data=d,digits=3)
# note that the nvalid and percZero results can be forced to be shown
Summarize(~w,data=d,digits=3,nvalid="always",percZero="always")
## Numeric vector by levels of a factor variable
Summarize(y~g1,data=d,digits=3)
Summarize(y~g2,data=d,digits=3)
Summarize(y~g2,data=d,digits=3,exclude="UNKNOWN")
## Numeric vector by levels of two factor variables
Summarize(y~g1+g2,data=d,digits=3)
Summarize(y~g1+g2,data=d,digits=3,exclude="UNKNOWN")
## What happens if the RHS of the formula is not a factor
Summarize(y~w,data=d,digits=3)
## Summarizing multiple variables in a data.frame (the selected columns must all be numeric)
lapply(as.list(d[,1:3]),Summarize,digits=4)
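The exclude="UNKNOWN" examples above drop one level of g2 before summarizing. In base R, a comparable effect can be had by removing those rows and then the now-unused level with droplevels() before grouping. This is an assumed analogue, not FSA's code, and the small data frame is made up for illustration.

```r
d2 <- data.frame(y  = c(1, 2, 3, 4, 5, 6),
                 g2 = factor(c("male", "female", "UNKNOWN",
                               "male", "female", "UNKNOWN")))

# Remove the UNKNOWN rows, then drop the unused level from the factor
kept <- droplevels(d2[d2$g2 != "UNKNOWN", ])

# Group means over the remaining levels only
tapply(kept$y, kept$g2, mean)
```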