aggregate_data    R Documentation
Summarizes non-categorical variables in a dataframe by grouping them based on specified categorical variables and returns the aggregated result along with the tidyverse code used to generate it.
aggregate_data(
  data,
  group_vars,
  summaries,
  vars = NULL,
  names = NULL,
  quantiles = c(0.25, 0.75)
)
aggregate_dt(
  data,
  dt,
  dt_comp,
  group_vars = NULL,
  summaries,
  vars = NULL,
  names = NULL,
  quantiles = c(0.25, 0.75)
)
data: A dataframe or survey design object to be aggregated.
group_vars: A character vector specifying the categorical variables in data by which to group.
summaries: An unnamed character vector or named list of summary functions to calculate for each group. If unnamed, the elements should be names of summary functions (e.g., "mean", "sd", "iqr"), as in the example, and each function is applied to every summarized variable. If named, the names should correspond to the summary functions to be applied to each variable.
vars: (Optional) A character vector specifying the names of variables in the dataset for which summary statistics are calculated. This argument is ignored if …
names: (Optional) A character vector or named list providing name templates for the newly created variables. See details for more information.
quantiles: (Optional) A numeric vector specifying the desired quantiles (e.g., c(0.25, 0.5, 0.75)). See details for more information.
dt: A character string giving the name of the date-time variable in the dataset.
dt_comp: A character string specifying the component of the date-time to use for grouping.
The aggregate_data() function accepts any R function that returns a single-value summary (e.g., mean, var, sd, sum, IQR). By default, new variables are named {var}_{fun}, where {var} is the variable name and {fun} is the summary function used; for example, the mean of Sepal.Length is stored as Sepal.Length_mean. Custom names can be supplied through the names argument, either as a vector of the same length as vars or as a named list whose names correspond to summary functions (e.g., "mean" or "sd").
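As a rough illustration of the {var}_{fun} naming rule, the following base R sketch computes several single-value summaries per group and names each column by pasting the variable and function names together. The helper summarise_by() is hypothetical and is not how aggregate_data() is implemented; it only mirrors the naming convention described above.

```r
# Hypothetical sketch: group-wise summaries named "{var}_{fun}" in base R.
# summarise_by() is illustrative only; it is not part of the package.
summarise_by <- function(data, group_var, vars, funs) {
  # One data frame per level of the grouping variable
  groups <- split(data[vars], data[[group_var]])
  rows <- lapply(names(groups), function(g) {
    sub <- groups[[g]]
    out <- list()
    for (v in vars) {
      for (f in funs) {
        # match.fun() looks up the summary function by name, e.g. "mean"
        out[[paste0(v, "_", f)]] <- match.fun(f)(sub[[v]])
      }
    }
    # One row per group: the group label followed by its summaries
    data.frame(setNames(list(g), group_var), out, check.names = FALSE)
  })
  do.call(rbind, rows)
}

res <- summarise_by(iris, "Species",
                    vars = c("Sepal.Length", "Petal.Width"),
                    funs = c("mean", "sd", "IQR"))
print(res)
```

Here base R's IQR is used directly; the package example spells the summary as "iqr", which this sketch does not attempt to emulate.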
The special summary "missing" can be included to count the number of missing values in each variable within each group. Its default name is {var}_missing.
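The "missing" summary can be pictured as one more single-value function that counts NA values within each group. This base R sketch pairs a hypothetical helper, count_missing(), with stats::aggregate() and renames the result to follow the {var}_missing default; the small data frame is invented for illustration.

```r
# Hypothetical sketch of a "missing" summary: count NA values per group
# and name the result "{var}_missing". Base R only; not the package's code.
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  x     = c(1, NA, 3, NA, NA)
)

# Single-value summary: number of missing entries in a vector
count_missing <- function(v) sum(is.na(v))

# aggregate.data.frame() passes NA values in x through to FUN unchanged
missing_tab <- aggregate(df["x"], by = df["group"], FUN = count_missing)
names(missing_tab)[names(missing_tab) == "x"] <- "x_missing"
print(missing_tab)
```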
If quantiles are requested, the function calculates the specified quantiles (e.g., the 25th, 50th, and 75th percentiles) and creates a new variable for each one. To customize the names of these variables, use {p} as a placeholder in the names argument, where {p} represents the quantile value. For example, names = "Q{p}_{var}" creates variables such as "Q0.25_Sepal.Length" for the 25th percentile.
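The {p} placeholder behaves like a simple template substitution. In the sketch below, a hypothetical helper, fill_template(), replaces {var} and {p} in a name template and applies the resulting names to quantiles computed with stats::quantile(). The example is ungrouped for brevity and assumes {p} is filled with the quantile value as printed by format(), which matches the "Q0.25_Sepal.Length" example above.

```r
# Hypothetical sketch: fill a name template such as "Q{p}_{var}" for each
# requested quantile. fill_template() is illustrative, not the package's API.
fill_template <- function(template, var, p) {
  # fixed = TRUE treats the braces literally instead of as regex syntax
  out <- gsub("{var}", var, template, fixed = TRUE)
  gsub("{p}", format(p), out, fixed = TRUE)
}

quantiles <- c(0.25, 0.75)
q <- quantile(iris$Sepal.Length, probs = quantiles, names = FALSE)
names(q) <- vapply(quantiles,
                   function(p) fill_template("Q{p}_{var}", "Sepal.Length", p),
                   character(1))
print(q)
```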
An aggregated dataframe containing the summary statistics for each group. The tidyverse code used for the aggregation is returned with the result and can be printed with code(), as shown in the example.
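One way a function can hand back both a result and the code that produced it is to store the code as an attribute on the returned object. The sketch below assumes an attribute named "code" and hypothetical helpers with_code() and get_code(); it is not a description of how code() in the package retrieves the code.

```r
# Hypothetical sketch: return a result that carries the code used to build it.
# The attribute name "code", with_code(), and get_code() are assumptions.
with_code <- function(result, code) {
  attr(result, "code") <- code
  result
}
get_code <- function(x) attr(x, "code")

agg <- with_code(
  aggregate(Sepal.Length ~ Species, data = iris, FUN = mean),
  "aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)"
)
cat(get_code(agg), "\n")  # the stored code string
head(agg)                 # the aggregated data itself
```

Keeping the code in an attribute leaves the data frame usable as ordinary data while still letting a caller recover how it was produced.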
aggregate_dt(): Aggregate data by dates and times.
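Grouping by a date-time component amounts to deriving a new grouping column from the date-time variable and then summarizing as usual. This base R sketch groups by year and month using format(); the data, the "%Y-%m" format, and the sum summary are assumptions for illustration and do not reflect the values dt_comp actually accepts.

```r
# Hypothetical sketch of grouping by a date-time component (year-month).
# The "%Y-%m" format is an assumption, not a documented dt_comp value.
sales <- data.frame(
  when   = as.POSIXct(c("2024-01-05 10:00", "2024-01-20 15:30",
                        "2024-02-03 09:15", "2024-02-28 18:45",
                        "2024-02-29 08:00"), tz = "UTC"),
  amount = c(10, 20, 5, 15, 40)
)

# Derive the grouping component from the date-time column
sales$month <- format(sales$when, "%Y-%m")

# Summarize within each month, naming the result "{var}_{fun}"
monthly <- aggregate(amount ~ month, data = sales, FUN = sum)
names(monthly)[names(monthly) == "amount"] <- "amount_sum"
print(monthly)
```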
Tom Elliott, Owen Jin, Zhaoming Su
Zhaoming Su
code
aggregate_data
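library(iNZightTools)

aggregated <- aggregate_data(iris,
  group_vars = c("Species"),
  summaries = c("mean", "sd", "iqr")
)
# Print the tidyverse code used to create the result
code(aggregated)
# Show the first rows of the aggregated data
head(aggregated)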