aggregate_df: Compute summary statistics within groups in a nested data...


View source: R/padis_general_functions.R

Description

This function computes summary statistics within groups, persons, or other clusters. The data must be in long format. At least one variable must be a grouping variable, e.g. id. All other variables in the data frame should be numeric.
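To make the long-format requirement concrete, here is a minimal sketch of such a data frame. The object name long_data and the columns var_1 and var_2 are hypothetical and chosen only for illustration; they are not part of the package.

```r
# Hypothetical long-format data: one row per observation,
# several rows per person (id), all other columns numeric
long_data <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  var_1 = c(2, 4, NA, 1, 3, 5),
  var_2 = c(1, 3, 5, 6, 4, 2)
)

# Number of observations nested in each id
obs_per_id <- table(long_data$id)
obs_per_id
```

A data frame of this shape is the kind of input the description refers to: the id column identifies the cluster, and the remaining columns hold the values to be summarised.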

Usage

aggregate_df(data, id, remove_var = NULL, prefix_out = NULL,
  intake_var = NULL, out_values = c("mean", "sd", "count", "sum", "missing",
  "cor", "min", "max", "true"))

Arguments

data

The data frame in long-format that contains the variables to be analysed

id

Character string. The id or grouping variable, in which other observations are nested

remove_var

Character string. Variables that should be removed before the computation proceeds (otherwise, the function assumes that all other variables should be used for the computation)

prefix_out

The prefix for the variables that are returned. Default is NULL. If NULL, the input from id is used as prefix.

intake_var

Character string. Names of the specific variables for which the analyses should be run, i.e. if only a subset of variables should be used.

out_values

The values to be returned. Can be any of the following:

"mean"

Computes the mean within each group/id. Missing values are removed before computation.

"sd"

Computes the standard deviation within each group/id. Missing values are removed before computation.

"count"

Computes the number of cases within each group/id, including missings.

"sum"

Computes the sum of values within each group/id. Missing values are removed before computation.

"missing"

Counts the number of missing values in each group/id.

"cor"

Computes the within-group correlation between each pair of variables within each group/id. The Pearson correlation (cor) with use = "pairwise.complete.obs" is used.

"max"

Returns the maximum in each group/id, ignoring NAs.

"min"

Returns the minimum in each group/id, ignoring NAs.

"true"

Per group/id, returns 0 if the group/id contains any value other than 0; otherwise returns 1. Missing values are ignored.
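The statistics listed above can be approximated in base R for a single variable. This is only a sketch of what each option is described as computing, not the package's implementation; long_data, by_id, and stats are hypothetical names, and the handling of "true" follows the wording above literally.

```r
# Hypothetical long-format data with one missing value in group 1
long_data <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  var_1 = c(2, 4, NA, 1, 3, 5)
)

# Split the variable by group, then apply each statistic per group
by_id <- split(long_data$var_1, long_data$id)

stats <- data.frame(
  id      = names(by_id),
  mean    = sapply(by_id, mean, na.rm = TRUE),
  sd      = sapply(by_id, sd, na.rm = TRUE),
  count   = sapply(by_id, length),                      # includes missings
  sum     = sapply(by_id, sum, na.rm = TRUE),
  missing = sapply(by_id, function(x) sum(is.na(x))),
  min     = sapply(by_id, min, na.rm = TRUE),
  max     = sapply(by_id, max, na.rm = TRUE),
  # 1 only if every non-missing value in the group is 0, else 0
  true    = sapply(by_id, function(x) as.integer(all(x[!is.na(x)] == 0)))
)
stats
```

Note that count uses length() and therefore includes missing values, whereas mean, sd, sum, min, and max drop them via na.rm = TRUE.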

Value

Returns a data frame in wide format (i.e. one row per group/id). Variable names are the original variable names with a corresponding prefix and an underscore (e.g. mean_ for the mean). For the correlations, the names of the two correlated variables are pasted together and the prefix cor is added, e.g. cor.var_1.var_2 for the correlation between var_1 and var_2.
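As an illustration of the correlation output described here, the following base R sketch computes a within-group Pearson correlation and stores it in a wide data frame under the name cor.var_1.var_2. It is an assumption-laden sketch rather than the package's code: long_data, within_cor, and wide are hypothetical names, and only the column-naming pattern is taken from the text above.

```r
# Hypothetical long-format data with two numeric variables
long_data <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  var_1 = c(1, 2, 3, 1, 2, 3),
  var_2 = c(2, 4, 6, 3, 2, 1)
)

# Pearson correlation of var_1 and var_2 within each id
within_cor <- sapply(split(long_data, long_data$id), function(g) {
  cor(g$var_1, g$var_2, use = "pairwise.complete.obs")
})

# Wide format: one row per id, column named after the two variables
wide <- data.frame(
  id              = names(within_cor),
  cor.var_1.var_2 = unname(within_cor)
)
wide
```

In this toy data the two variables rise together in the first group and move in opposite directions in the second, so the within-group correlations differ in sign even though a pooled correlation would hide that.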

Examples

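# Aggregate the example data by the grouping variable "id"
df <- aggregate_df(wide_example_data, id = "id")
head(df)
# Inputs for a second data set (the original example shows no call for these)
data <- ssd.day
id <- "PAR.ID"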

kthorstmann/padis documentation built on May 24, 2019, 5:01 a.m.