pre_aggregated: Aggregation functions

View source: R/1_aggregate.R

pre_aggregated    R Documentation

Aggregation functions

Description

Used when the dataframe to be analyzed is pre-aggregated, i.e. the data has already been grouped into periods (each period corresponding to an itemset).

Usage

pre_aggregated(df, include_date = FALSE, multiset = FALSE, summary_stats = TRUE, output_directory = "~")

Arguments

df

A dataframe with either 3 or 4 columns: 3 columns in the order id, period, event if the date is not to be included; or 4 columns in the order id, date, period, event if the date is to be included.

include_date

Logical indicator controlling whether the date variable is included in the returned data. If reports are created with the generate_reports function of approxmapR, the dates are included in the alignment_with_date output file when this argument is TRUE. Defaults to FALSE.

multiset

Beta; Logical indicator which controls the exclusion of multiple events within the same event set.

summary_stats

Logical controlling whether summary statistics about the aggregation are printed. Defaults to TRUE.

output_directory

The path to where the exports should be placed.
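As a minimal sketch of the two input layouts described for df, the toy dataframes below use made-up values and hypothetical object names (events_with_date, events_no_date); only the column order follows the argument description and the Examples section. It uses base R only and does not call approxmapR.

```r
# Toy data with invented values, for illustration only.
# 4-column layout (id, date, period, event), for use with include_date = TRUE
events_with_date <- data.frame(
  id     = c(1, 1, 2),
  date   = as.Date(c("2020-01-01", "2020-01-09", "2020-02-03")),
  period = c(1L, 3L, 1L),
  event  = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# 3-column layout (id, period, event), for use when dates are not included
events_no_date <- events_with_date[, c("id", "period", "event")]

str(events_with_date)
str(events_no_date)
```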

Value

Returns a dataframe with the proper classes assigned.

Examples

library(approxmapR)
library(tidyverse)

data("demo1")
demo1 <- data.frame(do.call("rbind", strsplit(as.character(demo1$id.date.item), ",")))
names(demo1) <- c("id", "period", "event")

# Identifying the earliest date per -id- and setting it as the -index_dt-
demo1 <- demo1 %>% group_by(id) %>% mutate(index_dt = min(as.Date(period, "%m/%d/%Y"))) 

# Creating an Index from the earliest date
demo1 <- demo1 %>%
          mutate(date = as.Date(period, "%m/%d/%Y")) %>%
          mutate(period = as.numeric(difftime(date, index_dt, units = "days"))) %>%
          select(id, period, event) %>% arrange(id, period)


# Aggregating into custom periods with the following groupings:
#    [] the index date is the first period (1),
#    [] days 1 - 28 after the index date are grouped into weekly periods (2 - 5), and then
#    [] events on day 29 or later are assigned to periods 6 and up; as written,
#       floor(n_ndays7) + 2 advances one period for every further 7 days
demo1 <- demo1 %>% group_by(id) %>% mutate(date = period,
                                          n_ndays7 = period / 7,
                                          period = as.integer(case_when(period == 0 ~ 1,
                                                             ceiling(n_ndays7) < 5 ~ ceiling(n_ndays7) + 1,
                                                             TRUE ~ floor(n_ndays7) + 2))
                                          ) %>% select(id, date, period, event)

# Since -demo1- has the date column, need to select only the id, period, and event columns if the dates are not
#    to be included
agg <- demo1 %>% select(id, period, event) %>% pre_aggregated()

# No need to select specific columns if the dates are desired to be included
agg <- demo1 %>% pre_aggregated(include_date = TRUE)
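The day-to-period mapping used in the example above can be checked in isolation. The sketch below wraps the same case_when() rules in a helper named days_to_period, which is hypothetical and not part of approxmapR, and applies it to a few hand-picked day offsets. It assumes only that dplyr is installed.

```r
library(dplyr)

# Hypothetical helper (not part of approxmapR): maps a day offset from the
# index date to a period number using the same rules as the example above.
days_to_period <- function(days) {
  n_ndays7 <- days / 7
  as.integer(case_when(
    days == 0             ~ 1,                      # index date
    ceiling(n_ndays7) < 5 ~ ceiling(n_ndays7) + 1,  # days 1 - 28, weekly
    TRUE                  ~ floor(n_ndays7) + 2     # day 29 onward
  ))
}

# Hand-picked offsets around the bin boundaries
offsets <- c(0, 1, 7, 8, 28, 29, 35, 60)
periods <- days_to_period(offsets)
data.frame(day = offsets, period = periods)
```

Printing a table like this before calling pre_aggregated is a quick way to confirm that the period boundaries fall where intended.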

ilangurudev/approxmapR documentation built on March 22, 2022, 1:15 p.m.