knitr::opts_chunk$set( collapse = FALSE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of rcohort
is to perform cohort analysis with R. This package build on top of data.table
hence it should bring good speed!
rcohort
use data.table
version 1.14.3, which is not available on CRAN yet. Therefore, you need to update data.table
to the development version first to be able to use rcohort
.
You can install the development version of rcohort
from GitHub with:
# install.packages("devtools") data.table::update.dev.pkg() devtools::install_github("vohai611/rcohort")
This is a basic example which shows you how to execute cohort analysis with rcohort
df = readRDS("example-data/online-retail.rds")
This is a typical dataset of an online retailer (Source)
head(df)
cohort_count()
perform cohort analysis by counting the appearance of customer in each time period. In particular, Customer are grouped by the first time they appear in the dataset. We can choose the rounded time to every "month", "week", "quarter" or "year".
library(rcohort) df1 = df |> cohort_count(.id_col = CustomerID, .time_col = InvoiceDate, time_unit = "month", percent_form = TRUE) head(df1, 5)
To include group of all customer, use all_group = TRUE
df2 = df |> cohort_count(.id_col = CustomerID, .time_col = InvoiceDate, time_unit = "month", all_group = TRUE) head(df2, 5)
Beside counting the appearance of customers, you can summarise cohort performance over time. For example, we can observe the change of the revenue over time.
df$Revenue = df$Quantity * df$UnitPrice cohort_df = df |> cohort_summarise(.id_col = CustomerID, .time_col = InvoiceDate, .summarise_col = Revenue,.fn = "sum", time_unit = "month", percent_form = TRUE) head(cohort_df,5)
We can also quick visualize cohort data by calling auto_plot
cohort_df |> auto_plot("tile")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.