With trace data - from web logs, behavioural logs, really anything to do with user actions - reconstructing sessions (or 'sessionising') is essential. It lets an analyst divide up user actions into actual periods of sustained interaction, and from there compute a whole host of useful metrics, from session length to bounce rate.
reconstructr is a library for just that - sessionisation and metric computation - in a way that keeps all the metadata about the events you're sessionising.
The nature of a "session" has provided fodder for researchers for years. Most people take an approach based on inactivity thresholds; if someone does not take an action in N seconds, their session has ended and a new one begins on their next logged action.
Using reconstructr, we can conveniently divide events into sessions with the
sessionise function. This takes a data.frame of events (along with specifiers of which column contains the user ID, and which column contains the timestamp) and a threshold value (in seconds). When the time between events by a user crosses that threshold, the session ends and a new one begins.
We can demonstrate this using reconstructr's inbuilt session dataset:
library(reconstructr) data("session_dataset") str(session_dataset) 'data.frame': 63524 obs. of 3 variables: $ uuid : chr "47dc43895814861e21a2edf93348c826" "a736822df1890011694e7049cb3abef3" "674d2d00e096a3319874a4347caa1f4a" "f62d315398e6d04a3f2fa02e8ae42d49" ... $ timestamp: POSIXlt, format: "2014-01-07 00:00:15" "2014-01-07 00:01:11" "2014-01-07 00:01:54" ... $ url : chr "https://www.nasa.gov/history/mercury/mercury.html" "https://www.nasa.gov/images/ksclogosmall.gif" "https://www.nasa.gov/elv/hot.gif" "https://www.nasa.gov/facts/faq04.html" ... sessionised_data <- sessionise(x = session_dataset, timestamp = timestamp, user_id = uuid, threshold = 1800) str(sessionised_data) 'data.frame': 63524 obs. of 5 variables: $ uuid : chr "0005839b3e8483d50870f61f50307fa7" "000b047bad36484451f12c114ab5eb28" "000b047bad36484451f12c114ab5eb28" "000b047bad36484451f12c114ab5eb28" ... $ timestamp : POSIXlt, format: "2014-01-14 12:47:59" "2014-01-07 14:25:11" "2014-01-09 12:47:17" ... $ url : chr "https://www.nasa.gov/history/apollo/images/footprint-logo.gif" "https://www.nasa.gov/ksc.html" "https://www.nasa.gov/biomed/threat/gif/beachmousefinsmall.gif" "https://www.nasa.gov/shuttle/resources/orbiters/atlantis.html" ... $ session_id: chr "09cd65049020ed55472a2d8b1f47787e" "9dcb2f610297b3fe2c810907fa90fb8e" "70bcde51eff332d4ac820a90930f0f6e" "70bcde51eff332d4ac820a90930f0f6e" ... $ time_delta: int NA NA NA 45 4 75 274 47 NA 28 ...
Sessionisation adds two new columns; 'session_id', a unique per-session ID, and 'time_delta' - the time between an event and the previous event in the session. If the event was the first (or only) one in a session, that value will be
Crucially, existing metadata (like URLs, or activity type) is carried along with the session information and not dropped.
From the sessionised data we can compute a whole host of useful metrics, many of which have convenience functions in this package.
An important metric in session data is the bounce rate: the proportion of sessions that included only a single event. This represents (absent data quality issues) the number of sessions where a user took only one action and then simply left.
It can be computed with
bounce_rate, which takes a sessionised dataset and produces the percentage of sessions resulting in bounces. Optionally (if you provide an argument for the
user_id parameter) it produces the bounce rate on a per-user basis, rather than for the dataset overall:
str(bounce_rate(sessionised_data)) num 20.7 str(bounce_rate(sessionised_data, user_id = uuid)) 'data.frame': 10000 obs. of 2 variables: $ user_id : chr "0005839b3e8483d50870f61f50307fa7" "000b047bad36484451f12c114ab5eb28" "000b2bc1a5438d8d54d4fbec139a2fd5" "001b6e80a14ba8d809c4ff18cdbade40" ... $ bounce_rate: num 100 14.3 0 100 100 ...
time_on_page is very similar, calculating either the mean (or median) time between events - either for the dataset as a whole or, if the
by_session parameter is TRUE, for each session:
str(time_on_page(sessionised_data)) num 146 str(time_on_page(sessionised_data, by_session = TRUE)) 'data.frame': 22226 obs. of 2 variables: $ session_id : chr "00011b1e098848edee7e50a2174fe6ef" "0001f6457a4d09a8c2092278fec89a89" "000451f0869b7eab3582c093ace0253d" "0004c56ace95f92ee12bf9552401f923" ... $ time_on_page: num NaN NaN NaN NaN NaN ...
(It's not broken, it just so happened the first few sessions contained no non-NA time deltas).
session_length provide easy ways of identifying how many sessions are in a sessionised dataset (overall, or on a per-user basis) and how long those sessions are, respectively.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.