knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
library(retention.helpers)
library(tidyverse)

This article describes data available, for internal Charles Sturt University use, in the following packages: data.csu.retention, data.csu.activity and data.csu.exit.

The retention dataset : data.csu.retention

This includes data on student demographics, enrolments, academic results, as well as data on the Retention Teams interventions.

The activity dataset : data.csu.activity

This includes LMS activity data on students. It is in a separate dataset due to its large size.

The exit survey dataset : data.csu.exit

This includes data on the exit survey results. It is not commonly used with the other data and is sometimes sensitive so is in a separate dataset.

Common design features of the data set

There are essentially four families of tables in the data:

Common variables

A key part of the cleaning and processing of the data is consistent names for variables throughout the tables. These will be commonly used for as keys for making joins.

Data freshness

In addition to the tables, there are a collection of timestamps beginning with the prefix last_updated_*. These indicate when the data was last updated.

data.csu.retention::last_updated_student_opa

Retention data package tables

The tables in the

academic

This table gives academic results by student (id), subject and session.

data.csu.retention::academic |> 
    slice(1:10) |> 
    mutate(  # anonymising 
        id = "student_id", subject = "ABC123", 
        offering = "ABC123_202230_W_D") |>
    glimpse()

academic_details

This is historic data, only available for the three main sessions in 2019, and only for HEPPP subjects in those sessions.

data.csu.retention::academic_details |> 
    slice(1:10) |> 
    mutate( # anonymising
        id = "student_id", subject = "ABC123") |> 
    glimpse()

assessment_marks

This data collects together assessments, grades and cumulative marks for selected subjects. Each row is a result for a particular assessment item (title), for a student id, in a given subject, in a given session. Some marks are numeric and some are categorical (grades, or pass / fail items).

data.csu.retention::assessment_marks |> 
    slice(1:10) |> 
    mutate( # anonumising
        subject = "ABC123", id = "student_id", bb_pk1 = "bb_pk") |> 
    glimpse()

campus_codes

In the offering codes (formatted like ABC123_202230_W_I) the third component (the W in the example here) indicates the campus that the offering is taught from. This table matches these one or two character codes to the campus name.

data.csu.retention::campus_codes |> glimpse()

embedded_tutors

Part of the Retention Teams interventions is the embedded tutors program. This table is a record of students (id) who have attended a tutorial session, for a particular subject, in a particular session. It does not take into account multiple tutorial sessions for the same subject in the same session.

data.csu.retention::embedded_tutors |> 
    mutate( # anonymising
        id = "student_id", subject = "ABC123"
    ) |> 
    glimpse()

contact

A component of the various intervention campaigns run by the Retention Team involve communication with students. This table attempts to draw together records of the different types of contact that have happened, when, with who, and how they went. It is one row per contact attempt, for a particular student id, in a session, for a campaign, at a particular time (contact_timestamp).

data.csu.retention::contact |> 
    mutate(id = "student_id") |> 
    glimpse()

enrolments

A table of the subject enrolment records for each student. It includes when they enrolled in the subject and when they withdrew. This data generally takes about a week to align itself with what actually happened - so beware enrolment movements that are recent. One row per student id, per session, per offering, per subject - although it is technically possible that students could be moving in and out of the same subject on the same day.

data.csu.retention::enrolments |> 
    slice(1:10) |> 
    mutate(
        id = "student_id", subject = "ABC123", 
        offering = str_c("ABC123", session, "W", "D", sep = "_")
    ) |> 
    glimpse()

flags

As part of some of the intervention campaigns students may be flagged as 'at-risk' for some reason. This is recorded in the flags table. Each row is for a particular concern, by student id, in a subject / offering, in a particular session for a particular campaign. The is also identified by when it occured, however this data has been inconsistently recorded over the years and may be in any of week, flag_timestamp or trigger_date.

data.csu.retention::flags |>
    filter(session == 202230) |> 
    slice(1:10) |> 
    mutate(id = "student_id", subject = "ABC123", 
           offering = str_c("ABC123", session, "W", "D", sep = "_"),
           concern_detail = "detailed comments") |> 
    glimpse()

flags_unticked

When students are deemed at risk due to missing an early assessment item, this is checked with the academic. Some students are removed from the flag list (perhaps they had already organised an extension). These students do not appear on the flag list, but do appear on the flags_unticked table. This is a work in progress and currently only has data for the sessions: r str_c(sort(unique(data.csu.retention::flags_unticked$session)), collapse = ", ").

data.csu.retention::flags_unticked |> 
    mutate(id = "student_id", offering = "ABC123_202230_W_D") |> 
    glimpse()

interventions

This table aggregates data from flags, contact, embedded_tutors and other sources to summarise what interventions have been made by the Retention Team and how they went. It is organised as one row per student id, per session, per campaign.

data.csu.retention::interventions |> 
    mutate(id = "student_id", intervention_target = "ABC123_202090_W_D") |> 
    glimpse()

offerings

This includes data on the individual subject offerings. It is one row per offering.

data.csu.retention::offerings |> 
    slice(1:5) |> 
    mutate(academic_name = "Bob Katter", academic_email = "bob@csu.edu.au", 
           academic_id = "bkatter27") |> 
    glimpse()

student_ids

Data on student identifying variables. One row per student id, which is the university id.

data.csu.retention::student_ids |>
    names() # this is all identifiable data

student_demographics

This table includes the most recent student demographic data, one row per student id.

data.csu.retention::student_demographics |>
    slice(1:10) |> 
    mutate(id = "student_id", firstname = "firstname", lastname = "lastname") |> 
    glimpse()

student_course

This table includes data on the students enrolment in a particular course (program of study). It is one row per student id, per course, per admit_session.

data.csu.retention::student_course |>
    filter(admit_session == 202030) |> 
    slice(89:99) |> 
    mutate(
        id = "student_id", course = "Course Name", course_code = "10000AB"
    ) |> 
    glimpse()

sessions

The sessions table details important dates for the teaching session. One row per session.

data.csu.retention::sessions |> 
    glimpse()

subjects

The subjects table contains the latest subject level data. Whilst the offering table is tied to a session, this table only grabs the latest data, one row per subject.

data.csu.retention::subjects |> 
    glimpse()

triggers

This table attempts to store data on the triggers used to flag 'at-risk' students in several of the campaigns. It is one row per offering per campaign per session.

data.csu.retention::triggers |> 
    mutate(offering = "ABC123_201930_W_D", subject = "ABC123") |> 
    glimpse()

Activity data package tables

activity

This table is an aggregate of the trace data from the Learning Management System. It aggregates per student id, per day (date), per subject offering site.

data.csu.activity::activity |> 
    slice(1:10) |> 
    mutate(id = "student_id", offering = "ABC123_202290_W_D") |> 
    glimpse()

SQL for activity aggregation

This query is run on the Blackboard DDA public schema table.

``` {sql eval = F} / aa by-day / select id as id, replace(coalesce(child_course_id, course_id), 'S-', '') as offering, date as date, count(distinct session_id) as logins, count(session_id) as clicks, count(distinct content_pk1) as views

from (select u.student_id as id, cm.course_id as course_id, cmchild.course_id as child_course_id, aa.session_id as session_id, aa.timestamp::date as date, aa.content_pk1 as content_pk1 from activity_accumulator aa inner join users u on u.pk1 = aa.user_pk1 inner join course_main cm on aa.course_pk1 = cm.pk1 inner join course_users cu on cu.crsmain_pk1 = cm.pk1 and u.pk1 = cu.users_pk1 left join course_main cmchild on cmchild.pk1 = cu.child_crsmain_pk1 where u.student_id is not null and cu.role = 'S' and u.lastname not like '%PreviewUser' and aa.event_type != 'SESSION_TIMEOUT' and cm.course_id similar to 'S-%202290%' and aa.timestamp::date >= make_date(2023, 2, 1) -- one month at a time and aa.timestamp::date < make_date(2023, 3, 1) ) as tab1 group by id, offering, date;

# Exit survey data package tables

## `exit`

``` {r}
data.csu.exit::exit |> 
    slice(1:10) |>
    mutate(id = "student_id", name = "name", other_info = "other info") |> 
    glimpse()


benwhicks/retention.helpers documentation built on Feb. 6, 2023, 5:02 p.m.